Catalogue Search | MBRL

The National Archives of the Netherlands and archiving government websites

by Posthumus, Antal in Government archives, National libraries

2022

The National Archives of the Netherlands, as a permanent government agency and official archive for the central government (ministries and their agencies), has the legal duty, laid down in the Archiefwet, to secure the future of government records. Within this context, our role does not involve actively forming a collection of archived websites by selecting and harvesting them ourselves. This is a key difference between us and other national archives, national libraries and other (inter)national heritage institutions. Such a mandate requires an environment in which the processes, in relation to one another, can take place in a controlled manner. A significant part of making this happen has been the effort we have put, and continue to put, into advising the producers of records (ministries and their agencies) on how they should create, and eventually transfer, archived public websites as a specific form of government records. One example of the support we offer is a very well received set of guidelines on archiving websites that we issued in 2018. Those guidelines were also used as part of the requirements in a public European tender in 2021, whose objective was the implementation of a central platform to harvest approximately 1500 public websites of the Dutch central government. This article also presents our experiences and insights into integrating the processes of ingestion, storage, management and preservation of, and providing access to, archived public websites of the Dutch central government into the existing infrastructure and workflows of our trusted digital repository (e-depot for short).

Journal Article

Design of an Enhanced Web Archiving System for Preserving Content Integrity with Blockchain

by Hwang, Hyun Cheon; Park, Ji Su; Shon, Jin Gon in Archiving, Blockchain, Cryptography

2020

Web archive systems have long been used to preserve web content for the future, and their importance is increasing with the explosive growth of web content. The reference model for an open archival information system (OAIS) provides guidance for long-term archiving systems, and most organizations that archive web content follow it. In addition, the web archive (WARC) format is an ISO standard for archiving web content. However, there is no way to secure content integrity, and it is hard to identify the original. Because of these limitations, a web archive system is vulnerable to disputes over content integrity. In this paper, we propose the blockchain-linked (BCLinked) web archiving system, which uses blockchain technology and an extended WARC field to store web content integrity metadata in a blockchain. We designed the BCLinked web archiving system and confirmed through an experiment that the proposed system secures content integrity.

Journal Article

Kako nastajajo spletni arhivi: tehnični vidiki zajemanja spletnih vsebin / How are web archives created? Technical aspects of web content capture

by Janko Klasinc in web archives, web archiving, web heritage

2025

Web archives are collections produced by libraries and other heritage institutions to permanently preserve online heritage. They often contain large amounts of material stored from the web through the use of web crawlers. From a usage perspective, they are often unpredictable, nontransparent and inconsistent data sources that contain numerous content gaps. In addition to the various social, legislative and institutional circumstances under which they are created, their specific characteristics are largely defined by the heterogeneous, ephemeral and fluid nature of the world wide web. Because they present numerous challenges to their users, it is important for them to be aware of the circumstances that influence the nature of web archives and, consequently, the opportunities and pitfalls of using archived data. To shed light on the background of these relatively poorly understood data sources, this paper, through a review of foundational and other relevant literature, describes primarily the technical aspects of web archives creation. It focuses on the fundamental characteristics of the world wide web in the context of preservation, different approaches to capturing web content, their limitations and the impact of these circumstances on the nature of web archives, which differ in many ways from more traditional and established data sources.

Journal Article

Web archives as research infrastructure for digital societies: the case study of Arquivo.pt

by Gomes, Daniel

2022

Humans are the dominant species on Earth. Our advantage comes from our unique capacity to organise at large scale to reach common goals. In digital societies, organising requires communicating information, and these days most of it is published exclusively online. The problem is that online information disappears quickly, after a few months. Humanity's dependence on online information is strong but still recent, and the consequences of losing the historical perspective over online data are yet to be seen. Web archives are digital preservation systems that collect, store and provide access to historical web data. Scientific researchers have been using web archives. However, web archives should also be used by the wider public so that they may serve digital societies. Arquivo.pt is a public web archive, started in 2007, that enables search and access to historical information preserved from the Web since the 1990s. This article presents Arquivo.pt as a case study of a research infrastructure that has been developed to serve wider communities at national and international levels. The article shares the main lessons learned so that other web archiving initiatives may arise and be developed at a faster pace. It describes the existing tools and activities that enable exploration of historical web-archived collections. Finally, it presents challenges related to creating web archives and proposes actions to address them.

Journal Article

After Microlog

by Campbell, Graeme; Lake, Michelle; McGoveran, Catherine in Academic libraries, Annual reports, Archives & records

2019

This article explores the history and development of Microlog, a key subscription-based source of Canadian municipal, provincial, territorial, and federal government publications from 1979 to 2018. Microlog was distributed as microfiche and accompanied by a printed and later online index. Microlog’s impact is discussed, alongside the recent termination of the product in 2018 and the implications of this change for long-term access to government information in Canada. A discussion of a selection of alternative continuing sources of Canadian government information follows, offering these tools not as direct successors to Microlog, but as important components of the remaining landscape of Canadian government information resources. The decentralization of government information collection and preservation efforts in Canada is discussed, in particular highlighting the lack of a clear successor resource to replace Microlog in the wake of its disappearance.

Journal Article

Avoiding spoilers: wiki time travel with Sheldon Cooper

by Nelson, Michael L; Van de Sompel, Herbert; Jones, Shawn M in Archives & records, Browsing, Computer mediated communication

2018

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if fans are behind in their viewing they run the risk of encountering “spoilers”—information that gives away key plot points before the intended time of the show’s writers. Because the wiki history is indexed by revisions, finding specific dates can be tedious, especially for pages with hundreds or thousands of edits. A wiki’s history interface does not permit browsing across historic pages without visiting current ones, thus revealing spoilers in the current page. Enterprising fans can resort to web archives and navigate there across wiki pages that were live prior to a specific episode date. In this paper, we explore the use of Memento with the Internet Archive as a means of avoiding spoilers in fan wikis. We conduct two experiments: one to determine the probability of encountering a spoiler when using Memento with the Internet Archive for a given wiki page, and a second to determine which date prior to an episode to choose when trying to avoid spoilers for that specific episode. Our results indicate that the Internet Archive is not safe for avoiding spoilers, and therefore we highlight the inherent capability of fan wikis to address the spoiler problem internally using existing, off-the-shelf technology. We use the spoiler use case to define and analyze different ways of discovering the best past version of a resource to avoid spoilers. We propose Memento as a structural solution to the problem, distinguishing it from prior content-based solutions to the spoiler problem. This research promotes the idea that content management systems can benefit from exposing their version information in the standardized Memento way used by other archives. We support the idea that there are use cases for which specific prior versions of web resources are invaluable.

Journal Article

Scientists race to save vital health databases amid Trump chaos

by Mallapaty, Smriti in Backups, Datasets, Demographics

2025

Journal Article

Web Archiving – between Past, Present, and Future

by Brügger, Niels in archived web material, strategy of archiving ‐ actively created subjective reconstruction , archiving of web, writing web history‐ understanding the Internet, new Internet forms , future of web archiving ‐ cooperation, web‐archiving institutions and Internet research communities

2011

This chapter contains sections titled: Web Archiving and Archiving Strategies; A Brief History of Web Archiving; The Archived Web Document; Web Philology and the Use of Archived Web Material; The Future of Web Archiving; References

Book Chapter

Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content

by Grover, Claire; Jones, Shawn M.; Klein, Martin in Analysis, Biology and Life Sciences, Computer and Information Sciences

2016

Increasingly, scholarly articles contain URI references to "web at large" resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they report on. A reader who visits a web at large resource by following a URI reference in an article, some time after its publication, is led to believe that the resource's content is representative of what the author originally referenced. However, due to the dynamic nature of the web, that may very well not be the case. We reuse a dataset from a previous study in which several authors of this paper were involved, and investigate to what extent the textual content of web at large resources referenced in a vast collection of Science, Technology, and Medicine (STM) articles published between 1997 and 2012 has remained stable since the publication of the referencing article. We do so in a two-step approach that relies on various well-established similarity measures to compare textual content. In a first step, we use 19 web archives to find snapshots of referenced web at large resources that have textual content that is representative of the state of the resource around the time of publication of the referencing paper. We find that representative snapshots exist for about 30% of all URI references. In a second step, we compare the textual content of representative snapshots with that of their live web counterparts. We find that for over 75% of references the content has drifted away from what it was when referenced. These results raise significant concerns regarding the long term integrity of the web-based scholarly record and call for the deployment of techniques to combat these problems.

Journal Article

Averting the Digital Dark Age

by Milligan, Ian in History of engineering and technology, TECHNOLOGY & ENGINEERING, Technology & Engineering / History

2024

How the internet's memory infrastructure developed, averting a "digital dark age", and introduced a golden age of historical memory. In early 1996, the web was ephemeral. But by 2001, the internet was forever. How did websites transform from having a brief life to becoming long-lasting? Drawing on archival material from the Internet Archive and exclusive interviews, Ian Milligan's Averting the Digital Dark Age explores how Western society evolved from fearing a digital dark age to building the robust digital memory we rely on today. By the mid-1990s, the specter of a "digital dark age" haunted libraries, portending a bleak future with no historical record that threatened cyber obsolescence, deletion, and apathy. People around the world worked to solve this impending problem. In San Francisco, technology entrepreneur Brewster Kahle launched his scrappy nonprofit, Internet Archive, filling tape drives with internet content. Elsewhere, in Washington, Canberra, Ottawa, and Stockholm, librarians developed innovative new programs to safeguard digital heritage. Cataloging worries among librarians, technologists, futurists, and writers from WWII onward, through early practitioners, to an extended case study of how September 11 prompted institutions to preserve thousands of digital artifacts related to the attacks, Averting the Digital Dark Age explores how the web gained a long-lasting memory. By understanding this history, we can equip our society to better grapple with future internet shifts.

eBook
