Catalogue Search | MBRL
27 result(s) for "Woodburn, Matt"
Open Information and Exceptions Policy of the Natural History Museum, London
by Livermore, Laurence; Scott, Ben; Smith, Vincent
in Copyright; data management; Freedom of information
2024
There have been few, if any, open data and information management policies openly published by natural science collections. This paper contextualises the rationale for publishing the Open Information and Exceptions Policy of the Natural History Museum, London, and provides the policy itself. The policy outlines how the Natural History Museum puts the principle of 'open by default' into practice, and includes sections on purpose and scope, the relationship to relevant legislation (which always takes precedence over the policy), the categories of possible exceptions to open information release, what happens when exceptions are declared, the relationship to UK government information security classifications, and definitions of terms.
Journal Article
Costbook of the digitisation infrastructure of DiSSCo
by Livermore, Laurence; Hardy, Helen; Hardisty, Alex
in Capital costs; Collections; cost effectiveness
2020
There has been little work to compare and understand the operating costs of digitisation using a standardised approach. This paper discusses a first attempt at gathering digitisation cost information from multiple institutions and analysing the data. It has been written for other digitisation managers who want to break down and compare project costs, as a potential baseline for future digitisation projects, and as a starting point for prioritising research and development to reduce digitisation costs.
Journal Article
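The standardised cost comparison this abstract describes lends itself to a small worked illustration. The sketch below is not the paper's costbook model; it is a minimal, hypothetical breakdown (the cost categories, project names and all figures are invented) showing how per-specimen costs could be compared across projects once costs are recorded in a consistent structure.

```python
from dataclasses import dataclass

@dataclass
class DigitisationProject:
    """Hypothetical standardised cost breakdown for one digitisation project."""
    name: str
    staff_costs: float        # digitiser and management salaries, in EUR
    equipment_costs: float    # capital costs, depreciated over the project
    overhead_costs: float     # estates, IT, storage
    specimens_digitised: int

    def cost_per_specimen(self) -> float:
        total = self.staff_costs + self.equipment_costs + self.overhead_costs
        return total / self.specimens_digitised

# Compare two illustrative projects on a like-for-like basis
projects = [
    DigitisationProject("Herbarium sheets", 250_000, 60_000, 40_000, 700_000),
    DigitisationProject("Pinned insects", 300_000, 90_000, 50_000, 250_000),
]
for p in sorted(projects, key=DigitisationProject.cost_per_specimen):
    print(f"{p.name}: €{p.cost_per_specimen():.2f} per specimen")
```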
Technical capacities of digitisation centres within ICEDIG participating institutions
by Livermore, Laurence; Cocks, Naomi; Smith, Vincent
in Data collection; Digitisation Equipment; Digitization
2020
DiSSCo, the Distributed System of Scientific Collections, is seeking to centralise certain infrastructure and activities relating to the digitisation of natural science collections. Deciding which activities to distribute, which to centralise, and what geographic level of aggregation (e.g. regional, national or pan-European) is most appropriate for each task was one of the challenges set out within the EC-funded ICEDIG project. In this paper we present the results of a survey of several European collections to establish current digitisation capacity and the strengths and skills associated with existing digitisation infrastructure. Our results indicate that most of the institutions surveyed are engaged in large-scale digitisation of collections, usually undertaken by dedicated teams of digitisers within each institution. Some cross-institutional collaboration is happening, but this is still the exception, for a variety of funding and practical reasons. These results inform future work to establish a set of principles determining how digitisation infrastructure might be most efficiently organised across European organisations in order to maximise progress on the digitisation of the estimated 1.5 billion specimens held within European natural science collections.
Journal Article
Systematic Design of a Natural Sciences Collections Digitisation Dashboard
2024
This paper describes the design and build of a pilot Natural Sciences Collections Digitisation Dashboard (CDD). The CDD will become a key service for the Distributed System of Scientific Collections Research Infrastructure (DiSSCo) and aims to improve the discoverability of natural science collections (NSCs) held in European institutions, both digitised and undigitised. Furthermore, it will serve as a dynamic visual assessment tool for strategic decision-making, including the prioritisation of digitisation. The CDD pilot includes high-level information from nine European NSCs, covering the number of objects, taxonomic scope, storage type, chronostratigraphy (Earth Science Collections), geographical region and level of detail in digitisation. This information is structured through a standardised Collection Classification Scheme, which uses high-level categorisation to describe physical natural science collections.
Journal Article
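As a rough illustration of how dashboard metrics might be derived from a Collection Classification Scheme of the kind described above, here is a minimal sketch. The record fields mirror the dimensions named in the abstract, but the class names, the 0–3 digitisation scale and all figures are assumptions for illustration, not the pilot's actual scheme.

```python
from dataclasses import dataclass

@dataclass
class CollectionEntry:
    """One high-level row in a hypothetical Collection Classification Scheme."""
    institution: str
    taxonomic_scope: str      # e.g. "Insecta"
    storage_type: str         # e.g. "pinned", "spirit", "herbarium sheet"
    region: str
    object_count: int
    digitisation_level: int   # 0 = undigitised .. 3 = fully digitised (illustrative)

def digitised_share(entries: list[CollectionEntry], threshold: int = 2) -> float:
    """Fraction of objects at or above a given digitisation level."""
    total = sum(e.object_count for e in entries)
    digitised = sum(e.object_count for e in entries
                    if e.digitisation_level >= threshold)
    return digitised / total if total else 0.0

entries = [
    CollectionEntry("Institution A", "Insecta", "pinned", "Global", 30_000_000, 1),
    CollectionEntry("Institution B", "Insecta", "pinned", "Global", 9_000_000, 3),
]
print(f"{digitised_share(entries):.1%} of objects digitised to level 2+")
```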
Identification of provisional Centres of Excellence for digitisation of European natural science collections
2020
Digitisation of natural science collections is fundamental to the vision for the Distributed System of Scientific Collections (DiSSCo), and given the low proportion of collections that are digitally accessible, it is proposed that 'Centres of Excellence' be developed to accelerate the creation of digital copies of original specimens. Within the ICEDIG project, a team of scientists from across the consortium explored the concept of Centres of Excellence and constructed a toolset to help identify these centres to support the development of DiSSCo. This report documents this process and describes the toolset.
Journal Article
DiSSCo Prepare Project: Increasing the Implementation Readiness Levels of the European Research Infrastructure
by Addink, Wouter; Livermore, Laurence; Curral, Luís
in Biodiversity; Botanical gardens; Consortia
2023
The Distributed System of Scientific Collections (DiSSCo) is a new world-class Research Infrastructure (RI) for natural science collections. The DiSSCo RI aims to create a new business model for one European collection that digitally unifies all European natural science assets under common access, curation, policies and practices, ensuring that all the data are easily Findable, Accessible, Interoperable and Reusable (the FAIR principles). DiSSCo represents the largest-ever formal agreement between natural history museums, botanic gardens and collection-holding institutions in the world. DiSSCo entered the European Roadmap for Research Infrastructures in 2018 and launched its main preparatory phase project (DiSSCo Prepare) in 2020. DiSSCo Prepare is the primary vehicle through which DiSSCo reaches the overall maturity necessary for its construction and eventual operation. DiSSCo Prepare raises DiSSCo's implementation readiness level (IRL) across five dimensions: technical, scientific, data, organisational and financial. Each dimension of implementation readiness is separately addressed by specific Work Packages (WPs) with distinct targets, actions and tasks that will deliver DiSSCo's Construction Masterplan. This comprehensive and integrated Masterplan will be the product of the outputs of all of the project's content-related tasks and will be its final output. It will serve as the blueprint for construction of the DiSSCo RI, including establishing it as a legal entity. DiSSCo Prepare builds on the successful completion of DiSSCo's design study, ICEDIG, and the outcomes of other DiSSCo-linked projects such as SYNTHESYS+ and MOBILISE. This paper is an abridged version of the original DiSSCo Prepare grant proposal. It contains the overarching scientific case for DiSSCo Prepare, alongside a description of our major activities.
Journal Article
Research Infrastructure Contact Zones: a framework and dataset to characterise the activities of major biodiversity informatics initiatives
2022
The landscape of biodiversity data infrastructures and organisations is complex and fragmented. Many occupy specialised niches representing narrow segments of the multidimensional biodiversity informatics space, while others operate across a broad front but differ by the data type(s) they handle, their geographic scope and the life-cycle phase(s) of the data they support. In an effort to characterise the various dimensions of the biodiversity informatics landscape, we developed a framework and dataset to survey these dimensions for ten organisations (DiSSCo, GBIF, iBOL, Catalogue of Life, iNaturalist, Biodiversity Heritage Library, GeoCASe, LifeWatch, eLTER, ELIXIR), relative to both their current activities and long-term strategic ambitions. The survey assessed the contact between the infrastructure organisations by capturing the breadth of activities for each infrastructure across five categories (data, standards, software, hardware and policy), for nine types of data (specimens, collection descriptions, opportunistic observations, systematic observations, taxonomies, traits, geological data, molecular data and literature) and for seven phases of activity (creation, aggregation, access, annotation, interlinkage, analysis and synthesis). This generated a dataset of 6,300 verified observations, which have been scored and validated by leading members of each infrastructure organisation. The resulting data allow high-level questions about the overall biodiversity informatics landscape to be addressed, including identifying the greatest gaps and the areas of contact between organisations.
Journal Article
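The figure of 6,300 observations is consistent with scoring every combination of the survey's stated dimensions for each organisation, provided each cell was assessed twice, once for current activities and once for strategic ambitions, as the abstract implies. A minimal sketch of that enumeration, with the two-horizon doubling flagged as an assumption:

```python
from itertools import product

organisations = ["DiSSCo", "GBIF", "iBOL", "Catalogue of Life", "iNaturalist",
                 "Biodiversity Heritage Library", "GeoCASe", "LifeWatch",
                 "eLTER", "ELIXIR"]
categories = ["data", "standards", "software", "hardware", "policy"]
data_types = ["specimens", "collection descriptions", "opportunistic observations",
              "systematic observations", "taxonomies", "traits",
              "geological data", "molecular data", "literature"]
phases = ["creation", "aggregation", "access", "annotation",
          "interlinkage", "analysis", "synthesis"]
horizons = ["current", "ambition"]  # assumption: each cell scored for both

# Each combination is one observation to be scored by the infrastructure's experts
grid = list(product(organisations, categories, data_types, phases, horizons))
print(len(grid))  # 10 * 5 * 9 * 7 * 2 = 6300
```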
Rethinking Collection Management Data Models
2022
The data modelling of physical natural history objects has never been trivial, and the need for greater interoperability and adherence to multiple standards and internal requirements has made the task more challenging than ever. The Natural History Museum's internal RECODE (Rethinking Collections Data Ecosystems; see Dupont et al. 2022) programme has taken the approach of creating a data model to fit these internal and external requirements, rather than trying to force an existing data model to work with our next-generation collections management system (CMS) requirements. In this regard, community standards become vitally important, and existing and emerging standards and models like Spectrum, Darwin Core, Access to Biological Collection Data (ABCD) (Extended for Geosciences (EFG)), Latimer Core and the Conceptual Reference Model from the International Committee for Documentation (CIDOC CRM) have been and will be used heavily to inform this work. The poster will provide a starting point for publicly sharing and discussing the work that the RECODE programme has done, and for eliciting ideas that members of the community may have regarding its continuing improvement.

We have concentrated on creating a backbone for the data model, from collecting, through object curation, to scientific identification. This has yielded two significant outcomes.

The Collection Object: Traditional CMS data models treat each specimen as a single record in the database. The RECODE model recognises that there are a number of different concepts that need their own entities:

Collected material: the specimens collected in the field are not always fully identified or separated into discrete items.

Stored object: the aim of the RECODE model is to treat all objects as the same type of entity, with relationships between them enhancing the data. For example, a collection object is defined as a discrete object that can be moved and loaned independently. Its specific type (e.g., specimen, preparation, derivation) is given by its relationships to other collection objects.

Identifiable item: what can be taxonomically identified does not necessarily have a 1-to-1 relationship with the stored objects. One item may contain multiple species (e.g., a parasite and host; a rock containing many minerals) or one species may be split across many objects (e.g., long branches on two or more herbarium sheets; large skeletons stored in separate locations).

The Collection Level Description (CLD): This is a construct to enable the attachment of descriptive and quantitative data to groups of collection objects, rather than to individual collection objects. There will always be a need for an inventory which represents the basic holdings, organisation and indexing of collections, as well as a variety of use cases for grouping collection objects and attaching information at the group level.

The next challenge is to integrate the concepts more closely with each other to provide the best possible description of the collection and make it as shareable as possible. Some of the current challenges being addressed are: an object group may represent a heterogeneous group of objects; there will be multiple parallel CLD schemes for different purposes; different attributes and metrics will be relevant to different schemes; for some use cases, we need to be able to quantify relationships between an object group and its attributes, as well as attaching metrics to the object group itself; and we also need to be able to reflect relationships between object groups.

These challenges necessitate a data model that has a considerable degree of flexibility but enables rules and constraints to be introduced as appropriate for the different use cases. It is also important that, wherever possible, the model uses the same attributes as individual collection objects, to allow object groups to be implicitly linked to collection object records through common attributes as well as explicitly linked within the model. The aim of the conceptual model is to reflect these requirements.
Journal Article
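The core RECODE idea described above, a single generic collection-object entity whose specific type emerges from its relationships, plus many-to-many links between stored objects and what is taxonomically identified, can be sketched as follows. The class names, relationship labels and example records here are illustrative assumptions, not the programme's actual model.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionObject:
    """All stored objects are the same kind of entity; typed relationships to
    other objects (e.g. "preparationOf") give each its specific role."""
    identifier: str
    label: str
    relationships: list[tuple[str, "CollectionObject"]] = field(default_factory=list)

@dataclass
class IdentifiableItem:
    """What gets a taxonomic identification; many-to-many with stored objects."""
    taxon: str
    objects: list[CollectionObject]

host = CollectionObject("OBJ-001", "bird skin")
parasite_slide = CollectionObject("OBJ-002", "slide preparation")
parasite_slide.relationships.append(("preparationOf", host))  # hypothetical label

# One stored object can carry several identifiable items (host and parasite),
# and one identification can span several stored objects.
items = [
    IdentifiableItem("Turdus merula", [host]),
    IdentifiableItem("Ixodes ricinus", [host, parasite_slide]),
]
```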
No Pain No Gain: Standards mapping in Latimer Core development
2023
Latimer Core (LtC) is a new proposed Biodiversity Information Standards (TDWG) data standard that supports the representation and discovery of natural science collections by structuring data about the groups of objects that those collections and their subcomponents encompass (Woodburn et al. 2022). It is designed to be applicable to a range of use cases that include high-level collection registries, rich textual narratives and semantic networks of collections, as well as more granular, quantitative breakdowns of collections to aid collection discovery and digitisation planning.

As a standard that is (in this first version) focused on natural science collections, LtC has significant intersections with existing data standards and models (Fig. 1) that represent individual natural science objects and occurrences and their associated data (e.g., Darwin Core (DwC), Access to Biological Collection Data (ABCD), Conceptual Reference Model of the International Committee for Documentation (CIDOC-CRM)). LtC's scope also overlaps with standards for more generic concepts like metadata, organisations, people and activities (i.e., Dublin Core, World Wide Web Consortium (W3C) ORG Ontology and PROV Ontology, Schema.org). LtC represents just one element of this extended network of data standards for the natural sciences and related concepts.

Mapping between LtC and intersecting standards is therefore crucial for avoiding duplication of effort in the standard development process, and for ensuring that data stored using the different standards are as interoperable as possible, in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) principles. In particular, it is vital to make robust associations between records representing groups of objects in LtC and records (where available) that represent the objects within those groups.

During LtC development, efforts were made to identify and align with relevant standards and vocabularies, and to adopt existing terms from them where possible. During expert review, a more structured approach was proposed and implemented using the Simple Knowledge Organization System (SKOS) mappingRelation vocabulary. This exercise helped to better describe the nature of the mappings between new LtC terms and related terms in other standards, and to validate decisions around the borrowing of existing terms for LtC. A further exercise also used elements of the Simple Standard for Sharing Ontological Mappings (SSSOM) to start to develop a more comprehensive set of metadata around these mappings. At present, these mappings (Suppl. material 1 and Suppl. material 2) are provisional and not considered to be comprehensive, but should be further refined and expanded over time.

Even with the support provided by the SKOS and SSSOM standards, the LtC experience has proven the mapping process to be far from straightforward. Different standards vary in how they are structured: for example, DwC is a 'bag of terms' with informal classes and no structural constraints, while more structured standards and ontologies like ABCD and PROV employ different approaches to how structure is defined and documented. The various standards use different metadata schemas and serialisations (e.g., Resource Description Framework (RDF), XML) for their documentation, and different approaches to providing persistent, resolvable identifiers for their terms. There are also many subtle nuances involved in assessing the alignment between the concepts that the source and target terms represent, particularly when assessing whether a match is exact enough to allow the existing term to be adopted. These factors make the mapping process quite manual and labour-intensive. Approaches and tools, such as developing decision trees (Fig. 2) to represent the logic involved and further exploration of the SSSOM standard, could help to streamline this process.

In this presentation, we will discuss the LtC experience of the standards mapping process, the challenges faced and methods used, and the potential to contribute this experience to a collaborative standards mapping effort within the anticipated TDWG Standards Mapping Interest Group.
Journal Article
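To make the SKOS mapping exercise concrete, here is a minimal sketch using rdflib in Python. The LtC term names and namespace URI used here are assumptions for illustration (the Darwin Core terms are real); whether a given pair is a close, exact or merely related match is exactly the manual judgement the abstract describes.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

# Namespace URIs follow the TDWG pattern, but the LtC URI and term names
# below are illustrative assumptions, not normative documentation.
LTC = Namespace("http://rs.tdwg.org/ltc/terms/")
DWC = Namespace("http://rs.tdwg.org/dwc/terms/")

g = Graph()
g.bind("skos", SKOS)
g.bind("ltc", LTC)
g.bind("dwc", DWC)

# Assert mappings using the SKOS mapping-relation vocabulary; SSSOM would
# layer metadata (justification, author, date) on top of triples like these.
g.add((LTC.preservationMethod, SKOS.closeMatch, DWC.preparations))
g.add((LTC.basisOfScheme, SKOS.relatedMatch, DWC.basisOfRecord))

print(g.serialize(format="turtle"))
```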
A Data Standard for Dynamic Collection Descriptions
2021
The utopian vision is of a future where a digital representation of each object in our collections is accessible through the internet and sustainably linked to other digital resources. This is a long-term goal, however, and in the meantime there is an urgent need to share data about our collections at a higher level with a range of stakeholders (Woodburn et al. 2020). To sustainably achieve this, and to aggregate this information across all natural science collections, the data need to be standardised (Johnston and Robinson 2002). To this end, the Biodiversity Information Standards (TDWG) Collection Descriptions (CD) Interest Group has developed a data standard for describing collections, which is approaching formal review for ratification as a new TDWG standard. It proposes 20 classes (Suppl. material 1) and over 100 properties that can be used to describe, categorise, quantify, link and track digital representations of natural science collections, from high-level approximations to detailed breakdowns, depending on the purpose of a particular implementation.

The wide range of use cases identified for representing collection description data means that a flexible approach to the standard and the underlying modelling concepts is essential. These are centred around the 'ObjectGroup' (Fig. 1), a class that may represent any group (of any size) of physical collection objects which have one or more common characteristics. This generic definition of the 'collection' in 'collection descriptions' is an important factor in making the standard flexible enough to support the breadth of use cases. For any use case or implementation, only a subset of classes and properties within the standard are likely to be relevant, and in some cases this subset may have little overlap with those selected for other use cases. This additional need for flexibility means that very few classes and properties, representing the core concepts, are proposed to be mandatory. Metrics, facts and narratives are represented in a normalised structure using an extended MeasurementOrFact class, so that these can be user-defined rather than constrained to a set identified by the standard. Finally, rather than a rigid underlying data model forming part of the normative standard, documentation will be developed to provide guidance on how the classes in the standard may be related and quantified according to relational, dimensional and graph-like models.

In summary, the standard has, by design, been made flexible enough to be used in a number of different ways. The corresponding risk is that it could be used in ways that may not deliver what is needed in terms of outputs, manageability and interoperability with other resources of collection-level or object-level data. To mitigate this, it is key for any new implementer of the standard to establish how it should be used in that particular instance, and to define any necessary constraints within the wider scope of the standard and model. This is the concept of the 'collection description scheme', a profile that defines elements such as: which classes and properties should be included, which should be mandatory, and which should be repeatable; which controlled vocabularies and hierarchies should be used to make the data interoperable; how the collections should be broken down into individual ObjectGroups and interlinked; and how the various classes should be related to each other.

Various factors might influence these decisions, including the types of information that are relevant to the use case, whether quantitative metrics need to be captured and aggregated across collection descriptions, and how many resources can be dedicated to amassing and maintaining the data. This process has particular relevance to the Distributed System of Scientific Collections (DiSSCo) consortium, the design of which incorporates use cases for storing, interlinking and reporting on the collections of its member institutions. These include helping users of the European Loans and Visits System (ELViS) (Islam 2020) to discover specimens for physical and digital loans by providing descriptions and breakdowns of the collections of holding institutions, and monitoring digitisation progress across European collections through a dynamic Collections Digitisation Dashboard. In addition, DiSSCo will be part of a global collections data ecosystem requiring interoperation with other infrastructures such as the GBIF (Global Biodiversity Information Facility) Registry of Scientific Collections, the CETAF (Consortium of European Taxonomic Facilities) Registry of Collections and Index Herbariorum.

In this presentation, we will introduce the draft standard, discuss the process of defining new collection description schemes using the standard and data model, and focus on DiSSCo requirements as examples of real-world collection description use cases.
Journal Article
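As a concrete reading of the ObjectGroup and extended MeasurementOrFact pattern described above, here is a minimal sketch. The class names follow the abstract's terminology, but the field-level details, property names and example values are illustrative assumptions, not the draft standard's normative definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class MeasurementOrFact:
    """Normalised, user-defined metric or fact attached to a group."""
    measurement_type: str   # user-defined, e.g. "objectCount" (illustrative)
    value: str
    unit: str | None = None

@dataclass
class ObjectGroup:
    """Any group (of any size) of physical collection objects sharing one or
    more common characteristics."""
    identifier: str
    description: str
    measurements: list[MeasurementOrFact] = field(default_factory=list)
    part_of: ObjectGroup | None = None  # a scheme may nest or interlink groups

# One high-level approximation of a collection, with two user-defined metrics
lepidoptera = ObjectGroup(
    identifier="og-lep-pinned",
    description="Pinned Lepidoptera, British Isles",
    measurements=[
        MeasurementOrFact("objectCount", "250000"),
        MeasurementOrFact("percentageDigitised", "12", unit="%"),
    ],
)
print(f"{lepidoptera.description}: {len(lepidoptera.measurements)} metrics")
```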