Catalogue Search | MBRL
24 result(s) for "Buschbom, Jutta"
Joint statement by CETAF, SPNHC and BHL on DATA within scientific publications: clarification of noncopyrightability
by Agosti, Donat; Rinaldo, Constance; Buschbom, Jutta
in Academic publications, Automation, Biodiversity
2023
The EU and other states have made legislative efforts to clarify data mining in copyrightable works, but the situation remains obscure and confusing, especially in a globalised field where international legislation can contribute to opacity. The present paper aims to assert a common position of three communities representing biodiversity sciences and data specialists on this issue and to propose common best-practice guidelines so that they become universally accepted rules. As scientific data users, we take the standpoint that scientific data are not copyrightable and, furthermore, that they can be accessed, shared and reused freely. Thus, once legal access has been gained to copyrighted publications, the data within those scholarly publications can be considered open data that are freely extractable. This set of recommendations has been reached specifically for scientific use and societal benefits.
Journal Article
Joint statement on best practices for the citation of authorities of scientific names in taxonomy by CETAF, SPNHC and BHL
2022
This joint statement aims to encourage all authors, publishers and editors involved in scientific publishing to give the bibliographic source of the authorities of taxonomic names. This initiative, written by members of the three communities, has been approved by the executive boards of the SPNHC (Society for the Preservation of Natural History Collections), CETAF (Consortium of European Taxonomic Facilities) and BHL (Biodiversity Heritage Library).
Journal Article
Permits, contracts and their terms for biodiversity specimens
by Droege, Gabi; Buschbom, Jutta; Zimkus, Breda
in Access and Benefit Sharing, Anthropological collec, Biodiversity
2024
We present two different typologies of legal/contractual information in the context of natural history objects: the Biodiversity Permit/Contract Typology categorises permits and contracts, and the Typology of Legal/Contractual Terms for Biodiversity Specimens categorises the terms within permits and contracts. The Typologies have been developed under the EU-funded SYNTHESYS+ project with the participation of experts from outside the consortium. The document further addresses a possible technical integration of these typologies into the Distributed System of Scientific Collections (DiSSCo). The implementation in the DiSSCo data model is outlined and a concrete use case is presented to show how conditions, e.g. the Typology of Legal/Contractual Terms, can be introduced into the DiSSCo Electronic Loans and Visits System (ElViS). Finally, we give an outlook on the next steps to develop the typologies into a standard that supports compliance with legal and contractual obligations within the wider community of natural science collections.
Journal Article
Everywhere Everyone Everything All at Once: Integrating Data Infrastructures and Analysis Workflows for the Upscaling to Global Genetic Monitoring
by Pavlova, Alexandra; Buschbom, Jutta; Häffner, Eva
in Agentic artificial intelligence, Biodiversity, Conservation
2025
Effective decision-making on biodiversity restoration would greatly benefit from baseline data on intraspecific genetic diversity, the ability to integrate it across species for each location, and efficient systems for monitoring changes of genetic diversity in response to management interventions and environmental dynamics. This requires large sets of well-curated, information-rich FAIR (Findable, Accessible, Interoperable, Reusable) Digital Objects (FDOs; Schultes and Wittenburg 2019), implemented as, for instance, Digital Extended Specimens (DES), which represent digitized field samples, their derived genomic data and associated information (Hardisty et al. 2022). These use-case-driven sets of structured (meta)data from many providers need to be merged, modified and further extended on demand. Existing workflows and work environments have to be redesigned to accelerate this process and achieve seamless integration, in order to scale to the needs of efficient and effective worldwide monitoring under the United Nations Kunming-Montreal Global Biodiversity Framework. The global effort to generate high-quality reference genomes also contributes to data and infrastructure development in support of global monitoring. The Earth BioGenome communities, including the European Reference Genome Atlas (ERGA) and Biodiversity Genomics Europe (BGE) communities, drive the development of standardization and harmonization for well-designed sampling and comprehensive FAIR and CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) (meta)data (Buzan et al. 2025). Their goal is reproducible analytical pipelines that can be assembled on the fly to implement sophisticated statistical approaches and algorithms. Such research-focused pipelines pave the way for the development and application of globally adopted workflow templates for the calculation of Essential Biodiversity Variables (EBVs).
These templates are under development within the Group on Earth Observations Biodiversity Observation Network (GEO BON) (Lumbierres et al. 2025). Their output can be used for globally aligned and interpretable planning, monitoring, reporting and reviewing (see CBD/COP/16/L.33). Achieving global agreement on a (small) set of jointly used interoperable vocabularies and ontologies for (meta)data and machine-actionable operations is a communication, community-capacity development, and negotiation process that requires significant resources, engagement across sociocultural groups and geographies, patience and time. Ongoing processes towards these goals continue to be organized and promoted by, e.g., Biodiversity Information Standards (TDWG), the Global Biodiversity Information Facility (GBIF), GEO BON, and the Ocean Biodiversity Information System (OBIS), as well as large continental networks. We propose to bridge the gap before standardization and harmonization are in place and thereby facilitate and accelerate such efforts. Taking a pragmatic approach, our objective is to contribute a lightweight "pocket" dataspace that provides interoperability and data governance in connection with a digital platform for global genetic monitoring (Fig. 1). The pocket dataspace would allow existing platforms and tools to be connected easily through community-provided mappings between workflow element-specific formats, terms, data and operations stored in an open repository. This approach would enable users to take advantage of the core strengths of existing software products and the expertise of their associated communities, while quickly sharing data and their work between specialized solutions. At the same time, data would be FAIRified and CAREd-for, promoting attribution, transparency and responsibility. The functions of the pocket dataspace can be prerequisites for a transition to machine-actionable operations usable by agentic AI.
As a general-purpose interlinking and translation component, the pocket dataspace aims to be the missing link between distributed, federated, non-standard-compliant and undocumented data, governance regimes, provenance logs and software output, and the need for transparent, well-governed and versatile conservation applications. One of these conservation applications will be the proposed platform for monitoring global genetic diversity. The platform will aggregate and visualize externally linked population-genetic data and summary metrics that are the results of analysis pipelines enabled by, e.g., the pocket dataspace. Its aim is to provide visualization and to support dataset and analysis management for local to global conservation efforts. It would store uploaded or linked genetic diversity metrics, perform selected automated analyses for continuously updated genetic diversity measures, and provide a starting point for user-designed analyses. The objective of our initial use case is to analyze three basic measures of population-genetic diversity based on genome-wide sequencing data as a first step towards operationalizing genetic monitoring at scale. Together, the pocket dataspace and the monitoring platform for genetic diversity data aim to support a digital ecosystem that is, above all, flexible, requires low investment from users, and can quickly integrate inter- and transdisciplinary data as well as existing powerful platforms and well-tested analytical pipelines and functionality.
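The community-provided mappings described above can be pictured as a small registry that renames fields when a record moves between tools. A minimal sketch in Python, assuming entirely hypothetical tool names and field labels (none of these appear in the abstract):

```python
# Hypothetical term-mapping registry: community-contributed dictionaries that
# translate field names between tool-specific record formats, in the spirit of
# the "pocket" dataspace. Tool names and fields are illustrative only.
MAPPINGS = {
    ("toolA", "toolB"): {
        "sample_id": "specimenID",
        "het_obs": "observedHeterozygosity",
    },
}

def translate(record, source, target):
    """Rename a record's fields using the registered source->target mapping;
    unmapped fields are kept as-is so no data is silently dropped."""
    mapping = MAPPINGS[(source, target)]
    return {mapping.get(key, key): value for key, value in record.items()}

rec = {"sample_id": "XYZ-001", "het_obs": 0.31, "locality": "site-7"}
translated = translate(rec, "toolA", "toolB")
```

Keeping unmapped fields untouched is one possible design choice: fields not yet covered by a community mapping survive the translation and can be mapped later.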
Journal Article
Non-Copyrightability of Data in Scientific Publications: A Free-for-All or a Global Commons Partnership?
2024
Scientific publications provide a wealth of peer-reviewed, high-quality data that have been maintained over time, resulting in data persistence. As data repositories with rich provenance information, publications are indispensable sources for the integration and extension of networks of interlinked Findable, Accessible, Interoperable and Reusable (FAIR) bio/geodiversity data. In this way, they form pivotal fact- and knowledge-based contributions to applications that address the biodiversity crisis. The mobilization of data preserved in scientific publications is hindered, however, by distinct copyright legislation contexts for publications versus the data that they contain. Moreover, legislation concerning copyright continues to lack harmonization across jurisdictions, its interpretation is difficult, and the applicable legal national scope can be uncertain. We clarify and highlight that data within scientific publications are not copyrightable and thus can be openly and freely reused once legal access has been gained to their enclosing publication. To ensure that publications are as accessible as possible, a joint statement supported by the Biodiversity Heritage Library (BHL), the Consortium of European Taxonomic Facilities (CETAF) and the Society for the Preservation of Natural History Collections (SPNHC) (Benichou et al. 2023) recommends that authors and publishers use a CC-BY license or preferably waive copyright (CC0) to their publications. Explicitly associating a public domain mark (PDM, e.g., the PDM from Creative Commons) with their published data provides users with certainty about reusability. Yet placing works and bio/geodiversity data in the public domain does not make them a free-for-all. We stress that data need to be associated with clear provenance information in alignment with scientific best practices and the scientific community's social norms.
This includes providing detailed attribution to authors of cited works and reused data. Proposed data governance labels, for example, modeled after the Local Contexts labels developed by the international Indigenous Peoples and Local Communities (IPLC) community, would enable authors to communicate social and ethical contexts and applicable rules to data users, ensuring the sustainability of a shared environmental and data commons. Categories of Local Contexts labels that are of interest and applicable in the sciences are, for example, those that communicate (1) correct citation information and a request for attribution when knowledge and/or data are reused (Traditional Knowledge label (TK) Attribution), (2) an interest in being recognized and acknowledged due to a significant relationship with and responsibility for samples and data (Biocultural label (BC) Provenance), (3) the verification of the data and their context following a community protocol (TK Verified), (4) that non-commercial use (TK Non-Commercial/BC Non-Commercial) or (5) outreach activities (TK Outreach/BC Outreach) are generally permitted, while other uses require direct contact and engagement, or (6) an openness to collaboration and partnerships (TK Collaboration/BC Collaboration). There are concerns about the tension between the goal of achieving open data (e.g., Anonymous 2014) to enable and promote open science (e.g., UNESCO 2021) and, at the same time, imposing restrictions on these data in the form of governance labels. Furthermore, while the reference to the publication through which data are published, as well as, more specifically, bibliographic references cited for specific data within the publication, provide sufficient information for attribution and provenance, much more fine-grained and nuanced contextual information (e.g., in the form of metadata) is needed for assuring responsible reuse.
Such context-providing metadata unlock the full potential of the data and enable their reusability. This can be done using machine-actionable markup tags in combination with human-readable labels that inform machines and human users about the semantics of the data as well as their ethical and social dimensions that govern responsible and sustainable reuse. Future work is needed to discover, differentiate and define the quality and scope of the appropriate contexts that are necessary and sufficient for being able to fully and responsibly reuse the data in different situations.
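As one illustration of how such machine-actionable labels might accompany a dataset record, the sketch below attaches Local Contexts-style label names to a record and checks one governance condition. The record structure and checking logic are hypothetical, not a published schema:

```python
# Sketch: governance labels attached to a dataset record in machine-readable
# form. The label names mirror the Local Contexts categories discussed above;
# the record layout and the check itself are invented for illustration.
dataset = {
    "identifier": "dataset-001",  # placeholder, not a real DOI
    "labels": ["TK Attribution", "BC Provenance", "TK Non-Commercial"],
}

def commercial_use_requires_contact(record):
    """A Non-Commercial label signals that commercial reuse needs direct
    contact and engagement rather than being automatically permitted."""
    return any(label.endswith("Non-Commercial") for label in record["labels"])

needs_contact = commercial_use_requires_contact(dataset)
```

A machine agent could run such checks before reuse, while the human-readable label text communicates the same expectation to human users.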
Journal Article
Bridging Language Barriers: Lessons from the French Translation of Latimer Core
by Buschbom, Jutta; Webbink, Kate; Saliba, Elie
in Biodiversity, Collaboration, Cultural heritage and museology
2025
Internationalization of standards documentation is essential for pursuing global interoperability through the adoption of data standards that can be understood and competently applied throughout the world and across sociocultural contexts. Ratified by Biodiversity Information Standards (TDWG) in 2024, the Latimer Core (LtC) data standard focuses on the representation and discovery of natural science collections (Woodburn et al. 2022). The first complete translation of the LtC standard documentation, into French, was published in June 2025, facilitating access to the standard for francophone communities. Translating biodiversity data standards such as Latimer Core into French presents a series of intertwined linguistic and technical challenges. The rigor of the translation effort depends on consistent terminology inside a given standard, and is achieved through careful reuse of formulas, such as 'recommended best practice', and the support of translation management tools such as Crowdin to ensure uniformity. In addition to intra-standard consistency, the reuse of Darwin Core terms, which were translated prior to the ratification of Latimer Core (see Saliba et al. (2025)), requires caution when retranslating definitions, to ensure homogeneity across standards. Aside from linguistic elements discussed in part in Saliba et al. (2025), challenges like documenting translation work remain. The latter is largely informal, relying on collaborative platforms and personal notes, which underscores the potential need for more structured, reproducible workflows, especially when multiple translators work together on a given language, notably languages with strong regional variants. Similarly, no universal threshold has been defined for the minimum content needed to achieve a “functional” translation.
An incremental approach, beginning with labels and definitions and progressively expanding to webpage elements such as headers and footers, non-normative complementary information and supplementary documentation, seems to be emerging as good practice. To address these and other issues, a recommendation document aimed at defining good practices and workflows for translating standards is being prepared. Finally, the Latimer Core maintenance group is experimenting with having a point of contact for translation to act as a bridge between translators, the standard maintenance group, and users. The point of contact can answer domain-specific questions, gather feedback from users and report errors to the relevant translator. Ensuring that TDWG standards are available in French is a good way to broaden participation among underrepresented scientific communities across Africa, the Caribbean, the Pacific and other francophone regions. Beyond opening doors for these audiences, the translation process itself offers a unique opportunity for contributors to deepen their understanding of a standard while making it, and subsequently connected standards, accessible to others. Far from being a mere technical task, translation is an intellectually rewarding and collaborative endeavor that amplifies the global relevance of TDWG’s work, ultimately enriching both the standards and the communities they serve.
Journal Article
No Pain No Gain: Standards mapping in Latimer Core development
2023
Latimer Core (LtC) is a new proposed Biodiversity Information Standards (TDWG) data standard that supports the representation and discovery of natural science collections by structuring data about the groups of objects that those collections and their subcomponents encompass (Woodburn et al. 2022). It is designed to be applicable to a range of use cases that include high level collection registries, rich textual narratives and semantic networks of collections, as well as more granular, quantitative breakdowns of collections to aid collection discovery and digitisation planning. As a standard that is (in this first version) focused on natural science collections, LtC has significant intersections with existing data standards and models (Fig. 1) that represent individual natural science objects and occurrences and their associated data (e.g., Darwin Core (DwC), Access to Biological Collection Data (ABCD), Conceptual Reference Model of the International Committee on Documentation (CIDOC-CRM)). LtC’s scope also overlaps with standards for more generic concepts like metadata, organisations, people and activities (i.e., Dublin Core, World Wide Web Consortium (W3C) ORG Ontology and PROV Ontology, Schema.org). LtC represents just an element of this extended network of data standards for the natural sciences and related concepts. Mapping between LtC and intersecting standards is therefore crucial for avoiding duplication of effort in the standard development process, and ensuring that data stored using the different standards are as interoperable as possible in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) principles. In particular, it is vital to make robust associations between records representing groups of objects in LtC and records (where available) that represent the objects within those groups. 
During LtC development, efforts were made to identify and align with relevant standards and vocabularies, and adopt existing terms from them where possible. During expert review, a more structured approach was proposed and implemented using the Simple Knowledge Organization System (SKOS) mappingRelation vocabulary. This exercise helped to better describe the nature of the mappings between new LtC terms and related terms in other standards, and to validate decisions around the borrowing of existing terms for LtC. A further exercise also used elements of the Simple Standard for Sharing Ontological Mappings (SSSOM) to start to develop a more comprehensive set of metadata around these mappings. At present, these mappings (Suppl. material 1 and Suppl. material 2) are provisional and not considered to be comprehensive, but should be further refined and expanded over time. Even with the support provided by the SKOS and SSSOM standards, the LtC experience has proven the mapping process to be far from straightforward. Different standards vary in how they are structured; for example, DwC is a ‘bag of terms’, with informal classes and no structural constraints, while more structured standards and ontologies like ABCD and PROV employ different approaches to how structure is defined and documented. The various standards use different metadata schemas and serialisations (e.g., Resource Description Framework (RDF), XML) for their documentation, and different approaches to providing persistent, resolvable identifiers for their terms. There are also many subtle nuances involved in assessing the alignment between the concepts that the source and target terms represent, particularly when assessing whether a match is exact enough to allow the existing term to be adopted. These factors make the mapping process quite manual and labour-intensive. Approaches and tools, such as developing decision trees (Fig. 2) to represent the logic involved and further exploration of the SSSOM standard, could help to streamline this process.
In this presentation, we will discuss the LtC experience of the standard mapping process, the challenges faced and methods used, and the potential to contribute this experience to a collaborative standards mapping within the anticipated TDWG Standards Mapping Interest Group.
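SSSOM serialises mappings as TSV rows that pair a subject term with an object term via a SKOS mapping predicate. A minimal sketch of loading and filtering such rows, where the specific term pairs are illustrative placeholders rather than the published LtC mappings:

```python
import csv
import io

# An SSSOM-style TSV fragment: each row links a subject term to a term in
# another standard via a SKOS mapping predicate. The pairs are hypothetical.
SSSOM_TSV = """subject_id\tpredicate_id\tobject_id\tmapping_justification
ltc:ObjectGroup\tskos:relatedMatch\tdwc:Occurrence\tsemapv:ManualMappingCuration
ltc:personName\tskos:exactMatch\tdcterms:creator\tsemapv:ManualMappingCuration
"""

def load_mappings(tsv_text):
    """Parse an SSSOM TSV block into a list of mapping dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

def exact_matches(mappings):
    """Keep only mappings tight enough to justify adopting the existing term."""
    return [m for m in mappings if m["predicate_id"] == "skos:exactMatch"]

mappings = load_mappings(SSSOM_TSV)
adoptable = exact_matches(mappings)
```

Filtering on `skos:exactMatch` mirrors the decision described above: only an exact conceptual match supports borrowing the existing term rather than minting a new one.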
Journal Article
Building the Digital Extended Specimen: A case study of invasive European frog-bit (Hydrocharis morsus-ranae L.)
2021
The Extended Specimen was first described by Webster (2017). He defined a “constellation of specimen preparations and data types,” centered around an occurrence of an organism, which captures the breadth of empirical facts about an organism’s phenotype, genotype, and ecology in space and time. The Extended Specimen Network was embraced by the collections community in the Biodiversity Collections Network Extended Specimen Report (Lendemer et al. 2020) and the National Academies of Science, Engineering, and Medicine Future of Collections report (Lendemer et al. 2020, National Academies of Science, Engineering, and Medicine 2020). Several global discussions are underway to build a common definition of the Digital Extended Specimen (DES) and elucidate next steps in building the infrastructure to support Digital Extended Specimens and their network of associated data (including efforts among Distributed System of Scientific Collections (DiSSCo), Biodiversity Collections Network (BCoN), GBIF’s Alliance for Biodiversity Knowledge, TDWG's Task Group on Minimum Information about a Digital Specimen (MIDS), and others). At the foundation of the DES is the occurrence of an organism in time and space, which is represented by physical specimens or observations serving as tokens of reality. Tokens are translated to digital records, which can be extended through a network of linkages between them and with derived and associated data, e.g., project methodologies, environmental conditions, habitat characteristics, and associated taxa. For digital records to be integrated with the larger network of Digital Extended Specimens, they must become FAIR digital objects that are Findable, Accessible, Interoperable, and Reusable (Wilkinson et al. 2016). By translating the Digital Extended Specimen concept to the local project scale, we provide opportunities to move beyond a theoretical understanding of the DES and towards a practical framework for its implementation.
Here we present and discuss the power, limits, and questions in the implementation of the Digital Extended Specimen framework by applying it to the case study of an invasive aquatic plant in the Laurentian Great Lakes region. European frog-bit (Hydrocharis morsus-ranae L.; EFB) is native to western and northern Eurasia and invasive in North America and India. Dense mats of EFB may hinder commercial and recreational use of waterways and decrease light, dissolved oxygen, and native species diversity. We describe a multi-taxonomic study that examined EFB along with associated plant species, animal species, and environmental characteristics (Monfils et al. 2021). The integration of such diverse types of empirical data is a necessary prerequisite for determining the factors associated with EFB establishment, the impacts of EFB on native coastal wetland ecosystems, and the development of suitable management regimes for the conservation of native biodiversity. Data gathered from this study are housed in a local database. In our database, we consider both physical specimens and recorded observations as tokens of concrete occurrences of EFB, which define the base units. These tokens are linked to their collection events, which provide environmental and sampling context, as well as co-occurrences of other taxa including plants, invertebrates, fish, anurans, reptiles, and birds. Digitally linked, these extensions of each digital representation of a collected token provide not only empirical evidence of an EFB occurrence, but also directly connect it with all additionally sampled, derived, and associated information. Through this network of extensions we gain a more holistic understanding of EFB’s species associations, habitats, and ecosystem impacts at the level of populations and communities.
The application of the Digital Extended Specimen framework at the project level illustrates how the DES can be used in a real-world context and highlights challenges in translating the concept from a theoretical to a practical perspective.
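The token-to-event linking described in this case study can be sketched as a pair of keyed records whose links are resolved on demand. All identifiers and field names below are made up for illustration; the project's actual database schema is not published here:

```python
# Hypothetical linked records: a token (specimen or observation) points to its
# collection event, which carries environmental context and co-occurring taxa.
events = {
    "evt-01": {
        "habitat": "coastal wetland",
        "co_occurring": ["Typha latifolia", "Lemna minor"],
    },
}
tokens = {
    "tok-001": {"taxon": "Hydrocharis morsus-ranae", "event": "evt-01"},
}

def extended_record(token_id):
    """Resolve a token's event link to assemble its extended digital record."""
    token = tokens[token_id]
    event = events[token["event"]]
    return {**token, **event}

record = extended_record("tok-001")
```

Resolving links at query time, rather than copying event data into every token, keeps each occurrence connected to a single authoritative copy of its sampling context.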
Journal Article
Identification of single nucleotide polymorphisms in different Populus species
2009
Partial sequences of six genes in 54 trees belonging to five different Populus species were analyzed for the occurrence of single nucleotide polymorphisms (SNPs). The selected genes are involved in wood formation and quality (CAD), defence reactions (PPO), hormone biosynthesis (GA20ox), or transcription factors (CBF1, TB1, LFY). The number of polymorphisms identified for each gene fragment varied between different genes, and also between exon and intron regions. The six genes resolved the phylogenetic relationships between the five species to different degrees. Only PPO resolves all five Populus species as monophyletic in all three phylogenetic approaches. CAD resolves all species with the exception of Populus tremula. For CBF1 and TB1, a monophyletic group consisting of P. tremula and P. tremuloides is resolved by some of the reconstruction approaches. Indels in three out of the six genes analyzed were detected in “consensus”-sequence comparisons between the Populus species. In the CAD-like and LFY genes, these were found only in introns, but in the case of the TB1 gene they were also found in coding regions. Sizes of the indels range from 1 to 8 nucleotides. This study confirms the main split between section Leuce and a group combining sections Aigeiros and Tacamahaca, which is fully supported with 100% bootstrap support and a posterior probability of 1.0. The SNP markers developed, as well as the indels identified, can be used for differentiation of Populus species and characterization of hybrids.
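The pairwise comparison underlying SNP and indel detection can be sketched as a scan over two aligned sequences, where '-' marks an alignment gap. The toy fragments below are invented, not real Populus data:

```python
def find_variants(seq_a, seq_b):
    """Scan two aligned sequences of equal length and classify each differing
    position: a gap on either side marks an indel position, otherwise the
    mismatch is recorded as a SNP."""
    snps, indel_positions = [], []
    for pos, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b:
            continue
        if a == "-" or b == "-":
            indel_positions.append(pos)
        else:
            snps.append((pos, a, b))
    return snps, indel_positions

# Toy aligned fragments for illustration:
a = "ATGC-TGACA"
b = "ATGCATGGCA"
snps, indels = find_variants(a, b)  # one SNP at position 7, one gap at 4
```

Real SNP discovery additionally requires quality filtering and multiple-sequence alignment across all sampled trees; this sketch only shows the core position-by-position comparison.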
Journal Article