Catalogue Search | MBRL
18 result(s) for "Catapano, Terry"
OpenBiodiv-O: ontology of the OpenBiodiv knowledge management system
2018
Background
The biodiversity domain, and in particular biological taxonomy, is moving in the direction of semantization of its research outputs. The present work introduces OpenBiodiv-O, the ontology that serves as the basis of the OpenBiodiv Knowledge Management System. Our intent is to provide an ontology that fills the gaps between ontologies for biodiversity resources, such as DarwinCore-based ontologies, and semantic publishing ontologies, such as the SPAR Ontologies. We bridge this gap by providing an ontology focusing on biological taxonomy.
Results
OpenBiodiv-O introduces classes, properties, and axioms in the domains of scholarly biodiversity publishing and biological taxonomy and aligns them with several important domain ontologies (FaBiO, DoCO, DwC, Darwin-SW, NOMEN, ENVO). By doing so, it bridges the ontological gap across scholarly biodiversity publishing and biological taxonomy and allows for the creation of a Linked Open Dataset (LOD) of biodiversity information (a biodiversity knowledge graph) and enables the creation of the OpenBiodiv Knowledge Management System.
A key feature of the ontology is that it is an ontology of the scientific process of biological taxonomy and not of any particular state of knowledge. This feature allows it to express a multiplicity of scientific opinions. The resulting OpenBiodiv knowledge system may gain a high level of trust in the scientific community as it does not force a scientific opinion on its users (e.g. practicing taxonomists, library researchers, etc.), but rather provides the tools for experts to encode different views as science progresses.
Conclusions
OpenBiodiv-O provides a conceptual model of the structure of a biodiversity publication and the development of related taxonomic concepts. It also serves as the basis for the OpenBiodiv Knowledge Management System.
Journal Article
Community Next Steps for Making Globally Unique Identifiers Work for Biocollections Data
2015
Biodiversity data are being digitized and made available online at a rapidly increasing rate, but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has been neither coalescence towards a single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.
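The workshop's two recommendations — identifier metadata profiles and unambiguous object typing — can be illustrated with a minimal sketch. The function, field names, and values below are hypothetical, not a community standard:

```python
import uuid

def mint_identifier(object_type, persistence_statement):
    """Mint a globally unique identifier for a biocollections object and
    attach a metadata profile reporting the object type and a persistence
    statement. Field names here are illustrative, not a published profile."""
    guid = f"urn:uuid:{uuid.uuid4()}"
    return {
        "identifier": guid,
        # Unambiguous indication of the kind of object the GUID denotes,
        # e.g. "PhysicalSpecimen", "Image", "TaxonTreatment".
        "object_type": object_type,
        # Human-readable statement of the holder's persistence mission.
        "persistence": persistence_statement,
    }

# Minted at the point of field collection, to be persisted downstream.
record = mint_identifier("PhysicalSpecimen",
                         "Maintained indefinitely by the holding institution")
```

The key design point from the workshop is that the identifier travels with its metadata profile, so downstream aggregators can tell what was identified and how long the identifier is expected to resolve.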
Journal Article
From taxonomic literature to cybertaxonomic content
by Agosti, Donat; Sautter, Guido; Penev, Lyubomir
in Biodiversity; Biological Science Disciplines - methods; Biology
2012
Keywords: cybertaxonomy, open access publishing, semantic content, XML markup
Journal Article
Semantic tagging of and semantic enhancements to systematics papers: ZooKeys working examples
2010
The concept of semantic tagging and its potential for semantic enhancements to taxonomic papers is outlined and illustrated by four exemplar papers published in the present issue of ZooKeys. The four papers were created in different ways: (i) written in Microsoft Word and submitted as a non-tagged manuscript (doi: 10.3897/zookeys.50.504); (ii) generated from Scratchpads and submitted as XML-tagged manuscripts (doi: 10.3897/zookeys.50.505 and doi: 10.3897/zookeys.50.506); (iii) generated from an author's database (doi: 10.3897/zookeys.50.485) and submitted as an XML-tagged manuscript. XML tagging and semantic enhancements were implemented during the editorial process of ZooKeys using the Pensoft Mark Up Tool (PMT), specially designed for this purpose. The XML schema used was TaxPub, an extension to the Document Type Definitions (DTD) of the US National Library of Medicine Journal Archiving and Interchange Tag Suite (NLM). The following innovative methods of tagging, layout, publishing and disseminating the content were tested and implemented within the ZooKeys editorial workflow: (1) highly automated, fine-grained XML tagging based on TaxPub; (2) final XML output of the paper validated against the NLM DTD for archiving in PubMedCentral; (3) bibliographic metadata embedded in the PDF through XMP (Extensible Metadata Platform); (4) PDF uploaded after publication to the Biodiversity Heritage Library (BHL); (5) taxon treatments supplied through XML to Plazi; (6) semantically enhanced HTML version of the paper encompassing numerous internal and external links and linkouts, such as: (i) visualisation of main tag elements within the text (e.g., taxon names, taxon treatments, localities, etc.); (ii) internal cross-linking between paper sections, citations, references, tables, and figures; (iii) mapping of localities listed in the whole paper or within separate taxon treatments; (iv) taxon names autotagged, dynamically mapped and linked through the Pensoft Taxon Profile (PTP) to large international database services and indexers such as the Global Biodiversity Information Facility (GBIF), the National Center for Biotechnology Information (NCBI), Barcode of Life (BOLD), Encyclopedia of Life (EOL), ZooBank, Wikipedia, Wikispecies, Wikimedia, and others; (v) GenBank accession numbers autotagged and linked to NCBI; (vi) external links of taxon names to references in PubMed, Google Scholar, the Biodiversity Heritage Library and other sources. With the launch of these working examples, ZooKeys becomes the first taxonomic journal to provide a complete XML-based editorial, publication and dissemination workflow implemented as a routine and cost-efficient practice. It is anticipated that the XML-based workflow will also soon be implemented in botany through PhytoKeys, a forthcoming partner journal of ZooKeys. The semantic markup and enhancements are expected to greatly extend and accelerate the way taxonomic information is published, disseminated and used.
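The autotagging of taxon names described above can be sketched in miniature. The real Pensoft workflow checks candidates against taxonomic registries; the regex and the `<tax>` tag name below are purely illustrative assumptions:

```python
import re

# Naive pattern for a Latin binomial: a capitalized genus followed by a
# lowercase epithet of three or more letters. A production autotagger
# would validate candidates against name registries; this is a sketch only.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{3,})\b")

def autotag_taxon_names(text):
    """Wrap putative taxon names in a <tax> element (tag name hypothetical)."""
    return BINOMIAL.sub(r"<tax>\1</tax>", text)

tagged = autotag_taxon_names(
    "Specimens of Apis mellifera were collected near the river.")
```

Even this toy version shows why dictionary lookup is needed in practice: any capitalized word followed by a long lowercase word would be tagged as a false positive.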
Journal Article
Integrating and visualizing primary data from prospective and legacy taxonomic literature
by King, David; Agosti, Donat; Patterson, David
in Araneae; Biodiversity; Biodiversity informatics
2015
Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal, where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data, can extract structured primary biodiversity data which can be aggregated with and jointly queried with data from other Darwin Core-compatible sources, and show how visualization of these data can communicate key information contained in biodiversity literature. We complement recent studies on aspects of biodiversity knowledge using XML structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.
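The field-based filtering described above can be sketched over Darwin Core-style flat records. The sample records and field names below are invented for illustration, not data from the study:

```python
# Hedged sketch: specimen records extracted from marked-up treatments,
# flattened to Darwin Core-style fields (all sample values are invented).
records = [
    {"journal": "Zootaxa", "order": "Araneae",
     "country": "Brazil", "collector": "A. Smith"},
    {"journal": "Biodiversity Data Journal", "order": "Araneae",
     "country": "Peru", "collector": "B. Jones"},
    {"journal": "Zootaxa", "order": "Araneae",
     "country": "Peru", "collector": "A. Smith"},
]

def filter_records(records, **criteria):
    """Return the records matching every given field=value criterion."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# Jointly query structured legacy data and born-digital data by field.
peru_zootaxa = filter_records(records, journal="Zootaxa", country="Peru")
```

Because both sources are normalized to the same field vocabulary, legacy and prospective records can be aggregated and queried together, which is the point of the Darwin Core compatibility noted in the abstract.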
Journal Article
Streamlining taxonomic publication: a working example with Scratchpads and ZooKeys
by Scott, Ben; Agosti, Donat; Roberts, David
in Authorship; Identification and classification; Internet
2010
We describe a method to publish nomenclatural acts described in taxonomic websites (Scratchpads) that are formally registered through publication in a printed journal (ZooKeys). This method is fully compliant with the zoological nomenclatural code. Our approach supports manuscript creation (via a Scratchpad), electronic act registration (via ZooBank), online and print publication (in the journal ZooKeys) and simultaneous dissemination (ZooKeys and Scratchpads) for nomenclatural acts including new species descriptions. The workflow supports the generation of manuscripts directly from a database and is illustrated by two sample papers published in the present issue.
Journal Article
XML schemas and mark-up practices of taxonomic literature
2011
We review the three most widely used XML schemas for marking up taxonomic texts: TaxonX, TaxPub and taXMLit. These are described from the viewpoint of their development history, current status, implementation, and use cases. The concept of "taxon treatment" from the viewpoint of taxonomy mark-up into XML is discussed. TaxonX and taXMLit are primarily designed for legacy literature, the former being more lightweight and with a focus on recovery of taxon treatments, the latter providing a much more detailed set of tags to facilitate data extraction and analysis. TaxPub is an extension of the National Library of Medicine Document Type Definition (NLM DTD) for taxonomy, focussed on layout and recovery, and, as such, is best suited for mark-up of new publications and their archiving in PubMedCentral. All three schemas have their advantages and shortcomings and can be used for different purposes.
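The "taxon treatment" concept central to these schemas can be made concrete with a small parsing sketch. The element names follow TaxPub's `tp:` vocabulary, but the fragment itself and the namespace URI are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Minimal TaxPub-style fragment. Element names echo the TaxPub extension
# (tp:taxon-treatment, tp:nomenclature, tp:taxon-name); the content and
# the namespace URI are invented for this sketch.
XML = """<article xmlns:tp="http://www.plazi.org/taxpub">
  <tp:taxon-treatment>
    <tp:nomenclature>
      <tp:taxon-name>
        <tp:taxon-name-part taxon-name-part-type="genus">Aphaenogaster</tp:taxon-name-part>
        <tp:taxon-name-part taxon-name-part-type="species">swammerdami</tp:taxon-name-part>
      </tp:taxon-name>
    </tp:nomenclature>
  </tp:taxon-treatment>
</article>"""

NS = {"tp": "http://www.plazi.org/taxpub"}
root = ET.fromstring(XML)
# Recover the binomial from the treatment's nomenclature section.
parts = root.findall(".//tp:taxon-name/tp:taxon-name-part", NS)
binomial = " ".join(p.text for p in parts)
```

The fine granularity (genus and epithet tagged separately, typed by attribute) is what lets downstream tools extract and index names without re-parsing free text.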
Journal Article
EMODnet Workshop on mechanisms and guidelines to mobilise historical data into biogeographic databases
by Agosti, Donat; Faulwetter, Sarah; Teaca, Adrian
in Biodiversity; biodiversity data; data arch
2016
The objective of Workpackage 4 of the European Marine Observation and Data network (EMODnet) is to fill spatial and temporal gaps in European marine species occurrence data availability by carrying out data archaeology and rescue activities. To this end, a workshop was organised at the Hellenic Centre for Marine Research (HCMR), Heraklion, Crete (8–9 June 2015) to assess possible mechanisms and guidelines to mobilise legacy biodiversity data. Workshop participants were data managers who actually implement data archaeology and rescue activities, as well as external experts in data mobilisation and data publication. In particular, current problems associated with manual extraction of occurrence data from legacy literature were reviewed, tools and mechanisms which could support a semi-automated process of data extraction were explored, and the re-publication of the data, including incentives for data curators and scientists, was reflected upon.
Journal Article
The Plazi Workflow: The PDF prison break for biodiversity data
2019
The Swiss NGO Plazi (http://plazi.org) has developed an automated workflow for liberating data, including images and text, from new taxonomic publications issued in PDF format. This stepwise process extracts article metadata, illustrations and their captions, bibliographic references, scientific names, named geographic entities such as coordinates and country names, collection codes, and finally, taxonomic treatments. The extracted data are enhanced and published in TreatmentBank (http://plazi.org) and deposited in the Biodiversity Literature Repository (https://biolitrepo.org), where a Digital Object Identifier (DataCite DOI) is minted for each article as well as its contained figures and taxon treatments, each linked to the others in their metadata. This input is complemented by the import of Journal Article Tag Suite/TaxPub XML-based publications from Pensoft publishers (e.g. ZooKeys, Journal of Hymenoptera Research; https://pensoft.net/browse_journals) that are semantically enhanced during their journal production workflow. Upon import, materials citations are discovered and parsed, and the taxonomic treatments are added to TreatmentBank, where a persistent identifier is minted. From TreatmentBank, data from taxonomic treatments, including occurrence data from cited specimens, are submitted to GBIF (http://gbif.org) or are accessible via API. Treatments and material citations from more than 26,200 articles have been registered. The articles can be found on GBIF using the Digital Object Identifier in the search field. Plazi, together with Pensoft Publishers, has processed over 26,000 articles containing more than 284,000 taxonomic treatments, 190,000 images, and 50,000 georeferenced materials citations, together comprising an estimated 100 million facts.
Through the support of the Arcadia Fund (https://www.arcadiafund.org.uk/), Plazi's processing is expanding to cover a sufficient number of journals to liberate the data of over 50% of the newly described animal species annually. This will complement an existing service provided to the Muséum National d’Histoire Naturelle, Paris, to convert the European Journal of Taxonomy and their other journals (http://sciencepress.mnhn.fr/en/periodiques/adansonia/40/1) to JATS/TaxPub (https://www.ncbi.nlm.nih.gov/books/NBK47081), as well as an increasing portfolio of journals published in JATS/TaxPub by Pensoft Ltd.
Journal Article
The Open Biodiversity Knowledge Management (eco-)System: Tools and Services for Extraction, Mobilization, Handling and Re-use of Data from the Published Literature
by Agosti, Donat; Senderov, Viktor; Sautter, Guido
in Biodiversity; computer software; ecosystems
2018
The Open Biodiversity Knowledge Management System (OBKMS) is an end-to-end, eXtensible Markup Language (XML)- and Linked Open Data (LOD)-based ecosystem of tools and services that encompasses the entire process of authoring, submission, review, publication, dissemination, and archiving of biodiversity literature, as well as the text mining of published biodiversity literature (Fig. 1). These capabilities lead to the creation of interoperable, computable, and reusable biodiversity data with provenance linking facts to publications. OBKMS is the result of a joint endeavour by Plazi and Pensoft lasting many years. The system was developed with the support of several biodiversity informatics projects: initially the Virtual Biodiversity Research and Access Network for Taxonomy (ViBRANT), followed by pro-iBiosphere, the European Biodiversity Observation Network (EU BON), and Biosystematics, informatics and genomics of the big 4 insect groups (BIG4). The system includes the following key components:
ARPHA Journal Publishing Platform: a journal publishing platform based on the TaxPub XML extension of the National Library of Medicine (NLM) Journal Publishing Document Type Definition (DTD) (Version 3.0). Its advanced ARPHA-BioDiv component deals with integrated biodiversity data and narrative publishing (Penev et al. 2017).
GoldenGATE Imagine: an environment for marking up, enhancing, and extracting text and data from PDF files, supporting the TaxonX XML schema. It has specific enhancements for articles containing descriptions of taxa ("taxonomic treatments") in the field of biological systematics, but its core features may be used for general purposes as well.
Biodiversity Literature Repository (BLR): a public repository hosted at Zenodo (CERN) for published articles (PDF and XML) and images extracted from articles.
Ocellus/Zenodeo: a search interface for the images stored at BLR.
TreatmentBank: an XML-based repository for taxonomic treatments and the data extracted from literature therein.
The OpenBiodiv knowledge graph: a biodiversity knowledge graph built according to the Linked Open Data (LOD) principles. It uses the RDF data model and the SPARQL Protocol and RDF Query Language (SPARQL), is open to the public, and is powered by the OpenBiodiv-O ontology (Senderov et al. 2018).
OpenBiodiv portal: semantic search and browsing for the biodiversity knowledge graph, with multiple semantic apps packaging specific views of the graph.
Supporting tools: Pensoft Markup Tool (PMT), ARPHA Writing Tool (AWT), ReFindit, R libraries for working with RDF and for converting XML to RDF (ropenbio, RDF4R), and the Plazi RDF converter, web services and APIs.
As part of OBKMS, Plazi and Pensoft offer the following services beyond supplying the software toolkit:
Digitization through imaging and text capture of paper-based or digitally born (PDF) legacy literature.
XML markup of both legacy and newly published literature (journals and books).
Data extraction and markup of taxonomic names, literature references, taxonomic treatments and organism occurrence records.
Export and storage of text, images, and structured data in data repositories.
Linking and semantic enhancement of text and data, bibliographic references, taxonomic treatments, illustrations, organism occurrences and organism traits.
Re-packaging of extracted information into new, user-demanded outputs via semantic apps at the OpenBiodiv portal.
Re-publishing of legacy literature (e.g., the Flora, Fauna, and Mycota series, important biodiversity monographs, etc.).
Semantic open access publishing (including data publishing) of journals and books.
Integration of biodiversity information from legacy and newly published literature into interoperable biodiversity repositories and platforms (Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EOL), Species-ID, Plazi, Wikidata, and others).
In this presentation we make the case for why OpenBiodiv is an essential tool for advancing biodiversity science. Our argument is that through OpenBiodiv, biodiversity science makes a step towards the ideals of open science (Senderov and Penev 2016). Furthermore, by linking data from various silos, OpenBiodiv allows for the discovery of hidden facts. A particular example of how OpenBiodiv can advance biodiversity science is demonstrated by OpenBiodiv's solution to "taxonomic anarchy" (Garnett and Christidis 2017).
\"Taxonomic anarchy\" is a term coined by Garnett and Christidis to denote the instability of taxonomic names as symbols for taxonomic meaning. They propose an \"authoritarian\" top-down approach to stablize the naming of species. OpenBiodiv, on the other hand, relies on taxonomic concepts as integrative units and therefore integration can occur through alignment of taxonomic concepts via Region Connection Calculus (RCC-5) (Franz and Peet 2009). The alignment is \"democratically\" created by the users of system but no consensus is forced and \"anarchy\" is avoided by using unambiguous taxonomic concept labels (Franz et al. 2016) in addition to Linnean names.
Journal Article