Catalogue Search | MBRL

The ParlaMint corpora of parliamentary proceedings

by Barkarson, Starkaður , Osenova, Petya , Pančur, Andrej in Communication , Communications technology , Computer science

2023

This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for on-line exploration and analysis.

Journal Article

The data-driven Bulgarian WordNet: BTBWN

by Simov, Kiril , Osenova, Petya in Bulgarian language , Bulgarian WordNet , Computerized corpora

2018

The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on the identification of senses used in BulTreeBank, but the missing senses of a lemma have also been covered through exploration of bigger corpora. The identified senses have been organized in synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered from a cross-lingual perspective, with the aim of ensuring maximum connectivity and the potential for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.

Journal Article

OpenBiodiv: A Knowledge Graph for Literature-Extracted Linked Open Data in Biodiversity Science

by Senderov, Viktor , Simov, Kiril , Zhelezov, Georgi in Biodiversity , biodiversity informatics , Cognition & reasoning

2019

Hundreds of years of biodiversity research have resulted in the accumulation of a substantial pool of communal knowledge; however, most of it is stored in silos isolated from each other, such as published articles or monographs. The need for a system to store and manage collective biodiversity knowledge in a community-agreed and interoperable open format has evolved into the concept of the Open Biodiversity Knowledge Management System (OBKMS). This paper presents OpenBiodiv: An OBKMS that utilizes semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge. It is presented as a Linked Open Dataset generated from scientific literature. OpenBiodiv encompasses data extracted from more than 5000 scholarly articles published by Pensoft and many more taxonomic treatments extracted by Plazi from journals of other publishers. The data from both sources are converted to Resource Description Framework (RDF) and integrated in a graph database using the OpenBiodiv-O ontology and an RDF version of the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Through the application of semantic technologies, the project showcases the value of open publishing of Findable, Accessible, Interoperable, Reusable (FAIR) data towards the establishment of open science practices in the biodiversity domain.

Journal Article

OpenBiodiv-O: ontology of the OpenBiodiv knowledge management system

by Senderov, Viktor , Agosti, Donat , Simov, Kiril in Algorithms , Analysis , Biodiversity

2018

Background: The biodiversity domain, and in particular biological taxonomy, is moving in the direction of semantization of its research outputs. The present work introduces OpenBiodiv-O, the ontology that serves as the basis of the OpenBiodiv Knowledge Management System. Our intent is to provide an ontology that fills the gaps between ontologies for biodiversity resources, such as DarwinCore-based ontologies, and semantic publishing ontologies, such as the SPAR Ontologies. We bridge this gap by providing an ontology focusing on biological taxonomy. Results: OpenBiodiv-O introduces classes, properties, and axioms in the domains of scholarly biodiversity publishing and biological taxonomy and aligns them with several important domain ontologies (FaBiO, DoCO, DwC, Darwin-SW, NOMEN, ENVO). By doing so, it bridges the ontological gap across scholarly biodiversity publishing and biological taxonomy and allows for the creation of a Linked Open Dataset (LOD) of biodiversity information (a biodiversity knowledge graph) and enables the creation of the OpenBiodiv Knowledge Management System. A key feature of the ontology is that it is an ontology of the scientific process of biological taxonomy and not of any particular state of knowledge. This feature allows it to express a multiplicity of scientific opinions. The resulting OpenBiodiv knowledge system may gain a high level of trust in the scientific community as it does not force a scientific opinion on its users (e.g. practicing taxonomists, library researchers, etc.), but rather provides the tools for experts to encode different views as science progresses. Conclusions: OpenBiodiv-O provides a conceptual model of the structure of a biodiversity publication and the development of related taxonomic concepts. It also serves as the basis for the OpenBiodiv Knowledge Management System.

Journal Article

Special Thematic Section on Semantic Models for Natural Language Processing (Preface)

by Simov, Kiril , Osenova, Petya in Natural language processing , Semantics

2018

With the availability of large language data online, cross-linked lexical resources (such as BabelNet, Predicate Matrix and UBY) and semantically annotated corpora (SemCor, OntoNotes, etc.), more and more applications in Natural Language Processing (NLP) have started to exploit various semantic models. The semantic models have been created on the basis of LSA, clustering, word embeddings, deep learning, neural networks, etc., and of abstract logical forms, such as Minimal Recursion Semantics (MRS) or Abstract Meaning Representation (AMR). Additionally, the Linguistic Linked Open Data Cloud (LLOD Cloud) has been initiated, which interlinks linguistic data to improve NLP tasks. This cloud has been expanding enormously over the last four to five years. It includes corpora, lexicons, thesauri and knowledge bases of various kinds, organized around appropriate ontologies, such as LEMON. The semantic models behind the data organization, as well as the representation of the semantic resources themselves, are a challenge to the NLP community. The NLP applications that rely extensively on the models discussed above include Machine Translation, Information Extraction, Question Answering, Text Simplification, etc.

Journal Article

A Reservoir Computing Approach to Word Sense Disambiguation

by Koprinkova-Hristova, Petia , Popov, Alexander , Simov, Kiril in Accuracy , Artificial Intelligence , Computation

2023

Reservoir computing (RC) has emerged as an alternative approach for the development of fast trainable recurrent neural networks (RNNs). It is considered to be biologically plausible due to the similarity between randomly designed artificial reservoir structures and cortical structures in the brain. The paper continues our previous research on the application of a member of the family of RC approaches, the echo state network (ESN), to the natural language processing (NLP) task of Word Sense Disambiguation (WSD). A novel deep bi-directional ESN (DBiESN) structure is proposed, as well as a novel approach for exploiting reservoirs’ steady states. The models also make use of ESN-enhanced word embeddings. The paper demonstrates that our DBiESN approach offers a good alternative to previously tested BiESN models in the context of the word sense disambiguation task, while having a smaller number of trainable parameters. Although our DBiESN-based model achieves accuracy similar to that of other popular RNN architectures, we could not outperform the state of the art. However, because the reservoir models have fewer trainable parameters than fully trainable RNNs, they can be expected to have better generalization properties as well as a higher potential to increase their accuracy, which should justify further exploration of such architectures.

Journal Article

Syntactic-Semantic Treebank for Domain Ontology Creation

by Simov, Kiril , Osenova, Petya in Computerized corpora , Dictionaries , Language and Literature Studies

2011

This paper focuses on the creation of a domain treebank for the purposes of compiling a domain ontology. The domain treebank is viewed as a suitable resource for extracting semantic relations from syntactic structures. First, the steps for ontology building are considered. Then, the processing of glossaries and standards is described with regard to their syntactic annotation. The utility of deriving semantic knowledge from the treebank is also illustrated via the basic phrases. The idea is that the domain knowledge is represented in the domain data, but via treebanking more linguistic patterns can be extracted, which can then be mapped to concepts and relations in a domain ontology.

Journal Article

OpenBiodiv-O Ontology: Bridging the Gap Between Biodiversity Data and Biodiversity Publishing

by Georgiev, Teodor , Penev, Lyubomir , Senderov, Viktor in Biodiversity , data collection , information management

2019

Communication of research findings is the last and arguably the most influential step of the scientific process. This is especially true for biodiversity science, in which new species descriptions and the introduction of new taxonomic names happen through publication, as governed by the International Codes. Despite the strict rules for naming new taxa and revising existing taxonomic nomenclatures within scholarly literature, there is no system for keeping track of these changes, and information often remains locked within the text of thousands of scattered journal articles. This talk presents OpenBiodiv-O, the first ontology which conceptually models the biodiversity publishing domain and, through its application in the semantic graph database OpenBiodiv, contributes to knowledge management of this domain. In combination with already existing ontologies for biodiversity and publishing (e.g. DarwinCore-based ontologies, SPAR ontologies), resource types introduced by OpenBiodiv-O help to create a link between these two domains. The ontology models the general structure of a research article, including sections specific to taxonomic articles, such as the treatment section, as well as other conceptual entities from taxonomy, like scientific names and taxonomic concepts. Thus, OpenBiodiv-O links scientific names to the corresponding article section in which they are mentioned via the class Taxonomic Name Usage and helps to discover hidden relationships between names. In addition, OpenBiodiv-O models the article metadata, such as the author names, affiliations and unique identifiers. The orcid class from the recently introduced Datacite ontology within OpenBiodiv-O models the ORCID of authors and will enable the future disambiguation of authors and linking with other platforms using ORCID. OpenBiodiv-O has been applied to the biodiversity knowledge graph OpenBiodiv, which is based on a Linked Open Dataset created from Pensoft's journal articles and Plazi's treatments.
Publishing semantically enhanced scholarly literature as XML enables the conversion of semi-structured narrative into connected Resource Description Framework (RDF) statements. The ontology serves as a skeleton for the transformation of more than 729 million statements into a Linked Open Dataset. Reusing existing ontologies within OpenBiodiv-O helps to establish a link between OpenBiodiv-O and other ontologies and facilitates federated querying between OpenBiodiv and other knowledge graphs. The application of OpenBiodiv-O towards a working solution for the biodiversity publishing domain demonstrates the potential of ontology modelling for data organisation and management.

Journal Article

New Applications of “Ontology-to-Text Relation” Strategy for Bulgarian Language

by Simov, Kiril , Staykova, Kamenka , Osenova, Petya in annotation grammars , Annotations , Bulgaria

2012

The paper presents new applications of the Ontology-to-Text Relation Strategy to the Bulgarian iconographic domain. First, the strategy itself is discussed within the triple of ontology, terminological lexicon and annotation grammars, followed by the related work. The specifics of the semantic annotation and evaluation over iconographic data are also presented. A family of domain ontologies over the iconographic domain has been created and used. The evaluation against a gold standard shows that this strategy is good enough for more precise, but shallow results, and can be supported further by deep parsing techniques.

Journal Article

OpenBiodiv Poster: an Implementation of a Semantic System Running on top of the Biodiversity Knowledge Graph

by Senderov, Viktor , Agosti, Donat , Simov, Kiril in automation , Biodiversity , data collection

2017

We present OpenBiodiv, an implementation of the Open Biodiversity Knowledge Management System. The need for an integrated information system serving the needs of the biodiversity community can be dated at least as far back as the sanctioning of the Bouchout declaration in 2007. The Bouchout declaration proposes to make biodiversity knowledge freely available as Linked Open Data (LOD). At TDWG 2016 (Fig. 1) we presented the prototype of the system, then called Open Biodiversity Knowledge Management System (OBKMS) (Senderov et al. 2016). The specification and design of OpenBiodiv was then outlined in more detail by Senderov and Penev (2016). In this poster, we describe the pilot implementation. We believe OpenBiodiv is possibly the first pilot-stage implementation of a semantic system running on top of a biodiversity knowledge graph. OpenBiodiv has several components:
OpenBiodiv ontology: A general data model supporting the extraction of biodiversity knowledge from taxonomic articles or from databases such as GBIF. The ontology (in preparation, Journal of Biomedical Semantics, available on GitHub) incorporates several pre-existing models: Darwin-SW (Baskauf and Webb 2016), SPAR (Peroni 2014), Treatment Ontology, and several others. It defines classes, properties, and rules supporting the interlinking of these disparate ontologies to create a LOD biodiversity knowledge graph. A new addition is the Taxonomic Name Usage class, accompanied by a Vocabulary of Taxonomic Statuses (created via an analysis of 4,000 Pensoft articles) enabling the automated inference of the taxonomic status of Latinized scientific names. The ontology supports multiple backbone taxonomies via the introduction of a Taxon Concept class (equivalent to DarwinCore Taxon) and Taxon Concept Labels as a subclass of biological name.
The Biodiversity Knowledge Graph: A LOD dataset of information extracted from taxonomic literature and databases. To date, this resource has realized part of what was proposed during the pro-iBiosphere project and later discussed by Page (2016). Its main resources are articles, sub-article components (tables, figures, treatments, references), author names, institution names, geographical locations, biological names, taxon concepts, and occurrences. Authors have been disambiguated via their affiliation with the use of fuzzy logic based on the GraphDB Lucene connector. The graph interlinks: (1) Prospectively published literature via Pensoft Publishers. (2) Legacy literature via Plazi. (3) Well-known resources such as geographical places or institutions via DBPedia. (4) GBIF's backbone taxonomy as a default but not the preferential hierarchy of taxon concepts. (5) OpenBiodiv ids with nomenclator ids (e.g. ZooBank) whenever possible. Names form two networks in the graph: (1) A directed acyclic graph (DAG) of supersedence that can be followed to the corresponding sinks to infer the currently applicable scientific name for a given taxon. (2) A network of bi-directional relations indicating the relatedness of names. These names may be compared to the related names inferred on the basis of distributional semantics (Nguyen et al. 2017).
ropenbio: An R package for RDF-ization of biodiversity information resources according to the OpenBiodiv ontology. We intend to submit this to the rOpenSci project. While many of its high-level functions are specific to OpenBiodiv, the low-level functions and its RDF-ization framework can be used for any R-based RDF-ization effort.
OpenBiodiv.net: A front-end of the system allowing users to run low-level SPARQL queries as well as to use an extensible set of semantic apps running on top of a biodiversity knowledge graph.
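The idea of following the supersedence DAG to its sinks can be illustrated with a minimal sketch. This is not the OpenBiodiv implementation: the function name `current_names`, the dictionary representation of the graph, and the placeholder labels `name_A` to `name_F` are all assumptions made for illustration. The sketch walks the "superseded by" edges from a starting name and collects every node with no outgoing edge, i.e. the names that are, under this model, currently applicable.

```python
from typing import Dict, List, Set


def current_names(name: str, superseded_by: Dict[str, List[str]]) -> Set[str]:
    """Return the sinks of the supersedence DAG reachable from `name`.

    `superseded_by` maps a name to the names that replace it (hypothetical
    representation; the real graph is stored as RDF). A name with no
    outgoing edge is treated as currently applicable.
    """
    sinks: Set[str] = set()
    seen: Set[str] = set()
    stack = [name]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        successors = superseded_by.get(node, [])
        if successors:
            stack.extend(successors)  # keep following supersedence edges
        else:
            sinks.add(node)  # no successor: a sink of the DAG
    return sinks


# Placeholder labels, not real taxon names.
edges = {
    "name_A": ["name_B"],
    "name_B": ["name_C"],
    "name_D": ["name_E", "name_F"],  # one name superseded by two
}

print(current_names("name_A", edges))
print(sorted(current_names("name_D", edges)))
```

A name that was split into several successors yields more than one sink, which is why the function returns a set rather than a single name.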

Journal Article
