Catalogue Search | MBRL
19 result(s) for "Simov, Kiril"
The ParlaMint corpora of parliamentary proceedings
by Barkarson, Starkaður; Osenova, Petya; Pančur, Andrej
in Communication, Communications technology, Computer science
2023
This paper presents the ParlaMint corpora containing transcriptions of the sessions of the 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for on-line exploration and analysis.
Journal Article
The data-driven Bulgarian WordNet: BTBWN
2018
The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on the identification of senses used in BulTreeBank, but the missing senses of a lemma have also been covered through exploration of bigger corpora. The identified senses have been organized in synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered from a cross-lingual perspective, with the aim of ensuring maximum connectivity and the potential for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.
Journal Article
OpenBiodiv: A Knowledge Graph for Literature-Extracted Linked Open Data in Biodiversity Science
by Senderov, Viktor; Simov, Kiril; Zhelezov, Georgi
in Biodiversity, biodiversity informatics, Cognition & reasoning
2019
Hundreds of years of biodiversity research have resulted in the accumulation of a substantial pool of communal knowledge; however, most of it is stored in silos isolated from each other, such as published articles or monographs. The need for a system to store and manage collective biodiversity knowledge in a community-agreed and interoperable open format has evolved into the concept of the Open Biodiversity Knowledge Management System (OBKMS). This paper presents OpenBiodiv: An OBKMS that utilizes semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge. It is presented as a Linked Open Dataset generated from scientific literature. OpenBiodiv encompasses data extracted from more than 5000 scholarly articles published by Pensoft and many more taxonomic treatments extracted by Plazi from journals of other publishers. The data from both sources are converted to Resource Description Framework (RDF) and integrated in a graph database using the OpenBiodiv-O ontology and an RDF version of the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Through the application of semantic technologies, the project showcases the value of open publishing of Findable, Accessible, Interoperable, Reusable (FAIR) data towards the establishment of open science practices in the biodiversity domain.
Journal Article
OpenBiodiv-O: ontology of the OpenBiodiv knowledge management system
2018
Background
The biodiversity domain, and in particular biological taxonomy, is moving in the direction of semantization of its research outputs. The present work introduces OpenBiodiv-O, the ontology that serves as the basis of the OpenBiodiv Knowledge Management System. Our intent is to provide an ontology that fills the gaps between ontologies for biodiversity resources, such as DarwinCore-based ontologies, and semantic publishing ontologies, such as the SPAR Ontologies. We bridge this gap by providing an ontology focusing on biological taxonomy.
Results
OpenBiodiv-O introduces classes, properties, and axioms in the domains of scholarly biodiversity publishing and biological taxonomy and aligns them with several important domain ontologies (FaBiO, DoCO, DwC, Darwin-SW, NOMEN, ENVO). By doing so, it bridges the ontological gap across scholarly biodiversity publishing and biological taxonomy and allows for the creation of a Linked Open Dataset (LOD) of biodiversity information (a biodiversity knowledge graph) and enables the creation of the OpenBiodiv Knowledge Management System.
A key feature of the ontology is that it is an ontology of the scientific process of biological taxonomy and not of any particular state of knowledge. This feature allows it to express a multiplicity of scientific opinions. The resulting OpenBiodiv knowledge system may gain a high level of trust in the scientific community as it does not force a scientific opinion on its users (e.g. practicing taxonomists, library researchers, etc.), but rather provides the tools for experts to encode different views as science progresses.
Conclusions
OpenBiodiv-O provides a conceptual model of the structure of a biodiversity publication and the development of related taxonomic concepts. It also serves as the basis for the OpenBiodiv Knowledge Management System.
Journal Article
Special Thematic Section on Semantic Models for Natural Language Processing (Preface)
2018
With the availability of large language data online, cross-linked lexical resources (such as BabelNet, Predicate Matrix and UBY) and semantically annotated corpora (SemCor, OntoNotes, etc.), more and more applications in Natural Language Processing (NLP) have started to exploit various semantic models. These semantic models have been built on the basis of LSA, clustering, word embeddings, deep learning, and neural networks, as well as abstract logical forms such as Minimal Recursion Semantics (MRS) or Abstract Meaning Representation (AMR).
Additionally, the Linguistic Linked Open Data Cloud (LLOD Cloud) has been initiated, which interlinks linguistic data to improve NLP tasks. This cloud has been expanding enormously over the last four to five years. It includes corpora, lexicons, thesauri, and knowledge bases of various kinds, organized around appropriate ontologies, such as LEMON. The semantic models behind the data organization, as well as the representation of the semantic resources themselves, are a challenge to the NLP community.
The NLP applications that rely extensively on the models discussed above include Machine Translation, Information Extraction, Question Answering, and Text Simplification.
Journal Article
A Reservoir Computing Approach to Word Sense Disambiguation
by Koprinkova-Hristova, Petia; Popov, Alexander; Simov, Kiril
in Accuracy, Artificial Intelligence, Computation
2023
Reservoir computing (RC) has emerged as an alternative approach for the development of fast trainable recurrent neural networks (RNNs). It is considered to be biologically plausible due to the similarity between randomly designed artificial reservoir structures and cortical structures in the brain. The paper continues our previous research on the application of a member of the family of RC approaches, the echo state network (ESN), to the natural language processing (NLP) task of Word Sense Disambiguation (WSD). A novel deep bi-directional ESN (DBiESN) structure is proposed, as well as a novel approach for exploiting reservoirs’ steady states. The models also make use of ESN-enhanced word embeddings. The paper demonstrates that our DBiESN approach offers a good alternative to previously tested BiESN models on the word sense disambiguation task while having a smaller number of trainable parameters. Although our DBiESN-based model achieves similar accuracy to other popular RNN architectures, we could not outperform the state of the art. However, because the reservoir models have fewer trainable parameters than fully trainable RNNs, they can be expected to have better generalization properties as well as higher potential to increase their accuracy, which should justify further exploration of such architectures.
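As a rough illustration of the reservoir idea mentioned in this abstract, the following is a minimal sketch of a bi-directional echo state reservoir in plain Python. It is not the authors' DBiESN: the function names, weight ranges, and the reuse of one weight set for both directions are assumptions made for brevity, and the trainable readout is omitted. The reservoir weights are random and fixed; only a readout on top of the collected states would be trained.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def make_matrix(rows: int, cols: int, scale: float, rng: random.Random) -> Matrix:
    """Random fixed weights in [-scale, scale]; these are never trained."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def run_reservoir(inputs: List[List[float]], w_in: Matrix, w_res: Matrix) -> List[List[float]]:
    """Apply x(t) = tanh(W_in u(t) + W_res x(t-1)) over a sequence of input vectors."""
    state = [0.0] * len(w_res)
    states = []
    for u in inputs:
        state = [
            math.tanh(
                sum(w * ui for w, ui in zip(w_in[i], u))
                + sum(w * s for w, s in zip(w_res[i], state))
            )
            for i in range(len(w_res))
        ]
        states.append(state)
    return states


def bidirectional_states(inputs: List[List[float]], w_in: Matrix, w_res: Matrix) -> List[List[float]]:
    """Concatenate left-to-right and right-to-left reservoir states per position.

    Assumption: both directions share the same weights here to keep the sketch short.
    """
    forward = run_reservoir(inputs, w_in, w_res)
    backward = run_reservoir(inputs[::-1], w_in, w_res)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Hypothetical toy input: two one-hot "word vectors" of dimension 3.
rng = random.Random(0)
w_in = make_matrix(4, 3, 0.5, rng)
w_res = make_matrix(4, 4, 0.1, rng)
features = bidirectional_states([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], w_in, w_res)
```

Each position in the sequence ends up with a feature vector of twice the reservoir size, which a simple trained classifier could map to word senses.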
Journal Article
Syntactic-Semantic Treebank for Domain Ontology Creation
by Simov, Kiril; Osenova, Petya
in Computerized corpora, Dictionaries, Language and Literature Studies
2011
This paper focuses on the creation of a domain treebank for the purposes of compiling a domain ontology. The domain treebank is viewed as a suitable resource for extracting semantic relations from syntactic structures. First, the steps for ontology building are considered. Then, the processing of glossaries and standards is described with regard to their syntactic annotation. The utility of deriving semantic knowledge from the treebank is also illustrated via the basic phrases. The idea is that the domain knowledge is represented in the domain data, but via treebanking more linguistic patterns can be extracted, which can then be mapped to concepts and relations in a domain ontology.
Journal Article
OpenBiodiv-O Ontology: Bridging the Gap Between Biodiversity Data and Biodiversity Publishing
by Georgiev, Teodor; Penev, Lyubomir; Senderov, Viktor
in Biodiversity, data collection, information management
2019
Communication of research findings is the last and arguably the most influential step of the scientific process. This is especially true for biodiversity science, in which new species descriptions and the introduction of new taxonomic names happen through publication, as governed by the International Codes. Despite the strict rules for naming new taxa and revising existing taxonomic nomenclatures within scholarly literature, there is no system for keeping track of these changes and information often remains locked within the text of thousands of scattered journal articles. This talk presents OpenBiodiv-O, the first ontology which conceptually models the biodiversity publishing domain and through its application in the semantic graph database OpenBiodiv contributes to knowledge management of this domain. In combination with already existing ontologies for biodiversity and publishing (e.g. DarwinCore-based ontologies, SPAR ontologies), resource types introduced by OpenBiodiv-O help to create a link between these two domains. The ontology models the general structure of a research article, including sections specific to taxonomic articles, such as the treatment section, as well as other conceptual entities from taxonomy, like scientific names and taxonomic concepts. Thus, OpenBiodiv-O links scientific names to the corresponding article section in which they are mentioned via the class Taxonomic Name Usage and helps to discover hidden relationships between names. In addition, OpenBiodiv-O models the article metadata, such as the author names, affiliations and unique identifiers. The orcid class from the recently introduced Datacite ontology within OpenBiodiv-O models the ORCID of authors and will enable the future disambiguation of authors and linking with other platforms using ORCID. OpenBiodiv-O has been applied to the biodiversity knowledge graph OpenBiodiv, which is based on a Linked Open Dataset, created from Pensoft's journal articles and Plazi's treatments.
Publishing of semantically enhanced scholarly literature as XML enables the conversion of semi-structured narrative into connected Resource Description Framework (RDF) statements. The ontology serves as a skeleton for the transformation of more than 729 million statements into a Linked Open Dataset. Reuse of existing ontologies within OpenBiodiv-O helps to establish a link between OpenBiodiv-O and other ontologies and facilitates federated querying between OpenBiodiv and other knowledge graphs. The application of OpenBiodiv-O towards a working solution for the biodiversity publishing domain demonstrates the potential of ontology modelling for data organisation and management.
Journal Article
New Applications of “Ontology-to-Text Relation” Strategy for Bulgarian Language
by Simov, Kiril; Staykova, Kamenka; Osenova, Petya
in annotation grammars, Annotations, Bulgaria
2012
The paper presents new applications of the Ontology-to-Text Relation Strategy to the Bulgarian iconographic domain. First, the strategy itself is discussed in terms of the triple of ontology, terminological lexicon, and annotation grammars, followed by related work. The specifics of semantic annotation and evaluation over iconographic data are also presented. A family of domain ontologies over the iconographic domain is created and used. The evaluation against a gold standard shows that this strategy is good enough for more precise, but shallow results, and can be supported further by deep parsing techniques.
Journal Article
OpenBiodiv Poster: an Implementation of a Semantic System Running on top of the Biodiversity Knowledge Graph
2017
We present OpenBiodiv, an implementation of the Open Biodiversity Knowledge Management System. The need for an integrated information system serving the needs of the biodiversity community can be dated at least as far back as the sanctioning of the Bouchout declaration in 2007. The Bouchout declaration proposes to make biodiversity knowledge freely available as Linked Open Data (LOD). At TDWG 2016 (Fig. 1) we presented the prototype of the system, then called Open Biodiversity Knowledge Management System (OBKMS) (Senderov et al. 2016). The specification and design of OpenBiodiv was then outlined in more detail by Senderov and Penev (2016). In this poster, we describe the pilot implementation. We believe OpenBiodiv is possibly the first pilot-stage implementation of a semantic system running on top of a biodiversity knowledge graph. OpenBiodiv has several components:
OpenBiodiv ontology: A general data model supporting the extraction of biodiversity knowledge from taxonomic articles or from databases such as GBIF. The ontology (in preparation, Journal of Biomedical Semantics, available on GitHub) incorporates several pre-existing models: Darwin-SW (Baskauf and Webb 2016), SPAR (Peroni 2014), Treatment Ontology, and several others. It defines classes, properties, and rules supporting the interlinking of these disparate ontologies to create a LOD biodiversity knowledge graph. A new addition is the Taxonomic Name Usage class, accompanied by a Vocabulary of Taxonomic Statuses (created via an analysis of 4,000 Pensoft articles) enabling the automated inference of the taxonomic status of Latinized scientific names. The ontology supports multiple backbone taxonomies via the introduction of a Taxon Concept class (equivalent to DarwinCore Taxon) and Taxon Concept Labels as a subclass of biological name.
The Biodiversity Knowledge Graph: A LOD dataset of information extracted from taxonomic literature and databases. To date, this resource has realized part of what was proposed during the pro-iBiosphere project and later discussed by Page (2016). Its main resources are articles, sub-article components (tables, figures, treatments, references), author names, institution names, geographical locations, biological names, taxon concepts, and occurrences. Authors have been disambiguated via their affiliation with the use of fuzzy logic based on the GraphDB Lucene connector. The graph interlinks: (1) Prospectively published literature via Pensoft Publishers. (2) Legacy literature via Plazi. (3) Well-known resources such as geographical places or institutions via DBPedia. (4) GBIF's backbone taxonomy as a default but not the preferential hierarchy of taxon concepts. (5) OpenBiodiv IDs with nomenclator IDs (e.g. ZooBank) whenever possible. Names form two networks in the graph: (1) A directed acyclic graph (DAG) of supersedence that can be followed to the corresponding sinks to infer the currently applicable scientific name for a given taxon. (2) A network of bi-directional relations indicating the relatedness of names. These names may be compared to the related names inferred on the basis of distributional semantics (Nguyen et al. 2017).
ropenbio: An R package for RDF-ization of biodiversity information resources according to the OpenBiodiv ontology. We intend to submit this to the rOpenSci project. While many of its high-level functions are specific to OpenBiodiv, the low-level functions and its RDF-ization framework can be used for any R-based RDF-ization effort.
OpenBiodiv.net: A front-end of the system allowing users to run low-level SPARQL queries as well as to use an extensible set of semantic apps running on top of a biodiversity knowledge graph.
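The abstract describes following a directed acyclic graph of name supersedence to its sinks in order to find the currently applicable name. A minimal sketch of that traversal is given here in Python; the dictionary representation, the function name `current_names`, and the placeholder names in the toy data are all hypothetical and are not taken from OpenBiodiv or ropenbio, which store these relations as RDF in a graph database.

```python
from typing import Dict, List, Set


def current_names(superseded_by: Dict[str, List[str]], name: str) -> Set[str]:
    """Follow supersedence edges from `name` until reaching sinks of the DAG.

    A sink is a name with no outgoing edge, i.e. one that nothing supersedes.
    """
    sinks: Set[str] = set()
    seen: Set[str] = set()
    stack = [name]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        successors = superseded_by.get(node, [])
        if successors:
            stack.extend(successors)
        else:
            sinks.add(node)
    return sinks


# Hypothetical toy data: each key is superseded by the names in its list.
edges = {
    "Aus bus": ["Aus cus"],
    "Aus cus": ["Xus cus"],
    "Aus dus": ["Xus cus"],
}
resolved = current_names(edges, "Aus bus")
```

The function returns a set rather than a single name because a DAG, unlike a chain, can lead from one name to several sinks, for example when a taxon has been split.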
Journal Article