Catalogue Search | MBRL
34 result(s) for "linguistic linked open data"
An Interface for Linking Ancient Languages
by Bandini, Michela; Quochi, Valeria; Mallia, Michele
in Ancient languages, Digital historical linguistics, eLexicography
2024
This paper focuses on the linking capabilities offered by EpiLexO, a web-based front end for creating and editing an ecosystem of digital resources for ancient languages, developed in the context of a project on the languages of fragmentary attestation of ancient Italy. The focus is particularly on the mechanisms introduced for linking lexical information to other pieces of information, either internally or externally: for example, for creating attestations by linking lexical forms to their variants in relevant inscriptions, or for linking lexical data to external independent LOD datasets available on a remote endpoint. In the conclusions, we briefly introduce some planned or desired future enhancements, as well as the final platform component: a parallel end-user interface that will be open to anyone on the web and will allow for browsing, searching, cross-querying and visualizing the created set of interlinked resources.
Journal Article
PreMOn: LODifing linguistic predicate models
by Corcoglioniti, Francesco; Palmero Aprosio, Alessio; Rospocher, Marco
in Computational Linguistics, Computer Science, Knowledge representation
2019
PreMOn is a freely available linguistic resource for exposing predicate models (PropBank, NomBank, VerbNet, and FrameNet) and mappings between them (e.g., SemLink and the Predicate Matrix) as linguistic linked open data (LOD). It consists of two components: (1) the PreMOn Ontology, which builds on the OntoLex-Lemon model by the W3C Ontology-Lexica Community Group to enable a homogeneous representation of data from various predicate models and their linking to ontological resources; and (2) the PreMOn Dataset, a LOD dataset integrating various versions of the aforementioned predicate models and mappings, linked to other LOD ontologies and resources (e.g., FrameBase, ESO, WordNet RDF). PreMOn is accessible online in different ways (e.g., via a SPARQL endpoint) and is extensively documented.
Journal Article
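The triple-based modelling that the PreMOn entry above describes (lexical entries linked to predicate senses, with mappings across predicate models) can be sketched with plain Python tuples. This is only an illustration of the idea: the IRIs and helper names below are hypothetical, not actual PreMOn or OntoLex-Lemon identifiers.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple,
# mimicking the RDF triples a resource like PreMOn would publish.

def make_graph():
    return set()

def add(graph, s, p, o):
    graph.add((s, p, o))

def query(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard
    (a much simplified stand-in for a SPARQL lookup)."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

g = make_graph()
# A lexical entry for the verb "give", in the spirit of OntoLex-Lemon ...
add(g, "lex:give_v", "rdf:type", "ontolex:LexicalEntry")
add(g, "lex:give_v", "ontolex:evokes", "pb:give.01")
# ... linked to a predicate-model sense and, via a mapping, to FrameNet.
add(g, "pb:give.01", "rdf:type", "premon:Predicate")
add(g, "pb:give.01", "skos:closeMatch", "fn:Giving")

senses = query(g, s="lex:give_v", p="ontolex:evokes")
```

Following the `skos:closeMatch` link from the retrieved sense is what lets such datasets be "cross-queried" against other LOD resources.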
Annotating a Low-Resource Language with LLOD Technology: Sumerian Morphology and Syntax
2018
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of the history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential but lacks modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project's main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich, structured data is then to be made accessible as (Linguistic) Linked Open Data (LLOD), which should open it up to a larger research community. Our contribution is twofold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.
Journal Article
LLOD schema for Simplified Offensive Language Taxonomy in multilingual detection and applications
by Dontcheva-Navrátilová, Olga; Valūnaitė Oleškevičienė, Giedrė; Žitnik, Slavko
in annotation, Classification, Czech language
2023
The goal of the paper is to present the Simplified Offensive Language (SOL) Taxonomy and its application and testing in the Second Annotation Campaign, conducted between March and May 2023 on four languages (English, Czech, Lithuanian, and Polish), with a view to its verification and placement in LLOD. Building on the previous offensive-language taxonomic models proposed mostly by the same COST Action Nexus Linguarum WG 4.1.1 team, the number and variety of the categories underwent definitional revision, and the present typology was tested by annotating publicly available offensive-language datasets in each of the four languages. The annotation results are presented and, as they fall within the accepted statistical values for inter-annotator agreement on the SOL categories and their aspects, we propose this taxonomy as a core ontology encoding the supported offensive-language categories and justify its use on new data as part of a more universal Linguistic Linked Open Data (LLOD) schema.
Journal Article
Semantic Modelling and Publishing of Traditional Data Collection Questionnaires and Answers
2018
Extensive collections of data of linguistic, historical and socio-cultural importance are stored in libraries, museums and national archives, with enormous potential to support research. However, a sizable portion of the data remains underutilised for lack of the knowledge required to model it semantically and convert it into a format suitable for the Semantic Web. Although many institutions have produced digital versions of their collections, semantic enrichment, interlinking and exploration are still missing from the digitised versions. In this paper, we present a model that provides structure and semantics to a non-standard linguistic and historical data collection, using the example of the Bavarian dialects in Austria at the Austrian Academy of Sciences. We followed a semantic modelling approach that utilises the knowledge of domain experts and the corresponding schema produced during the data collection process. The model is used to enrich, interlink and publish the collection semantically. The dataset includes questionnaires and answers as well as supplementary information about the circumstances of the data collection (person, location, time, etc.). The semantic uplift is demonstrated by converting a subset of the collection to a Linked Open Data (LOD) format, where domain experts evaluated the model and the resulting dataset for their support of user queries.
Journal Article
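The "semantic uplift" step described in the entry above, converting a flat questionnaire record into triples that also capture the circumstances of collection (person, location, time), can be sketched as follows. The vocabulary terms and example data are invented for illustration, not the actual Austrian Academy of Sciences schema.

```python
# A flat questionnaire record, as it might sit in a digitised archive table.
record = {
    "question": "What word is used for 'potato'?",
    "answer": "Erdapfel",
    "collector": "A. Example",
    "location": "Vienna",
    "year": 1927,
}

def uplift(rec, rid):
    """Convert one flat record into (subject, predicate, object) triples,
    keeping the collection circumstances queryable alongside the answer."""
    s = f"ex:answer/{rid}"
    return [
        (s, "ex:question", rec["question"]),
        (s, "ex:answer", rec["answer"]),
        (s, "ex:collectedBy", rec["collector"]),
        (s, "ex:collectedAt", rec["location"]),
        (s, "ex:collectedIn", rec["year"]),
    ]

triples = uplift(record, 1)
```

Once every record shares one subject IRI per answer, queries like "all answers collected in Vienna" become simple pattern matches over the triples, which is the interlinking gain the abstract refers to.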
A Data Driven Approach for Raw Material Terminology
by Tomašević, Aleksandra; Kolonja, Ljiljana; Stanković, Ranka
in Bilingualism, Collaboration, Data analysis
2021
The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw-material terminology for dictionary production. The approach is based on linking dictionaries related to the raw-material domain, both born-digital and printed, into a lexicon structure, aligning terminology from different dictionaries as far as possible. This paper presents the main features of the approach, the data used to compile the terminological database, the procedure by which it has been generated, and a mobile application for its use. The available (terminological) resources are presented: paper dictionaries and digital resources related to the raw-material domain, as well as morphological dictionaries of the general lexicon. Resource preparation started with dictionary (retro)digitisation and corpus enlargement, followed by adding new Serbian terms to general-lexicon dictionaries as well as adding bilingual terms. Dictionary development relies on corpus analysis, the details of which are also presented. Usage examples, collocations and concordances play an important role in raw-material terminology and have also been included in this research. Important related issues discussed include collocation extraction methods, the use of domain labels, lexical and semantic relations, definitions, and subentries.
Journal Article
Linked open data-based framework for automatic biomedical ontology generation
by Malik, Khalid Mahmood; Sabra, Susan; Alobaidi, Mazen
in Algorithms, Artificial intelligence, Automation
2018
Background
Fulfilling the vision of the Semantic Web requires an accurate data model for organizing knowledge and sharing a common understanding of the domain. Fitting this description, ontologies are the cornerstones of the Semantic Web and can be used to solve many problems of clinical information and biomedical engineering, such as word sense disambiguation, semantic similarity, question answering, and ontology alignment. Manual construction of ontologies is labor-intensive and requires domain experts and ontology engineers. To reduce the labor-intensive nature of ontology generation and minimize the need for domain experts, we present a novel automated ontology generation framework, Linked Open Data approach for Automatic Biomedical Ontology Generation (LOD-ABOG), which is empowered by Linked Open Data (LOD). LOD-ABOG performs concept extraction using knowledge bases, mainly UMLS and LOD, along with Natural Language Processing (NLP) operations, and applies relation extraction using LOD, a breadth-first search (BFS) graph method, and Freepal repository patterns.
Results
Our evaluation shows improved results on most ontology generation tasks compared to those obtained by existing frameworks. We evaluated the performance of the individual tasks (modules) of the proposed framework using the CDR and SemMedDB datasets. For concept extraction, evaluation shows an average F-measure of 58.12% for the CDR corpus and 81.68% for SemMedDB; F-measures of 65.26% and 77.44% for biomedical taxonomic relation extraction on CDR and SemMedDB, respectively; and F-measures of 52.78% and 58.12% for biomedical non-taxonomic relation extraction on the CDR corpus and SemMedDB, respectively. Additionally, comparison with a manually constructed baseline Alzheimer ontology shows F-measures of 72.48% for concept detection, 76.27% for relation extraction, and 83.28% for property extraction. We also compared our proposed framework with the ontology-learning framework "OntoGain", which showed that LOD-ABOG performs 14.76% better in terms of relation extraction.
Conclusion
This paper has presented the LOD-ABOG framework, which shows that current LOD sources and technologies are a promising solution for automating the process of biomedical ontology generation and extracting relations to a greater extent. In addition, unlike existing frameworks, which require domain experts throughout the ontology development process, the proposed approach requires their involvement only for refinement at the end of the ontology life cycle.
Journal Article
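The F-measures quoted throughout the entry above are the standard harmonic mean of precision and recall. As a quick reference, a minimal computation (the counts below are made-up examples, not figures from the paper):

```python
# F-measure (F1): harmonic mean of precision and recall, the metric used in
# ontology-generation evaluations such as the one summarised above.

def f_measure(true_pos, false_pos, false_neg):
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Example: 50 correct extractions, 30 spurious, 20 missed.
score = f_measure(true_pos=50, false_pos=30, false_neg=20)  # = 100/150, about 0.667
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), which makes it clear that the metric penalises spurious and missed extractions symmetrically.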
Discovering Links between Geospatial Data Sources in the Web of Data: The Open Geospatial Engine Approach
2024
The Web of Data has been fueled significantly by geospatial data over the last few years. Current link discovery frameworks still lack robust support for finding geospatial-aware links between geospatial data sources in the Web of Data, and they offer only limited capabilities for efficient association over large-scale datasets. This paper extends the data integration capability of the open geospatial engine OGE with spatial metrics, including topological relationships and spatial matching between geospatial entities across multiple geospatial data sources. The tool can thus be employed by data publishers to set geospatial-aware links that facilitate geospatial data and knowledge discovery in the Web of Data. Several geospatial data sources are used to demonstrate the usability and effectiveness of the approach and the tool implementation.
Journal Article
Explainable Bilingual Medical-Question-Answering Model Using Ensemble Learning Technique
by Alkhurayyif, Yazeed; Sait, Abdul Rahaman Wahab
in Access to information, Accessibility, Accuracy
2025
Accessing reliable medical information is a major challenge for healthcare professionals due to the limited accessibility of real-time medical data sources. The study's objectives are to maximize response accuracy with minimal latency and to enhance the model's interpretability. An explainable bilingual medical-question-answering system (MQAS) is introduced to improve accessibility and trust in healthcare information retrieval. Using knowledge-aware networks (KANs), retrieval-augmented generation (RAG), and linked open data (LOD), a synthetic bilingual dataset is generated. Through the application of this synthetic dataset and Bayesian Optimization HyperBand (BOHB)-based hyperparameter optimization, the performance of the GPT-Neo and RoBERTa models is fine-tuned. The outputs of GPT-Neo and RoBERTa are ensembled using a weighted majority voting approach, while SHapley Additive exPlanations (SHAP) values provide interpretability and transparency. The proposed model is trained and evaluated using diverse medical-question-answering datasets, demonstrating superior performance over baseline models. It achieves a generalization accuracy of 90.58%, an F1-score of 89.62%, and a BLEU score of 0.80, with a low inference time of 3.4 s per query. The findings underscore the model's potential to deliver accurate, bilingual, and explainable medical responses. This study establishes a foundation for building multilingual healthcare information systems, promoting inclusive and equitable access to medical information.
Journal Article
Doc2KG: Transforming Document Repositories to Knowledge Graphs
by Bassiliades, Nick; Vlachava, Danai; Konstantinidis, Ioannis
in Computational linguistics, Computer programs, Document management systems
2022
Document Management Systems (DMS) have been used for decades to store large amounts of information in textual form. Their technology paradigm is based on storing vast quantities of textual information enriched with metadata to support searchability. However, this exhibits limitations, as it treats textual information as a black box and relies exclusively on user-created metadata, a process that suffers from quality and completeness shortcomings. The use of knowledge graphs in DMS can substantially improve searchability, providing the ability to link data and enabling semantic search. Recent approaches focus on either creating knowledge graphs from document collections or updating existing ones. In this paper, we introduce Doc2KG (Document-to-Knowledge-Graph), an intelligent framework that handles both the creation and the real-time updating of a knowledge graph while also exploiting domain-specific ontology standards. We use DIAVGEIA ("clarity"), an award-winning Greek open government portal, as our case study and discuss new capabilities for the portal enabled by implementing Doc2KG.
Journal Article