Catalogue Search | MBRL
Explore the vast range of titles available.
11,964 result(s) for "World Wide Web and Web Science"
Survey on graph embeddings and their applications to machine learning problems on graphs
by Subelj, Lovro; Nikitinsky, Nikita; Makarov, Ilya
in Algorithms; Artificial Intelligence; Classification
2021
Dealing with relational data has always required significant computational resources, domain expertise and task-dependent feature engineering to incorporate structural information into a predictive model. Nowadays, a family of automated graph feature engineering techniques has been proposed in different streams of literature. So-called graph embeddings provide a powerful tool to construct vectorized feature spaces for graphs and their components, such as nodes, edges and subgraphs, while preserving inner graph properties. Using the constructed feature spaces, many machine learning problems on graphs can be solved via standard frameworks suitable for vectorized feature representations. Our survey aims to describe the core concepts of graph embeddings and provide several taxonomies for their description. First, we start with the methodological approach and extract three types of graph embedding models, based on matrix factorization, random walks and deep learning. Next, we describe how different types of networks impact the ability of models to incorporate structural and attributed data into a unified embedding. Going further, we perform a thorough evaluation of graph embedding applications to machine learning problems on graphs, among which are node classification, link prediction, clustering, visualization, compression, and a family of whole-graph embedding algorithms suitable for graph classification, similarity and alignment problems. Finally, we overview the existing applications of graph embeddings to computer science domains, formulate open problems and provide experimental results, explaining how different network properties affect graph embedding quality on four classic machine learning problems on graphs: node classification, link prediction, clustering and graph visualization.
As a result, our survey covers a new rapidly growing field of network feature engineering, presents an in-depth analysis of models based on network types, and overviews a wide range of applications to machine learning problems on graphs.
Journal Article
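The random-walk family of embedding models surveyed above (DeepWalk and its descendants) starts by sampling truncated random walks over the graph; the walk corpus is then fed to a word-embedding model such as skip-gram, which is omitted here. A minimal sketch of the sampling step, on an assumed toy adjacency list:

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=2, seed=0):
    """Generate truncated random walks over an adjacency dict.

    DeepWalk-style embeddings treat each walk as a 'sentence' of
    node tokens and train a skip-gram model on the walk corpus.
    """
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # dead end: truncate the walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# A toy 4-node graph: a triangle 0-1-2 with a pendant node 3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(graph)
```

Node2vec modifies only the `rng.choice` step, biasing the next-hop distribution by return and in-out parameters; the surrounding loop is unchanged.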
FrameAxis: characterizing microframe bias and intensity with word embedding
2021
Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes (“microframes”) that are overrepresented in the text using word embedding. Our unsupervised approach can be readily applied to large datasets because it does not require manual annotations. It can also provide nuanced insights by considering a rich set of semantic axes. FrameAxis is designed to quantitatively tease out two important dimensions of how microframes are used in the text. Microframe bias captures how biased the text is on a certain microframe, and microframe intensity shows how prominently a certain microframe is used. Together, they offer a detailed characterization of the text. We demonstrate that microframes with the highest bias and intensity align well with sentiment, topic, and partisan spectrum by applying FrameAxis to multiple datasets from restaurant reviews to political news. The existing domain knowledge can be incorporated into FrameAxis by using custom microframes and by using FrameAxis as an iterative exploratory analysis instrument. Additionally, we propose methods for explaining the results of FrameAxis at the level of individual words and documents. Our method may accelerate scalable and sophisticated computational analyses of framing across disciplines.
Journal Article
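The bias and intensity quantities described in this abstract can be sketched on toy vectors. The real method uses pretrained word embeddings, antonym-pair axes, and frequency-weighted contributions; the 2-d vectors and unweighted averages below are illustrative assumptions only, not the authors' implementation:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def microframe_axis(pos_vec, neg_vec):
    # A microframe is the difference vector of an antonym pair,
    # e.g. vec("clean") - vec("dirty").
    return [p - n for p, n in zip(pos_vec, neg_vec)]

def microframe_bias(word_vecs, axis):
    # Bias: how far the document's words lean, on average,
    # toward one pole of the axis.
    sims = [cosine(v, axis) for v in word_vecs]
    return sum(sims) / len(sims)

def microframe_intensity(word_vecs, axis, baseline):
    # Intensity: how strongly word-axis similarities deviate
    # from a corpus-level baseline (here a given scalar).
    return sum((cosine(v, axis) - baseline) ** 2
               for v in word_vecs) / len(word_vecs)

# Toy 2-d 'embeddings' (assumed, for illustration only).
axis = microframe_axis([1.0, 0.2], [-1.0, 0.2])  # -> [2.0, 0.0]
doc = [[0.9, 0.1], [0.8, -0.2]]
bias = microframe_bias(doc, axis)
intensity = microframe_intensity(doc, axis, bias)
```

Both toy document words point toward the positive pole, so the bias is close to 1 while the intensity (spread around the document's own bias) stays small.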
ShExML: improving the usability of heterogeneous data mapping languages for first-time users
by Boneva, Iovka; Cueva Lovelle, Juan Manuel; García-González, Herminio
in Analysis; Artificial Intelligence; Computer Science
2020
Integration of heterogeneous data sources into a single representation is an active field with many different tools and techniques. In the case of text-based approaches (those that base the definition of the mappings and the integration on a DSL) there is a lack of usability studies. In this work we conducted a usability experiment (n = 17) on three different languages: ShExML (our own language), YARRRML and SPARQL-Generate. Results show that ShExML users tend to perform better than those of YARRRML and SPARQL-Generate. This study sheds light on usability aspects of these languages' design and highlights some areas for improvement.
Journal Article
Relational graph convolutional networks: a closer look
by Groth, Paul; van Berkel, Lucas; Thanapalasingam, Thiviyan
in Analysis; Artificial Intelligence; Artificial neural networks
2022
In this article, we describe a reproduction of the Relational Graph Convolutional Network (RGCN). Using our reproduction, we explain the intuition behind the model. Our reproduction results empirically validate the correctness of our implementations using benchmark Knowledge Graph datasets on node classification and link prediction tasks. Our explanation provides an accessible understanding of the different components of the RGCN for both users and researchers extending the RGCN approach. Furthermore, we introduce two new configurations of the RGCN that are more parameter efficient. The code and datasets are available at https://github.com/thiviyanT/torch-rgcn.
Journal Article
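The core intuition behind the RGCN is per-relation message passing: each node aggregates its neighbours separately for every relation type, each relation having its own weight matrix. A heavily simplified sketch of one layer (identity weight matrices, mean aggregation, and no basis decomposition are illustrative assumptions, not the paper's configuration):

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(x):
    return [max(0.0, xi) for xi in x]

def rgcn_layer(h, edges, weights, self_weight):
    """One simplified R-GCN layer.

    h: node id -> feature vector
    edges: relation name -> list of (src, dst) pairs
    weights: relation name -> weight matrix W_r
    Each node aggregates a per-relation mean of its in-neighbours,
    transformed by that relation's W_r, plus a self-loop term W_0 h_i.
    """
    out = {}
    for i, hi in h.items():
        acc = matvec(self_weight, hi)  # self-connection W_0 h_i
        for rel, rel_edges in edges.items():
            msgs = [matvec(weights[rel], h[j])
                    for j, dst in rel_edges if dst == i]
            if msgs:  # normalise by the per-relation neighbour count
                mean = [sum(vals) / len(msgs) for vals in zip(*msgs)]
                acc = vec_add(acc, mean)
        out[i] = relu(acc)
    return out

I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity stand-in for learned W_r
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = {"cites": [(0, 2), (1, 2)], "authored_by": [(2, 0)]}
out = rgcn_layer(h, edges, {"cites": I2, "authored_by": I2}, I2)
```

Keeping a distinct W_r per relation is exactly what makes the model "relational"; the parameter-efficient configurations the article introduces reduce how many independent W_r matrices must be learned.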
Web content topic modeling using LDA and HTML tags
by Altarturi, Hamza H.M.; Saadoon, Muntadher; Anuar, Nor Badrul
in Analysis; Computational linguistics; Data mining
2023
An immense volume of digital documents exists online and offline with content that can offer useful information and insights. Utilizing topic modeling enhances the analysis and understanding of digital documents. Topic modeling discovers latent semantic structures, or topics, within a set of digital textual documents. The Internet of Things, Blockchain, recommender system, and search engine optimization applications use topic modeling to handle data mining tasks, such as classification and clustering. The usefulness of topic models depends on the quality of the resulting term patterns and topics. Topic coherence is the standard metric to measure the quality of topic models. Previous studies built topic models that generally work on conventional documents, and these models are insufficient and underperform when applied to web content data due to differences in structure between conventional and HTML documents. Neglecting the unique structure of web content leads to missing otherwise coherent topics and, therefore, low topic quality. This study proposes an innovative topic model that learns coherent topics from web content data. We present the HTML Topic Model (HTM), a web content topic model that takes the HTML tags into consideration to understand the structure of web pages. We conducted two series of experiments to demonstrate the limitations of the existing topic models and to examine the topic coherence of the HTM against the widely used Latent Dirichlet Allocation (LDA) model and its variants, namely the Correlated Topic Model, the Dirichlet Multinomial Regression, the Hierarchical Dirichlet Process, the Hierarchical Latent Dirichlet Allocation, the pseudo-document based Topic Model, and the Supervised Latent Dirichlet Allocation models. The first experiment demonstrates the limitations of the existing topic models when applied to web content data and, therefore, the essential need for a web content topic model.
When applied to web data, overall performance dropped by a factor of five on average and, in some cases, by a factor of approximately 20 compared to conventional data. The second experiment then evaluates the effectiveness of the HTM model in discovering topics and term patterns in web content data. The HTM model achieved an overall 35% improvement in topic coherence compared to the LDA.
Journal Article
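The tag-aware idea behind the HTM can be approximated with the standard-library HTML parser: collect page text grouped by the tag it appears in, so a downstream topic model can weight tokens from, say, `<h1>` differently from `<p>` instead of flattening the page to plain text. This extractor and its tag whitelist are illustrative assumptions, not the authors' HTM implementation:

```python
from html.parser import HTMLParser

class TagAwareExtractor(HTMLParser):
    """Collect page text grouped by the HTML tag it appears in."""

    KEEP = {"title", "h1", "h2", "h3", "p", "li"}  # assumed whitelist

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags
        self.buckets = {}  # tag -> list of text fragments

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else None
        if tag in self.KEEP and data.strip():
            self.buckets.setdefault(tag, []).append(data.strip())

page = "<html><h1>Topic models</h1><p>LDA finds latent topics.</p></html>"
extractor = TagAwareExtractor()
extractor.feed(page)
```

Each bucket can then be tokenised and weighted separately before being handed to LDA or any of its variants listed above.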
The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central
by Krüger, Frank; Schindler, David; Bensmann, Felix
in Academic publications; Analysis; Bibliographical citations
2022
Science across all disciplines has become increasingly data-driven, leading to additional needs with respect to software for collecting, processing and analysing data. Thus, transparency about software used as part of the scientific process is crucial to understand provenance of individual research data and insights, is a prerequisite for reproducibility and can enable macro-analysis of the evolution of scientific methods over time. However, missing rigor in software citation practices renders the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices facilitated through an unprecedented knowledge graph of software mentions and affiliated metadata generated through supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions and outperforms the state-of-the-art significantly, leading to the most comprehensive corpus of 11.8 M software mentions that are described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. While this is, to the best of our knowledge, the most comprehensive analysis of software use and citation to date, all data and models are shared publicly to facilitate further research into scientific use and citation of software.
Journal Article
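A knowledge graph of mentions like the one described here is, at its core, a set of subject-predicate-object triples that can be pattern-matched. A minimal sketch of that querying model; the identifiers and predicate names below are hypothetical, not the article's schema:

```python
def query(triples, s=None, p=None, o=None):
    """Match (subject, predicate, object) patterns; None is a wildcard,
    mirroring a single SPARQL triple pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Hypothetical mention triples in the spirit of the article's graph.
triples = [
    ("article:123", "mentions", "software:SPSS"),
    ("article:123", "publishedIn", "journal:PLOS_ONE"),
    ("article:456", "mentions", "software:SPSS"),
    ("software:SPSS", "hasType", "statistical"),
]

# Which articles mention SPSS?
spss_mentions = query(triples, p="mentions", o="software:SPSS")
```

At the article's scale (300 M+ triples) a real triple store with indexes replaces the linear scan, but the query semantics are the same.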
Semantic micro-contributions with decentralized nanopublication services
2021
While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavy-weight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present here an approach to use nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface, called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show here that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench makes it indeed very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.
Journal Article
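The "cryptographically verifiable" property of nanopublications can be sketched with a content hash: the publication's identifier is derived from its own content, so any tampering is detectable. Real nanopublications use Trusty URIs computed over a normalised RDF serialisation; hashing the raw string, as below, is a deliberate simplification:

```python
import hashlib

def publish(statement):
    """Attach a content hash so the statement is self-verifying."""
    digest = hashlib.sha256(statement.encode("utf-8")).hexdigest()
    return {"id": f"np:{digest[:16]}", "statement": statement,
            "hash": digest}

def verify(nanopub):
    """Recompute the hash and compare it to the published one."""
    recomputed = hashlib.sha256(
        nanopub["statement"].encode("utf-8")).hexdigest()
    return recomputed == nanopub["hash"]

np1 = publish("ex:Caffeine ex:containedIn ex:Coffee .")
ok = verify(np1)
# A tampered copy keeps the old hash, so verification fails.
tampered = dict(np1, statement="ex:Caffeine ex:containedIn ex:Tea .")
```

Because verification needs only the nanopublication itself, any node in a decentralized network of services can check integrity without trusting the node it fetched the data from.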
RA-QoS: a robust autoencoder-based QoS predictor for highly accurate web service QoS prediction
by Li, Junnan; Wang, Lufeng; Fu, Shun
in Data Mining and Machine Learning; Deep neural networks; Neural Networks
2025
Web services are fundamental for online service-oriented applications, where accurately predicting quality of service (QoS) is critical for recommending optimal services among multiple candidates. Since QoS data often contains noise—stemming from factors like remote user or service locations—current deep neural network (DNN)-based QoS predictors, which generally rely on L2-norm loss functions, face limitations in robustness due to sensitivity to outliers. To address this issue, we propose a novel robust autoencoder-based QoS predictor (RA-QoS) that leverages a hybrid loss function combining bias, training bias, L1-norm and L2-norm to build a robust Autoencoder. This hybrid approach allows RA-QoS to better handle noisy data, minimizing the impact of outliers and biases on prediction accuracy. The RA-QoS model further incorporates preprocessing and training biases, improving its adaptability to real-world QoS data. To evaluate the proposed RA-QoS predictor, extensive experiments are conducted on two real-world QoS datasets. The results demonstrate that our RA-QoS predictor exhibits superior robustness to outliers and higher accuracy in QoS prediction compared to the related state-of-the-art models.
Journal Article
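The robustness argument for mixing L1 and L2 terms can be illustrated directly: the L1 part grows only linearly with an error, so a single outlier inflates the total far less than under a pure squared loss. This is a toy comparison, not the paper's exact loss (which also incorporates bias and training-bias terms), and the blending weight `alpha` is an assumed parameter:

```python
def l2_loss(errors):
    """Mean squared error: outliers are penalised quadratically."""
    return sum(e * e for e in errors) / len(errors)

def hybrid_loss(errors, alpha=0.5):
    """Blend of mean-absolute (L1) and mean-squared (L2) terms."""
    l1 = sum(abs(e) for e in errors) / len(errors)
    l2 = sum(e * e for e in errors) / len(errors)
    return alpha * l1 + (1 - alpha) * l2

clean = [0.1, -0.2, 0.1, 0.0]
noisy = clean + [5.0]  # one outlier, e.g. a timed-out QoS probe

# How many times larger does each loss get when the outlier appears?
ratio_l2 = l2_loss(noisy) / l2_loss(clean)
ratio_hybrid = hybrid_loss(noisy) / hybrid_loss(clean)
```

The hybrid ratio is an order of magnitude smaller than the pure-L2 ratio on this toy data, which is the intuition behind training a more outlier-tolerant autoencoder.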
Result Assessment Tool (RAT): empowering search engine data analysis
by Lewandowski, Dirk; Schultheiß, Sebastian; Yagci, Nurce
in Retrieval effectiveness studies; Retrieval tests; Search engine evaluation
2025
The Result Assessment Tool (RAT) is a Python-based software toolkit that enables researchers to analyze results from commercial search engines, social media platforms, and library search systems. RAT provides an integrated environment for designing studies, collecting results, and performing automated analysis. The software consists of two main modules: RAT Frontend and RAT Backend. RAT Frontend uses Flask to provide a researcher view for designing studies and an evaluation view for collecting ratings from study participants. RAT Backend includes modules for collecting search results, extracting source code, and adding classifiers for automated analysis. The system has been used in various studies, including search engine effectiveness studies, interactive information retrieval studies, and classification studies.
Journal Article
Accessibility challenges of e-commerce websites
by Acosta-Vargas, Patricia; Jadán-Guerrero, Janio; Salvador-Ullauri, Luis
in Accessibility; Algorithms; Analysis
2022
Today, there are many e-commerce websites, but not all of them are accessible. Accessibility is a crucial element that can make a difference and determine the success or failure of a digital business. The study was applied to 50 e-commerce sites in the top rankings according to the classification proposed by ecommerceDB. In evaluating the web accessibility of e-commerce sites, we applied an automatic review method based on a modification of Website Accessibility Conformance Evaluation Methodology (WCAG-EM) 1.0. To evaluate accessibility, we used Web Accessibility Evaluation Tool (WAVE) with the extension for Google Chrome, which helps verify password-protected, locally stored, or highly dynamic pages. The study found that the correlation between the ranking of e-commerce websites and accessibility barriers is 0.329, indicating that the correlation is low positive according to Spearman’s Rho. According to the WAVE analysis, the research results reveal that the top 10 most accessible websites are Sainsbury’s Supermarkets, Walmart, Target Corporation, Macy’s, IKEA, H&M Hennes, Chewy, Kroger, QVC, and Nike. The most significant number of accessibility barriers relate to contrast errors that must be corrected for e-commerce websites to reach an acceptable level of accessibility. The most neglected accessibility principle is perceivable, representing 83.1%, followed by operable with 13.7%, in third place is robust with 1.7% and finally understandable with 1.5%. Future work suggests constructing a software tool that includes artificial intelligence algorithms that help the software identify accessibility barriers.
Journal Article
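The contrast errors that dominate this study's findings are judged against the WCAG 2.x contrast-ratio formula, which underlies checks in tools like WAVE. A self-contained sketch of that formula (the thresholds in the comments are from WCAG; everything else is the standard computation):

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB colour (0-255 channels)."""
    def channel(c):
        c = c / 255.0  # scale to [0, 1], then linearise sRGB gamma
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """WCAG contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05).

    Level AA requires >= 4.5 for normal text, >= 3.0 for large text.
    """
    l1, l2 = sorted((relative_luminance(fg),
                     relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

ratio = contrast_ratio((255, 255, 255), (0, 0, 0))  # white on black: 21
```

The ratio ranges from 1 (identical colours) to 21 (pure white on pure black), which is why fixing low-contrast text is usually the cheapest step toward the perceivable principle the study found most neglected.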