Catalogue Search | MBRL

Investigating topic modeling techniques through evaluation of topics discovered in short texts data across diverse domains

by Ramaswamy, Krishnaraj , Saritha, K. , Muthusami, R. in 639/166 , 639/301 , 639/705

2024

The online channel has affected many facets of an individual's identity, commerce, social policy, and culture, among others. It implies that discovering the topics on which these brief writings are focused, as well as examining the qualities of these short texts, is critical. Another key issue that has been identified is the evaluation of newly discovered topics in terms of topic quality, which includes topic separation and coherence. Topic modeling has been shown to be an outstanding aid in the linguistic interpretation of very short texts. Based on the underlying strategy, topic models are divided into two categories: probabilistic methods and non-probabilistic methods. In this research, short texts are analyzed using topic models, including latent Dirichlet allocation (LDA) for probabilistic topic modeling and non-negative matrix factorization (NMF) for non-probabilistic topic modeling. A novel approach to topic evaluation, based on clustering methods and silhouette analysis, is applied to both models to investigate performance in terms of topic quality. The experimental results indicate that the proposed evaluation method outperforms on both LDA and NMF.

Journal Article
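
The silhouette analysis mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each document is represented by its topic proportions (as an LDA or NMF model would produce), that each document is assigned to its dominant topic, and that Euclidean distance is used. The `doc_topics` values and the function names are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    For each point, a is the mean distance to the other members of its
    own cluster and b is the smallest mean distance to any other cluster.
    The point's silhouette is (b - a) / max(a, b); singletons score 0.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical document-topic proportions for four short texts, two topics.
doc_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

# Assumption: each document is assigned to its dominant topic.
labels = [row.index(max(row)) for row in doc_topics]

score = silhouette_score(doc_topics, labels)
print(f"mean silhouette: {score:.3f}")
```

A score close to 1 suggests well-separated topic clusters, while values near 0 or below suggest overlapping or misassigned documents.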

Tracing Long-term Value Change in (Energy) Technologies

by de Wildt, T. E. , Chappin, E. J. L. , van de Poel, I. R. in Acceptability , Advantages , Changes

2022

We propose a new approach for tracing value change. Value change may lead to a mismatch between current value priorities in society and the values for which technologies were designed in the past, such as energy technologies based on fossil fuels, which were developed when sustainability was not considered a very important value. Better anticipating value change is essential to avoid a lack of social acceptance and moral acceptability of technologies. While value change can be studied historically and qualitatively, we propose a more quantitative approach that uses large text corpora. It uses probabilistic topic models, which allow us to trace (new) values that are (still) latent. We demonstrate the approach for five types of value change in technology. Our approach is useful for testing hypotheses about value change, such as verifying whether value change has occurred and identifying patterns of value change. The approach can be used to trace value change for various technologies and text corpora, including scientific articles, newspaper articles, and policy documents.

Journal Article

Probabilistic Topic Model for Hybrid Recommender Systems: A Stochastic Variational Bayesian Approach

by Ansari, Asim , Li, Yang , Zhang, Jonathan Z. in Analysis , Bayesian analysis , Big Data

2018

This paper proposes a novel covariate-guided heterogeneous supervised topic model for online movie recommendation and develops a stochastic variational Bayesian framework to achieve fast, scalable, and accurate estimation in big data settings. Internet recommender systems are popular in contexts that include heterogeneous consumers and numerous products. In such contexts, product features that adequately describe all the products are often not readily available. Content-based systems therefore rely on user-generated content such as product reviews or textual product tags to make recommendations. In this paper, we develop a novel covariate-guided, heterogeneous supervised topic model that uses product covariates, user ratings, and product tags to succinctly characterize products in terms of latent topics and specifies consumer preferences via these topics. Recommendation contexts also generate big-data problems stemming from data volume, variety, and veracity, as in our setting, which includes massive textual and numerical data. We therefore develop a novel stochastic variational Bayesian framework to achieve fast, scalable, and accurate estimation in such big-data settings and apply it to a MovieLens data set of movie ratings and semantic tags. We show that our model yields interesting insights about movie preferences and makes much better predictions than a benchmark model that uses only product covariates. We show how our model can be used to target recommendations to particular users and illustrate its use in generating personalized search rankings of relevant products. Data are available at https://doi.org/10.1287/mksc.2018.1113 .

Journal Article

A decade of research in statistics: a topic model approach

by Ferrara, Alfio , De Battisti, Francesca , Salini, Silvia in Bibliometrics , Citations , Clustering

2015

Topic models are a well-known clustering approach for textual data, which provides promising applications in the bibliometric context for the purpose of discovering scientific topics and trends in a corpus of scientific publications. However, topic models per se provide poorly descriptive metadata featuring the discovered clusters of publications and they are not related to the other important metadata usually available with publications, such as author affiliation, publication venue, and publication year. In this paper, we propose a methodological approach to topic modeling and post-processing of topic model results in order to describe in depth a field of research over time. In particular, we work on a selection of publications from the international statistical literature, we propose an approach that allows us to identify sophisticated topic descriptors, and we analyze the links between topics and their temporal evolution.

Journal Article

The development of a competence framework for artificial intelligence professionals using probabilistic topic modelling

by Bick, Markus , Brauner, Sonja , Murawski, Matthias in Artificial intelligence , Big Data , Categories

2025

Purpose: The current gap between the required and available artificial intelligence (AI) professionals poses significant challenges for organisations and academia. Organisations are challenged to identify and secure the appropriate AI competencies. Simultaneously, academia is challenged to design, offer and quickly scale academic programmes in line with industry needs and train new generations of AI professionals. Therefore, identifying and structuring AI competencies is necessary to effectively overcome the AI competence shortage. Design/methodology/approach: A probabilistic topic model was applied to explore the AI competence categories empirically. The authors analysed 1159 AI-related online job ads published on LinkedIn. Findings: The authors identified five predominant competence categories: (1) Data Science, (2) AI Software Development, (3) AI Product Development and Management, (4) AI Client Servicing, and (5) AI Research. These five competence categories were summarised under the developed AI competence framework. Originality/value: The AI competence framework contributes to clarifying and structuring the diverse AI landscape. These findings have the potential to aid various stakeholders involved in the process of training, recruiting and selecting AI professionals. They may guide organisations in constructing a complementary portfolio of AI competencies by helping users match the right competence requirements with an organisation's needs and business objectives. Similarly, they can support academia in designing academic programmes aligned with industry needs. Furthermore, while focusing on AI, this study contributes to the research stream of information technology (IT) competencies.

Journal Article

On mining latent treatment patterns from electronic medical records

by Huang, Zhengxing , Ji, Lei , Duan, Huilong in Angina pectoris , Artificial Intelligence , Automation

2015

Clinical pathway (CP) analysis plays an important role in health-care management in ensuring specialized, standardized, normalized and sophisticated therapy procedures for individual patients. Recently, with the rapid development of hospital information systems, a large volume of electronic medical records (EMRs) has been produced, which provides a comprehensive source for CP analysis. In this paper, we are concerned with the problem of utilizing the heterogeneous EMRs to assist CP analysis and improvement. More specifically, we develop a probabilistic topic model to link patient features and treatment behaviors together to mine treatment patterns hidden in EMRs. Discovered treatment patterns, as actionable knowledge representing the best practice for most patients in most time of their treatment processes, form the backbone of CPs, and can be exploited to help physicians better understand their specialty and learn from previous experiences for CP analysis and improvement. Experimental results on a real collection of 985 EMRs collected from a Chinese hospital show that the proposed approach can effectively identify meaningful treatment patterns from EMRs.

Journal Article

Modeling method of internet public information data mining based on probabilistic topic model

by Liu, Lizhi , Liu, Jun , Wu, Shaofei in Artificial intelligence , Classification , Clustering

2019

From the perspective of military intelligence work, the rise and widespread use of the network has opened up new horizons for intelligence acquisition, but the continuous increase in the amount of data in the network has stretched traditional intelligence analysis. Therefore, enriching and developing military intelligence analysis methods has practical significance for making up for the shortcomings in current intelligence analysis. The traditional military intelligence analysis method cannot realize the in-depth mining and analysis of the network Shanghai information or obtain the deep intelligence knowledge required by the military. This paper therefore introduces data mining technology into military intelligence analysis and constructs a network military intelligence analysis model based on data mining. The semantic analysis-based intelligence analysis algorithm in this model has certain advantages over traditional association analysis and can effectively improve the efficiency and accuracy of military intelligence analysis.

Journal Article

Knowledge discovery through directed probabilistic topic models: a survey

by LI, Juanzi , MUHAMMAD, Faqir , ZHOU, Lizhu in Algorithms , Classification , Clustering

2010

Graphical models have become the basic framework for topic based probabilistic modeling. Especially models with latent variables have proved to be effective in capturing hidden structures in the data. In this paper, we survey an important subclass Directed Probabilistic Topic Models (DPTMs) with soft clustering abilities and their applications for knowledge discovery in text corpora. From an unsupervised learning perspective, "topics are semantically related probabilistic clusters of words in text corpora; and the process for finding these topics is called topic modeling". In topic modeling, a document consists of different hidden topics and the topic probabilities provide an explicit representation of a document to smooth data from the semantic level. It has been an active area of research during the last decade. Many models have been proposed for handling the problems of modeling text corpora with different characteristics, for applications such as document classification, hidden association finding, expert finding, community discovery and temporal trend analysis. We give basic concepts, advantages and disadvantages in a chronological order, existing models classification into different categories, their parameter estimation and inference making algorithms with models performance evaluation measures. We also discuss their applications, open challenges and future directions in this dynamic area of research.

Journal Article

Probabilistic topic modeling for the analysis and classification of genomic sequences

by Rizzo, Riccardo , Urso, Alfonso , Fiannaca, Antonino in Algorithms , Bacteria - classification , Bacteria - genetics

2015

Background: Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, representing a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mer representation and text mining techniques. Methods: The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions: We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased.

Journal Article
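
The k-mer representation described in the abstract above can be sketched as follows. This is a minimal illustration of treating a DNA sequence as a "document" of fixed-length k-mer "words", not the authors' pipeline. The example sequence, the choice of k = 3, and the function names are hypothetical, and the topic-model training step is omitted.

```python
from collections import Counter


def kmers(sequence, k):
    """Return all overlapping k-mers of a sequence, in order."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, k):
    """Bag-of-k-mers: how often each k-mer occurs in the sequence."""
    return Counter(kmers(sequence, k))


def kmer_document(sequence, k):
    """Space-separated k-mers, usable as input to a text tokenizer."""
    return " ".join(kmers(sequence, k))


# Hypothetical short sequence; real barcode sequences are far longer.
example = "ACGTACG"
counts = kmer_counts(example, 3)
document = kmer_document(example, 3)

print(counts)
print(document)
```

The resulting counts play the role of word frequencies, so a standard topic model such as LDA could in principle be fitted to a collection of such k-mer documents.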

LEOnto+: a scalable ontology enrichment approach

by Chbeir, Richard , Tissaoui, Anis , Sassi, Salma in Dirichlet problem , Documents , Information retrieval

2022

Distributional semantic models like the Latent Dirichlet Allocation (LDA) model (Guo et al., Concurr. Comput.: Pract. Exper. 29(3), 319–343, 2016) define similar representations for words that appear in similar contexts. LDA was originally used to model documents and extract topics in Information Retrieval. In recent years, LDA has become a hot topic in ontology learning because of the exponential increase in the number of documents and textual data, not only on the web but also in digital libraries. LDA-based approaches have proven to provide the best results. However, they suffer from several limitations related to concept and relation extraction, as well as to handling corpus evolution and maintenance. To cope with these problems, we propose in this paper LEOnto+, an extended version of LEOnto (Tissaoui et al. 2020; Tissaoui et al., SN Comput. Sci. J. 1: 336, 2020), to provide a new approach for automatic ontology enrichment from textual corpora. In LEOnto+, LDA is used to provide dimension reduction and to identify semantic relationships between topic-document and word-topic using probability distributions. Here, we report several experiments conducted using several evaluation techniques (criteria-based evaluation, gold standard evaluation, expert evaluation, task-based evaluation and corpus-based evaluation). We also compare the results of LEOnto+ with two existing methods using their respective datasets. The evaluation results show that LEOnto+ outperforms the aforementioned methods (particularly in terms of precision). We also evaluate our approach on two large corpora in order to demonstrate its scalability.

Journal Article
