Catalogue Search | MBRL
297,045 result(s) for "Topic models"
Validating psychological constructs : historical, philosophical, and practical dimensions
"This book critically examines the historical and philosophical foundations of construct validity theory (CVT), and how these have and continue to inform and constrain the conceptualization of validity and its application in research. CVT has had an immense impact on how researchers in the behavioural sciences conceptualize and approach their subject matter. Yet, there is equivocation regarding the foundations of the CVT framework as well as ambiguities concerning the nature of the 'constructs' that are its raison d'etre. The book is organized in terms of three major parts that speak, respectively, to the historical, philosophical, and pragmatic dimensions of CVT. The primary objective is to provide researchers and students with a critical lens through which a deeper understanding may be gained of both the utility and limitations of CVT and the validation practices to which it has given rise."-- Back cover.
Investigating topic modeling techniques through evaluation of topics discovered in short texts data across diverse domains
2024
Online channels have affected many facets of an individual's identity, commerce, social policy, and culture, among others. This makes it critical to discover the topics on which these short texts focus and to examine their qualities. Another key issue is the evaluation of newly discovered topics in terms of topic quality, which includes topic separation and coherence. Topic modeling has been shown to be an outstanding aid in the linguistic interpretation of very short texts. Based on the underlying strategy, topic models are divided into two categories: probabilistic methods and non-probabilistic methods. In this research, short texts are analyzed using topic models, including latent Dirichlet allocation (LDA) for probabilistic topic modeling and non-negative matrix factorization (NMF) for non-probabilistic topic modeling. A novel approach to topic evaluation, using clustering methods and silhouette analysis on both models, is applied to investigate performance in terms of quality. The experimental results indicate that the proposed evaluation method outperforms on both LDA and NMF.
Journal Article
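The evaluation idea described in the abstract above (grouping documents by topic and scoring the separation with silhouette analysis) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the document-topic weights are made-up values, each document is assigned to its highest-weight topic, and Euclidean distance is an assumed choice.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette is defined as 0
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical document-topic weights (rows: documents, columns: topics),
# such as the normalized output of an LDA or NMF model.
doc_topic = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
]
# Assign each document to its dominant topic.
labels = [max(range(len(row)), key=row.__getitem__) for row in doc_topic]
score = silhouette_score(doc_topic, labels)
```

A score close to 1 suggests well-separated topic clusters, while a score near 0 or below suggests overlapping ones; comparing the score across models is one way to compare topic separation.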
Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data
2023
Topic models are a useful and popular method to find latent topics of documents. However, the short and sparse texts in social media micro-blogs such as Twitter are challenging for the most commonly used Latent Dirichlet Allocation (LDA) topic model. We compare the performance of the standard LDA topic model with the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data. To compare the performance of the three models, we propose the simulation of pseudo-documents as a novel evaluation method. In a case study with short and sparse text, the models are evaluated on tweets filtered by keywords relating to the Covid-19 pandemic. We find that standard coherence scores that are often used for the evaluation of topic models perform poorly as an evaluation metric. The results of our simulation-based approach suggest that the GSDMM and GPM topic models may generate better topics than the standard LDA model.
Journal Article
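The abstract above refers to standard coherence scores without naming one. As an illustration only, the sketch below computes the UMass coherence, one commonly used score based on document co-occurrence counts; the choice of UMass and the toy tokenized tweets are assumptions, not details taken from the study.

```python
import math

def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: the topic's words, ordered from most to least probable.
    documents: an iterable of tokenized documents (lists of words).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    total = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                raise ValueError(f"word {top_words[l]!r} does not occur in any document")
            co_occurrences = doc_freq(top_words[m], top_words[l])
            total += math.log((co_occurrences + 1) / denom)
    return total

# Hypothetical, already tokenized short texts.
docs = [
    ["covid", "vaccine", "dose"],
    ["covid", "vaccine"],
    ["covid", "lockdown"],
    ["football", "match"],
]
coherent = umass_coherence(["covid", "vaccine"], docs)
mixed = umass_coherence(["covid", "football"], docs)
```

Words that rarely share a document pull the score down, so `mixed` comes out lower than `coherent`. With very short texts, co-occurrence counts are small and noisy, which is one plausible reason such scores can be unreliable on tweets.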
Twenty Years of Left-Behind Children Education in Rural China: Based on Structural Topic Model
by WANG Xing, LI Yeye, ZHOU Tianyu, LIU Feng
in left-behind children, topic model, structural topic model, information literacy
2023
[Purpose/Significance] The introduction of national poverty alleviation policies and rural revitalization strategies has thrust the issue of education for left-behind children into the spotlight of scholarly attention. Education, far beyond serving as a mere instrument for personal growth and human capital accumulation for left-behind children, emerges as a pivotal measure in consolidating rural poverty alleviation endeavors and breaking the transmission of intergenerational poverty in China. It stands as a vital force propelling the future of rural revitalization. Yet, the existing literature on the education of left-behind children remains sporadic and dispersed. A more profound organizational effort, integrating, synthesizing, and evaluating this scattered literature, is imperative to establish a foundational framework for future research, fostering more cohesive and focused research endeavors. Presently, literature review studies primarily fall into three categories: qualitative review methods, meta-analysis, and bibliometric analysis methods employing tools like Citespace. This study sets out to achieve a systematic and comprehensive understanding of education-related issues for rural left-behind children through text mining methods grounded in topic models.
[Method/Process] The advent of artificial intelligence and machine learning technologies has empowered the processing and analysis of vast amounts of textual data. Previous research, employing latent Dirichlet allocation (LDA) topic models, successfully mined texts related to teacher team construction reform policies, internationalization in higher education literature, news reports, and online comments. In this study, a corpus was meticulously constructed using abstract texts extracted from 2037 journal articles published between 2002 and 2023. The structural topic model (STM) was chosen for topic modeling, overcoming the limitations associated with LDA, with a specific emphasis on exploring the diversity and dynamism of topics within the existing literature.
[Results/Conclusions] The culmination of this research effort identified eight distinct research themes: psychological well-being, factors leading to left-behind children, macro-level coping strategies, types of guardianship, review studies, family education, media literacy, and micro-level coping strategies. By synergizing document metadata information, the study systematically unraveled the evolving trends of these topics over time, providing crucial insights into potential shifts in the focus of left-behind children's education research. It is essential to note that this study, because it collected abstracts instead of full texts, may not capture the entirety of information contained in complete research articles. Future research endeavors should explore left-behind children's education more comprehensively, leveraging full-text mining techniques for a more nuanced understanding of this critical subject.
Journal Article
Web content topic modeling using LDA and HTML tags
by Altarturi, Hamza H.M.; Saadoon, Muntadher; Anuar, Nor Badrul
in Analysis, Computational linguistics, Data mining
2023
An immense volume of digital documents exists online and offline with content that can offer useful information and insights. Utilizing topic modeling enhances the analysis and understanding of digital documents. Topic modeling discovers latent semantic structures or topics within a set of digital textual documents. The Internet of Things, Blockchain, recommender systems, and search engine optimization applications use topic modeling to handle data mining tasks, such as classification and clustering. The usefulness of topic models depends on the quality of the resulting term patterns and topics. Topic coherence is the standard metric to measure the quality of topic models. Previous studies built topic models that generally work on conventional documents; these models underperform when applied to web content data because of differences in structure between conventional and HTML documents. Neglecting the unique structure of web content leads to missing otherwise coherent topics and, therefore, low topic quality. This study aims to propose an innovative topic model to learn coherent topics in web content data. We present the HTML Topic Model (HTM), a web content topic model that takes into consideration the HTML tags to understand the structure of web pages. We conducted two series of experiments to demonstrate the limitations of the existing topic models and examine the topic coherence of the HTM against the widely used Latent Dirichlet Allocation (LDA) model and its variants, namely the Correlated Topic Model, the Dirichlet Multinomial Regression, the Hierarchical Dirichlet Process, the Hierarchical Latent Dirichlet Allocation, the pseudo-document based Topic Model, and the Supervised Latent Dirichlet Allocation models. The first experiment demonstrates the limitations of the existing topic models when applied to web content data and, therefore, the essential need for a web content topic model. When applied to web data, the overall performance dropped by a factor of five on average and, in some cases, by a factor of up to approximately 20 compared with conventional data. The second experiment then evaluates the effectiveness of the HTM model in discovering topics and term patterns of web content data. The HTM model achieved an overall 35% improvement in topic coherence compared to the LDA.
Journal Article
Extracting information and inferences from a large text corpus
by Acharjya, Debi Prasanna; Chauhan, Ritu; Avasthi, Sandhya
in Accuracy, Algorithms, Artificial Intelligence
2023
The usage of various software applications has grown tremendously due to the onset of Industry 4.0, giving rise to the accumulation of all forms of data. The scientific, biological, and social media text collections demand efficient machine learning methods for data interpretability, which organizations need in decision-making of all sorts. Topic models can be applied in text mining of biomedical articles, scientific articles, Twitter data, and blog posts. This paper analyzes and compares the performance of the Latent Dirichlet Allocation (LDA), Dynamic Topic Model (DTM), and Embedded Topic Model (ETM) techniques. An incremental topic model with word embedding (ITMWE) is proposed that processes large text data in an incremental environment and extracts latent topics that best describe the document collections. Experiments in both offline and online settings on large real-world document collections such as CORD-19, NIPS papers, and Tweet datasets show that, while LDA and DTM are good models for discovering word-level topics, ITMWE discovers better document-level topic groups more efficiently in a dynamic environment, which is crucial in text mining applications.
Journal Article
Comparison of Methods for Estimating Temporal Topic Models From Primary Care Clinical Text Data: Retrospective Closed Cohort Study
by Austin, Peter C; Jaakkimainen, Liisa; Stukel, Therese A
in Cohort analysis, Data mining, Dictionaries
2022
Health care organizations are collecting increasing volumes of clinical text data. Topic models are a class of unsupervised machine learning algorithms for discovering latent thematic patterns in these large unstructured document collections.
We aimed to comparatively evaluate several methods for estimating temporal topic models using clinical notes obtained from primary care electronic medical records from Ontario, Canada.
We used a retrospective closed cohort design. The study spanned from January 01, 2011, through December 31, 2015, discretized into 20 quarterly periods. Patients were included in the study if they generated at least 1 primary care clinical note in each of the 20 quarterly periods. These patients represented a unique cohort of individuals engaging in high-frequency use of the primary care system. The following temporal topic modeling algorithms were fitted to the clinical note corpus: nonnegative matrix factorization, latent Dirichlet allocation, the structural topic model, and the BERTopic model.
Temporal topic models consistently identified latent topical patterns in the clinical note corpus. The learned topical bases identified meaningful activities conducted by the primary health care system. Latent topics displaying near-constant temporal dynamics were consistently estimated across models (eg, pain, hypertension, diabetes, sleep, mood, anxiety, and depression). Several topics displayed predictable seasonal patterns over the study period (eg, respiratory disease and influenza immunization programs).
Nonnegative matrix factorization, latent Dirichlet allocation, structural topic model, and BERTopic are based on different underlying statistical frameworks (eg, linear algebra and optimization, Bayesian graphical models, and neural embeddings), require tuning unique hyperparameters (optimizers, priors, etc), and have distinct computational requirements (data structures, computational hardware, etc). Despite the heterogeneity in statistical methodology, the learned latent topical summarizations and their temporal evolution over the study period were consistently estimated. Temporal topic models represent an interesting class of models for characterizing and monitoring the primary health care system.
Journal Article
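The cohort rule described in the abstract above (a study window from January 1, 2011, through December 31, 2015, split into 20 quarterly periods, with patients included only if they have at least one note in every period) can be sketched as below. The function names and the sample dates are hypothetical and only illustrate the arithmetic; this is not the study's code.

```python
from datetime import date

STUDY_START = date(2011, 1, 1)
STUDY_END = date(2015, 12, 31)
N_QUARTERS = 20  # 5 years x 4 quarters

def quarter_index(d):
    """Return the 1-based quarterly period (1..20) that a note date falls in."""
    if not STUDY_START <= d <= STUDY_END:
        raise ValueError(f"{d} is outside the study window")
    return (d.year - STUDY_START.year) * 4 + (d.month - 1) // 3 + 1

def in_cohort(note_dates):
    """True if there is at least one note in each of the 20 quarterly periods."""
    seen = {quarter_index(d) for d in note_dates if STUDY_START <= d <= STUDY_END}
    return len(seen) == N_QUARTERS

# Hypothetical patient with one note in the middle month of every quarter.
regular_visitor = [date(year, month, 15)
                   for year in range(2011, 2016)
                   for month in (2, 5, 8, 11)]
```

Dropping any single quarter's notes from `regular_visitor` would exclude the patient, which reflects why the resulting cohort consists of high-frequency users of primary care.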
Recurrent Embedded Topic Model
2023
In this paper we propose the Recurrent Embedded Topic Model (RETM), a modification of the Embedded Topic Model (ETM). The RETM reuses the Continuous Bag of Words (CBOW) that the ETM had implemented and applies it to a recurrent neural network (LSTM), using the order of the words of the text in the CBOW space as the recurrence of the LSTM while calculating the topic–document distribution of the model. This approach is novel because the ETM and Latent Dirichlet Allocation (LDA) do not use the order of the words while calculating the topic proportions for each text, which leads to worse predictions. The RETM is a topic-modelling technique that vastly improves the quality of the topics calculated for the datasets used in this paper (by more than 15 times on training data and by between 10% and 90% on test data, based on perplexity values). The model is explained in detail throughout the paper, which presents results for different use cases showing how it performs against ETM and LDA. The RETM can be used with better accuracy for any topic model-related problem.
Journal Article
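The abstract above reports improvements in terms of perplexity. As a reminder of what that metric measures, the sketch below computes perplexity as the exponential of the negative mean log-probability per token; the per-token probabilities are made-up values, not results from the paper.

```python
import math

def perplexity(token_probs):
    """Perplexity: exp of the negative mean log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)

# Hypothetical probabilities that two models assign to held-out tokens.
uniform = perplexity([0.25] * 8)             # about 4: like guessing among 4 words
sharper = perplexity([0.5, 0.5, 0.25, 0.5])  # higher probabilities, lower perplexity
```

Lower perplexity means the model assigns higher probability to held-out text, so a lower value is better.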
Hybrid Topic Cluster Models for Social Healthcare Data
2019
Social media, and in particular microblogs, are becoming an important data source for disease surveillance, behavioral medicine, and public healthcare. Topic models are widely used in microblog analytics for analyzing and integrating the textual data within a corpus. This paper uses health tweets as microblogs and attempts health data clustering by topic models. Traditional topic models, such as Latent Semantic Indexing (LSI), Probabilistic Latent Schematic Indexing (PLSI), Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and integer Joint NMF (intJNMF), are used for health data clustering; however, assessing the number of health topic clusters with them is intractable. Proper visualizations are essential for extracting information from the data and identifying trends in it, as the data may include thousands of documents and millions of words. For visualization of topic clouds and health tendency in the document collection, we present hybrid topic models that integrate traditional topic models with VAT. The proposed hybrid topic models, namely Visual Non-negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual Probabilistic Latent Schematic Indexing (VPLSI), and Visual Latent Schematic Indexing (VLSI), are promising methods for assessing the health tendency and visualizing topic clusters from benchmarked and Twitter datasets. The experimental section evaluates and compares the hybrid topic models to demonstrate their efficiency with different distance measures, including Euclidean distance, cosine distance, and multi-viewpoint cosine similarity.
Journal Article
Transition of Socio-Demographic Characteristics in Urban Areas by Applying a Topic Model to Small Area Units
2022
In Japan's depopulating society, the hollowing out and suburbanization of urban areas have become very serious problems, but an appropriate analytical tool for land use transition has not yet been proposed. This study analyzes the transitions in socio-demographic characteristics of small area units in the Fukuoka and Kitakyushu metropolitan areas by applying a topic model to geographical data. Plotting the topic shares on a map clarified the spatial distribution of topics, and the transitions between two cross-sections were analyzed along with other geographical characteristics. Our empirical study showed that the topic model could clearly and quantitatively describe the transitions between two cross-sections of these urban areas. The topic model revealed that the urban center of the Fukuoka metropolitan area was expanding, while the urban center of the Kitakyushu metropolitan area was shrinking. In suburban areas, both metropolitan areas had increasing low-density residential and commercial land use. In the Kitakyushu metropolitan area, this transition could seriously threaten the sustainability of land use, since the total population had significantly decreased.
Journal Article