Catalogue Search | MBRL

Statistical topic models for multi-label document classification

by Steyvers, Mark; Chambers, America; Rubin, Timothy N. in Algorithms, Amplification, Artificial Intelligence

2012

Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that probabilistic generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies.

Journal Article
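
The abstract describes models that associate individual word tokens with different labels. The following is a minimal sketch of that idea. It assumes a Labeled-LDA-style model in which every label has its own word distribution and each token may only be assigned to one of its own document's labels. It is a simplified collapsed Gibbs sampler for training on labeled documents, not the authors' implementation. The function name `labeled_lda_gibbs` and the hyperparameters `alpha`, `beta`, and `iters` are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, doc_labels, num_labels, vocab_size,
                      alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical collapsed Gibbs sampler for a Labeled-LDA-style model.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    doc_labels: doc_labels[d] lists the label ids allowed for document d.
    Returns per-token label assignments z and label-word counts n_kw.
    """
    rng = random.Random(seed)
    n_dk = [defaultdict(int) for _ in docs]               # tokens of label k in doc d
    n_kw = [[0] * vocab_size for _ in range(num_labels)]  # word w assigned to label k
    n_k = [0] * num_labels                                # total tokens per label
    z = []

    # Initialise each token with a random label taken from its own document.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.choice(doc_labels[d])
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            labels = doc_labels[d]
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional, restricted to the document's labels.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                    / (n_k[j] + vocab_size * beta)
                    for j in labels
                ]
                k = rng.choices(labels, weights=weights)[0]
                # Add the token back under its newly sampled label.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_kw


# Toy corpus: words 0-1 suggest label 0, words 2-3 suggest label 1.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
doc_labels = [[0], [1], [0, 1]]
z, n_kw = labeled_lda_gibbs(docs, doc_labels, num_labels=2, vocab_size=4, iters=50)
print(z[2])  # label sampled for each token of the two-label document
```

Restricting the sampling to `doc_labels[d]` is what ties each token to one of the document's labels. Predicting labels for new, unlabeled documents needs a separate inference step, which this sketch does not cover.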

Stability Trends in Mono-Metallic 3d Layered Double Hydroxides

by Doustkhah, Esmail; Assadi, Mohammad Hussein Naseef; Esmailpour, Ayoub in density functional theory, Geometry, green rust

2022

Layered double hydroxides (LDHs) constitute a unique group of 2D materials that can deliver exceptional catalytic, optical, and electronic performance. However, they usually suffer from low stability compared to their oxide counterparts. Using density functional calculations, we quantitatively demonstrate the crucial impact of the intercalants (i.e., water, lactate, and carbonate) on the stability of a series of common LDHs based on Mn, Fe, and Co. We found that intercalation with the singly charged lactate results in higher stability in all these LDH compounds, compared to neutral water and doubly charged carbonate. Furthermore, we show that the dispersion effect aids the stability of these LDH compounds. This investigation reveals that certain intercalants enhance LDH stability and alter the bandgap favourably.

Journal Article

Selection of the Optimal Number of Topics for LDA Topic Model—Taking Patent Policy Analysis as an Example

by Qi, Yong; Gan, Jingxian in Animals, Datasets, Fruits

2021

This study constructs a comprehensive index for judging the optimal number of topics in the LDA topic model. Based on the requirements for selecting the number of topics, the index combines four criteria: perplexity, isolation, stability, and coincidence. The method offers four advantages when selecting the optimal number of topics: (1) good predictive ability, (2) high isolation between topics, (3) no duplicate topics, and (4) repeatability. First, we use three general datasets to compare the proposed method with existing methods, and the results show that it selects the number of topics better. Then, we collected the patent policies of various provinces and cities in China (excluding Hong Kong, Macao, and Taiwan) as datasets. Using the proposed optimal topic number selection method, we can classify these patent policies well.

Journal Article
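
Perplexity is the predictive-ability component of the index described above. The following is a minimal sketch of its standard definition, $\exp\big(-\sum_d \sum_{w \in d} \log \sum_k \theta_{dk}\,\phi_{kw} \,/\, \sum_d N_d\big)$. It assumes the document-topic proportions θ and topic-word distributions φ have already been estimated. How the paper measures isolation, stability, and coincidence, and how it combines them with perplexity, is not shown here. The function name `perplexity` and the toy values are illustrative assumptions.

```python
import math


def perplexity(docs, theta, phi):
    """Illustrative perplexity of documents under a fitted topic model.

    docs:  list of documents, each a list of word ids.
    theta: theta[d][k] = P(topic k | document d).
    phi:   phi[k][w]   = P(word w | topic k).
    """
    log_likelihood = 0.0
    num_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # P(w | d) marginalises over the topics.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            num_tokens += 1
    # exp of the negative average per-token log-likelihood.
    return math.exp(-log_likelihood / num_tokens)


# Sanity check: topics that are uniform over 4 words give a perplexity of about 4.
uniform = [0.25] * 4
print(perplexity([[0, 1, 2, 3]], theta=[[0.5, 0.5]], phi=[uniform, uniform]))
```

Lower perplexity means the model predicts the text better. A topic-number search would typically evaluate it on held-out documents for each candidate number of topics.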

Linear discriminant analysis for the small sample size problem: an overview

by Paliwal, Kuldip K.; Sharma, Alok in Artificial Intelligence, Bioinformatics, Classification

2015

Dimensionality reduction is an important aspect of the pattern classification literature, and linear discriminant analysis (LDA) is one of the most widely studied dimensionality reduction techniques. Variants of LDA that address the small sample size (SSS) problem have been applied in many research areas, e.g. face recognition, bioinformatics, and text recognition. Improving the performance of these LDA variants has great potential in various fields of research. In this paper, we present an overview of these methods. We cover the type, characteristics, and taxonomy of the methods that can overcome the SSS problem. We also highlight some important datasets and software packages.

Journal Article
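
The SSS problem arises because the within-class scatter matrix S_w has rank at most n - c (n samples, c classes). It is therefore singular whenever the feature dimension exceeds n - c, and the classical Fisher criterion cannot invert it. A common remedy, regularized LDA, adds a small multiple of the identity before solving. The pure-Python sketch below computes a two-class direction w = (S_w + λI)^-1 (μ1 - μ0). It illustrates only this one variant, not the full range of methods the overview surveys. The function names and the value of λ (`lam`) are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def within_class_scatter(classes, dim):
    """S_w = sum over classes c of sum over x in c of (x - mu_c)(x - mu_c)^T."""
    s = [[0.0] * dim for _ in range(dim)]
    for vectors in classes:
        mu = mean(vectors)
        for x in vectors:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(dim):
                for j in range(dim):
                    s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regularized_fisher_direction(class0, class1, lam=0.01):
    """Hypothetical two-class regularized LDA: w = (S_w + lam*I)^-1 (mu1 - mu0)."""
    dim = len(class0[0])
    s_w = within_class_scatter([class0, class1], dim)
    for i in range(dim):
        s_w[i][i] += lam  # ridge term: invertible even when S_w itself is singular
    mu0, mu1 = mean(class0), mean(class1)
    return solve(s_w, [b - a for a, b in zip(mu0, mu1)])


# 4 samples in 3 dimensions: S_w has rank 1 here, so plain LDA cannot invert it.
class0 = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
class1 = [[0.0, 2.0, 0.0], [1.0, 2.0, 0.0]]
w = regularized_fisher_direction(class0, class1)
print(w)  # dominated by the second feature, which separates the two classes
```

Larger λ pulls the solution toward the plain difference of class means. Smaller λ trusts the (unreliable) scatter estimate more. In practice, λ would be tuned, for example by cross-validation.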

“What Can ChatGPT Do?” Analyzing Early Reactions to the Innovative AI Chatbot on Twitter

by Taecharungroj, Viriya in AI chatbot, Algorithms, artificial intelligence

2023

In this study, the author collected tweets about ChatGPT, an innovative AI chatbot, in the first month after its launch. A total of 233,914 English tweets were analyzed using the latent Dirichlet allocation (LDA) topic modeling algorithm to answer the question “what can ChatGPT do?”. The results revealed three general topics: news, technology, and reactions. The author also identified five functional domains: creative writing, essay writing, prompt writing, code writing, and answering questions. The analysis also found that ChatGPT has the potential to impact technologies and humans in both positive and negative ways. In conclusion, the author outlines four key issues that need to be addressed as a result of this AI advancement: the evolution of jobs, a new technological landscape, the quest for artificial general intelligence, and the progress-ethics conundrum.

Journal Article

Investigating Feature Selection Techniques to Enhance the Performance of EEG-Based Motor Imagery Tasks Classification

by Md. Humaun Kabir; Shabbir Mahmood; Abu Saleh Musa Miah in Algorithms, Artificial intelligence, automatic feature selection

2023

Analyzing electroencephalography (EEG) signals with machine learning approaches has become an attractive research domain for linking the brain to the outside world, establishing communication through a Brain-Computer Interface (BCI). Many researchers have been working on developing successful motor imagery (MI)-based BCI systems. However, the performance of these systems is still limited by irrelevant features and high computational complexity, so selecting discriminative and relevant features is crucial. In our proposed work, different feature selection algorithms have been studied to reduce the dimension of the multiband feature space and improve MI task classification performance. In the procedure, we first decomposed the MI-based EEG signal into four narrowband signals. Then a common spatial pattern (CSP) approach was employed for each narrowband to extract and combine effective features, producing a high-dimensional feature vector. Three feature selection approaches, namely correlation-based feature selection (CFS), minimum redundancy and maximum relevance (mRMR), and multi-subspace randomization and collaboration-based unsupervised feature selection (SRCFS), were used in this study to select the relevant and effective features for improving classification accuracy. Among them, SRCFS demonstrated outstanding performance for MI classification compared to the other schemes. SRCFS uses multiple k-nearest neighbour graphs to learn feature weights based on the Laplacian score and then discards irrelevant features according to their weights, reducing the feature dimension. Finally, the selected features are fed into support vector machines (SVM), linear discriminant analysis (LDA), and a multi-layer perceptron (MLP) for classification.
The proposed model is evaluated on two publicly available benchmark datasets that are mainly used to recognize MI tasks: BCI Competition III dataset IVA and dataset IIIB. The LDA classifier with the SRCFS feature selection algorithm exhibits the best performance. These results indicate the superiority of our proposed approach over other state-of-the-art BCI-based MI task classification systems.

Journal Article
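
SRCFS, as described above, learns feature weights from the Laplacian score over multiple k-nearest-neighbour graphs. The following is a minimal sketch of the underlying score only. It computes the classic single-graph Laplacian score with 0/1 edge weights. S is the symmetric kNN adjacency matrix, D is its degree matrix, and L = D - S. For each feature f, after removing its degree-weighted mean, the score is f^T L f / f^T D f. Lower scores mark features that better preserve local neighbourhood structure. The sketch omits SRCFS's random subspaces and collaboration step. The names `knn_graph` and `laplacian_scores` and the value of `k` are hypothetical.

```python
import math


def knn_graph(samples, k):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix (Euclidean distance)."""
    n = len(samples)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(samples[i], samples[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            s[i][j] = s[j][i] = 1.0  # connect i and j if either is a neighbour of the other
    return s


def laplacian_scores(samples, k=3):
    """Hypothetical helper: Laplacian score of each feature (lower is better)."""
    s = knn_graph(samples, k)
    n = len(samples)
    degree = [sum(row) for row in s]
    total = sum(degree)
    scores = []
    for r in range(len(samples[0])):
        f = [x[r] for x in samples]
        # Remove the degree-weighted mean so constant features are not favoured.
        centre = sum(fi * di for fi, di in zip(f, degree)) / total
        f = [fi - centre for fi in f]
        # f^T L f = 1/2 * sum_ij S_ij (f_i - f_j)^2
        num = 0.5 * sum(
            s[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n)
        )
        den = sum(di * fi * fi for di, fi in zip(degree, f))  # f^T D f
        scores.append(num / den if den > 0 else float("inf"))
    return scores


# Feature 0 follows two clear clusters; feature 1 varies arbitrarily within them.
samples = [[0.0, 5.0], [0.1, 1.0], [0.2, 3.0], [10.0, 2.0], [10.1, 4.0], [10.2, 0.0]]
print(laplacian_scores(samples, k=2))  # feature 0 scores far lower than feature 1
```

A feature-selection step would then keep the features with the lowest scores and discard the rest before classification.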

LDA-CBOW-Based Mining Model for Risky Driving Behavior in Traffic Accidents

by Shi, Tuo; Wang, Na; Zhang, Lei in CBOW, traffic data mining, word2vec

2021

Traffic accident data from traffic management departments are recorded as unstructured text that contains many descriptions of risky driving behavior. However, these texts are short and rich in specialized vocabulary, so many text mining techniques cannot analyze them effectively. This paper proposes LDA-CBOW, an improved LDA algorithm based on CBOW, for studying traffic accident texts that describe illegal behaviors. The model better extracts the topics of traffic accident data and filters the keywords under each topic, providing a better way to study the dependence between traffic data and illegal behaviors. Experiments show that, compared with other models, this model better extracts topics related to traffic accidents, with higher efficiency and better robustness.

Journal Article

HGATT_(L)R: transforming review text classification with hypergraphs attention layer and logistic regression

by Md. Mehedi Hassan; Dongwann Kang; Elizabeth Jomy in Amazon review dataset, HyperGAT, Hypergraph attention layer

2024

Text classification plays a major role in research areas such as sentiment analysis, opinion mining, and customer feedback analysis. Hypergraph-based text classification is effective at capturing the intricate relationships between words and phrases in documents. The method involves text preprocessing, keyword extraction, feature selection, text classification, and performance evaluation. Here, we propose a Hypergraph Attention Layer with Logistic Regression (HGATT_(L)R) for text classification on the Amazon review data set. The essential keywords are extracted using Latent Dirichlet Allocation (LDA). To build the hypergraph attention layer, features are selected using node-level and edge-level attention. The resulting features are passed as input to logistic regression for text classification. Performance metrics are assessed through a comparative analysis of different text classifiers on the Amazon data set. Text classification with the hypergraph attention network achieves 88% accuracy, which is better than other state-of-the-art algorithms. The proposed model is scalable and can easily be enhanced with more training data. The solution highlights the utility of hypergraph approaches for text classification and their applicability to real-world datasets with complicated interactions between text parts. This type of analysis can help businesses improve the quality of their products.

Journal Article

A systematic review of the use of topic models for short text social media analysis

in Applied research, Automation, Digital media

2023

Recently, research on short text topic models has addressed the challenges of social media datasets. These models are typically evaluated using automated measures. However, recent work suggests that these evaluation measures do not indicate whether the topics produced can yield meaningful insights for those examining social media data. Efforts to address this issue, including gauging the alignment between automated and human evaluation tasks, are hampered by a lack of knowledge about how researchers use topic models. Further problems could arise if researchers do not construct topic models optimally or use them in ways that exceed the models’ limitations. These scenarios threaten the validity of topic model development and of the insights produced by researchers employing topic modelling as a methodology. Yet there is currently little information about how and why topic models are used in applied research. We therefore performed a systematic literature review of 189 articles that used topic modelling for social media analysis, to understand how and why topic models are used in this setting. Our results suggest that the development of topic models is not aligned with the needs of those who use them for social media analysis. We found that researchers use topic models sub-optimally and that there is a lack of methodological support for researchers to build and interpret topics. We offer a set of recommendations for topic model researchers to address these problems and bridge the gap between development and applied research on short text topic models.

Journal Article

Selecting critical features for data classification based on machine learning methods

by Chen, Rung-Ching; Caraka, Rezzy Eko; Huang, Su-Wen in Accuracy, Algorithms, Big Data

2020

Feature selection becomes important, especially for data sets with many variables and features. It eliminates unimportant variables and improves both the accuracy and the performance of classification. Random Forest has emerged as a useful algorithm that can handle feature selection even with a large number of variables. In this paper, we use three popular datasets with a large number of variables (Bank Marketing, Car Evaluation Database, Human Activity Recognition Using Smartphones) to conduct the experiments. Feature selection is essential for four main reasons: it simplifies the model by reducing the number of parameters, decreases training time, reduces overfitting by enhancing generalization, and avoids the curse of dimensionality. We also evaluate and compare the accuracy and performance of classification models such as Random Forest (RF), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Linear Discriminant Analysis (LDA). The model with the highest accuracy is taken as the best classifier. In practice, this paper uses Random Forest to select the important features for classification. Our experiments compare the RF algorithm from different perspectives. Furthermore, we compare the results on each dataset with and without selecting essential features using the RF-based methods varImp(), Boruta, and Recursive Feature Elimination (RFE), to obtain the best accuracy and kappa. Experimental results demonstrate that Random Forest achieves better performance in all experiment groups.

Journal Article
