Catalogue Search | MBRL

Statistical methods for recommender systems

by Agarwal, Deepak K., 1973- author , Chen, Bee-Chung, author in Recommender systems (Information filtering) -- Statistical methods , Expert systems (Computer science) -- Statistical methods

Book

Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study

by Triguero, Isaac , García, Salvador , Herrera, Francisco in Algorithms , Analysis , Artificial intelligence

2015

Semi-supervised classification methods are suitable tools to tackle training sets with large amounts of unlabeled data and a small quantity of labeled data. This problem has been addressed by several approaches with different assumptions about the characteristics of the input data. Among them, self-labeled techniques follow an iterative procedure, aiming to obtain an enlarged labeled data set, in which they accept that their own predictions tend to be correct. In this paper, we provide a survey of self-labeled methods for semi-supervised classification. From a theoretical point of view, we propose a taxonomy based on the main characteristics presented in them. Empirically, we conduct an exhaustive study that involves a large number of data sets, with different ratios of labeled data, aiming to measure their performance in terms of transductive and inductive classification capabilities. The results are contrasted with nonparametric statistical tests. Note is then taken of which self-labeled models are the best-performing ones. Moreover, a semi-supervised learning module has been developed for the Knowledge Extraction based on Evolutionary Learning software, integrating analyzed methods and data sets.

Journal Article

Deterministic solution of algebraic equations in sentiment analysis

in Data mining , Machine learning , Mathematical analysis

2023

Text mining methods usually use statistical information to solve text and language-independent procedures. Text mining methods such as polarity detection based on stochastic patterns and rules need many samples to train. On the other hand, deterministic and non-probabilistic methods are easy to solve and faster than other methods but are not efficient on NLP data. In this article, a fast and efficient deterministic method for solving the problems is proposed. In the proposed method, we first transform text and labels into a set of equations. In the second step, Tikhonov regularization, a mathematical solution for ill-posed equations, is used as a deterministic and non-probabilistic approach that includes additional assumptions, such as smoothness of the solution, to assign a weight that can reflect the semantic information of each sentiment word. We confirmed the efficiency of the proposed method on three different cases: the SemEval-2013 competition, the ESWC database and the Taboada database. We observed an improvement of our method on negative polarity due to our proposed mathematical step. Moreover, we demonstrated the effectiveness of our proposed method over the most common and traditional machine learning, stochastic and fuzzy methods.

Journal Article

Early-production stage prediction of movies success using K-fold hybrid deep ensemble learning model

by Long, Hoang Viet , Kumar, Raghvendra , Shafi, Pathan Mohd in Accuracy , Decision analysis , Deep learning

2023

The Indian movie industry is the largest movie industry based on the number of movies produced per year. It is also the most diverse movie industry. A recent study found that only a few of the movies achieved success. Revenue uncertainties have created immense pressure on the motion picture industry. Researchers and film producers continually feel a necessity to have some expert systems that predict the movie’s success probability preceding its production with reasonable accuracy. The diversity of the Indian movie industry makes the problem more difficult. Only a few researchers have worked on Indian films, and most of them targeted pre-release forecasting or achieved low prediction accuracy. This study focused on Indian movies and concentrated on the upcoming film’s success as soon as a quotient (director, cast) signed an agreement. This proposed forecasting has been considered the earliest forecasting. Our study retrieved and used the last 30 years of Indian movie information covering all of India’s regional movies. We judiciously chose some of the movie’s intrinsic features and introduced a set of novel derived features to increase the forecasting accuracy. We proposed a K-fold Hybrid Deep Ensemble learning Model (KHDEM), which includes Deep Learning models (DLM) and ensemble learning models. Finally, we made the prediction using a Logistic Regression (LR) classifier. We implemented a binary classification model and achieved 96% accuracy, which outperforms all the benchmark models. The introduction of our derived features improved the accuracy by 17.62%. This study highlights the potential of predictive and prescriptive data analytics in information systems to support industry decisions.

Journal Article

Semantic relational machine learning model for sentiment analysis using cascade feature selection and heterogeneous classifier ensemble

by Hemanth, D. Jude , Yenkikar, Anuradha , Babu, C. Narendra in Accuracy , Algorithms , Artificial Intelligence

2022

The exponential rise in social media via microblogging sites like Twitter has sparked curiosity in sentiment analysis that exploits user feedback towards a targeted product or service. Considering its significance in business intelligence and decision-making, numerous efforts have been made in this area. However, lack of dictionaries, unannotated data, large-scale unstructured data, and low accuracies have plagued these approaches. Also, sentiment classification through classifier ensemble has been underexplored in the literature. In this article, we propose a Semantic Relational Machine Learning (SRML) model that automatically classifies the sentiment of tweets by using a classifier ensemble and optimal features. The model employs the Cascaded Feature Selection (CFS) strategy, a novel statistical assessment approach based on the Wilcoxon rank sum test, a univariate logistic regression assisted significant predictor test and a cross-correlation test. It further uses the efficacy of word2vec-based continuous bag-of-words and n-gram feature extraction in conjunction with SentiWordNet for finding optimal features for classification. We experiment on six public Twitter sentiment datasets, the STS-Gold dataset, the Obama-McCain Debate (OMD) dataset, the healthcare reform (HCR) dataset and the SemEval2017 Task 4A, 4B and 4C, on a heterogeneous classifier ensemble comprising fourteen individual classifiers from different paradigms. Results from the experimental study indicate that CFS helps attain a higher classification accuracy with up to 50% fewer features compared to the count vectorizer approach. In intra-model performance assessment, the Artificial Neural Network-Gradient Descent (ANN-GD) classifier performs comparatively better than other individual classifiers, but the Best Trained Ensemble (BTE) strategy outperforms on all metrics. In inter-model performance assessment with existing state-of-the-art systems, the proposed model achieved higher accuracy and outperforms more accomplished models employing quantum-inspired sentiment representation (QSR), transformer-based methods like BERT, BERTweet, RoBERTa and ensemble techniques. The research thus provides critical insights into implementing a similar strategy to build more generic and robust expert systems for sentiment analysis that can be leveraged across industries.

Journal Article

Financial credit risk assessment: a recent review

in Accounting , Algorithms , Banking industry

2016

The assessment of financial credit risk is an important and challenging research topic in the area of accounting and finance. Numerous efforts have been devoted to this field since the first attempt last century. Today the study of financial credit risk assessment attracts increasing attention in the face of one of the most severe financial crises ever observed in the world. The accurate assessment of financial credit risk and prediction of business failure play an essential role in both the economy and society. For this reason, more and more methods and algorithms have been proposed in the past years. It is therefore of crucial importance to review the current methods applied to financial credit risk assessment. In this paper, we summarize the traditional statistical models and state-of-the-art intelligent methods for financial distress forecasting, with the emphasis on the most recent achievements as the promising trend in this area.

Journal Article

Uncertainty measurement for a gene space based on class-consistent technology: an application in gene selection

by Li, Zhaowen , Song, Yan , Wang, Pei in Algorithms , Artificial intelligence , Clustering

2023

With the development of data mining, artificial intelligence, neural networks, expert systems and machine learning, the information system (i-system) has become more and more important. If the objects, attributes and information values in an i-system are replaced by cells, genes and gene expression values, respectively, then the i-system is said to be a gene space. Because gene expression data is characterized by small samples, high dimension and noise, there is considerable uncertainty in a gene space. Traditional machine learning and statistical methods are often ineffective on a gene space. Granular computing (GrC) can effectively deal with various uncertainties. This paper studies the uncertainty measurement of a gene space based on the class-consistent technology and discusses its application in gene selection from the perspective of GrC. A class-consistent relation between cells in a gene space is first established by the gene expression values of cells on the basis of class-consistent technology. Then, the information granules (i-granules) are obtained from a gene space by using the class-consistent relation. Next, two metrics (information granularity and information entropy) to measure the uncertainty of a gene space are defined and their properties are also investigated. The results of numerical experiments and statistical tests verify their effectiveness. Furthermore, as their application to gene spaces, two gene selection algorithms are proposed. Finally, the clustering experiments and statistical tests on 16 gene spaces show that the designed gene selection algorithms outperform some state-of-the-art feature selection algorithms in terms of three clustering performance indicators.

Journal Article

A method for explaining Bayesian networks for legal evidence with scenarios

by Vlek, Charlotte S. , Prakken, Henry , Verheij, Bart in Artificial Intelligence , Bayesian analysis , Case studies

2016

In a criminal trial, a judge or jury needs to reason about what happened based on the available evidence, often including statistical evidence. While a probabilistic approach is suitable for analysing the statistical evidence, a judge or jury may be more inclined to use a narrative or argumentative approach when considering the case as a whole. In this paper we propose a combination of two approaches, combining Bayesian networks with scenarios. Whereas a Bayesian network is a popular tool for analysing parts of a case, constructing and understanding a network for an entire case is not straightforward. We propose an explanation method for understanding a Bayesian network in terms of scenarios. This method builds on a previously proposed construction method, which we slightly adapt with the use of scenario schemes for the purpose of explaining. The resulting structure is explained in terms of scenarios, scenario quality and evidential support. A probabilistic interpretation of scenario quality is provided using the concept of scenario schemes. Finally, the method is evaluated by means of a case study.

Journal Article

Application of Natural Language Processing and Genetic Algorithm to Fine-Tune Hyperparameters of Classifiers for Economic Activities Analysis

by Nelyub, Vladimir , Masich, Igor , Malashin, Ivan in Accuracy , Algorithms , Artificial intelligence

2024

This study proposes a method for classifying economic activity descriptors to match Nomenclature of Economic Activities (NACE) codes, employing a blend of machine learning techniques and expert evaluation. By leveraging natural language processing (NLP) methods to vectorize activity descriptors and utilizing genetic algorithm (GA) optimization to fine-tune hyperparameters in multi-class classifiers like Naive Bayes, Decision Trees, Random Forests, and Multilayer Perceptrons, our aim is to boost the accuracy and reliability of an economic classification system. This system faces challenges due to the absence of precise target labels in the dataset. Hence, it is essential to initially check the accuracy of utilized methods based on expert evaluations using a small dataset before generalizing to a larger one.

Journal Article

Feature Selection Method Based on Class Discriminative Degree for Intelligent Medical Diagnosis

by Liang, Zhiyao , Wang, Guoyan , Fang, Shengqun in Accuracy , Chi-square test , Classification

2018

By using efficient and timely medical diagnostic decision making, clinicians can positively impact the quality and cost of medical care. However, the high similarity of clinical manifestations between diseases and the limitation of clinicians’ knowledge both bring much difficulty to decision making in diagnosis. Therefore, building a decision support system that can assist medical staff in diagnosing and treating diseases has lately received growing attention in the medical domain. In this paper, we employ a multi-label classification framework to classify Chinese electronic medical records to establish the corresponding relation between the medical records and disease categories, and compare this method with the traditional medical expert system to verify the performance. To select the best subset of patient features, we propose a feature selection method based on the composition and distribution of symptoms in electronic medical records and compare it with traditional feature selection methods such as the chi-square test. We evaluate the feature selection methods and diagnostic models from two aspects, false negative rate (FNR) and accuracy. Extensive experiments have been conducted on a real-world Chinese electronic medical record database. The evaluation results demonstrate that our proposed feature selection method can improve the accuracy and reduce the FNR compared to the traditional feature selection methods, and that the multi-label classification framework has better accuracy and lower FNR than the traditional expert system.

Journal Article
