Catalogue Search | MBRL
Explore the vast range of titles available.
48,097 result(s) for "Discriminant Analysis"
High-Dimensional Asymptotics of Prediction
2018
We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where p,n → ∞ and p/n → γ > 0, and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength and the aspect ratio γ. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover an exact inverse relation between the limiting predictive risk and the limiting estimation risk in high-dimensional linear models. The analysis builds on recent advances in random matrix theory.
Journal Article
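The abstract above characterizes limiting predictive risk in the proportional regime p/n → γ. A small simulation makes that regime concrete: the sketch below (signal strength, noise level, and the γ grid are illustrative assumptions, not values from the paper) fits ridge regression under a dense random-effects prior and reports empirical predictive risk for a few aspect ratios.

```python
# Hedged simulation sketch: empirical predictive risk of ridge regression in a
# dense random-effects model with p/n near a fixed aspect ratio gamma.
# Signal strength, noise level, and the gamma grid are illustrative choices.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, n_test = 400, 2000
alpha2, noise2 = 1.0, 1.0                      # assumed signal strength and noise variance

for gamma in (0.5, 1.0, 2.0):                  # aspect ratios p/n to probe
    p = int(gamma * n)
    beta = rng.normal(0, np.sqrt(alpha2 / p), p)     # dense random-effects coefficients
    X = rng.normal(size=(n, p))                       # isotropic features (identity covariance)
    y = X @ beta + rng.normal(0, np.sqrt(noise2), n)

    # One natural ridge penalty under this prior: sigma^2 / tau^2 with tau^2 = alpha2 / p
    model = Ridge(alpha=noise2 * p / alpha2).fit(X, y)

    X_new = rng.normal(size=(n_test, p))
    y_new = X_new @ beta + rng.normal(0, np.sqrt(noise2), n_test)
    risk = np.mean((y_new - model.predict(X_new)) ** 2)
    print(f"gamma = {gamma:.1f}: empirical predictive risk ~ {risk:.3f}")
```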
Landslide susceptibility assessment and mapping using state-of-the-art machine learning techniques
by Pourghasemi, Hamid Reza; Santosh, M; Eskandari, Saeedeh
in Algorithms; Discriminant analysis; Generalized linear models
2021
Landslides pose a serious risk to human life and the natural environment. Here, we compare machine learning algorithms including the generalized linear model (GLM), mixture discriminant analysis (MDA), boosted regression tree (BRT), and flexible discriminant analysis (FDA) to map landslide-prone regions in Fars Province, which covers approximately 7% of Iran. Initially, an inventory of 179 historical landslide occurrences was prepared and partitioned. Subsequently, ten landslide conditioning factors (LCFs) were generated. The partial least squares algorithm was applied to the training dataset to assess the significance of the LCFs, which indicated that distance from roads was the most important predictor of landslides, followed by altitude, lithological units, and slope degree. Finally, the landslide susceptibility maps (LSMs) generated using BRT, GLM, MDA, and FDA were validated and compared using cut-off-dependent and cut-off-independent validation measures. The validation metrics showed that GLM and BRT each had an AUC of 0.908, while FDA and MDA had AUCs of 0.858 and 0.821, respectively. The results of our case study can be used to develop strategies and plans that minimize the loss of human life and damage to the natural environment.
Journal Article
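A minimal sketch of the comparison workflow described in the abstract above, assuming placeholder data: several classifiers are fitted on landslide conditioning factors and compared by AUC, a cut-off-independent measure. MDA and FDA come from the R mda package and have no scikit-learn equivalent, so only GLM- and BRT-style models are shown here.

```python
# Hedged sketch of the abstract's comparison workflow: classifiers fitted on
# landslide conditioning factors and compared by ROC AUC. Data and feature
# counts are placeholders; MDA and FDA (R "mda" package) are omitted because
# scikit-learn has no direct equivalent.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression          # GLM with a logit link
from sklearn.ensemble import GradientBoostingClassifier      # BRT analogue
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(358, 10))            # 10 conditioning factors (placeholder data)
y = rng.integers(0, 2, size=X.shape[0])   # landslide / non-landslide labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
models = {
    "GLM (logistic)": LogisticRegression(max_iter=1000),
    "BRT (gradient boosting)": GradientBoostingClassifier(),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])   # cut-off-independent measure
    print(f"{name}: AUC = {auc:.3f}")
```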
EEG Signal Analysis for Diagnosing Neurological Disorders Using Discrete Wavelet Transform and Intelligent Techniques
by Abdurraqeeb, Akram M.; Alturki, Fahd A.; AlSharabi, Khalil
in Accuracy; Algorithms; artificial neural network
2020
Analysis of electroencephalogram (EEG) signals is essential because it is an efficient method for diagnosing neurological brain disorders. In this work, a single system is developed to diagnose one or two neurological diseases at the same time (two-class mode and three-class mode). For this purpose, different EEG feature-extraction and classification techniques are investigated to aid in the accurate diagnosis of neurological brain disorders: epilepsy and autism spectrum disorder (ASD). Two different modes, single-channel and multi-channel, of EEG signals are analyzed for epilepsy and ASD. Independent component analysis (ICA) is used to remove artifacts from the EEG dataset. The EEG dataset is then segmented and filtered to remove noise and interference using an elliptic band-pass filter. Next, EEG signal features are extracted from the filtered signal using a discrete wavelet transform (DWT) to decompose it into its sub-bands: delta, theta, alpha, beta, and gamma. Subsequently, five statistical methods are used to extract features from the EEG sub-bands: logarithmic band power (LBP), standard deviation, variance, kurtosis, and Shannon entropy (SE). The features are then fed into four different classifiers, linear discriminant analysis (LDA), support vector machine (SVM), k-nearest neighbor (KNN), and artificial neural networks (ANNs), to classify the features according to their classes. The combination of DWT with SE and LBP produces the highest accuracy across all the classifiers. The overall classification accuracy approaches 99.9% using SVM and 97% using ANN for the three-class single-channel and multi-channel modes, respectively.
Journal Article
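A minimal sketch of the feature pipeline the abstract above describes, under assumed settings (sampling rate, filter order and corner frequencies, wavelet, and placeholder data are not the paper's exact choices): elliptic band-pass filtering, DWT decomposition into sub-bands, and the five statistical features per sub-band.

```python
# Hedged sketch of the abstract's pipeline: elliptic band-pass filter, DWT
# decomposition, then statistical features per sub-band. Settings are assumed.
import numpy as np
import pywt
from scipy.signal import ellip, filtfilt
from scipy.stats import kurtosis

fs = 256.0                                   # assumed sampling rate (Hz)
x = np.random.randn(10 * int(fs))            # placeholder single-channel EEG segment

# Elliptic band-pass filter (0.5-60 Hz corner frequencies, assumed)
b, a = ellip(4, 0.1, 40, [0.5 / (fs / 2), 60 / (fs / 2)], btype="bandpass")
x_filt = filtfilt(b, a, x)

# 5-level DWT roughly separating the gamma..delta sub-bands
coeffs = pywt.wavedec(x_filt, "db4", level=5)

def band_features(c):
    """Five statistical features per sub-band, as listed in the abstract."""
    power = np.mean(c ** 2)
    p = c ** 2 / np.sum(c ** 2)              # normalized energies for Shannon entropy
    return [np.log(power),                   # logarithmic band power (LBP)
            np.std(c), np.var(c),
            kurtosis(c),
            -np.sum(p * np.log2(p + 1e-12))] # Shannon entropy (SE)

features = np.concatenate([band_features(c) for c in coeffs])
print(features.shape)   # feature vector to feed into LDA / SVM / KNN / ANN
```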
Sparse Discriminant Analysis
2011
We consider the problem of performing interpretable classification in the high-dimensional setting, in which the number of features is very large and the number of observations is limited. This setting has been studied extensively in the chemometrics literature, and more recently has become commonplace in biological and medical applications. In this setting, a traditional approach involves performing feature selection before classification. We propose sparse discriminant analysis, a method for performing linear discriminant analysis with a sparseness criterion imposed such that classification and feature selection are performed simultaneously. Sparse discriminant analysis is based on the optimal scoring interpretation of linear discriminant analysis, and can be extended to perform sparse discrimination via mixtures of Gaussians if boundaries between classes are nonlinear or if subgroups are present within each class. Our proposal also provides low-dimensional views of the discriminative directions.
Journal Article
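In the two-class case, the optimal-scoring view of LDA reduces to a regression of centred class scores on the features, so a sparse discriminant direction can be sketched with an elastic-net fit. This is a simplification of the full multi-class alternating algorithm, and the penalty values and data below are illustrative.

```python
# Hedged two-class sketch of sparse discriminant analysis via optimal scoring:
# regress centred class-indicator scores on X with an elastic-net penalty, so
# the discriminant direction and feature selection are obtained together.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
n, p = 60, 500                       # few observations, many features
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 1.5                 # only the first 5 features carry signal

# Optimal scores for two classes: zero-mean, unit-variance indicator coding
pi0, pi1 = np.mean(y == 0), np.mean(y == 1)
scores = np.where(y == 1, np.sqrt(pi0 / pi1), -np.sqrt(pi1 / pi0))

enet = ElasticNet(alpha=0.1, l1_ratio=0.9).fit(X - X.mean(axis=0), scores)
beta = enet.coef_                    # sparse discriminant direction
print("selected features:", np.flatnonzero(beta))
```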
Classification Accuracy as a Proxy for Two-Sample Testing
2021
When data analysts train a classifier and check if its accuracy is significantly different from chance, they are implicitly performing a two-sample test. We investigate the statistical properties of this flexible approach in the high-dimensional setting. We prove two results that hold for any classifier in any dimension: if its true error remains ϵ-better than chance for some ϵ > 0 as d,n → ∞, then (a) the permutation-based test is consistent (has power approaching one), and (b) a computationally efficient test based on a Gaussian approximation of the null distribution is also consistent. To get a finer understanding of the rates of consistency, we study a specialized setting of distinguishing Gaussians with mean difference δ and common (known or unknown) covariance Σ, when d/n → c ∈ (0,∞). We study variants of Fisher's linear discriminant analysis (LDA), such as "naive Bayes", in a nontrivial regime when ϵ → 0 (the Bayes classifier has true accuracy approaching 1/2), and contrast their power with that of corresponding variants of Hotelling's test. Surprisingly, the expressions for their power match exactly in terms of n, d, δ, and Σ, and the LDA approach is only worse by a constant factor, achieving an asymptotic relative efficiency (ARE) of 1/√π for balanced samples. We also extend our results to high-dimensional elliptical distributions with finite kurtosis. Other results of independent interest include minimax lower bounds and the optimality of Hotelling's test when d = o(n). Simulation results validate our theory, and we present practical takeaway messages along with natural open problems.
Journal Article
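A minimal sketch of the permutation-based test the abstract above analyses: label the two samples 0/1, estimate a classifier's hold-out accuracy, and compare it with accuracies obtained after permuting the labels. The classifier, split, and number of permutations are illustrative choices.

```python
# Hedged sketch of classification accuracy as a two-sample test: the observed
# hold-out accuracy is compared against a permutation null distribution.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
d = 50
sample_p = rng.normal(0.0, 1.0, size=(200, d))      # sample from P
sample_q = rng.normal(0.3, 1.0, size=(200, d))      # sample from Q (shifted mean)
X = np.vstack([sample_p, sample_q])
y = np.r_[np.zeros(200), np.ones(200)]

def holdout_accuracy(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)
    return LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)

obs = holdout_accuracy(X, y)
null = [holdout_accuracy(X, rng.permutation(y)) for _ in range(200)]
p_value = (1 + sum(a >= obs for a in null)) / (1 + len(null))
print(f"accuracy = {obs:.3f}, permutation p-value = {p_value:.3f}")
```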
A Feature Extraction Method Based on Differential Entropy and Linear Discriminant Analysis for Emotion Recognition
2019
Feature extraction of electroencephalography (EEG) signals plays a significant role in the wearable computing field. Because of the practical applications of EEG-based emotion recognition, researchers often use edge computing to reduce data transmission times; however, as EEG involves a large amount of data, determining how to effectively extract features while reducing the amount of computation remains the focus of abundant research. Researchers have proposed many EEG feature extraction methods, but these methods suffer from problems such as high time complexity and insufficient precision. The main purpose of this paper is to introduce an innovative method for obtaining reliable distinguishing features from EEG signals. The proposed feature extraction method combines differential entropy with linear discriminant analysis (LDA) and can be applied to feature extraction of emotional EEG signals. We use a three-category sentiment EEG dataset to conduct experiments. The experimental results show that the proposed feature extraction method can significantly improve the performance of EEG classification: compared with the result on the original dataset, the average accuracy increases by 68%, which is 7% higher than the result obtained when only differential entropy is used for feature extraction. The total execution time shows that the proposed method has lower time complexity.
Journal Article
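A minimal sketch of the idea in the abstract above, assuming band-filtered EEG segments that are roughly Gaussian: the differential entropy of a Gaussian signal is 0.5 ln(2πeσ²), computed per channel, and LDA then provides low-dimensional discriminative features. Data shapes and labels are placeholders.

```python
# Hedged sketch: differential entropy features per channel, then LDA for a
# low-dimensional discriminative representation. Data are placeholders.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
n_trials, n_channels, n_samples = 300, 32, 512
eeg = rng.normal(size=(n_trials, n_channels, n_samples))   # placeholder EEG trials
labels = rng.integers(0, 3, n_trials)                      # three emotion classes

# Differential entropy per channel, assuming each segment is roughly Gaussian:
# DE = 0.5 * ln(2 * pi * e * sigma^2)
var = eeg.var(axis=2)
de_features = 0.5 * np.log(2 * np.pi * np.e * var)         # shape (trials, channels)

lda = LinearDiscriminantAnalysis(n_components=2)
z = lda.fit_transform(de_features, labels)                  # low-dimensional features
print(z.shape, "training accuracy:", lda.score(de_features, labels))
```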
Characterization and authentication of olive, camellia and other vegetable oils by combination of chromatographic and chemometric techniques: role of fatty acids, tocopherols, sterols and squalene
2021
Fatty acids, tocopherols, sterols and squalene were analyzed by chromatographic techniques and selected as variables to build a variety of classification models for the accurate characterization and authentication of olive oil, camellia oil and six other vegetable oils (soybean, corn, rapeseed, peanut, palm and sunflower). Different unsupervised and supervised chemometric techniques, such as principal component analysis (PCA), linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA), were applied. In addition, the Kennard–Stone algorithm was used to select the training samples for the construction of the supervised models. The discriminating power of the different components was compared, and the results suggested that fatty acids are the most powerful in distinguishing vegetable oils, followed by tocopherols and sterols, while squalene contributed to the discrimination between olive and camellia oils despite their apparent similarity. The proposed method is straightforward and can be easily implemented to identify unknown oil samples.
Journal Article
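A minimal sketch of two steps from the abstract above: Kennard–Stone selection of training samples, and PLS-DA implemented in the common way as PLS regression on one-hot class labels with an argmax decision. The data, class count, and number of latent components are placeholders.

```python
# Hedged sketch: Kennard-Stone training-set selection followed by PLS-DA
# (PLS regression on one-hot labels, argmax decision). Data are placeholders.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def kennard_stone(X, n_select):
    """Pick n_select samples by the max-min distance rule."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    selected = list(np.unravel_index(np.argmax(D), D.shape))   # two most distant points
    while len(selected) < n_select:
        remaining = [i for i in range(len(X)) if i not in selected]
        dists = D[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(dists))])
    return np.array(selected)

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 20))            # placeholder fatty acid / tocopherol / sterol variables
y = rng.integers(0, 3, 120)               # placeholder oil classes

train_idx = kennard_stone(X, 80)
Y_onehot = np.eye(3)[y]
pls = PLSRegression(n_components=5).fit(X[train_idx], Y_onehot[train_idx])
pred = pls.predict(X).argmax(axis=1)      # PLS-DA class assignment
print("apparent accuracy:", np.mean(pred == y))
```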
Wasserstein discriminant analysis
by Courty, Nicolas; Flamary, Rémi; Cuturi, Marco
in Algorithms; Discriminant analysis; Dispersion
2018
Wasserstein discriminant analysis (WDA) is a new supervised linear dimensionality reduction algorithm. Following the blueprint of classical Fisher discriminant analysis, WDA selects the projection matrix that maximizes the ratio of the dispersion of projected points from different classes to the dispersion of projected points from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances. Thanks to the underlying principles of optimal transport, WDA is able to capture both global (distribution-scale) and local (sample-scale) interactions between classes. In addition, we show that WDA leverages a mechanism that induces neighborhood preservation. Regularized Wasserstein distances can be computed with the Sinkhorn matrix-scaling algorithm, and the optimization problem of WDA can be tackled using automatic differentiation of Sinkhorn's fixed-point iterations. Numerical experiments show promising results, both in terms of prediction and visualization, on toy examples and real datasets such as MNIST and on deep features obtained from a subset of the Caltech dataset.
Journal Article
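A minimal sketch of WDA's building block as described above: the entropy-regularized Wasserstein cost between two projected point clouds, computed with Sinkhorn matrix scaling. The outer optimization of the projection via automatic differentiation of the fixed-point iterations is not shown; the data, regularization value, and candidate projection are illustrative.

```python
# Hedged sketch of the Sinkhorn building block behind WDA: entropy-regularized
# transport cost between projected point clouds, used to form a between-class /
# within-class dispersion ratio for one fixed candidate projection.
import numpy as np

def sinkhorn_cost(A, B, reg=1.0, n_iter=200):
    """Entropy-regularized OT cost between uniform point clouds A and B."""
    C = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # squared-distance cost matrix
    K = np.exp(-C / reg)
    a = np.full(len(A), 1.0 / len(A))
    b = np.full(len(B), 1.0 / len(B))
    u = np.ones_like(a)
    for _ in range(n_iter):                               # Sinkhorn matrix scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]                       # transport plan
    return float((T * C).sum())

rng = np.random.default_rng(6)
X0 = rng.normal(0.0, 1.0, size=(100, 20))
X1 = rng.normal(0.5, 1.0, size=(100, 20))
P = np.linalg.qr(rng.normal(size=(20, 2)))[0]             # a candidate 2-D projection

between = sinkhorn_cost(X0 @ P, X1 @ P)
within = sinkhorn_cost(X0 @ P, X0 @ P) + sinkhorn_cost(X1 @ P, X1 @ P)
print("WDA-style objective (between / within):", between / within)
```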
A Novel Pattern Recognition Method for Non-Destructive and Accurate Origin Identification of Food and Medicine Homologous Substances with Portable Near-Infrared Spectroscopy
by Liu, Yang; Zhang, Ziqin; Fan, Wei
in Angelica - chemistry; boosting–partial least squares–discriminant analysis; Chromatography
2025
In this study, a novel pattern recognition method named boosting–partial least squares–discriminant analysis (Boosting-PLS-DA) was developed for the non-destructive and accurate origin identification of food and medicine homologous substances (FMHSs). Taking Gastrodia elata, Aurantii Fructus Immaturus, and Angelica dahurica as examples, spectra of FMHSs from different origins were obtained by portable near-infrared (NIR) spectroscopy without destroying the samples. Identification models were developed with Boosting-PLS-DA and compared with principal component analysis (PCA) and partial least squares–discriminant analysis (PLS-DA) models. Model performance was evaluated using the validation set and an external validation set obtained one month later. The results showed that the Boosting-PLS-DA method achieved the best performance. For the analysis of Aurantii Fructus Immaturus and Angelica dahurica, accuracies of 100% on both the validation sets and the external validation sets were obtained using Boosting-PLS-DA models. For the analysis of Gastrodia elata, Boosting-PLS-DA models showed significant improvements in external validation accuracy compared to PLS-DA, reducing the risk of overfitting. The Boosting-PLS-DA method combines the high robustness of ensemble learning with the strong discriminative capability of discriminant analysis. Its generalizability will be further validated with a sufficiently large external validation set and more types of FMHSs.
Journal Article
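The abstract above does not spell out the exact boosting scheme, so the sketch below shows one common pattern only: an AdaBoost-style loop that reweights and resamples the training set for each PLS-DA base learner (PLS regression on one-hot labels). Component count, number of rounds, and data are illustrative, and this should not be read as the authors' algorithm.

```python
# Hedged sketch of one possible boosting wrapper around PLS-DA: SAMME-style
# sample reweighting with weighted bootstrap resampling per round.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(150, 200))                 # placeholder NIR spectra
y = rng.integers(0, 3, 150)                     # placeholder origin labels
K, rounds = 3, 10
w = np.full(len(y), 1.0 / len(y))               # sample weights
models, alphas = [], []

for _ in range(rounds):
    idx = rng.choice(len(y), size=len(y), p=w)  # weighted bootstrap resample
    pls = PLSRegression(n_components=5).fit(X[idx], np.eye(K)[y[idx]])
    pred = pls.predict(X).argmax(axis=1)
    err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
    alpha = np.log((1 - err) / err) + np.log(K - 1)        # SAMME learner weight
    w *= np.exp(alpha * (pred != y))
    w /= w.sum()
    models.append(pls)
    alphas.append(alpha)

# Weighted vote of the PLS-DA base learners
votes = sum(a * np.eye(K)[m.predict(X).argmax(axis=1)] for a, m in zip(alphas, models))
print("ensemble training accuracy:", np.mean(votes.argmax(axis=1) == y))
```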
Machine Learning Based Predictive Modeling of Debris Flow Probability Following Wildfire in the Intermountain Western United States
by Kern, Ashley N.; Addison, Priscilla; Oommen, Thomas
in Artificial intelligence; Aversion learning; Basins
2017
It has been recognized that wildfire, followed by large precipitation events, triggers both flooding and debris flows in mountainous regions. The ability to predict and mitigate these hazards is crucial to protecting public safety and infrastructure. A need for advanced modeling techniques was highlighted by re-evaluating existing prediction models from the literature. Data from 15 individual burn basins in the intermountain western United States, comprising 388 instances and 26 variables, were obtained from the United States Geological Survey (USGS). After a randomly selected subset of the data was set aside as a validation set, advanced predictive modeling techniques based on machine learning were applied to the remaining training data. Tenfold cross-validation was applied to the training data to ensure nearly unbiased error estimation and to avoid model over-fitting. Linear, nonlinear, and rule-based predictive models, including naïve Bayes, mixture discriminant analysis, classification trees, and logistic regression, were developed and tested on the validation dataset. The new nonlinear approaches were nearly twice as successful as the linear models previously published in the debris flow prediction literature. The new prediction models advance the current state of the art and improve the ability to accurately predict debris flow events in the wildfire-prone intermountain western United States.
Journal Article
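A minimal sketch of the evaluation workflow in the abstract above: hold out a randomly selected validation subset, compare linear, nonlinear, and rule-based models with tenfold cross-validation on the training portion, then score on the held-out set. The data here are placeholders standing in for the 388-instance, 26-variable USGS dataset.

```python
# Hedged sketch of the abstract's evaluation workflow: validation hold-out plus
# tenfold cross-validation on the training data for several model families.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
X = rng.normal(size=(388, 26))                  # placeholder burn-basin predictors
y = rng.integers(0, 2, 388)                     # debris flow occurred / did not occur

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=8)
models = {
    "naive Bayes": GaussianNB(),
    "classification tree": DecisionTreeClassifier(max_depth=4),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, m in models.items():
    cv = cross_val_score(m, X_tr, y_tr, cv=10).mean()     # tenfold CV estimate
    val = m.fit(X_tr, y_tr).score(X_val, y_val)           # held-out validation set
    print(f"{name}: CV accuracy = {cv:.3f}, validation accuracy = {val:.3f}")
```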