Catalogue Search | MBRL

HIGH-DIMENSIONAL ASYMPTOTICS OF PREDICTION

by Wager, Stefan , Dobriban, Edgar in Aspect ratio , Constraining , Covariance matrix

2018

We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where p,n → ∞ and p/n → γ > 0, and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength and the aspect ratio γ. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover an exact inverse relation between the limiting predictive risk and the limiting estimation risk in high-dimensional linear models. The analysis builds on recent advances in random matrix theory.

Journal Article

EEG Signal Analysis for Diagnosing Neurological Disorders Using Discrete Wavelet Transform and Intelligent Techniques

by Abdurraqeeb, Akram M. , Alturki, Fahd A. , AlSharabi, Khalil in Accuracy , Algorithms , artificial neural network

2020

Analysis of electroencephalogram (EEG) signals is essential because it is an efficient method to diagnose neurological brain disorders. In this work, a single system is developed to diagnose one or two neurological diseases at the same time (two-class mode and three-class mode). For this purpose, different EEG feature-extraction and classification techniques are investigated to aid in the accurate diagnosis of neurological brain disorders: epilepsy and autism spectrum disorder (ASD). Two different modes, single-channel and multi-channel, of EEG signals are analyzed for epilepsy and ASD. The independent components analysis (ICA) technique is used to remove the artifacts from the EEG dataset. Then, the EEG dataset is segmented and filtered to remove noise and interference using an elliptic band-pass filter. Next, the EEG signal features are extracted from the filtered signal using a discrete wavelet transform (DWT) to decompose the filtered signal into its sub-bands: delta, theta, alpha, beta and gamma. Subsequently, five statistical methods are used to extract features from the EEG sub-bands: the logarithmic band power (LBP), standard deviation, variance, kurtosis, and Shannon entropy (SE). Further, the features are fed into four different classifiers, linear discriminant analysis (LDA), support vector machine (SVM), k-nearest neighbor (KNN), and artificial neural networks (ANNs), to classify the features corresponding to their classes. The combination of DWT with SE and LBP produces the highest accuracy among all the classifiers. The overall classification accuracy approaches 99.9% using SVM and 97% using ANN for the three-class single-channel and multi-channel modes, respectively.
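
The five sub-band statistics named in this abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so the definitions below are assumptions rather than the authors' implementation: population variance, LBP as the log of the mean squared amplitude, and Shannon entropy computed from an amplitude histogram. The function name `subband_features` and the `n_bins` parameter are hypothetical, and the input is assumed to be a non-constant list of floats.

```python
import math
from collections import Counter

def subband_features(x, n_bins=8):
    """Five summary statistics for one EEG sub-band (illustrative definitions)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n      # population variance
    std = math.sqrt(var)
    # Logarithmic band power: log of the mean squared amplitude (assumed form).
    lbp = math.log(sum(v * v for v in x) / n)
    # Kurtosis: fourth standardized moment (3.0 for a Gaussian signal).
    kurt = sum((v - mean) ** 4 for v in x) / n / var ** 2
    # Shannon entropy in bits of an n_bins amplitude histogram (assumed form).
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in x)
    se = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"lbp": lbp, "std": std, "var": var,
            "kurtosis": kurt, "shannon_entropy": se}

# Example: a pure sine wave sampled over four full periods.
signal = [math.sin(2 * math.pi * k / 64) for k in range(256)]
features = subband_features(signal)
```

In a pipeline like the one described, such a feature vector would be computed per sub-band and passed to a classifier; the DWT decomposition and the classifiers themselves are outside this sketch.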

Journal Article

Landslide susceptibility assessment and mapping using state-of-the art machine learning techniques

by Pourghasemi, Hamid Reza , Santosh, M , Eskandari, Saeedeh in Algorithms , Discriminant analysis , Generalized linear models

2021

Landslides pose a serious risk to human life and the natural environment. Here, we compare machine learning algorithms including the generalized linear model (GLM), mixture discriminant analysis (MDA), boosted regression tree (BRT), and functional discriminant analysis (FDA) to evaluate the landslide exposure regions in Fars Province, comprising an area of approximately 7% of Iran. Initially, an aggregate of 179 historical landslide occurrences was prepared and partitioned. Subsequently, ten landslide conditioning factors (LCFs) were generated. The partial least squares algorithm was utilized to assess the significance of the LCFs with the help of a training dataset, which indicated that distance from road had the maximum significance in forecasting landslides, followed by altitude (Al), lithological units, and slope degree. Finally, the landslide susceptibility maps (LSMs) generated using BRT, GLM, MDA, and FDA were validated and compared using cut-off-dependent and cut-off-independent validation measures. The results of the validation metrics showed that GLM and BRT had an AUC of 0.908, while FDA and MDA had AUCs of 0.858 and 0.821, respectively. The results from our case study can be utilized to develop strategies and plans to minimize the loss of human lives and the natural environment.

Journal Article

Sparse Discriminant Analysis

by Hastie, Trevor , Ersbøll, Bjarne , Clemmensen, Line in Applied sciences , Centroids , Chemistry

2011

We consider the problem of performing interpretable classification in the high-dimensional setting, in which the number of features is very large and the number of observations is limited. This setting has been studied extensively in the chemometrics literature, and more recently has become commonplace in biological and medical applications. In this setting, a traditional approach involves performing feature selection before classification. We propose sparse discriminant analysis, a method for performing linear discriminant analysis with a sparseness criterion imposed such that classification and feature selection are performed simultaneously. Sparse discriminant analysis is based on the optimal scoring interpretation of linear discriminant analysis, and can be extended to perform sparse discrimination via mixtures of Gaussians if boundaries between classes are nonlinear or if subgroups are present within each class. Our proposal also provides low-dimensional views of the discriminative directions.

Journal Article

A Feature Extraction Method Based on Differential Entropy and Linear Discriminant Analysis for Emotion Recognition

by Huang, Lan , Han, Na , Liang, Yong in Algorithms , differential entropy , Discriminant Analysis

2019

Feature extraction of electroencephalography (EEG) signals plays a significant role in the wearable computing field. Due to the practical applications of EEG emotion calculation, researchers often use edge calculation to reduce data transmission times. However, because EEG involves a large amount of data, determining how to effectively extract features and reduce the amount of calculation is still the focus of abundant research. Researchers have proposed many EEG feature extraction methods. However, these methods have problems such as high time complexity and insufficient precision. The main purpose of this paper is to introduce an innovative method for obtaining reliable distinguishing features from EEG signals. This feature extraction method combines differential entropy with Linear Discriminant Analysis (LDA) and can be applied in feature extraction of emotional EEG signals. We use a three-category sentiment EEG dataset to conduct experiments. The experimental results show that the proposed feature extraction method can significantly improve the performance of the EEG classification: compared with the result of the original dataset, the average accuracy increases by 68%, which is 7% higher than the result obtained when only using differential entropy in feature extraction. The total execution time shows that the proposed method has a lower time complexity.
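
As a rough illustration of the differential-entropy feature mentioned in this abstract, the sketch below uses the closed form for a Gaussian variable, h = ½·ln(2πeσ²). Treating an EEG segment as approximately Gaussian is an assumption made here for illustration; the abstract does not state how the authors compute the feature, and the function name `differential_entropy` is hypothetical. The LDA step is not shown.

```python
import math

def differential_entropy(x):
    """Differential entropy (in nats) of a signal under a Gaussian assumption."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n   # population variance
    # Closed form for a Gaussian with variance var: 0.5 * ln(2 * pi * e * var).
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Example: a unit-variance toy segment.
segment = [-1.0, 1.0] * 50
de = differential_entropy(segment)
```

Under this assumption the feature depends only on the segment variance, so it is cheap to compute per channel and per frequency band.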

Journal Article

Landslide Susceptibility Mapping: Machine and Ensemble Learning Based on Remote Sensing Big Data

by Saeidi, Vahideh , Kalantar, Bahareh , Shabani, Farzin in algorithms , artificial intelligence , big data

2020

Predicting landslide occurrences can be difficult. However, failure to do so can be catastrophic, causing unwanted tragedies such as property damage, community displacement, and human casualties. Research into landslide susceptibility mapping (LSM) attempts to alleviate such catastrophes through the identification of landslide-prone areas. Computational modelling techniques have been successful in related disaster scenarios, which motivates this work to explore such modelling for LSM. In this research, the potential of supervised machine learning and ensemble learning is investigated. Firstly, the Flexible Discriminant Analysis (FDA) supervised learning algorithm is trained for LSM and compared against other algorithms that have been widely used for the same purpose, namely Generalized Logistic Models (GLM), Boosted Regression Trees (BRT or GBM), and Random Forest (RF). Next, an ensemble model consisting of all four algorithms is implemented to examine possible performance improvements. The dataset used to train and test all the algorithms consists of a landslide inventory map of 227 landslide locations. From these sources, 13 conditioning factors are extracted to be used in the models. Experimental evaluations are made based on the True Skill Statistic (TSS), the Receiver Operating Characteristic (ROC) curve and the kappa index. The results show that the best TSS (0.6986), ROC (0.904) and kappa (0.6915) were obtained by the ensemble model. FDA on its own seems effective at modelling landslide susceptibility from multiple data sources, with performance comparable to GLM. However, it slightly underperforms when compared to GBM (BRT) and RF. RF seems most capable compared to GBM, GLM, and FDA, when dealing with all conditioning factors.
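
Two of the evaluation measures named in this abstract, TSS and the kappa index, can be computed from a binary confusion matrix. The sketch below uses the standard definitions (TSS = sensitivity + specificity − 1; Cohen's kappa = (p_o − p_e) / (1 − p_e)). The function name `tss_and_kappa` and the example counts are illustrative and are not taken from the study.

```python
def tss_and_kappa(tp, fp, fn, tn):
    """True Skill Statistic and Cohen's kappa from binary confusion counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    tss = sensitivity + specificity - 1
    p_observed = (tp + tn) / n     # observed agreement
    # Agreement expected by chance, from the row and column totals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return tss, kappa

# Example with made-up counts: 40 hits, 10 false alarms, 10 misses, 40 correct rejections.
tss, kappa = tss_and_kappa(tp=40, fp=10, fn=10, tn=40)
```

Both measures depend on the threshold used to turn susceptibility scores into landslide / non-landslide labels, unlike the area under the ROC curve.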

Journal Article

Scheelite chemistry from skarn systems: implications for ore-forming processes and mineral exploration

by Miranda, Ana Carolina R , Beaudoin, Georges , Rottier, Bertrand in Ablation , Anomalies , Composition

2022

The trace element composition of scheelite from 19 well-documented reduced and oxidized skarn systems was measured by laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) to establish chemical criteria for the application of scheelite as an efficient indicator mineral for mineral exploration targeting. In both reduced and oxidized skarn systems, scheelite forms during prograde and retrograde stages. Prograde scheelite is texturally and chemically zoned, whereas retrograde scheelite is predominantly texturally homogeneous but may display chemical zonation. Five chondrite-normalized REE patterns, displaying both positive and negative Eu anomalies, are identified in the data: (i) steep and (ii) shallow negative slopes, (iii) concave, (iv) flat to slightly concave, and (v) convex shapes. The different REE patterns are related to variable fluid salinity and association with co-precipitated garnet or clinopyroxene. Results of partial least squares-discriminant analysis (PLS-DA) show that scheelite composition varies according to skarn redox, intrusion composition, and metal association. These results support the fact that the trace element composition of scheelite is in part a function of igneous rock composition and oxygen fugacity, in addition to salinity, co-genetic minerals, and composition of the mineralizing fluids. Scheelite from reduced and oxidized skarns can be discriminated from those from orogenic and intrusion-related gold deposits due to their lower Sr and higher Mo, Ta, and Nb concentrations. Scheelite trace element composition investigated by PLS-DA is effective in discriminating different deposit types, supporting the use of scheelite as an indicator mineral for exploration targeting.

Journal Article

Characterization and authentication of olive, camellia and other vegetable oils by combination of chromatographic and chemometric techniques: role of fatty acids, tocopherols, sterols and squalene

by Shen, Mingyue , Zhao, Shanshan , Huang, Mingquan in Algorithms , Chemometrics , Chromatography

2021

Fatty acids, tocopherols, sterols and squalene were analyzed by chromatographic-based techniques and were selected as variables to build a variety of classification models for the accurate characterization and authentication of olive, camellia oil and six other vegetable oils (soybean, corn, rapeseed, peanut, palm and sunflower). Different unsupervised and supervised chemometrics techniques, such as principal component analysis (PCA), linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA), have been applied. In addition, the Kennard–Stone algorithm was used to select the training samples for the construction of supervised models. The discriminating power of different components was compared, and the results suggested that fatty acids are the most powerful in distinguishing vegetable oils, followed by tocopherols and sterols, and squalene contributed to the discrimination between olive and camellia oils despite their apparent similarities. This proposed method was straightforward and can be easily implemented to identify unknown oil samples.
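
The Kennard–Stone selection of training samples mentioned in this abstract can be sketched as follows. This is the textbook form of the algorithm using Euclidean distance on already-scaled features; the abstract does not say which distance or preprocessing the authors used, so those choices, the function name `kennard_stone`, and the toy data are assumptions. The sketch assumes at least two samples and `n_select >= 2`.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start with the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select and remaining:
        # Add the candidate whose nearest selected sample is farthest away.
        best = max(remaining,
                   key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: five samples on a line; indices 0 and 2 are the extremes.
samples = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0), (4.9, 0.0)]
train_idx = kennard_stone(samples, 3)
```

The samples left unselected would then serve as the test set, so the training set spans the measured feature space as evenly as possible.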

Journal Article

CLASSIFICATION ACCURACY AS A PROXY FOR TWO-SAMPLE TESTING

by Ramdas, Aaditya , Wasserman, Larry , Kim, Ilmun in Accuracy , Approximation , Classifiers

2021

When data analysts train a classifier and check if its accuracy is significantly different from chance, they are implicitly performing a two-sample test. We investigate the statistical properties of this flexible approach in the high-dimensional setting. We prove two results that hold for all classifiers in any dimension: if its true error remains ϵ-better than chance for some ϵ > 0 as d,n → ∞, then (a) the permutation-based test is consistent (has power approaching one), (b) a computationally efficient test based on a Gaussian approximation of the null distribution is also consistent. To get a finer understanding of the rates of consistency, we study a specialized setting of distinguishing Gaussians with mean-difference δ and common (known or unknown) covariance Σ, when d/n → c ∈ (0,∞). We study variants of Fisher’s linear discriminant analysis (LDA) such as “naive Bayes” in a nontrivial regime when ϵ → 0 (the Bayes classifier has true accuracy approaching 1/2), and contrast their power with corresponding variants of Hotelling’s test. Surprisingly, the expressions for their power match exactly in terms of n, d, δ, Σ, and the LDA approach is only worse by a constant factor, achieving an asymptotic relative efficiency (ARE) of 1/√π for balanced samples. We also extend our results to high-dimensional elliptical distributions with finite kurtosis. Other results of independent interest include minimax lower bounds, and the optimality of Hotelling’s test when d = o(n). Simulation results validate our theory, and we present practical takeaway messages along with natural open problems.
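
The idea of using held-out classification accuracy as a two-sample test statistic, calibrated by permuting the sample labels, can be sketched in a few lines. This is a generic illustration, not the procedure analysed in the paper: a nearest-centroid classifier stands in for the LDA variants the abstract studies, and the even/odd train/test split, the function names, `n_perm`, and `seed` are all assumptions made here.

```python
import random

def heldout_accuracy(xs, ys):
    """Train a nearest-centroid classifier on even indices, test on odd ones."""
    train = list(zip(xs[0::2], ys[0::2]))
    test = list(zip(xs[1::2], ys[1::2]))
    dim = len(xs[0])
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        if not rows:               # degenerate split: report chance accuracy
            return 0.5
        centroids[label] = [sum(r[j] for r in rows) / len(rows)
                            for j in range(dim)]

    def predict(x):
        return min((0, 1), key=lambda c: sum((a - b) ** 2
                                             for a, b in zip(x, centroids[c])))

    return sum(predict(x) == y for x, y in test) / len(test)

def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """P-value for 'both samples share one distribution' via label permutation."""
    rng = random.Random(seed)
    observed = heldout_accuracy(xs, ys)
    labels = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)        # break any link between points and labels
        if heldout_accuracy(xs, labels) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

Here `xs` is a list of equal-length feature vectors and `ys` the matching 0/1 sample labels, in an order that mixes the two samples. A small p-value indicates that the classifier separates the two samples better than it separates randomly relabelled data.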

Journal Article

A Novel Pattern Recognition Method for Non-Destructive and Accurate Origin Identification of Food and Medicine Homologous Substances with Portable Near-Infrared Spectroscopy

by Liu, Yang , Zhang, Ziqin , Fan, Wei in Angelica - chemistry , boosting–partial least squares–discriminant analysis , Chromatography

2025

In this study, a novel pattern recognition method named boosting–partial least squares–discriminant analysis (Boosting-PLS-DA) was developed for the non-destructive and accurate origin identification of food and medicine homologous substances (FMHSs). Taking Gastrodia elata, Aurantii Fructus Immaturus, and Angelica dahurica as examples, spectra of FMHSs from different origins were obtained by portable near-infrared (NIR) spectroscopy without destroying the samples. The identification models were developed with Boosting-PLS-DA and compared with principal component analysis (PCA) and partial least squares–discriminant analysis (PLS-DA) models. The model performances were evaluated using the validation set and an external validation set obtained one month later. The results showed that the Boosting-PLS-DA method can obtain the best results. For the analysis of Aurantii Fructus Immaturus and Angelica dahurica, 100% accuracies of the validation sets and external validation sets were obtained using Boosting-PLS-DA models. For the analysis of Gastrodia elata, Boosting-PLS-DA models showed significant improvements in external validation set accuracies compared to PLS-DA, reducing the risk of overfitting. The Boosting-PLS-DA method combines the high robustness of ensemble learning with the strong discriminative capability of discriminant analysis. The generalizability will be further validated with a sufficiently large external validation set and more types of FMHSs.

Journal Article
