6,264 result(s) for "PCa"
Tutorial on PCA, approximate PCA, and approximate kernel PCA
Principal Component Analysis (PCA) is one of the most widely used data analysis methods in machine learning and AI. This manuscript focuses on the mathematical foundation of classical PCA and its application to a small-sample-size scenario and to a large dataset in a high-dimensional space. In particular, we discuss a simple method that can be used to approximate PCA in the latter case. This method can also help approximate kernel PCA (KPCA) for a large-scale dataset. We hope this manuscript will give readers a solid foundation on PCA, approximate PCA, and approximate KPCA.
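The small-sample-size scenario the abstract mentions is commonly handled by eigendecomposing the small n × n Gram matrix instead of the d × d covariance. A minimal numpy sketch of that standard trick (the manuscript's own derivation may differ in detail):

```python
import numpy as np

def pca_small_sample(X, k):
    """PCA for n_samples << n_features: eigendecompose the small
    n x n Gram matrix instead of the d x d covariance, then map the
    eigenvectors back to feature space."""
    Xc = X - X.mean(axis=0)                  # center the data
    vals, vecs = np.linalg.eigh(Xc @ Xc.T)   # n x n Gram matrix, ascending order
    idx = np.argsort(vals)[::-1][:k]         # top-k eigenpairs
    vals, vecs = vals[idx], vecs[:, idx]
    # feature-space principal axes (orthonormal columns)
    return (Xc.T @ vecs) / np.sqrt(vals)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 500))               # 10 samples, 500 features
W = pca_small_sample(X, 3)                   # 500 x 3 principal axes
```

The eigendecomposition here is of a 10 × 10 matrix rather than a 500 × 500 one, which is the source of the computational savings in the small-sample regime.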
Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images
Noninvasive, radiological image-based detection and stratification of Gleason patterns can impact clinical outcomes, treatment selection, and the determination of disease status at diagnosis without subjecting patients to surgical biopsies. We present machine learning-based automatic classification of prostate cancer aggressiveness by combining apparent diffusion coefficient (ADC) and T2-weighted (T2-w) MRI-based texture features. Our approach achieved reasonably accurate classification of Gleason scores (GS) 6(3 + 3) vs. ≥7 and 7(3 + 4) vs. 7(4 + 3) despite the presence of highly unbalanced samples by using two different sample augmentation techniques followed by feature selection-based classification. Our method distinguished between GS 6(3 + 3) and ≥7 cancers with 93% accuracy for cancers occurring in both peripheral (PZ) and transition (TZ) zones and 92% for cancers occurring in the PZ alone. Our approach distinguished the GS 7(3 + 4) from GS 7(4 + 3) with 92% accuracy for cancers occurring in both the PZ and TZ and with 93% for cancers occurring in the PZ alone. In comparison, a classifier using only the ADC mean achieved a top accuracy of 58% for distinguishing GS 6(3 + 3) vs. GS ≥7 for cancers occurring in PZ and TZ and 63% for cancers occurring in PZ alone. The same classifier achieved an accuracy of 59% for distinguishing GS 7(3 + 4) from GS 7(4 + 3) occurring in the PZ and TZ and 60% for cancers occurring in PZ alone. Separate analysis of the cancers occurring in TZ alone was not performed owing to the limited number of samples. Our results suggest that texture features derived from ADC and T2-w MRI together with sample augmentation can help to obtain reasonably accurate classification of Gleason patterns.
EEG-Based Alzheimer’s Disease Recognition Using Robust-PCA and LSTM Recurrent Neural Network
The use of electroencephalography (EEG) has recently grown as a means to diagnose neurodegenerative pathologies such as Alzheimer’s disease (AD). AD recognition can benefit from machine learning methods that, compared with traditional manual diagnosis methods, have higher reliability and improved recognition accuracy, being able to manage large amounts of data. Nevertheless, machine learning methods may exhibit lower accuracies when faced with incomplete, corrupted, or otherwise missing data, so it is important to develop robust pre-processing techniques to deal with incomplete data. The aim of this paper is to develop an automatic classification method that can still work well with EEG data affected by artifacts, as can arise during collection with, e.g., a wireless system that can lose packets. We show that a recurrent neural network (RNN) can operate successfully even in the case of significantly corrupted data, when it is pre-filtered by the robust principal component analysis (RPCA) algorithm. RPCA was selected because of its stated ability to remove outliers from the signal. To demonstrate this idea, we first develop an RNN which operates on EEG data, properly processed through traditional PCA; then, we use corrupted data as input and process them with RPCA to filter outlier components, showing that even with data corruption causing up to 20% erasures, the RPCA was able to increase the detection accuracy by about 5% with respect to the baseline PCA.
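The RPCA pre-filtering step can be sketched with the standard inexact-ALM principal component pursuit, which splits a data matrix into a low-rank part and a sparse outlier part. This is a generic textbook implementation, not the paper's exact pipeline:

```python
import numpy as np

def shrink(X, t):
    """Elementwise soft-thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svt(X, t):
    """Singular-value thresholding (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, t)) @ Vt

def rpca(M, iters=200, tol=1e-7):
    """Split M into low-rank L plus sparse outliers S via inexact ALM."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))           # standard sparsity weight
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                     # Lagrange multiplier
    mu = 1.25 / np.linalg.norm(M, 2)
    mu_bar = mu * 1e7
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        resid = M - L - S
        Y += mu * resid
        mu = min(mu * 1.5, mu_bar)
        if np.linalg.norm(resid) <= tol * np.linalg.norm(M):
            break
    return L, S

# toy "corrupted recording": rank-2 signal plus 5% large spikes
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))
spikes = 5.0 * rng.choice([-1.0, 0.0, 1.0], size=(60, 60), p=[0.025, 0.95, 0.025])
L, S = rpca(low_rank + spikes)
```

In the paper's setting, `L` would play the role of the outlier-filtered EEG passed on to the RNN, while `S` absorbs the artifact spikes.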
Assessing the effectiveness of spatial PCA on SVM-based decoding of EEG data
Principal component analysis (PCA) has been widely employed for dimensionality reduction prior to multivariate pattern classification (decoding) in EEG research. The goal of the present study was to provide an evaluation of the effectiveness of PCA on decoding accuracy (using support vector machines) across a broad range of experimental paradigms. We evaluated several different PCA variations, including group-based and subject-based component decomposition and the application of Varimax rotation or no rotation. We also varied the numbers of PCs that were retained for the decoding analysis. We evaluated the resulting decoding accuracy for seven common event-related potential components (N170, mismatch negativity, N2pc, P3b, N400, lateralized readiness potential, and error-related negativity). We also examined more challenging decoding tasks, including decoding of face identity, facial expression, stimulus location, and stimulus orientation. The datasets also varied in the number and density of electrode sites. Our findings indicated that none of the PCA approaches consistently improved decoding performance relative to no PCA, and the application of PCA frequently reduced decoding performance. Researchers should therefore be cautious about using PCA prior to decoding EEG data from similar experimental paradigms, populations, and recording setups.
  • We evaluated the impact of PCA on EEG/ERP SVM-based decoding performance.
  • We examined a broad set of datasets, spanning easy and difficult decoding tasks collected with different electrode densities.
  • We varied the numbers of principal components and tried several PCA approaches.
  • We found that PCA-based dimensionality reduction did not improve SVM-based decoding performance.
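The PCA-before-decoding step being evaluated amounts to projecting each trial onto a few principal components before the classifier sees it. A minimal numpy sketch of that reduction step (the SVM decoder and the study's specific group/subject-based variants are omitted):

```python
import numpy as np

def pca_reduce(trials, k):
    """Project trials (rows) onto the top-k principal components and
    report the fraction of variance those components retain."""
    Xc = trials - trials.mean(axis=0)          # center across trials
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                     # k features per trial
    var_kept = (s[:k] ** 2).sum() / (s ** 2).sum()
    return scores, var_kept

rng = np.random.default_rng(1)
trials = rng.normal(size=(200, 64))            # 200 trials x 64 electrodes
scores, var_kept = pca_reduce(trials, 10)
```

A decoder (an SVM in the study) would then be trained on `scores` instead of the raw 64-channel trials; the study's finding is that this step frequently does not improve, and can hurt, decoding accuracy.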
The Spatially Varying Components of Vulnerability to Energy Poverty
A household's vulnerability to energy poverty is socially and spatially variable. Efforts to measure energy poverty, however, have focused on narrow, expenditure-based metrics or area-based targeting. These metrics are not spatial per se, because the relative importance of drivers does not vary between neighborhoods to reflect localized challenges. Despite recent advancements in geographically weighted methodologies that have the potential to yield important information about the sociospatial distribution of vulnerability to energy poverty, the phenomenon has not been approached from this perspective. For a case study of England, global principal component analysis (PCA) and local geographically weighted PCA (GWPCA) are applied to a suite of neighborhood-scale vulnerability indicators. The explicit spatiality of this methodological approach addresses a common criticism of vulnerability assessments. The global PCA reaffirms the importance of well-established vulnerabilities, including older age, disability, and energy efficiency. It also demonstrates striking new evidence of vulnerabilities among precarious and transient households that are less well understood and have become starker during austerity. In contrast, rather than providing a single estimate of propensity to energy poverty for neighborhoods based on a national understanding of what drives the condition, the GWPCA identifies a diverse array of vulnerability factors of greatest importance in different locales. These local results destabilize the geographical configurations of an urban-rural and north-south divide that typify understandings of deprivation in this context. The geographically weighted approach therefore draws attention to vulnerabilities often hidden in policymaking, allowing for reflection on the applicability of spatially constituted methodologies to wider social vulnerability assessments. Key Words: energy poverty, geographically weighted PCA, GIS, spatial analysis, vulnerability.
Model-based Intelligent Recognition for Aluminum Plate Seam Defects
In order to study the application of nonlinear ultrasonics to the quantitative identification of defects in aluminum plate, cracks of different depths are machined by wire cutting into an aluminum alloy plate with a thickness of 10 mm to simulate defects in the plate. Normal and defective aluminum plates are selected to establish the experimental model, and the continuous wavelet transform (CWT) is used to extract the characteristic parameters of the aluminum plate's nonlinear ultrasonic signal. The dimensionality of the data is reduced by principal component analysis (PCA), and the principal components with the top three contribution rates are selected as the characteristic values. Finally, the support vector machine (SVM) algorithm is used to analyze the state of the aluminum alloy plate and classify the defect signals. The experimental results verify the feasibility of recognizing aluminum plate defects from nonlinear ultrasonic signals by combining principal component analysis with a support vector machine model.
Molecular Mechanisms Related to Hormone Inhibition Resistance in Prostate Cancer
Management of metastatic or advanced prostate cancer has acquired several therapeutic approaches that have drastically changed the course of the disease. In particular, due to the high sensitivity of prostate cancer cells to hormone depletion, several agents able to inhibit hormone production or binding to the nuclear receptor have been evaluated and adopted in clinical practice. However, despite several hormonal treatments being available nowadays for the management of advanced or metastatic prostate cancer, the natural history of the disease leads inexorably to the development of resistance to hormone inhibition. Findings regarding the mechanisms that drive this process are of particular and increasing interest, as these are potentially related to the identification of new targetable pathways and to the development of new drugs able to improve our patients’ clinical outcomes.
DISTRIBUTED ESTIMATION OF PRINCIPAL EIGENSPACES
Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that account for the most variation in the data. When data are stored across multiple machines, however, communication cost can prohibit the computation of PCA in a central location, and distributed algorithms for PCA are thus needed. This paper proposes and studies a distributed PCA algorithm: each node machine computes the top K eigenvectors and transmits them to the central server; the central server then aggregates the information from all the node machines and conducts a PCA based on the aggregated information. We investigate the bias and variance for the resulting distributed estimator of the top K eigenvectors. In particular, we show that for distributions with symmetric innovation, the empirical top eigenspaces are unbiased, and hence the distributed PCA is “unbiased.” We derive the rate of convergence for distributed PCA estimators, which depends explicitly on the effective rank of covariance, eigengap, and the number of machines. We show that when the number of machines is not unreasonably large, the distributed PCA performs as well as the whole-sample PCA, even without full access to the whole data. The theoretical results are verified by an extensive simulation study. We also extend our analysis to the heterogeneous case where the population covariance matrices are different across local machines but share similar top eigenstructures.
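The scheme described, where local top-K eigenvectors are sent to a central server and aggregated, can be sketched as below. Here the server aggregates by averaging the local projection matrices V Vᵀ, one common aggregation rule; the paper's exact estimator may differ:

```python
import numpy as np

def local_top_k(X, K):
    """One node machine: top-K eigenvectors of the local sample covariance."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / len(X))   # ascending order
    return vecs[:, np.argsort(vals)[::-1][:K]]

def distributed_pca(chunks, K):
    """Central server: average the local projections V @ V.T and take
    the top-K eigenvectors of the average."""
    d = chunks[0].shape[1]
    P = np.zeros((d, d))
    for X in chunks:
        V = local_top_k(X, K)
        P += V @ V.T
    P /= len(chunks)
    vals, vecs = np.linalg.eigh(P)
    return vecs[:, np.argsort(vals)[::-1][:K]]

# spiked covariance: the first two coordinates carry 10x the variance
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 20))
X[:, :2] *= np.sqrt(10.0)
V_hat = distributed_pca(np.split(X, 5), K=2)   # five "machines"

# cosines of the principal angles to the true top-2 subspace
cosines = np.linalg.svd(V_hat.T @ np.eye(20)[:, :2], compute_uv=False)
```

Only the d × K eigenvector matrices cross the network, not the raw data, which is what makes the communication cost manageable.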
Denoising of diffusion MRI using random matrix theory
We introduce and evaluate a post-processing technique for fast denoising of diffusion-weighted MR images. By exploiting the intrinsic redundancy in diffusion MRI using universal properties of the eigenspectrum of random covariance matrices, we remove noise-only principal components, thereby enabling signal-to-noise ratio enhancements. This yields parameter maps of improved quality for visual, quantitative, and statistical interpretation. By studying statistics of residuals, we demonstrate that the technique suppresses local signal fluctuations that solely originate from thermal noise rather than from other sources such as anatomical detail. Furthermore, we achieve improved precision in the estimation of diffusion parameters and fiber orientations in the human brain without compromising the accuracy and spatial resolution.
  • Denoising enhances the image quality for improved visual, quantitative, and statistical interpretation.
  • Random matrix theory enables a data-driven threshold for PCA denoising.
  • The Marchenko-Pastur distribution is a universal signature of noise.
  • The technique suppresses signal fluctuations that solely originate in thermal noise.
  • Precision of diffusion parameter estimators increases without lowering accuracy.
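The core idea, discarding eigenvalues consistent with pure noise under the Marchenko-Pastur law, can be sketched as follows. For simplicity the noise level σ is assumed known here, whereas the actual method derives a data-driven threshold from the eigenspectrum itself:

```python
import numpy as np

def mp_pca_denoise(X, sigma):
    """Keep only principal components whose eigenvalues exceed the
    Marchenko-Pastur upper edge for noise of variance sigma**2."""
    n, m = X.shape                       # n voxels x m diffusion measurements
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    evals = s ** 2 / n                   # sample covariance eigenvalues
    edge = sigma ** 2 * (1.0 + np.sqrt(m / n)) ** 2   # MP upper edge
    keep = evals > edge                  # signal-carrying components
    denoised = mean + (U[:, keep] * s[keep]) @ Vt[keep]
    return denoised, int(keep.sum())

# rank-3 "signal" buried in unit-variance noise
rng = np.random.default_rng(3)
signal = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 60))
noisy = signal + rng.normal(size=(500, 60))
denoised, kept = mp_pca_denoise(noisy, sigma=1.0)
```

Components below the edge are indistinguishable from noise under the Marchenko-Pastur law, so dropping them suppresses thermal-noise fluctuations while leaving the signal subspace intact.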
Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data
We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.