Catalogue Search | MBRL

Point Event Cluster Detection via the Bayesian Generalized Fused Lasso

by Inoue, Ryo , Masuda, Ryo in Algorithms , area , Bayesian analysis

2022

Spatial cluster detection is one of the focus areas of spatial analysis; its objective is to identify clusters in spatial distributions of point events aggregated in districts with small areas. Choi et al. (2018) formulated cluster detection as a parameter estimation problem to leverage the parameter selection capability of the sparse modeling method called the generalized fused lasso. Although this work is superior to conventional methods for detecting multiple clusters, its estimation results are limited to point estimates. This study therefore extended the above work as a Bayesian cluster detection method to describe the probabilistic variations of clustering results. The proposed method combines multiple sparsity-inducing priors and encourages sparse solutions induced by the generalized fused lasso. Evaluations were performed with simulated and real-world distributions of point events to demonstrate that the proposed method provides new information on the quantified reliabilities of clustering results at the district level while achieving detection performance comparable to that of the previous work.
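As a rough illustration of the kind of penalty the generalized fused lasso applies, the sketch below evaluates an objective that combines a data-fit term, an L1 penalty on each district parameter, and an L1 penalty on differences between adjacent districts. The function name, the squared-error loss, and the toy adjacency list are assumptions made for illustration; they are not taken from Choi et al. (2018) or from the article above, and the Bayesian method itself is not reproduced here.

```python
def gfl_objective(y, theta, edges, lam_sparse, lam_fuse):
    """Evaluate a generalized fused lasso objective on a district graph.

    y          -- observed value per district (hypothetical data)
    theta      -- candidate parameter per district
    edges      -- list of (i, j) index pairs for adjacent districts
    lam_sparse -- weight of the L1 penalty on each parameter
    lam_fuse   -- weight of the L1 penalty on neighbour differences
    """
    # Squared-error data fit (an assumed loss, chosen for simplicity).
    fit = 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, theta))
    # Sparsity term: pushes individual parameters toward zero.
    sparse = lam_sparse * sum(abs(t) for t in theta)
    # Fusion term: pushes adjacent districts toward equal values.
    fuse = lam_fuse * sum(abs(theta[i] - theta[j]) for i, j in edges)
    return fit + sparse + fuse


# Four districts in a chain; districts 1 and 2 form a candidate cluster.
y = [0.1, 2.0, 2.2, -0.1]
edges = [(0, 1), (1, 2), (2, 3)]
theta = [0.0, 2.0, 2.0, 0.0]
value = gfl_objective(y, theta, edges, lam_sparse=0.5, lam_fuse=0.25)
```

Districts whose parameters are fused to a common nonzero value can be read as one cluster, which is why the fusion term operates on the adjacency graph rather than on a fixed ordering.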

Journal Article

The joint graphical lasso for inverse covariance estimation across multiple classes

by Wang, Pei , Witten, Daniela M. , Danaher, Patrick in Algorithms , Alternating directions method of multipliers , Analysis of covariance

2014

We consider the problem of estimating multiple related Gaussian graphical models from a high-dimensional data set with observations belonging to distinct classes. We propose the joint graphical lasso, which borrows strength across the classes to estimate multiple graphical models that share certain characteristics, such as the locations or weights of non‐zero edges. Our approach is based on maximizing a penalized log‐likelihood. We employ generalized fused lasso or group lasso penalties and implement a fast alternating directions method of multipliers algorithm to solve the corresponding convex optimization problems. The performance of the proposed method is illustrated through simulated and real data examples.
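The penalized log-likelihood described in the abstract can be sketched in the following general form. The notation is assumed here, since the abstract defines none: $K$ classes, $n_k$ observations in class $k$, empirical covariance $S^{(k)}$, precision matrix $\Theta^{(k)}$ with entries $\theta^{(k)}_{ij}$, and tuning parameters $\lambda_1, \lambda_2$. The two penalties shown are the usual fused and group forms, given as an illustration rather than as a quotation from the article.

```latex
\begin{aligned}
&\underset{\{\Theta\}}{\text{maximize}}\quad
  \sum_{k=1}^{K} n_k\left[\log\det\Theta^{(k)}
  - \operatorname{tr}\!\left(S^{(k)}\Theta^{(k)}\right)\right]
  - P(\{\Theta\}),\\[4pt]
&P_{\text{fused}}(\{\Theta\})
  = \lambda_1\sum_{k=1}^{K}\sum_{i\neq j}\left|\theta^{(k)}_{ij}\right|
  + \lambda_2\sum_{k<k'}\sum_{i,j}\left|\theta^{(k)}_{ij}-\theta^{(k')}_{ij}\right|,\\[4pt]
&P_{\text{group}}(\{\Theta\})
  = \lambda_1\sum_{k=1}^{K}\sum_{i\neq j}\left|\theta^{(k)}_{ij}\right|
  + \lambda_2\sum_{i\neq j}\sqrt{\sum_{k=1}^{K}\left(\theta^{(k)}_{ij}\right)^{2}}.
\end{aligned}
```

In this form the fused penalty encourages both the locations and the weights of edges to agree across classes, while the group penalty encourages only a shared pattern of non-zero edges.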

Journal Article

Estimating DNA methylation levels by joint modeling of multiple methylation profiles from microarray data

by Zhao, Hongyu , Chen, Mengjie , Wang, Tao in Animals , BIOMETRIC METHODOLOGY , biometry

2016

DNA methylation studies have been revolutionized by the recent development of high throughput array-based platforms. Most of the existing methods analyze microarray methylation data on a probe-by-probe basis, ignoring probe-specific effects and correlations among methylation levels at neighboring genomic locations. These methods can potentially miss functionally relevant findings associated with genomic regions. In this article, we propose a statistical model that allows us to pool information on the same probe across multiple samples to estimate the probe affinity effect, and to borrow strength from the neighboring probe sites to better estimate the methylation values. Using a simulation study, we demonstrate that our method can provide accurate model-based estimates. We further use the proposed method to develop a new procedure for detecting differentially methylated regions, and compare it with a state-of-the-art approach via a data application.

Journal Article

Reconstructing DNA copy number by joint segmentation of multiple sequences

by Sabatti, Chiara , Zhang, Zhongyang , Lange, Kenneth in Algorithms , Analysis , Arrays

2012

Background: Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual.
Results: We present a segmentation method named generalized fused lasso (GFL) to reconstruct copy number variant regions. GFL is based on penalized estimation and is capable of processing multiple signals jointly. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets.
Conclusions: The flexibility of our framework makes it applicable to data obtained with a wide range of technology. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets.

Journal Article

Efficient Implementations of the Generalized Lasso Dual Path Algorithm

by Tibshirani, Ryan J. , Arnold, Taylor B. in Algorithms , Fused lasso , Laplacian linear systems

2016

We consider efficient implementations of the generalized lasso dual path algorithm given by Tibshirani and Taylor in 2011. We first describe a generic approach that covers any penalty matrix D and any (full column rank) matrix X of predictor variables. We then describe fast implementations for the special cases of trend filtering problems, fused lasso problems, and sparse fused lasso problems, both with X = I and a general matrix X. These specialized implementations offer a considerable improvement over the generic implementation, both in terms of numerical stability and efficiency of the solution path computation. These algorithms are all available for use in the genlasso R package, which can be found in the CRAN repository.
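To make the role of the penalty matrix D concrete, the sketch below builds the first-difference matrix used for a one-dimensional fused lasso and evaluates the generalized lasso objective 0.5 * ||y - beta||^2 + lambda * ||D beta||_1 for the X = I case. The function names are hypothetical and this is plain Python for illustration; it does not use or imitate the genlasso R package, and it evaluates the objective only rather than computing the dual path.

```python
def first_difference_matrix(n):
    """Return the (n - 1) x n first-difference matrix D as nested lists.

    Row i has -1 in column i and +1 in column i + 1, so (D beta)[i]
    equals beta[i + 1] - beta[i].
    """
    return [
        [-1 if j == i else 1 if j == i + 1 else 0 for j in range(n)]
        for i in range(n - 1)
    ]


def generalized_lasso_objective(y, beta, D, lam):
    """Evaluate 0.5 * ||y - beta||^2 + lam * ||D beta||_1 (X = I)."""
    fit = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    penalty = lam * sum(
        abs(sum(d * b for d, b in zip(row, beta))) for row in D
    )
    return fit + penalty


D = first_difference_matrix(4)
y = [1.0, 1.0, 3.0, 3.0]
beta = [1.0, 1.0, 3.0, 3.0]  # piecewise constant with a single jump
objective = generalized_lasso_objective(y, beta, D, lam=2.0)
```

Swapping in a different D (for example, second differences for trend filtering) changes the structure that the penalty favours while leaving the objective's form unchanged.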

Journal Article

Spatial Homogeneity Pursuit of Regression Coefficients for Large Datasets

by Sang, Huiyan , Li, Furong in Applications and Case Studies , basins , Change detection

2019

Spatial regression models have been widely used to describe the relationship between a response variable and some explanatory variables over a region of interest, taking into account the spatial dependence of the observations. In many applications, relationships between response variables and covariates are expected to exhibit complex spatial patterns. We propose a new approach, referred to as spatially clustered coefficient (SCC) regression, to detect spatially clustered patterns in the regression coefficients. It incorporates spatial neighborhood information through a carefully constructed regularization to automatically detect change points in space and to achieve computational scalability. Our numerical studies suggest that SCC works very effectively, capturing not only clustered coefficients, but also smoothly varying coefficients because of its strong local adaptivity. This flexibility allows researchers to explore various spatial structures in regression coefficients. We also establish theoretical properties of SCC. We use SCC to explore the relationship between the temperature and salinity of sea water in the Atlantic basin; this can provide important insights about the evolution of individual water masses and the pathway and strength of meridional overturning circulation in oceanography. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

Journal Article

Convex Biclustering

by Baraniuk, Richard G. , Allen, Genevera I. , Chi, Eric C. in Algorithms , BIOMETRIC METHODOLOGY , biometry

2017

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees—features that arguably are lacking in the current alternative algorithms. We demonstrate the advantages of our approach, which includes stably and reproducibly identifying biclusterings, on simulated and real microarray data.

Journal Article

Modeling disease progression via multi-task learning

by Ye, Jieping , Zhou, Jiayu , Liu, Jun in ADAS-Cog , Adult and adolescent clinical studies , Aged

2013

Alzheimer's disease (AD), the most common type of dementia, is a severe neurodegenerative disorder. Identifying biomarkers that can track the progress of the disease has recently received increasing attention in AD research. An accurate prediction of disease progression would facilitate optimal decision-making for clinicians and patients. A definitive diagnosis of AD requires autopsy confirmation; thus, many clinical/cognitive measures, including the Mini Mental State Examination (MMSE) and the Alzheimer's Disease Assessment Scale cognitive subscale (ADAS-Cog), have been designed to evaluate the cognitive status of patients and are used as important criteria for the clinical diagnosis of probable AD. In this paper, we consider the problem of predicting disease progression measured by the cognitive scores and selecting biomarkers predictive of the progression. Specifically, we formulate the prediction problem as a multi-task regression problem by considering the prediction at each time point as a task and propose two novel multi-task learning formulations. We have performed extensive experiments using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, we use the baseline MRI features to predict MMSE/ADAS-Cog scores in the next 4 years. Results demonstrate the effectiveness of the proposed multi-task learning formulations for disease progression in comparison with single-task learning algorithms including ridge regression and Lasso. We also perform longitudinal stability selection to identify and analyze the temporal patterns of biomarkers in disease progression. We observe that cortical thickness average of left middle temporal, cortical thickness average of left and right Entorhinal, and white matter volume of left Hippocampus play significant roles in predicting ADAS-Cog at all time points. We also observe that several MRI biomarkers provide significant information for predicting MMSE scores for the first 2 years; however, very few are shown to be significant in predicting MMSE scores at later stages. The lack of predictive MRI biomarkers in later stages may contribute to the lower prediction performance of MMSE than that of ADAS-Cog in our study and other related studies.
• Ability to simultaneously learn and predict disease status at multiple time points.
• Two multi-task learning formulations for high dimensional data.
• Longitudinal stability selection to analyze the dynamic patterns of biomarkers.
• Detailed comparison among different methods of disease progression on ADNI data.

Journal Article

A Unified Framework for Change Point Detection in High-Dimensional Linear Models

by Bai, Yue , Safikhani, Abolfazl in HIGH-DIMENSIONAL STATISTICS

2023

Although change-point detection for high-dimensional data has become increasingly important in many scientific fields, most existing methods are designed for specific models (e.g., mean shift model, vector auto-regressive model, graphical model). Here, we provide a unified framework for structural break detection that is suitable for a large class of models. Moreover, we propose a three-step algorithm that automatically achieves consistent parameter estimates during the change-point detection process, without needing to refit the model. The first step combines the block segmentation strategy and a fused lasso-based estimation criterion, leading to significant computational gains, without compromising the statistical accuracy of identifying the number and location of the structural breaks. Then, we use hard-thresholding and exhaustive search steps to consistently estimate the number and location of the break points. We prove strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and provide consistent estimates of the model parameters. The findings of our numerical studies support the theory and validate the competitive performance of the algorithm for a wide range of models. The proposed algorithm is implemented in the R package LinearDetect.

Journal Article

False Discovery Rate Smoothing

by Koyejo, Oluwasanmi , Tansey, Wesley , Poldrack, Russell A. in Algorithms , data collection , Discovery

2018

We present false discovery rate (FDR) smoothing, an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. FDR smoothing automatically finds spatially localized regions of significant test statistics. It then relaxes the threshold of statistical significance within these regions, and tightens it elsewhere, in a manner that controls the overall false discovery rate at a given level. This results in increased power and cleaner spatial separation of signals from noise. The approach requires solving a nonstandard high-dimensional optimization problem, for which an efficient augmented-Lagrangian algorithm is presented. In simulation studies, FDR smoothing exhibits state-of-the-art performance at modest computational cost. In particular, it is shown to be far more robust than existing methods for spatially dependent multiple testing. We also apply the method to a dataset from an fMRI experiment on spatial working memory, where it detects patterns that are much more biologically plausible than those detected by standard FDR-controlling methods. All code for FDR smoothing is publicly available in Python and R (https://github.com/tansey/smoothfdr). Supplementary materials for this article are available online.
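For context on what a single global significance threshold looks like, the sketch below implements the Benjamini-Hochberg step-up procedure, one widely used standard FDR-controlling method. It is a baseline illustration only: it is not FDR smoothing, it ignores spatial structure entirely, and it is unrelated to the smoothfdr code linked above. The function name and the example p-values are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected by BH.

    The procedure sorts the p-values, finds the largest rank r with
    p_(r) <= alpha * r / m, and rejects the r smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


p_values = [0.001, 0.045, 0.025, 0.2, 0.012]
decisions = benjamini_hochberg(p_values, alpha=0.05)
```

Every hypothesis here is compared against the same sequence of thresholds regardless of location, which is the behaviour that a spatially adaptive method relaxes inside detected regions and tightens elsewhere.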

Journal Article
