Catalogue Search | MBRL

BHPMF – a hierarchical Bayesian approach to gap‐filling and trait prediction for macroecology and functional biogeography

by Wright, Ian J. , Wirth, Christian B. , Dickie, John in artificial intelligence , Bayesian analysis , Bayesian hierarchical model

2015

AIM: Functional traits of organisms are key to understanding and predicting biodiversity and ecological change, which motivates continuous collection of traits and their integration into global databases. Such trait matrices are inherently sparse, severely limiting their usefulness for further analyses. On the other hand, traits are characterized by the phylogenetic trait signal, trait–trait correlations and environmental constraints, all of which provide information that could be used to statistically fill gaps. We propose the application of probabilistic models which, for the first time, utilize all three characteristics to fill gaps in trait databases and predict trait values at larger spatial scales. INNOVATION: For this purpose we introduce BHPMF, a hierarchical Bayesian extension of probabilistic matrix factorization (PMF). PMF is a machine learning technique which exploits the correlation structure of sparse matrices to impute missing entries. BHPMF additionally utilizes the taxonomic hierarchy for trait prediction and provides uncertainty estimates for each imputation. In combination with multiple regression against environmental information, BHPMF allows for extrapolation from point measurements to larger spatial scales. We demonstrate the applicability of BHPMF in ecological contexts, using different plant functional trait datasets, also comparing results to taking the species mean and PMF. MAIN CONCLUSIONS: Sensitivity analyses validate the robustness and accuracy of BHPMF: our method captures the correlation structure of the trait matrix as well as the phylogenetic trait signal – also for extremely sparse trait matrices – and provides a robust measure of confidence in prediction accuracy for each missing entry. The combination of BHPMF with environmental constraints provides a promising concept to extrapolate traits beyond sampled regions, accounting for intraspecific trait variability. 
We conclude that BHPMF and its derivatives have a high potential to support future trait‐based research in macroecology and functional biogeography.
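As a rough illustration of the plain PMF idea mentioned in the abstract (using the correlation structure of a sparse matrix to impute missing entries), the sketch below fits a low-rank factorization to the observed cells only and uses it to fill the gaps. It is not the authors' BHPMF: it has no taxonomic hierarchy and no uncertainty estimates, and it uses a simple regularized stochastic gradient fit rather than Bayesian inference. The function name `pmf_impute`, its parameters, and the toy matrix are all assumptions made for this example.

```python
import random


def pmf_impute(matrix, rank=1, lam=0.01, lr=0.01, epochs=3000, seed=0):
    """Fill None entries of `matrix` with a regularized low-rank fit (hypothetical helper)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Small random latent factors: one vector per row (e.g. plant) and per column (e.g. trait).
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j, matrix[i][j])
                for i in range(n_rows) for j in range(n_cols)
                if matrix[i][j] is not None]

    def predict(i, j):
        return sum(U[i][k] * V[j][k] for k in range(rank))

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, x in observed:
            err = x - predict(i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error plus an L2 penalty on the factors.
                U[i][k] += lr * (err * v - lam * u)
                V[j][k] += lr * (err * u - lam * v)

    # Keep observed values; replace only the gaps with model predictions.
    return [[matrix[i][j] if matrix[i][j] is not None else predict(i, j)
             for j in range(n_cols)] for i in range(n_rows)]


# Toy data (invented): an exactly rank-1 "trait matrix" with two hidden cells.
rows = [1.0, 2.0, 3.0, 4.0]
cols = [1.0, 0.5, 2.0]
full = [[r * c for c in cols] for r in rows]
sparse = [row[:] for row in full]
sparse[0][2] = None  # true value 2.0
sparse[3][0] = None  # true value 4.0
filled = pmf_impute(sparse)
```

On this toy matrix the two hidden cells can be recovered because every row and column still has observed entries; real trait matrices are far sparser and noisier, which is the setting the hierarchical extension is meant to address.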

Journal Article

Mapping local and global variability in plant trait distributions

by Spasojevic, Marko J. , González-Melo, Andrés , Laughlin, Daniel C. in 60 APPLIED LIFE SCIENCES , Bayesian analysis , Bayesian modeling

2017

Our ability to understand and predict the response of ecosystems to a changing environment depends on quantifying vegetation functional diversity. However, representing this diversity at the global scale is challenging. Typically, in Earth system models, characterization of plant diversity has been limited to grouping related species into plant functional types (PFTs), with all trait variation in a PFT collapsed into a single mean value that is applied globally. Using the largest global plant trait database and state-of-the-art Bayesian modeling, we created fine-grained global maps of plant trait distributions that can be applied to Earth system models. Focusing on a set of plant traits closely coupled to photosynthesis and foliar respiration, namely specific leaf area (SLA) and dry mass-based concentrations of leaf nitrogen (N_m) and phosphorus (P_m), we characterize how traits vary within and among over 50,000 cells of approximately 50 × 50 km across the entire vegetated land surface. We do this in several ways: without defining the PFT of each grid cell, and using 4 or 14 PFTs; each model's predictions are evaluated against out-of-sample data. This endeavor advances prior trait mapping by generating global maps that preserve variability across scales by using modern Bayesian spatial statistical modeling in combination with a database over three times larger than that in previous analyses. Our maps reveal that the most diverse grid cells possess trait variability close to the range of global PFT means.

Journal Article

Robustness of trait connections across environmental gradients and growth forms

by Anand, Madhur , Laughlin, Daniel C. , Flores-Moreno, Habacuc in Arid regions , cold , Correlation

2019

Aim: Plant trait databases often contain traits that are correlated, but for which direct (undirected statistical dependency) and indirect (mediated by other traits) connections may be confounded. The confounding of correlation and connection hinders our understanding of plant strategies, and how these vary among growth forms and climate zones. We identified the direct and indirect connections across plant traits relevant to competition, resource acquisition and reproductive strategies using a global database and explored whether connections within and between traits from different tissue types vary across climates and growth forms. Location: Global. Major taxa studied: Plants. Time period: Present. Methods: We used probabilistic graphical models and a database of 10 plant traits (leaf area, specific leaf area, mass‐ and area‐based leaf nitrogen and phosphorus content, leaf life span, plant height, stem specific density and seed mass) with 16,281 records to describe direct and indirect connections across woody and non‐woody plants across tropical, temperate, arid, cold and polar regions. Results: Trait networks based on direct connections are sparser than those based on correlations. Land plants had high connectivity across traits within and between tissue types; leaf life span and stem specific density shared direct connections with all other traits. For both growth forms, two groups of traits form modules of more highly connected traits; one related to resource acquisition, the other to plant architecture and reproduction. Woody species had higher trait network modularity in polar compared to temperate and tropical climates, while non‐woody species did not show significant differences in modularity across climate regions. Main conclusions: Plant traits are highly connected both within and across tissue types, yet traits segregate into persistent modules of traits. 
Variation in the modularity of trait networks suggests that trait connectivity is shaped by prevailing environmental conditions and demonstrates that plants of different growth forms use alternative strategies to cope with local conditions.

Journal Article

Probabilistic Structured Models for Plant Trait Analysis

by Fazayeli, Farideh in Computer science

2017

Many fields in modern science and engineering such as ecology, computational biology, astronomy, signal processing, climate science, brain imaging, natural language processing, and many more involve collecting data sets in which the dimensionality of the data p exceeds the sample size n. Since it is usually impossible to obtain consistent procedures unless p < n, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse structured graphical models, low-rank matrices, and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with some regularization function that encourages the assumed structure. Of particular interest is structure learning of graphical models in the high-dimensional setting. The majority of statistical analyses of graphical model estimation assume that all the data are fully observed and that the data points are sampled from the same distribution, and they provide the sample complexity and convergence rate by considering only one graphical structure for all the observations. In this thesis, we extend the above results to estimate the structure of graphical models where the data is partially observed or the data is sampled from multiple distributions. First, we consider the problem of estimating change in the dependency structure of two p-dimensional models, based on samples drawn from two graphical models. The change is assumed to be structured, e.g., sparse, block sparse, node-perturbed sparse, etc., such that it can be characterized by a suitable (atomic) norm. We present and analyze a norm-regularized estimator for directly estimating the change in structure, without having to estimate the structures of the individual graphical models. 
Next, we consider the problem of estimating the sparse structure of Gaussian copula distributions (corresponding to non-paranormal distributions) using samples with missing values. We prove that our proposed estimators consistently estimate the non-paranormal correlation matrix, where the convergence rate depends on the probability of missing values. In the second part of the thesis, we consider the matrix completion problem. Low-rank matrix completion methods have been successful in a variety of settings such as recommendation systems. However, most of the existing matrix completion methods only provide a point estimate of missing entries, and do not characterize uncertainties of the predictions. First, we illustrate that the posterior distribution in latent factor models, such as probabilistic matrix factorization, when marginalized over one latent factor has the Matrix Generalized Inverse Gaussian (MGIG) distribution. We show that the MGIG is unimodal, and the mode can be obtained by solving an Algebraic Riccati Equation. The characterization leads to a novel Collapsed Monte Carlo inference algorithm for such latent factor models. Next, we propose a Bayesian hierarchical probabilistic matrix factorization (BHPMF) model to 1) incorporate hierarchical side information, and 2) provide uncertainty quantified predictions. The former yields significant performance improvements in the problem of plant trait prediction, a key problem in ecology, by leveraging the taxonomic hierarchy in the plant kingdom. The latter is helpful in identifying predictions of low confidence which can in turn be used to guide field work for data collection efforts. Finally, we consider applications of probabilistic structured models to plant trait analysis. We apply the BHPMF model to fill the gaps in the TRY database. The BHPMF model is the state-of-the-art model for plant trait prediction and is gaining visibility and usage in plant trait analysis. 
We have submitted an R package for BHPMF to CRAN. Next, we apply the Gaussian graphical model structure estimators to obtain the trait-trait interactions. We study the structure of trait-trait interactions in different climate zones and among different plant growth forms and uncover the dependence of traits on climate and on vegetation.

Dissertation

Algorithms for correcting next generation sequencing errors

by Fazayeli, Farideh in Computer science

2011

Dissertation

The Matrix Generalized Inverse Gaussian Distribution: Properties and Applications

by Fazayeli, Farideh , Banerjee, Arindam in Algorithms , Generalized inverse , Importance sampling

2016

While the Matrix Generalized Inverse Gaussian (\(\mathcal{MGIG}\)) distribution arises naturally in some settings as a distribution over symmetric positive semi-definite matrices, certain key properties of the distribution and effective ways of sampling from the distribution have not been carefully studied. In this paper, we show that the \(\mathcal{MGIG}\) is unimodal, and the mode can be obtained by solving an Algebraic Riccati Equation (ARE) [7]. Based on this property, we propose an importance sampling method for the \(\mathcal{MGIG}\) where the mode of the proposal distribution matches that of the target. The proposed sampling method is more efficient than existing approaches [32, 33], which use proposal distributions that may have the mode far from the \(\mathcal{MGIG}\)'s mode. Further, we illustrate that the posterior distribution in latent factor models, such as probabilistic matrix factorization (PMF) [25], when marginalized over one latent factor has the \(\mathcal{MGIG}\) distribution. The characterization leads to a novel Collapsed Monte Carlo (CMC) inference algorithm for such latent factor models. We illustrate that CMC has a lower log loss or perplexity than MCMC, and needs fewer samples.

Paper

Generalized Direct Change Estimation in Ising Model Structure

by Fazayeli, Farideh , Banerjee, Arindam in Dependence , Estimation , Ising model

2016

We consider the problem of estimating change in the dependency structure between two \(p\)-dimensional Ising models, based on respectively \(n_1\) and \(n_2\) samples drawn from the models. The change is assumed to be structured, e.g., sparse, block sparse, node-perturbed sparse, etc., such that it can be characterized by a suitable (atomic) norm. We present and analyze a norm-regularized estimator for directly estimating the change in structure, without having to estimate the structures of the individual Ising models. The estimator can work with any norm, and can be generalized to other graphical models under mild assumptions. We show that only one set of samples, say \(n_2\), needs to satisfy the sample complexity requirement for the estimator to work, and the estimation error decreases as \(\frac{c}{\sqrt{\min(n_1,n_2)}}\), where \(c\) depends on the Gaussian width of the unit norm ball. For example, for \(\ell_1\) norm applied to \(s\)-sparse change, the change can be accurately estimated with \(\min(n_1,n_2)=O(s \log p)\) which is sharper than an existing result \(n_1= O(s^2 \log p)\) and \(n_2 = O(n_1^2)\). Experimental results illustrating the effectiveness of the proposed estimator are presented.
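To get a feel for how far apart the two sample-size scalings quoted in the abstract are, the short sketch below evaluates the terms inside the O(·) expressions for one example setting. The hidden constants are ignored, so these are orders of magnitude and not actual sample-size requirements; the function name `scaling_terms` and the values s = 10, p = 1000 are assumptions chosen for illustration.

```python
import math


def scaling_terms(s, p):
    """Return the leading terms of the quoted bounds, ignoring O(.) constants (hypothetical helper)."""
    direct = s * math.log(p)            # min(n1, n2) = O(s log p)
    earlier_n1 = s ** 2 * math.log(p)   # earlier result: n1 = O(s^2 log p)
    earlier_n2 = earlier_n1 ** 2        # earlier result: n2 = O(n1^2)
    return direct, earlier_n1, earlier_n2


direct, earlier_n1, earlier_n2 = scaling_terms(s=10, p=1000)
print(f"s log p     ~ {direct:.0f}")
print(f"s^2 log p   ~ {earlier_n1:.0f}")
print(f"(s^2 log p)^2 ~ {earlier_n2:.0f}")
```

For this setting the direct bound scales with a term of a few tens, while the earlier requirement on \(n_2\) scales with a term in the hundreds of thousands.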

Paper

Estimation with Norm Regularization

by Chen, Sheng , Banerjee, Arindam , Fazayeli, Farideh in Complexity , Error analysis , Estimating techniques

2015

Analysis of non-asymptotic estimation error and structured statistical recovery based on norm regularized regression, such as Lasso, needs to consider four aspects: the norm, the loss function, the design matrix, and the noise model. This paper presents generalizations of such estimation error analysis on all four aspects compared to the existing literature. We characterize the restricted error set where the estimation error vector lies, establish relations between error sets for the constrained and regularized problems, and present an estimation error bound applicable to any norm. Precise characterizations of the bound are presented for isotropic as well as anisotropic subGaussian design matrices, subGaussian noise models, and convex loss functions, including least squares and generalized linear models. Generic chaining and associated results play an important role in the analysis. A key result from the analysis is that the sample complexity of all such estimators depends on the Gaussian width of a spherical cap corresponding to the restricted error set. Further, once the number of samples \(n\) crosses the required sample complexity, the estimation error decreases as \(\frac{c}{\sqrt{n}}\), where \(c\) depends on the Gaussian width of the unit norm ball.

Paper
