Catalogue Search | MBRL

IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies

by von Haeseler, Arndt; Bui, Quang Minh; Schmidt, Heiko A. in Algorithms, Intelligence, Perturbation methods

2015

Large phylogenomic data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but because they rely on heuristics to find optimal trees, it is not clear whether the best tree is found. Thus, there is a need for additional approaches that employ different search strategies to find ML trees and that are at the same time as fast as currently available ML programs. We show that a combination of hill-climbing approaches and a stochastic perturbation method can be implemented time-efficiently. When given the same CPU time as RAxML and PhyML, our software IQ-TREE found higher likelihoods in 62.2% to 87.1% of the studied alignments, thus efficiently exploring the tree space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the proportion of alignments for which IQ-TREE obtains higher likelihoods then rises to 73.3–97.1%. IQ-TREE is freely available at http://www.cibiv.at/software/iqtree.

Journal Article
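The search idea in this abstract, alternating hill climbing with random perturbations while keeping the best solution found, can be illustrated with a minimal generic sketch. It is not IQ-TREE's tree search: IQ-TREE operates on tree topologies scored by likelihood, whereas here bit strings and an arbitrary score function stand in for trees and likelihoods. The perturbation strength and the stopping parameter `max_unsuccessful` (stop after that many consecutive perturbations without improvement) are hypothetical choices, loosely modeled on the idea of a stopping rule.

```python
import random

def hill_climb(x, score):
    """Greedy single-bit-flip hill climbing: keep any flip that improves the score."""
    best = score(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1                  # try flipping bit i
            s = score(x)
            if s > best:
                best = s
                improved = True        # keep the flip and continue scanning
            else:
                x[i] ^= 1              # undo the flip
    return x, best

def perturbed_search(n, score, strength=3, max_unsuccessful=50, seed=0):
    """Alternate random perturbation and hill climbing; stop after
    `max_unsuccessful` consecutive perturbations that fail to beat the best score.
    All parameters are hypothetical illustration values."""
    rng = random.Random(seed)
    best_x, best_s = hill_climb([rng.randint(0, 1) for _ in range(n)], score)
    misses = 0
    while misses < max_unsuccessful:
        cand = best_x[:]                            # perturb a copy of the best solution
        for i in rng.sample(range(n), strength):    # random kick: flip `strength` bits
            cand[i] ^= 1
        cand, s = hill_climb(cand, score)           # re-optimize locally
        if s > best_s:
            best_x, best_s, misses = cand, s, 0
        else:
            misses += 1
    return best_x, best_s
```

The perturbation lets the search escape local optima that pure hill climbing would get stuck in, while the consecutive-miss counter gives a simple, tunable stopping criterion.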

Computationally Efficient Composite Likelihood Statistics for Demographic Inference

by Gravel, Simon; Hsieh, Ping Hsun; Gutenkunst, Ryan N. in Computational efficiency, Computing time, Demographics

2016

Many population genetics tools employ composite likelihoods because fully modeling genomic linkage is challenging. But traditional approaches to estimating parameter uncertainties and performing model selection require full likelihoods, so these tools have relied on computationally expensive maximum-likelihood estimation (MLE) on bootstrapped data. Here, we demonstrate that statistical theory can be applied to adjust composite likelihoods and perform robust, computationally efficient statistical inference in two demographic inference tools: ∂a∂i and TRACTS. On both simulated and real data, the adjustments perform comparably to MLE bootstrapping while using orders of magnitude less computational time.

Journal Article

A modern maximum-likelihood theory for high-dimensional logistic regression

by Candès, Emmanuel J.; Sur, Pragya in Asymptotic properties, Bias, Maximum likelihood estimates

2019

Students in statistics or data science usually learn early on that, when the sample size n is large relative to the number of variables p, fitting a logistic model by maximum likelihood produces consistent estimates, and that well-known formulas quantify the variability of these estimates for the purpose of statistical inference. We are often told that these calculations are approximately valid if we have 5 to 10 observations per unknown parameter. This paper shows that this is far from the case and, consequently, that inferences produced by common software packages are often unreliable. Consider a logistic model with independent features in which n and p grow large at a fixed ratio. We prove that (i) the maximum-likelihood estimate (MLE) is biased, (ii) the variability of the MLE is far greater than classically estimated, and (iii) the likelihood-ratio test (LRT) statistic does not follow a χ² distribution. The bias of the MLE yields wrong predictions of the probability of a case given the observed values of the covariates. We present a theory that provides explicit expressions for the asymptotic bias and variance of the MLE and for the asymptotic distribution of the LRT. We empirically demonstrate that these results are accurate in finite samples. Our results depend only on a single measure of signal strength, which leads to concrete proposals for obtaining accurate inference in finite samples through an estimate of this measure.

Journal Article

Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models

by Wood, Simon N. in Adaptive smoothing, Approximation, Convergence

2011

Recent work by Reiss and Ogden provides a theoretical basis for sometimes preferring restricted maximum likelihood (REML) to generalized cross-validation (GCV) for smoothing parameter selection in semiparametric regression. However, existing REML or marginal likelihood (ML) based methods for semiparametric generalized linear models (GLMs) use iterative REML or ML estimation of the smoothing parameters of working linear approximations to the GLM. Such indirect schemes need not converge, and they fail to do so in a non-negligible proportion of practical analyses. By contrast, very reliable smoothing parameter selection methods based on prediction error criteria are available; they directly optimize GCV, or related criteria, for the GLM itself. Since such methods directly optimize properly defined functions of the smoothing parameters, they have much more reliable convergence properties. The paper develops the first such method for REML or ML estimation of smoothing parameters. A Laplace approximation is used to obtain an approximate REML or ML criterion for any GLM, which is suitable for efficient direct optimization. This REML or ML criterion requires that Newton-Raphson iteration, rather than Fisher scoring, be used for GLM fitting, and a computationally stable approach to this is proposed. The REML or ML criterion itself is optimized by a Newton method, with the required derivatives obtained by a mixture of implicit differentiation and direct methods. The method copes with numerical rank deficiency in the fitted model and in fact provides a slight improvement in numerical robustness over the earlier method of Wood for smoothness selection based on prediction error criteria. Simulation results suggest that the new REML and ML methods offer some improvement in mean-square error performance relative to GCV or Akaike's information criterion in most cases, without the small number of severe undersmoothing failures to which Akaike's information criterion and GCV are prone. This is achieved at the same computational cost as GCV or Akaike's information criterion. The new approach also eliminates the convergence failures of previous REML- or ML-based approaches for penalized GLMs and usually has lower computational cost than these alternatives. Example applications are presented in adaptive smoothing, scalar-on-function regression and generalized additive model selection.

Journal Article

Power-Law Distributions in Empirical Data

by Clauset, Aaron; Shalizi, Cosma Rohilla; Newman, M. E. J. in Cumulative distribution functions, Datasets, Estimating techniques

2009

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution (the part of the distribution representing large but rare events) and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.

Journal Article
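For continuous data, the standard maximum-likelihood estimator of the power-law exponent has the closed form alpha_hat = 1 + n / sum_i ln(x_i / x_min), where the sum runs over the n observations with x_i >= x_min; x_min itself can be chosen, as commonly described for this framework, as the candidate value that minimizes the KS distance between the tail data and the fitted model. The Python sketch below implements only those two steps for continuous data. The discrete case, the bootstrap goodness-of-fit p-value and the likelihood-ratio comparisons mentioned in the abstract are not included, and the function names are hypothetical.

```python
import math

def fit_alpha(data, xmin):
    """Continuous power-law MLE of the exponent, using only observations >= xmin."""
    tail = [x for x in data if x >= xmin]
    alpha = 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
    return alpha, tail

def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted power-law CDF."""
    tail = sorted(tail)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)   # fitted CDF: 1 - (x/xmin)^(1-alpha)
        # compare with the empirical CDF just before and just after x
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d

def fit_power_law(data):
    """Scan observed values as candidate xmin and keep the one with the smallest KS distance.
    The largest value is skipped so every tail contains at least one point above xmin."""
    best = None
    for xmin in sorted(set(data))[:-1]:
        alpha, tail = fit_alpha(data, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best  # (xmin, alpha, KS distance)
```

The KS-based choice of x_min trades bias (too small an x_min includes non-power-law data) against variance (too large an x_min leaves few tail points).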

Estimating the Negative Binomial Dispersion Parameter with a Stratum-Effects Model and Many Strata

by Cadigan, Noel; Zheng, Nan; Nirmalkanna, Kunasekaran in Bias, Estimation, Fisheries

2024

We investigate several estimation methods based on marginal and conditional likelihoods to estimate the negative binomial (NB) dispersion parameter for highly stratified count data, for which the statistical model has a separate mean parameter for each stratum. If the number of samples per stratum is small, then the model is highly parameterized, and the maximum likelihood estimator of the NB dispersion parameter can be seriously biased and inefficient. For marginal likelihoods, we assume either a lognormal or beta prior for functions of strata means. We demonstrate using simulations that the marginal and conditional likelihood-based estimators give much improved estimates compared to other methods for highly stratified count data, such as the double-extended quasi-likelihood estimator and the restricted maximum likelihood estimator. We prefer the conditional approach that does not rely on assumptions about the distribution of stratum means; however, this estimator may be less efficient in some situations. We demonstrate in a case study that these estimators can give substantially different results. We also provide simulation results about the power of likelihood ratio tests for a change in the NB overdispersion parameter. Supplementary materials accompanying this paper appear online.

Journal Article
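The baseline this abstract criticizes, ordinary maximum likelihood for the NB dispersion with a separate mean per stratum, can be sketched by profiling: for a fixed dispersion k, the MLE of each stratum mean is that stratum's sample mean, so the likelihood only has to be maximized over k. The Python sketch below parameterizes the NB by mean mu and size k (variance mu + mu²/k; some authors call 1/k the dispersion instead) and uses a simple log-spaced grid search over k, a hypothetical choice where a real implementation would use a proper optimizer. It is the plain ML estimator that the paper reports can be seriously biased with few samples per stratum, not the authors' marginal or conditional likelihood estimators.

```python
import math

def nb_logpmf(y, k, mu):
    """Log-probability of count y under an NB with mean mu and size k (variance mu + mu**2 / k)."""
    out = math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1) + k * math.log(k / (k + mu))
    if y > 0:  # the y * log(mu / (k + mu)) term vanishes when y == 0, even if mu == 0
        out += y * math.log(mu / (k + mu))
    return out

def profile_loglik(strata, k):
    """Log-likelihood at size k with every stratum mean set to its MLE (the sample mean)."""
    total = 0.0
    for ys in strata:
        mu = sum(ys) / len(ys)
        total += sum(nb_logpmf(y, k, mu) for y in ys)
    return total

def nb_dispersion_mle(strata, grid=None):
    """Maximize the profile log-likelihood over a hypothetical log-spaced grid of k values."""
    if grid is None:
        grid = [10 ** (e / 10) for e in range(-20, 31)]  # k from 0.01 to 1000
    best_k = max(grid, key=lambda k: profile_loglik(strata, k))
    return best_k, profile_loglik(strata, best_k)
```

With two observations per stratum, pairs of identical values show no apparent overdispersion and push the estimate to the top of the grid, while widely spread pairs drive it toward small k.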

An Overview of Composite Likelihood Methods

by Varin, Cristiano; Reid, Nancy; Firth, David in Binary data, Estimators, Inference

2011

A survey of recent developments in the theory and application of composite likelihood is provided, building on the review paper of Varin (2008). A range of application areas, including geostatistics, spatial extremes, and space-time models, as well as clustered and longitudinal data and time series are considered. The important area of applications to statistical genetics is omitted, in light of Larribe and Fearnhead (2011). Emphasis is given to the development of the theory, and the current state of knowledge on efficiency and robustness of composite likelihood inference.

Journal Article

Power Analysis for the Wald, LR, Score, and Gradient Tests in a Marginal Maximum Likelihood Framework: Applications in IRT

by Draxler, Clemens; Zimmer, Felix; Debelak, Rudolf in Assessment, Behavioral Science and Psychology, Computer Simulation

2023

The Wald, likelihood ratio, score, and the recently proposed gradient statistics can be used to assess a broad range of hypotheses in item response theory (IRT) models, for instance, to check the overall model fit or to detect differential item functioning. We introduce new methods for power analysis and sample size planning that can be applied when marginal maximum likelihood estimation is used. This allows the application to a variety of IRT models that are commonly used in practice, e.g., in large-scale educational assessments. An analytical method utilizes the asymptotic distributions of the statistics under alternative hypotheses. We also provide a sampling-based approach for applications where the analytical approach is computationally infeasible. This can be the case with 20 or more items, since the computational load increases exponentially with the number of items. We performed extensive simulation studies in three practically relevant settings, namely testing a Rasch model against a 2PL model, testing for differential item functioning, and testing a partial credit model against a generalized partial credit model. The observed distributions of the test statistics and the power of the tests agreed well with the predictions of the proposed methods in sufficiently large samples. We provide an openly accessible R package that implements the methods for user-supplied hypotheses.

Journal Article

UFBoot2: Improving the Ultrafast Bootstrap Approximation

by von Haeseler, Arndt; Chernomor, Olga; Bui, Quang Minh in Approximation, Computing time, Resampling

2018

The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving less biased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. On the tested data sets, UFBoot2 is a median of 778 times faster than SBS and a median of 8.4 times faster than RAxML rapid bootstrap. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.

Journal Article
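The standard bootstrap (SBS) that UFBoot2 approximates resamples alignment sites (columns) with replacement, reinfers a tree from each pseudo-alignment, and reports for each branch of the reference tree the fraction of replicate trees that contain the same split. The Python sketch below covers only the resampling and the support count; tree inference is omitted, and the alignment-as-dict representation and the function names are hypothetical. It does not implement UFBoot2's own approximation or its phylogenomic resampling strategies.

```python
import random
from collections import Counter

def bootstrap_alignment(alignment, rng=None):
    """Standard non-parametric bootstrap: draw alignment columns with replacement.
    `alignment` maps taxon name -> aligned sequence (all sequences the same length)."""
    rng = rng or random.Random()
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

def split_support(replicate_splits, reference_splits):
    """Fraction of replicate trees containing each split of the reference tree.
    Each split is a frozenset of taxa on one side of a branch; splits are assumed to be
    canonicalized (e.g. always the side without a fixed reference taxon) so that the
    same bipartition always maps to the same frozenset."""
    counts = Counter(s for splits in replicate_splits for s in set(splits))
    n = len(replicate_splits)
    return {s: counts[s] / n for s in reference_splits}
```

In a full SBS analysis, each pseudo-alignment from `bootstrap_alignment` would be passed to a tree-inference program, and the splits of the resulting trees would be fed to `split_support`; that per-replicate tree search is the cost UFBoot and UFBoot2 are designed to avoid.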

A Quasi-Maximum Likelihood Approach for Large, Approximate Dynamic Factor Models

by Reichlin, Lucrezia; Doz, Catherine; Giannone, Domenico in Consistent estimators, Covariance matrices, Cross-sectional analysis

2012

Is maximum likelihood suitable for factor models in large cross-sections of time series? We answer this question from both an asymptotic and an empirical perspective. We show that estimates of the common factors based on maximum likelihood are consistent as the size of the cross-section (n) and the sample size (T) go to infinity along any path, and that maximum likelihood is viable for large n. The estimator is robust to misspecification of the cross-sectional and time series correlation of the idiosyncratic components. In practice, the estimator can be easily implemented using the Kalman smoother and the EM algorithm, as in traditional factor analysis.

Journal Article
