Catalogue Search | MBRL
190 result(s) for "pseudo-likelihood"
R$^{2}$s for Correlated Data: Phylogenetic Models, LMMs, and GLMMs
2019
Abstract
Many researchers want to report an $R^{2}$ to measure the variance explained by a model. When the model includes correlation among data, such as phylogenetic models and mixed models, defining an $R^{2}$ faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the $R^{2}$ to include the variance explained by the covariances by asking questions such as “How much of the data is explained by phylogeny?” Here, I investigated three $R^{2}$s for phylogenetic and mixed models. $R^{2}_{resid}$ is an extension of the ordinary least-squares $R^{2}$ that weights residuals by variances and covariances estimated by the model; it is closely related to $R^{2}_{glmm}$ presented by Nakagawa and Schielzeth (2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol. Evol. 4:133–142). $R^{2}_{pred}$ is based on predicting each residual from the fitted model and computing the variance between observed and predicted values. $R^{2}_{lik}$ is based on the likelihood of fitted models, and therefore, reflects the amount of information that the models contain. These three $R^{2}$s are formulated as partial $R^{2}$s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial $R^{2}$s compare a full model with a reduced model without components of the full model, they are distinct from marginal $R^{2}$s that partition additive components of the variance. I assessed the properties of the $R^{2}$s for phylogenetic models using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression). 
Because the $R^{2}$s are designed broadly for any model for correlated data, I also compared $R^{2}$s for linear mixed models and generalized linear mixed models. $R^{2}_{resid}$, $R^{2}_{pred}$, and $R^{2}_{lik}$ all have similar performance in describing the variance explained by different components of models. However, $R^{2}_{pred}$ gives the most direct answer to the question of how much variance in the data is explained by a model. $R^{2}_{resid}$ is most appropriate for comparing models fit to different data sets, because it does not depend on sample sizes. And $R^{2}_{lik}$ is most appropriate to assess the importance of different components within the same model applied to the same data, because it is most closely associated with statistical significance tests.
Journal Article
PSEUDO-LIKELIHOOD METHODS FOR COMMUNITY DETECTION IN LARGE SPARSE NETWORKS
2013
Many algorithms have been proposed for fitting network models with communities, but most of them do not scale well to large networks, and often fail on sparse networks. Here we propose a new fast pseudo-likelihood method for fitting the stochastic block model for networks, as well as a variant that allows for an arbitrary degree distribution by conditioning on degrees. We show that the algorithms perform well under a range of settings, including on very sparse networks, and illustrate on the example of a network of political blogs. We also propose spectral clustering with perturbations, a method of independent interest, which works well on sparse networks where regular spectral clustering fails, and use it to provide an initial value for pseudolikelihood. We prove that pseudo-likelihood provides consistent estimates of the communities under a mild condition on the starting value, for the case of a block model with two communities.
Journal Article
JOINT ESTIMATION OF PARAMETERS IN ISING MODEL
2020
We study joint estimation of the inverse temperature and magnetization parameters $(\beta, B)$ of an Ising model with a nonnegative coupling matrix $A_n$ of size $n \times n$, given one sample from the Ising model. We give a general bound on the rate of consistency of the bi-variate pseudo-likelihood estimator. Using this, we show that estimation at rate $n^{-1/2}$ is always possible if $A_n$ is the adjacency matrix of a bounded degree graph. If $A_n$ is the scaled adjacency matrix of a graph whose average degree goes to $+\infty$, the situation is a bit more delicate. In this case, estimation at rate $n^{-1/2}$ is still possible if the graph is not regular (in an asymptotic sense). Finally, we show that consistent estimation of both parameters is impossible if the graph is Erdős–Rényi with parameter $p > 0$ independent of $n$, thus confirming that estimation is harder on approximately regular graphs with large degree.
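As a rough illustration of what a bivariate pseudo-likelihood estimator for an Ising model can look like, the sketch below assumes the common ±1 spin convention, under which the conditional probability of spin $x_i$ given the rest is $\exp\{x_i(\beta m_i + B)\} / \{2\cosh(\beta m_i + B)\}$ with local field $m_i = \sum_j A_{ij} x_j$. The function names, the toy path graph, and the grid search are hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def local_fields(A, x):
    """m_i = sum_j A[i][j] * x[j] for every node i."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def log_pseudo_likelihood(beta, B, A, x):
    """Sum over nodes of log P(x_i | all other spins) for +/-1 spins."""
    total = 0.0
    for x_i, m_i in zip(x, local_fields(A, x)):
        h = beta * m_i + B
        total += x_i * h - math.log(2.0 * math.cosh(h))
    return total

def grid_mple(A, x, betas, Bs):
    """Crude maximiser: evaluate the log pseudo-likelihood on a grid."""
    best = max((log_pseudo_likelihood(b, B, A, x), b, B)
               for b in betas for B in Bs)
    return best[1], best[2]

# Toy example (assumed data): a path graph on 6 nodes and one spin sample.
n = 6
A = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
x = [1, 1, 1, -1, 1, 1]
grid = [k / 10 for k in range(-20, 21)]
beta_hat, B_hat = grid_mple(A, x, grid, grid)
print(beta_hat, B_hat)
```

A grid search is used only to keep the sketch dependency-free; a gradient-based optimiser would be the more usual choice in practice.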
Journal Article
Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information
by Kamisetty, Hetunandan; Baker, David; Ovchinnikov, Sergey
in ABC transporter, Amino acid sequence, Amino acids
2014
Do the amino acid sequence identities of residues that make contact across protein interfaces covary during evolution? If so, such covariance could be used to predict contacts across interfaces and assemble models of biological complexes. We find that residue pairs identified using a pseudo-likelihood-based method to covary across protein–protein interfaces in the 50S ribosomal unit and 28 additional bacterial protein complexes with known structure are almost always in contact in the complex, provided that the number of aligned sequences is greater than the average length of the two proteins. We use this method to make subunit contact predictions for an additional 36 protein complexes with unknown structures, and present models based on these predictions for the tripartite ATP-independent periplasmic (TRAP) transporter, the tripartite efflux system, the pyruvate formate lyase-activating enzyme complex, and the methionine ABC transporter.
Proteins are considered the ‘workhorse molecules’ of life and they are involved in virtually everything that cells do. Proteins are strings of amino acids that have folded into a specific three-dimensional shape. Proteins must have the correct shape to function properly, as they often work by binding to other proteins or molecules—much like a key fitting into a lock. Working out the structure of a protein can, therefore, provide major insights into how the protein does its job.
Two or more proteins can bind together and form a complex to perform various tasks; and solving the structures of these complexes can be challenging, even if the structures of the protein subunits are known. Now, Ovchinnikov, Kamisetty, and Baker have developed a method for predicting which parts of the proteins make contact with each other in a two-protein complex.
Different species can have copies of the same proteins, but a copy from one species might have different amino acids at certain positions when compared to a related copy from another species. As such, when pairs of interacting proteins from different species are compared, there will be many positions in the two proteins that vary. However, if the amino acid at a position in one protein (let's call it ‘X’) varies, and the amino acid at, say, position ‘Y’ in the other protein also varies such that for any given amino acid at position Y there is often a specific amino acid at position X, then positions X and Y are said to ‘co-vary’. Ovchinnikov et al. noticed that when a pair of amino acids (one from each protein in a two-protein complex) co-varied, these two amino acids tended to make contact with each other at the protein–protein interface.
Ovchinnikov et al. used the new method to make predictions about the protein–protein interfaces in 28 protein complexes found in bacteria, and also to make a prediction about the interface between protein subunits in the bacterial ribosome. When these predictions were checked against the actual structures, which were all known beforehand, they were found to be accurate if the number of copies of each protein being compared is greater than the average length of the two proteins.
Ovchinnikov et al. went on to predict the amino acids on the protein–protein interfaces for another 36 bacterial protein complexes with unknown structures, and to present models for four larger complexes. The next challenge is to extend the method to protein complexes that are found only in eukaryotes (i.e., not in bacteria). Since the number of related copies for eukaryotic proteins tends to be smaller, there are fewer proteins to compare and it is therefore harder to detect ‘covariation’ when it occurs.
Journal Article
Minimum Scoring Rule Inference
by Musio, Monica; Dawid, A. Philip; Ventura, Laura
in B-robustness, Bregman estimate, composite score
2016
Proper scoring rules are devices for encouraging honest assessment of probability distributions. Just like log-likelihood, which is a special case, a proper scoring rule can be applied to supply an unbiased estimating equation for any statistical model, and the theory of such equations can be applied to understand the properties of the associated estimator. In this paper, we discuss some novel applications of scoring rules to parametric inference. In particular, we focus on scoring rule test statistics, and we propose suitable adjustments to allow reference to the usual asymptotic chi-squared distribution. We further explore robustness and interval estimation properties, by both theory and simulations.
Journal Article
Accounting for careless and insufficient effort responding in large-scale survey data—development, evaluation, and application of a screen-time-based weighting procedure
by Ulitzsch, Esther; Lüdtke, Oliver; Shin, Hyo Jeong
in Behavioral Science and Psychology, Cognitive Psychology, Humans
2024
Careless and insufficient effort responding (C/IER) poses a major threat to the quality of large-scale survey data. Traditional indicator-based procedures for its detection are limited in that they are only sensitive to specific types of C/IER behavior, such as straight lining or rapid responding, rely on arbitrary threshold settings, and do not allow taking the uncertainty of C/IER classification into account. Overcoming these limitations, we develop a two-step screen-time-based weighting procedure for computer-administered surveys. The procedure allows considering the uncertainty in C/IER identification, is agnostic towards the specific types of C/IE response patterns, and can feasibly be integrated with common analysis workflows for large-scale survey data. In Step 1, we draw on mixture modeling to identify subcomponents of log screen time distributions presumably stemming from C/IER. In Step 2, the analysis model of choice is applied to item response data, with respondents’ posterior class probabilities being employed to downweigh response patterns according to their probability of stemming from C/IER. We illustrate the approach on a sample of more than 400,000 respondents being administered 48 scales of the PISA 2018 background questionnaire. We gather supporting validity evidence by investigating relationships between C/IER proportions and screen characteristics that entail higher cognitive burden, such as screen position and text length, relating identified C/IER proportions to other indicators of C/IER as well as by investigating rank-order consistency in C/IER behavior across screens. Finally, in a re-analysis of the PISA 2018 background questionnaire data, we investigate the impact of the C/IER adjustments on country-level comparisons.
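The two-step idea described in this abstract can be sketched roughly as follows. The sketch assumes a two-component Gaussian mixture on log screen times whose parameters are already known (the fitting step is omitted), and it uses a weighted mean as a stand-in for the "analysis model of choice". All parameter values, data, and function names are hypothetical and are not taken from the paper.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def attentive_posterior(log_time, pi_cier, mu_cier, sd_cier, mu_att, sd_att):
    """Step 1: posterior probability that a screen time is NOT from C/IER."""
    p_cier = pi_cier * normal_pdf(log_time, mu_cier, sd_cier)
    p_att = (1.0 - pi_cier) * normal_pdf(log_time, mu_att, sd_att)
    return p_att / (p_att + p_cier)

def weighted_mean(values, weights):
    """Step 2 stand-in: a weighted analysis of the item responses."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Assumed toy data: screen times in seconds and one item response each.
screen_times = [3.0, 4.0, 45.0, 60.0, 50.0]
responses = [1, 1, 4, 3, 5]

# Assumed mixture parameters (in practice these would be estimated).
params = dict(pi_cier=0.2, mu_cier=math.log(4.0), sd_cier=0.5,
              mu_att=math.log(50.0), sd_att=0.4)

weights = [attentive_posterior(math.log(t), **params) for t in screen_times]
adjusted = weighted_mean(responses, weights)
unadjusted = sum(responses) / len(responses)
print(adjusted, unadjusted)
```

In this toy setting the two very fast responses receive weights close to zero, so the adjusted mean is driven almost entirely by the three slower respondents.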
Journal Article
Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis
by Crainiceanu, Ciprian M.; Li, Yingxing; Ruppert, David
in Analysis, Asymptotic properties, Covariance
2014
This paper introduces a general framework for testing hypotheses about the structure of the mean function of complex functional processes. Important particular cases of the proposed framework are as follows: (1) testing the null hypothesis that the mean of a functional process is parametric against a general alternative modelled by penalized splines; and (2) testing the null hypothesis that the means of two possibly correlated functional processes are equal or differ by only a simple parametric function. A global pseudo-likelihood ratio test is proposed, and its asymptotic distribution is derived. The size and power properties of the test are confirmed in realistic simulation scenarios. Finite-sample power results indicate that the proposed test is much more powerful than competing alternatives. Methods are applied to testing the equality between the means of normalized δ-power of sleep electroencephalograms of subjects with sleep-disordered breathing and matched controls.
Journal Article
Close-Kin Mark-Recapture
by Bravington, Mark V.; Anderson, Eric C.; Skaug, Hans J.
in Animal age determination, Demography, Design of experiments
2016
Mark-recapture (MR) methods are commonly used to study wildlife populations. Taking advantage of modern genetics one can generalize from "recapture of self" to "recapture of closely-related kin". Abundance and other demographic parameters of adults can then be estimated using, if necessary, only samples from dead animals (live-release is optional). This greatly widens the scope of MR, e.g. to commercial fisheries where large-scale tagging is impractical, and enhances the power of conventional MR studies where live release and tissue sampling is possible. We give explicit formulae for kinship (i.e., recapture) probabilities in general and specific cases. These yield a pseudo-likelihood based on pairwise comparisons of individuals in the samples. It is shown that the pseudo-likelihood approximates the full likelihood under sparse sampling of large populations. Experimental design is addressed via the principle of maximizing the Fisher information for parameters of interest. Finally, we discuss challenges related to kinship determination from genetic data, focusing on current limitations and future possibilities.
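To make the idea of a pairwise pseudo-likelihood concrete, here is a deliberately simplified sketch. It assumes an idealised setting in which every sampled juvenile has exactly two parents among N adults, each sampled adult is equally likely to be one of them, and all juvenile-adult comparisons are treated as independent, so each comparison is a parent-offspring match with probability 2/N. The sample sizes, function names, and grid search are hypothetical; the general kinship probabilities in the paper are more involved.

```python
import math

def log_pseudo_likelihood(n_adults_total, n_juv, n_adult_sampled, n_matches):
    """Pairwise log pseudo-likelihood with match probability 2 / N."""
    p = 2.0 / n_adults_total
    comparisons = n_juv * n_adult_sampled
    return (n_matches * math.log(p)
            + (comparisons - n_matches) * math.log(1.0 - p))

def closed_form_estimate(n_juv, n_adult_sampled, n_matches):
    """Maximiser of the expression above: N = 2 * comparisons / matches."""
    return 2.0 * n_juv * n_adult_sampled / n_matches

# Assumed toy sample: 100 juveniles, 200 adults, 40 parent-offspring pairs.
n_juv, n_adult_sampled, n_matches = 100, 200, 40

# Numerical check of the closed form by a simple grid search over N.
candidates = range(100, 5001)
n_grid = max(candidates,
             key=lambda N: log_pseudo_likelihood(N, n_juv, n_adult_sampled,
                                                 n_matches))
n_closed = closed_form_estimate(n_juv, n_adult_sampled, n_matches)
print(n_grid, n_closed)
```

Under these assumptions the grid search and the closed form agree, which is a quick sanity check that the pairwise expression behaves like a binomial likelihood in the match probability.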
Journal Article
Sparse Ising Model with Covariates
by Levina, Elizaveta; Cheng, Jie; Zhu, Ji
in Algorithms, Binary Markov network, BIOMETRIC METHODOLOGY
2014
There has been a lot of work fitting Ising models to multivariate binary data in order to understand the conditional dependency relationships between the variables. However, additional covariates are frequently recorded together with the binary data, and may influence the dependence relationships. Motivated by such a dataset on genomic instability collected from tumor samples of several types, we propose a sparse covariate dependent Ising model to study both the conditional dependency within the binary data and its relationship with the additional covariates. This results in subject‐specific Ising models, where the subject's covariates influence the strength of association between the genes. As in all exploratory data analysis, interpretability of results is important, and we use ℓ1 penalties to induce sparsity in the fitted graphs and in the number of selected covariates. Two algorithms to fit the model are proposed and compared on a set of simulated data, and asymptotic results are established. The results on the tumor dataset and their biological significance are discussed in detail.
Journal Article
AN OVERVIEW OF COMPOSITE LIKELIHOOD METHODS
2011
A survey of recent developments in the theory and application of composite likelihood is provided, building on the review paper of Varin (2008). A range of application areas, including geostatistics, spatial extremes, and space-time models, as well as clustered and longitudinal data and time series are considered. The important area of applications to statistical genetics is omitted, in light of Larribe and Fearnhead (2011). Emphasis is given to the development of the theory, and the current state of knowledge on efficiency and robustness of composite likelihood inference.
Journal Article