Catalogue Search | MBRL
1,137 result(s) for "Summary statistics"
Flexible behavioral capture-recapture modeling
2016
We develop alternative strategies for building and fitting parametric capture–recapture models for closed populations that can yield a better understanding of behavioral patterns. From the perspective of transition models, we first rely on a conditional probability parameterization. A large subset of standard capture–recapture models can be regarded as a suitable partitioning of the full set of conditional probability parameters into equivalence classes. We exploit a regression approach, combined with new summaries of the conditioning binary partial capture histories, as a device for enlarging the scope of behavioral models and for exploring the range of all possible partitions. We show how the unconditional MLEs of such models can easily be found within a generalized linear model framework. We illustrate the potential of our approach with the analysis of some known datasets and a simulation study.
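To make "summaries of the conditioning partial capture histories" concrete, here is a minimal Python sketch (our illustration, not the authors' code) that summarizes each partial history by one number, the count of previous captures, and tabulates captures and capture opportunities for each value of that summary. The function name `tabulate_by_previous_captures` and the toy histories are hypothetical. Only observed animals enter the table, so the cells with zero previous captures omit the never-captured individuals; that missing count is why the unconditional MLE has to treat population size as unknown.

```python
from collections import defaultdict


def tabulate_by_previous_captures(histories):
    """Count captures and capture opportunities at each occasion, grouped by
    the number of captures the animal had before that occasion."""
    captures = defaultdict(int)       # summary value -> captures observed
    opportunities = defaultdict(int)  # summary value -> occasions at risk
    for history in histories:
        previous = 0  # summary of the partial capture history so far
        for captured in history:
            opportunities[previous] += 1
            captures[previous] += captured
            previous += captured
    return {k: (captures[k], opportunities[k]) for k in sorted(opportunities)}


# Hypothetical capture histories over four occasions (1 = captured).
histories = [
    (1, 0, 1, 1),
    (0, 1, 1, 0),
    (0, 0, 0, 1),
]
print(tabulate_by_previous_captures(histories))  # {0: (3, 7), 1: (2, 3), 2: (1, 2)}
```

Each ratio of captures to opportunities is an empirical conditional capture frequency for one equivalence class; a different summary of the partial history would induce a different partition.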
Journal Article
The variant call format provides efficient and robust storage of GWAS summary statistics
by Andrews, Shea J.; Hemani, Gibran; Lyon, Matthew S.
in Animal Genetics and Genomics, Bioinformatics, Biomedical and Life Sciences
2021
GWAS summary statistics are fundamental for a variety of research applications, yet no common storage format has been widely adopted. Existing tabular formats store information about genetic variants and associations ambiguously or incompletely, lack essential metadata, and are typically not indexed, yielding poor query performance and increasing the possibility of errors in data interpretation and post-GWAS analyses. To address these issues, we adapted the variant call format to store GWAS summary statistics (GWAS-VCF) and developed open-source tools to use this format in downstream analyses. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (https://gwas.mrcieu.ac.uk).
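As a rough illustration of what a GWAS-VCF record looks like, the sketch below formats one association result as a tab-separated VCF data line. It assumes the per-study FORMAT keys ES (effect size relative to the ALT allele), SE (standard error), LP (-log10 p-value), AF (alternate allele frequency), SS (sample size) and ID (study variant identifier), as we recall them from the GWAS-VCF specification; check the specification before relying on them. The helper `gwas_vcf_line` is hypothetical and is not part of the authors' tools.

```python
import math


def gwas_vcf_line(chrom, pos, rsid, ref, alt, beta, se, pval, af, n):
    """Format one GWAS association as a VCF data line (hypothetical helper)."""
    lp = -math.log10(pval)  # store -log10(p) rather than p (assumed key: LP)
    fmt_keys = "ES:SE:LP:AF:SS:ID"
    fmt_values = ":".join(
        [f"{beta:.4g}", f"{se:.4g}", f"{lp:.4g}", f"{af:.4g}", str(n), rsid]
    )
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT <study>
    return "\t".join(
        [chrom, str(pos), rsid, ref, alt, ".", "PASS", ".", fmt_keys, fmt_values]
    )


print(gwas_vcf_line("1", 12345, "rs123", "A", "G", 0.05, 0.01, 1e-6, 0.3, 10000))
```

Storing -log10 p presumably avoids underflow for extremely small p-values, and tying the effect to the ALT allele removes one common source of allele-orientation ambiguity in tabular formats.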
Journal Article
LEARNING FROM REVIEWS
by Ozdaglar, Asuman; Acemoglu, Daron; Malekian, Azarakhsh
in Bayesian analysis, Bayesian learning, Consumers
2022
This paper develops a model of Bayesian learning from online reviews and investigates the conditions for learning the quality of a product and the speed of learning under different rating systems. A rating system provides information about reviews left by previous customers. Customers observe the ratings of a product and decide whether to purchase and review it. We study learning dynamics under two classes of rating systems: full history, where customers see the full history of reviews, and summary statistics, where the platform reports some summary statistics of past reviews. In both cases, learning dynamics are complicated by a selection effect: the types of users who purchase the good, and thus their overall satisfaction and reviews, depend on the information available at the time of purchase. We provide conditions for complete learning and characterize and compare its speed under full history and summary statistics. We also show that providing more information does not always lead to faster learning, but strictly finer rating systems do.
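For intuition about "full history" versus "summary statistics", consider a stylized case that deliberately leaves out the selection effect: binary reviews are i.i.d. given the product's quality, with a Beta prior on the probability of a positive review. The Python sketch below (our simplification; all names are hypothetical) checks that updating on the count summary gives the same posterior as updating review by review. The paper's comparison is interesting precisely because of the selection effect, which this sketch omits.

```python
from fractions import Fraction


def update_full_history(reviews, a=1, b=1):
    """Beta(a, b) prior on P(positive review); update one review at a time."""
    for positive in reviews:
        if positive:
            a += 1
        else:
            b += 1
    return a, b


def update_summary(n_positive, n_total, a=1, b=1):
    """Same update using only the summary (number positive, number of reviews)."""
    return a + n_positive, b + (n_total - n_positive)


def posterior_mean(a, b):
    return Fraction(a, a + b)


reviews = [1, 1, 0, 1, 0, 1]  # hypothetical like/dislike reviews
full = update_full_history(reviews)
summary = update_summary(sum(reviews), len(reviews))
print(full, summary, posterior_mean(*full))  # (5, 3) (5, 3) 5/8
```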
Journal Article
The effect of the Shorter Stays in Emergency Departments health target on the quality of ED discharge summaries
2016
Objective: Time targets for ED stays are used as a policy instrument to reduce ED crowding. There is debate about whether such policies are helpful or harmful, as a focus on a process target may divert attention from clinical care. The objective of this study is to investigate whether the Shorter Stays in Emergency Departments target in New Zealand was associated with a change in the quality of ED discharge information provided to primary care providers.
Methods: The quality of discharge summaries was assessed retrospectively over time using chart review. Logistic regression was used to account for secular trends, with adequacy (adequate or not) as the dependent variable. Explanatory variables were age, ethnicity, deprivation, triage category, year, the step at target introduction (2009) and the change in slope before and after the target.
Results: Of 500 randomly selected discharge summaries, 491 (98.2%) were included in the analysis. There was evidence of a decrease over time in the proportion of adequate discharge summaries before the introduction of the target (slope estimate (SE) −0.43 (0.20), p=0.02). A step at the target introduction could not be shown (p=0.47). There was evidence of an improvement over time from pre-target to post-target: slope afterwards 0.33, estimate of change in slope (SE) 0.76 (0.27), p=0.006.
Conclusions: There was no reduction in the quality of discharge summaries following the introduction of the Shorter Stays in ED target, and trends in quality improved. These findings deserve replication in other hospitals, which may experience different challenges.
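One way to write the segmented logistic regression described in the Methods is shown below. This is our reading of the abstract; the exact coding of time, of the step and of the covariates is an assumption. Here t_i is the year of discharge summary i, Y_i = 1 if the summary was adequate, and x_i collects age, ethnicity, deprivation and triage category.

```latex
\operatorname{logit} \Pr(Y_i = 1) = \beta_0 + \beta_1 t_i + \beta_2 \mathbf{1}[t_i \ge 2009] + \beta_3 (t_i - 2009)\,\mathbf{1}[t_i \ge 2009] + \boldsymbol{\gamma}^{\top} \mathbf{x}_i
```

Under this parameterization the reported numbers are internally consistent: the pre-target slope is beta_1 = -0.43, the change in slope is beta_3 = 0.76, and the post-target slope is beta_1 + beta_3 = -0.43 + 0.76 = 0.33, while beta_2 is the step that could not be shown (p=0.47).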
Journal Article
LEARNING SUMMARY STATISTIC FOR APPROXIMATE BAYESIAN COMPUTATION VIA DEEP NEURAL NETWORK
2017
Approximate Bayesian Computation (ABC) methods are used to approximate posterior distributions in models with unknown or computationally intractable likelihoods. Both the accuracy and computational efficiency of ABC depend on the choice of summary statistic, but outside of special cases where the optimal summary statistics are known, it is unclear which guiding principles can be used to construct effective summary statistics. In this paper we explore the possibility of automating the process of constructing summary statistics by training deep neural networks to predict the parameters from artificially generated data: the resulting summary statistics are approximately posterior means of the parameters. With minimal model-specific tuning, our method constructs summary statistics for the Ising model and the moving-average model, which match or exceed theoretically-motivated summary statistics in terms of the accuracies of the resulting posteriors.
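The role a summary statistic plays in ABC can be seen in a plain rejection sampler. The Python sketch below assumes a toy model (normal data with unknown mean and unit variance, Uniform(-5, 5) prior) and uses the sample mean as a hand-picked summary statistic, which happens to be sufficient for this model. The paper's approach would instead replace `summary` with a deep neural network trained to predict the parameter from simulated data; that training step is not shown here.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy model: n i.i.d. Normal(theta, 1) observations."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Hand-picked summary statistic: the sample mean."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws, eps, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the Uniform(-5, 5) prior
        if abs(summary(simulate(theta, len(observed), rng)) - s_obs) <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(1.5, 50, rng)
posterior = abc_rejection(observed, 10_000, 0.1, rng)
print(len(posterior), statistics.fmean(posterior))
```

Both the tolerance eps and the choice of summary control the accuracy of the accepted sample, which is the dependence the abstract describes.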
Journal Article
A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES
by Zhou, Xiang
2017
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, the restricted maximum likelihood estimation method (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case–control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman–Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, makes use of summary statistics, and is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
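LD score regression, one of the two methods unified in MQS, rests on the relationship E[chi2_j] ≈ 1 + N h2 l_j / M between the association chi-square statistic of SNP j, its LD score l_j, the sample size N, the number of SNPs M and the SNP heritability h2 (the intercept can also absorb confounding). The Python sketch below fits that relationship by unweighted least squares. It illustrates standard LDSC only, not the MQS estimator or its GEMMA implementation; the function names are hypothetical, and real LDSC uses regression weights and block-jackknife standard errors.

```python
def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Regress chi-square statistics on LD scores; slope = N * h2 / M."""
    intercept, slope = ols(ld_scores, chi2)
    return intercept, slope * n_snps / n_samples


# Noise-free toy data generated from the model with h2 = 0.5.
N, M = 100_000, 1_000_000
ld = [1.0, 5.0, 10.0, 20.0, 50.0]
chi2 = [1.0 + N * 0.5 * score / M for score in ld]
print(ldsc_h2(chi2, ld, N, M))
```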
Journal Article
Estimating genetic nurture with summary statistics of multigenerational genome-wide association studies
by Wu, Yuchang; Lin, Yunong; Fletcher, Jason M.
in Autism, Autism Spectrum Disorder - genetics, Birth weight
2021
Marginal effect estimates in genome-wide association studies (GWAS) are mixtures of direct and indirect genetic effects. Existing methods to dissect these effects require family-based, individual-level genetic and phenotypic data with large samples, which are difficult to obtain in practice. Here, we propose a statistical framework to estimate direct and indirect genetic effects using summary statistics from GWAS conducted on own and offspring phenotypes. Applied to birth weight, our method showed results nearly identical to those obtained using individual-level data. We also decomposed direct and indirect genetic effects of educational attainment (EA), which showed distinct patterns of genetic correlations with 45 complex traits. The known genetic correlations between EA and greater height, lower body mass index, less active smoking behavior, and better health outcomes were mostly explained by the indirect genetic component of EA. In contrast, the consistently identified genetic correlation of autism spectrum disorder (ASD) with higher EA resides in the direct genetic component. A polygenic transmission disequilibrium test showed a significant overtransmission of the direct component of EA from healthy parents to ASD probands. Taken together, we demonstrate that traditional GWAS approaches, in conjunction with offspring phenotypic data collection in existing cohorts, could greatly benefit studies on genetic nurture and shed important light on the interpretation of genetic associations for human complex traits.
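Why two GWAS (own and offspring phenotypes) suffice can be seen from a stylized model that we assume here, not necessarily the authors' exact estimator: random mating, a direct effect d of a person's genotype on their own phenotype, and an equal indirect effect i of each parent's genotype on the child's phenotype. With a parent-offspring genotype regression coefficient of 1/2, the marginal slope of a person's genotype on their own phenotype is beta_own = d + i (d plus i/2 from each parent), and on their offspring's phenotype is beta_offspring = d/2 + i. The hypothetical helper below solves this 2x2 system per SNP; standard errors, which would need the covariance between the two GWAS, are not handled.

```python
def decompose_direct_indirect(beta_own, beta_offspring):
    """Solve beta_own = d + i and beta_offspring = d/2 + i for (d, i).

    Stylized model (assumed): random mating, equal indirect effects from both
    parents, parent-offspring genotype regression coefficient of 1/2.
    """
    direct = 2 * (beta_own - beta_offspring)
    indirect = 2 * beta_offspring - beta_own
    return direct, indirect


# Hypothetical SNP: marginal slopes implied by d = 0.2 and i = 0.1.
d_true, i_true = 0.2, 0.1
beta_own = d_true + i_true            # 0.3
beta_offspring = d_true / 2 + i_true  # 0.2
print(decompose_direct_indirect(beta_own, beta_offspring))
```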
Journal Article
Accounting for temporal change in multiple biodiversity patterns improves the inference of metacommunity processes
by Thompson, Patrick L; Antón Pardo, María; Chase, Jonathan M
in Assembly, Biodiversity, Coefficient of variation
2022
In metacommunity ecology, a major focus has been on combining observational and analytical approaches to identify the role of critical assembly processes, such as dispersal limitation and environmental filtering, but this work has largely ignored temporal community dynamics. Here, we develop a "virtual ecologist" approach to evaluate assembly processes by simulating metacommunities varying in three main processes: density-independent responses to abiotic conditions, density-dependent biotic interactions, and dispersal. We then calculate a number of commonly used summary statistics of community structure in space and time and use random forests to evaluate their utility for inferring the strength of these three processes. We find that (i) both spatial and temporal data are necessary to disentangle metacommunity processes based on the summary statistics we test, and including statistics that are measured through time increases the explanatory power of random forests by up to 59% compared to cases where only spatial variation is considered; (ii) the three studied processes can be distinguished with different descriptors; and (iii) each summary statistic is differently sensitive to temporal and spatial sampling effort. Including repeated observations of metacommunities over time was essential for inferring the metacommunity processes, particularly dispersal. Some of the most useful statistics include the coefficient of variation of species abundances through time and metrics that incorporate variation in the relative abundances (evenness) of species. We conclude that a combination of methods and summary statistics is probably necessary to understand the processes that underlie metacommunity assembly through space and time, but we recognize that these results will be modified when other processes or summary statistics are used.
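The coefficient of variation of species abundances through time, which the abstract names among the most useful statistics, is straightforward to compute. A minimal Python sketch, assuming abundances are stored as a hypothetical mapping from species to a time series at one site; the authors' exact definition (for example, sample versus population standard deviation, or pooling across sites) may differ.

```python
import statistics


def temporal_cv(abundances):
    """Coefficient of variation through time for each species.

    abundances maps a species name to its abundance at successive time points.
    Uses the population standard deviation; CV is undefined when the mean is 0.
    """
    cv = {}
    for species, series in abundances.items():
        mean = statistics.fmean(series)
        cv[species] = statistics.pstdev(series) / mean if mean > 0 else float("nan")
    return cv


# Hypothetical abundance series over three time points.
print(temporal_cv({"sp1": [2, 4, 6], "sp2": [5, 5, 5]}))
```

A constant population (sp2) has CV 0, while fluctuating abundances push the CV up regardless of the species' mean abundance, which is what makes it comparable across common and rare species.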
Journal Article
New models for symbolic data analysis
by Beranger, Boris; Lin, Huan; Sisson, Scott
in Chemistry and Earth Sciences, Computer Science, Data analysis
2023
Symbolic data analysis (SDA) is an emerging area of statistics concerned with understanding and modelling data that takes distributional form (i.e. symbols), such as random lists, intervals and histograms. It was developed under the premise that the statistical unit of interest is the symbol, and that inference is required at this level. Here we consider a different perspective, which opens a new research direction in the field of SDA. We assume that, as with a standard statistical analysis, inference is required at the level of the individual data. However, the individual-level data are unobserved, and are aggregated into observed symbols (group-based distributional-valued summaries) prior to the analysis. We introduce a novel general method for constructing likelihood functions for symbolic data based on a desired probability model for the underlying measurement-level data, while only observing the distributional summaries. This approach opens the door for new classes of symbol design and construction, in addition to developing SDA as a viable tool to enable and improve upon classical data analyses, particularly for very large and complex datasets. We illustrate this new direction for SDA research through several real and simulated data analyses, including a study of novel classes of multivariate symbol construction techniques.
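A simple instance of building a likelihood for a symbol from a model for the underlying data: if the symbol is a histogram with fixed bin edges and the unobserved individual values are assumed i.i.d. Normal(mu, sigma), the bin counts are multinomial with bin probabilities given by differences of the normal CDF. The Python sketch below assumes exactly that (normal model, fixed edges, hypothetical function names and counts); the paper's general construction covers other models and symbol types.

```python
import math


def normal_cdf(x, mu, sigma):
    # math.erf handles +/-inf, so open-ended end bins work
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def histogram_loglik(edges, counts, mu, sigma):
    """Log-likelihood of histogram counts under i.i.d. Normal(mu, sigma) data,
    up to the multinomial coefficient, which does not depend on (mu, sigma)."""
    ll = 0.0
    for lo, hi, count in zip(edges[:-1], edges[1:], counts):
        p = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
        ll += count * math.log(p)
    return ll


# Hypothetical histogram symbol; crude grid search for mu with sigma fixed at 1.
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
counts = [2, 5, 8, 3]
best_mu = max((m / 10 for m in range(-20, 21)),
              key=lambda m: histogram_loglik(edges, counts, m, 1.0))
print(best_mu)
```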
Journal Article
BAYESIAN LARGE-SCALE MULTIPLE REGRESSION WITH SUMMARY STATISTICS FROM GENOME-WIDE ASSOCIATION STUDIES
2017
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a "Regression with Summary Statistics" (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.
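For reference, the RSS likelihood is usually written as follows (as we understand it; consult the paper for the exact assumptions under which it holds). Here \hat{\boldsymbol\beta} holds the univariate effect estimates, \hat S = \operatorname{diag}(\hat s_1, \dots, \hat s_p) their standard errors, \hat R the SNP correlation (LD) matrix estimated from a reference panel, and \boldsymbol\beta the multiple regression coefficients.

```latex
\hat{\boldsymbol\beta} \mid \boldsymbol\beta \sim \mathcal{N}\!\left(\hat S \hat R \hat S^{-1} \boldsymbol\beta,\; \hat S \hat R \hat S\right)
```

Only univariate summary statistics and an LD estimate enter this expression, which is why the framework can run without individual-level genotypes and phenotypes.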
Journal Article