Catalogue Search | MBRL
10,229 result(s) for "Mixture models"
Identifiability in N-mixture models
2018
Binomial N-mixture models have proven very useful in ecology, conservation, and monitoring: they allow estimation and modeling of abundance separately from detection probability using simple counts. Recently, doubts about parameter identifiability have been voiced. I conducted a large-scale screening test with 137 bird data sets from 2,037 sites. I found virtually no identifiability problems for Poisson and zero-inflated Poisson (ZIP) binomial N-mixture models, but negative-binomial (NB) models had problems in 25% of all data sets. The corresponding multinomial N-mixture models had no problems. Parameter estimates under Poisson and ZIP binomial and multinomial N-mixture models were extremely similar. Identifiability problems became a little more frequent with smaller sample sizes (267 and 50 sites), but were unaffected by whether the models did or did not include covariates. Hence, binomial N-mixture model parameters with Poisson and ZIP mixtures typically appeared identifiable. In contrast, NB mixtures were often unidentifiable, which is worrying since these were often selected by Akaike’s information criterion. Identifiability of binomial N-mixture models should always be checked. If problems are found, simpler models, integrated models that combine different observation models or the use of external information via informative priors or penalized likelihoods, may help.
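As an illustration only (not code from the article), the binomial N-mixture likelihood for a single site can be sketched in Python. The sketch assumes a Poisson mixture for abundance, a constant detection probability across visits, and a truncation bound `n_max` for the infinite sum over abundance; all function and parameter names are hypothetical.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of abundance n with mean lam (lam > 0)."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def binom_pmf(y, n, p):
    """Binomial probability of counting y animals out of n with detection p."""
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of repeated counts at one site.

    Sums over the unknown abundance N, from the largest observed count
    up to n_max (an assumed truncation point, not a value from the article).
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        term = poisson_pmf(n, lam)
        for y in counts:
            term *= binom_pmf(y, n, p)
        total += term
    return total
```

With a single visit the marginal count is Poisson with mean `lam * p`, so `site_likelihood([3], 5.0, 0.4)` and `site_likelihood([3], 10.0, 0.2)` agree: only the product of abundance and detection is identifiable, which is why repeated visits are needed to separate the two.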
Journal Article
On the Reliability of N-Mixture Models for Count Data
by Link, William A.; Sauer, John R.; Schofield, Matthew R.
in Abundance, Ancillary statistic, Animal Distribution
2018
N-mixture models describe count data replicated in time and across sites in terms of abundance N and detectability p. They are popular because they allow inference about N while controlling for factors that influence p without the need for marking animals. Using a capture-recapture perspective, we show that the loss of information that results from not marking animals is critical, making reliable statistical modeling of N and p problematic using just count data. One cannot reliably fit a model in which the detection probabilities are distinct among repeat visits as this model is overspecified. This makes uncontrolled variation in p problematic. By counterexample, we show that even if p is constant after adjusting for covariate effects (the "constant p" assumption), scientifically plausible alternative models in which N (or its expectation) is non-identifiable or does not even exist as a parameter lead to data that are practically indistinguishable from data generated under an N-mixture model. This is particularly the case for sparse data as is commonly seen in applications. We conclude that under the constant p assumption reliable inference is only possible for relative abundance in the absence of questionable and/or untestable assumptions or with better quality data than seen in typical applications. Relative abundance models for counts can be readily fitted using Poisson regression in standard software such as R and are sufficiently flexible to allow controlling for p through the use of covariates while simultaneously modeling variation in relative abundance. If users require estimates of absolute abundance, they should collect auxiliary data that help with estimation of p.
Journal Article
Distinct trajectories of physical activity and related factors during the life course in the general population: a systematic review
by Tammelin, Tuija H.; Hirvensalo, Mirja; Palomäki, Sanna
in Biostatistics, Chronic diseases, Elderly
2019
Background
In recent years, researchers have begun applying a trajectory approach to identify homogeneous subgroups of physical activity (PA) in heterogeneous populations. This study systematically reviewed the articles identifying longitudinal PA trajectory classes and the related factors (e.g., determinants, predictors, and outcomes) in the general population during different life phases.
Methods
The included studies used finite mixture models for identifying trajectories of PA, exercise, or sport participation. Three electronic databases, PubMed (Medline), Web of Science, and CINAHL, were searched from the year 2000 to 13 February 2018. The study was conducted according to the PRISMA recommendations.
Results
Twenty-seven articles were included and organized into three age groups: youngest (eleven articles), middle (eight articles), and oldest (eight articles). The youngest group consisted mainly of youth, the middle group of adults and the oldest group of late middle-aged and older adults. Most commonly, three or four trajectory classes were reported. Several trajectories describing a decline in PA were reported, especially in the youngest group, whereas trajectories of consistently increasing PA were observed in the middle and oldest groups. While the proportion of persistently physically inactive individuals increased with age, the proportion was relatively high at all ages. Generally, male gender, being Caucasian, non-smoking, having low television viewing time, higher socioeconomic status, no chronic illnesses, and family support for PA were associated either with persistent or increasing PA.
Conclusions
The reviewed articles identified various PA subgroups, indicating that finite mixture modeling can yield new information on the complexity of PA behavior compared to studying population mean PA level only. The studies also provided novel information on how different factors relate to changes in PA during the life course. The recognition of the PA subgroups and their determinants is important for the more precise targeting of PA promotion and PA interventions.
Trial registration
PROSPERO registration number: CRD42018088120.
Journal Article
An Overview of Semiparametric Extensions of Finite Mixture Models
2019
Finite mixture models have offered a very important tool for exploring complex data structures in many scientific areas, such as economics, epidemiology and finance. Semiparametric mixture models, which were introduced into traditional finite mixture models in the past decade, have brought forth exciting developments in their methodologies, theories, and applications. In this article, we not only provide a selective overview of the newly-developed semiparametric mixture models, but also discuss their estimation methodologies, theoretical properties if applicable, and some open questions. Recent developments are also discussed.
Journal Article
On the robustness of N-mixture models
by Link, William A.; Sauer, John R.; Schofield, Matthew R.
in abundance estimation, Animals, Bayesian P‐value
2018
N-mixture models provide an appealing alternative to mark–recapture models, in that they allow for estimation of detection probability and population size from count data, without requiring that individual animals be identified. There is, however, a cost to using the N-mixture models: inference is very sensitive to the model’s assumptions. We consider the effects of three violations of assumptions that might reasonably be expected in practice: double counting, unmodeled variation in population size over time, and unmodeled variation in detection probability over time. These three examples show that small violations of assumptions can lead to large biases in estimation. The violations of assumptions we consider are not only small qualitatively, but are also small in the sense that they are unlikely to be detected using goodness-of-fit tests. In cases where reliable estimates of population size are needed, we encourage investigators to allocate resources to acquiring additional data, such as recaptures of marked individuals, for estimation of detection probabilities.
Journal Article
Scalable Empirical Mixture Models That Account for Across-Site Compositional Heterogeneity
by Lartillot, Nicolas; Schrempf, Dominik; Szöllősi, Gergely
in Amino acids, Cluster analysis, Coordinate transformations
2020
Biochemical demands constrain the range of amino acids acceptable at specific sites resulting in across-site compositional heterogeneity of the amino acid replacement process. Phylogenetic models that disregard this heterogeneity are prone to systematic errors, which can lead to severe long-branch attraction artifacts. State-of-the-art models accounting for across-site compositional heterogeneity include the CAT model, which is computationally expensive, and empirical distribution mixture models estimated via maximum likelihood (C10–C60 models). Here, we present a new, scalable method EDCluster for finding empirical distribution mixture models involving a simple cluster analysis. The cluster analysis utilizes specific coordinate transformations which allow the detection of specialized amino acid distributions either from curated databases or from the alignment at hand. We apply EDCluster to the HOGENOM and HSSP databases in order to provide universal distribution mixture (UDM) models comprising up to 4,096 components. Detailed analyses of the UDM models demonstrate the removal of various long-branch attraction artifacts and improved performance compared with the C10–C60 models. Ready-to-use implementations of the UDM models are provided for three established software packages (IQ-TREE, Phylobayes, and RevBayes).
Journal Article
Computationally efficient multi-sample flow cytometry data analysis using Gaussian mixture models
by Cloos, Jacqueline; van Wieringen, Wessel N.; Rutten, Philip
in Algorithms, Bayes Theorem, Bayesian analysis
2025
Background
An important challenge in flow cytometry (FCM) data analysis is making comparisons of corresponding cell populations across multiple FCM samples. An interesting solution is creating a statistical mixture model for multiple samples simultaneously, as such a multi-sample model can characterize a heterogeneous set of samples, and facilitates direct comparison of cell populations across the data samples. The multi-sample approach to statistical mixture modeling has been explored in a number of reports, mostly within a Bayesian framework and with high computational complexity. Although these approaches are effective, they are also computationally demanding, and therefore do not relate well to the requirement of scalability, which is essential in the multi-sample setting. This limits their utility in the analysis of large sets of large FCM samples.
Results
We show that basic Gaussian mixture models can be extended to large data sets consisting of multiple samples, using a computationally efficient implementation of the expectation-maximization algorithm. We show that the multi-sample Gaussian mixture model (MSGMM) is competitive with other models, in both rare cell detection and sample classification accuracy. This allows us to further explore the utility of MSGMMs in the analysis of heterogeneous sets of samples. We demonstrate how simple heuristics on MSGMM model output can directly reveal structural patterns in a collection of FCM samples.
Conclusions
We recover the efficiency and utility of the basic MSGMM which underlies more complex and non-parametric Bayesian hierarchical mixture models. The possibility of fitting GMMs to large sets of FCM samples provides opportunities for the discovery of associations between sample composition and sample meta-data such as treatment responses and clinical outcomes.
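The expectation-maximization idea named in the Results can be sketched for the simplest case, a two-component Gaussian mixture in one dimension. This is a plain single-sample illustration in Python, not the authors' MSGMM implementation; the function names, the median-split initialization, and the fixed iteration count are all assumptions made for the example.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Assumes the data are not so extreme that both component densities
    underflow to zero for some point.
    """
    # Initialize from the lower and upper halves of the sorted data.
    xs = sorted(data)
    half = len(xs) // 2
    parts = [xs[:half], xs[half:]]
    weights = [0.5, 0.5]
    mu = [statistics.fmean(part) for part in parts]
    sigma = [max(statistics.pstdev(part), 1e-3) for part in parts]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mu, sigma)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)
    return weights, mu, sigma
```

Calling `fit_gmm_1d(values)` on a list of floats returns the mixing weights, means, and standard deviations of the two components. The floor of `1e-3` on the standard deviation is a simple guard against a component collapsing onto a single point.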
Journal Article
Models for Estimating Abundance from Repeated Counts of an Open Metapopulation
2011
Using only spatially and temporally replicated point counts, Royle (2004b, Biometrics 60, 108-115) developed an N-mixture model to estimate the abundance of an animal population when individual animal detection probability is unknown. One assumption inherent in this model is that the animal populations at each sampled location are closed with respect to migration, births, and deaths throughout the study. In the past this has been verified solely by biological arguments related to the study design as no statistical verification was available. In this article, we propose a generalization of the N-mixture model that can be used to formally test the closure assumption. Additionally, when applied to an open metapopulation, the generalized model provides estimates of population dynamics parameters and yields abundance estimates that account for imperfect detection probability and do not require the closure assumption. A simulation study shows these abundance estimates are less biased than the abundance estimate obtained from the original N-mixture model. The proposed model is then applied to two data sets of avian point counts. The first example demonstrates the closure test on a single-season study of Mallards (Anas platyrhynchos), and the second uses the proposed model to estimate the population dynamics parameters and yearly abundance of American robins (Turdus migratorius) from a multi-year study.
Journal Article
Latent variable mixture modeling in psychiatric research – a review and application
by Nordström, T.; Kaakinen, M.; Ahmed, A. O.
in Cross-Sectional Studies, Factor analysis, Factor Analysis, Statistical
2016
Latent variable mixture modeling represents a flexible approach to investigating population heterogeneity by sorting cases into latent but non-arbitrary subgroups that are more homogeneous. The purpose of this selective review is to provide a non-technical introduction to mixture modeling in a cross-sectional context. Latent class analysis is used to classify individuals into homogeneous subgroups (latent classes). Factor mixture modeling represents a newer approach that represents a fusion of latent class analysis and factor analysis. Factor mixture models are adaptable to representing categorical and dimensional states of affairs. This article provides an overview of latent variable mixture models and illustrates the application of these methods by applying them to the study of the latent structure of psychotic experiences. The flexibility of latent variable mixture models makes them adaptable to the study of heterogeneity in complex psychiatric and psychological phenomena. They also allow researchers to address research questions that directly compare the viability of dimensional, categorical and hybrid conceptions of constructs.
Journal Article
Accounting for imperfect detection and survey bias in statistical analysis of presence‐only data
by Dorazio, Robert M.
in Animal and plant ecology, Animal, plant and microbial ecology, Applied ecology
2014
Aim
During the past decade ecologists have attempted to estimate the parameters of species distribution models by combining locations of species presence observed in opportunistic surveys with spatially referenced covariates of occurrence. Several statistical models have been proposed for the analysis of presence‐only data, but these models have largely ignored the effects of imperfect detection and survey bias. In this paper I describe a model‐based approach for the analysis of presence‐only data that accounts for errors in the detection of individuals and for biased selection of survey locations.
Innovation
I develop a hierarchical, statistical model that allows presence‐only data to be analysed in conjunction with data acquired independently in planned surveys. One component of the model specifies the spatial distribution of individuals within a bounded, geographic region as a realization of a spatial point process. A second component of the model specifies two kinds of observations, the detection of individuals encountered during opportunistic surveys and the detection of individuals encountered during planned surveys.
Main conclusions
Using mathematical proof and simulation‐based comparisons, I demonstrate that biases induced by errors in detection or biased selection of survey locations can be reduced or eliminated by using the hierarchical model to analyse presence‐only data in conjunction with counts observed in planned surveys. I show that a relatively small number of high‐quality data (from planned surveys) can be used to leverage the information in presence‐only observations, which usually have broad spatial coverage but may not be informative of both occurrence and detectability of individuals. Because a variety of sampling protocols can be used in planned surveys, this approach to the analysis of presence‐only data is widely applicable. In addition, since the point‐process model is formulated at the level of an individual, it can be extended to account for biological interactions between individuals and temporal changes in their spatial distributions.
Journal Article