Catalogue Search | MBRL
Explore the vast range of titles available.
68 result(s) for "multinomial mixtures"
Identifiability in N-mixture models
2018
Binomial N-mixture models have proven very useful in ecology, conservation, and monitoring: they allow estimation and modeling of abundance separately from detection probability using simple counts. Recently, doubts about parameter identifiability have been voiced. I conducted a large-scale screening test with 137 bird data sets from 2,037 sites. I found virtually no identifiability problems for Poisson and zero-inflated Poisson (ZIP) binomial N-mixture models, but negative-binomial (NB) models had problems in 25% of all data sets. The corresponding multinomial N-mixture models had no problems. Parameter estimates under Poisson and ZIP binomial and multinomial N-mixture models were extremely similar. Identifiability problems became a little more frequent with smaller sample sizes (267 and 50 sites), but were unaffected by whether the models did or did not include covariates. Hence, binomial N-mixture model parameters with Poisson and ZIP mixtures typically appeared identifiable. In contrast, NB mixtures were often unidentifiable, which is worrying since these were often selected by Akaike’s information criterion. Identifiability of binomial N-mixture models should always be checked. If problems are found, simpler models, integrated models that combine different observation models, or the use of external information via informative priors or penalized likelihoods may help.
Journal Article
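The abstract above concerns binomial N-mixture models. For context, a minimal sketch of the standard Poisson-mixture form of that model, in generic notation not taken from the article, is:

```latex
% Binomial N-mixture model with a Poisson abundance mixture (generic notation)
% N_i: latent abundance at site i;  y_{it}: count at site i on visit t;  p: detection probability
N_i \sim \mathrm{Poisson}(\lambda), \qquad
y_{it} \mid N_i \sim \mathrm{Binomial}(N_i,\, p), \qquad t = 1, \dots, T .
```

The ZIP and negative-binomial variants discussed above replace the Poisson distribution for N_i with a zero-inflated Poisson or negative-binomial mixture, respectively.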
Generalized site occupancy models allowing for false positive and false negative errors
2006
Site occupancy models have been developed that allow for imperfect species detection or "false negative" observations. Such models have become widely adopted in surveys of many taxa. The most fundamental assumption underlying these models is that "false positive" errors are not possible. That is, one cannot detect a species where it does not occur. However, such errors are possible in many sampling situations for a number of reasons, and even low false positive error rates can induce extreme bias in estimates of site occupancy when they are not accounted for. In this paper, we develop a model for site occupancy that allows for both false negative and false positive error rates. This model can be represented as a two-component finite mixture model and can be easily fitted using freely available software. We provide an analysis of avian survey data using the proposed model and present results of a brief simulation study evaluating the performance of the maximum-likelihood estimator and the naive estimator in the presence of false positive errors.
Journal Article
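The abstract above describes a two-component finite mixture that admits both error types. A plausible sketch of such a detection model, with illustrative symbols rather than the authors' notation, is:

```latex
% Occupancy model allowing false negatives and false positives (illustrative notation)
% z_i: true occupancy state of site i;  y_{it}: detection at site i on visit t
z_i \sim \mathrm{Bernoulli}(\psi), \qquad
y_{it} \mid z_i \sim \mathrm{Bernoulli}\bigl(z_i\, p_{11} + (1 - z_i)\, p_{10}\bigr),
```

where p_{11} is the detection probability at an occupied site and p_{10} is the false-positive detection probability at an unoccupied site.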
An Operator Theoretic Approach to Nonparametric Mixture Models
by Vandermeulen, Robert A.; Scott, Clayton D.
in Algorithms; Finite element analysis; Linear equations
2019
When estimating finite mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this work, we make no distributional assumptions on the mixture components and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same mixture component. We precisely characterize the number of observations n per group needed for the mixture model to be identifiable, as a function of the number m of mixture components. In addition to our assumption-free analysis, we also study the settings where the mixture components are either linearly independent or jointly irreducible. Furthermore, our analysis considers two kinds of identifiability, where the mixture model is the simplest one explaining the data, and where it is the only one. As an application of these results, we precisely characterize identifiability of multinomial mixture models. Our analysis relies on an operator-theoretic framework that associates mixture models in the grouped-sample setting with certain infinite-dimensional tensors. Based on this framework, we introduce a general spectral algorithm for recovering the mixture components.
Journal Article
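The abstract above characterizes identifiability of multinomial mixture models. As background, a multinomial mixture with m components can be written in standard (assumed) notation as:

```latex
% Multinomial mixture with m components, N trials, and component parameters \theta_k
P(x) \;=\; \sum_{k=1}^{m} \pi_k\, \mathrm{Multinomial}\!\left(x;\, N,\, \theta_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{m} \pi_k = 1 .
```

In the grouped-sample setting studied in the paper, several observations known to come from the same component are observed together, and the question is how many observations per group are needed for the weights and components to be recoverable.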
Topic extraction from extremely short texts with variational manifold regularization
2021
With the emergence of massive short texts, e.g., social media posts and question titles from Q&A systems, discovering valuable information from them is increasingly significant for many real-world applications of content analysis. The family of topic models can effectively explore the hidden structures of documents through assumptions about latent topics. However, due to the sparseness of short texts, existing topic models, e.g., latent Dirichlet allocation, lose effectiveness on them. To this end, an effective solution, namely the Dirichlet multinomial mixture (DMM), which supposes that each short text is associated with only a single topic, indirectly enriches document-level word co-occurrences. However, DMM is sensitive to noisy words, and it often learns inaccurate topic representations at the document level. To address this problem, we extend DMM to a novel Laplacian Dirichlet Multinomial Mixture (LapDMM) topic model for short texts. The basic idea of LapDMM is to preserve local neighborhood structures of short texts, enabling topical signals to spread among neighboring documents and thereby correct inaccurate topic representations. This is achieved by incorporating variational manifold regularization into the variational objective of DMM, constraining close short texts to have similar variational topic representations. To find nearest neighbors of short texts before model inference, we construct an offline document graph, where the distances between short texts are computed by the word mover’s distance. We further develop an online version of LapDMM, namely Online LapDMM, to achieve inference speedup on massive short texts. To this end, we exploit the spirit of stochastic optimization with mini-batches and an up-to-date document graph that can efficiently find approximate nearest neighbors instead. To evaluate our models, we compare against state-of-the-art short text topic models on several traditional tasks, i.e., topic quality, document clustering, and classification. The empirical results demonstrate that our models achieve very significant performance gains over the baseline models.
Journal Article
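The abstract above builds on the Dirichlet multinomial mixture (DMM). Its one-topic-per-document generative process, in standard notation assumed here rather than quoted from the article, is roughly:

```latex
% Dirichlet multinomial mixture: one latent topic z_d per short document d
\theta \sim \mathrm{Dirichlet}(\alpha), \qquad
\phi_k \sim \mathrm{Dirichlet}(\beta), \qquad
z_d \sim \mathrm{Categorical}(\theta), \qquad
w_{dn} \mid z_d \sim \mathrm{Categorical}(\phi_{z_d}) .
```

LapDMM, as described above, adds a manifold-regularization term to the variational objective so that short texts that are close under the word mover's distance receive similar variational topic representations.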
Integrating semantic similarity with Dirichlet multinomial mixture model for enhanced web service clustering
by Sikka, Geeta; Agarwal, Neha; Awasthi, Lalit Kumar
in Algorithms; Cluster analysis; Clustering
2024
With the accelerated advancement of Web 2.0, developers generally describe the functionality of services in short natural-language text. Keyword-based searching techniques are not an efficient way of discovering services from repositories, and they suffer from vocabulary problems. Latent Dirichlet allocation (LDA) with word embedding techniques is widely adopted for efficiently extracting latent features from service descriptions. However, LDA is not effective on short text due to limited content and too few co-occurring words. The word vectors generated by word embedding techniques are of finer quality than those produced by topic modeling techniques. The Gibbs sampling algorithm for the Dirichlet multinomial mixture (GSDMM) model gives better results on web service description documents because it assigns one topic to each document. In this paper, we evaluate the performance of the GSDMM model with word embeddings and propose the WV+GSDMMK model. The proposed model improves service-to-topic mapping by determining semantic similarity among features. K-means clustering is applied to the service-to-topic representation. Results are evaluated on five real-time datasets based on intrinsic and extrinsic evaluation measures. Experimental results demonstrate that the proposed method outperforms other baseline techniques, and the accuracy score is also increased by 5%, 18%, 3%, 4%, and 6% on datasets DS1, DS2, DS3, DS4, and DS5, respectively.
Journal Article
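The abstract above clusters a service-to-topic representation with K-means after GSDMM inference. A rough, hypothetical sketch of that final clustering step, using placeholder data and names rather than the authors' code, might look like:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical input: a service-to-topic matrix from a GSDMM-style model,
# where row i holds the topic-membership weights of service description i.
rng = np.random.default_rng(0)
service_topic = rng.random((200, 20))                       # placeholder: 200 services, 20 topics
service_topic /= service_topic.sum(axis=1, keepdims=True)   # normalize rows to distributions

# Cluster services in topic space; the number of clusters here is an assumption.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
labels = kmeans.fit_predict(service_topic)

print(labels[:10])  # cluster assignments of the first ten services
```

In the WV+GSDMMK pipeline, as the abstract indicates, the service-to-topic mapping is additionally refined using semantic similarity from word embeddings before this clustering step.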
Distinct Composition and Assembly Processes of Bacterial Communities in a River from the Arid Area: Ecotypes or Habitat Types?
2022
The composition, function, and assembly mechanism of the bacterial community are the focus of microbial ecology. Unsupervised machine learning may be a better way to understand the characteristics of bacterial metacommunities than classification by empirical habitat types. In this study, the composition, potential function, and assembly mechanism of the bacterial community in an arid river were analysed. The Dirichlet multinomial mixture method recognised four ecotypes across the three habitats (biofilm, water, and sediment). The bacterial communities in water are more sensitive to human activities. Bacterial diversity and richness in water decreased as the intensity of human activities increased from the region of water II to water I. Significant differences in the composition and potential function profile of bacterial communities between water ecotypes were also observed, such as higher relative abundance in the taxonomic composition of Firmicutes and potential function of plastic degradation in water I than in water II. Habitat filtering may play a more critical role in the assembly of bacterial communities in the river biofilm, while stochastic processes dominate the assembly process of bacterial communities in water and sediment. In water I, salinity and mean annual precipitation were the main drivers shaping the biogeography of taxonomic structure, while mean annual temperature, total organic carbon, and ammonium nitrogen were the main environmental factors influencing the taxonomic structure in water II. These results provide a conceptual framework for choosing between habitat types and ecotypes in research on microbial communities across different niches in the aquatic environment.
Journal Article
Hybrid topic modeling method based on Dirichlet multinomial mixture and fuzzy match algorithm for short text clustering
by Al-Mulla, Noha A; Jawarneh, Sana; ALmarashdeh, Ibrahim
in Algorithms; Arabic language; Big Data
2024
Topic modeling methods have proved effective for inferring latent topics from short texts. Dealing with short texts is challenging yet important for many real-world applications, due to the sparsity of terms in the text and the high-dimensional representation. Most topic modeling methods require the number of topics to be defined beforehand. Similarly, methods based on the Dirichlet Multinomial Mixture (DMM) require the maximum possible number of topics before execution, which is hard to determine due to topic uncertainty and the noise present in the dataset. Hence, a new approach called the Topic Clustering algorithm based on Levenshtein Distance (TCLD) is introduced in this paper. TCLD combines DMM models and a fuzzy matching algorithm to address two key challenges in topic modeling: (a) the outlier problem in topic modeling methods, and (b) the problem of determining the optimal number of topics. TCLD uses the initial clustered topics generated by DMM models and then evaluates the semantic relationships between documents using Levenshtein Distance. Subsequently, it determines whether to keep a document in the same cluster, relocate it to another cluster, or mark it as an outlier. The results demonstrate the efficiency of the proposed approach across six English benchmark datasets, in comparison to seven topic modeling approaches, with an 83% improvement in purity and a 67% enhancement in Normalized Mutual Information (NMI) across all datasets. The proposed method was also applied to a collected Arabic tweet dataset, and the results showed that only 12% of the Arabic short texts were incorrectly clustered, according to human inspection.
Journal Article
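The abstract above uses the Levenshtein distance to decide whether a document stays in its DMM cluster, is relocated, or is marked as an outlier. A self-contained sketch of the distance computation itself (the thresholding and relocation logic is the paper's contribution and is not reproduced here) is:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a                       # ensure the inner loop runs over the shorter string
    previous = list(range(len(b) + 1))    # distances for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(levenshtein("topic", "tropic"))  # 1
```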
A Heuristic-Based Model for MMMs-Induced Fuzzy Co-Clustering with Dual Exclusive Partition
by Hakui, Yoshiki; Ubukata, Seiki; Notsu, Akira
in Clustering; Constraint modelling; Heuristic methods
2020
MMMs-induced fuzzy co-clustering achieves a dual partition of objects and items by estimating two different types of fuzzy memberships. Because memberships of objects and items are usually estimated under different constraints, conventional models mainly targeted object clusters only, while item memberships were designed to represent intra-cluster typicalities of items, estimated independently in each cluster. To improve the interpretability of co-clusters, meaningful items should not belong to multiple clusters, so that each co-cluster is characterized by different representative items. In previous studies, the item sharing penalty approach has been applied to the MMMs-induced model, but the dual exclusive constraints approach has not yet been applied. In this paper, a heuristic-based approach from FCM-type co-clustering is adapted to MMMs-induced fuzzy co-clustering, and its characteristics are demonstrated through several comparative experiments.
Journal Article
Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables
1997
We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate approximations, the Cheeseman-Stutz and Diagonal approximations are the most computationally efficient, (4) all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model parameters, and (5) the Cheeseman-Stutz approximation can be more accurate than the other approximations, including the Laplace approximation, in situations where the parameters in the maximum a posteriori configuration are near a boundary.
Journal Article
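The abstract above compares large-sample approximations to the marginal likelihood. The BIC/MDL approximation it refers to has the familiar form, where d is the number of free parameters and N the sample size:

```latex
% BIC approximation to the log marginal likelihood of model M with MLE \hat{\theta}
\log p(D \mid M) \;\approx\; \log p\bigl(D \mid \hat{\theta}, M\bigr) \;-\; \frac{d}{2}\, \log N .
```

Roughly speaking, the Laplace, diagonal, and Cheeseman-Stutz approximations evaluated in the article retain curvature or completed-data information that BIC discards.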
Estimating invasive rodent abundance using removal data and hierarchical models
2025
Invasive rodents pose significant ecological, economic, and public health challenges. Robust methods are needed for estimating population abundance to guide effective management. Traditional methods such as capture-recapture are often impractical for invasive species due to ethical, legal and logistical constraints. Here, the application of hierarchical multinomial N-mixture models for estimating the abundance of invasive rodents using removal data is highlighted. Firstly, a simulation study was performed which demonstrated minimal bias, as well as good precision and reliable coverage of confidence intervals across a range of sampling scenarios. Additionally, the consequences of violating the population closure assumption were illustrated by showing how between-occasion dynamics can bias inference. Secondly, removal data was analyzed for two invasive rodent species, namely coypus (Myocastor coypus) in France and muskrats (Ondatra zibethicus) in the Netherlands. Using hierarchical multinomial N-mixture models, the effect of temperature on abundance was examined, while accounting for imperfect and time-varying capture probabilities. Additionally, this study demonstrated how to accommodate spatial variability using random effects, quantify uncertainty in parameter estimates, and account for violations of closure by fitting an open-population model to multi-year data. Taken together, these approaches demonstrate the flexibility and utility of hierarchical models in invasive species management.
Journal Article
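The abstract above fits hierarchical multinomial N-mixture models to removal data. A minimal sketch of the constant-capture-probability removal version, with notation assumed for illustration, is:

```latex
% Multinomial N-mixture model for removal sampling at site i over J removal occasions
N_i \sim \mathrm{Poisson}(\lambda_i), \qquad
(y_{i1}, \dots, y_{iJ}) \mid N_i \sim \mathrm{Multinomial}\bigl(N_i;\ \pi_1, \dots, \pi_J\bigr),
\qquad \pi_j = p\,(1 - p)^{\,j-1},
```

where p is the per-occasion capture probability and the remaining probability 1 - \sum_j \pi_j corresponds to animals never removed. Covariates such as temperature can enter through \lambda_i, and time-varying capture probabilities replace the constant p.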