Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
298 result(s) for "inclusion probability"
Bayesian Adaptive Sampling for Variable Selection and Model Averaging
by Ghosh, Joyee; Littman, Michael L.; Clyde, Merlise A.
in Algorithms; Bayesian analysis; Bayesian Computing
2011
For the problem of model choice in linear regression, we introduce a Bayesian adaptive sampling algorithm (BAS) that samples models without replacement from the space of models. For problems that permit enumeration of all models, BAS is guaranteed to enumerate the model space in 2^p iterations, where p is the number of potential variables under consideration. For larger problems where sampling is required, we provide conditions under which BAS provides perfect samples without replacement. When the sampling probabilities in the algorithm are the marginal variable inclusion probabilities, BAS may be viewed as sampling models "near" the median probability model of Barbieri and Berger. As marginal inclusion probabilities are not known in advance, we discuss several strategies to estimate the marginal inclusion probabilities adaptively within BAS. We illustrate the performance of the algorithm using simulated and real data and show that BAS can outperform Markov chain Monte Carlo methods. The algorithm is implemented in the R package BAS, available on CRAN. This article has supplementary material online.
Journal Article
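The without-replacement, adaptively updated sampler is easy to prototype. Below is a minimal Python sketch in the spirit of BAS, not the authors' CRAN implementation: real BAS excludes visited models exactly by updating the remaining probability mass, whereas this sketch rejects duplicates for brevity; the positive `score` function and the 0.05/0.95 clamping are illustrative assumptions.

import random

def bas_sketch(score, p, n_draws, seed=0):
    """Illustrative BAS-like sampler. score(gamma) must return a positive,
    unnormalized posterior weight for a model gamma in {0,1}^p."""
    rng = random.Random(seed)
    probs = [0.5] * p                      # initial marginal inclusion estimates
    visited, weights, seen = [], [], set()
    while len(visited) < min(n_draws, 2 ** p):
        gamma = tuple(int(rng.random() < q) for q in probs)
        if gamma in seen:
            continue                       # duplicate: resample (sketch only)
        seen.add(gamma)
        visited.append(gamma)
        weights.append(score(gamma))
        total = sum(weights)
        # adapt: re-estimate each marginal inclusion probability from the
        # weighted models seen so far, clamped away from 0 and 1
        probs = [min(max(sum(w * g[j] for w, g in zip(weights, visited)) / total,
                         0.05), 0.95) for j in range(p)]
    return visited, weights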
Effects of ignoring survey design information for data reuse
by Schulz, Torsti; Vanhatalo, Jarno; Trenkel, Verena M.
in accessible; Bias; Computer Simulation
2021
Data are currently being used, and reused, in ecological research at an unprecedented rate. To ensure appropriate reuse, however, we need to ask: "Are aggregated databases currently providing the right information to enable effective and unbiased reuse?" We investigate this question with a focus on designs that purposefully favor the selection of some sampling locations (upweighting their probability of selection). Such designs are common; examples include designs with uneven inclusion probabilities and stratified designs. We perform a simulation experiment, creating data sets with progressively more uneven inclusion probabilities, and examine the resulting estimates of the average number of individuals per unit area (density). The effect of ignoring the survey design can be profound, with biases of up to 250% in density estimates when naive analytical methods are used. This estimation bias is not reduced by adding more data. Fortunately, it can be mitigated by using an appropriate estimator, or an appropriate model, that incorporates the design information. These are available, however, only when essential information about the survey design is recorded: the sample location selection process (e.g., inclusion probabilities) and/or the covariates used in its specification. The results suggest that such information must be stored and served with the data to support meaningful inference and data reuse.
Journal Article
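The paper's central point, that uneven inclusion probabilities bias naive density estimates unless the design information is used, can be reproduced in a few lines. A minimal Python sketch under assumed Poisson sampling with one stratum upweighted 4x (all numbers are illustrative):

import random

def design_bias_demo(n_pop=10_000, n_samp=500, seed=1):
    rng = random.Random(seed)
    # high-density half of the region vs low-density half
    y = [rng.gauss(10, 2) if i < n_pop // 2 else rng.gauss(2, 1)
         for i in range(n_pop)]
    # sampling purposefully favors the high-density half (4x upweighted)
    raw = [4.0 if i < n_pop // 2 else 1.0 for i in range(n_pop)]
    c = n_samp / sum(raw)
    pi = [c * r for r in raw]                               # inclusion probabilities
    s = [i for i in range(n_pop) if rng.random() < pi[i]]   # Poisson sample
    naive = sum(y[i] for i in s) / len(s)                   # ignores the design
    ht = sum(y[i] / pi[i] for i in s) / n_pop               # Horvitz-Thompson mean
    return naive, ht, sum(y) / n_pop                        # vs the true mean

The naive mean drifts toward the oversampled stratum, while the Horvitz-Thompson estimate recovers the true average; as the abstract stresses, that correction is only possible when the inclusion probabilities travel with the data.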
A general stream sampling design
2024
With the emergence of the big data era, the need for sampling methods that select samples based on the order in which units are observed is felt more than ever. To meet this need, a new sequential unequal probability sampling method is proposed. The decision to select or not select each unit is made based on the order in which the units appear. A variant of this method allows selection of a sample from a stream. The method uses sliding windows, which act as strata of controllable size, and allows the sample to be spread in a controlled manner throughout the population. A special case with windows of size one decides on each sampling unit immediately after observing it; its implementation is simple and is presented here as an algorithm with a single condition. Selecting windows of size two yields one of the optimal stream sampling methods, producing a well-spread stream sample with positive second-order inclusion probabilities.
Journal Article
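For intuition, one classical way to realize a decide-immediately rule with a single condition per unit is ordered systematic sampling over the stream. The Python sketch below is that standard textbook method, not necessarily the authors' proposal:

import math
import random

def systematic_stream(pi_stream, seed=None):
    """Select units on the fly with target inclusion probabilities pi_i.
    Unit i is taken exactly when the cumulative sum of probabilities,
    shifted by a single uniform start u, crosses a new integer."""
    rng = random.Random(seed)
    u = rng.random()
    cum, chosen = 0.0, []
    for i, pi in enumerate(pi_stream):
        prev = cum
        cum += pi
        if math.floor(cum - u) > math.floor(prev - u):   # the single condition
            chosen.append(i)
    return chosen

Each unit is selected with exactly its target probability pi_i, and the decision is made the moment the unit is observed, never revisited.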
Inclusion probability for DNA mixtures is a subjective one-sided match statistic unrelated to identification information
Background: DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned.

Materials and Methods: The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the CPI^-1 value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(PI^-1) values were examined and compared with corresponding log(LR) values.

Results: The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), CPI^-1 increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention.

Conclusions: Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items. However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. A quantitative CPI number adds little meaningful information beyond the analyst's initial qualitative assessment that a person's DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before they are relied upon by criminal justice.
Journal Article
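The CPI itself is straightforward arithmetic, which is part of the article's point: at each locus the probability of inclusion is the squared sum of the population frequencies of the alleles observed in the mixture, and loci multiply. A Python sketch with illustrative (not case-derived) allele frequencies:

def locus_pi(observed_allele_freqs):
    # probability that a random person is "included" at one locus:
    # (sum of the observed alleles' population frequencies) squared
    s = sum(observed_allele_freqs)
    return s * s

def combined_pi(loci):
    cpi = 1.0
    for freqs in loci:
        cpi *= locus_pi(freqs)
    return cpi

# e.g. 13 STR loci whose observed alleles each sum to frequency 0.6:
cpi = combined_pi([[0.2, 0.25, 0.15]] * 13)   # locus PI = 0.36 per locus
print(1 / cpi)                                # CPI^-1 of roughly 6e5

Because every locus contributes a factor below one, CPI^-1 grows mechanically with the number of loci tested, consistent with the abstract's law-of-large-numbers argument that the statistic tracks test size rather than identification information.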
Approaches to Improving Survey-Weighted Estimates
by Thompson, Mary; Elliott, Michael R.; Little, Roderick J. A.
in Bias; Design modifications; Estimates
2017
In sample surveys, the sample units are typically chosen using a complex design. This may lead to a selection effect and, if uncorrected in the analysis, may lead to biased inferences. To mitigate the effect on inferences of deviations from a simple random sample, a common technique is to use survey weights in the analysis. This article reviews approaches to address possible inefficiency in estimation resulting from such weighting. To improve inferences we emphasize modifications of the basic design-based weight, that is, the inverse of a unit's inclusion probability. These techniques include weight trimming, weight modelling and incorporating weights via models for survey variables. We start with an introduction to survey weighting, including methods derived from both the design and model-based perspectives. Then we present the rationale and a taxonomy of methods for modifying the weights. We next describe an extensive numerical study to compare these methods. Using relative bias, relative mean square error, confidence or credible interval width, and coverage probability as criteria, we compare the alternative methods and summarize our findings. To supplement this numerical study we use Texas school data to compare the distributions of the weights for several methods. We also make general recommendations, describe limitations of our numerical study and make suggestions for further investigation.
Journal Article
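Of the modifications reviewed, weight trimming is the simplest to illustrate. A minimal Python sketch of one common variant (cap design weights at an upper quantile, then rescale to preserve the weight total); the 95% cap is an assumption for illustration, not the article's recommendation:

def trim_weights(weights, cap_quantile=0.95):
    # design weights are 1/pi_i; cap the largest ones at a chosen quantile
    ordered = sorted(weights)
    cap = ordered[int(cap_quantile * (len(ordered) - 1))]
    trimmed = [min(w, cap) for w in weights]
    # rescale so the trimmed weights keep the original total
    factor = sum(weights) / sum(trimmed)
    return [w * factor for w in trimmed]

Trimming trades a small bias for a large variance reduction when a few units have very small inclusion probabilities and hence very large weights.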
Evaluation of CMIP5 models and ensemble climate projections using a Bayesian approach: a case study of the Upper Indus Basin, Pakistan
2021
The availability of a variety of Global Climate Models (GCMs) has increased the importance of selecting suitable GCMs for impact assessment studies. In this study, we used Bayesian Model Averaging (BMA) for GCM selection and ensemble climate projection from the output of thirteen CMIP5 GCMs for the Upper Indus Basin (UIB), Pakistan. The results show that the ranking of the best-performing models among the thirteen GCMs is not uniform across maximum temperature, minimum temperature, and precipitation, although some models performed best for all three variables. The selected GCMs were used to produce ensemble projections via BMA for maximum temperature, minimum temperature, and precipitation under the RCP4.5 and RCP8.5 scenarios for 2011–2040. The ensemble projections show a higher correlation with observed data than individual GCM outputs, and the BMA predictions captured the trend of the observed data well. Furthermore, the 90% prediction intervals of the BMA output closely captured the extreme values of the observed data. The projected results of both RCPs were compared with the climatology of the baseline period (1981–2010); RCP8.5 shows larger changes in future temperature and precipitation than RCP4.5. For maximum temperature, there is more variation in the monthly climatology for 2011–2040 in the first half of the year, although under RCP8.5 higher variation was noted during the winter season. A decrease in precipitation is projected during January and August under RCP4.5, while under RCP8.5 decreases were noted during March, May, July, August, September, and October; the changes (decrease/increase) are larger than under RCP4.5.
Journal Article
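The BMA machinery used here is generic: each model receives a posterior weight, and the ensemble projection is the weight-averaged member projection. A minimal Python sketch, assuming per-model log evidence scores (e.g., a -BIC/2 approximation) are already available; the scoring choice is an assumption, not this paper's method:

import math

def bma_weights(log_evidence):
    # posterior model weights: softmax of log marginal likelihoods,
    # max-shifted for numerical stability
    m = max(log_evidence)
    w = [math.exp(s - m) for s in log_evidence]
    t = sum(w)
    return [x / t for x in w]

def bma_projection(member_series, weights):
    # ensemble projection: weighted average of the member models' series
    n = len(member_series[0])
    return [sum(w * s[i] for w, s in zip(weights, member_series))
            for i in range(n)]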
Bayesian Variable Selection Under Collinearity
by Ghosh, Joyee; Ghattas, Andrew E.
in Bayesian analysis; Bayesian model averaging; Data analysis
2015
In this article, we highlight some interesting facts about Bayesian variable selection methods for linear regression models in settings where the design matrix exhibits strong collinearity. We first demonstrate via real data analysis and simulation studies that summaries of the posterior distribution based on marginal and joint distributions may give conflicting results for assessing the importance of strongly correlated covariates. The natural question is which one should be used in practice. The simulation studies suggest that posterior inclusion probabilities and Bayes factors that evaluate the importance of correlated covariates jointly are more appropriate, and some priors may be more adversely affected in such a setting. To obtain a better understanding of the phenomenon, we study some toy examples with Zellner's g-prior. The results show that strong collinearity may lead to a multimodal posterior distribution over models, in which joint summaries are more appropriate than marginal summaries. Thus, we recommend a routine examination of the correlation matrix and calculation of the joint inclusion probabilities for correlated covariates, in addition to marginal inclusion probabilities, for assessing the importance of covariates in Bayesian variable selection.
Journal Article
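The recommendation is easy to operationalize once a weighted collection of models (enumerated or MCMC-sampled) is in hand. A minimal Python sketch computing two covariates' marginal inclusion probabilities alongside the joint probability that at least one is included; under strong collinearity, both marginals can look small while the joint probability stays near one:

def inclusion_summaries(models, weights, j, k):
    """models: 0/1 tuples over covariates; weights: posterior model weights."""
    t = sum(weights)
    marg_j = sum(w for m, w in zip(models, weights) if m[j]) / t
    marg_k = sum(w for m, w in zip(models, weights) if m[k]) / t
    joint_either = sum(w for m, w in zip(models, weights) if m[j] or m[k]) / t
    return marg_j, marg_k, joint_either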
Spatially Clustered Survey Designs
by Hoskins, Andrew J.; Lawrence, Emma; Foster, Scott D.
in Agriculture; Biostatistics; Clustering
2024
Direct observation, through surveys, underpins nearly all aspects of environmental sciences. Survey design theory has evolved to make sure that sampling is as efficient as possible whilst remaining robust and fit-for-purpose. However, these methods frequently focus on theoretical aspects and often increase the logistical difficulty of performing the survey. Usually, the survey design process will place individual sampling locations one-by-one throughout the sampling area (e.g. random sampling). A consequence of these approaches is that there is usually a large cost in travel time between locations. This can be a huge problem for surveys that are large in spatial scale or are in inhospitable environments where travel is difficult and/or costly. Our solution is to constrain the sampling process so that the sample consists of spatially clustered observations, with all sites within a cluster lying within a predefined distance. The spatial clustering is achieved by a two-stage sampling process: first cluster centres are sampled, and then sites within clusters are sampled. A novelty of our approach is that these clusters are allowed to overlap, and we present the calculations required to adjust the specified inclusion probabilities so that they are respected in the clustered sample. The process is illustrated with a real and ongoing large-scale ecological survey. We also present simulation results to assess the method's performance. Spatially clustered survey design provides a formal statistical framework for grouping sample sites in space whilst maintaining multiple levels of spatial balance. These designs reduce the logistical burden placed on field workers by decreasing total travel time and logistical overheads. Supplementary materials accompanying this paper appear online.
Journal Article
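The key technical step, adjusting inclusion probabilities so they are respected when overlapping clusters are drawn, can at least be checked empirically. A simplified Python sketch that estimates the realized first-order inclusion probabilities of candidate sites under a two-stage design with uniformly drawn centres; the paper's designs also subsample within clusters and use spatially balanced draws, both omitted here:

import random

def realized_inclusion(sites, n_centres, radius, n_rep=2000, seed=0):
    """sites: (x, y) pairs in the unit square. Each replicate draws
    cluster centres uniformly; a site is 'sampled' if it falls within
    `radius` of any centre. Overlapping clusters are allowed."""
    rng = random.Random(seed)
    hits = [0] * len(sites)
    for _ in range(n_rep):
        centres = [(rng.random(), rng.random()) for _ in range(n_centres)]
        for i, (x, y) in enumerate(sites):
            if any((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
                   for cx, cy in centres):
                hits[i] += 1
    return [h / n_rep for h in hits]   # compare against the target pi_i

Comparing the returned frequencies with the specified inclusion probabilities shows exactly how much adjustment the overlapping-cluster geometry demands.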
Association of Combined Per- and Polyfluoroalkyl Substances and Metals with Chronic Kidney Disease
2024
Background: Exposure to environmental pollutants such as metals and per- and polyfluoroalkyl substances (PFAS) has become common and is increasingly associated with a decrease in the estimated glomerular filtration rate (eGFR), a marker often used to measure chronic kidney disease (CKD). However, few studies use both eGFR and the urine albumin-creatinine ratio (uACR), which together are more comprehensive markers of CKD, and the complexity of pollutant exposure-response interactions, especially for combined metals and PFAS, has not been comprehensively elucidated.

Objective: This study aims to assess the individual and combined effects of perfluorooctanoic acid (PFOA), perfluorooctanesulfonic acid (PFOS), cadmium (Cd), mercury (Hg), and lead (Pb) exposure on CKD using data from the National Health and Nutrition Examination Survey (NHANES) 2017–2018.

Methods: We employed bivariate logistic regression and Bayesian Kernel Machine Regression (BKMR) in our analysis of the data.

Results: Logistic regression revealed a positive association between PFOA and CKD. Our BKMR analysis revealed a non-linear and bi-phasic relationship between the metal exposures and CKD. In the univariate exposure-response function plots, Cd and Hg exhibited U- and N-shaped interactions, indicating non-linear and non-additive relationships in which both low and high exposures are associated with CKD. The bivariate exposure-response functions showed that Cd had a U-shaped relationship with CKD at different quantiles of Pb, Hg, PFOA, and PFOS, indicating that both low and high levels of Cd are associated with CKD and implying a non-linear, complex biological interaction. Hg's interaction plot demonstrated an N-shaped association across all quantiles of Cd, at the 75th quantile of Pb, and at the 50th and 75th quantiles of PFOA and PFOS. Furthermore, the posterior inclusion probability (PIP) results underscored Cd's consistent association with CKD (PIP = 1.000), followed by Hg (PIP = 0.9984), then PFOA and PFOS with closely related PIPs of 0.7880 and 0.7604, respectively, and finally Pb (PIP = 0.6940), which contributed the least among the five pollutants, though still significantly.

Conclusions: Our findings reveal that exposure to environmental pollutants, particularly Hg and Cd, is associated with CKD. These findings highlight the need for public health interventions and strategies to mitigate the cumulative effects of PFAS and metal exposure, and they underscore the value of advanced statistical methods for understanding the impact of environmental pollutants on human health. Further research is needed to understand the mechanistic pathways of PFAS- and metal-induced kidney injury and CKD, and longitudinal studies are required to ascertain the long-term impact of these exposures.
Journal Article
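The PIP values quoted above have a simple mechanical meaning: the fraction of posterior (MCMC) iterations in which an exposure is retained in the kernel. A generic Python sketch of that computation, not BKMR's R implementation:

def pips(indicator_draws):
    """indicator_draws: list of 0/1 tuples, one per MCMC iteration,
    flagging which exposures enter the model at that iteration."""
    n = len(indicator_draws)
    p = len(indicator_draws[0])
    return [sum(d[j] for d in indicator_draws) / n for j in range(p)]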
Bark Stripping Damage Caused by Red Deer (Cervus elaphus L.): Inventory Design Using Hansen–Hurwitz and Horvitz–Thompson Approach
2025
This study investigates the use of adaptive cluster sampling (ACS) for estimating bark stripping damage in forests, employing the Hansen–Hurwitz (HH) and Horvitz–Thompson (HT) estimators. Through simulations, we analysed total, summer, and new bark stripping damage with varying grid sizes and sample sizes in eight fully censused stands in Northern Styria, Austria. The results showed that the HT estimator consistently had lower standard errors (SEs; the variability of the sample mean around the true population mean) than the HH estimator. SEs decreased with increasing grid spacing for new and summer damage, but increased for total damage up to 35 m and then remained stable. Inclusion probabilities (IPs) were highest for total damage. ACS showed precision gains, particularly for rare and clustered damage such as new damage, but did not achieve the target SE of 10%. Adaptive sampling is most beneficial for monitoring rare and clustered events, though precision remains limited, especially for new damage. The study suggests ACS is suitable for rare damage types (e.g., summer and new bark stripping wounds) but requires further refinement to meet operational precision targets. Future work should focus on integrating adaptive designs with practical field methods, such as fixed-radius plots and refined damage criteria.
Journal Article
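For reference, the two estimators compared here have standard closed forms in adaptive cluster sampling; the LaTeX below follows the common textbook formulation (Thompson's), which may differ in notation from the paper's:

\hat{\mu}_{\mathrm{HH}} = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_{\Psi_i},
\qquad
\hat{\mu}_{\mathrm{HT}} = \frac{1}{N}\sum_{k=1}^{K} \frac{y_k^{*}}{\alpha_k},
\qquad
\alpha_k = 1 - \left.\binom{N - x_k}{n}\right/\binom{N}{n},

where \bar{y}_{\Psi_i} is the mean over the network containing the i-th initially selected unit, and y_k^{*}, x_k, and \alpha_k are the total, size, and inclusion probability of network k. The explicit dependence of \hat{\mu}_{\mathrm{HT}} on the network inclusion probabilities is why well-estimated IPs translate directly into the lower standard errors reported for HT.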