Catalogue Search | MBRL
Explore the vast range of titles available.
1,857 result(s) for "Cluster level analysis"
Cluster randomised trials with a binary outcome and a small number of clusters: comparison of individual and cluster level analysis method
by Hayes, Richard J.; Thompson, Jennifer A.; Fielding, Katherine L.
in Bias; Binomial distribution; Clinical trials
2022
Background
Cluster randomised trials (CRTs) are often designed with a small number of clusters, but it is not clear which analysis methods are optimal when the outcome is binary. This simulation study aimed to determine (i) whether cluster-level analysis (CL), generalised linear mixed model (GLMM), and generalised estimating equation with sandwich variance (GEE) approaches maintain acceptable type-one error, including under non-normal cluster effects and low outcome prevalence, and, if so, (ii) which methods have the greatest power. We simulated CRTs with 8–30 clusters, altering the cluster size, outcome prevalence, intracluster correlation coefficient, and cluster effect distribution. We analysed each dataset with weighted and unweighted CL; GLMM with adaptive quadrature and restricted pseudolikelihood; and GEE with Kauermann-and-Carroll and Fay-and-Graubard sandwich variance using independent and exchangeable working correlation matrices. P-values were taken from a t-distribution with degrees of freedom (DoF) equal to clusters minus cluster-level parameters; GLMM with pseudolikelihood also used Satterthwaite and Kenward-Roger DoF.
Results
Unweighted CL, GLMM pseudolikelihood, and Fay-and-Graubard GEE with an independent or exchangeable working correlation matrix controlled type-one error in > 97% of scenarios with clusters-minus-parameters DoF. The cluster-effect distribution and outcome prevalence did not usually affect the performance of the analysis methods. GEE had the least power. With 20–30 clusters, GLMM had greater power than CL with varying cluster size but similar power otherwise; with fewer clusters, GLMM had lower power with a common cluster size, similar power with medium variation, and greater power with large variation in cluster size.
Conclusion
We recommend that CRTs with ≤ 30 clusters and a binary outcome be analysed with unweighted CL or restricted pseudolikelihood GLMM, both with DoF equal to clusters minus cluster-level parameters.
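As a concrete illustration of the recommended unweighted CL analysis, here is a minimal Python sketch that simulates a small CRT with a binary outcome and compares arms with a t-test on clusters-minus-parameters DoF (the simulation settings and variable names are illustrative, not taken from the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_arm(n_clusters, n_per_cluster, base_logodds, sigma_u):
    u = rng.normal(0.0, sigma_u, n_clusters)          # random cluster effects
    p = 1.0 / (1.0 + np.exp(-(base_logodds + u)))     # cluster-level prevalences
    events = rng.binomial(n_per_cluster, p)           # binary outcomes per cluster
    return events / n_per_cluster                     # one summary per cluster

control = simulate_arm(10, 50, base_logodds=-1.0, sigma_u=0.5)
treated = simulate_arm(10, 50, base_logodds=-0.5, sigma_u=0.5)

# Unweighted CL: two-sample t-test on the cluster-level prevalences,
# with DoF = total clusters minus cluster-level parameters (two arm means)
k = len(control) + len(treated)
dof = k - 2
sp2 = ((len(control) - 1) * control.var(ddof=1)
       + (len(treated) - 1) * treated.var(ddof=1)) / dof
se = np.sqrt(sp2 * (1 / len(control) + 1 / len(treated)))
t_stat = (treated.mean() - control.mean()) / se
p_value = 2 * stats.t.sf(abs(t_stat), dof)
print(f"risk difference = {treated.mean() - control.mean():.3f}, "
      f"t = {t_stat:.2f}, p = {p_value:.3f}")
```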
Journal Article
The performance of small sample correction methods for controlling type I error when analyzing parallel cluster randomized trials: a systematic review of simulation studies
2025
Most cluster randomized trials (CRTs) include fewer than 50 clusters, yet the assumption of a “large sample” is relied upon to derive the sampling distributions of treatment effects. We review the current simulation study literature pertaining to small sample corrections for common analytical approaches for parallel CRTs.
We searched Ovid Medline and Web of Science up to August 30, 2024 for simulation studies evaluating the performance of small sample corrections. We included binary and continuous outcomes analyzed using generalized linear mixed models, generalized estimating equations, or cluster-level approaches. Full-text screening and data abstraction were performed independently in duplicate.
Fourteen studies evaluated binary outcomes and six evaluated continuous outcomes. The number of clusters ranged from 4 to 200; the median smallest intracluster correlation coefficient was 0.001 [range 0.000–0.200]; the median largest intracluster correlation coefficient was 0.10 [range 0.05–0.70]; the median lowest prevalence was 0.25 [range 0.05–0.50]; and the median coefficient of variation of cluster sizes was 1.00 [range 0.80–1.50]. For continuous outcomes, a cluster-level analysis (either unweighted or inverse-variance weighted) with a t-distribution (with between-within degrees of freedom), a linear mixed model with a Satterthwaite correction, or a generalized estimating equation with the Fay and Graubard correction mostly preserves nominal type I error with as few as six clusters (although up to 40 clusters are needed in some settings). Other approaches perform less favorably (e.g., Kenward-Roger is conservative even with 30 clusters). For binary outcomes, an unweighted or inverse-variance weighted cluster-level analysis can achieve nominal type I error (but can be anticonservative with small cluster sizes or low prevalence), as can a generalized linear mixed model with a between-within correction with as few as 10 clusters (but sometimes conservative with up to 30 clusters). Other corrections, such as Kenward-Roger or Satterthwaite, are more conservative. For generalized estimating equations, the Mancl and DeRouen correction mostly preserves nominal error rates but can be anticonservative.
The literature on the performance of small sample corrections for parallel CRTs is complex. While the available corrections can maintain type I error with a very small number of clusters, more than 40 clusters are required to guarantee nominal type I error across all settings.
• With fewer than 50 clusters, analysis of data from a cluster trial requires a small sample correction.
• Small sample corrections mostly maintain type I error close to 5% or are conservative.
• For continuous outcomes, the Satterthwaite and Fay/Graubard corrections maintain type I error with 6 clusters.
• For binary outcomes, the between-within and Mancl and DeRouen corrections can sometimes maintain type I error.
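To make the role of the small sample correction concrete, a minimal sketch compares the p-value from a large-sample normal reference against a t reference with a between-within-style DoF (the effect estimate, standard error, and cluster counts below are made up for illustration):

```python
from scipy import stats

estimate, robust_se = 0.40, 0.18          # hypothetical effect and robust SE
z = estimate / robust_se                  # same Wald statistic in both cases

n_clusters, cluster_level_params = 12, 2
dof = n_clusters - cluster_level_params   # between-within style DoF

p_normal = 2 * stats.norm.sf(abs(z))      # large-sample reference
p_t = 2 * stats.t.sf(abs(z), dof)         # small-sample t correction
print(f"z = {z:.2f}: normal p = {p_normal:.4f}, t({dof}) p = {p_t:.4f}")
# With few clusters the normal reference is anticonservative relative to t.
```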
Journal Article
How Sharp Are Classifications?
by Pillar, Valerio DePatta
in Animal, plant and microbial ecology; Biogeography; Biological and medical sciences
1999
Ecologists often use cluster analysis as a tool in the classification and mapping of entities such as communities or landscapes. The problem is that the researcher has to choose an adequate group partition level. In addition, cluster analysis techniques will always reveal groups, even if the data set does not have a clear group structure. This paper offers a method to test statistically for fuzziness of the partitions in cluster analysis of sampling units that can be used with a wide range of data types and clustering methods. The method applies bootstrap resampling. In this, partitions found in bootstrap samples are compared to the observed partition by the similarity of the sampling units that form the groups. The method tests the null hypothesis that the clusters in the bootstrap samples are random samples of their most similar corresponding clusters mapped one-to-one into the observed data. The resulting probability indicates whether the groups in the partition are sharp enough to reappear consistently in resampling. Examples with artificial and vegetational field data show that the test gives consistent and useful results. Though the method is computationally demanding, its implementation in a C++ program can run very fast on microcomputers.
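A simplified sketch of the bootstrap-stability idea follows (not Pillar's exact test: k-means and the adjusted Rand index stand in for the paper's clustering method and group-similarity mapping, and the toy data are synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# Two well-separated toy groups of sampling units
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])

k = 2
observed = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

scores = []
for _ in range(200):
    idx = rng.integers(0, len(X), len(X))          # bootstrap resample of units
    boot = KMeans(n_clusters=k, n_init=10).fit_predict(X[idx])
    # Compare bootstrap partition to the observed partition on resampled units
    scores.append(adjusted_rand_score(observed[idx], boot))

print(f"mean ARI over bootstrap samples: {np.mean(scores):.2f}")
# Values near 1 suggest sharp, reproducible groups; low values suggest fuzziness.
```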
Journal Article
Simple analysis of cRCT outcomes using aggregate cluster-level summaries
by Walters, Stephen J; Campbell, Michael J
in Aggregate cluster level analysis; independent samples t‐test; Mann‐Whitney U test; MATHEMATICS
2014
This chapter describes the statistical methods used to compare outcomes between two groups in a cluster randomised controlled trial (cRCT) using aggregate cluster‐level summaries, such as the mean outcome per cluster. It shows how the two independent‐samples t‐test or its non‐parametric equivalent, the Mann–Whitney U test, can be used to compare cluster‐level summary statistics. This chapter further sets out the use of weighted and robust versions of the t‐test to allow for different cluster sizes and unequal variances in the outcome. It also describes how summary measures can be created and analysed for binary outcomes and matched‐pairs designs.
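A minimal Python sketch of this aggregate cluster-level approach, assuming a hypothetical tidy table with 'cluster', 'arm', and 'outcome' columns:

```python
import pandas as pd
from scipy import stats

# Hypothetical individual-level cRCT data (values are placeholders)
df = pd.DataFrame({
    "cluster": [1, 1, 2, 2, 3, 3, 4, 4],
    "arm":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "outcome": [3.1, 2.8, 3.6, 3.3, 4.0, 4.4, 3.9, 4.2],
})

# One summary value per cluster: the mean outcome per cluster
summaries = df.groupby(["arm", "cluster"])["outcome"].mean()
a = summaries.loc["A"].to_numpy()
b = summaries.loc["B"].to_numpy()

t_res = stats.ttest_ind(a, b)         # two independent-samples t-test
u_res = stats.mannwhitneyu(a, b)      # non-parametric equivalent
print(f"t-test p = {t_res.pvalue:.3f}; Mann-Whitney p = {u_res.pvalue:.3f}")
```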
Book Chapter
A Practitioner's Guide to Cluster-Robust Inference
2015
We consider statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters. Examples include data on individuals with clustering on village or region or other category such as industry, and state-year differences-in-differences studies with clustering on state. In such settings, default standard errors can greatly overstate estimator precision. Instead, if the number of clusters is large, statistical inference after OLS should be based on cluster-robust standard errors. We outline the basic method as well as many complications that can arise in practice. These include cluster-specific fixed effects, few clusters, multiway clustering, and estimators other than OLS.
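A short sketch of the basic method using statsmodels' built-in cluster-robust covariance; the data-generating process below is synthetic, chosen only to show how default standard errors overstate precision when errors and the regressor are correlated within clusters:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_clusters, m = 40, 25
g = np.repeat(np.arange(n_clusters), m)        # cluster membership ids
x = rng.normal(size=n_clusters)[g]             # cluster-level regressor
u = rng.normal(size=n_clusters)[g]             # error component shared in cluster
y = 1.0 + 0.5 * x + u + rng.normal(size=n_clusters * m)

X = sm.add_constant(pd.DataFrame({"x": x}))
default_fit = sm.OLS(y, X).fit()               # assumes iid errors
cluster_fit = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": g})

print(f"default SE:        {default_fit.bse['x']:.4f}")
print(f"cluster-robust SE: {cluster_fit.bse['x']:.4f}")
```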
Journal Article
Heavy metal pollutants and their spatial distribution in surface sediments from Thondi coast, Palk Bay, South India
by Karthikeyan, Perumal; Muthuramalingam Subagunasekar; Antony, Joseph
in Aluminum; Anthropogenic factors; Aquaculture
2021
Background
The concentration of heavy metals and their spatial distribution in surface sediments collected from the Thondi coast, Palk Bay, South India were analysed in this study. The sediment grain size, pH, EC, major elements (Fe and Al), and heavy metal concentrations (Mn, Cr, Zn, Cd, Ni, Cu, and Pb) were determined, and the geoaccumulation index (Igeo), enrichment factor (EF), potential contamination index (Cp), potential ecological risk index (RI), contamination factor (CF), modified contamination degree (mCd), and degree of contamination (Cd) were calculated against background values to determine the pollution level of the study area. Multivariate analyses such as Pearson’s correlation coefficient, principal component analysis/factor analysis (PCA/FA), cluster analysis, and regression analysis are versatile methods for identifying heavy metal sources and determining the relationships between pollutants in marine sediment.
Results
The pollution indices, namely EF, CF, Cd, mCd, Cp, RI, and Igeo, revealed that the heavy metal contamination was due to Cd, while a moderate level of contamination was caused by Cu, Zn, Pb, and Cr. The principal component analysis and correlation matrix analysis showed a strong positive loading for Cd due to its high level of contamination in the study area. Anthropogenic inputs such as municipal wastewater, domestic sewage discharge, fishing harbour activities, and industrial and aquaculture wastes led to the increased Cd concentration in the study area. Moreover, the pollution load index revealed that the sediments were polluted by heavy metals.
Conclusion
The findings of this study revealed that the increased concentration of heavy metals in the study area increases toxicity in the marine environment, thus affecting the ecosystem.
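For readers unfamiliar with the indices named above, a hedged sketch of their common textbook definitions (the study's actual background values and reference element are not reproduced here; the numbers are illustrative):

```python
import numpy as np

def igeo(c, background):
    """Geoaccumulation index: Igeo = log2(Cn / (1.5 * Bn))."""
    return np.log2(c / (1.5 * background))

def enrichment_factor(c, c_ref, bg, bg_ref):
    """EF = (C_metal / C_reference)_sample / (C_metal / C_reference)_background,
    with Al or Fe typically used as the reference element."""
    return (c / c_ref) / (bg / bg_ref)

def contamination_factor(c, background):
    """CF = measured concentration / background concentration."""
    return c / background

# Illustrative values only (mg/kg): cadmium against a hypothetical background
print(f"Igeo = {igeo(1.2, 0.3):.2f}, CF = {contamination_factor(1.2, 0.3):.2f}")
```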
Journal Article
Establishment of a deformation forecasting model for a step-like landslide based on decision tree C5.0 and two-step cluster algorithms: a case study in the Three Gorges Reservoir area, China
2017
This study presents a hybrid approach based on two-step cluster and decision tree C5.0 algorithms to establish a deformation forecasting model for a step-like landslide. The Zhujiadian landslide, a typical step-like landslide in the Three Gorges Reservoir area, was selected as a case study. Approximately 6 years of historical records of landslide displacement, precipitation, and reservoir level were used to build the forecasting model. The forecasting model consisted of seven comprehensive rules governing hydrologic parameters and their magnitudes and was developed to predict landslide deformation. This model was applied to rapidly forecast the likelihood of step-like landslide deformation resulting from rainfall and water level fluctuations in the Three Gorges Reservoir area. Given the satisfactory accuracy of the trained model, the presented approach can be used to establish forecasting models for step-like landslides and to facilitate rapid decision making.
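A rough sketch of the hybrid idea in Python (SPSS-style two-step clustering and C5.0 are not available in scikit-learn, so KMeans and a CART decision tree stand in; the monthly records and deformation labels are entirely hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
# Hypothetical monthly records: [precipitation (mm), reservoir level change (m)]
X = rng.normal([120.0, 0.0], [60.0, 3.0], size=(72, 2))
# Hypothetical label: 1 = step-like displacement observed that month
y = ((X[:, 0] > 150) & (X[:, 1] < -1.5)).astype(int)

# Step 1: cluster hydrologic states (stand-in for two-step clustering)
states = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: learn interpretable rules from features plus cluster membership
features = np.column_stack([X, states])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, y)
print(export_text(tree, feature_names=["precip", "level_change", "state"]))
```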
Journal Article
Environmental DNA reveals seasonal shifts and potential interactions in a marine community
2020
Environmental DNA (eDNA) analysis allows the simultaneous examination of organisms across multiple trophic levels and domains of life, providing critical information about the complex biotic interactions related to ecosystem change. Here we used multilocus amplicon sequencing of eDNA to survey biodiversity from an eighteen-month (2015–2016) time-series of seawater samples from Monterey Bay, California. The resulting dataset encompasses 663 taxonomic groups (at Family or higher taxonomic rank) ranging from microorganisms to mammals. We inferred changes in the composition of communities, revealing putative interactions among taxa and identifying correlations between these communities and environmental properties over time. Community network analysis provided evidence of expected predator-prey relationships, trophic linkages, and seasonal shifts across all domains of life. We conclude that eDNA-based analyses can provide detailed information about marine ecosystem dynamics and identify sensitive biological indicators that can suggest ecosystem changes and inform conservation strategies.
Increasingly, eDNA is being used to infer ecological interactions. Here the authors sample eDNA over 18 months in a marine environment and use co-occurrence network analyses to infer potential interactions among organisms from microbes to mammals, testing how they change over time in response to oceanographic factors.
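A sketch of the co-occurrence-network step on a toy taxon-by-sample table (the abundances, the Spearman approach, and the 0.6 edge threshold are illustrative assumptions, not details taken from the study):

```python
import numpy as np
import networkx as nx
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_taxa, n_samples = 8, 18                 # e.g., 18 monthly eDNA samples
abundance = rng.poisson(5.0, size=(n_taxa, n_samples)).astype(float)
abundance[1] = abundance[0] * 1.5 + rng.normal(0, 1, n_samples)  # forced link

rho, _ = spearmanr(abundance, axis=1)     # taxon-by-taxon correlation matrix

G = nx.Graph()
G.add_nodes_from(range(n_taxa))
for i in range(n_taxa):
    for j in range(i + 1, n_taxa):
        if abs(rho[i, j]) > 0.6:          # keep only strong co-occurrences
            G.add_edge(i, j, weight=rho[i, j])

print(f"{G.number_of_edges()} putative interactions among {n_taxa} taxa")
```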
Journal Article
Water quality assessment and pollution source apportionment using multi-statistic and APCS-MLR modeling techniques in Min River Basin, China
by Zhang, Han; Cheng, Siqian; Yu, Haoran
in Agricultural management; Agricultural wastes; Anthropogenic factors
2020
Anthropogenic activities pose challenges to the security of water quality. Identifying potential sources of pollution and quantifying their corresponding contributions are essential for water management and pollution control. In our study, a 2-year (2017–2018) water quality dataset of 15 parameters from eight sampling sites in tributaries and the mainstream of the Min River was analyzed with multivariate statistical analysis methods and the absolute principal component score-multiple linear regression (APCS-MLR) receptor modeling technique to reveal potential sources of pollution and apportion their contributions. Temporal and spatial cluster analysis (CA) classified the 12 months into three periods, exactly consistent with the dry, wet, and normal seasons, and the eight monitoring sites into two regions, lightly polluted (LP) and highly polluted (HP), based on the different levels of pollution caused by physicochemical properties and anthropogenic activities. Principal component analysis (PCA) identified five latent factors accounting for 75.84% and 73.46% of the total variance in the LP and HP regions, respectively. The main pollution sources in the two regions included agricultural activities, domestic sewage, and industrial wastewater discharge. APCS-MLR results showed that in the LP region the contributions of the five potential pollution sources were ranked as agricultural non-point source pollution (22.13%) > seasonal effect and phytoplankton growth (19.86%) > leakage of septic tanks (15.73%) > physicochemical effect (12.86%) > industrial effluents and domestic sewage (11.59%), while in the HP region the ranking was point source pollution from domestic and industrial discharges (20.81%) > municipal sewage (16.66%) > agricultural non-point source pollution (15.23%) > phytoplankton growth (14.82%) > natural and seasonal effects (12.67%). Based on this quantitative assessment of the main pollution sources, the study can help policymakers formulate strategies to improve water quality in different regions.
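A simplified sketch of the APCS-MLR idea under synthetic data: factor scores are shifted by the scores of an artificial zero-concentration sample to form absolute principal component scores, and a regression on those scores apportions contributions (all numbers and shapes below are placeholders, not the study's data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
C = rng.lognormal(mean=1.0, sigma=0.4, size=(96, 15))   # samples x parameters

scaler = StandardScaler().fit(C)
Z = scaler.transform(C)
pca = PCA(n_components=5).fit(Z)
scores = pca.transform(Z)

# APCS: subtract the factor scores of an artificial zero-concentration sample
z0 = scaler.transform(np.zeros((1, C.shape[1])))
apcs = scores - pca.transform(z0)

# MLR of one observed parameter on the APCS; coefficient times mean APCS gives
# the average contribution of each source to that parameter
target = C[:, 0]
mlr = LinearRegression().fit(apcs, target)
contrib = mlr.coef_ * apcs.mean(axis=0)
share = 100 * contrib / (contrib.sum() + mlr.intercept_)
print("estimated % contribution per source:", np.round(share, 1))
```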
Journal Article
Groundwater quality assessment using SPSS based on multivariate statistics and water quality index of Gaya, Bihar (India)
2023
Groundwater is a valuable resource for developmental activities, and its demand is growing as surface water becomes scarce. Increasing groundwater demand has resulted in reduced water levels and deteriorating water quality. A total of 156 groundwater samples were taken from Gaya, a district in Bihar (India), to check the safety of drinking water. The quality of groundwater was assessed using a water quality index (WQI). The samples were assessed using a variety of physicochemical characteristics, and the statistical methods principal component analysis (PCA) and cluster analysis (CA) were used as they are effective and efficient. As per the Gibbs plot, the majority of the samples fall in the rock-water interaction field and some in the evaporation dominance field. The major cations dominated in the order Ca²⁺ > Mg²⁺ > Na⁺, and the major anions followed the order HCO₃⁻ > Cl⁻ > SO₄²⁻ > NO₃⁻ > PO₄²⁻. The KMO sample adequacy value of 0.703 and the significance level of Bartlett’s test of sphericity (0.0001) indicated that PCA could be applied. The three components recovered using PCA explained 69.58% of the total variation. Cluster analysis classified the groundwater samples into three clusters based on the similarities among the chemical parameters involved in groundwater quality. HCA groups I, II, and III exhibited lightly, intermediately, and heavily mineralized groundwater characteristics, respectively. The major parameters affecting water quality in the study region are TDS, Ca²⁺, Mg²⁺, HCO₃⁻, Cl⁻, F⁻, and PO₄²⁻. The WQI indicates that 17% of the samples were of very poor quality and not consumable. The study’s findings offer insights into groundwater pollution regimes. These results can be used for water quality assessment, leading to improved environmental management, planning, and decision-making for water quality management.
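For context, a hedged sketch of a weighted arithmetic WQI of the kind used in such studies; the parameters, standards, and weights below are illustrative placeholders, not the paper's values:

```python
import numpy as np

# Drinking-water permissible limits for a few parameters (illustrative, mg/L)
params    = ["TDS", "Ca", "Mg", "F"]
standards = np.array([500.0, 75.0, 30.0, 1.0])
measured  = np.array([820.0, 95.0, 41.0, 1.4])     # one sample's values (mg/L)

weights = 1.0 / standards          # w_i proportional to 1/S_i, then normalized
weights /= weights.sum()

# Sub-index q_i = 100 * V_i / S_i, taking the ideal value as zero
quality = 100.0 * measured / standards
wqi = np.sum(weights * quality)

print(f"WQI = {wqi:.1f}")   # values above 100 are often read as unfit to drink
```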
Journal Article