Catalogue Search | MBRL
651 result(s) for "Stratified sampling"
Linear and stratified sampling-based deep learning models for improving the river streamflow forecasting to mitigate flooding disaster
by Birima, Ahmed H; Afan Haitham Abdulmohsin; Ahmed Ali Najah
in Algorithms; Correlation analysis; Datasets
2022
To reduce flood disasters, river streamflow prediction needs to be improved with the aid of deep learning algorithms. To achieve an accurate streamflow prediction model, it is important to provide suitable data sets to train the predictive models. This research therefore investigated two sampling approaches for deep learning algorithms: linear and stratified selection. The investigation was performed on the Tigris River data set under two scenarios. In the first scenario, 12 different linear and stratified sampling selections are implemented in deep learning models, trained and tested once per month of the year (12 months). In the second scenario, the complete time series is taken into consideration when applying the two approaches. Furthermore, the optimal input combination is identified via correlation analysis. To evaluate the performance of the algorithms, several metrics were used: Root Mean Square Error (RMSE), Absolute Error (AE), Relative Error (RE), Relative Error Lenient (REL), Relative Error Strict (RES), Root Relative Squared Error (RRSE), coefficient of determination (R2), Spearman rho, and Kendall tau. The results indicate that in both scenarios, stratified deep learning (SDL) improves the accuracy by about 7.96–94.6 with respect to several assessment criteria. SDL thus outperforms linear deep learning (LDL) in monthly streamflow modelling.
Journal Article
What are the most crucial soil variables for predicting the distribution of mountain plant species? A comprehensive study in the Swiss Alps
by Grand, Stéphanie; Spangenberg, Jorge E.; Pinto-Figueroa, Eric
in alpine plants; biogeography; calcium oxide
2020
Aim
To investigate the potential of a large range of soil variables to improve topo‐climatic models of plant species distributions in a temperate mountain region encompassing complex relief.
Location
The western Swiss Alps.
Methods
Fitting topo‐climatic models for >60 plant species across >250 sites with and without added soil predictor variables (>30). Testing included the following: (a) which soil variables improve plant species distribution models; (b) whether an optimal subset of soil variables can improve models for the majority of species and habitat types and (c) how much variation in plant species distributions soil variables alone explain.
Results
Geochemical variables (i.e. CaO, pH and inorganic carbon) and a drainage indicator (i.e. bulk soil water content) improved the predictive abilities of the models across the large majority of alpine plant species. The improvement of the models after the addition of soil information varied strongly between plant species and habitat types, but a trade‐off was found between the number of soil variables and the associated gain in model performance. Finally, across all species, one specific combination of soil variables (bulk soil water content + total phosphorus + δ13C) outperformed the commonly used topo‐climatic variables.
Main conclusions
Several soil variables significantly increased the predictive power of plant species distribution models in the temperate mountain region. Geochemical and drainage variables proved most important.
Journal Article
StrataSeq: A Workflow for Rapid Development of Molecular Databases for Hard‐To‐Identify Species
by Baranski, Damian; Woodhouse, Jason; Merges, Anna K.
in Arthropods; Biodiversity; Biodiversity loss
2025
Biodiversity loss necessitates improved monitoring of small, species‐rich taxa, such as protists, phyto‐ and zooplankton and terrestrial invertebrates. Traditional biomonitoring is often infeasible for these taxa due to complex morphology and few taxonomists. DNA‐based approaches offer promising solutions by enabling rapid species identification. However, the effectiveness of these methods depends on the completeness of molecular reference databases, which remain incomplete, particularly for remote and biodiverse regions. To address this, we propose the StrataSeq workflow, a systematic approach to optimise the generation of DNA reference databases for hard‐to‐identify taxa. Reference sequences allow us to connect molecular operational taxonomic units to a wealth of information available for many described taxa. StrataSeq consists of four key steps: (1) Habitat‐stratified sample subsetting selects a minimal but ecologically representative sample set by stratifying along key environmental gradients. (2) Prioritising morphospecies involves sorting specimens into morphospecies and ranking them based on their occurrence across samples, prioritising common taxa for detailed identification. (3) Detailed morphological identification focuses on common morphospecies to maximise taxonomic coverage while minimising effort. (4) Reference DNA sequence generation targets taxa lacking molecular references, with sequenced specimens deposited as museum vouchers. We benchmarked the StrataSeq workflow using two datasets of Collembola from grassland soils in Germany. In comparison with a species list generated by a more labour‐intensive traditional approach (identification of randomly selected individuals from all samples), the StrataSeq workflow captured 69% of species but required only 22% of the effort. StrataSeq is adaptable to various organism groups and environmental settings, including both spatial and temporal gradients. 
The workflow enhances the cost‐effectiveness of generating reference DNA databases, supporting improved biodiversity monitoring and ecological research. StrataSeq offers a scalable solution to accelerate the completion of molecular databases, thereby improving biomonitoring and ecosystem assessments under global change pressures. Biodiversity loss calls for better monitoring of small, species‐rich taxa, such as soil invertebrates, but traditional methods are limited due to complex morphology and lack of expertise. DNA‐based approaches offer a solution, but their effectiveness depends on incomplete molecular reference databases. The StrataSeq workflow optimises DNA reference generation through habitat stratification, prioritising common species for detailed identification and sequencing specimens lacking molecular references, making biodiversity monitoring more efficient and cost‐effective, as shown by its successful application to Collembola in German grasslands.
Journal Article
Improving IoT Security: The Impact of Dimensionality and Size Reduction on Intrusion Detection Performance
by Nailah Al-madi; Amal Saif; Remah Younisse
in dimensionality reduction; data reduction; autoencoders; stratified sampling; machine learning
2025
Intrusion detection in the Internet of Things (IoT) environments is essential to guarantee computer network security. Machine learning (ML) models are widely used to improve efficient detection systems. Meanwhile, with the increasing complexity and size of intrusion detection data, analyzing vast datasets using ML models is becoming more challenging and demanding in terms of computational resources. Datasets related to IoT environments usually come in very large sizes. This study investigates the impact of dataset reduction techniques on machine learning-based Intrusion Detection Systems (IDS) performance and efficiency. We propose a two-stage framework incorporating deep autoencoder-based feature reduction with stratified sampling to reduce the dimensionality and size of six publicly available IDS datasets, including BoT-IoT, CSE-CIC-IDS2018, and others. Multiple machine learning models, such as Random Forest, XGBoost, K-Nearest Neighbors, SVM, and AdaBoost, were evaluated using default parameters. Our results show that dataset reduction can decrease training time by up to 99% with minimal loss in F1-score, typically less than 1%. It is recognized that excessive size reduction can compromise detection accuracy for minority attack classes. However, employing a stratified sampling method can effectively maintain class distributions. The study highlights significant feature redundancy, particularly high correlation among features, across multiple IoT security-related datasets, motivating the use of dimensionality reduction techniques. These findings support the feasibility of efficient, scalable IDS implementations for real-world environments, especially in resource-constrained or real-time settings. [JJCIT 2025; 11(3.000): 351-368]
Journal Article
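The abstract above notes that stratified sampling can preserve class distributions when a dataset is shrunk. A minimal sketch of that idea follows, using only the Python standard library. The function name `stratified_subsample`, the label names, and the 10% fraction are illustrative assumptions and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a subsample that keeps each class's share roughly intact.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        # Keep at least one record per class so rare classes never vanish.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy, made-up label vector with one dominant class and two rare ones.
labels = ["benign"] * 900 + ["ddos"] * 90 + ["theft"] * 10
subset = stratified_subsample(labels, fraction=0.1)
counts = Counter(labels[i] for i in subset)
print(counts)
```

Sampling within each class, instead of from the pooled data, is what keeps the rare class present in the reduced set; a plain random 10% draw could miss it entirely.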
Survey design and analysis considerations when utilizing misclassified sampling strata
by Mitani, Aya A.; Haneuse, Sebastien; Schildcrout, Jonathan S.
in Analysis; Biobanks; Complex survey
2021
Background
A large multi-center survey was conducted to understand patients’ perspectives on biobank study participation with particular focus on racial and ethnic minorities. In order to enrich the study sample with racial and ethnic minorities, disproportionate stratified sampling was implemented with strata defined by electronic health records (EHR) that are known to be inaccurate. We investigate the effect of sampling strata misclassification in complex survey design.
Methods
Under non-differential and differential misclassification in the sampling strata, we compare the validity and precision of three simple and common analysis approaches for settings in which the primary exposure is used to define the sampling strata. We also compare the precision gains/losses observed from using a disproportionate stratified sampling scheme compared to using a simple random sample under varying degrees of strata misclassification.
Results
Disproportionate stratified sampling can result in more efficient parameter estimates of the rare subgroups (race/ethnic minorities) in the sampling strata compared to simple random sampling. When sampling strata misclassification is non-differential with respect to the outcome, a design-agnostic analysis was preferred over model-based and design-based analyses. All methods yielded unbiased parameter estimates but standard error estimates were lowest from the design-agnostic analysis. However, when misclassification is differential, only the design-based method produced valid parameter estimates of the variables included in the sampling strata.
Conclusions
In complex survey design, when the interest is in making inference on rare subgroups, we recommend implementing disproportionate stratified sampling over simple random sampling even if the sampling strata are misclassified. If the misclassification is non-differential, we recommend a design-agnostic analysis. However, if the misclassification is differential, we recommend using design-based analyses.
Journal Article
Inference on diversity from forest inventories: a review
2017
A number of international agreements and commitments emphasize the importance of appropriate monitoring protocols and assessments as prerequisites for sound conservation and management of the world’s forest ecosystems. Mandated periodic surveys, like forest inventories, provide a unique opportunity to identify and properly satisfy natural resource management information needs. Distinctively, there is an increasing need for detecting diversity by means of unambiguous diversity measures. Because all diversity measures are functions of tree species abundances, estimation of tree diversity indices and profiles is inevitably performed by estimating tree species abundances and then estimating indices and profiles as functions of the abundance estimates. This strategy can be readily implemented in the framework of current forest inventory approaches, where tree species abundances are routinely estimated by means of plots placed onto the surveyed area in accordance with probabilistic schemes. The purpose of this paper is to assess the effectiveness of this strategy by reviewing theoretical results from published case studies. Under uniform random sampling (URS), that is when plots are uniformly and independently located on the study region, consistency and asymptotic normality of diversity index estimators follow from standard limit theorems as the sampling effort increases. In addition, variance estimation and bias reduction are achieved using the jackknife method. Despite its theoretical simplicity, URS may lead to uneven coverage of the study region. In order to avoid unbalanced sampling, the use of tessellation stratified sampling (TSS) is suggested. TSS involves covering the study region by a polygonal grid and randomly selecting a plot in each polygon. Under TSS, the diversity index estimators are consistent, asymptotically normal and more precise than those achieved using URS. Variance estimation is possible and there is no need to reduce bias.
Journal Article
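The abstract above describes tessellation stratified sampling (TSS) as covering the study region with a polygonal grid and randomly selecting one plot per polygon. A minimal sketch under simplifying assumptions: the study region is the unit square, the grid is made of equal square cells, and a "plot" is reduced to a single point. All function names here are hypothetical.

```python
import random


def uniform_random_sample(n, rng):
    """URS: n points placed uniformly and independently on the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def tessellation_stratified_sample(cells_per_side, rng):
    """TSS: one uniformly placed point inside each cell of a square grid."""
    side = 1.0 / cells_per_side
    points = []
    for row in range(cells_per_side):
        for col in range(cells_per_side):
            x = (col + rng.random()) * side
            y = (row + rng.random()) * side
            points.append((x, y))
    return points


def occupied_cells(points, cells_per_side):
    """Set of grid cells that contain at least one sampled point."""
    last = cells_per_side - 1
    return {
        (min(int(x * cells_per_side), last), min(int(y * cells_per_side), last))
        for x, y in points
    }


rng = random.Random(42)
tss_points = tessellation_stratified_sample(5, rng)
urs_points = uniform_random_sample(25, rng)

print(len(occupied_cells(tss_points, 5)), "of 25 cells covered by TSS")
print(len(occupied_cells(urs_points, 5)), "of 25 cells covered by URS")
```

By construction TSS places a point in every cell, whereas URS with the same number of points usually leaves some cells empty, which is the uneven coverage the abstract refers to.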
A spatiotemporal Richards–Schnute growth model and its estimation when data are collected through length-stratified sampling
2020
We propose a spatiotemporal generalized von Bertalanffy (vonB) growth model that also includes between-individual (BI) variation and male/female correlation. The generalized vonB model includes the effect of maturation on growth. The model and the methodology are applied to a long time-series of survey observations of age and length for American plaice on the Grand Bank off the northeast coast of Canada. The bias in age-length data due to size selectivity of the survey gear is accounted for. The survey design includes length-stratified age sampling which is a type of response selective sampling design for growth model estimation. We propose and implement a conditional empirical proportion likelihood approach for these data. Neglecting this sampling scheme can lead to seriously biased estimation results. We found that a 6-parameter growth model is necessary for capturing the biphasic growth patterns of the American plaice on the Grand Bank, and the survey gear selectivity and BI variation are important for a good model fit. We proposed an empirically optimal BI variation model for this data. Our estimation results indicate that there are substantial differences in size-at-age for male and female American plaice, and this changes over time and between regions.
Journal Article
Estimation for Two Sensitive Variables Using Randomization Response Model Under Stratified Random Sampling
2025
When direct surveys ask about sensitive characteristics such as drug addiction, alcoholism, proneness to tax evasion, and sexual violence, nonresponse bias and response bias become serious problems because people often do not wish to give true information. In this study, when the population is composed of strata such as gender, region, or age group, we consider the simple model and the crossed model by applying stratified random sampling, which can estimate not only the domain population proportion but also the whole population proportion for two sensitive attributes, such as drug use and sexual violence, at the same time. In addition, when the size of each population stratum is unknown in stratified random sampling, we propose the simple model and the crossed model using the stratified double sampling method. In each proposed survey design, the sample allocation of each stratum is handled under both proportional allocation and optimal allocation. We compare the efficiency of the simple model and the crossed model under the proposed stratified random sampling design.
Journal Article
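The abstract above mentions proportional and optimal sample allocation across strata. The sketch below shows the textbook forms only: proportional allocation, n_h = n · N_h / N, and Neyman allocation, n_h = n · N_h S_h / Σ N_k S_k, which is used here as a stand-in for "optimal". The paper's own optimal allocation for randomized response models may differ, and the stratum names, sizes, and standard deviations are made-up illustrative values.

```python
def proportional_allocation(total_n, stratum_sizes):
    """Allocate the sample in proportion to stratum population sizes N_h."""
    population = sum(stratum_sizes.values())
    return {h: total_n * size / population for h, size in stratum_sizes.items()}


def neyman_allocation(total_n, stratum_sizes, stratum_sds):
    """Allocate in proportion to N_h * S_h (textbook Neyman allocation)."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total_weight = sum(weights.values())
    return {h: total_n * w / total_weight for h, w in weights.items()}


# Hypothetical strata: population sizes and within-stratum standard deviations.
sizes = {"north": 5000, "south": 3000, "east": 2000}
sds = {"north": 0.30, "south": 0.50, "east": 0.40}

proportional = proportional_allocation(200, sizes)
neyman = neyman_allocation(200, sizes, sds)
print(proportional)
print(neyman)
```

Both functions return fractional sample sizes; in practice these would be rounded while keeping the total fixed. Neyman allocation shifts sample toward strata that are large or highly variable, which is why the smaller but more variable "south" stratum receives as much sample as "north" in this toy example.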
On the Expected ℒ2-Discrepancy of Stratified Samples from Parallel Lines
2023
We study the expected ℒ2-discrepancy of stratified samples generated from special equi-volume partitions of the unit square. The partitions are defined via parallel lines that are all orthogonal to the diagonal of the square. It is shown that the expected discrepancy of stratified samples derived from these partitions is a factor 2 smaller than the expected discrepancy of the same number of i.i.d. uniformly distributed random points in the unit square. We conjecture that this is best possible among all partitions generated from parallel lines.
Journal Article
Copula Modeling and Uncertainty Propagation in Field‐Scale Simulation of CO2 Fault Leakage
by Pettersson, Per; Sandve, Tor Harald; Keilegavlen, Eirik
in Adaptive sampling; adaptive sampling methods; Brines
2025
Subsurface storage of CO2 is an important means to mitigate climate change, and the North Sea hosts considerable potential storage resources. To investigate the fate of CO2 over decades in vast reservoirs, numerical simulation based on realistic models is essential. Faults and other complex geological structures introduce modeling challenges as their effects on storage operations are subject to high uncertainty. We present a computational framework for forward propagation of uncertainty, including stochastic upscaling and copula representation of multivariate distributions for a CO2 storage site model with faults. The Vette fault zone in the Smeaheia formation in the North Sea is used as a test case. The stochastic upscaling method reduces the number of stochastic dimensions and the cost of evaluating the reservoir model. Copulas provide representation of dependent multidimensional random variables and a good fit to data, allow fast sampling and coupling to the forward propagation method via independent uniform random variables. The non‐stationary correlation within the upscaled flow functions are accurately captured by a data‐driven transformation model. The uncertainty in upscaled flow functions and other uncertain parameters are efficiently propagated to leakage estimates using numerical reservoir simulation of a two‐phase system of CO2 and brine. The expectations of leakage are estimated by an adaptive stratified sampling technique which effectively allocates samples in stochastic space. We demonstrate cost reduction compared to standard Monte Carlo of one or two orders of magnitude for simpler test cases, and factors 2–8 cost reduction for stochastic multi‐phase flow properties and more complex stochastic models.
Plain Language Summary
To limit global warming, greenhouse gases like CO2 can be injected into large reservoirs of porous rocks below the bottom of the sea instead of being emitted to the atmosphere. CO2 will slowly move in the reservoirs and may encounter faults, geological features that have properties that can either facilitate or stop the CO2 from moving further in the underground. It is important that the CO2 remains in the underground, and hence it is important to understand how it is affected by the fault, in particular when many physical rock properties are unknown due to very few or inexact measurements. We present methods to model the uncertainty in and surrounding the faults and show how more accurate computer simulations can be obtained by a combination of appropriate statistical models and adapted methods to investigate the effect of the fault uncertainty on the risk for leakage of CO2.
Key Points
Framework for efficient stochastic upscaling, modeling, and uncertainty propagation for CO2 storage, demonstrated on a North Sea test case
Stochastic fault properties upscaled to two‐phase flow functions with reduced complexity and a format suitable for uncertainty propagation
Significant computational cost reduction for adaptive stratified sampling compared to Monte Carlo sampling in estimation of CO2 leakage
Journal Article