22,128 results for "RANDOM SAMPLING"
RMS-FlowNet++: Efficient and Robust Multi-scale Scene Flow Estimation for Large-Scale Point Clouds
The proposed RMS-FlowNet++ is a novel end-to-end learning-based architecture for accurate and efficient scene flow estimation that can operate on high-density point clouds. For hierarchical scene flow estimation, existing methods rely on expensive Farthest-Point-Sampling (FPS) to sample the scenes, must find large correspondence sets across consecutive frames, and/or must search for correspondences at full input resolution. While this can improve accuracy, it reduces the overall efficiency of these methods and limits their ability to handle large numbers of points due to memory requirements. In contrast, our architecture is based on an efficient design for hierarchical prediction of multi-scale scene flow. To this end, we develop a special flow embedding block that has two advantages over current methods: first, a smaller correspondence set is used, and second, the use of Random-Sampling (RS) is possible. In addition, our architecture does not need to search for correspondences at full input resolution. While exhibiting high accuracy, our RMS-FlowNet++ provides faster predictions than state-of-the-art methods, avoids high memory requirements, and enables efficient scene flow estimation on dense point clouds of more than 250K points at once. Our comprehensive experiments verify the accuracy of RMS-FlowNet++ on the established FlyingThings3D dataset with different point cloud densities and validate our design choices. Furthermore, we demonstrate that our model generalizes competitively to the real-world scenes of the KITTI dataset without fine-tuning.
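As a concrete illustration of the sampling trade-off the abstract describes, here is a minimal sketch (not the authors' code; synthetic points and untuned sizes) contrasting the O(k·N) cost of Farthest-Point-Sampling with the O(N) Random-Sampling that the proposed flow embedding block makes viable:
```python
# Minimal sketch: FPS vs. the Random-Sampling that RMS-FlowNet++ permits.
import numpy as np

def farthest_point_sampling(points, k):
    """O(k*N) iterative FPS: repeatedly pick the point farthest
    from everything selected so far."""
    n = points.shape[0]
    selected = [np.random.randint(n)]
    dists = np.full(n, np.inf)
    for _ in range(k - 1):
        diff = points - points[selected[-1]]
        dists = np.minimum(dists, np.einsum("ij,ij->i", diff, diff))
        selected.append(int(np.argmax(dists)))
    return points[selected]

def random_sampling(points, k):
    """O(N) uniform subsampling; no pairwise distances needed,
    which is what makes >250K-point inputs tractable."""
    idx = np.random.choice(points.shape[0], size=k, replace=False)
    return points[idx]

cloud = np.random.rand(250_000, 3).astype(np.float32)
subset = random_sampling(cloud, 8_192)   # cheap hierarchical downsampling
```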
Flood Susceptibility Assessment with Random Sampling Strategy in Ensemble Learning (RF and XGBoost)
Due to the complex interaction of urban and mountainous floods, assessing flood susceptibility in mountainous urban areas presents a challenging task in environmental research and risk analysis. Data-driven machine learning methods can evaluate flood susceptibility in mountainous urban areas lacking essential hydrological data, utilizing remote sensing data and limited historical inundation records. In this study, two ensemble learning algorithms, Random Forest (RF) and XGBoost, were adopted to assess the flood susceptibility of Kunming, a typical mountainous urban area prone to severe flood disasters. A flood inventory was created using flood observations from 2018 to 2022. The spatial database included 10 explanatory factors, encompassing climatic, geomorphic, and anthropogenic factors. Artificial Neural Network (ANN) and Support Vector Machine (SVM) were selected for model comparison. To minimize the influence of expert opinions on model training, this study employed uniform random sampling in historically non-flooded areas for negative sample selection. The results demonstrated that (1) ensemble learning algorithms offer higher accuracy than other machine learning methods, with RF achieving the highest accuracy, evidenced by an area under the curve (AUC) of 0.87, followed by XGBoost at 0.84, surpassing both ANN (0.83) and SVM (0.82); (2) the interpretability of ensemble learning highlighted the differences in the potential distribution of the training data’s positive and negative samples; feature importance in ensemble learning can be utilized to minimize human bias in the collection of flooded-site samples, yielding more targeted flood susceptibility maps of the study area’s road network; and (3) ensemble learning algorithms exhibited greater stability and robustness in datasets with varied negative samples, as evidenced by their performance in F1-Score, Kappa, and AUC metrics. This paper further substantiates the superiority of ensemble learning in flood susceptibility assessment tasks from the perspectives of accuracy, interpretability, and robustness, enhances the understanding of the impact of negative samples on such assessments, and optimizes the specific process for urban flood susceptibility assessment using data-driven methods.
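A hedged sketch of the negative-sampling strategy described above, with synthetic arrays standing in for the study's raster grid and its 10 explanatory factors (all shapes, names, and parameters are illustrative):
```python
# Draw pseudo-absence points uniformly at random from historically non-flooded
# cells, then train an ensemble model; data here are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.random((5000, 10))          # 10 explanatory factors per cell
flooded = rng.random(5000) < 0.1           # historical inundation mask

pos_idx = np.flatnonzero(flooded)
neg_pool = np.flatnonzero(~flooded)
neg_idx = rng.choice(neg_pool, size=len(pos_idx), replace=False)  # uniform negatives

X = np.vstack([features[pos_idx], features[neg_idx]])
y = np.r_[np.ones(len(pos_idx)), np.zeros(len(neg_idx))]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))
print("feature importances:", rf.feature_importances_)  # basis for bias checks
```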
Evaluating Sampling Methods for Content Analysis of Twitter Data
Although sampling options for periodical media content have been evaluated, few empirical studies have examined whether probability sampling methods other than simple random sampling are applicable to social media content. This article tests the efficiency of simple random sampling and constructed week sampling by varying the sample size of Twitter content related to the 2014 South Carolina gubernatorial election. We examine how many weeks were needed to adequately represent 5 months of tweets. Our findings show that simple random sampling is more efficient than constructed week sampling at obtaining a representative sample of Twitter data. This study also suggests that it is necessary to produce a sufficiently large sample size when analyzing social media content.
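For concreteness, a small sketch of the two designs under comparison; the date window is an assumption chosen only to mimic a roughly 5-month collection period:
```python
# Simple random sampling of days vs. a "constructed week": one randomly
# chosen Monday, one Tuesday, ..., one Sunday from the whole window.
import random
from datetime import date, timedelta

start, n_days = date(2014, 6, 1), 153          # ~5 months (illustrative)
days = [start + timedelta(d) for d in range(n_days)]

def simple_random_sample(days, k):
    return sorted(random.sample(days, k))

def constructed_week(days):
    by_weekday = {wd: [d for d in days if d.weekday() == wd] for wd in range(7)}
    return sorted(random.choice(v) for v in by_weekday.values())

print(simple_random_sample(days, 7))
print(constructed_week(days))
```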
Assessing the Accuracy and Consistency of Six Fine-Resolution Global Land Cover Products Using a Novel Stratified Random Sampling Validation Dataset
Over the past decades, benefiting from growing computing capacity and free access to Landsat and Sentinel imagery, several fine-resolution global land cover (GLC) products (with a resolution of 10 m or 30 m) have been developed (GlobeLand30, FROM-GLC30, GLC_FCS30, FROM-GLC10, European Space Agency (ESA) WorldCover and ESRI Land Cover). However, there is still a lack of consistency analysis or comprehensive accuracy assessment using a common validation dataset for these GLC products. In this study, a novel stratified random sampling GLC validation dataset (SRS_Val) containing 79,112 validation samples was developed using a visual interpretation method, significantly increasing the number of samples from heterogeneous regions and rare land-cover types. Then, we quantitatively assessed the accuracy of these six GLC products using the developed SRS_Val dataset at global and regional scales. The results reveal that ESA WorldCover achieved the highest overall accuracy (70.54% ± 9%) among the global 10 m land cover products, followed by FROM-GLC10 (68.95% ± 8%) and ESRI Land Cover (58.90% ± 7%), and that GLC_FCS30 had the best overall accuracy (72.55% ± 9%) among the global 30 m land cover datasets, followed by GlobeLand30 (69.96% ± 9%) and FROM-GLC30 (66.30% ± 8%). The mapping accuracy of the GLC products decreased significantly with the increased heterogeneity of landscapes, and all GLC products had poor mapping accuracies in countries with heterogeneous landscapes, such as some countries in Central and Southern Africa. Finally, we investigated the consistency of the six GLC products from the perspective of area distributions and spatial patterns. It was found that the area consistencies among the five GLC products (except ESRI Land Cover) were greater than 85% and that the six GLC products showed large discrepancies in area consistency for grassland, shrubland, wetlands and bare land. In terms of spatial patterns, the totally inconsistent pixel proportions of the 10 m and 30 m GLC products were 23.58% and 14.12%, respectively, and these inconsistent pixels were mainly distributed in transition zones, complex-terrain regions, heterogeneous landscapes, or mixed land-cover types. Therefore, the SRS_Val dataset provides solid support for the quantitative evaluation of fine-resolution GLC products, and the assessment results provide users with quantitative metrics for selecting GLC products suitable for their needs.
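The stratified design matters because simple random sampling would starve rare classes of validation points; the sketch below illustrates the principle with invented strata and an equal-allocation rule, not the actual SRS_Val protocol:
```python
# Stratified random sampling: sample within each land-cover stratum so rare
# classes are not swamped. Strata labels and allocation are illustrative.
import numpy as np

rng = np.random.default_rng(42)
strata = rng.choice(["cropland", "forest", "wetland", "bare"],
                    size=100_000, p=[0.5, 0.4, 0.05, 0.05])

def stratified_sample(strata, per_stratum=200):
    picks = []
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        picks.append(rng.choice(idx, size=min(per_stratum, idx.size), replace=False))
    return np.concatenate(picks)

sample_idx = stratified_sample(strata)
# Equal allocation gives wetland/bare 200 samples each, versus ~10 expected
# from a 400-point simple random sample over the same map.
```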
Variable surrogate model-based particle swarm optimization for high-dimensional expensive problems
Many industrial applications require time-consuming and resource-intensive evaluations of suitable solutions within very limited time frames. Therefore, many surrogate-assisted evaluation algorithms (SAEAs) have been widely used to optimize expensive problems. However, due to the curse of dimensionality and its implications, scaling SAEAs to high-dimensional expensive problems is still challenging. This paper proposes a variable surrogate model-based particle swarm optimization (called VSMPSO) to meet this challenge and extends it to solve 200-dimensional problems. Specifically, a single surrogate model constructed by simple random sampling is used to explore different promising areas in different iterations. Moreover, a variable model management strategy is used to better utilize the current global model and accelerate the convergence rate of the optimizer. In addition, the strategy can be applied to any SAEA irrespective of the surrogate model used. To control the trade-off between optimization results and optimization time consumption of SAEAs, we consider fitness value and running time as a bi-objective problem. Applying the proposed approach to a benchmark test suite with dimensions ranging from 30 to 200, and comparing it with four state-of-the-art algorithms, shows that the proposed VSMPSO achieves high-quality solutions and computational efficiency for high-dimensional problems.
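A hedged sketch of the surrogate idea, not VSMPSO itself: fit a cheap model on a simple random sample of already-evaluated points and use it to screen candidates, so only the most promising one pays the expensive call (the objective function, dimensions, and sizes are illustrative):
```python
# Surrogate screening on a simple random sample of evaluated points.
import numpy as np
from scipy.interpolate import RBFInterpolator

def expensive_f(x):                      # stand-in for a costly simulation
    return np.sum((x - 0.3) ** 2, axis=-1)

rng = np.random.default_rng(1)
archive_X = rng.random((500, 30))        # 30-D points evaluated so far
archive_y = expensive_f(archive_X)

idx = rng.choice(len(archive_X), size=100, replace=False)   # simple random sampling
surrogate = RBFInterpolator(archive_X[idx], archive_y[idx])

candidates = rng.random((200, 30))       # e.g., new particle positions
best = candidates[np.argmin(surrogate(candidates))]
true_value = expensive_f(best[None])     # single expensive evaluation
print("screened candidate value:", float(true_value))
```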
Area based stratified random sampling using geospatial technology in a community-based survey
Background Most studies among Hispanics have focused on individual risk factors of obesity, with less attention on interpersonal, community, and environmental determinants. Community-based surveys studying these determinants must ensure the representativeness of disparate populations. We describe the use of a novel Geographic Information System (GIS)-based population sampling approach to minimize selection bias in a rural community-based study. Methods We conducted a community-based survey to collect and examine social determinants of health and their association with obesity prevalence among a sample of Hispanics and non-Hispanic whites living in a rural community in the Southeastern United States. To ensure a balanced sample of both ethnic groups, we designed an area-stratified random sampling procedure involving three stages: (1) division of the sampling area into non-overlapping strata based on Hispanic household proportion using GIS software; (2) random selection of the designated number of Census blocks from each stratum; and (3) random selection of the designated number of housing units (i.e., survey participants) from each Census block. Results The proposed sample included 109 Hispanic and 107 non-Hispanic participants to be recruited from 44 Census blocks. The final sample included 106 Hispanic and 111 non-Hispanic participants. The proportion of Hispanic surveys completed per stratum matched our proposed distribution: 7% for stratum 1, 30% for stratum 2, 58% for stratum 3, and 83% for stratum 4. Conclusion Utilizing a standardized area-based randomized sampling approach allowed us to successfully recruit an ethnically balanced sample while conducting door-to-door surveys in a rural, community-based study. The integration of area-based randomized sampling using tools such as GIS should be considered in future community-based research, particularly when trying to reach disparate populations.
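The three-stage design lends itself to a direct sketch; everything below (stratum structure, block and unit counts) is invented scaffolding, with 4 strata × 11 blocks chosen only to echo the 44 Census blocks reported:
```python
# Three-stage area-stratified sampling: strata -> Census blocks -> housing units.
import random

random.seed(7)
# stratum -> {block_id: list of housing-unit ids}; illustrative structure only
strata = {
    s: {f"block_{s}_{b}": [f"hu_{s}_{b}_{h}" for h in range(40)] for b in range(30)}
    for s in range(1, 5)
}

def three_stage_sample(strata, blocks_per_stratum=11, units_per_block=5):
    sample = []
    for s, blocks in strata.items():
        for block_id in random.sample(list(blocks), blocks_per_stratum):
            sample += random.sample(blocks[block_id], units_per_block)
    return sample

units = three_stage_sample(strata)
print(len(units), "housing units selected across", 4 * 11, "blocks")
```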
On Estimating Multi-Stress Strength Reliability for Inverted Kumaraswamy Under Ranked Set Sampling with Application in Engineering
The harsh operating conditions of many real-world environments cause systems to malfunction regularly. The failure of systems to perform their intended duties at their lowest, highest, or both extreme operating conditions is a phenomenon that researchers rarely focus on. This study considers the multi-stress strength reliability R = P(W < X < Z) for a component whose strength X falls between two stresses, W and Z, where X, W, and Z are independently inverted Kumaraswamy distributed. Both maximum likelihood and maximum product spacing procedures are employed to obtain the reliability estimator under simple random sampling (SRS) and ranked set sampling (RSS) methodologies. Four scenarios for reliability estimators are considered. The reliability estimator in the first and second cases is determined by applying the same sample design (RSS/SRS) to the strength and stress distributions. The third reliability estimator is calculated when the sample data for W and Z originate from RSS while those for X are acquired from SRS. In the final scenario, the strength and stress data are obtained from SRS and RSS, respectively. The effectiveness of the suggested estimators is compared using a comprehensive computer simulation. Lastly, three real data sets are used to determine the reliability estimators.
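As a worked illustration of the quantity being estimated, the sketch below Monte Carlo estimates R = P(W < X < Z) under SRS and shows an RSS draw with perfect ranking; the inverted Kumaraswamy CDF F(y) = (1 - (1 + y)^(-a))^b, y > 0, is the standard parameterization, and all parameter values are illustrative rather than the paper's:
```python
# Monte Carlo sketch of the multi-stress strength reliability R = P(W < X < Z).
import numpy as np

rng = np.random.default_rng(3)

def r_inv_kumaraswamy(a, b, size):
    """Inverse-transform draws from F(y) = (1 - (1 + y)**(-a))**b."""
    u = rng.random(size)
    return (1.0 - u ** (1.0 / b)) ** (-1.0 / a) - 1.0

def rss_sample(draw, set_size, cycles):
    """Ranked set sampling (perfect ranking): per cycle, draw set_size sets of
    set_size units, sort each, and keep the i-th order statistic of set i."""
    out = []
    for _ in range(cycles):
        for i in range(set_size):
            out.append(np.sort(draw(set_size))[i])
    return np.array(out)

n = 200_000
W = r_inv_kumaraswamy(2.0, 4.0, n)   # lower stress
X = r_inv_kumaraswamy(3.0, 2.0, n)   # strength
Z = r_inv_kumaraswamy(2.0, 0.8, n)   # upper stress
print("R (SRS, Monte Carlo) ~=", np.mean((W < X) & (X < Z)))

# The same strength variable drawn under RSS instead of SRS
X_rss = rss_sample(lambda m: r_inv_kumaraswamy(3.0, 2.0, m), set_size=5, cycles=40)
```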
On Solar‐Terrestrial Interactions: Correlation Between Intense Geomagnetic Storms and Global Strong Earthquakes
Solar-terrestrial interactions are a topic of considerable interest and attention. In this paper, we introduce a new method called shift neighborhood matching correlation (SNMC) to investigate the relationship between geomagnetic storms and earthquakes across various time lags. To assess the significance of the correlations, we employ random sampling series to replace one or both of the time series. Analyzing nearly a century of data, our results reveal an increased likelihood of earthquakes following geomagnetic storms, particularly 27–28 days afterward. Conventional statistical methods confirm the significance of this correlation. However, when earthquakes are replaced with random series, the statistical significance derived from the SNMC method diminishes, highlighting the intrinsic properties of geomagnetic storm phenomena. We discuss two potential physical mechanisms to explain the correlations. While the earthquake probability gain based solely on geomagnetic storms is insufficient for reliable prediction, it may be useful in integrated multi-geophysical predictive models.
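The significance logic, replacing one series with random samples and re-computing the statistic, can be sketched generically; this is not the SNMC algorithm itself, and the daily indicator series and the matching statistic below are invented stand-ins:
```python
# Randomization test skeleton: compare an observed lagged statistic against a
# null distribution built by replacing one series with random samples.
import numpy as np

rng = np.random.default_rng(0)
storms = rng.random(36500) < 0.01        # daily storm indicator (illustrative)
quakes = rng.random(36500) < 0.005       # daily strong-quake indicator

def lagged_match(a, b, lag):
    """Fraction of events in `a` followed by an event in `b` after `lag` days."""
    hits = a[:-lag] & b[lag:]
    return hits.sum() / max(a[:-lag].sum(), 1)

obs = lagged_match(storms, quakes, lag=27)
null = [lagged_match(storms, rng.permutation(quakes), 27) for _ in range(999)]
p_value = (1 + sum(n >= obs for n in null)) / 1000
print(f"observed={obs:.4f}, p={p_value:.3f}")
```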
Regional Accuracy Assessment of 30-Meter GLC_FCS30, GlobeLand30, and CLCD Products: A Case Study in Xinjiang Area
With the development of remote sensing technology, a number of fine-resolution (30-m) global/national land cover (LC) products have been developed. However, accuracy assessments for the developed LC products are commonly conducted at global and national scales. Due to the limited availability of representative validation observations and reference data, knowledge relating to the accuracy and applicability of existing LC products at a regional scale is limited. Since Xinjiang, China, exhibits diverse surface cover and fragmented urban landscapes, existing LC products generally have high classification uncertainty in this region. This makes Xinjiang suitable for assessing the accuracy and consistency of existing fine-resolution land cover products. In order to improve knowledge of the accuracy of existing fine-resolution LC products at the regional scale, Xinjiang province was selected as the case area. First, we employed an equal-area stratified random sampling approach with climate, population density, and landscape heterogeneity information as constraints, along with the hexagonal discrete global grid system (HDGGS) as basic sampling grids, to develop a high-density land cover validation dataset for Xinjiang (HDLV-XJ) in 2020. This is the first publicly available regional high-density validation dataset that can support analysis at a regional scale, comprising a total of 20,932 validation samples. Then, based on the generated HDLV-XJ dataset, the accuracies of and consistency among three widely used 30-m LC products, GLC_FCS30, GlobeLand30, and CLCD, were quantitatively evaluated. The results indicated that GLC_FCS30 exhibited the highest overall accuracy (88.10%) in Xinjiang, followed by GlobeLand30 (83.58%) and CLCD (81.57%). Moreover, through a comprehensive analysis of the relationship between different environmental conditions and land cover product performance, we found that GlobeLand30 performed best in regions with high landscape fragmentation, while GLC_FCS30 stood out as the most outstanding product in areas with uneven proportions of land cover types. Our study provides novel insight into the suitability of these three widely used LC products under various environmental conditions, and the findings and dataset can guide the application of existing LC products, offering insight into their accuracies and limitations.
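A minimal sketch of the kind of accuracy assessment run against a validation set like HDLV-XJ: cross-tabulate map labels against reference labels and report overall and per-class (producer's) accuracy; the class list and error rate below are synthetic:
```python
# Overall and per-class accuracy from reference vs. mapped labels.
import numpy as np

rng = np.random.default_rng(5)
classes = np.array(["cropland", "forest", "grassland", "bare", "water"])
reference = rng.choice(classes, size=20_932)          # validation sample labels
mapped = np.where(rng.random(20_932) < 0.85,          # synthetic 85%-correct map
                  reference, rng.choice(classes, 20_932))

overall = np.mean(mapped == reference)
print(f"overall accuracy: {overall:.2%}")
for c in classes:
    mask = reference == c
    print(f"  producer's accuracy, {c}: {np.mean(mapped[mask] == c):.2%}")
```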
Stable and semi-stable sampling approaches for continuously used samples
Knowledge and information systems are usually measured by labeling the relevance of results corresponding to a sample of user queries. In practical systems like search engines, such measurement needs to be performed continuously, e.g., daily or weekly. This creates a trade-off between (a) the representativeness of the query sample with respect to current query traffic of the product; (b) labeling cost: if we keep the same query sample, results will be similar, allowing us to reuse their labels; and (c) overfitting caused by continuous usage of the same query sample. In this paper, we explicitly formulate this trade-off, propose two new variants, stable and semi-stable, of simple and weighted random sampling, and show that they outperform existing approaches in continuous usage settings, including monitoring/debugging a search engine or comparing ranker candidates.
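One standard construction for a stable query sample, hashing each query to a fixed pseudo-random score, keeps recurring queries (and their labels) in the sample across refreshes while new traffic can still enter; this is a generic sketch, not necessarily the paper's stable/semi-stable variants:
```python
# Hash-based stable sampling: a query's inclusion is a deterministic function
# of the query itself, so labels for recurring queries can be reused.
import hashlib

def stable_score(query: str) -> float:
    digest = hashlib.sha256(query.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)

def stable_sample(queries, rate=0.5):
    return [q for q in set(queries) if stable_score(q) < rate]

week1 = ["weather", "news", "python tuple", "flights to rome"] * 1000
week2 = week1 + ["new trending query"]
# Queries present in both weeks land in both samples, so labels carry over.
assert set(stable_sample(week1)) <= set(stable_sample(week2))
```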