1,165 results for "Massive data points"
Toward data-driven, dynamical complex systems approaches to disaster resilience
With rapid urbanization and increasing climate risks, enhancing the resilience of urban systems has never been more important. Despite the availability of massive datasets of human behavior (e.g., mobile phone data, satellite imagery), studies on disaster resilience have been limited to using static measures as proxies for resilience. However, static metrics have significant drawbacks such as their inability to capture the effects of compounding and accumulating disaster shocks; dynamic interdependencies of social, economic, and infrastructure systems; and critical transitions and regime shifts, which are essential components of the complex disaster resilience process. In this article, we argue that the disaster resilience literature needs to seize the opportunities of big data and move toward a new research direction: developing data-driven, dynamical complex systems models of disaster resilience. Data-driven complex systems modeling approaches could overcome the drawbacks of static measures and allow us to quantitatively model the dynamic recovery trajectories and intrinsic resilience characteristics of communities in a generic manner by leveraging large-scale and granular observations. This approach brings a paradigm shift in modeling the disaster resilience process and its linkage with the recovery process, paving the way to answering important questions for policy applications via counterfactual analysis and simulations.
Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review
Technologies have driven big data collection across many fields, such as genomics and business intelligence. This results in a significant increase in variables and data points (observations) collected and stored. Although this presents opportunities to better model the relationship between predictors and the response variables, this also causes serious problems during data analysis, one of which is the multicollinearity problem. The two main approaches used to mitigate multicollinearity are variable selection methods and modified estimator methods. However, variable selection methods may negate efforts to collect more data as new data may eventually be dropped from modeling, while recent studies suggest that optimization approaches via machine learning handle data with multicollinearity better than statistical estimators. Therefore, this study details the chronological developments to mitigate the effects of multicollinearity and up-to-date recommendations to better mitigate multicollinearity.
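The review contrasts variable selection with modified-estimator methods; ridge regression is the canonical modified estimator, stabilizing coefficients by adding a penalty to the normal equations. The closed-form two-predictor solver below is a minimal illustrative sketch (not code from the paper):

```python
def ridge_2var(xs, ys, lam):
    """Solve (X'X + lam*I) b = X'y in closed form for two predictors.

    xs: list of (x1, x2) pairs; ys: responses; lam: ridge penalty.
    With lam > 0 the 2x2 system stays invertible even when the two
    predictors are perfectly collinear.
    """
    a11 = sum(x1 * x1 for x1, _ in xs) + lam
    a22 = sum(x2 * x2 for _, x2 in xs) + lam
    a12 = sum(x1 * x2 for x1, x2 in xs)
    b1 = sum(x1 * y for (x1, _), y in zip(xs, ys))
    b2 = sum(x2 * y for (_, x2), y in zip(xs, ys))
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With two identical predictors and y = 2*x1, ridge splits the effect roughly evenly (each coefficient near 1, summing to about 2), whereas the unpenalized system is singular.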
A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams
Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
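The LOF score compares a point's local density to that of its neighbors; values well above 1 flag local outliers. A minimal pure-Python sketch of the classic static LOF computation, assuming Euclidean distance and data small enough for brute-force neighbor search:

```python
import math

def knn(points, i, k):
    # k nearest neighbors of point i as (distance, index) pairs
    d = sorted((math.dist(points[i], points[j]), j)
               for j in range(len(points)) if j != i)
    return d[:k]

def k_distance(points, j, k):
    # distance from point j to its k-th nearest neighbor
    return knn(points, j, k)[-1][0]

def reach_dist(points, i, j, k):
    # reachability distance of i w.r.t. j (smooths density near j)
    return max(k_distance(points, j, k), math.dist(points[i], points[j]))

def lrd(points, i, k):
    # local reachability density of point i
    neigh = knn(points, i, k)
    s = sum(reach_dist(points, i, j, k) for _, j in neigh)
    return len(neigh) / s if s > 0 else float("inf")

def lof(points, i, k):
    # LOF: mean neighbor density over own density; ~1 = inlier, >>1 = outlier
    neigh = knn(points, i, k)
    return sum(lrd(points, j, k) for _, j in neigh) / (len(neigh) * lrd(points, i, k))
```

A point inside a tight cluster scores near 1, while a distant point scores several times higher; as the abstract notes, this brute-force form does not carry over directly to streams.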
Concentration of Tempered Posteriors and of Their Variational Approximations
While Bayesian methods are extremely popular in statistics and machine learning, their application to massive data sets is often challenging, when possible at all. Classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian (VB) methods aim at approximating the posterior by a distribution in a tractable family 𝓕; MCMC is thus replaced by an optimization algorithm that is orders of magnitude faster. VB methods have been applied in such computationally demanding applications as collaborative filtering, image and video processing, and NLP, to name a few. However, despite good results in practice, the theoretical properties of these approximations are not well understood. We propose a general oracle inequality that relates the quality of the VB approximation to the prior π and to the structure of 𝓕, and we provide a simple condition that allows us to derive rates of convergence from this oracle inequality. We apply our theory to various examples. First, we show that for parametric models with log-Lipschitz likelihood, Gaussian VB leads to efficient algorithms and consistent estimators. We then study a high-dimensional example, matrix completion, and a nonparametric one, density estimation.
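As a concrete toy instance of the tractable-family idea, the textbook mean-field approximation for a Normal model with unknown mean and precision has closed-form coordinate-ascent (CAVI) updates. The sketch below is that standard example, not code or notation from the paper:

```python
def cavi_normal(y, mu0=0.0, lam0=0.01, a0=1.0, b0=1.0, iters=50):
    """Mean-field VB for y_i ~ N(mu, 1/tau), with priors
    mu ~ N(mu0, 1/(lam0*tau)) and tau ~ Gamma(a0, b0).
    Returns q(mu) = N(m, s2) and q(tau) = Gamma(a, b)."""
    n = len(y)
    ybar = sum(y) / n
    sumsq = sum(v * v for v in y)
    # mean of q(mu): fixed point, independent of tau
    m = (lam0 * mu0 + n * ybar) / (lam0 + n)
    a = a0 + (n + 1) / 2.0
    b = b0
    for _ in range(iters):
        e_tau = a / b                      # E_q[tau]
        s2 = 1.0 / ((lam0 + n) * e_tau)    # variance of q(mu)
        # b update uses E_q[sum (y_i - mu)^2 + lam0*(mu - mu0)^2]
        b = b0 + 0.5 * (sumsq - 2 * m * n * ybar + n * (m * m + s2)
                        + lam0 * ((m - mu0) ** 2 + s2))
    return m, s2, a, b
```

Each coordinate update is a cheap closed-form expression, which is the source of the speedup over MCMC that the abstract describes.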
Improving big citizen science data: Moving beyond haphazard sampling
Citizen science is mainstream: millions of people contribute data to a growing array of citizen science projects annually, forming massive datasets that will drive research for years to come. Many citizen science projects implement a "leaderboard" framework, ranking the contributions based on number of records or species, encouraging further participation. But is every data point equally "valuable"? Citizen scientists collect data with distinct spatial and temporal biases, leading to unfortunate gaps and redundancies, which create statistical and informational problems for downstream analyses. Up to this point, the haphazard structure of the data has been seen as an unfortunate but unchangeable aspect of citizen science data. However, we argue here that this issue can actually be addressed: we provide a very simple, tractable framework that could be adapted by broadscale citizen science projects to allow citizen scientists to optimize the marginal value of their efforts, increasing the overall collective knowledge.
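The abstract does not spell out its framework, but one plausible, much-simplified reading of "optimizing marginal value" is to direct observers toward the least-sampled spatio-temporal cells. The ranking below is purely illustrative (hypothetical function name and invented data, not the authors' method):

```python
from collections import Counter

def undersampled_cells(records, top_n=3):
    """Rank (lat_bin, lon_bin, month) cells by how few records they hold,
    so a new observation there has the highest marginal value.
    Note: cells with zero records never appear in `records`, so a real
    system would also enumerate the empty cells of the study region."""
    counts = Counter((round(lat), round(lon), month)
                     for lat, lon, month in records)
    return [cell for cell, _ in sorted(counts.items(), key=lambda kv: kv[1])[:top_n]]
```

A leaderboard could then reward records from these cells more than redundant ones.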
A Multi-Resolution Approximation for Massive Spatial Datasets
Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method. Supplementary materials for this article are available online.
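To see why kriging scales poorly, note that plain Gaussian-process prediction solves an n×n linear system, which is O(n³) in the number of observations; the M-RA replaces this with scalable multi-resolution basis functions. The toy predictor below is the expensive full-kriging baseline (not the M-RA itself), assuming a squared-exponential covariance:

```python
import math

def rbf(x1, x2, ell=1.0):
    # squared-exponential covariance between two 1-D locations
    return math.exp(-0.5 * ((x1 - x2) / ell) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for tiny systems)
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def krige(xs, ys, xstar, nugget=1e-6):
    """Simple-kriging mean at xstar: k(xstar)' K^{-1} y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, ys)
    return sum(rbf(xstar, xs[i]) * alpha[i] for i in range(n))
```

The cubic-cost `solve` step is exactly what becomes infeasible for massive spatial datasets.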
A literature review on one-class classification and its potential applications in big data
In severely imbalanced datasets, traditional binary or multi-class classification typically biases towards the class(es) with the much larger number of instances, making instances of the minority class very difficult to model and detect. One-class classification (OCC) detects data points that are abnormal compared to the instances of the known class, and can therefore address issues related to severely imbalanced datasets, which are especially common in big data. We present a detailed survey of OCC-related works published over approximately the last decade. We group the different works into three categories: outlier detection, novelty detection, and deep learning and OCC. We closely examine and evaluate selected works on OCC such that a good cross section of approaches, methods, and application domains is represented in the survey. Commonly used OCC techniques for outlier detection and for novelty detection, respectively, are discussed. One area that has been largely omitted in the OCC literature is its application context for big data and the problems inherently associated with it, such as severe class imbalance, class rarity, noisy data, feature selection, and data reduction. We feel the survey will be appreciated by researchers working in these areas of big data.
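A minimal flavor of OCC-style novelty detection: fit a simple density model to the known ("normal") class only, then flag anything too unlikely under it. The per-feature z-score rule below is a hypothetical toy, far simpler than the surveyed methods:

```python
import math

class GaussianOCC:
    """One-class detector: model the normal class as an independent
    Gaussian per feature, fitted only on normal data."""

    def fit(self, normal_rows):
        cols = list(zip(*normal_rows))
        self.mu = [sum(c) / len(c) for c in cols]
        # floor the std dev to avoid division by zero on constant features
        self.sd = [max(math.sqrt(sum((v - m) ** 2 for v in c) / len(c)), 1e-9)
                   for c, m in zip(cols, self.mu)]
        return self

    def is_outlier(self, row, z_thresh=3.0):
        # flag the row if any feature is more than z_thresh std devs away
        return any(abs(v - m) / s > z_thresh
                   for v, m, s in zip(row, self.mu, self.sd))
```

Note that training never sees the minority class, which is exactly what makes OCC attractive under severe imbalance or class rarity.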
cyCombine allows for robust integration of single-cell cytometry datasets within and across technologies
Combining single-cell cytometry datasets increases the analytical flexibility and the statistical power of data analyses. However, in many cases the full potential of co-analyses is not reached due to technical variance between data from different experimental batches. Here, we present cyCombine, a method to robustly integrate cytometry data from different batches, experiments, or even different experimental techniques, such as CITE-seq, flow cytometry, and mass cytometry. We demonstrate that cyCombine maintains the biological variance and the structure of the data, while minimizing the technical variance between datasets. cyCombine does not require technical replicates across datasets, and computation time scales linearly with the number of cells, allowing for integration of massive datasets. Robust, accurate, and scalable integration of cytometry data enables integration of multiple datasets for primary data analyses and the validation of results using public datasets.
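cyCombine's actual algorithm is considerably more sophisticated than this, but the crude per-batch standardization below illustrates what "removing technical variance between batches" means in its simplest form (a stand-in sketch, not cyCombine's method):

```python
def per_batch_zscore(batches):
    """Standardize each marker within each batch so batch-specific
    shifts and scalings cancel out.

    batches: {batch_id: list of rows, one value per marker}."""
    out = {}
    for bid, rows in batches.items():
        cols = list(zip(*rows))
        mu = [sum(c) / len(c) for c in cols]
        sd = [max((sum((v - m) ** 2 for v in c) / len(c)) ** 0.5, 1e-9)
              for c, m in zip(cols, mu)]
        out[bid] = [[(v - m) / s for v, m, s in zip(row, mu, sd)]
                    for row in rows]
    return out
```

After correction, a batch that is a pure shifted-and-rescaled copy of another becomes identical to it, while within-batch (biological) structure is preserved.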
A Review of Remote Sensing for Environmental Monitoring in China
The natural environment is essential for human survival and development, since it provides water, land, biological, and climate resources. As a developing country, China has witnessed significant environmental change in recent decades; monitoring and understanding the status of the environment is therefore of great significance. Owing to its capacity for large-scale, dynamic observation, remote sensing has become an indispensable approach for environmental monitoring. This paper reviews the satellite resources, institutions, and policies for environmental monitoring in China, and the advances in research and application of remote sensing from five aspects: ecological index retrieval and environmental monitoring in protected areas, rural areas, urban areas, and mining areas. The remote sensing models and methods for various types of environmental monitoring, and their specific applications in China, are comprehensively summarized. This paper also points out major challenges at the current stage: satellite sensor problems, difficulties in the integrated use of datasets, uncertainty in the retrieval of ecological variables, scaling effects, a low degree of automation, weak forecasting and comprehensive-analysis capabilities, and a lack of computational power for massive datasets. Finally, development trends and future directions are put forward to guide the research and application of environmental monitoring and protection in the new era.
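Many of the ecological indices such a review covers reduce to simple band arithmetic on reflectance grids. NDVI, for instance, contrasts near-infrared and red reflectance:

```python
def ndvi(nir, red, eps=1e-12):
    """Normalized Difference Vegetation Index for paired NIR/red
    reflectance grids: (NIR - Red) / (NIR + Red), in [-1, 1].
    Higher values indicate denser, healthier vegetation; eps guards
    against division by zero on dark pixels."""
    return [[(n - r) / (n + r + eps) for n, r in zip(nrow, rrow)]
            for nrow, rrow in zip(nir, red)]
```

Vegetated pixels (high NIR, low red) score strongly positive, while water and bare soil score near or below zero.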
A Review of Deep Learning in Multiscale Agricultural Sensing
Population growth, climate change, and the worldwide COVID-19 pandemic are imposing increasing pressure on global agricultural production. Increasing crop yield while ensuring the sustainable development of environmentally friendly agriculture is a challenge shared throughout the world. Autonomous systems, sensing technologies, and artificial intelligence offer great opportunities to tackle this issue. In precision agriculture (PA), non-destructive and non-invasive remote and proximal sensing methods have been widely used to observe crops in the visible and invisible spectra. Nowadays, the integration of high-performance imaging sensors (e.g., RGB, multispectral, hyperspectral, thermal, and SAR) and unmanned mobile platforms (e.g., satellites, UAVs, and terrestrial agricultural robots) is yielding a huge number of high-resolution farmland images, in which rich crop information is embedded. This, however, brings the challenge of swiftly and efficiently extracting the information these images contain and then performing fine-grained crop management based on information-supported decision making. In the past few years, deep learning (DL) has shown great potential to reshape many industries because of its powerful capability to learn features from massive datasets, and the agriculture industry is no exception. More and more agricultural scientists are applying deep learning to image-based farmland observation tasks such as land mapping, crop classification, biotic/abiotic stress monitoring, and yield prediction. To provide an update on these studies, we conducted a comprehensive investigation with a special emphasis on deep learning in multiscale agricultural remote and proximal sensing. Specifically, the applications of convolutional neural network-based supervised learning (CNN-SL), transfer learning (TL), and few-shot learning (FSL) in crop sensing at the land, field, canopy, and leaf scales are the focus of this review. We hope that this work can serve as a reference for the global agricultural community regarding DL in PA and can inspire deeper and broader research to promote the evolution of modern agriculture.
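The convolution at the heart of CNN-SL is easy to state. The "valid" 2-D cross-correlation below, applied with a hand-picked horizontal-gradient kernel, shows how a filter responds to a field boundary in an image patch (toy data, no learning; in a CNN the kernel weights would be learned):

```python
def conv2d(img, ker):
    """Valid 2-D cross-correlation: slide ker over img, summing
    elementwise products at each position (the CNN building block)."""
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * ker[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]
```

With a [[-1, 1]] kernel, the output is nonzero exactly where pixel intensity jumps, i.e., at the boundary between two regions.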