37,846 result(s) for "Data points"
Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review
Technologies have driven big data collection across many fields, such as genomics and business intelligence, resulting in a significant increase in the number of variables and data points (observations) collected and stored. Although this presents opportunities to better model the relationship between predictors and response variables, it also causes serious problems during data analysis, one of which is the multicollinearity problem. The two main approaches used to mitigate multicollinearity are variable selection methods and modified estimator methods. However, variable selection methods may negate efforts to collect more data, as new data may eventually be dropped from modeling, while recent studies suggest that optimization approaches via machine learning handle data with multicollinearity better than statistical estimators. Therefore, this study details the chronological development of methods for mitigating the effects of multicollinearity and offers up-to-date recommendations for addressing it.
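The two approaches contrasted in this abstract can be illustrated in a few lines. The sketch below is not taken from the review: it compares an ordinary least-squares fit with a ridge ("modified") estimator on two nearly collinear predictors, using only NumPy; the synthetic data and the regularization value lambda = 1 are arbitrary assumptions.

```python
# Minimal sketch (not from the paper): OLS vs. a ridge ("modified") estimator
# on two nearly collinear predictors.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 1.0 * x2 + rng.normal(size=n)  # true effects: 3 and 1

# OLS: (X^T X)^{-1} X^T y -- often unstable when X^T X is near-singular
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: (X^T X + lambda I)^{-1} X^T y -- shrinks and stabilizes coefficients
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```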
A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams
Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
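As a point of reference for the static setting the review starts from, a minimal LOF example using scikit-learn's LocalOutlierFactor is sketched below; the planted point, the neighborhood size, and the synthetic data are illustrative assumptions, and the streaming algorithms surveyed in the paper are not reproduced.

```python
# Minimal static-data sketch of density-based local outlier detection (LOF).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
dense = rng.normal(loc=0.0, scale=0.5, size=(100, 2))    # dense cluster
sparse = rng.normal(loc=5.0, scale=2.0, size=(100, 2))   # sparser cluster
planted = np.array([[1.8, 1.8]])                         # sits just outside the dense cluster
X = np.vstack([dense, sparse, planted])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 marks detected outliers
scores = -lof.negative_outlier_factor_  # LOF score: ~1 = inlier, much larger = outlier

print("label of the planted point:", labels[-1])
print("LOF score of the planted point:", round(float(scores[-1]), 2))
```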
Improving big citizen science data: Moving beyond haphazard sampling
Citizen science is mainstream: millions of people contribute data to a growing array of citizen science projects annually, forming massive datasets that will drive research for years to come. Many citizen science projects implement a "leaderboard" framework, ranking the contributions based on number of records or species, encouraging further participation. But is every data point equally "valuable"? Citizen scientists collect data with distinct spatial and temporal biases, leading to unfortunate gaps and redundancies, which create statistical and informational problems for downstream analyses. Up to this point, the haphazard structure of the data has been seen as an unfortunate but unchangeable aspect of citizen science data. However, we argue here that this issue can actually be addressed: we provide a very simple, tractable framework that could be adapted by broadscale citizen science projects to allow citizen scientists to optimize the marginal value of their efforts, increasing the overall collective knowledge.
A literature review on one-class classification and its potential applications in big data
In severely imbalanced datasets, using traditional binary or multi-class classification typically leads to bias towards the class(es) with the much larger number of instances. Under such conditions, modeling and detecting instances of the minority class is very difficult. One-class classification (OCC) is an approach that detects abnormal data points relative to the instances of the known class and can serve to address issues related to severely imbalanced datasets, which are especially common in big data. We present a detailed survey of OCC-related works published over approximately the last decade. We group the different works into three categories: outlier detection, novelty detection, and deep learning and OCC. We closely examine and evaluate selected works on OCC so that a good cross section of approaches, methods, and application domains is represented in the survey. Commonly used techniques in OCC for outlier detection and for novelty detection, respectively, are discussed. We observe that one area largely omitted in the OCC-related literature is its application to big data and its inherently associated problems, such as severe class imbalance, class rarity, noisy data, feature selection, and data reduction. We believe the survey will be appreciated by researchers working in these areas of big data.
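A minimal OCC sketch, assuming scikit-learn's OneClassSVM as a generic stand-in for the techniques surveyed: the model is fitted on the known class only and then asked to flag points that deviate from it. The data and the nu parameter are illustrative assumptions.

```python
# Minimal one-class classification sketch: train on the known "normal" class
# only, then flag points that deviate from it.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(500, 3))   # the known class
normal_test = rng.normal(loc=0.0, scale=1.0, size=(20, 3))
abnormal_test = rng.normal(loc=6.0, scale=1.0, size=(5, 3))    # rare, minority-like points

occ = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal_train)

# predict() returns +1 for points consistent with the training class, -1 otherwise
print("normal test predictions:  ", occ.predict(normal_test))
print("abnormal test predictions:", occ.predict(abnormal_test))
```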
Efficient algorithm for big data clustering on single machine
Big data analysis requires large computing resources, which are not always available, so new clustering algorithms capable of processing such data are needed. This study proposes a new parallel clustering algorithm based on the k-means algorithm that significantly reduces the exponential growth of computations. The proposed algorithm splits a dataset into batches while preserving the characteristics of the initial dataset and increasing the clustering speed. The idea is to define cluster centroids for each batch, cluster those centroids in turn, and then assign each data point to the cluster with the nearest resulting centroid. Experiments on large real-world datasets are used to evaluate the effectiveness of the proposed approach, which is compared with k-means and a modification of it. The experiments show that the proposed algorithm is a promising tool for clustering large datasets in comparison with the k-means algorithm.
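A rough sketch of the batch-and-recluster idea as described in the abstract, not the paper's exact algorithm: each batch is clustered, the per-batch centroids are clustered again, and every point is assigned to the nearest resulting centroid. The batch size, the number of clusters, and the use of scikit-learn's KMeans are assumptions.

```python
# Batch-wise k-means whose per-batch centroids are themselves clustered.
import numpy as np
from sklearn.cluster import KMeans

def batched_kmeans(X, k, batch_size=10_000, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    batch_centroids = []
    for start in range(0, len(X), batch_size):
        batch = X[idx[start:start + batch_size]]
        km = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(batch)
        batch_centroids.append(km.cluster_centers_)
    # Cluster the per-batch centroids to obtain k global centroids.
    centroids = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(
        np.vstack(batch_centroids)).cluster_centers_
    # Each point belongs to the cluster with the nearest global centroid.
    labels = np.argmin(
        np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1)
    return centroids, labels

X = np.random.default_rng(1).normal(size=(50_000, 2))
centroids, labels = batched_kmeans(X, k=5)
print(centroids.shape, np.bincount(labels))
```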
Adaptive Radial Basis Function Partition of Unity Interpolation: A Bivariate Algorithm for Unstructured Data
In this article we present a new adaptive algorithm for solving 2D interpolation problems on large scattered data sets through the radial basis function partition of unity method. Unlike other time-consuming schemes, this adaptive method is able to deal efficiently with scattered data points whose density varies strongly over the domain. This is achieved by decomposing the underlying domain into subdomains of variable size so as to guarantee a suitable number of points within each of them. The localization of such points is done by means of an efficient search procedure based on a partition of the domain into square cells. For each subdomain, the adaptive process identifies a predefined neighborhood consisting of one or more levels of neighboring cells, which allows us to quickly find all the subdomain points. The algorithm is further devised for an optimal selection of the local shape parameters associated with the radial basis function interpolants via leave-one-out cross validation and maximum likelihood estimation techniques. Numerical experiments show good performance of this adaptive algorithm on test examples with different data distributions, and the efficacy of our interpolation scheme is also demonstrated on real-world applications.
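For orientation, a plain scattered-data RBF interpolation sketch using SciPy's RBFInterpolator is given below; the neighbors option is used as a crude stand-in for subdomain localization, and the paper's adaptive partition-of-unity decomposition and LOOCV/MLE shape-parameter selection are not implemented. The test function, kernel, and shape parameter are assumptions.

```python
# Local RBF interpolation of unstructured 2D data (not the paper's scheme).
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(3)
pts = rng.uniform(0.0, 1.0, size=(2000, 2))                 # unstructured 2D nodes
vals = np.sin(6.0 * pts[:, 0]) * np.cos(6.0 * pts[:, 1])    # samples of a test function

# 'neighbors' restricts each evaluation to nearby nodes, a cheap stand-in for
# the subdomain localization used by partition-of-unity methods.
interp = RBFInterpolator(pts, vals, neighbors=50, kernel="gaussian", epsilon=5.0)

grid = np.mgrid[0:1:50j, 0:1:50j].reshape(2, -1).T
approx = interp(grid)
exact = np.sin(6.0 * grid[:, 0]) * np.cos(6.0 * grid[:, 1])
print("max abs error on grid:", float(np.max(np.abs(approx - exact))))
```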
FAIR Data Point: A FAIR-Oriented Approach for Metadata Publication
Metadata, data about other digital objects, play an important role in FAIR with a direct relation to all FAIR principles. In this paper we present and discuss the FAIR Data Point (FDP), a software architecture aiming to define a common approach to publish semantically-rich and machine-actionable metadata according to the FAIR principles. We present the core components and features of the FDP, its approach to metadata provision, the criteria to evaluate whether an application adheres to the FDP specifications and the service to register, index and allow users to search for metadata content of available FDPs.
Research on the Characteristics of Water Based on Big Data of Automatic Pond Reservoir System
In shrimp cultivation, water is one of the crucial components of the cultivation process, as evident in several aspects such as oxygen, pH, and temperature. The recommended range for good oxygen levels is between 3 and 7, pH should be in the range of 7 to 9, and the temperature should be maintained between 24.5 ℃ and 31.9 ℃. To achieve optimal results, we use fuzzy logic as a controller for the existing system. After obtaining data from the aforementioned sensors, fuzzy logic control is applied to maintain an optimal environment. With this approach, an accuracy of up to 90% is achieved with approximately 290 data points.
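A hand-rolled sketch of the kind of fuzzy membership check implied by the ranges quoted above (oxygen 3 to 7, pH 7 to 9, temperature 24.5 ℃ to 31.9 ℃); the trapezoid shoulders and the min-based aggregation are illustrative assumptions, not the paper's actual controller or rule base.

```python
# Fuzzy membership check for pond water parameters (illustrative only).
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 inside [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def water_quality(oxygen, ph, temp):
    """Degree (0..1) to which the water is 'optimal', combined with a fuzzy AND (min)."""
    mu_oxygen = trapezoid(oxygen, 2.5, 3.0, 7.0, 7.5)     # recommended 3 to 7
    mu_ph = trapezoid(ph, 6.5, 7.0, 9.0, 9.5)             # recommended 7 to 9
    mu_temp = trapezoid(temp, 23.5, 24.5, 31.9, 32.9)     # recommended 24.5 to 31.9 C
    return min(mu_oxygen, mu_ph, mu_temp)

print(water_quality(oxygen=5.0, ph=7.8, temp=28.0))   # well inside all ranges -> 1.0
print(water_quality(oxygen=2.8, ph=7.8, temp=28.0))   # low oxygen -> degraded score
```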
Big data/analytics platform for Industry 4.0 implementation in advanced manufacturing context
Industrial companies operate in increasingly competitive international environments; thus, they need to innovate continuously to improve their competitiveness, productivity, and quality. Digital transformation, one of the foundations of Industry 4.0, is essential to addressing these innovation challenges. The objective of this study is to present the methodology, development, and implementation of a new cloud computing platform that collects, stores, and processes data from shop floors. Manufacturing shop floors use connected, intelligent devices that produce thousands of data points that, once processed, provide high added value. This study presents the architecture to collect these data, store them in a big data cloud computing solution, and then process them using advanced artificial intelligence algorithms and/or optimization techniques. The proposed platform has been developed to minimize the complexity and costs required to facilitate its adoption. The platform’s implementation and evaluation were conducted by two companies from two different sectors of Brazilian industry. The objective of the first company was to diagnose production losses in a compressor production line. Using the developed solution, the company identified prospective changes in layout and automation that could increase productivity by approximately 5%. The objective of the second company was to implement dynamic and optimized process planning in clothing manufacturing. A first assessment of the gains of the proposed solution showed an average productivity increase of 10.69% ± 1.82% (confidence interval).
A smart intelligent approach based on hybrid group search and pelican optimization algorithm for data stream clustering
Big data applications generate a huge range of evolving, real-time, and high-dimensional streaming data, and clustering these data streams efficiently and effectively is challenging in many applications. Clustering of data streams is a major issue in data mining. Several clustering techniques have been developed for stream data, but most take a rather restricted view of cluster dynamics. A data stream is an unbounded sequence of arriving data, and its clustering involves additional factors compared with classical clustering: the stream is mostly unbounded and each data point must be processed at least once, which leads to higher processing time and additional memory requirements. In addition, the clusters and their statistical properties vary over time, and streams can be noisy. To address these challenges, this research work implements a novel data stream clustering developed with a hybrid meta-heuristic model. Initially, a data stream is collected and micro-clusters are formed by the K-Means Clustering (KMC) technique. The micro-clusters are then merged and sorted, and cluster optimization is performed by the Hybrid Group Search Pelican Optimization (HGSPO). The main objective of the clustering is to maximize accuracy through radius, distance, and similarity measures, whose thresholds are then optimized. In the training phase, a clustering threshold is fixed for each cluster. When new data enters the stream clustering model, it is compared against the trained clusters and forwarded to the appropriate cluster based on the assigned threshold and minimum similarity. Performance analysis against various clustering and heuristic algorithms confirms the clustering quality of the recommended system on standard performance metrics.
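A simplified sketch of the micro-cluster and threshold routing described above, assuming scikit-learn's KMeans for the micro-clusters and a maximum-distance radius as the per-cluster threshold; the HGSPO optimization of those thresholds is not reproduced.

```python
# Micro-clusters from K-means plus per-cluster radius thresholds for routing
# new stream points (illustrative; thresholds here are not meta-heuristically optimized).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(11)
train_stream = np.vstack([rng.normal(c, 0.3, size=(300, 2)) for c in (0.0, 4.0, 8.0)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train_stream)
centers = km.cluster_centers_

# Per-cluster threshold: maximum training-point distance to its own centroid.
dists = np.linalg.norm(train_stream - centers[km.labels_], axis=1)
thresholds = np.array([dists[km.labels_ == k].max() for k in range(3)])

def assign(point):
    """Route a new stream point to the nearest micro-cluster, or flag it."""
    d = np.linalg.norm(centers - point, axis=1)
    k = int(np.argmin(d))
    return k if d[k] <= thresholds[k] else -1   # -1: outside every cluster's threshold

print(assign(np.array([0.1, -0.2])))   # near one of the micro-clusters
print(assign(np.array([20.0, 20.0])))  # far from all micro-clusters -> -1
```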