Catalogue Search | MBRL

Machine learning for data streams : with practical examples in MOA

by Bifet, Albert, author in Data mining. , Streaming technology (Telecommunications)

Book

Fuzzy Hoeffding Decision Tree for Data Stream Classification

by Marcelloni, Francesco , Ducange, Pietro , Pecori, Riccardo in Fuzzy decision tree , Hoeffding decision tree , Model interpretability

2021

Data stream mining has recently grown in popularity, thanks to an increasing number of applications that need continuous and fast analysis of streaming data. Such data are generally produced in application domains that require immediate reactions under strict temporal constraints. These characteristics make it problematic to use classical machine learning algorithms for mining knowledge from fast data streams and call for appropriate techniques. In this paper, based on the well-known Hoeffding Decision Tree (HDT) for streaming data classification, we introduce FHDT, a fuzzy HDT that extends HDT with fuzziness, thus making HDT more robust to noisy and vague data. We tested FHDT on three synthetic datasets, commonly adopted for analyzing concept drift in data stream classification, and two real-world datasets, already used in recent research on fuzzy systems for streaming data. We show that FHDT outperforms HDT, especially in the presence of concept drift. Furthermore, FHDT is characterized by a high level of interpretability, thanks to the linguistic rules that can be extracted from it.

Journal Article

Streaming, sharing, stealing : big data and the future of entertainment

by Smith, Michael D., 1968- author , Telang, Rahul, author in Streaming technology (Telecommunications) , Data transmission systems. , Big data.

Book

DSSM: Distributed Streaming Data Sharing Manager

by Tadahiro Hasegawa , Paul Leger , Ryota Gunji in Chemical technology , Communication , distributed streaming data sharing manager

2021

Developing robot control software systems is difficult because of a wide variety of requirements, including hardware systems and sensors, even though robots are in demand nowadays. Middleware systems, such as Robot Operating System (ROS), are being developed and widely used to tackle this difficulty. Streaming data Sharing Manager (SSM) is one such middleware system that allows developers to write and read sensor data with timestamps using a Personal Computer (PC). The timestamp feature is essential for the robot control system because it usually uses multiple sensors with their own measurement cycles, meaning that measured sensor values with different timestamps become useless for robot control. Using SSM allows developers to use measured sensor values with the same timestamps; however, SSM assumes that only one PC is used. Therefore, if one process consumes CPU resources intensively, other processes cannot meet their assumed deadlines, leading to unexpected behavior of the robot. This paper proposes an SSM middleware, named Distributed Streaming data Sharing Manager (DSSM), that enables distributing processes on SSM to different PCs. We have developed a prototype of DSSM and confirmed its behavior so far. In addition, we apply DSSM to an existing real SSM-based robot control system that autonomously controls an unmanned vehicle robot. We then reveal its advantages and disadvantages via several experiments by measuring resource usage.

Journal Article

Streaming data : understanding the real-time pipeline

by Psaltis, Andrew G., author in Streaming technology (Telecommunications) , Real-time data processing. , Electronic data processing.

Book

Critical parameter analysis of Vertical Hoeffding Tree for optimized performance using SAMOA

by Prasad, Bakshi Rohit , Agarwal, Sonali in Algorithms , Artificial Intelligence , Big Data

2017

Streaming classification of big data is a method under stream data mining that learns from continuous, ordered sequences of data streams coming from diversified sources using limited computing and storage capabilities. SAMOA, which stands for scalable advanced massive online analysis, is a machine learning framework used to perform distributed data mining over streaming data. Vertical Hoeffding Tree (VHT) under SAMOA is a variant of the very fast decision tree used for distributed classification of data streams. The performance of VHT depends on various critical parameters such as tie-threshold, grace value, confidence, split criterion, etc. Although VHT is widely accepted as an efficient streaming classifier, one of the challenges in streaming classification is the varying distribution of incoming data instances with respect to underlying classes in different datasets; therefore, the performance of VHT varies across datasets. Consequently, achieving optimal performance from a stream classifier like VHT on different datasets is a challenging task, and a fixed set of values of critical parameters cannot be preconfigured for various types of datasets. This research work explores the capabilities of the VHT streaming classifier of SAMOA in the light of various benchmarking performance statistics such as classification accuracy, kappa and kappa temporal. The work presented here experimentally identifies suitable values of critical parameters of VHT that yield optimized performance on different datasets. Thus, this analytical study is extremely significant in developing streaming classifiers which achieve optimum performance via parameter tuning at run time.

Journal Article

Large-scale data streaming, processing, and blockchain security

by Saini, Hemraj, 1977- editor , Rathee, Geetanjali, 1990- editor , Saini, Dinesh Kumar, 1974- editor in Data mining. , Streaming technology (Telecommunications) , Blockchains (Databases)

"This book explores the latest methodologies, modeling, and simulations for coping with the generation and management of large-scale data in both scientific and individual applications"-- Provided by publisher.

Book

QUANTILE REGRESSION UNDER MEMORY CONSTRAINT

by Chen, Xi , Liu, Weidong , Zhang, Yichen in Asymptotic methods , Computer networks , Distributed processing

2019

This paper studies the inference problem in quantile regression (QR) for a large sample size n but under a limited memory constraint, where the memory can only store a small batch of data of size m. A natural method is the naive divide-and-conquer approach, which splits data into batches of size m, computes the local QR estimator for each batch and then aggregates the estimators via averaging. However, this method only works when n = o(m²) and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as n grows polynomially in m, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality p goes to infinity. The proposed method can also be applied to address the QR problem under a distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data.

Journal Article

Stream processing with Apache Flink : fundamentals, implementation, and operation of streaming applications

by Hueske, Fabian, author , Kalavri, Vasiliki, author in Apache Flink (Electronic resource) , Streaming technology (Telecommunications) Computer programs. , Big data.

"Get started with Apache Flink, the open source framework that powers some of the world's largest stream processing applications. With this practical book, you'll explore the fundamental concepts of parallel stream processing and discover how this technology differs from traditional batch data processing. Longtime Apache Flink committers Fabian Hueske and Vasia Kalavri show you how to implement scalable streaming applications with Flink's DataStream API and continuously run and maintain these applications in operational environments. Stream processing is ideal for many use cases, including low-latency ETL, streaming analytics, and real-time dashboards as well as fraud detection, anomaly detection, and alerting. You can process continuous data of any kind, including user interactions, financial transactions, and IoT data, as soon as you generate them."-- Provided by publisher

Book

Designing a Streaming Algorithm for Outlier Detection in Data Mining—An Incremental Approach

by Shi, Wei , Santoro, Nicola , Yu, Kangqing in data-mining , incremental algorithm , outlier detections

2020

Designing an algorithm for detecting outliers over streaming data has become an important task in many common applications, arising in areas such as fraud detection, network analysis, environment monitoring and so forth. Because real-time data may arrive in the form of streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, with limited memory resources, and scalable. These facts create big challenges for existing outlier detection algorithms in terms of their accuracy when they are implemented in an incremental fashion, especially in the streaming environment. To address these problems, we first propose C_KDE_WR, which uses a sliding window and a kernel function to process the streaming data online, and report its results demonstrating high throughput on handling real-time streaming data, implemented in a CUDA framework on a Graphics Processing Unit (GPU). We also present another algorithm, C_LOF, based on a very popular and effective outlier detection algorithm called Local Outlier Factor (LOF), which unfortunately works only on batched data. Using a novel incremental approach that compensates for the drawback of high complexity in LOF, we show how to implement it in a streaming context and obtain results in a timely manner. Like C_KDE_WR, C_LOF also employs a sliding window and a statistical summary to help make decisions based on the data in the current window. It also addresses all the challenges of streaming data addressed in C_KDE_WR. In addition, we report the comparative evaluation of the accuracy of C_KDE_WR against the state-of-the-art SOD_GPU using Precision, Recall and F-score metrics. Furthermore, a t-test is performed to demonstrate the significance of the improvement. We further report the testing results of C_LOF on different parameter settings and draw ROC and PR curves with their area under the curve (AUC) and Average Precision (AP) values calculated respectively. Experimental results show that C_LOF can overcome the masquerading problem, which often exists in outlier detection on streaming data. We provide complexity analysis and report experimental results on the accuracy of both C_KDE_WR and C_LOF algorithms in order to evaluate their effectiveness as well as their efficiency.

Journal Article
