Catalogue Search | MBRL

On normalization and algorithm selection for unsupervised outlier detection

by Muñoz, Mario A , Hyndman, Rob J , Kandanaarachchi, Sevvandi in Algorithms , Data analysis , Datasets

2020

This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest-neighbor structure and density of the dataset, and hence which observations could be considered outliers. We then perform an instance space analysis of combinations of normalization and detection methods. This analysis enables visualization of the strengths and weaknesses of these combinations. Moreover, we gain insights into which method combination is likely to perform best on a given dataset.

Journal Article
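
As a toy illustration of the normalization effect described above (not the paper's proof or its instance space analysis), the sketch below scores each point by the distance to its k-th nearest neighbor and shows that the top-ranked outlier changes when the same data is min-max or z-score normalized. The dataset, the function names, and the choice of k-th nearest neighbor distance as the outlier score are assumptions made for illustration.

```python
import math
import statistics

# Toy data (assumed): feature 1 is on a much larger scale than feature 2.
DATA = [
    (100, 0.50), (110, 0.52), (120, 0.48), (130, 0.50),
    (200, 0.50),  # far from the rest along feature 1
    (115, 1.50),  # far from the rest along feature 2
]


def min_max(column):
    """Rescale a column to [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def z_score(column):
    """Center a column and divide by its population standard deviation."""
    mu, sd = statistics.mean(column), statistics.pstdev(column)
    return [(v - mu) / sd if sd else 0.0 for v in column]


def normalize(points, scheme):
    """Apply a per-column normalization scheme to a list of points."""
    columns = [scheme(list(col)) for col in zip(*points)]
    return [tuple(row) for row in zip(*columns)]


def knn_scores(points, k=1):
    """Outlier score = Euclidean distance to the k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outlier(points, k=1):
    """Index of the highest-scoring point."""
    scores = knn_scores(points, k)
    return max(range(len(points)), key=scores.__getitem__)


print(top_outlier(DATA))                      # 4
print(top_outlier(normalize(DATA, min_max)))  # 5
print(top_outlier(normalize(DATA, z_score)))  # 5
```

On the raw data the first feature's scale dominates, so the point that is far along that axis ranks first; after either normalization, the point that deviates in the second feature ranks first instead.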

A Learning-Based Ensemble Algorithm with Optimal Selection for Outlier Detection

by Lade, Srinivasa Chakravarthi , Ginni, Girish Reddy

2025

In this paper, we propose a Learning-based Ensemble Method with Optimal selection strategy (LbEM-OSS), a new outlier detection algorithm that retains only the best-performing constituent models. Using KNN to define local regions and Pearson correlation to evaluate the detectors makes the ensemble robust. By generating pseudo-ground truths with average and maximum aggregation strategies, our method adapts and generalizes better across different high-dimensional datasets. On a wide range of benchmark datasets, LbEM-OSS outperformed both statistics-based and neural ensemble methods, achieving a state-of-the-art ROC-AUC of up to 97.78% in the best case and average AUC improvements of 4-8% over existing methods. These results demonstrate its potential for handling noise, varying dimensionality, and heterogeneous data. Moreover, it is highly scalable and accurate, which makes it well suited to practical applications such as fraud detection, network security, and healthcare. This research highlights the need for dynamic selection approaches within ensemble methods and provides groundwork for future developments in sound outlier detection.

Journal Article
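
The abstract names three ingredients: KNN-defined local regions, Pearson correlation to rate detectors, and a pseudo-ground truth built by average or maximum aggregation. A minimal sketch of how such dynamic selection could fit together is below. It assumes the base detector scores are already computed; the function names, the standardization step, and the rule of picking one best detector per point are illustrative assumptions, not the published LbEM-OSS procedure.

```python
import math
import statistics


def standardize(scores):
    """Z-score a detector's scores so detectors become comparable."""
    mu, sd = statistics.mean(scores), statistics.pstdev(scores)
    return [(s - mu) / sd if sd else 0.0 for s in scores]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def pseudo_target(std_scores, how="average"):
    """Aggregate standardized detector scores into a pseudo-ground truth."""
    per_point = list(zip(*std_scores))  # one tuple of detector scores per point
    if how == "average":
        return [statistics.mean(col) for col in per_point]
    if how == "maximum":
        return [max(col) for col in per_point]
    raise ValueError(f"unknown aggregation: {how}")


def select_and_score(points, raw_scores, k=3, how="average"):
    """For each point, pick the detector that best tracks the pseudo-target
    on the point's k nearest neighbors and report that detector's score."""
    std = [standardize(s) for s in raw_scores]
    target = pseudo_target(std, how)
    chosen, combined = [], []
    for i, p in enumerate(points):
        # Local region: indices of the k points closest to p (p itself included).
        region = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))[:k]
        corr = [pearson([d[j] for j in region], [target[j] for j in region]) for d in std]
        best = max(range(len(std)), key=corr.__getitem__)
        chosen.append(best)
        combined.append(std[best][i])
    return chosen, combined


points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
detectors = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]  # third one disagrees
chosen, combined = select_and_score(points, detectors)
print(chosen)  # [0, 0, 0, 0, 0]
```

Because the pseudo-target is derived from the detectors themselves, a detector that contradicts the majority correlates negatively with it in every local region and is never selected in this toy case.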

An experimental study of existing tools for outlier detection and cleaning in trajectories

by Sakr, Mahmoud , Garcez Duarte, Mariana M in Accuracy , Algorithms , Cleaning

2025

Outlier detection and cleaning are essential steps in data preprocessing to ensure the integrity and validity of data analyses. This paper focuses on outlier points within individual trajectories, i.e., points that deviate significantly within a single trajectory. We experiment with ten open-source libraries to comprehensively evaluate available tools, comparing their efficiency and accuracy in identifying and cleaning outliers. The experiment evaluates the libraries as they are offered to end users, reflecting real-world use. We compare existing outlier detection libraries, introduce a method for establishing ground truth, and aim to guide users in choosing the most appropriate tool for their specific outlier detection needs. Furthermore, we survey the state-of-the-art algorithms for outlier detection and classify them into five types: Statistic-based methods, Sliding window algorithms, Clustering-based methods, Graph-based methods, and Heuristic-based methods. Our research provides insights into these libraries’ performance and contributes to the development of data preprocessing and outlier detection methodologies.

Journal Article

Subspace histograms for outlier detection in linear time

by Sathe, Saket , Aggarwal, Charu C in Algorithms , Complexity , Data analysis

2018

Outlier detection algorithms are often computationally intensive because they need to score each point in the data. Even simple distance-based algorithms have quadratic complexity. High-dimensional outlier detection algorithms such as subspace methods are often even more computationally intensive because they need to explore different subspaces of the data. In this paper, we propose an exceedingly simple subspace outlier detection algorithm that can be implemented in a few lines of code, whose time complexity is linear in the size of the data set, and whose space requirement is constant. We show that this outlier detection algorithm is much faster than both conventional and high-dimensional algorithms and also provides more accurate results. The approach uses randomized hashing to score data points and has a neat subspace interpretation. We provide a visual representation of this interpretability in terms of outlier sensitivity histograms. Furthermore, the approach can easily be generalized to data streams, where it provides an efficient way to discover outliers in real time. We present experimental results showing the effectiveness of the approach over other state-of-the-art methods.

Journal Article
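
A minimal sketch of randomized subspace hashing for outlier scoring, in the spirit of the approach described above but simplified and not the authors' exact algorithm. Each ensemble component draws a random grid width f, a random subset of dimensions, and random shifts, then counts how many points fall in each grid cell; points in sparsely populated cells receive low scores. The parameter ranges, the number of components, and the use of the full data set instead of a subsample are assumptions, and the exact Counter used here needs memory proportional to n rather than the constant space the paper reports.

```python
import math
import random
from collections import Counter


def min_max_rows(points):
    """Rescale every column to [0, 1] (constant columns are divided by 1)."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(p, lows, spans)] for p in points]


def subspace_hash_scores(points, components=50, seed=0):
    """Average log2 cell count over random shifted grids in random subspaces.
    Lower scores mean sparser cells, i.e. more outlying points."""
    rng = random.Random(seed)
    data = min_max_rows(points)
    n, d = len(data), len(data[0])
    total = [0.0] * n
    for _ in range(components):
        # Assumed ranges: grid width f in (1/sqrt(n), 1 - 1/sqrt(n)),
        # a subspace of 1..d random dimensions, and a shift in [0, f).
        f = rng.uniform(1 / math.sqrt(n), 1 - 1 / math.sqrt(n))
        dims = rng.sample(range(d), rng.randint(1, d))
        shift = {j: rng.uniform(0, f) for j in dims}
        # One pass: map each point to its grid cell in the chosen subspace.
        cells = [tuple(math.floor((row[j] + shift[j]) / f) for j in dims) for row in data]
        counts = Counter(cells)
        for i, cell in enumerate(cells):
            total[i] += math.log2(counts[cell])
    return [t / components for t in total]


cluster = [(0.9 + 0.1 * i / 6, 0.9 + 0.1 * j / 6) for i in range(7) for j in range(7)]
scores = subspace_hash_scores(cluster + [(0.0, 0.0)])
print(scores.index(min(scores)))  # 49, the isolated point
```

Each component makes a single pass over the data, so for a fixed number of components the running time grows linearly with n. The sketch assumes n is comfortably larger than 4, since the interval for f collapses at n = 4.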

Inductively Coupled Plasma Mass Spectrometry Performance for the Measurement of Key Serum Minerals: A Comparative Study With Standard Quantification Methods

by Nishiyama, Hiroyuki , Tanaka, Takazo , Shimizu, Takuya in Adult , Aged , analytical performance

2025

Background: Inductively coupled plasma mass spectrometry (ICP‐MS) is widely used for the accurate measurement of minerals. However, its application to serum essential mineral measurement has not been fully evaluated. The present study aimed to assess the performance of ICP‐MS for serum minerals by comparing its measurements to those obtained using standard quantification methods.

Methods: Cross‐sectional data were collected from 282 participants from a single facility in Japan. Serum concentrations of eight key minerals, namely sodium, potassium, calcium, phosphorus, magnesium, iron, zinc, and copper, measured via ICP‐MS and standard methods were compared using Passing–Bablok regression and Bland–Altman plots.

Results: All minerals, except phosphorus, exhibited good agreement with standard methods, with more stable regression coefficients observed for minerals with greater interindividual variability. After systematically filtering outliers, the mean relative errors were approximately −3% for sodium, potassium, calcium, and magnesium; +5% for iron; 0% for zinc; and −19% for copper. The outliers for iron were primarily due to mild hemolysis, whereas those for zinc were largely attributed to nonhemolysis factors. For phosphorus, the serum total phosphorus concentration measured using ICP‐MS was approximately 3.5 times higher than the serum inorganic phosphorus concentration measured using standard methods, with a weak correlation observed between the two methods.

Conclusion: This study provides a practical foundation for future research. Understanding ICP‐MS characteristics will facilitate the development of new approaches in clinical diagnostics. Analysis of real‐world cross‐sectional study data revealed relative errors between the standard method and ICP‐MS for the different minerals tested. Additionally, several outliers were observed exclusively in the ICP‐MS results, likely due to hemolysis or other unidentified factors. Although these limitations in ICP‐MS performance cannot be entirely dismissed, comparative results of the standard method and the ICP‐MS approach exhibited good agreement. The unique characteristics of ICP‐MS identified in this study lay a strong foundation for future research, not only for routine clinical measurement of specific minerals but also for disease‐specific analyses leveraging the ability of ICP‐MS to simultaneously measure a wide range of parameters.

Journal Article
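
The agreement analysis above relies on Bland–Altman plots. The sketch below computes the quantities such a plot displays: the mean of the paired differences (bias) and the 95% limits of agreement at bias ± 1.96 standard deviations of those differences. It assumes paired measurements of the same samples by two methods; Passing–Bablok regression is not shown, and the function name and toy values are illustrative.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Bias and limits of agreement for paired measurements of the same samples.

    Returns per-sample means (x-axis of the plot), differences (y-axis),
    the mean difference (bias), the limits bias +/- z * SD, and the indices
    of samples whose difference falls outside those limits.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two paired samples of equal length (at least 2)")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    lower, upper = bias - z * sd, bias + z * sd
    outside = [i for i, d in enumerate(diffs) if not lower <= d <= upper]
    return {"means": means, "diffs": diffs, "bias": bias,
            "lower": lower, "upper": upper, "outside": outside}


r = bland_altman([10, 12, 14, 16], [9, 12, 13, 15])
print(r["bias"], round(r["lower"], 2), round(r["upper"], 2))  # 0.75 -0.23 1.73
```

Samples whose difference lies outside the limits are natural candidates for closer inspection; the study's actual outlier-filtering criteria are not described in the abstract.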

The influence of feature grouping algorithm in outlier detection with categorical data

by Veerabahu, Vidhya , Viswanathan, Rajalakshmi , Nathaniel, Sharon Femi Paul Sunder in Algorithms , Anomalies , Data analysis

2024

Outlier mining has developed rapidly in recent years and is increasingly important in fields such as banking, sensor networks, and health care. In general, anomaly detection methods are designed for numerical data and ignore categorical data. However, in real-world problems, both numerical and categorical data must be considered to obtain accurate results. Several methods are available for outlier detection in high-dimensional numerical data. In this paper, a feature grouping algorithm for anomaly detection is proposed that also considers categorical data. The algorithm correlates the features of categorical data, forms feature clusters, and detects the outliers. Features are assigned weights based on their levels of appearance, and outlier scores are then determined. The performance of the feature grouping algorithm is compared with traditional algorithms such as LOF and Isolation Forest and with state-of-the-art methods such as WATCH on UCI datasets. The experimental results show that the proposed algorithm performs better than the existing algorithms on categorical data.

Journal Article
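
The abstract does not give the feature grouping algorithm's scoring formula. For orientation, the sketch below implements a different, well-known frequency-based technique for categorical outliers, Attribute Value Frequency (AVF): each record is scored by how often its category values occur in the data, so records with rare values get low scores. The optional per-feature weights loosely echo the idea of feature weights above, but the weighting scheme and the function name are assumptions.

```python
from collections import Counter


def avf_scores(rows, weights=None):
    """Attribute Value Frequency: weighted mean frequency of each record's
    category values. Lower scores mean rarer values, i.e. likelier outliers."""
    n_features = len(rows[0])
    weights = weights or [1.0] * n_features  # hypothetical: equal weights by default
    # Frequency table of observed values, one table per feature.
    freqs = [Counter(row[j] for row in rows) for j in range(n_features)]
    total_weight = sum(weights)
    return [
        sum(w * freqs[j][row[j]] for j, w in enumerate(weights)) / total_weight
        for row in rows
    ]


rows = [("red", "small"), ("red", "small"), ("red", "small"), ("blue", "large")]
print(avf_scores(rows))  # [3.0, 3.0, 3.0, 1.0]
```

AVF treats features independently; the paper's algorithm instead groups correlated categorical features before scoring, which this sketch does not attempt.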

First-arrival automatic picking based on improved energy ratio method and outlier detection theory

by Cui, Qinghui , Gou, Qiyong , Hu, Linhui in Accuracy , Algorithms , Arrivals

2021

Based on the energy ratio method, an automatic picking method with strong noise resistance is proposed. It takes into account the influence of the current point's position on the first-arrival characteristic value. Specifically, an outlier detection technique is proposed to eliminate abnormal first arrivals in low signal-to-noise ratio (SNR) seismic data. First, the first arrivals of adjacent shots obtained by the new method are sorted by offset. Then, based on the distribution characteristics of the first arrivals, a symmetric window centered on the current point is defined as the calculation range, and a distance-based outlier detection method is applied to identify abnormal first arrivals. The size of this time window is determined by scanning a given range of values. To further refine the results, we propose an additional outlier detection method based on grid density, which eliminates the remaining abnormal first arrivals. With these steps, the abnormal first arrivals of all shots can be removed effectively. Results on real data show that the proposed method picks first arrivals accurately and performs well in detecting abnormal first arrivals.

Journal Article
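
A minimal sketch of distance-based outlier detection inside a symmetric window, assuming first-arrival times (in ms) have already been picked and sorted by offset. A pick is flagged when fewer than `min_neighbors` other picks in the window lie within `radius` of it. The function and parameter names and values are hypothetical; the paper's window-size scan and grid-density step are not shown.

```python
def window_outliers(times, half_width=2, radius=5.0, min_neighbors=1):
    """Flag picks with too few close neighbors inside a symmetric window.

    times: first-arrival times sorted by offset.
    half_width: number of picks on each side of the current pick.
    radius: maximum time difference for two picks to count as neighbors.
    """
    flagged = []
    for i, t in enumerate(times):
        lo, hi = max(0, i - half_width), min(len(times), i + half_width + 1)
        neighbors = sum(1 for j in range(lo, hi) if j != i and abs(times[j] - t) <= radius)
        if neighbors < min_neighbors:
            flagged.append(i)
    return flagged


print(window_outliers([100, 102, 104, 106, 150, 110, 112, 114]))  # [4]
```

Because first arrivals drift with offset, a fixed time radius only works when the window is short enough that the moveout within it stays well below `radius`; that is an assumption of this sketch.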

Improving the accuracy of a remotely-sensed flood warning system using a multi-objective pre-processing method for signal defects detection and elimination

by Gharabaghi, Bahram , Bonakdari, Hossein , Soltani, Keyvan in Cloud cover , Data collection , Decision tree classification

2020

One of the primary goals of watershed management is to proactively monitor and forecast flood water levels to provide early warning for timely evacuation plans and save lives. One of the most economical ways to accomplish this objective is to use remotely-sensed satellite signals. Previous studies have indicated that an Advanced Microwave Scanning Radiometer (AMSR) sensor can be used for river water level monitoring in combination with a few in situ hydrometric gauges for ground-truth data collection. However, space-based signals are influenced by many error-inducing natural factors, such as dust and cloud cover. Hence, a hybrid method is proposed that comprises a multi-objective particle swarm optimization model, a decision tree classification algorithm, Hotelling’s T² outlier detection, and a regression model to identify and replace inaccurate space-based signals. In this study, this hybrid method is referred to by the acronym OCOR. In the first phase, outlier signals are detected and eliminated from the dataset; in the second phase, the eliminated signals, along with signals lost due to satellite technical problems, are estimated by calibration against ground-truth data from in situ hydrometric stations. Two case studies, on the White and Willamette Rivers, demonstrate the performance of OCOR in practical situations.

Journal Article
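
Hotelling’s T² statistic, one stage of the OCOR pipeline, measures how far each observation lies from the sample mean in units of the sample covariance: T² = (x − x̄)ᵀ S⁻¹ (x − x̄). The dependency-free sketch below computes it with a small Gauss–Jordan matrix inverse. Choosing a control limit for flagging outliers (for example from an F distribution) is left out, and the toy data and function names are illustrative.

```python
import statistics


def covariance(rows, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    d = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def hotelling_t2(rows):
    """T^2 of each row against the sample mean and covariance of all rows."""
    mu = [statistics.mean(col) for col in zip(*rows)]
    s_inv = invert(covariance(rows, mu))
    d = len(mu)
    scores = []
    for r in rows:
        dev = [x - m for x, m in zip(r, mu)]
        scores.append(sum(dev[a] * s_inv[a][b] * dev[b] for a in range(d) for b in range(d)))
    return scores


rows = [(1, 2), (2, 3), (3, 5), (4, 4), (10, 1)]
t2 = hotelling_t2(rows)
print(t2.index(max(t2)))   # 4
print(round(sum(t2), 6))   # 8.0
```

A useful sanity check: with the sample covariance, the T² values of the n observations always sum to (n − 1) × d, where d is the number of features (here 4 × 2 = 8).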

A survey on outlier explanations

by Gruenwald, Le , Silvia, Shejuti , Panjei, Egawati in Algorithms , Artificial intelligence , Investigations

2022

While many techniques for outlier detection have been proposed in the literature, the interpretation of detected outliers is often left to users. As a result, it is difficult for users to promptly take appropriate actions concerning the detected outliers. To lessen this difficulty, when outliers are identified, they should be presented together with their explanations. There are survey papers on outlier detection, but none exists for outlier explanations. To fill this gap, in this paper, we present a survey on outlier explanations in which meaningful knowledge is mined from anomalous data to explain them. We define different types of outlier explanations and discuss the challenges in generating each type. We review the existing outlier explanation techniques and discuss how they address the challenges. We also discuss the applications of outlier explanations and review the existing methods used to evaluate outlier explanations. Furthermore, we discuss possible future research directions.

Journal Article

A Novel Framework to Detect Irrelevant Software Requirements Based on MultiPhiLDA as the Topic Model

by Darnoto, Brian Rizqi Paradisiaca , Siahaan, Daniel in Actors , Actresses , Analysis

2022

Noise in requirements is a known defect in software requirements specifications (SRS). Detecting defects at an early stage is crucial in software development. Noise can take the form of irrelevant requirements included within an SRS. A previous study attempted to detect noise in SRS by treating noise as an outlier. However, the resulting method demonstrated only moderate reliability because unique actor words were overshadowed by unique action words in the topic–word distribution. In this study, we propose a framework to identify irrelevant requirements based on the MultiPhiLDA method. The proposed framework models actor words and action words as two separate topic–word distributions with two multinomial probability functions. Weights are used to maintain a proportional contribution of actor and action words. We also explore two outlier detection methods, percentile-based outlier detection (PBOD) and angle-based outlier detection (ABOD), to distinguish irrelevant requirements from relevant ones. The experimental results show that the proposed framework performs better than previous methods. Furthermore, the combination of ABOD for outlier detection and topic coherence for estimating the optimal number of topics and iterations outperformed the other combinations, achieving sensitivity, specificity, F1-score, and G-mean values of 0.59, 0.65, 0.62, and 0.62, respectively.

Journal Article
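
A minimal sketch of an angle-based outlier factor, assuming numeric feature vectors (in the framework above these would come from the topic model). For each point A it collects ⟨AB, AC⟩ / (|AB|² · |AC|²) over all pairs of other points B and C and takes the variance; a point far outside the data sees all others within a narrow range of directions and therefore gets a low variance. This is a simplified, unweighted variant with cubic running time; the standard ABOD formulation weights each pair by distance, and the paper's exact configuration is not given in the abstract.

```python
import itertools
import statistics


def abof(points):
    """Unweighted angle-based outlier factor; lower means more outlying."""
    scores = []
    for i, a in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        values = []
        for b, c in itertools.combinations(others, 2):
            ab = [x - y for x, y in zip(b, a)]
            ac = [x - y for x, y in zip(c, a)]
            nab = sum(v * v for v in ab)  # squared length of AB
            nac = sum(v * v for v in ac)  # squared length of AC
            if nab == 0 or nac == 0:
                continue  # skip duplicates of A
            values.append(sum(x * y for x, y in zip(ab, ac)) / (nab * nac))
        scores.append(statistics.pvariance(values) if len(values) > 1 else 0.0)
    return scores


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
s = abof(pts)
print(s.index(min(s)))  # 5, the far-away point
```

Because every pair of other points is examined for each point, this sketch is only practical for small datasets.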
