Catalogue Search | MBRL
1,807 result(s) for "outlier detection"
On normalization and algorithm selection for unsupervised outlier detection
by Muñoz, Mario A; Hyndman, Rob J; Kandanaarachchi, Sevvandi
in Algorithms, Data analysis, Datasets
2020
This paper demonstrates that the performance of various outlier detection methods is sensitive both to the characteristics of the dataset and to the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest-neighbor structure and the density of the dataset, and hence which observations could be considered outliers. We then perform an instance space analysis of combinations of normalization and detection methods. This analysis enables the visualization of the strengths and weaknesses of these combinations. Moreover, we gain insights into which method combination might obtain the best performance for a given dataset.
Journal Article
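The claim that normalization changes the nearest-neighbor structure can be seen on a toy example. The sketch below scores each point by its distance to its k-th nearest neighbor. This simple distance-based detector is our choice for illustration; the paper evaluates several detectors. The sketch compares raw, min-max and z-score scaled data. The six points are made up: point 4 lies far out along the wide first feature, and point 5 lies far out along the narrow second feature. The helper names are hypothetical.

```python
import math
import statistics

def min_max(data):
    """Scale every feature to [0, 1]; constant features become 0."""
    cols = list(zip(*data))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
            for row in data]

def z_score(data):
    """Center every feature and divide by its population standard deviation."""
    cols = list(zip(*data))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mu, sd)]
            for row in data]

def knn_scores(data, k=1):
    """Outlier score = Euclidean distance to the k-th nearest other point."""
    scores = []
    for i, p in enumerate(data):
        dists = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    return scores

def top_outlier(scores):
    """Index of the highest-scoring point."""
    return max(range(len(scores)), key=scores.__getitem__)

data = [[0, 0.0], [10, 0.0], [20, 0.0], [30, 0.0], [60, 0.0], [20, 1.0]]
for name, scaled in [("raw", data), ("min-max", min_max(data)), ("z-score", z_score(data))]:
    print(name, top_outlier(knn_scores(scaled)))
# raw 4
# min-max 5
# z-score 5
```

On the raw data the wide first feature dominates every distance, so point 4 is ranked the top outlier. After either rescaling, the one-unit gap of point 5 on the second feature becomes the largest relative deviation.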
A Learning-Based Ensemble Algorithm with Optimal Selection for Outlier Detection
In this paper, we propose a Learning-based Ensemble Method with Optimal selection strategy (LbEM-OSS), a new outlier detection algorithm that draws only on the outstanding constituent models. Using KNN to define local regions and Pearson correlation to evaluate the detectors makes the ensemble robust. By generating pseudo-ground truths with average and maximum aggregation strategies, our method adapts and generalizes better across different high-dimensional datasets. On a wide range of benchmark datasets, LbEM-OSS outperformed both statistics-based and neural ensemble methods, achieving a state-of-the-art ROC-AUC of up to 97.78% in the best case and AUC improvements of 4-8% over existing methods on average. These results indicate its robustness to noise, varying dimensionality, and heterogeneous data. Moreover, it is highly scalable and accurate, which makes it well suited to practical fields such as fraud detection, network security, and healthcare. This research highlights the need for dynamic selection approaches within ensemble methods and lays the groundwork for future developments in sound outlier detection.
Journal Article
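The abstract names three ingredients: pseudo-ground truth built by average or maximum aggregation, KNN-defined local regions, and Pearson correlation to rate detectors. The sketch below combines them in one plausible way, under our own assumptions. It standardizes each detector's training scores and aggregates them into a pseudo-ground truth. It then finds the query's k nearest training points and picks the detector whose scores correlate best with that pseudo-truth inside the region. The function `select_local_detector`, its parameters, and the z-score standardization are hypothetical. They are not the paper's published implementation.

```python
import math
import statistics

def zscore(scores):
    """Standardize one detector's scores so detectors are comparable."""
    mu, sd = statistics.fmean(scores), statistics.pstdev(scores)
    return [(s - mu) / sd if sd > 0 else 0.0 for s in scores]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def select_local_detector(train_X, train_scores, query, k=5, aggregation="average"):
    """Return (index of the locally best detector, per-detector correlations).

    train_X: training points; train_scores: one score list per detector.
    """
    z = [zscore(s) for s in train_scores]
    per_point = list(zip(*z))  # standardized scores grouped by training point
    if aggregation == "average":
        pseudo = [statistics.fmean(p) for p in per_point]
    elif aggregation == "max":
        pseudo = [max(p) for p in per_point]
    else:
        raise ValueError("aggregation must be 'average' or 'max'")
    # Local region: indices of the k training points closest to the query.
    region = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    local_truth = [pseudo[i] for i in region]
    corrs = [pearson([zd[i] for i in region], local_truth) for zd in z]
    best = max(range(len(z)), key=corrs.__getitem__)
    return best, corrs

# Toy usage: two detectors agree, a third is reversed.
train_X = [[float(i)] for i in range(10)]
up = [float(i) for i in range(10)]
best, corrs = select_local_detector(train_X, [up, up, up[::-1]], [4.5], k=4)
print(best, [round(c, 3) for c in corrs])
```

The selected detector would then score the query point. In this design, each query gets its own choice of detector rather than a fixed global weighting.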
An experimental study of existing tools for outlier detection and cleaning in trajectories
2025
Outlier detection and cleaning are essential steps in data preprocessing to ensure the integrity and validity of data analyses. This paper focuses on outlier points within individual trajectories, i.e., points that deviate significantly from the rest of a single trajectory. We experiment with ten open-source libraries to comprehensively evaluate the available tools, comparing their efficiency and accuracy in identifying and cleaning outliers. The experiment considers the libraries as they are offered to end users, reflecting real-world applicability. We compare existing outlier detection libraries, introduce a method for establishing ground truth, and aim to guide users in choosing the most appropriate tool for their specific outlier detection needs. Furthermore, we survey the state-of-the-art algorithms for outlier detection and classify them into five types: Statistics-based methods, Sliding window algorithms, Clustering-based methods, Graph-based methods, and Heuristic-based methods. Our research provides insights into these libraries’ performance and contributes to developing data preprocessing and outlier detection methodologies.
Journal Article
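Of the five algorithm families listed above, sliding-window algorithms are the simplest to sketch. The hypothetical `clean_trajectory` below compares each point with the median position of its neighbors inside a window. It flags the point when the gap exceeds a distance threshold and replaces it with that median. The `half_window` and `threshold` values are assumptions that would need tuning per dataset. This is not the implementation of any of the ten libraries the paper evaluates.

```python
import math
import statistics

def clean_trajectory(points, half_window=2, threshold=5.0):
    """Flag points far from the median of their window; replace them with it."""
    flagged, cleaned = [], list(points)
    for i, (x, y) in enumerate(points):
        lo, hi = max(0, i - half_window), min(len(points), i + half_window + 1)
        window = [points[j] for j in range(lo, hi) if j != i]  # neighbors only
        if not window:
            continue
        mx = statistics.median(p[0] for p in window)
        my = statistics.median(p[1] for p in window)
        if math.hypot(x - mx, y - my) > threshold:
            flagged.append(i)
            cleaned[i] = (mx, my)  # cleaning step: snap to the window median
    return flagged, cleaned

track = [(float(i), 0.0) for i in range(10)]
track[5] = (5.0, 50.0)  # a GPS-style spike
flagged, cleaned = clean_trajectory(track)
print(flagged, cleaned[5])  # [5] (5.0, 0.0)
```

The medians are computed from the original points rather than the cleaned ones, so one spike cannot cascade into its neighbors' corrections.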
Subspace histograms for outlier detection in linear time
2018
Outlier detection algorithms are often computationally intensive because they need to score each point in the data. Even simple distance-based algorithms have quadratic complexity. High-dimensional outlier detection algorithms such as subspace methods are often even more computationally intensive because they need to explore different subspaces of the data. In this paper, we propose an exceedingly simple subspace outlier detection algorithm that can be implemented in a few lines of code, whose time complexity is linear in the size of the data set, and whose space requirement is constant. We show that this outlier detection algorithm is much faster than both conventional and high-dimensional algorithms and also provides more accurate results. The approach uses randomized hashing to score data points and has a neat subspace interpretation. We provide a visual representation of this interpretability in terms of outlier sensitivity histograms. Furthermore, the approach can easily be generalized to data streams, where it provides an efficient way to discover outliers in real time. We present experimental results showing the effectiveness of the approach over other state-of-the-art methods.
Journal Article
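The abstract describes scoring points by randomized hashing over subspaces, in linear time with a fixed-size count structure. The sketch below is our simplified reading of that idea and not the paper's exact algorithm. The grid-width range, the subspace sampling rule, and the function `subspace_hash_scores` are all assumptions. Each component picks a random subset of dimensions and overlays a randomly shifted grid on the min-max normalized data. It then counts points per cell in a hash table whose size does not depend on the number of points. A point's score is its average log cell count, so points in sparsely populated cells get low scores. The whole pass is linear in the number of points.

```python
import math
import random

def subspace_hash_scores(data, n_components=100, table_size=10007, seed=0):
    """Average log2(1 + cell count) over randomized subspace grids.

    Lower score = sparser cell = more outlying.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Min-max normalize every dimension to [0, 1]; constant columns map to 0.
    lo = [min(row[j] for row in data) for j in range(d)]
    hi = [max(row[j] for row in data) for j in range(d)]
    norm = [[(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(d)] for row in data]
    eps = 1.0 / math.sqrt(n)  # keeps the random grid width away from 0 and 1 (assumed range)
    scores = [0.0] * n
    for _ in range(n_components):
        width = rng.uniform(eps, 1.0 - eps)                 # grid cell width
        dims = rng.sample(range(d), rng.randint(1, d))      # random subspace
        shift = {j: rng.uniform(0.0, width) for j in dims}  # random grid offset

        def bucket(x):
            cell = tuple(math.floor((x[j] + shift[j]) / width) for j in dims)
            return hash(cell) % table_size  # fixed-size table; collisions possible

        table = [0] * table_size
        for x in norm:                  # pass 1: histogram of occupied cells
            table[bucket(x)] += 1
        for i, x in enumerate(norm):    # pass 2: score each point by its cell count
            scores[i] += math.log2(1 + table[bucket(x)])
    return [s / n_components for s in scores]

rng = random.Random(1)
points = [[rng.gauss(10, 0.3), rng.gauss(10, 0.3)] for _ in range(200)] + [[0.0, 0.0]]
scores = subspace_hash_scores(points)
print(min(range(len(points)), key=scores.__getitem__))  # index of the sparsest point
```

Each component touches every point twice, and `table_size` does not grow with the data. This mirrors the linear-time, bounded-space property claimed in the abstract, although the normalized copy of the data in this sketch is not constant-space.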
Inductively Coupled Plasma Mass Spectrometry Performance for the Measurement of Key Serum Minerals: A Comparative Study With Standard Quantification Methods
2025
Background: Inductively coupled plasma mass spectrometry (ICP‐MS) is widely used for the accurate measurement of minerals. However, its application to serum essential mineral measurement has not been fully evaluated. The present study aimed to assess the performance of ICP‐MS for serum minerals by comparing its measurements to those obtained using standard quantification methods.
Methods: Cross‐sectional data were collected from 282 participants from a single facility in Japan. Serum concentrations of eight key minerals, namely sodium, potassium, calcium, phosphorus, magnesium, iron, zinc, and copper, measured via ICP‐MS and standard methods were compared using Passing–Bablok regression and Bland–Altman plots.
Results: All minerals except phosphorus exhibited good agreement with standard methods, with more stable regression coefficients observed for minerals with greater interindividual variability. After systematically filtering outliers, the mean relative errors were approximately −3% for sodium, potassium, calcium, and magnesium; +5% for iron; 0% for zinc; and −19% for copper. The outliers for iron were primarily due to mild hemolysis, whereas those for zinc were largely attributed to nonhemolysis factors. For phosphorus, the serum total phosphorus concentration measured using ICP‐MS was approximately 3.5 times higher than the serum inorganic phosphorus concentration measured using standard methods, with a weak correlation observed between the two methods.
Conclusion: This study provides a practical foundation for future research. Understanding ICP‐MS characteristics will facilitate the development of new approaches in clinical diagnostics. Analysis of real‐world cross‐sectional study data revealed relative errors between the standard method and ICP‐MS for the different minerals tested. Additionally, several outliers were observed exclusively in the ICP‐MS results, likely due to hemolysis or other unidentified factors. Although these limitations in ICP‐MS performance cannot be entirely dismissed, comparative results of the standard method and the ICP‐MS approach exhibited good agreement. The unique characteristics of ICP‐MS identified in this study lay a strong foundation for future research, not only for routine clinical measurement of specific minerals but also for disease‐specific analyses leveraging the ability of ICP‐MS to simultaneously measure a wide range of parameters.
Journal Article
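The study compares the two measurement methods with Bland–Altman plots and reports mean relative errors. A Bland–Altman analysis reduces to two quantities. The first is the bias, the mean of the paired differences. The second is the 95% limits of agreement, bias ± 1.96 × SD of the differences. The sketch below computes those quantities, plus a mean relative error. The function names, the made-up numbers, and the definition of relative error as (test − reference) / reference are assumptions, not necessarily the study's definitions. Passing–Bablok regression is not shown.

```python
import statistics

def bland_altman(method_a, method_b):
    """Bias (mean of a - b) and the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def mean_relative_error(reference, test):
    """Mean of (test - reference) / reference; reference values must be non-zero."""
    return statistics.fmean((t - r) / r for r, t in zip(reference, test))

standard = [10.0, 20.0, 30.0, 40.0]  # made-up paired measurements
icp_ms = [11.0, 19.0, 32.0, 41.0]
bias, lower, upper = bland_altman(standard, icp_ms)
print(bias, lower, upper)
print(mean_relative_error(standard, icp_ms))
```

Points falling outside the limits of agreement are the natural candidates for the kind of systematic outlier filtering the abstract mentions before relative errors are summarized.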
The influence of feature grouping algorithm in outlier detection with categorical data
by Veerabahu, Vidhya; Viswanathan, Rajalakshmi; Nathaniel, Sharon Femi Paul Sunder
in Algorithms, Anomalies, Data analysis
2024
Outlier mining has become a rapidly developing domain in recent years, with increasing importance in fields such as banking, sensor networks, and health care. In general, anomaly detection methods are compatible with numerical data and ignore categorical data. However, in real-world problems, both numerical and categorical data must be considered to obtain accurate results. Several methods are available for outlier detection in high-dimensional numerical data. In this paper, a feature grouping algorithm for anomaly detection is proposed that also considers categorical data. The algorithm correlates the features of categorical data, forms feature clusters, and detects the outliers. Features are assigned weights based on their levels of appearance, and the outlier scores are then determined. The performance of the feature grouping algorithm is compared with traditional algorithms such as LOF and Isolation Forest and with state-of-the-art methods such as WATCH on UCI datasets. The experimental evaluation shows that the proposed algorithm performs better than the existing algorithms for categorical data.
Journal Article
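The abstract does not spell out the feature-grouping algorithm, so the sketch below does not attempt to reproduce it. It illustrates only the underlying idea of scoring categorical values by how often they appear. To do so, it swaps in a simpler, well-known frequency baseline, Attribute Value Frequency (AVF), which performs no feature grouping or correlation. The function name `avf_scores` is hypothetical.

```python
from collections import Counter

def avf_scores(rows):
    """Attribute Value Frequency: for each row, the average over features of how
    often the row's value occurs in that feature's column.
    Low score = rare combination of categorical values = likely outlier."""
    n_features = len(rows[0])
    counts = [Counter(row[j] for row in rows) for j in range(n_features)]
    return [sum(counts[j][row[j]] for j in range(n_features)) / n_features
            for row in rows]

rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
print(avf_scores(rows))  # [2.5, 2.5, 2.0, 1.0]
```

The proposed algorithm differs from this baseline by first grouping correlated features and weighting them. AVF treats every feature independently and equally.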
Improving the accuracy of a remotely-sensed flood warning system using a multi-objective pre-processing method for signal defects detection and elimination
by Gharabaghi, Bahram; Bonakdari, Hossein; Soltani, Keyvan
in Cloud cover, Data collection, Decision tree classification
2020
One of the primary goals of watershed management is to proactively monitor and forecast flood water levels in order to provide early warning for timely evacuation plans and save lives. One of the most economical ways to accomplish this objective is to use remotely sensed satellite signals. Previous studies have indicated that an Advanced Microwave Scanning Radiometer (AMSR) sensor, combined with a few in-situ hydrometric gauges for ground-truth data collection, can be used to monitor river water levels. However, space-based signals are influenced by many error-inducing natural factors, such as dust and cloud cover. Hence, a hybrid method is proposed to identify and replace inaccurate space-based signals. It comprises a multi-objective particle swarm optimization model, a decision tree classification algorithm, Hotelling’s T² outlier detection, and a regression model. In this study, this hybrid method is referred to by the acronym OCOR. In the first phase, outlier signals are detected and eliminated from the dataset. In the second phase, the eliminated signals, along with signals lost due to satellite technical problems, are estimated by calibrating against ground-truth data from in-situ hydrometric stations. Two case studies, on the White and Willamette Rivers, demonstrate the performance of OCOR in practical situations.
Journal Article
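The hybrid method uses Hotelling's T² statistic to flag anomalous signals. For an observation x with sample mean x̄ and sample covariance S, T² = (x − x̄)ᵀ S⁻¹ (x − x̄). The sketch below computes T² for every row with a small Gauss–Jordan solver, so that it needs no third-party libraries. As an assumption, it uses the large-sample chi-square approximation for the cutoff. For two variables, the 99% quantile is −2 ln(0.01) ≈ 9.21. The data and helper names are made up, and this is not the study's OCOR pipeline.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def hotelling_t2(data):
    """T^2 = (x - mean)^T S^-1 (x - mean) for every row; S = sample covariance."""
    n, d = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[sum((row[p] - mu[p]) * (row[q] - mu[q]) for row in data) / (n - 1)
            for q in range(d)] for p in range(d)]
    t2 = []
    for row in data:
        diff = [x - m for x, m in zip(row, mu)]
        w = solve(cov, diff)  # w = S^-1 (x - mean)
        t2.append(sum(p * q for p, q in zip(diff, w)))
    return t2

# Large-sample cutoff (assumption): the 99% chi-square quantile with 2 degrees
# of freedom equals -2 ln(0.01), because that distribution is exponential.
CHI2_99_DF2 = -2 * math.log(0.01)

rng = random.Random(0)
signals = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)] + [[8.0, -8.0]]
scores = hotelling_t2(signals)
print([i for i, s in enumerate(scores) if s > CHI2_99_DF2])  # indices flagged at 99%
```

With roughly 1% of ordinary readings expected above a 99% cutoff, a few regular signals may be flagged alongside the injected one. Because the outlier also inflates S, a more careful pipeline would re-estimate the mean and covariance after removing flagged points.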
A Novel Framework to Detect Irrelevant Software Requirements Based on MultiPhiLDA as the Topic Model
2022
Noise in requirements is a known defect in software requirements specifications (SRS). Detecting defects at an early stage is crucial in the process of software development. Noise can take the form of irrelevant requirements included within an SRS. A previous study attempted to detect noise in SRS by treating noise as an outlier. However, the resulting method showed only moderate reliability, because unique action words overshadowed unique actor words in the topic–word distribution. In this study, we propose a framework to identify irrelevant requirements based on the MultiPhiLDA method. The proposed framework models actor words and action words with two separate topic–word distributions, using two multinomial probability functions. Weights are used to maintain a proportional contribution of actor and action words. We also explore two outlier detection methods, namely percentile-based outlier detection (PBOD) and angle-based outlier detection (ABOD), to distinguish irrelevant requirements from relevant ones. The experimental results show that the proposed framework performed better than previous methods. Furthermore, combining ABOD as the outlier detection method with topic coherence as the approach for estimating the optimal number of topics and iterations outperformed the other combinations, obtaining sensitivity, specificity, F1-score, and G-mean values of 0.59, 0.65, 0.62, and 0.62, respectively.
Journal Article
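ABOD scores a point A by how much the angles it forms with pairs of other points vary. An interior point sees other points in all directions, which gives a high variance. An outlying point sees everything within a narrow cone, which gives a low variance. The sketch below is the naive O(n³) form of the angle-based outlier factor (ABOF): the variance of ⟨AB, AC⟩ / (‖AB‖²‖AC‖²) over all pairs B, C, weighted by 1/(‖AB‖‖AC‖). It is a generic illustration on made-up 2-D points, not the study's requirement-embedding framework, and the function names are hypothetical.

```python
import itertools
import math
import random

def abof(points, i):
    """Distance-weighted variance of <AB, AC> / (|AB|^2 |AC|^2) over pairs of
    other points B, C, where A = points[i]. Low value = likely outlier."""
    a = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    values, weights = [], []
    for b, c in itertools.combinations(others, 2):
        ab = [bb - aa for aa, bb in zip(a, b)]
        ac = [cc - aa for aa, cc in zip(a, c)]
        nab = math.sqrt(sum(v * v for v in ab))
        nac = math.sqrt(sum(v * v for v in ac))
        if nab == 0 or nac == 0:
            continue  # a duplicate of A makes the angle undefined
        dot = sum(p * q for p, q in zip(ab, ac))
        values.append(dot / (nab ** 2 * nac ** 2))
        weights.append(1.0 / (nab * nac))
    total = sum(weights)
    if total == 0:
        return float("inf")  # no valid pairs: cannot judge, so treat as an inlier
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total

def abod_scores(points):
    """ABOF for every point (cubic time, suitable only for small data)."""
    return [abof(points, i) for i in range(len(points))]

rng = random.Random(0)
pts = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)] + [[20.0, 20.0]]
scores = abod_scores(pts)
print(min(range(len(pts)), key=scores.__getitem__))  # index with the lowest ABOF
```

To turn these scores into a labeling, one could flag points whose ABOF falls below a chosen percentile of all scores. That cut-off rule is our assumption, and the abstract's PBOD method may define its percentile differently.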
A Comprehensive Survey of Anomaly Detection Algorithms
2023
Anomaly (or outlier) detection is considered one of the vital applications of data mining. Anomalies are data points that differ dramatically from the rest of the data. In this survey, we comprehensively present anomaly detection algorithms in an organized manner. We begin with the definition of an anomaly and then provide essential elements of anomaly detection, such as the different types of anomalies, application domains, and evaluation measures. The algorithms, 52 in total, are grouped into seven categories based on their working mechanisms: algorithms based on statistics, density, distance, clustering, isolation, ensembles, and subspaces. For each category, we provide the time complexity of each algorithm and the general advantages and disadvantages. Finally, we compare all the discussed anomaly detection algorithms in detail.
Journal Article
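As a minimal illustration of the statistics-based category, the sketch below uses Tukey's fences. It flags values lying more than k interquartile ranges outside the first and third quartiles. This example is our choice and is not taken from the survey's list of 52 algorithms. Computing the quartiles requires sorting, so the method runs in O(n log n) time. The function name `tukey_outliers` and the conventional k = 1.5 are assumptions.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

print(tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]
```

Here Q1 = 2.75 and Q3 = 8.25, so the upper fence is 16.5 and only 100 is flagged. Quartile conventions differ between tools, so fence positions can shift slightly with another quantile method.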
A Survey of Methods for Finding Outliers in Wireless Sensor Networks
2015
Outlier detection is a well-studied problem in various fields. The unique characteristics and constraints of wireless sensor networks (WSN) make this problem especially challenging. Sensor readings can contain outliers for a plethora of reasons, and these reasons need to be inferred in real time. Here, we survey the current state of research in this area, compare the approaches, and present some future directions for smarter handling of outliers in WSN.
Journal Article
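Sensor nodes have limited memory and must react in real time, so a common pattern is a one-pass detector that keeps only running statistics. The sketch below is a hypothetical `StreamingZScoreDetector` built on Welford's online mean/variance update, not a method taken from the survey. It flags a reading whose deviation from the running mean exceeds `k` standard deviations. Flagged readings are kept out of the statistics so that a single spike does not inflate them. The threshold `k` and the `warmup` length are assumptions.

```python
import math

class StreamingZScoreDetector:
    """One-pass detector with O(1) memory per sensor stream."""

    def __init__(self, k=3.0, warmup=10):
        self.k = k                    # threshold in standard deviations (assumed)
        self.warmup = max(warmup, 2)  # readings to collect before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                 # running sum of squared deviations (Welford)

    def update(self, x):
        """Return True if x looks like an outlier; otherwise fold it into the stats."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.k * std:
                return True           # flagged readings do not update the statistics
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return False

detector = StreamingZScoreDetector()
readings = [20.0 + 0.1 * (i % 5) for i in range(30)] + [35.0, 20.2]
flags = [detector.update(r) for r in readings]
print([i for i, f in enumerate(flags) if f])  # [30]
```

Such a detector only says that a reading is unusual. Inferring why it is unusual, whether a faulty sensor or a genuine event, is the harder problem the survey highlights.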