15,653 results for "outliers"
Finding the Genomic Basis of Local Adaptation
Uncovering the genetic and evolutionary basis of local adaptation is a major focus of evolutionary biology. The recent development of cost-effective methods for obtaining high-quality genome-scale data makes it possible to identify some of the loci responsible for adaptive differences among populations. Two basic approaches for identifying putatively locally adaptive loci have been developed and are broadly used: one that identifies loci with unusually high genetic differentiation among populations (differentiation outlier methods) and one that searches for correlations between local population allele frequencies and local environments (genetic-environment association methods). Here, we review the promises and challenges of these genome scan methods, including correcting for the confounding influence of a species’ demographic history, biases caused by missing aspects of the genome, matching scales of environmental data with population structure, and other statistical considerations. In each case, we make suggestions for best practices for maximizing the accuracy and efficiency of genome scans to detect the underlying genetic basis of local adaptation. With attention to their current limitations, genome scan methods can be an important tool in finding the genetic basis of adaptive evolutionary change.
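As a rough illustration of the first approach, the differentiation-outlier scan, here is a minimal NumPy sketch: compute per-locus FST for two populations and flag loci above an empirical quantile. The simulated allele frequencies and the naive percentile cutoff are assumptions of this example, not the paper's method; the review stresses that real scans must correct for demographic history rather than use a fixed quantile.

```python
import numpy as np

rng = np.random.default_rng(0)
n_loci = 5000
# Hypothetical allele frequencies for two populations.
p1 = rng.beta(2, 2, n_loci)
p2 = np.clip(p1 + rng.normal(0, 0.05, n_loci), 0.01, 0.99)
p2[:25] = np.clip(p2[:25] + 0.4, 0.01, 0.99)  # plant strongly differentiated loci

# Per-locus Wright's FST for two demes: var(p) / (p_bar * (1 - p_bar)).
p_bar = (p1 + p2) / 2
fst = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / 2 / (p_bar * (1 - p_bar))

# Naive empirical-quantile cutoff, for illustration only.
cutoff = np.quantile(fst, 0.99)
print(f"{(fst > cutoff).sum()} loci above the 99th percentile of FST")
```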
On normalization and algorithm selection for unsupervised outlier detection
This paper demonstrates that the performance of various outlier detection methods is sensitive both to the characteristics of the dataset and to the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest-neighbor structure and density of the dataset, and hence which observations could be considered outliers. We then perform an instance space analysis of combinations of normalization and detection methods. Such analysis enables the visualization of the strengths and weaknesses of these combinations, and yields insights into which method combination might obtain the best performance for a given dataset.
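A small sketch of the core observation, that rescaling features changes which points a neighbor-based detector flags. It runs scikit-learn's LOF on made-up two-scale data; the dataset and the choice of min-max scaling are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
# Two features on very different scales (hypothetical data).
X = np.column_stack([rng.normal(0, 1, 300), rng.normal(0, 1000, 300)])

raw_labels = LocalOutlierFactor(n_neighbors=20).fit_predict(X)
norm_labels = LocalOutlierFactor(n_neighbors=20).fit_predict(
    MinMaxScaler().fit_transform(X))

# The flagged points generally differ, because scaling changes the
# nearest-neighbor structure (and hence densities) the detector relies on.
print((raw_labels != norm_labels).sum(), "points flagged differently")
```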
Reliable Detection of Loci Responsible for Local Adaptation: Inference of a Null Model through Trimming the Distribution of FST
Loci responsible for local adaptation are likely to have more genetic differentiation among populations than neutral loci. However, neutral loci can vary widely in their amount of genetic differentiation, even over the same geographic range. Unfortunately, the distribution of differentiation—as measured by an index such as FST—depends on the details of the demographic history of the populations in question, even without spatially heterogeneous selection. Many methods designed to detect FST outliers assume a specific model of demographic history, which can result in extremely high false positive rates for detecting loci under selection. We develop a new method that infers the distribution of FST for loci unlikely to be strongly affected by spatially diversifying selection, using data on a large set of loci with unknown selective properties. Compared to previous methods, this approach, called OutFLANK, has much lower false positive rates and comparable power, as shown by simulation.
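A loose sketch of the trimming idea, not the published OutFLANK implementation (which fits the null more carefully): estimate the null mean of FST from the trimmed core of the distribution, treat FST·df/F̄ST as approximately chi-square, and flag loci in the tail. The simulated FST values, the trim fraction, and df = 1 (two demes) are assumptions of this example.

```python
import numpy as np
from scipy import stats

def outflank_like(fst, df=1, trim=0.05):
    """Infer a null FST distribution from the trimmed core, then
    compute tail p-values under a rough chi-square approximation."""
    lo, hi = np.quantile(fst, [trim, 1 - trim])
    core = fst[(fst >= lo) & (fst <= hi)]
    fst_bar = core.mean()  # null mean estimated from the trimmed loci
    return stats.chi2.sf(fst * df / fst_bar, df)

rng = np.random.default_rng(2)
fst = rng.chisquare(1, 2000) * 0.05  # mostly neutral loci (simulated)
fst[:10] += 0.6                      # a few loci under putative selection
p = outflank_like(fst)
print((p < 0.001).sum(), "candidate loci at p < 0.001")
```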
Robust Covariance and Scatter Matrix Estimation Under Huber’s Contamination Model
Covariance matrix estimation is one of the most important problems in statistics. To accommodate the complexity of modern datasets, it is desired to have estimation procedures that not only can incorporate the structural assumptions of covariance matrices, but are also robust to outliers from arbitrary sources. In this paper, we define a new concept called matrix depth and then propose a robust covariance matrix estimator by maximizing the empirical depth function. The proposed estimator is shown to achieve minimax optimal rate under Huber’s ε-contamination model for estimating covariance/scatter matrices with various structures including bandedness and sparsity.
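The paper's matrix-depth estimator is not available in mainstream Python libraries, so the sketch below uses scikit-learn's Minimum Covariance Determinant as a stand-in robust estimator (a different method, named plainly here) and contrasts it with the sample covariance on ε-contaminated data. The mixture proportions and distributions are invented for illustration.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.default_rng(3)
# Huber-style eps-contamination: 90% clean Gaussian, 10% arbitrary junk.
clean = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 450)
junk = rng.uniform(-20, 20, (50, 2))
X = np.vstack([clean, junk])

# The sample covariance is inflated by the contamination;
# the robust estimate stays close to the clean structure.
print("sample covariance:\n", EmpiricalCovariance().fit(X).covariance_)
print("robust (MCD) covariance:\n", MinCovDet(random_state=0).fit(X).covariance_)
```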
A Comprehensive Survey of Anomaly Detection Algorithms
Anomaly or outlier detection is considered one of the vital applications of data mining: it deals with anomalies, data points that are dramatically different from the rest of the data. In this survey, we comprehensively present anomaly detection algorithms in an organized manner. We begin with the definition of an anomaly, then cover the essential elements of anomaly detection, such as the different types of anomaly, different application domains, and evaluation measures. The anomaly detection algorithms, 52 in total, are grouped into seven categories based on their working mechanisms: algorithms based on statistics, density, distance, clustering, isolation, ensembles, and subspaces. For each category, we give the time complexity of each algorithm and its general advantages and disadvantages. Finally, we compare all of the discussed anomaly detection algorithms in detail.
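To make the category distinction concrete, here is a minimal comparison of a density-based detector (LOF) and an isolation-based detector (Isolation Forest), two of the seven categories, on synthetic data. Everything about the data and parameters is an assumption of this sketch, not material from the survey.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(4)
# A Gaussian bulk plus a handful of scattered points (hypothetical data).
X = np.vstack([rng.normal(0, 1, (300, 2)), rng.uniform(-6, 6, (10, 2))])

# Density-based vs. isolation-based mechanisms on the same data.
lof = LocalOutlierFactor(n_neighbors=20).fit_predict(X)
iso = IsolationForest(random_state=0).fit_predict(X)
print("LOF flags:", (lof == -1).sum(),
      "| IsolationForest flags:", (iso == -1).sum())
```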
A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams
Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
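A small illustration of the global-versus-local distinction using scikit-learn's batch LOF (the streaming variants surveyed in the paper are not shown): the planted point sits within the dataset's overall range but is anomalous relative to its dense neighborhood. The data are fabricated for this sketch.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(5)
dense = rng.normal(0, 0.2, (200, 2))   # tight cluster
sparse = rng.normal(6, 1.5, (200, 2))  # diffuse cluster
# A point inside the global range of the data, but locally anomalous
# with respect to the dense cluster it sits near.
local_out = np.array([[1.0, 1.0]])
X = np.vstack([dense, sparse, local_out])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)
print("local outlier flagged:", labels[-1] == -1)
print("its LOF score:", -lof.negative_outlier_factor_[-1])
```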
Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection
Outlier detection research has been seeing many new algorithms every year that often appear to be only slightly different from existing methods along with some experiments that show them to “clearly outperform” the others. However, few approaches come along with a clear analysis of existing methods and a solid theoretical differentiation. Here, we provide a formalized method of analysis to allow for a theoretical comparison and generalization of many existing methods. Our unified view improves understanding of the shared properties and of the differences of outlier detection models. By abstracting the notion of locality from the classic distance-based notion, our framework facilitates the construction of abstract methods for many special data types that are usually handled with specialized algorithms. In particular, spatial neighborhood can be seen as a special case of locality. Here we therefore compare and generalize approaches to spatial outlier detection in a detailed manner. We also discuss temporal data like video streams, or graph data such as community networks. Since we reproduce results of specialized approaches with our general framework, and even improve upon them, our framework provides reasonable baselines to evaluate the true merits of specialized approaches. At the same time, seeing spatial outlier detection as a special case of local outlier detection, opens up new potentials for analysis and advancement of methods.
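One way to read the abstraction: make "locality" a pluggable context function. The toy sketch below scores each point by its deviation from the mean value over a caller-supplied neighborhood, instantiated here with a spatial radius; the scoring rule, data, and radius are all invented for illustration and far simpler than the paper's framework.

```python
import numpy as np

def local_deviation(values, neighbors_of):
    """Score each point by how far its value sits from the mean over a
    caller-supplied context set: locality as a pluggable parameter."""
    scores = np.zeros(len(values))
    for i in range(len(values)):
        ctx = neighbors_of(i)
        if len(ctx):
            scores[i] = abs(values[i] - values[ctx].mean())
    return scores

rng = np.random.default_rng(6)
coords = rng.uniform(0, 10, (200, 2))
values = coords[:, 0] + rng.normal(0, 0.1, 200)
values[0] += 5.0  # spatial outlier: value inconsistent with its neighborhood

def spatial_nbrs(i, radius=1.5):
    """Spatial neighborhood as one instantiation of 'locality'."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.flatnonzero((d < radius) & (d > 0))

print("top-scoring index:", local_deviation(values, spatial_nbrs).argmax())
```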
Detecting Deviating Data Cells
A multivariate dataset consists of n cases in d dimensions, and is often stored in an n by d data matrix. It is well-known that real data may contain outliers. Depending on the situation, outliers may be (a) undesirable errors, which can adversely affect the data analysis, or (b) valuable nuggets of unexpected information. In statistics and data analysis, the word outlier usually refers to a row of the data matrix, and the methods to detect such outliers only work when at least half the rows are clean. But often many rows have a few contaminated cell values, which may not be visible by looking at each variable (column) separately. We propose the first method to detect deviating data cells in a multivariate sample which takes the correlations between the variables into account. It has no restriction on the number of clean rows, and can deal with high dimensions. Other advantages are that it provides predicted values of the outlying cells, while imputing missing values at the same time. We illustrate the method on several real datasets, where it uncovers more structure than found by purely columnwise methods or purely rowwise methods. The proposed method can help to diagnose why a certain row is outlying, for example, in process control. It also serves as an initial step for estimating multivariate location and scatter matrices.
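A naive cellwise sketch in the spirit of the method (the paper's procedure is considerably more careful and robust): predict each column from the other columns, then flag cells whose robustly standardized residuals are large. The ordinary least-squares fit, the cutoff, and the simulated data are all assumptions of this example.

```python
import numpy as np

def flag_cells(X, cutoff=3.5):
    """Flag deviating cells: for each column, regress on the remaining
    columns and mark cells with large robust z-scored residuals."""
    n, d = X.shape
    flags = np.zeros((n, d), dtype=bool)
    for j in range(d):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        med = np.median(resid)
        mad = np.median(np.abs(resid - med))
        flags[:, j] = np.abs(resid - med) / (1.4826 * mad) > cutoff
    return flags

rng = np.random.default_rng(7)
Z = rng.multivariate_normal([0, 0, 0], 0.7 * np.eye(3) + 0.3, 300)
Z[5, 1] = 10.0  # one contaminated cell in an otherwise ordinary row
print("flagged cells in row 5:", np.flatnonzero(flag_cells(Z)[5]))
```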
Evaluating outlier probabilities: assessing sharpness, refinement, and calibration using stratified and weighted measures
An outlier probability is the probability that an observation is an outlier. Typically, outlier detection algorithms calculate real-valued outlier scores to identify outliers. Converting outlier scores into outlier probabilities increases the interpretability of outlier scores for domain experts and makes outlier scores from different outlier detection algorithms comparable. Although several transformations to convert outlier scores to outlier probabilities have been proposed in the literature, there is no common understanding of good outlier probabilities and no standard approach to evaluating them. We require that good outlier probabilities be sharp, refined, and calibrated. To evaluate these properties, we adapt and propose novel measures that use ground-truth labels indicating which observations are outliers or inliers. The refinement and calibration measures partition the outlier probabilities into bins or use kernel smoothing. Compared to the evaluation of probabilities in supervised learning, several aspects are relevant when evaluating outlier probabilities, mainly due to the imbalanced and often unsupervised nature of outlier detection. First, stratified and weighted measures are necessary to evaluate the probabilities of outliers well. Second, the joint use of the sharpness, refinement, and calibration errors makes it possible to independently measure the corresponding characteristics of outlier probabilities. Third, equiareal bins, where the product of the number of observations per bin and the bin length is constant, balance the number of observations per bin against bin length, allowing accurate evaluation of different outlier probability ranges. Finally, we show that good outlier probabilities, according to the proposed measures, improve the performance of the follow-up task of converting outlier probabilities into labels for outliers and inliers.
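A toy end-to-end sketch: convert synthetic outlier scores into probabilities with one common transform (a Gaussian-CDF squashing of standardized scores) and compute a plain equal-width binned calibration error against ground-truth labels. The paper instead argues for stratified/weighted measures and equiareal bins; none of the data or choices below come from the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
# Simulated scores: 950 inliers, 50 outliers with higher scores.
scores = np.concatenate([rng.normal(0, 1, 950), rng.normal(4, 1, 50)])
labels = np.concatenate([np.zeros(950), np.ones(50)])  # 1 = outlier

# One transform from the literature: squash standardized scores
# through a Gaussian CDF to get values in [0, 1].
probs = norm.cdf((scores - scores.mean()) / scores.std())

# Equal-width bins (the simple baseline the paper improves on):
# per-bin gap between mean predicted probability and outlier fraction.
bins = np.linspace(0, 1, 11)
idx = np.digitize(probs, bins[1:-1])
err = 0.0
for b in range(10):
    m = idx == b
    if m.any():
        err += m.mean() * abs(probs[m].mean() - labels[m].mean())
print(f"binned calibration error: {err:.3f}")
```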
Strength of Stacking Technique of Ensemble Learning in Rockburst Prediction with Imbalanced Data: Comparison of Eight Single and Ensemble Models
Rockburst is a common dynamic geological hazard, severely restricting the development and utilization of underground space and resources. As the depth of excavation and mining increases, rockburst tends to occur frequently. Hence, it is necessary to carry out a study on rockburst prediction. Due to the nonlinear relationship between rockburst and its influencing factors, artificial intelligence was introduced. However, the collected data were typically imbalanced, and single algorithms trained on such data recognize minority classes poorly. To handle this problem, this paper employed the stacking technique of ensemble learning to establish rockburst prediction models. In total, 246 sets of data were collected. In the preprocessing stage, three data mining techniques, namely principal component analysis, local outlier factor, and expectation maximization, were used for dimension reduction, outlier detection, and outlier substitution, respectively. Then, the pre-processed data were split into a training set (75%) and a test set (25%) with stratified sampling. Based on four classical single intelligent algorithms, namely k-nearest neighbors (KNN), support vector machine (SVM), deep neural network (DNN), and recurrent neural network (RNN), four ensemble models (KNN–RNN, SVM–RNN, DNN–RNN and KNN–SVM–DNN–RNN) were built with the stacking technique of ensemble learning. The prediction performance of the eight models was evaluated, and the differences between single models and ensemble models were analyzed. Additionally, a sensitivity analysis was conducted, revealing the importance of the input variables to the models. Finally, the impact of class imbalance on the prediction accuracy and fitting effect of the models was quantitatively discussed. The results showed that the stacking technique of ensemble learning provides a new and promising way for rockburst prediction, exhibiting unique advantages especially when using imbalanced data.
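A compact sketch of the stacking setup on an invented imbalanced four-class dataset. The paper's rockburst features, its PCA/LOF/EM preprocessing, and its DNN/RNN learners are not reproduced here; KNN and SVM base learners with a logistic-regression meta-learner stand in, with the same 75/25 stratified split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Imbalanced stand-in for the rockburst data (4 classes, 246 samples).
X, y = make_classification(n_samples=246, n_features=6, n_informative=4,
                           n_classes=4, weights=[0.5, 0.25, 0.15, 0.1],
                           n_clusters_per_class=1, random_state=0)
# 75/25 split with stratified sampling, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Stacking: base learners' predictions feed a meta-learner.
stack = StackingClassifier(
    estimators=[("knn", make_pipeline(StandardScaler(),
                                      KNeighborsClassifier(5))),
                ("svm", make_pipeline(StandardScaler(),
                                      SVC(probability=True)))],
    final_estimator=LogisticRegression(max_iter=1000))
print("stacked accuracy:", stack.fit(X_tr, y_tr).score(X_te, y_te))
```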