278 result(s) for "multivariate data set"
Assessing the Performance of Deep Learning Algorithms for Short-Term Surface Water Quality Prediction
This study aimed to investigate the applicability of deep learning algorithms to (monthly) surface water quality forecasting. A comparison was made between the performance of an autoregressive integrated moving average (ARIMA) model and four deep learning models. All prediction algorithms, except for the ARIMA model working on a single variable, were tested with univariate inputs consisting of one of two dependent variables as well as multivariate inputs containing both dependent and independent variables. We found that deep learning models (6.31–18.78%, in terms of the mean absolute percentage error) showed better performance than the ARIMA model (27.32–404.54%) on univariate data sets, regardless of the dependent variable. However, the accuracy of prediction was not improved for all dependent variables in the presence of other associated water quality variables. In addition, changes in the number of input variables, sliding window size (i.e., input and output time steps), and relevant variables (e.g., meteorological and discharge parameters) resulted in wide variation in the predictive accuracy of deep learning models, reaching as high as 377.97%. Therefore, a refined search identifying the optimal values of such influencing factors is recommended to achieve the best performance of any deep learning model on a given multivariate data set.
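The mean absolute percentage error (MAPE) used above to compare the models can be computed as in the following sketch; the observation and prediction values are illustrative only, not taken from the study:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumes no zero values in `actual` (division by the observations).
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

# Illustrative monthly observations and model predictions:
obs = [10.0, 12.0, 8.0, 11.0]
pred = [9.0, 13.0, 8.0, 10.0]
print(round(mape(obs, pred), 2))
```

Lower MAPE means better predictive accuracy, which is how the 6.31–18.78% range for the deep learning models compares against 27.32–404.54% for ARIMA.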
LCSS-Based Algorithm for Computing Multivariate Data Set Similarity: A Case Study of Real-Time WSN Data
Multivariate data sets are common in various application areas, such as wireless sensor networks (WSNs) and DNA analysis. A robust mechanism is required to compute their similarity indexes regardless of the environment and problem domain. This study describes the usefulness of a non-metric-based approach (i.e., the longest common subsequence) in computing similarity indexes. Several non-metric-based algorithms are available in the literature; the most robust and reliable is the dynamic programming-based technique. However, dynamic programming-based techniques are considered inefficient, particularly in the context of multivariate data sets. Furthermore, the classical approaches are not powerful enough in scenarios involving multivariate data sets or sensor data, or when the similarity indexes are extremely high or low. To address this issue, we propose an efficient algorithm for measuring the similarity indexes of multivariate data sets using a non-metric-based methodology. The proposed algorithm performs exceptionally well on numerous multivariate data sets compared with the classical dynamic programming-based algorithms. The performance of the algorithms is evaluated on several benchmark data sets and a dynamic multivariate data set obtained from a WSN deployed at the Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology. Our evaluation suggests that the proposed algorithm can be approximately 39.9% more efficient than its counterparts for various data sets in terms of computational time.
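The classical dynamic-programming LCSS baseline that the abstract compares against can be sketched as follows for multivariate sequences; the per-dimension matching threshold `eps` and the normalisation by the shorter sequence length are common conventions, not details from the paper:

```python
import numpy as np

def lcss_similarity(a, b, eps=0.5):
    """Classical dynamic-programming LCSS similarity for two multivariate
    sequences a, b of shape (length, dims). Two points match when every
    dimension differs by at most eps. Returns the LCSS length normalised
    by the length of the shorter sequence, so 1.0 means fully similar."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    dp = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if np.all(np.abs(a[i - 1] - b[j - 1]) <= eps):
                dp[i, j] = dp[i - 1, j - 1] + 1
            else:
                dp[i, j] = max(dp[i - 1, j], dp[i, j - 1])
    return dp[n, m] / min(n, m)
```

The O(nm) table fill is exactly the inefficiency the abstract attributes to dynamic-programming techniques on long multivariate sensor streams.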
Visual Analysis of Relationships between Heterogeneous Networks and Texts: An Application on the IEEE VIS Publication Dataset
The visual exploration of large and complex network structures remains a challenge for many application fields. Moreover, a growing number of real-world networks are multivariate and often interconnected with each other. Entities in a network may have relationships with elements of other related datasets, which do not necessarily have to be networks themselves, and these relationships may be defined by attributes that can vary greatly. In this work, we propose a comprehensive visual analytics approach that supports researchers in specifying and subsequently exploring attribute-based relationships across networks, text documents and derived secondary data. Our approach provides an individual search functionality based on keywords and semantically similar terms over the entire text corpus to find related network nodes. For examining these nodes in the interconnected network views, we introduce a new interaction technique, called Hub2Go, which facilitates navigation by guiding the user to the information of interest. To showcase our system, we use a large text corpus collected from research papers listed in the visualization publication dataset, which consists of 2752 documents over a period of 25 years. Here, we analyze relationships between various heterogeneous networks, a bag-of-words index and a word similarity matrix, all derived from the initial corpus and metadata.
Summarizing Random Samples
This chapter discusses statistical methods for estimating the unknown values of population parameters for a univariate or multivariate population. It also discusses the two types of statistics: graphical statistics, which result in plots, charts, or graphs that display the summarized sample information, and numerical statistics, which result in numbers that can be used to summarize a sample, estimate parameters, or test hypotheses concerning an unknown parameter. The values of estimates computed from a well-designed random sample should contain fairly reliable information about the values of the unknown parameters. The main emphasis of this chapter is the correct use and interpretation of a statistic. The formulas for computing the values of each of the statistics discussed in this chapter are presented and their use illustrated, but in most cases it is recommended that a statistical computing package be used to compute the values of the statistics.
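As a minimal illustration of the numerical statistics the chapter describes, the sketch below summarizes a small univariate sample with Python's standard library; the sample values are made up for illustration:

```python
import statistics

# A hypothetical univariate random sample:
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.8]

n = len(sample)
mean = statistics.mean(sample)      # estimates the population mean
median = statistics.median(sample)  # robust center estimate
sd = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)

print(n, round(mean, 3), median, round(sd, 3))
```

A statistical computing package (as the chapter recommends) would produce the same summaries along with graphical statistics such as histograms and boxplots.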
Geostatistics for Compositions
This chapter contains sections titled: Introduction; A Brief Summary of Geostatistics; Cokriging of Regionalised Compositions; Structural Analysis of Regionalised Composition; Dealing with Zeros: Replacement Strategies and Simplicial Indicator Cokriging; Application; Conclusions; Acknowledgements; References.
Multiplicity Control and Closed Testing
This chapter contains sections titled: Defining Raw and Adjusted p-Values; Controlling for Multiplicity; Multiple Testing; The Closed Testing Approach; Mult Data Example; Washing Test Data; Weighted Methods for Controlling FWE and FDR; Adjusting Stepwise p-Values.
Partially ordered data sets and a new efficient method for calculating multivariate conditional value-at-risk
Recent studies in Lee and Prékopa (Oper Res Lett 45:19–24, 2017) and Lee (Oper Res Lett 45:1204–1220, 2017) showed that a union of partially ordered orthants in R^n can be decomposed into only the largest and second-largest chains. This allows us to calculate the probability of the union of such events in a recursive manner. If the vertices of such orthants designate p-level efficient points, i.e., the multivariate quantile or the multivariate value-at-risk (MVaR) in R^n, then their number, say N, is typically very large, which makes it almost impossible to calculate the multivariate conditional value-at-risk (MCVaR) introduced by Prékopa (Ann Oper Res 193(1):49–69, 2012): finding the exact value of MCVaR takes O(2^N) time in the case of N MVaRs in R^n. In this paper, building on the ideas in Lee and Prékopa (Oper Res Lett 45:19–24, 2017) and Lee (Oper Res Lett 45:1204–1220, 2017), together with proper adjustments, we study efficient methods for calculating the MCVaR without resorting to an approximation. The proposed methods not only have polynomial time complexity but also compute the exact value of MCVaR. We also discuss the additional benefits MCVaR offers over its univariate counterpart, the conditional value-at-risk, by providing numerical results. Numerical examples, with computing times, are presented for both population and sample data sets.
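For context, the univariate conditional value-at-risk that the abstract cites as MCVaR's counterpart can be estimated from sample data as in the sketch below; the interpolated empirical quantile and tail-averaging conventions are one common choice, not the paper's method, and the loss values are illustrative:

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical conditional value-at-risk (expected shortfall):
    the mean of sample losses at or above the empirical
    alpha-quantile (the value-at-risk)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)  # empirical VaR at level alpha
    tail = losses[losses >= var]      # losses in the upper tail
    return float(tail.mean())

# Illustrative sample of losses with one extreme outcome:
print(cvar([1.0, 2.0, 3.0, 4.0, 100.0], alpha=0.95))
```

In the multivariate setting the paper targets, the analogous tail is defined over N p-level efficient points (MVaRs), which is where the naive O(2^N) cost arises.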
A Topologically Valid Definition of Depth for Functional Data
The main focus of this work is on providing a formal definition of statistical depth for functional data on the basis of six properties, recognising topological features such as continuity, smoothness and contiguity. Amongst our depth defining properties is one that addresses the delicate challenge of inherent partial observability of functional data, with fulfillment giving rise to a minimal guarantee on the performance of the empirical depth beyond the idealised and practically infeasible case of full observability. As an incidental product, functional depths satisfying our definition achieve a robustness that is commonly ascribed to depth, despite the absence of a formal guarantee in the multivariate definition of depth. We demonstrate the fulfillment or otherwise of our properties for six widely used functional depth proposals, thereby providing a systematic basis for selection of a depth function.
A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data
Anomaly detection is the process of identifying unexpected items or events in datasets that differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied to unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection and fraud detection, as well as in the life science and medical domains. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, in which 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides anomaly detection performance, the computational effort, the impact of parameter settings, and the global/local anomaly detection behavior are outlined. In conclusion, we give advice on algorithm selection for typical real-world tasks.
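The abstract does not name its 19 algorithms, but a k-nearest-neighbour distance score is a well-known baseline of the distance-based family such evaluations typically include. The sketch below is a minimal illustration on synthetic data, not an algorithm from the study:

```python
import numpy as np

def knn_anomaly_scores(X, k=2):
    """Unsupervised k-nearest-neighbour anomaly score: the mean Euclidean
    distance from each point to its k nearest neighbours. Higher scores
    indicate more anomalous points. Assumes no duplicate points."""
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix (fine for small samples).
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row ascending; column 0 is the zero self-distance, skip it.
    d.sort(axis=1)
    return d[:, 1:k + 1].mean(axis=1)

# Synthetic multivariate data: a tight cluster plus one obvious outlier.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
scores = knn_anomaly_scores(X, k=2)
print(int(scores.argmax()))  # index of the highest-scoring (most anomalous) point
```

Note that this baseline uses only the internal structure of the data, with no labels, which is exactly the unsupervised setting the paper evaluates.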