Catalogue Search | MBRL

Scatterplot selection for dimensionality reduction in multidimensional data visualization

by Okada, Kaya; Itoh, Takayuki in Accuracy, Data visualization, Datasets

2025

Dimensionality reduction (DR) techniques for multidimensional data serve as powerful tools for visualization and understanding of the structure of the data. Various DR methods have been developed to extract specific features of the data over the years. However, selection of the optimal DR method and fine-tuning parameters are still challenging, as these choices vary based on the characteristics of the dataset. Consequently, data scientists often rely on their experience or undertake extensive experimentation to identify the most suitable approach. This paper proposes a semi-automatic method for selecting appropriate DR techniques through scatterplot evaluation. Initially, our approach applies a range of DR methods to the given multidimensional data to compute two-dimensional values. Next, we generate scatterplots from the two-dimensional data and calculate scores reflecting the distribution and spatial relationships among the points. Scatterplots that provide insights achieve higher scores, enabling an efficient selection of DR methods based on their visualization. We demonstrate the effectiveness of the presented method through two case studies: The first one is an e-commerce review dataset, and the second focuses on a dataset derived from music feature extraction.
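
As a rough illustration of the scoring idea described in this abstract, the minimal sketch below ranks candidate two-dimensional embeddings by a score computed from their scatterplot points. The score used here (one minus the ratio of mean nearest-neighbour distance to mean pairwise distance), the function names, and the toy coordinates are all assumptions made for illustration; the abstract does not specify the paper's actual scoring functions or DR methods.

```python
import math
from itertools import combinations


def cluster_score(points):
    """Hypothetical scatterplot score (an assumption, not the paper's metric).

    Returns 1 - (mean nearest-neighbour distance / mean pairwise distance).
    Plots whose points form tight, well-separated groups score closer to 1.
    """
    pairwise = [math.dist(p, q) for p, q in combinations(points, 2)]
    mean_pairwise = sum(pairwise) / len(pairwise)
    if mean_pairwise == 0:
        return 0.0  # all points coincide, so the plot carries no structure
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return 1.0 - (sum(nearest) / len(nearest)) / mean_pairwise


def rank_embeddings(embeddings):
    """Sort embedding names from highest to lowest score."""
    return sorted(embeddings, key=lambda name: cluster_score(embeddings[name]), reverse=True)


# Toy 2D outputs standing in for the results of two different DR methods.
candidates = {
    "method_a": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)],
    "method_b": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
}
ranking = rank_embeddings(candidates)
```

In this toy case `method_a` (two compact groups) ranks above `method_b` (a uniform grid). A real pipeline would replace the hard-coded coordinates with the output of actual DR methods and would likely combine several scores.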

Journal Article

Trajectories for Energy Transition in EU-28 Countries over the Period 2000–2019: a Multidimensional Approach

in Climate change, Data analysis, Economic analysis

2022

Environmental issues have become a major concern for policymakers faced with the threat of global warming. The European Climate Energy Package is an ambitious plan which drives the trajectories of European countries in three directions: reducing greenhouse gas emissions, increasing the share of renewable energy and improving energy efficiency. This article is original in that it considers the three targets together using multidimensional data analysis methods, a methodology which makes it possible to propose temporal and spatial typologies for the energy transition of European countries over the period 2000–2019. Results show evidence of a gradual transition over three sub-periods towards a more environmentally conscious economy. Four distinct types of energy transition profiles are identified, highlighting the contrasting performances of EU Members in terms of energy transition. In particular, some economically more advanced countries, namely Germany, Ireland, Belgium, Luxembourg and the Netherlands, are lagging in achieving their targets. Finally, discriminant analyses suggest that economic performance, trade performance, innovation system and policy mix design have been particularly effective in promoting energy transition over the period 2000–2019, while only innovation system helps to explain the contrasting results observed at country level over that time.

Journal Article

The examination of the effect of the criterion for neural network’s learning on the effectiveness of the qualitative analysis of multidimensional data

by Jamróz Dariusz in Criteria, Data analysis, Data visualization

2020

A variety of multidimensional visualization methods are applied for the qualitative analysis of multidimensional data. One of these methods uses autoassociative neural networks. To visualize n-dimensional data, such a network has n inputs, n outputs and one interlayer consisting of two outputs whose values represent the coordinates of the analyzed sample's image on the screen. The usual criterion for the network's learning is that the value at the ith input appears at the ith output. If the network is trained in this way, all the information from the n inputs is compressed to the two outputs of the interlayer and then decompressed to the n network outputs. The paper shows that applying different learning criteria can be more beneficial for the readability of the results. The analysis was conducted on seven-dimensional real data representing three coal classes, five-dimensional data representing printed characters, 216-dimensional data representing hand-written digits and, to illustrate additional explanations, artificially generated seven-dimensional data. The readability of the results of the qualitative analysis of these data was compared using multidimensional visualization with neural networks trained under different learning criteria. The results of applying all analyzed criteria to 20 randomly selected sets of multidimensional data from a publicly available repository are also presented.

Journal Article

Analysis of Personalized Cardiovascular Drug Therapy: From Monitoring Technologies to Data Integration and Future Perspectives

by Huang, Ziyu; Liu, Yu; Lin, Runxing in Accuracy, Algorithms, Anticoagulants

2025

Cardiovascular diseases have long been a major challenge to human health, and the treatment differences caused by individual variability remain unresolved. In recent years, personalized cardiovascular drug therapy has attracted widespread attention. This paper reviews the strategies for achieving personalized cardiovascular drug therapy through traditional dynamic monitoring and multidimensional data integration and analysis. It focuses on key technologies for dynamic monitoring, dynamic monitoring based on individual differences, and multidimensional data integration and analysis. Drawing on a systematic review of the relevant literature, it summarizes the main challenges in current research and proposes potential directions for future studies.

Journal Article

A multidimensional data warehouse design to combat the health pandemics

by Peker, Serhat; Turcan, Gizem in Artificial Intelligence, Big Data, Business and Management

2022

The Covid-19 pandemic has brought about a new lifestyle across the globe. Throughout this period, the use of holistic methods has become indispensable to deal with the enormous amount of data in this regard. It appears that the simplest way to tackle this issue is to spread the digitalization efforts concerning all data-based applications. Given the significance of pandemic data management, it is essential to have a data warehouse that collects, associates, and communicates these data. Containing a significant volume of structured data, warehousing can provide the necessary foundation for data mining and the development of analytical tools. To this end, the present paper proposes a data warehouse for combatting and managing pandemics, with the possibility to be enhanced for other personal or public health-related initiatives. In this research, the bottom-up data warehouse building methodology is used to construct a warehouse. A fact constellation schema model is utilized to accommodate the information ranging from citizen demographics to physician-prescribed drugs and laboratory tests. Sample queries are executed based on the proposed data warehouse for different purposes, and desired query results are obtained within proper response times. The proposed data warehouse contributes to countrywide implementation of pandemic practices and illuminates research on faster, less expensive, and safer management of citywide, nationwide, or worldwide health emergencies within a robust technical framework by governments.

Journal Article

Benchmarking Maintenance Practices for Allocating Features Affecting Hydraulic System Maintenance: A West-Balkan Perspective

by Šević, Dragoljub; Orošnjak, Marko in agglomerative hierarchical clustering, Algorithms, Artificial intelligence

2023

As a consequence of the application of advanced maintenance practices, the theoretical probability of failures occurring is relatively low. However, observations of low levels of market intelligence and maintenance management have been reported. This comprehensive study investigates the determinants of maintenance practices in companies utilising hydraulic machinery, drawing on empirical evidence from a longitudinal questionnaire-based survey across the West-Balkan countries. This research identifies critical predictors of technical and sustainable maintenance performance metrics by employing the CA-AHC (Correspondence Analysis with Agglomerative Hierarchical Clustering) method combined with non-parametric machine learning models. Key findings highlight the significant roles of the number of maintenance personnel employed; equipment size, determined on the basis of nominal power consumption; machinery age; and maintenance activities associated with fluid cleanliness in influencing hydraulic machine maintenance outcomes. These insights challenge current perceptions and introduce novel considerations with respect to aspects such as equipment size, maintenance skills and activities with the aim of preserving peak performance. However, the study acknowledges the variability resulting from differing operational conditions, and calls for further research for broader validation. As large-scale heterogeneous datasets are becoming mainstream, this research underscores the importance of using multidimensional data analysis techniques to better understand operational outcomes.

Journal Article

Toward a taxonomy for 2D non-paired General Line Coordinates: a comprehensive survey

by Ganuza, María Luján; Castro, Silvia M.; Antonini, Antonella S. in Artificial Intelligence, Business Information Systems, Computational Biology/Bioinformatics

2023

Multidimensional data visualization is one of the primary foundations supporting data analysis used for understanding the hidden relationships between items and dimensions of complex data. The line-based visualization techniques are a fundamental class of multidimensional visualization techniques and cover an important set of methods that are relevant to the visual exploratory analysis. Recently, General Line Coordinates (GLCs) were introduced. These are lossless line-based visualization techniques for multidimensional data. Particular cases of GLCs are the non-paired GLCs, which generalize the radial and parallel coordinates and have proved to be highly suitable for visualizing multidimensional data. In this context, we conduct a systematic paper review of the 2D non-paired GLC (2D-NP-GLC) visualization techniques present in the literature. We organize the 2D-NP-GLC contributions in a unified reference framework in which both the representations and the associated interactions are considered. Focusing jointly on these two criteria, we provide a useful common space for the design and development of 2D-NP-GLC techniques. Besides, this framework integrates the 2D-NP-GLC contributions and helps to identify under-explored areas that may be candidates for further research.

Journal Article

Brain Activity is Influenced by How High Dimensional Data are Represented: An EEG Study of Scatterplot Diagnostic (Scagnostics) Measures

by Shereen, A. Duke; Etemadpour, Ronak; Shintree, Sonali in Ambiguity, Biomedical Engineering and Bioengineering, Bivariate analysis

2024

Visualization and visual analytic tools amplify one’s perception of data, facilitating deeper and faster insights that can improve decision making. For multidimensional data sets, one of the most common approaches of visualization methods is to map the data into lower dimensions. Scatterplot matrices (SPLOM) are often used to visualize bivariate relationships between combinations of variables in a multidimensional dataset. However, the number of scatterplots increases quadratically with respect to the number of variables. For high dimensional data, the corresponding enormous number of scatterplots makes data exploration overwhelmingly complex, thereby hindering the usefulness of SPLOM in human decision making processes. One approach to address this difficulty utilizes Graph-theoretic Scatterplot Diagnostic (Scagnostics) to automatically extract a subset of scatterplots with salient features and of manageable size with the hope that the data will be sufficient for improving human decisions. In this paper, we use Electroencephalogram (EEG) to observe brain activity while participants make decisions informed by scatterplots created using different visual measures. We focused on 4 categories of Scagnostics measures: Clumpy, Monotonic, Striated, and Stringy. Our findings demonstrate that by adjusting the level of difficulty in discriminating between data sets based on the Scagnostics measures, different parts of the brain are activated: easier visual discrimination choices involve brain activity mostly in visual sensory cortices located in the occipital lobe, while more difficult discrimination choices tend to recruit more parietal and frontal regions as they are known to be involved in resolving ambiguities. Our results imply that patterns of neural activity are predictive markers of which specific Scagnostics measures most assist human decision making based on visual stimuli such as ours.
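
The quadratic growth mentioned in this abstract can be made concrete with a short sketch: a scatterplot matrix over d variables contains one distinct scatterplot per unordered pair of variables, that is d(d-1)/2 plots. The function names and the variable labels below are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def splom_pairs(variables):
    """List the distinct variable pairs, one per off-diagonal scatterplot."""
    return list(combinations(variables, 2))


def splom_pair_count(d):
    """Number of distinct scatterplots for d variables: d * (d - 1) / 2."""
    return d * (d - 1) // 2


# Hypothetical variable names, used only to show the enumeration.
pairs = splom_pairs(["height", "weight", "age", "income"])
counts = {d: splom_pair_count(d) for d in (4, 10, 100)}
```

Four variables already give 6 plots, and 100 variables give 4950, which is why automatic selection of a small subset of plots becomes attractive.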

Journal Article

Properties of individual differences scaling and its interpretation

by Gower, John C; Le Roux, Niël J; Gardner-Lubbe, Sugnet in Algorithms, Convergence, Criteria

2022

Indscal models consider symmetric matrices $B_k = X W_k X'$ for $k = 1, \dots, K$, where $X$ ($n \times R$) is a compromise matrix termed the group-average and $W_k$ is a diagonal matrix of weights given by the $k$th individual to the $R$, specified in advance, columns of $X$; non-negative weights are preferred and usually R

Journal Article

Scalable distributed data cube computation for large-scale multidimensional data analysis on a Spark cluster

by Lee, Suan; Kim, Jinho; Kang, Seok in Algorithms, Communications traffic, Computer Communication Networks

2019

A data cube is a powerful analytical tool that stores all aggregate values over a set of dimensions. It provides users with a simple and efficient means of performing complex data analysis while assisting in decision making. Since the computation time for building a data cube is very large, however, efficient methods for reducing the data cube computation time are needed. Previous works have developed various algorithms for efficiently generating data cubes using MapReduce, which is a large-scale distributed parallel processing framework. However, MapReduce incurs the overhead of disk I/Os and network traffic. To overcome these MapReduce limitations, Spark was recently proposed as a memory-based parallel/distributed processing framework. It has attracted considerable research attention owing to its high performance. In this paper, we propose two algorithms for efficiently building data cubes. The algorithms fully leverage Spark’s mechanisms and properties: Resilient Distributed Top-Down Computation (RDTDC) and Resilient Distributed Bottom-Up Computation (RDBUC). The former is an algorithm for computing the components (i.e., cuboids) of a data cube in a top-down approach; the latter is a bottom-up approach. The RDTDC algorithm has three key functions. (1) It approximates the size of the cuboid using the cardinality without additional Spark action computation to determine the size of each cuboid during top-down computation. Thus, one cuboid can be computed from the upper cuboid of a smaller size. (2) It creates an execution plan that is optimized to input the smaller sized cuboid. (3) Lastly, it uses a method of reusing the result of the already computed cuboid by top-down computation and simultaneously computes the cuboid of several dimensions. In addition, we propose the RDBUC bottom-up algorithm in Spark, which is widely used in computing Iceberg cubes to maintain only cells satisfying a certain condition of minimum support. This algorithm incorporates two primary strategies: (1) reducing the input size to compute aggregate values for a dimension combination (e.g., A, B, and C) by removing the input, which does not satisfy the Iceberg cube condition at its lower dimension combination (e.g., A and B) computed earlier. (2) We use a lazy materialization strategy that computes every combination of dimensions using only transformation operations without any action operation. It then stores them in a single action operation. To prove the efficiency of the proposed algorithms using a lazy materialization strategy by employing only one action operation, we conducted extensive experiments. We compared them to the cube() function, a built-in cube computation library of Spark SQL. The results showed that the proposed RDTDC and RDBUC algorithms outperformed Spark SQL cube().
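
To make the terms in this abstract concrete, the sketch below builds a tiny data cube in plain Python: one cuboid (group-by result) for every subset of the dimensions, with an optional count-based minimum-support filter in the spirit of an Iceberg cube. This is a naive in-memory illustration only, not the RDTDC or RDBUC algorithms and not Spark code; the function name `data_cube`, the `min_support` parameter, and the sample rows are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure, min_support=1):
    """Sum `measure` for every subset of `dims` (2**len(dims) cuboids).

    Cells backed by fewer than `min_support` input rows are dropped,
    which mimics a count-based Iceberg condition (an assumption here).
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            sums = defaultdict(float)
            counts = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                sums[key] += row[measure]
                counts[key] += 1
            cube[group] = {
                key: total for key, total in sums.items() if counts[key] >= min_support
            }
    return cube


# Hypothetical fact rows with two dimensions (A, B) and one measure.
rows = [
    {"A": "a1", "B": "b1", "sales": 10.0},
    {"A": "a1", "B": "b2", "sales": 5.0},
    {"A": "a2", "B": "b1", "sales": 7.0},
]
full_cube = data_cube(rows, ["A", "B"], "sales")
iceberg_cube = data_cube(rows, ["A", "B"], "sales", min_support=2)
```

With two dimensions the full cube has four cuboids: the grand total, the totals by A, the totals by B, and the totals by (A, B). This version rescans the input for every cuboid, which is exactly the cost that top-down reuse of parent cuboids and bottom-up pruning aim to avoid.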

Journal Article
