Catalogue Search | MBRL

Group-Wise Principal Component Analysis for Exploratory Data Analysis

by Rodríguez-Gómez, Rafael A.; Camacho, José; Saccenti, Edoardo in Correlation, Data analysis, Exploratory data analysis

2017

In this article, we propose a new framework for matrix factorization based on principal component analysis (PCA) where sparsity is imposed. The structure to impose sparsity is defined in terms of groups of correlated variables found in correlation matrices or maps. The framework is based on three new contributions: an algorithm to identify the groups of variables in correlation maps, a visualization for the resulting groups, and a matrix factorization. Together with a method to compute correlation maps with minimum noise level, referred to as missing-data for exploratory data analysis (MEDA), these three contributions constitute a complete matrix factorization framework. Two real examples are used to illustrate the approach and compare it with PCA, sparse PCA, and structured sparse PCA. Supplementary materials for this article are available online.

Journal Article

Estimation of Unmeasured Room Temperature, Relative Humidity, and CO2 Concentrations for a Smart Building Using Machine Learning and Exploratory Data Analysis

by Goro Fujita, Tagami Keisuke, Abraham Kaligambe in Accuracy, Algorithms, Artificial intelligence

2022

Smart buildings that utilize innovative technologies such as artificial intelligence (AI), the internet of things (IoT), and cloud computing to improve comfort and reduce energy waste are gaining popularity. Smart buildings comprise a range of sensors to measure real-time indoor environment variables essential for the heating, ventilation, and air conditioning (HVAC) system control strategies. For accuracy and smooth operation, current HVAC system control strategies require multiple sensors to capture the indoor environment variables. However, using too many sensors creates an extensive network that is costly and complex to maintain. Our proposed research solves the mentioned problem by implementing a machine-learning algorithm to estimate unmeasured variables utilizing a limited number of sensors. Using a six-month data set collected from a three-story smart building in Japan, several extreme gradient boosting (XGBoost) models were designed and trained to estimate unmeasured room temperature, relative humidity, and CO2 concentrations. Our models accurately estimated temperature, humidity, and CO2 concentration under various case studies with an average root mean squared error (RMSE) of 0.3 degrees, 2.6%, and 26.25 ppm, respectively. Obtained results show an accurate estimation of indoor environment measurements that is applicable for optimal HVAC system control in smart buildings with a reduced number of required sensors.

Journal Article

Cluster Analysis, 5th Edition

by Brian S. Everitt, Morven Leese, Daniel Stahl in Exploratory Data Analysis

2011

Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real-life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. The book is comprehensive yet relatively non-mathematical, focusing on the practical aspects of cluster analysis. Key Features: Presents a comprehensive guide to clustering techniques, with focus on the practical aspects of cluster analysis. Provides a thorough revision of the fourth edition, including new developments in clustering longitudinal data and examples from bioinformatics and gene studies. Updates the chapter on mixture models to include recent developments and presents a new chapter on mixture modeling for structured data. Practitioners and researchers working in cluster analysis and data analysis will benefit from this book.

eBook

Superheat: An R Package for Creating Beautiful and Extendable Heatmaps for Visualizing Complex Data

by Barter, Rebecca L.; Yu, Bin in Data visualization, Exploratory data analysis, Heatmap

2018

The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This article introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this article is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.

Journal Article

Discovering knowledge in data

by Larose, Daniel T in Computers, Data mining, Data Warehousing

2014

The field of data mining lies at the confluence of predictive analytics, statistical analysis, and business intelligence. Due to the ever-increasing complexity and size of data sets and the wide range of applications in computer science, business, and health care, the process of discovering knowledge in data is more relevant than ever before. This book provides the tools needed to thrive in today's big data world. The author demonstrates how to leverage a company's existing databases to increase profits and market share, and carefully explains the most current data science methods and techniques. The reader will "learn data mining by doing data mining". By adding chapters on data modelling preparation, imputation of missing data, and multivariate statistical analysis, Discovering Knowledge in Data, Second Edition remains the eminent reference on data mining. The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis. Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization. Offers extensive coverage of the R statistical programming language. Contains 280 end-of-chapter exercises. Includes a companion website with further resources for all readers, and PowerPoint slides, a solutions manual, and suggested projects for instructors who adopt the book.

eBook

Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation

by Pitkin, Emil; Goldstein, Alex; Bleich, Justin in Algorithms, Complexity theory, Exploratory data analysis

2015

This article presents individual conditional expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. Classical partial dependence plots (PDPs) help visualize the average partial relationship between the predicted response and one or more features. In the presence of substantial interaction effects, the partial response relationship can be heterogeneous. Thus, an average curve, such as the PDP, can obfuscate the complexity of the modeled relationship. Accordingly, ICE plots refine the PDP by graphing the functional relationship between the predicted response and the feature for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate, suggesting where and to what extent heterogeneities might exist. In addition to providing a plotting suite for exploratory analysis, we include a visual test for additive structure in the data-generating model. Through simulated examples and real datasets, we demonstrate how ICE plots can shed light on estimated models in ways PDPs cannot. Procedures outlined are available in the R package ICEbox.

Journal Article

Intelligent Data Analytics for Terror Threat Prediction

by Lalit Garg, Ram Bilas Pachori, Subhendu Kumar Pani in Exploratory Data Analysis

2021

Intelligent data analytics for terror threat prediction is an emerging field of research at the intersection of information science and computer science, bringing with it a new era of tremendous opportunities and challenges due to plenty of easily available criminal data for further analysis. This book provides innovative insights that will help obtain interventions to undertake emerging dynamic scenarios of criminal activities. Furthermore, it presents emerging issues, challenges and management strategies in public safety and crime control development across various domains. The book will play a vital role in improving human life to a great extent. Researchers and practitioners working in the fields of data mining, machine learning and artificial intelligence will greatly benefit from this book, which will be a good addition to the state-of-the-art approaches collected for intelligent data analytics. It will also be very beneficial for those who are new to the field and need to quickly become acquainted with the best performing methods. With this book they will be able to compare different approaches and carry forward their research in the most important areas of this field, which has a direct impact on the betterment of human life by maintaining the security of our society. No other book is currently on the market which provides such a good collection of state-of-the-art methods for intelligent data analytics-based models for terror threat prediction, as intelligent data analytics is a newly emerging field and research in data mining and machine learning is still in the early stage of development.

eBook

Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

by Rahnenführer, Jörg; De Bin, Riccardo; Benner, Axel in Algorithms, Analysis, Analytical goals

2023

Background: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. Methods: Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 “High-dimensional data” of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. Results: The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. Conclusions: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.

Journal Article

pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components

by Marini, Federico; Binder, Harald in Algorithms, Biochemistry, Bioinformatics

2019

Background: Principal component analysis (PCA) is frequently used in genomics applications for quality assessment and exploratory analysis in high-dimensional data, such as RNA sequencing (RNA-seq) gene expression assays. Despite the availability of many software packages developed for this purpose, an interactive and comprehensive interface for performing these operations is lacking. Results: We developed the pcaExplorer software package to enhance commonly performed analysis steps with an interactive and user-friendly application, which provides state saving as well as the automated creation of reproducible reports. pcaExplorer is implemented in R using the Shiny framework and exploits data structures from the open-source Bioconductor project. Users can easily generate a wide variety of publication-ready graphs, while assessing the expression data in the different modules available, including a general overview, dimension reduction on samples and genes, as well as functional interpretation of the principal components. Conclusion: pcaExplorer is distributed as an R package in the Bioconductor project (http://bioconductor.org/packages/pcaExplorer/), and is designed to assist a broad range of researchers in the critical step of interactive data exploration.

Journal Article

Next-gen agriculture: integrating AI and XAI for precision crop yield predictions

by Mohan, R. N. V. Jagan; Sree, R. Praneetha; Rayanoothala, Pravallika Sree in Agricultural production, Agriculture, Artificial intelligence

2025

Climate change poses significant challenges to global food security by altering precipitation patterns and increasing the frequency of extreme weather events such as droughts, heatwaves, and floods. These phenomena directly affect agricultural productivity, leading to lower crop yields and economic losses for farmers. This study leverages Artificial Intelligence (AI) and Explainable Artificial Intelligence (XAI) techniques to predict crop yields and assess the impacts of climate change on agriculture, providing a novel approach to understanding complex interactions between climatic and agronomic factors. Using Exploratory Data Analysis (EDA), the study identifies temperature as the most critical factor influencing crop yields, with notable interactions observed between rainfall patterns and macronutrient levels. Advanced regression models, including Decision Tree Regressor, Random Forest Regressor, and LightGBM Regressor, achieved exceptional predictive performance, with R² scores reaching 0.92, mean squared errors as low as 0.02, and mean absolute errors of 0.015. Additionally, XAI techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) enhanced the interpretability of the predictions, offering actionable insights into the relative importance of key features. These insights inform strategies for agricultural decision-making and climate adaptation. By integrating AI-driven predictions with XAI-based interpretability, this research presents a robust and transparent framework for mitigating the adverse effects of climate change on agriculture, emphasizing its potential for scalable application in precision farming and policy development.

Journal Article
