Catalogue Search | MBRL

Handling Incomplete Data with Regression Imputation

by Arisandi, R , Kartikasari, P , Annas, S in Data points , Handling incomplete data , Physics

2021

Regression analysis is widely used in various fields because it is easy to understand. One purpose of regression analysis is to predict the response variable from the predictor variables. Unfortunately, in real cases some values may be missing, which produces large errors and hence poor predictions. Missing values force a trade-off between removing the affected paired data points and replacing them. The purpose of this study was to estimate missing values with regression imputation. The study considered two scenarios for the amount of missing values, 10% and 15%. The results showed that the higher the amount of missing values, the higher the MSE.
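As a rough illustration of the technique this abstract describes, the sketch below fits a simple linear regression on the complete pairs, fills the missing responses with the fitted predictions, and reports the MSE on the imputed entries at 10% and 15% missingness. It is not the authors' code: the synthetic data, the function names (`fit_line`, `regression_impute`, `imputation_mse`), the use of a single predictor, and the completely-at-random deletion are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_impute(xs, ys):
    """Replace missing responses (None) with predictions from the complete pairs."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


def imputation_mse(xs, ys_true, missing_rate, rng):
    """Delete a share of responses at random, impute them, and score the imputed entries."""
    n = len(ys_true)
    missing = rng.sample(range(n), int(round(missing_rate * n)))
    ys_observed = list(ys_true)
    for i in missing:
        ys_observed[i] = None
    ys_filled = regression_impute(xs, ys_observed)
    return sum((ys_filled[i] - ys_true[i]) ** 2 for i in missing) / len(missing)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: a noisy linear relationship, not taken from the study.
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [2.0 + 1.5 * x + rng.gauss(0, 1) for x in xs]
    for rate in (0.10, 0.15):
        print(f"missing rate {rate:.0%}: MSE = {imputation_mse(xs, ys, rate, rng):.3f}")
```

A single run on one synthetic data set will not necessarily reproduce the trend reported in the abstract; averaging the MSE over many random deletions would give a fairer comparison between the two missing rates.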

Journal Article

Multiple Imputation

by Murray, Jared S. in Identification methods , Mathematical models , Missing data

2018

Multiple imputation is a straightforward method for handling missing data in a principled fashion. This paper presents an overview of multiple imputation, including important theoretical results and their practical implications for generating and using multiple imputations. A review of strategies for generating imputations follows, including recent developments in flexible joint modeling and sequential regression/chained equations/fully conditional specification approaches. Finally, we compare and contrast different methods for generating imputations on a range of criteria before identifying promising avenues for future research.
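For readers unfamiliar with how multiple imputations are used once generated, the sketch below pools per-imputation results with the standard combining rules usually attributed to Rubin: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the example numbers are hypothetical, and the abstract itself does not say which pooling details the paper emphasizes.

```python
def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances from m imputed data sets.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical estimates of a mean from m = 3 imputed data sets.
    estimate, variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(estimate, variance)
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations.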

Journal Article

Reporting Proficiency Levels for Examinees With Incomplete Data

by Sinharay, Sandip in Accuracy , Classification , Computation

2022

Takers of educational tests often receive proficiency levels instead of or in addition to scaled scores. For example, proficiency levels are reported for the Advanced Placement (AP®) and U.S. Medical Licensing examinations. Technical difficulties and other unforeseen events occasionally lead to missing item scores and hence to incomplete data on these tests. The reporting of proficiency levels to the examinees with incomplete data requires estimation of the performance of the examinees on the missing part and essentially involves imputation of missing data. In this article, six approaches from the literature on missing data analysis are brought to bear on the problem of reporting of proficiency levels to the examinees with incomplete data. Data from several large-scale educational tests are used to compare the performances of the six approaches to the approach that is operationally used for reporting proficiency levels for these tests. A multiple imputation approach based on chained equations is shown to lead to the most accurate reporting of proficiency levels for data that were missing at random or completely at random, while the model-based approach of Holman and Glas performed the best for data that are missing not at random. Several recommendations are made on the reporting of proficiency levels to the examinees with incomplete data.

Journal Article

An Integrated Intuitionistic Fuzzy-Clustering Approach for Missing Data Imputation

by Bridge-Nduwimana, Charlène Béatrice , El Ouaazizi, Aziza , Benyakhlef, Majid in Accuracy , Algorithms , Analysis

2025

Missing data imputation is a critical preprocessing task that directly impacts the quality and reliability of data-driven analyses, yet many existing methods treat numerical and categorical data separately and lack the integration of advanced techniques. To overcome these restrictions, we propose a novel imputation technique that synergistically combines regression imputation using HistGradientBoostingRegressor with fuzzy rule-based systems, enhanced by a tailored clustering process. This integrated approach handles mixed data types and complex data structures by using regression models to predict missing numerical values, fuzzy logic to incorporate expert knowledge and interpretability, and clustering to capture latent data patterns. Categorical variables are managed by mode imputation and label encoding. We evaluated the method on twelve tabular datasets with artificially introduced missingness, employing a comprehensive set of metrics focused on originally missing entries. The results demonstrate that our iterative imputer performs competitively with other established imputation techniques, achieving better or comparable error rates and accuracy. By combining statistical learning with fuzzy and clustering frameworks, the method achieves 15% lower Root Mean Square Error (RMSE), 10% lower Mean Absolute Error (MAE), and 80% higher precision on UCI datasets, thus offering a promising advance in data preprocessing for practical applications.

Journal Article

Statistical Evaluation of Item Nonresponse Methods Using the World Bank’s 2015 Philippines Enterprise Survey

by Cabauatan, Madeline D., et al. in Cluster Grouping , Clustering , Coefficient of variation

2021

The main objective of the study was to evaluate item nonresponse procedures through a simulation study of different nonresponse levels or missing rates. The simulation was used to explore how each procedure performs under a variety of circumstances, including various conditions and variable trends. The imputation methods considered were cell mean imputation, random hot deck, nearest neighbor, and simple regression. The study evaluated these methods for imputing missing data on the number of workers and the total cost of labor per establishment from the World Bank’s 2015 Enterprise Survey for the Philippines. These variables are some of the major indicators for measuring productive labor and decent work in the country. The performance of the imputation techniques for item nonresponse was evaluated in terms of bias and coefficient of variation, for accuracy and precision respectively. Based on the results, cell mean imputation was found to be most appropriate for imputing missing values of the total number of workers and total cost of labor per establishment. Since the study was limited to the variables cited, it is recommended to explore other labor indicators. Moreover, exploring other choices of clustering groups is highly recommended, as the clustering groups have a great effect on the resulting imputation estimates. It is also recommended to explore other imputation techniques, such as multiple regression, and other parametric models for nonresponse, such as the Bayes estimation method. For regression-based imputation, since the study used only the cluster groupings as predictors, it is highly recommended to use other variables that might be related to the variable of interest to verify the results of this study.
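A minimal sketch of cell mean imputation, the method this abstract singles out, may help make the evaluation concrete: each missing value is replaced by the mean of the respondents in the same cell (cluster group), and a set of estimates can then be summarized by relative bias and coefficient of variation. The function names, the toy worker counts, and the exact formulas for bias and CV are assumptions for illustration; the abstract does not give the study's definitions.

```python
from collections import defaultdict


def cell_mean_impute(values, cells):
    """Replace each missing value (None) with the respondent mean of its cell."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, cell in zip(values, cells):
        if value is not None:
            totals[cell] += value
            counts[cell] += 1
    imputed = []
    for value, cell in zip(values, cells):
        if value is None and counts[cell] > 0:
            imputed.append(totals[cell] / counts[cell])
        else:
            imputed.append(value)  # observed value, or a cell with no respondents
    return imputed


def relative_bias(estimate, truth):
    """Signed error of an estimate as a fraction of the true value (assumed definition)."""
    return (estimate - truth) / truth


def coefficient_of_variation(estimates):
    """Sample standard deviation of repeated estimates divided by their mean (assumed definition)."""
    n = len(estimates)
    mean = sum(estimates) / n
    variance = sum((e - mean) ** 2 for e in estimates) / (n - 1)
    return variance ** 0.5 / mean


if __name__ == "__main__":
    # Hypothetical worker counts for five establishments in two cells.
    workers = [10.0, None, 30.0, 4.0, None]
    cells = ["large", "large", "large", "small", "small"]
    print(cell_mean_impute(workers, cells))
```

In a simulation like the one described, the imputation would be repeated over many randomly generated nonresponse patterns, with bias and CV computed from the resulting estimates.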

Journal Article

CONVEX MIXTURES IMPUTATION AND APPLICATIONS

by Liou, Michelle , Ning, Jianhui , Cheng, Philip E.

2019

Nearest neighbor regression and kernel regression have been discussed for decades as ways of imputing missing data in survey sampling. In this study, methods of regression imputation are examined for estimating the mean of an incomplete variable and for predicting unidentified objects in the data. Novel convex mixtures of these two regression imputation estimators are constructed to keep performance stable when the underlying missing data conditions are non-regular. In a simulation study of two typical non-regularity conditions, the mixture imputation is shown to yield improved estimation over the existing competitors. The performance of the convex mixtures imputation estimators in predicting unidentified classes is also examined using two data sets from the UCI Machine Learning Repository.

Journal Article

Link handling for the atmospheric turbulence using LSTM neural networks in free space optical (FSO) communication

by Vasava, Priteshkumar B. , Lapsiwala, Pranav B. in Adaptive optics , Artificial neural networks , Atmospheric attenuation

2024

Free-space optical (FSO) communication is an emerging technology that uses light waves to transmit data, providing a faster and more efficient alternative to traditional wired communication. However, FSO communication is susceptible to atmospheric turbulence caused by factors such as rain, snow, and fog. To overcome this challenge, this study employs artificial neural network (ANN) and long short-term memory (LSTM) models to analyze the impact of atmospheric turbulence on FSO communication. Using these models to analyze the attenuation of various wavelengths, the study finds that higher wavelengths experience less attenuation than lower wavelengths in the presence of fog. Additionally, the LSTM model outperforms the ANN model in handling atmospheric turbulence, with an accuracy of 64.68% compared to 63.98%. These findings highlight the need for adaptive networks that can quickly adjust to traffic situations while being cost-effective. As the fiber optics industry continues to expand and evolve, there is potential for further developments in optical communications that prioritize speed, efficiency, and flexibility. As technology advances, the pursuit of faster and more reliable communication will continue to drive innovation in this field.

Journal Article

A unified approach to linearization variance estimation from survey data after imputation for item nonresponse

by KIM, JAE KWANG , RAO, J. N. K. in Applications , Biology, psychology, social sciences , Composite imputation

2009

Variance estimation after imputation is an important practical problem in survey sampling. When deterministic imputation or stochastic imputation is used, we show that the variance of the imputed estimator can be consistently estimated by a unifying linearize and reverse approach. We provide some applications of the approach to regression imputation, fractional categorical imputation, multiple imputation and composite imputation. Results from a simulation study, under a factorial structure for the sampling, response and imputation mechanisms, show that the proposed linearization variance estimator performs well in terms of relative bias, assuming a missing at random response mechanism.

Journal Article

Data prediction for cases of incorrect data in multi-node electrocardiogram monitoring

by Nugroho, Heru , Erawati Rajab, Tati Latifah , Surendro, Kridanto in Damage , Data transmission , Electrocardiography

2022

The development of a mesh topology in multi-node electrocardiogram (ECG) monitoring based on the ZigBee protocol still has limitations. When more than one active ECG node sends a data stream, there will be incorrect data or damage due to a failure of synchronization. The incorrect data will affect signal interpretation. Therefore, a mechanism is needed to correct or predict the damaged data. In this study, the method of expectation-maximization (EM) and regression imputation (RI) was proposed to overcome these problems. Real data from previous studies are the main modalities used in this study. The ECG signal data that has been predicted is then compared with the actual ECG data stored in the main controller memory. Root mean square error (RMSE) is calculated to measure system performance. The simulation was performed on 13 ECG waves, each of them has 1000 samples. The simulation results show that the EM method has a lower predictive error value than the RI method. The average RMSE for the EM and RI methods is 4.77 and 6.63, respectively. The proposed method is expected to be used in the case of multi-node ECG monitoring, especially in the ZigBee application to minimize errors.

Journal Article

Empirical Likelihood-Based Inference under Imputation for Missing Response Data

by Wang, Qihua , Rao, J. N. K. in 62E20 , 62G05 , Approximation

2002

Inference under kernel regression imputation for missing response data is considered. An adjusted empirical likelihood approach to inference for the mean of the response variable is developed. A nonparametric version of Wilks' theorem is proved for the adjusted empirical log-likelihood ratio by showing that it has an asymptotic standard chi-squared distribution, and the corresponding empirical likelihood confidence interval for the mean is constructed. With auxiliary information, an empirical likelihood-based estimator is defined and an adjusted empirical log-likelihood ratio is derived. Asymptotic normality of the estimator is proved. Also, it is shown that the adjusted empirical log-likelihood ratio obeys Wilks' theorem. A simulation study is conducted to compare the adjusted empirical likelihood and the normal approximation methods in terms of coverage accuracies and average lengths of confidence intervals. Based on biases and standard errors, a comparison is also made by simulation between the empirical likelihood-based estimator and related estimators. Our simulation indicates that the adjusted empirical likelihood method performs competitively and that the use of auxiliary information provides improved inferences.

Journal Article
