Catalogue Search | MBRL
607 result(s) for "regression imputation"
Handling Incomplete Data with Regression Imputation
2021
Regression analysis is widely used in various fields because it is easy to understand. One purpose of regression analysis is to predict the response variable from the predictor variables. Unfortunately, in real cases some values may be missing, which can produce large errors and, consequently, poor predictions. Missing values force a trade-off between removing the affected data pairs and replacing the missing values. The purpose of this study was to estimate missing values with regression imputation. The study considered two scenarios for the proportion of missing values, 10% and 15%. The results showed that the higher the proportion of missing values, the higher the MSE.
Journal Article
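The regression imputation described in this abstract can be sketched as follows. This is a minimal illustration under several assumptions: there is a single predictor, the model is an ordinary least-squares line, and values are missing only in the response. The function names and data are hypothetical and are not taken from the study. MSE is computed only for entries whose true values are known, as in a simulation that deletes values on purpose.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Ordinary least-squares fit of y = a + b * x on the complete pairs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def regression_impute(x, y):
    """Replace missing y values (None) with predictions from a regression on x."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_simple_regression([p[0] for p in pairs], [p[1] for p in pairs])
    return [yi if yi is not None else a + b * xi for xi, yi in zip(x, y)]


def mse(true_values, imputed_values):
    """Mean squared error between held-out true values and their imputations."""
    return mean((t - p) ** 2 for t, p in zip(true_values, imputed_values))


x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, None, 8.0, 9.9]      # illustrative data; the third response is missing
completed = regression_impute(x, y)
print(completed[2])                 # imputed value, approximately 6.0
```

Observed values pass through unchanged, so only the originally missing entries contribute to the imputation error.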
Multiple Imputation
2018
Multiple imputation is a straightforward method for handling missing data in a principled fashion. This paper presents an overview of multiple imputation, including important theoretical results and their practical implications for generating and using multiple imputations. A review of strategies for generating imputations follows, including recent developments in flexible joint modeling and sequential regression/chained equations/fully conditional specification approaches. Finally, we compare and contrast different methods for generating imputations on a range of criteria before identifying promising avenues for future research.
Journal Article
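Analysing multiply imputed data usually ends with a combining step. The sketch below shows that step using the standard combining rules (Rubin's rules), which pool M completed-data estimates of a scalar quantity into one estimate and one total variance. The function name and numbers are illustrative assumptions. The sketch does not show how the imputations themselves are generated (for example, by chained equations).

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool M completed-data estimates of a scalar quantity with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    w = mean(within_variances)        # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (M - 1 denominator)
    total = w + (1 + 1 / m) * b       # total variance of the pooled estimate
    return q_bar, total


# Illustrative numbers only: M = 3 analyses of the same quantity.
q_bar, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, total)                   # 2.0 and roughly 1.8333
```

The (1 + 1/M) factor inflates the between-imputation variance to account for using a finite number of imputations.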
Reporting Proficiency Levels for Examinees With Incomplete Data
2022
Takers of educational tests often receive proficiency levels instead of or in addition to scaled scores. For example, proficiency levels are reported for the Advanced Placement (AP®) and U.S. Medical Licensing examinations. Technical difficulties and other unforeseen events occasionally lead to missing item scores and hence to incomplete data on these tests. Reporting proficiency levels to examinees with incomplete data requires estimating the examinees' performance on the missing part, which essentially involves imputation of missing data. In this article, six approaches from the literature on missing data analysis are brought to bear on the problem of reporting proficiency levels to examinees with incomplete data. Data from several large-scale educational tests are used to compare the performance of the six approaches with that of the approach operationally used for reporting proficiency levels on these tests. A multiple imputation approach based on chained equations is shown to lead to the most accurate reporting of proficiency levels for data that were missing at random or completely at random, while the model-based approach of Holman and Glas performed best for data that were missing not at random. Several recommendations are made on reporting proficiency levels to examinees with incomplete data.
Journal Article
An Integrated Intuitionistic Fuzzy-Clustering Approach for Missing Data Imputation
by Bridge-Nduwimana, Charlène Béatrice; El Ouaazizi, Aziza; Benyakhlef, Majid
in Accuracy; Algorithms; Analysis
2025
Missing data imputation is a critical preprocessing task that directly affects the quality and reliability of data-driven analyses. Yet many existing methods treat numerical and categorical data separately and do not integrate advanced techniques. To overcome these limitations, we propose a novel imputation technique that combines regression imputation using HistGradientBoostingRegressor with fuzzy rule-based systems, enhanced by a tailored clustering process. This integrated approach handles mixed data types and complex data structures using three components: regression models predict missing numerical values, fuzzy logic incorporates expert knowledge and interpretability, and clustering captures latent data patterns. Categorical variables are handled by mode imputation and label encoding. We evaluated the method on twelve tabular datasets with artificially introduced missingness, using a comprehensive set of metrics computed on the originally missing entries. The results show that our iterative imputer performs competitively with other established imputation techniques, achieving better or comparable error rates and accuracy. By combining statistical learning with fuzzy and clustering frameworks, the method achieves 15% lower Root Mean Square Error (RMSE), 10% lower Mean Absolute Error (MAE), and 80% higher precision on UCI datasets, offering a promising advance in data preprocessing for practical applications.
Journal Article
Statistical Evaluation of Item Nonresponse Methods Using the World Bank’s 2015 Philippines Enterprise Survey
The main objective of the study was to evaluate item nonresponse procedures through a simulation study at different nonresponse levels, or missing rates. The simulation explored how each procedure performs under a variety of response rates, conditions, and variable trends. For this purpose, the researcher evaluated methods for imputing missing data for the number of workers and the total cost of labor per establishment from the World Bank’s 2015 Enterprise Survey for the Philippines. These variables are among the major indicators for measuring productive labor and decent work in the country. The imputation methods considered were cell-mean imputation, random hot-deck, nearest neighbor, and simple regression. Their performance was evaluated in terms of bias (accuracy) and coefficient of variation (precision). Based on the results, cell-mean imputation was the most appropriate method for imputing missing values for the total number of workers and the total cost of labor per establishment. Because the study was limited to these variables, exploring other labor indicators is recommended. Exploring other choices of clustering groups is also highly recommended, since the clustering groups strongly affect the resulting imputation estimates. Other imputation techniques, such as multiple regression, and other parametric models for nonresponse, such as Bayes estimation, are also worth exploring. Finally, because the regression-based imputation in this study used only the cluster groupings, other variables related to the variable of interest should be used to verify the results.
Journal Article
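The cell-mean and random hot-deck methods compared in this study can be sketched as below. The sketch assumes that records are already assigned to imputation cells (here standing in for the cluster groupings) and that missing values are marked with None. Cell labels, values, and function names are hypothetical; the study's actual cell definitions are not reproduced here.

```python
import random
from collections import defaultdict
from statistics import mean


def donors_by_cell(cells, values):
    """Group observed (non-missing) values by their imputation cell."""
    pool = defaultdict(list)
    for cell, value in zip(cells, values):
        if value is not None:
            pool[cell].append(value)
    return pool


def cell_mean_impute(cells, values):
    """Fill each missing value with the mean of the observed values in its cell."""
    pool = donors_by_cell(cells, values)
    cell_means = {cell: mean(vs) for cell, vs in pool.items()}
    return [v if v is not None else cell_means[c] for c, v in zip(cells, values)]


def random_hotdeck_impute(cells, values, rng):
    """Fill each missing value with an observed value drawn at random from its cell."""
    pool = donors_by_cell(cells, values)
    return [v if v is not None else rng.choice(pool[c]) for c, v in zip(cells, values)]


# Hypothetical cells and worker counts, for illustration only.
cells = ["A", "A", "A", "B", "B"]
workers = [10, 20, None, 5, None]
print(cell_mean_impute(cells, workers))        # cell A gets 15, cell B gets 5
print(random_hotdeck_impute(cells, workers, random.Random(0)))
```

Cell-mean imputation gives every missing value in a cell the same number, while random hot-deck draws an actual observed value from the same cell and so preserves more of the spread.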
CONVEX MIXTURES IMPUTATION AND APPLICATIONS
2019
Nearest neighbor regression and kernel regression have been discussed for imputing missing data in survey sampling for decades. In this study, regression imputation methods are examined for estimating the mean of an incomplete variable and for predicting unidentified objects in the data. Novel convex mixtures of these two regression imputation estimators are constructed to maintain stable performance when the underlying missing data conditions are non-regular. A simulation study of two typical non-regularity conditions shows that the mixture imputation yields improved estimation compared with existing competitors. The performance of the convex mixture imputation estimators in predicting unidentified classes is also examined using two data sets from the UCI Machine Learning Repository.
Journal Article
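A rough sketch of mixing the two regression imputations follows. Each missing response is replaced by a convex combination of a nearest-neighbor prediction and a kernel (Nadaraya-Watson) prediction, and the mean is then estimated from the completed data. The fixed weight `lam`, the Gaussian kernel, and the bandwidth `h` are assumptions made for illustration. The abstract does not say how the paper constructs or tunes its mixtures.

```python
import math


def nn_predict(x0, xs, ys):
    """1-nearest-neighbor regression: response of the closest observed x."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def kernel_predict(x0, xs, ys, h):
    """Nadaraya-Watson regression with a Gaussian kernel of bandwidth h."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def convex_mixture_impute_mean(x, y, lam=0.5, h=1.0):
    """Impute missing y (None) with lam * NN + (1 - lam) * kernel, then average.

    lam and h are fixed illustrative choices, not the paper's tuning rule.
    """
    xs = [xi for xi, yi in zip(x, y) if yi is not None]
    ys = [yi for yi in y if yi is not None]
    completed = []
    for xi, yi in zip(x, y):
        if yi is None:
            yi = lam * nn_predict(xi, xs, ys) + (1 - lam) * kernel_predict(xi, xs, ys, h)
        completed.append(yi)
    return sum(completed) / len(completed), completed


# Hypothetical data: the third response is missing.
est_mean, filled = convex_mixture_impute_mean([0.0, 1.0, 2.8, 3.0], [0.0, 1.0, None, 3.0])
print(est_mean, filled)
```

Setting `lam=1` recovers pure nearest-neighbor imputation and `lam=0` pure kernel imputation, so the mixture interpolates between the two estimators.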
Link handling for the atmospheric turbulence using LSTM neural networks in free space optical (FSO) communication
by Vasava, Priteshkumar B.; Lapsiwala, Pranav B.
in Adaptive optics; Artificial neural networks; Atmospheric attenuation
2024
Free-space optical (FSO) communication is an emerging technology that uses light waves to transmit data, providing a faster and more efficient alternative to traditional wired communication. However, FSO communication is susceptible to atmospheric turbulence caused by factors such as rain, snow, and fog. To overcome this challenge, this study employs artificial neural network (ANN) and long short-term memory (LSTM) models to analyze the impact of atmospheric turbulence on FSO communication. The ANN and LSTM analyses of attenuation at various wavelengths in the presence of fog show that higher wavelengths experience less attenuation than lower wavelengths. Additionally, the LSTM model outperforms the ANN model in handling atmospheric turbulence, with an accuracy of 64.68% compared with 63.98%. These findings highlight the need for adaptive networks that can quickly adjust to traffic conditions while remaining cost-effective. As the fiber optics industry continues to expand and evolve, there is potential for further developments in optical communications that prioritize speed, efficiency, and flexibility. The pursuit of faster and more reliable communication will continue to drive innovation in this field.
Journal Article
A unified approach to linearization variance estimation from survey data after imputation for item nonresponse
by Kim, Jae Kwang; Rao, J. N. K.
in Applications; Biology, psychology, social sciences; Composite imputation
2009
Variance estimation after imputation is an important practical problem in survey sampling. When deterministic or stochastic imputation is used, we show that the variance of the imputed estimator can be consistently estimated by a unified "linearize and reverse" approach. We provide applications of the approach to regression imputation, fractional categorical imputation, multiple imputation and composite imputation. We also report a simulation study with a factorial structure for the sampling, response and imputation mechanisms. Its results show that the proposed linearization variance estimator performs well in terms of relative bias, assuming a missing at random response mechanism.
Journal Article
Data prediction for cases of incorrect data in multi-node electrocardiogram monitoring
by Nugroho, Heru; Erawati Rajab, Tati Latifah; Surendro, Kridanto
in Damage; Data transmission; Electrocardiography
2022
The development of a mesh topology for multi-node electrocardiogram (ECG) monitoring based on the ZigBee protocol still has limitations. When more than one ECG node actively sends a data stream, synchronization failures can produce incorrect or damaged data. Incorrect data affect signal interpretation, so a mechanism is needed to correct or predict the damaged data. In this study, the expectation-maximization (EM) and regression imputation (RI) methods were proposed to overcome these problems. The study mainly uses real data from previous studies. The predicted ECG signal data are compared with the actual ECG data stored in the main controller memory, and the root mean square error (RMSE) is calculated to measure system performance. The simulation was performed on 13 ECG waves, each with 1000 samples. The simulation results show that the EM method has a lower prediction error than the RI method, with average RMSE values of 4.77 for EM and 6.63 for RI. The proposed method is expected to be used in multi-node ECG monitoring, especially in ZigBee applications, to minimize errors.
Journal Article
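The evaluation metric used here can be sketched as follows: the RMSE between predicted and stored ECG samples. The sketch assumes that the reported averages are formed by averaging per-wave RMSE values across waves. The signals shown are hypothetical, and the EM and regression imputation steps themselves are not reproduced.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between reference samples and predicted samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def average_rmse(waves_actual, waves_predicted):
    """Mean of per-wave RMSE values (an assumed aggregation across waves)."""
    scores = [rmse(a, p) for a, p in zip(waves_actual, waves_predicted)]
    return sum(scores) / len(scores)


# Hypothetical samples: the stored reference signal versus one method's reconstruction.
reference = [0.0, 1.0, 2.0, 3.0]
reconstructed = [0.0, 1.0, 2.0, 5.0]
print(rmse(reference, reconstructed))   # sqrt(4 / 4) = 1.0
```

Computing the same average for each method's reconstructions gives directly comparable scores, where a lower value means a smaller prediction error.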
Empirical Likelihood-Based Inference under Imputation for Missing Response Data
2002
Inference under kernel regression imputation for missing response data is considered. An adjusted empirical likelihood approach to inference for the mean of the response variable is developed. A nonparametric version of Wilks' theorem is proved for the adjusted empirical log-likelihood ratio by showing that it has an asymptotic standard chi-squared distribution, and the corresponding empirical likelihood confidence interval for the mean is constructed. With auxiliary information, an empirical likelihood-based estimator is defined and an adjusted empirical log-likelihood ratio is derived. Asymptotic normality of the estimator is proved. It is also shown that the adjusted empirical log-likelihood ratio obeys Wilks' theorem. A simulation study is conducted to compare the adjusted empirical likelihood and the normal approximation methods in terms of coverage accuracies and average lengths of confidence intervals. Based on biases and standard errors, a comparison is also made by simulation between the empirical likelihood-based estimator and related estimators. Our simulation indicates that the adjusted empirical likelihood method performs competitively and that the use of auxiliary information provides improved inferences.
Journal Article