Catalogue Search | MBRL

by Bartlett, Jonathan W. , von Hippel, Paul T. in Bayesian analysis , Bootstrap method , Estimates

2021

Multiple imputation (MI) is a method for repairing and analyzing data with missing values. MI replaces missing values with a sample of random values drawn from an imputation model. The most popular form of MI, which we call posterior draw multiple imputation (PDMI), draws the parameters of the imputation model from a Bayesian posterior distribution. An alternative, which we call maximum likelihood multiple imputation (MLMI), estimates the parameters of the imputation model using maximum likelihood (or equivalent). Compared to PDMI, MLMI is faster and yields slightly more efficient point estimates. A past barrier to using MLMI was the difficulty of estimating the standard errors of MLMI point estimates. We derive, implement and evaluate three consistent standard error formulas: (1) one combines variances within and between the imputed datasets, (2) one uses the score function and (3) one uses the bootstrap with two imputations of each bootstrapped sample. Formula (1) modifies for MLMI a formula that has long been used under PDMI, while formulas (2) and (3) can be used without modification under either PDMI or MLMI. We have implemented MLMI and the standard error estimators in the mlmi and bootImpute packages for R.
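For orientation, the long-standing PDMI rule that formula (1) starts from combines within- and between-imputation variances (commonly known as Rubin's rules). The sketch below shows only that classical rule. It is not the MLMI modification derived in the article, and it is not the API of the mlmi or bootImpute packages; the function name `pool_rubin` and the example numbers are assumptions made for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M point estimates and their squared standard errors.

    Classical PDMI combining rule (Rubin's rules); hypothetical helper,
    not taken from the mlmi or bootImpute packages.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)            # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Illustrative numbers only: three imputed-data estimates with equal variances.
pooled_est, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
pooled_se = total_var ** 0.5
```

The pooled standard error is the square root of the total variance. Under MLMI this classical formula is not used unchanged, which is the gap the article's formula (1) addresses.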

Journal Article

Learning the Structure of Bayesian Networks from Incomplete Data Using a Mixture Model

by Vomlel, Jiří , Salman, Issam

2023

In this paper, we provide an approach to learning optimal Bayesian network (BN) structures from incomplete data based on the BIC score function using a mixture model to handle missing values. We have compared the proposed approach with other methods. Our experiments have been conducted on different models, some of them Belief Noisy-Or (BNO) ones. We have performed experiments using datasets with values missing completely at random having different missingness rates and data sizes. We have analyzed the significance of differences between the algorithm performance levels using the Wilcoxon test. The new approach typically learns additional edges in the case of Belief Noisy-Or models. We have analyzed this issue using the Chi-square test of independence between the variables in the true models; this approach reveals that additional edges can be explained by strong dependence in generated data. An important property of our new method for learning BNs from incomplete data is that it can learn not only optimal general BNs but also specific Belief Noisy-Or models, which are used in many applications, such as medical applications.

Journal Article

SEMIPARAMETRIC OPTIMAL ESTIMATION WITH NONIGNORABLE NONRESPONSE DATA

by Kim, Jae Kwang , Morikawa, Kosuke in Asymptotic methods , Asymptotic properties , Econometrics

2021

When the response mechanism is believed to be not missing at random (NMAR), a valid analysis requires stronger assumptions on the response mechanism than standard statistical methods would otherwise require. Semiparametric estimators have been developed under parametric model assumptions on the response mechanism. In this paper, a new statistical test is proposed to guarantee model identifiability without using an instrumental variable assumption. Furthermore, we develop optimal semiparametric estimation for parameters such as the population mean. Specifically, we propose two semiparametric optimal estimators that do not require any model assumptions other than the response mechanism. Asymptotic properties of the proposed estimators are discussed. An extensive simulation study is presented to compare the proposed estimators with some existing methods. We present an application of our method using Korean labor and income panel survey data.

Journal Article

A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

by Xue, Bing , Zhang, Mengjie , Al-Helali, Baligh in Artificial Intelligence , Computational Intelligence , Control

2021

Incompleteness is one of the problematic data quality challenges in real-world machine learning tasks. A large number of studies have been conducted for addressing this challenge. However, most of the existing studies focus on the classification task and only a limited number of studies for symbolic regression with missing values exist. In this work, a new imputation method for symbolic regression with incomplete data is proposed. The method aims to improve both the effectiveness and efficiency of imputing missing values for symbolic regression. This method is based on genetic programming (GP) and weighted K-nearest neighbors (KNN). It constructs GP-based models using other available features to predict the missing values of incomplete features. The instances used for constructing such models are selected using weighted KNN. The experimental results on real-world data sets show that the proposed method outperforms a number of state-of-the-art methods with respect to the imputation accuracy, the symbolic regression performance, and the imputation time.

Journal Article

Assessments of attrition bias in Cochrane systematic reviews are highly inconsistent and thus hindering trial comparability

by Puljak, Livia , Vuka, Ivana , Miosic, Ivana in Attrition bias , Bias , Clinical trials

2019

Background An important part of the systematic review methodology is appraisal of the risk of bias in included studies. Cochrane systematic reviews are considered the gold standard regarding systematic review methodology, but Cochrane’s instructions for assessing risk of attrition bias are vague, which may lead to inconsistencies in authors’ assessments. The aim of this study was to analyze consistency of judgments and support for judgments of attrition bias in Cochrane reviews of interventions published in the Cochrane Database of Systematic Reviews (CDSR). Methods We analyzed Cochrane reviews published from July 2015 to June 2016 in the CDSR. We extracted data on number of included trials, judgment of attrition risk of bias for each included trial (low, unclear or high) and accompanying support for the judgment (supporting explanation). We also assessed how many Cochrane reviews had different judgments for the same supporting explanations. Results In the main analysis we included 10,292 judgments and supporting explanations for attrition bias from 729 Cochrane reviews. We categorized supporting explanations for those judgments into four categories and we found that most of the supporting explanations were unclear. Numerical indicators for percent of attrition, as well as statistics related to attrition, were judged very differently. One third of Cochrane review authors had more than one category of supporting explanation; some had up to four different categories. Inconsistencies were found even with the number of judgments, names of risk of bias domains and different judgments for the same supporting explanations in the same Cochrane review. Conclusion We found very high inconsistency in methods of appraising risk of attrition bias in recent Cochrane reviews. Systematic review authors need clear guidance about different categories they should assess and judgments for those explanations. Clear instructions about appraising risk of attrition bias will improve reliability of the Cochrane’s risk of bias tool, help authors in making decisions about risk of bias and help in making reliable decisions in healthcare.

Journal Article

Fuzzy Model Identification Using Monolithic and Structured Approaches in Decision Problems with Partially Incomplete Data

by Sałabun, Wojciech , Shekhovtsov, Andrii , Kołodziejczyk, Joanna

2020

A significant challenge in current decision-making methods is the class of problems in which the decision-maker must decide based on partially incomplete data. Classic methods of multicriteria decision analysis are used to analyze alternatives described by numerical values. At the same time, fuzzy set modifications are usually used to include uncertain data in the decision-making process. However, data incompleteness is a different problem from uncertainty. In this paper, we show two approaches to identifying fuzzy models with partially incomplete data. The monolithic approach assumes creating one model that requires many queries to the expert. In the structured approach, the problem is decomposed into several interrelated models. The main aim of the work is to compare their accuracy empirically and to determine the sensitivity of the obtained model to the criteria used. For this purpose, a case study is presented. In order to compare the proposed approaches and analyze the significance of the decision criteria, we use two ranking similarity coefficients, i.e., symmetric rw and asymmetric WS. In this work, the limitations of each approach are presented, and the results show great similarity despite the use of two structurally different approaches. Finally, we show an example of calculations performed for alternatives with partially incomplete data.

Journal Article

Effective density-based clustering algorithms for incomplete data

by Xue, Zhonghao , Wang, Hongzhi in Algorithms , Big Data , Clustering

2021

Density-based clustering is an important category among clustering algorithms. In real applications, many datasets suffer from incompleteness. Traditional imputation technologies or other techniques for handling missing values are not suitable for density-based clustering and decrease clustering result quality. To avoid these problems, we develop a novel density-based clustering approach for incomplete data based on Bayesian theory, which conducts imputation and clustering concurrently and makes use of intermediate clustering results. To avoid the impact of low-density areas inside non-convex clusters, we introduce a local imputation clustering algorithm, which aims to impute points to high-density local areas. The performances of the proposed algorithms are evaluated using ten synthetic datasets and five real-world datasets with induced missing values. The experimental results show the effectiveness of the proposed algorithms.

Journal Article

Multiple imputation for patient reported outcome measures in randomised controlled trials: advantages and disadvantages of imputing at the item, subscale or composite score level

by Murray, David W. , Rombach, Ines , Gray, Alastair M. in Accuracy , Clinical trials , Data analysis

2018

Background Missing data can introduce bias in the results of randomised controlled trials (RCTs), but are typically unavoidable in pragmatic clinical research, especially when patient reported outcome measures (PROMs) are used. Traditionally applied to the composite PROMs score of multi-item instruments, some recent research suggests that multiple imputation (MI) at the item level may be preferable under certain scenarios. This paper presents practical guidance on the choice of MI models for handling missing PROMs data based on the characteristics of the trial dataset. The comparative performance of complete cases analysis, which is commonly used in the analysis of RCTs, is also considered. Methods Realistic missing at random data were simulated using follow-up data from an RCT considering three different PROMs (Oxford Knee Score (OKS), EuroQoL 5 Dimensions 3 Levels (EQ-5D-3L), 12-item Short Form Survey (SF-12)). Data were multiply imputed at the item (using ordinal logit and predicted mean matching models), sub-scale and score level; unadjusted mean outcomes, as well as treatment effects from linear regression models were obtained for 1000 simulations. Performance was assessed by root mean square errors (RMSE) and mean absolute errors (MAE). Results Convergence problems were observed for MI at the item level. Performance generally improved with increasing sample sizes and lower percentages of missing data. Imputation at the score and subscale level outperformed imputation at the item level in small sample sizes (n ≤ 200). Imputation at the item level is more accurate for high proportions of item-nonresponse. All methods provided similar results for large sample sizes (≥500) in this particular case study. Conclusions Many factors, including the prevalence of missing data in the study, sample size, the number of items within the PROM and numbers of levels within the individual items, and planned analyses need consideration when choosing an imputation model for missing PROMs data.
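The two performance measures named in the Methods, RMSE and MAE, can be sketched as follows. This is a minimal illustration assuming each simulation yields one estimate that is compared against a known true value; the function names and the numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def rmse(estimates, truth):
    """Root mean square error of simulated estimates around the true value."""
    return sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def mae(estimates, truth):
    """Mean absolute error of simulated estimates around the true value."""
    return sum(abs(e - truth) for e in estimates) / len(estimates)


# Illustrative numbers only: three simulated estimates of a true value of 2.0.
sim_estimates = [1.0, 2.0, 3.0]
sim_rmse = rmse(sim_estimates, 2.0)
sim_mae = mae(sim_estimates, 2.0)
```

RMSE penalises large errors more heavily than MAE, so reporting both gives a fuller picture of how an imputation approach performs across simulations.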

Journal Article

Alternative ways to handle missing values problem: A case study in earthquake dataset

by Pradana, Kenny Candra , Fakhruddin, Muhammad , Syazali, Muhamad in Data retrieval , Datasets , Earthquakes

2021

A dataset is a basic foundation for understanding a problem. It provides information that researchers use to find solutions to the problem. In the data retrieval process, errors may occur and leave the data incomplete. The problem is then how to recover the missing values in the dataset. The first step is to look at the characteristics of the data. In this paper, we propose three alternative ways to obtain the missing values of a dataset. As a case study, we use an earthquake dataset that has special properties. We then present the results to assess the performance of the proposed methods. The results show good agreement for the missing data. This is a preliminary result of our research related to missing data in the earthquake dataset. This study has some limitations: for example, if the missing values occur in a sufficiently large block of data, the methods need to be improved.

Journal Article

Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data

by Yuan, Lei , Ye, Jieping , Wang, Yalin in Accuracy , Aged , Algorithms

2012

Analysis of incomplete data is a big challenge when integrating large-scale brain imaging datasets from different imaging modalities. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), for example, over half of the subjects lack cerebrospinal fluid (CSF) measurements; an independent half of the subjects do not have fluorodeoxyglucose positron emission tomography (FDG-PET) scans; many lack proteomics measurements. Traditionally, subjects with missing measures are discarded, resulting in a severe loss of available information. In this paper, we address this problem by proposing an incomplete Multi-Source Feature (iMSF) learning method where all the samples (with at least one available data source) can be used. To illustrate the proposed approach, we classify patients from the ADNI study into groups with Alzheimer's disease (AD), mild cognitive impairment (MCI) and normal controls (NC), based on the multi-modality data. At baseline, ADNI's 780 participants (172 AD, 397 MCI, 211 NC) have at least one of four data types: magnetic resonance imaging (MRI), FDG-PET, CSF and proteomics. These data are used to test our algorithm. Depending on the problem being solved, we divide our samples according to the availability of data sources, and we learn shared sets of features with state-of-the-art sparse learning methods. To build a practical and robust system, we construct a classifier ensemble by combining our method with four other methods for missing value estimation. Comprehensive experiments with various parameters show that our proposed iMSF method and the ensemble model yield stable and promising results.

Journal Article
