670 result(s) for "Logistic regression analysis -- Data processing"
Logistic regression models
This text presents an overview of the full range of logistic models, including binary, proportional, ordered, and categorical response regression procedures. It illustrates how to apply the models to medical, health, environmental/ecological, physical, and social science data. Stata is used to develop, evaluate, and display most models while R code is given at the end of most chapters. The author examines the theoretical foundation of the models and describes how each type of model is established, interpreted, and evaluated as to its goodness of fit. Example data sets are available online in various formats and a solutions manual is available upon qualifying course adoption.
Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree
Preparation of landslide susceptibility maps is considered the first important step in landslide risk assessments, but these maps are accepted as an end product that can be used for land use planning. The main objective of this study is to explore some new state-of-the-art machine learning techniques and introduce a framework for training and validation of shallow landslide susceptibility models by using the latest statistical methods. The Son La hydropower basin (Vietnam) was selected as a case study. First, a landslide inventory map was constructed using the historical landslide locations from two national projects in Vietnam. A total of 12 landslide conditioning factors were then constructed from various data sources. Landslide locations were randomly split in a ratio of 70:30 for training and validating the models. To choose the best subset of conditioning factors, the predictive ability of the factors was assessed using the Information Gain Ratio with a 10-fold cross-validation technique. Factors with null predictive ability were removed to optimize the models. Subsequently, five landslide models were built using support vector machines (SVM), multi-layer perceptron neural networks (MLP Neural Nets), radial basis function neural networks (RBF Neural Nets), kernel logistic regression (KLR), and logistic model trees (LMT). The resulting models were validated and compared using the receiver operating characteristic (ROC), Kappa index, and several statistical evaluation measures. Additionally, Friedman and Wilcoxon signed-rank tests were applied to confirm significant statistical differences among the five machine learning models employed in this study. Overall, the MLP Neural Nets model has the highest prediction capability (90.2 %), followed by the SVM model (88.7 %), the KLR model (87.9 %), the RBF Neural Nets model (87.1 %), and the LMT model (86.1 %). Results revealed that both the KLR and the LMT models are promising methods for shallow landslide susceptibility mapping. The result from this study demonstrates the benefit of selecting the optimal machine learning techniques with a proper conditioning selection method in shallow landslide susceptibility mapping.
Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice.
Logistic regression model training based on the approximate homomorphic encryption
Background: Security concerns have been raised since big data became a prominent tool in data analysis. For instance, many machine learning algorithms aim to generate prediction models using training data which contain sensitive information about individuals. The cryptography community is considering secure computation as a solution for privacy protection. In particular, practical requirements have triggered research on the efficiency of cryptographic primitives. Methods: This paper presents a method to train a logistic regression model without information leakage. We apply the homomorphic encryption scheme of Cheon et al. (ASIACRYPT 2017) for efficient arithmetic over real numbers, and devise a new encoding method to reduce the storage of the encrypted database. In addition, we adapt Nesterov’s accelerated gradient method to reduce the number of iterations as well as the computational cost while maintaining the quality of the output classifier. Results: Our method shows state-of-the-art performance of a homomorphic encryption system in a real-world application. The submission based on this work was selected as the best solution of Track 3 at the iDASH privacy and security competition 2017. For example, it took about six minutes to obtain a logistic regression model given a dataset consisting of 1579 samples, each of which has 18 features with a binary outcome variable. Conclusions: We present a practical solution for outsourcing analysis tools such as logistic regression analysis while preserving data confidentiality.
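The optimizer named in the abstract above, Nesterov's accelerated gradient method, can be sketched on unencrypted data as follows. This is only a plaintext illustration and not the authors' encrypted implementation: the toy data, the learning rate, the momentum value, and every function name are assumptions made for the example.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gradient(w, X, y):
    """Gradient of the mean negative log-likelihood at weights w."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        for j, xj in enumerate(xi):
            g[j] += err * xj
    return [gj / len(X) for gj in g]


def train_nesterov(X, y, lr=0.5, momentum=0.9, iters=200):
    """Plaintext Nesterov accelerated gradient descent (no encryption).

    lr, momentum and iters are illustrative values, not taken from the paper.
    """
    w = [0.0] * len(X[0])
    v = [0.0] * len(w)  # velocity carried between iterations
    for _ in range(iters):
        # Evaluate the gradient at the look-ahead point, not at w itself.
        lookahead = [wj + momentum * vj for wj, vj in zip(w, v)]
        g = gradient(lookahead, X, y)
        v = [momentum * vj - lr * gj for vj, gj in zip(v, g)]
        w = [wj + vj for wj, vj in zip(w, v)]
    return w


# Hypothetical toy data: a bias column plus one feature; the label is 1 when x > 0.
random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(200)]
X = [[1.0, x] for x in xs]
y = [1 if x > 0 else 0 for x in xs]

w = train_nesterov(X, y)
accuracy = sum(
    (sigmoid(w[0] + w[1] * x) > 0.5) == bool(t) for x, t in zip(xs, y)
) / len(y)
```

The look-ahead step is what separates Nesterov's method from plain momentum: the gradient is taken where the velocity is about to carry the weights, which typically reduces the number of iterations needed.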
Improved naive Bayes classification algorithm for traffic risk management
The naive Bayes classification algorithm is widely used in big data analysis and other fields because of its simple and fast algorithm structure. To address the shortcomings of the naive Bayes classification algorithm, this paper uses feature weighting and Laplace calibration to obtain an improved naive Bayes classification algorithm. Numerical simulation shows that when the sample size is large, the accuracy of the improved algorithm is more than 99% and is very stable; when the number of sample attributes is less than 400 and the number of categories is less than 24, its accuracy is more than 95%. Empirical research shows that the improved algorithm can greatly improve the correct rate of discrimination analysis, from 49.5% to 92%. Robustness analysis shows that the improved naive Bayes classification algorithm has higher accuracy.
Digitalization and third-party logistics performance: exploring the roles of customer collaboration and government support
Purpose: The authors investigate how logistics digitalization affects two types of third-party logistics (3PL) performance: financial performance and service performance. In particular, the authors explore the mediating role of customer collaboration between logistics digitalization and firm performance based on organizational information processing theory and examine the moderating role of government support. Design/methodology/approach: The authors use an SPSS macro program (PROCESS regression analysis) to analyze survey data from 235 3PL firms in China. The mediation model, moderation model and moderated mediation model are tested. Findings: The empirical results show that in the new age of digitalization transformation, logistics digitalization positively affects 3PL's financial performance and service performance by strengthening customer collaboration. Additionally, government support amplifies the positive effect of customer collaboration on service performance but not financial performance. The moderated mediation test further indicates that government support strengthens the positive indirect effect of digitalization on service performance through customer collaboration. Originality/value: This study offers empirical insights into the growing body of 3PL literature, and the findings contribute to the theoretical and practical understanding of the emerging research topic of digital transformation (DT) and sustainability issues in 3PL firms.
Optimization of Causative Factors for Landslide Susceptibility Evaluation Using Remote Sensing and GIS Data in Parts of Niigata, Japan
This paper assesses the potential of certainty factor (CF) models for extracting the most suitable causative factors for landslide susceptibility mapping on Sado Island, Niigata Prefecture, Japan. To test the applicability of CF, a landslide inventory map provided by the National Research Institute for Earth Science and Disaster Prevention (NIED) was split into two subsets: (i) 70% of the landslides in the inventory, used for building the CF-based model; and (ii) 30% of the landslides, used for validation. A spatial database with fifteen landslide causative factors was then constructed by processing ALOS satellite images, aerial photos, and topographical and geological maps. The CF model was then applied to select the best subset from the fifteen factors. Using all fifteen factors and the best-subset factors, landslide susceptibility maps were produced using statistical index (SI) and logistic regression (LR) models. The susceptibility maps were validated and compared using landslide locations in the validation data. The prediction performance of the two susceptibility maps was estimated using the receiver operating characteristic (ROC). The results show that the area under the ROC curve (AUC) for the LR model (AUC = 0.817) is slightly higher than that obtained from the SI model (AUC = 0.801). Further, the SI and LR models using the best subset outperform the models using the fifteen original factors. Therefore, we conclude that the optimized factor model using CF is more accurate in predicting landslide susceptibility and yields a more homogeneous classification map. Our findings indicate that in mountainous regions suffering from data scarcity, it is possible to select key factors related to landslide occurrence based on CF models in a GIS platform. Hence, a scenario for future planning of risk mitigation can be developed efficiently.
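The area under the ROC curve used above to compare the LR and SI models can be read as the probability that a randomly chosen positive case (here, a landslide location) receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are made-up illustrative values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = landslide cell, 0 = non-landslide cell,
# scores = susceptibility values from some hypothetical model.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 pairs are ranked correctly
```

The quadratic pairwise loop is chosen for clarity; for large validation sets a rank-based computation gives the same value more efficiently.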
Combining instance-based learning and logistic regression for multilabel classification
Multilabel classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Recent research has shown that, just like for conventional classification, instance-based learning algorithms relying on the nearest neighbor estimation principle can be used quite successfully in this context. However, since hitherto existing algorithms do not take correlations and interdependencies between labels into account, their potential has not yet been fully exploited. In this paper, we propose a new approach to multilabel classification, which is based on a framework that unifies instance-based learning and logistic regression, comprising both methods as special cases. This approach allows one to capture interdependencies between labels and, moreover, to combine model-based and similarity-based inference for multilabel classification. As will be shown by experimental studies, our approach is able to improve predictive accuracy in terms of several evaluation criteria for multilabel prediction.
Prediction of premature all-cause mortality: A prospective general population cohort study comparing machine-learning and standard epidemiological approaches
Prognostic modelling using standard methods is well-established, particularly for predicting risk of single diseases. Machine-learning may offer potential to explore outcomes of even greater complexity, such as premature death. This study aimed to develop novel prediction algorithms using machine-learning, in addition to standard survival modelling, to predict premature all-cause mortality. A prospective population cohort of 502,628 participants aged 40-69 years were recruited to the UK Biobank from 2006-2010 and followed-up until 2016. Participants were assessed on a range of demographic, biometric, clinical and lifestyle factors. Mortality data by ICD-10 were obtained from linkage to Office of National Statistics. Models were developed using deep learning, random forest and Cox regression. Calibration was assessed by comparing observed to predicted risks; and discrimination by area under the 'receiver operating curve' (AUC). 14,418 deaths (2.9%) occurred over a total follow-up time of 3,508,454 person-years. A simple age and gender Cox model was the least predictive (AUC 0.689, 95% CI 0.681-0.699). A multivariate Cox regression model significantly improved discrimination by 6.2% (AUC 0.751, 95% CI 0.748-0.767). The application of machine-learning algorithms further improved discrimination by 3.2% using random forest (AUC 0.783, 95% CI 0.776-0.791) and 3.9% using deep learning (AUC 0.790, 95% CI 0.783-0.797). These ML algorithms improved discrimination by 9.4% and 10.1% respectively from a simple age and gender Cox regression model. Random forest and deep learning achieved similar levels of discrimination with no significant difference. Machine-learning algorithms were well-calibrated, while Cox regression models consistently over-predicted risk. Machine-learning significantly improved accuracy of prediction of premature all-cause mortality in this middle-aged population, compared to standard methods. This study illustrates the value of machine-learning for risk prediction within a traditional epidemiological study design, and how this approach might be reported to assist scientific verification.
Robust Mislabel Logistic Regression without Modeling Mislabel Probabilities
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression.