4 results for "Nourelahi, Mehdi"
Raising awareness of potential biases in medical machine learning: Experience from a Datathon
Objective: To challenge clinicians and informaticians to learn about potential sources of bias in medical machine learning models through investigation of data and predictions from an open-source severity-of-illness score. Methods: Over a two-day period (total elapsed time approximately 28 hours), we conducted a datathon that challenged interdisciplinary teams to investigate potential sources of bias in the Global Open Source Severity of Illness Score (GOSSIS-1). Teams were invited to develop hypotheses, to use tools of their choosing to identify potential sources of bias, and to provide a final report. Results: Five teams participated, three of which included both informaticians and clinicians. Most (4/5) used Python for analyses; the remaining team used R. Common analysis themes included the relationship of the GOSSIS-1 prediction score with demographic and care-related variables; relationships between demographics and outcomes; calibration and factors related to the context of care; and the impact of missingness. Representativeness of the population, differences in calibration and model performance among groups, and differences in performance across hospital settings were identified as possible sources of bias. Discussion: Datathons are a promising approach for challenging developers and users to explore questions relating to unrecognized biases in medical machine learning algorithms.
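One recurring analysis theme above, calibration among demographic groups, can be sketched as a per-group comparison of mean predicted risk against the observed event rate. The `calibration_by_group` helper and all data below are synthetic placeholders for illustration, not GOSSIS-1 predictions:

```python
# Sketch: per-group calibration-in-the-large for an illness-severity score.
# All data below are synthetic placeholders, not GOSSIS-1 output.

def calibration_by_group(preds, outcomes, groups):
    """Return {group: (mean predicted risk, observed event rate)}."""
    stats = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        mean_pred = sum(preds[i] for i in idx) / len(idx)
        obs_rate = sum(outcomes[i] for i in idx) / len(idx)
        stats[g] = (mean_pred, obs_rate)
    return stats

preds    = [0.10, 0.20, 0.30, 0.40, 0.15, 0.25]   # model risk predictions
outcomes = [0,    0,    1,    1,    0,    1]      # observed events
groups   = ["A",  "A",  "A",  "B",  "B",  "B"]    # demographic group labels

for g, (p, o) in sorted(calibration_by_group(preds, outcomes, groups).items()):
    print(f"group {g}: mean predicted {p:.2f}, observed {o:.2f}")
```

A large gap between predicted and observed rates in one group but not another is the kind of calibration difference the teams flagged as a possible source of bias.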
An Analysis of Explainability of Predictions in Deep Networks: Methods and Applications
Convolutional Neural Networks (CNNs) are pivotal in computer vision tasks, with their evaluation often centered around test-set accuracy, out-of-distribution performance, and explainability via feature attribution methods. However, the interplay between these criteria remains unclear. This study bridges this gap by conducting a comprehensive analysis across 12 ImageNet-trained CNNs, encompassing three training algorithms and five architectures. We evaluate nine feature attribution methods to elucidate their relationships and implications for machine learning practitioners. Our findings reveal insights into CNN performance across the evaluated criteria. Firstly, adversarially robust CNNs exhibit higher explainability scores with gradient-based attribution methods, contrasting with CAM-based or perturbation-based methods. Secondly, despite their high accuracy, AdvProp models do not consistently excel in explainability, highlighting a decoupling of these metrics. Thirdly, among the attribution methods, Grad-CAM and RISE consistently emerge as superior choices, underscoring their reliability across diverse CNN architectures. Moreover, our analysis exposes biases in attribution methods. For instance, the Insertion and Deletion methods show preferences towards vanilla and robust models, respectively, reflecting their alignment with CNN confidence score distributions. Furthermore, we explore the impact of saliency-based data augmentation on CNN performance in both vanilla and adversarial training settings. Through meticulous evaluations in a single-sample augmentation framework, we contrast methods that preserve versus remove salient regions. Our results demonstrate that saliency-based augmentation consistently outperforms random methods, substantiating its efficacy in enhancing CNN training. In conclusion, this study contributes a dual perspective: elucidating the intricate relationships between test-set accuracy, out-of-distribution performance, and explainability in CNNs, while also shedding light on the influential role of saliency-based data augmentation in improving CNN training outcomes. These findings provide actionable insights for ML practitioners, advocating for thoughtful selection of attribution methods and augmentation strategies tailored to specific application requirements and CNN architectures.
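The Deletion metric mentioned above can be sketched in a few lines: features are zeroed out from most to least attributed importance while the model's confidence is tracked; a faithful attribution makes confidence fall quickly. The toy linear "model" and hand-picked saliency values below are illustrative assumptions, not any of the nine attribution methods evaluated in the paper:

```python
# Sketch of the Deletion metric: remove features in order of attributed
# importance and track how the model's confidence falls.

def deletion_curve(model, x, saliency, baseline=0.0):
    """Confidence after zeroing features from most to least salient."""
    order = sorted(range(len(x)), key=lambda i: saliency[i], reverse=True)
    x = list(x)                  # copy so the caller's input is untouched
    curve = [model(x)]
    for i in order:
        x[i] = baseline
        curve.append(model(x))
    return curve

# Toy linear "classifier": confidence is a weighted sum of the inputs.
weights = [0.5, 0.3, 0.2]
model = lambda v: sum(w * val for w, val in zip(weights, v))

x = [1.0, 1.0, 1.0]
saliency = weights               # a perfectly faithful attribution here
print(deletion_curve(model, x, saliency))
```

The abstract's point about bias follows from this construction: because the curve is built from raw confidence values, models with systematically different confidence distributions (vanilla vs adversarially robust) score differently even for equally faithful attributions.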
Machine Learning Predicts Bleeding Risk in Atrial Fibrillation Patients on Direct Oral Anticoagulant
Highlights:
• ML models outperformed conventional scores in predicting major bleeding in AF.
• Random forest achieved an AUC of 0.76 vs HAS-BLED's AUC of 0.57 (p < 0.001).
• SHAP analysis identified new bleeding risk factors like BMI and cholesterol profile.
• Study included 24,468 AF patients on DOACs with a 5-year follow-up for bleeding events.
• ML models offer more personalized bleeding risk assessment for AF patients on DOACs.

Predicting major bleeding in nonvalvular atrial fibrillation (AF) patients on direct oral anticoagulants (DOACs) is crucial for personalized care. Alternatives like left atrial appendage closure devices lower stroke risk with fewer nonprocedural bleeds. This study compares machine learning (ML) models with conventional bleeding risk scores (HAS-BLED, ORBIT, and ATRIA) for predicting bleeding events requiring hospitalization in AF patients on DOACs at their index cardiologist visit. This retrospective cohort study used electronic health records from 2010 to 2022 at the University of Pittsburgh Medical Center. It included 24,468 nonvalvular AF patients (age ≥18) on DOACs, excluding those with prior significant bleeding or warfarin use. The primary outcome was hospitalization for bleeding within one year, with follow-up at one, two, and five years. ML algorithms (logistic regression, classification trees, random forest, XGBoost, k-nearest neighbor, naïve Bayes) were compared for performance. Of 24,468 patients, 553 (2.3%) had bleeding within one year, 829 (3.5%) within two years, and 1,292 (5.8%) within five years. ML models outperformed HAS-BLED, ATRIA, and ORBIT in 1-year predictions. The random forest model achieved an AUC of 0.76 (0.70 to 0.81), G-Mean of 0.67, and net reclassification index of 0.14 compared to HAS-BLED's AUC of 0.57 (p < 0.001). ML models showed superior results across all timepoints and for hemorrhagic stroke. SHAP analysis identified new risk factors, including BMI, cholesterol profile, and insurance type.
In conclusion, ML models demonstrated improved performance over conventional bleeding risk scores and uncovered novel risk factors, offering potential for more personalized bleeding risk assessment in AF patients on DOACs.
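The headline comparison here (random forest AUC 0.76 vs HAS-BLED 0.57) rests on the rank-based definition of AUC: the probability that a randomly chosen bleeding case is scored above a randomly chosen non-case, with ties counted as half. A minimal sketch of that computation follows; the scores and labels are synthetic, not study data:

```python
# Rank-based (Mann-Whitney) AUC: P(random positive scores above random
# negative), counting ties as 0.5. Scores below are synthetic examples.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels   = [1, 1, 0, 0, 0, 1, 0, 0]                      # 1 = bleeding event
ml_score = [0.9, 0.6, 0.4, 0.7, 0.2, 0.8, 0.1, 0.5]      # fine-grained model output
hasbled  = [3, 2, 2, 1, 1, 2, 0, 3]                      # coarse integer risk score

print(f"ML AUC:       {auc(ml_score, labels):.2f}")
print(f"HAS-BLED AUC: {auc(hasbled, labels):.2f}")
```

The tie handling matters for this comparison: coarse integer scores like HAS-BLED produce many tied pairs, which pulls their AUC toward 0.5, while a continuous model output rarely ties.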
How explainable are adversarially-robust CNNs?
Three important criteria of existing convolutional neural networks (CNNs) are (1) test-set accuracy; (2) out-of-distribution accuracy; and (3) explainability. While these criteria have been studied independently, their relationship is unknown. For example, do CNNs with stronger out-of-distribution performance also have stronger explainability? Furthermore, most prior feature-importance studies evaluate methods on only 2-3 common vanilla ImageNet-trained CNNs, leaving it unknown how these methods generalize to CNNs of other architectures and training algorithms. Here, we perform the first large-scale evaluation of the relationships among the three criteria using 9 feature-importance methods and 12 ImageNet-trained CNNs spanning 3 training algorithms and 5 CNN architectures. We find several important insights and recommendations for ML practitioners. First, adversarially robust CNNs have a higher explainability score on gradient-based attribution methods (but not CAM-based or perturbation-based methods). Second, AdvProp models, despite being more accurate than both vanilla and robust models, are not superior in explainability. Third, among the 9 feature attribution methods tested, GradCAM and RISE are consistently the best. Fourth, Insertion and Deletion are biased towards vanilla and robust models respectively, due to their strong correlation with the confidence score distributions of a CNN. Fifth, we did not find a single CNN to be the best in all three criteria, which interestingly suggests that CNNs become harder to interpret as they become more accurate.