Catalogue Search | MBRL

Distributionally robust learning-to-rank under the Wasserstein metric

by Chen, Ruidi; Sotudian, Shahabeddin; Paschalidis, Ioannis Ch in Algorithms, Analysis, Annotations

2023

Despite their satisfactory performance, most existing listwise Learning-To-Rank (LTR) models do not consider the crucial issue of robustness. A data set can be contaminated in various ways, including human error in labeling or annotation, distributional data shift, and malicious adversaries who wish to degrade the algorithm’s performance. It has been shown that Distributionally Robust Optimization (DRO) is resilient against various types of noise and perturbations. To fill this gap, we introduce a new listwise LTR model called Distributionally Robust Multi-output Regression Ranking (DRMRR). Different from existing methods, the scoring function of DRMRR was designed as a multivariate mapping from a feature vector to a vector of deviation scores, which captures local context information and cross-document interactions. In this way, we are able to incorporate the LTR metrics into our model. DRMRR uses a Wasserstein DRO framework to minimize a multi-output loss function under the most adverse distributions in the neighborhood of the empirical data distribution defined by a Wasserstein ball. We present a compact and computationally solvable reformulation of the min-max formulation of DRMRR. Our experiments were conducted on two real-world applications: medical document retrieval and drug response prediction, showing that DRMRR notably outperforms state-of-the-art LTR models. We also conducted an extensive analysis to examine the resilience of DRMRR against various types of noise: Gaussian noise, adversarial perturbations, and label poisoning. Accordingly, DRMRR is not only able to achieve significantly better performance than other baselines, but it can maintain a relatively stable performance as more noise is added to the data.

Journal Article

Detection of dementia on voice recordings using deep learning: a Framingham Heart Study

by Karjadi, Cody; Paschalidis, Ioannis Ch; Kolachalama, Vijaya B. in Accuracy, Alzheimer's disease, Artificial Intelligence in Dementia Research

2021

Background: Identification of reliable, affordable, and easy-to-use strategies for detection of dementia is sorely needed. Digital technologies, such as individual voice recordings, offer an attractive modality to assess cognition but methods that could automatically analyze such data are not readily available. Methods and findings: We used 1264 voice recordings of neuropsychological examinations administered to participants from the Framingham Heart Study (FHS), a community-based longitudinal observational study. The recordings were 73 min in duration, on average, and contained at least two speakers (participant and examiner). Of the total voice recordings, 483 were of participants with normal cognition (NC), 451 recordings were of participants with mild cognitive impairment (MCI), and 330 were of participants with dementia (DE). We developed two deep learning models (a two-level long short-term memory (LSTM) network and a convolutional neural network (CNN)), which used the audio recordings to classify whether the recording included a participant with only NC or only DE and to differentiate recordings of participants who had DE from those of participants who did not have DE (i.e., NDE (NC+MCI)). Based on 5-fold cross-validation, the LSTM model achieved a mean (±std) area under the receiver operating characteristic curve (AUC) of 0.740 ± 0.017, mean balanced accuracy of 0.647 ± 0.027, and mean weighted F1 score of 0.596 ± 0.047 in classifying cases with DE from those with NC. The CNN model achieved a mean AUC of 0.805 ± 0.027, mean balanced accuracy of 0.743 ± 0.015, and mean weighted F1 score of 0.742 ± 0.033 in classifying cases with DE from those with NC. For the task related to the classification of participants with DE from NDE, the LSTM model achieved a mean AUC of 0.734 ± 0.014, mean balanced accuracy of 0.675 ± 0.013, and mean weighted F1 score of 0.671 ± 0.015. The CNN model achieved a mean AUC of 0.746 ± 0.021, mean balanced accuracy of 0.652 ± 0.020, and mean weighted F1 score of 0.635 ± 0.031 in classifying cases with DE from those who were NDE. Conclusion: This proof-of-concept study demonstrates that automated deep learning-driven processing of audio recordings of neuropsychological testing performed on individuals recruited within a community cohort setting can facilitate dementia screening.

Journal Article

Physiological and socioeconomic characteristics predict COVID-19 mortality and resource utilization in Brazil

by Paschalidis, Ioannis Ch; Silva, Amanda A. B.; Fleck, Julia L. in Brazil, Chronic illnesses, Comorbidity

2020

Given the severity and scope of the current COVID-19 pandemic, it is critical to determine predictive features of COVID-19 mortality and medical resource usage to effectively inform health, risk-based physical distancing, and work accommodation policies. Non-clinical sociodemographic features are important explanatory variables of COVID-19 outcomes, revealing existing disparities in large health care systems. We use nation-wide multicenter data of COVID-19 patients in Brazil to predict mortality and ventilator usage. The dataset contains hospitalized patients who tested positive for COVID-19 and had either recovered or were deceased between March 1 and June 30, 2020. A total of 113,214 patients, of whom 50,387 were deceased, were included. Both interpretable (sparse versions of Logistic Regression and Support Vector Machines) and state-of-the-art non-interpretable (Gradient Boosted Decision Trees and Random Forest) classification methods are employed. Death from COVID-19 was strongly associated with demographics, socioeconomic factors, and comorbidities. Variables highly predictive of mortality included geographic location of the hospital (OR = 2.2 for Northeast region, OR = 2.1 for North region); renal (OR = 2.0) and liver (OR = 1.7) chronic disease; immunosuppression (OR = 1.7); obesity (OR = 1.7); neurological (OR = 1.6), cardiovascular (OR = 1.5), and hematologic (OR = 1.2) disease; diabetes (OR = 1.4); chronic pneumopathy (OR = 1.4); immunosuppression (OR = 1.3); respiratory symptoms, ranging from respiratory discomfort (OR = 1.4) and dyspnea (OR = 1.3) to oxygen saturation less than 95% (OR = 1.7); hospitalization in a public hospital (OR = 1.2); and self-reported patient illiteracy (OR = 1.1). Validation accuracies (AUC) for predicting mortality and ventilation need reach 79% and 70%, respectively, when using only pre-admission variables. Models that use post-admission disease progression information reach accuracies (AUC) of 86% and 87% for predicting mortality and ventilation use, respectively. The results highlight the predictive power of socioeconomic information in assessing COVID-19 mortality and medical resource allocation, and shed light on existing disparities in the Brazilian health care system during the COVID-19 pandemic.

Journal Article

The impact of payer status on hospital admissions: evidence from an academic medical center

by Hu, Jianqiang; Zhao, Yanying; Paschalidis, Ioannis Ch in Admission and discharge, Demographic aspects, economics and financing systems

2021

Background: Many studies have investigated disparities by payer status in access to care. However, most studies are either disease-specific or cohort-specific. Large controlled studies that quantify the disparity at the facility level are rare. This study aims to examine how payer status affects patient hospitalization from the perspective of a facility. Methods: We extracted all patients with a visit record at a medical center between 5/1/2009 and 4/30/2014, and then linked to each patient the outpatient and inpatient records from the three years before the target admission time. We conducted a retrospective observational study using a conditional logistic regression methodology. To control for the illness of patients with different diseases in training the model, we constructed a three-dimensional variable using a data stratification technique. The model is validated on a dataset distinct from the one used for training. Results: Patients who are covered by private insurance or are uninsured are less likely to be hospitalized than patients insured by the government. For uninsured patients, inequity in access to hospitalization is observed. The values of the standardized coefficients indicate that government-sponsored insurance has the greatest impact on improving patients’ hospitalization. Conclusion: Attention is needed on improving access to care for uninsured patients. Also, basic preventive care services should be enhanced, especially for people insured by the government. The findings can serve as a baseline from which to measure the anticipated effect of measures to reduce disparity by payer status in hospitalization.

Journal Article

Personalized hypertension treatment recommendations by a data-driven model

by Cordella, Nicholas; Paschalidis, Ioannis Ch; Mishuris, Rebecca G. in Antihypertensives, Beta blockers, Blood pressure

2023

Background: Hypertension is a prevalent cardiovascular disease with severe longer-term implications. Conventional management based on clinical guidelines does not facilitate personalized treatment that accounts for a richer set of patient characteristics. Methods: Records from 1/1/2012 to 1/1/2020 at the Boston Medical Center were used, selecting patients with either a hypertension diagnosis or meeting diagnostic criteria (≥ 130 mmHg systolic or ≥ 90 mmHg diastolic, n = 42,752). Models were developed to recommend a class of antihypertensive medications for each patient based on their characteristics. Regression immunized against outliers was combined with a nearest neighbor approach to associate with each patient an affinity group of other patients. This group was then used to make predictions of future Systolic Blood Pressure (SBP) under each prescription type. For each patient, we leveraged these predictions to select the class of medication that minimized their future predicted SBP. Results: The proposed model, built with a distributionally robust learning procedure, leads to a reduction of 14.28 mmHg in SBP, on average. This reduction is 70.30% larger than the reduction achieved by the standard-of-care and 7.08% better than the corresponding reduction achieved by the second-best model, which uses ordinary least squares regression. All derived models outperform following the previous prescription or the current ground truth prescription in the record. We randomly sampled and manually reviewed 350 patient records; 87.71% of these model-generated prescription recommendations passed a sanity check by clinicians. Conclusion: Our data-driven approach for personalized hypertension treatment yielded significant improvement compared to the standard-of-care. The model implied potential benefits of computationally deprescribing and can support situations with clinical equipoise.

Journal Article

Early prediction of level-of-care requirements in patients with COVID-19

by Breen, Kerry; Paschalidis, Ioannis Ch; Hao, Boran in Adult, Aged, Area Under Curve

2020

This study examined records of 2566 consecutive COVID-19 patients at five Massachusetts hospitals and sought to predict level-of-care requirements based on clinical and laboratory data. Several classification methods were applied and compared against standard pneumonia severity scores. The need for hospitalization, ICU care, and mechanical ventilation was predicted with validation accuracies of 88%, 87%, and 86%, respectively. Pneumonia severity scores achieve respective accuracies of 73% and 74% for ICU care and ventilation. When predictions are limited to patients with more complex disease, the ICU and ventilation prediction models achieved accuracies of 83% and 82%, respectively. Vital signs, age, BMI, dyspnea, and comorbidities were the most important predictors of hospitalization. Opacities on chest imaging, age, admission vital signs and symptoms, male gender, admission laboratory results, and diabetes were the most important risk factors for ICU admission and mechanical ventilation. The factors identified collectively form a signature of the novel COVID-19 disease.

The new coronavirus (now named SARS-CoV-2) causing the disease pandemic in 2019 (COVID-19), has so far infected over 35 million people worldwide and killed more than 1 million. Most people with COVID-19 have no symptoms or only mild symptoms. But some become seriously ill and need hospitalization. The sickest are admitted to an Intensive Care Unit (ICU) and may need mechanical ventilation to help them breathe. Being able to predict which patients with COVID-19 will become severely ill could help hospitals around the world manage the huge influx of patients caused by the pandemic and save lives. Now, Hao, Sotudian, Wang, Xu et al. show that computer models using artificial intelligence technology can help predict which COVID-19 patients will be hospitalized, admitted to the ICU, or need mechanical ventilation. Using data from 2,566 COVID-19 patients from five Massachusetts hospitals, Hao et al. created three separate models that can predict hospitalization, ICU admission, and the need for mechanical ventilation with more than 86% accuracy, based on patient characteristics, clinical symptoms, laboratory results and chest x-rays. Hao et al. found that the patients’ vital signs, age, obesity, difficulty breathing, and underlying diseases like diabetes, were the strongest predictors of the need for hospitalization. Being male, having diabetes, cloudy chest x-rays, and certain laboratory results were the most important risk factors for intensive care treatment and mechanical ventilation. Laboratory results suggesting tissue damage, severe inflammation or oxygen deprivation in the body's tissues were important warning signs of severe disease. The results provide a more detailed picture of the patients who are likely to suffer from severe forms of COVID-19. Using the predictive models may help physicians identify patients who appear okay but need closer monitoring and more aggressive treatment. The models may also help policy makers decide who needs workplace accommodations such as being allowed to work from home, which individuals may benefit from more frequent testing, and who should be prioritized for vaccination when a vaccine becomes available.

Journal Article

Prescriptive analytics for reducing 30-day hospital readmissions after general surgery

by Bertsimas, Dimitris; Paschalidis, Ioannis Ch; Li, Michael Lingzhi in Accuracy, Biology and Life Sciences, Blood

2020

New financial incentives, such as reduced Medicare reimbursements, have led hospitals to closely monitor their readmission rates and initiate efforts aimed at reducing them. In this context, many surgical departments participate in the American College of Surgeons National Surgical Quality Improvement Program (NSQIP), which collects detailed demographic, laboratory, clinical, procedure and perioperative occurrence data. The availability of such data enables the development of data science methods which predict readmissions and, as done in this paper, offer specific recommendations aimed at preventing readmissions. This study leverages NSQIP data for 722,101 surgeries to develop predictive and prescriptive models, predicting readmissions and offering real-time, personalized treatment recommendations for surgical patients during their hospital stay, aimed at reducing the risk of a 30-day readmission. We applied a variety of classification methods to predict 30-day readmissions and developed two prescriptive methods to recommend pre-operative blood transfusions to increase the patient's hematocrit with the objective of preventing readmissions. The effect of these interventions was evaluated using several predictive models. Predictions of 30-day readmissions based on the entire collection of NSQIP variables achieve an out-of-sample accuracy of 87% (Area Under the Curve-AUC). Predictions based only on pre-operative variables have an accuracy of 74% AUC, out-of-sample. Personalized interventions, in the form of pre-operative blood transfusions identified by the prescriptive methods, reduce readmissions by 12%, on average, for patients considered as candidates for pre-operative transfusion (pre-operative hematocrit <30). The prediction accuracy of the proposed models exceeds results in the literature. This study is among the first to develop a methodology for making specific, data-driven, personalized treatment recommendations to reduce the 30-day readmission rate. The reported predicted reduction in readmissions can lead to more than $20 million in savings in the U.S. annually.

Journal Article

Informative predictors of pregnancy after first IVF cycle using eIVF practice highway electronic health records

by Mahalingaiah, Shruthi; Hammer, Karissa C.; Paschalidis, Ioannis Ch in 639/705/1042, 692/699/2732/1577, Adult

2022

The aim of this study is to determine the most informative pre- and in-cycle variables for predicting success for a first autologous oocyte in-vitro fertilization (IVF) cycle. This is a retrospective study using 22,413 first autologous oocyte IVF cycles from 2001 to 2018. Models were developed to predict pregnancy following an IVF cycle with a fresh embryo transfer. The importance of each variable was determined by its coefficient in a logistic regression model and the prediction accuracy based on different variable sets was reported. The area under the receiver operating characteristic curve (AUC) on a validation patient cohort was the metric for prediction accuracy. Three factors were found to be of importance when predicting IVF success: age in three groups (38–40, 41–42, and above 42 years old), number of transferred embryos, and number of cryopreserved embryos. For predicting first-cycle IVF pregnancy using all available variables, the predictive model achieved an AUC of 68% ± 0.01%. A parsimonious predictive model utilizing age (38–40, 41–42, and above 42 years old), number of transferred embryos, and number of cryopreserved embryos achieved an AUC of 65% ± 0.01%. The proposed models accurately predict a single IVF cycle pregnancy outcome and identify important predictive variables associated with the outcome. These models are limited to predicting pregnancy immediately after the IVF cycle and not live birth. These models do not include indicators of multiple gestation and are not intended for clinical application.

Journal Article

Accounting for racial bias and social determinants of health in a model of hypertension control

by Cordella, Nicholas; Paschalidis, Ioannis Ch; Mishuris, Rebecca G. in Adult, Aged, Analysis

2025

Background: Hypertension control remains a critical problem and most of the existing literature views it from a clinical perspective, overlooking the role of sociodemographic factors. This study aims to identify patients whose hypertension is not well controlled using readily available demographic and socioeconomic features and elucidate important predictive variables. Methods: In this retrospective cohort study, records from 1/1/2012 to 1/1/2020 at the Boston Medical Center were used. Patients with either a hypertension diagnosis or related records (≥ 130 mmHg systolic or ≥ 90 mmHg diastolic, n = 164,041) were selected. Models were developed to predict which patients had uncontrolled hypertension defined as systolic blood pressure (SBP) records exceeding 160 mmHg. Results: The predictive model of high SBP reached an Area Under the Receiver Operating Characteristic Curve of 74.49% ± 0.23%. Age, race, Social Determinants of Health (SDoH), mental health, and cigarette use were predictive of high SBP. Being Black or having critical social needs led to higher probability of uncontrolled SBP. To mitigate model bias and elucidate differences in predictive variables, two separate models were trained for Black and White patients. Black patients face a 4.7 × higher False Positive Rate (FPR) and a 0.58 × lower False Negative Rate (FNR) compared to White patients. Decision threshold differentiation was implemented to equalize FNR. Race-specific models revealed different sets of social variables predicting high SBP, with Black patients being affected by structural barriers (e.g., food and transportation) and White patients by personal and demographic factors (e.g., marital status). Conclusions: Models using non-clinical factors can predict which patients exhibit poorly controlled hypertension. Racial and SDoH variables are significant predictors but lead to biased predictive models. Race-specific models are not sufficient to resolve such biases and require further decision threshold tuning. A host of structural socioeconomic factors are identified to be targeted to reduce disparities in hypertension control.

Journal Article

Social determinants of health and the prediction of missed breast imaging appointments

by Paschalidis, Ioannis Ch; Fishman, Michael D. C.; Sotudian, Shahabeddin in Algorithms, Analysis, Biopsy

2022

Background: Predictive models utilizing social determinants of health (SDH), demographic data, and local weather data were trained to predict missed imaging appointments (MIA) among breast imaging patients at the Boston Medical Center (BMC). Patients were characterized by many different variables, including social needs, demographics, imaging utilization, appointment features, and weather conditions on the date of the appointment. Methods: This HIPAA-compliant retrospective cohort study was IRB approved. Informed consent was waived. After data preprocessing steps, the dataset contained 9,970 patients and 36,606 appointments from 1/1/2015 to 12/31/2019. We identified 57 potentially impactful variables used in the initial prediction model and assessed each patient for MIA. We then developed a parsimonious model via recursive feature elimination, which identified the 25 most predictive variables. We utilized linear and non-linear models including support vector machines (SVM), logistic regression (LR), and random forest (RF) to predict MIA and compared their performance. Results: The highest-performing full model is the nonlinear RF, achieving the highest Area Under the ROC Curve (AUC) of 76% and average F1 score of 85%. Models limited to the most predictive variables were able to attain AUC and F1 scores comparable to models with all variables included. The variables most predictive of missed appointments included timing, prior appointment history, referral department of origin, and socioeconomic factors such as household income and access to caregiving services. Conclusions: Prediction of MIA with the data available is inherently limited by the complex, multifactorial nature of MIA. However, the algorithms presented achieved acceptable performance and demonstrated that socioeconomic factors were useful predictors of MIA. In contrast with non-modifiable demographic factors, we can address SDH to decrease the incidence of MIA.

Journal Article
