Catalogue Search | MBRL

Semi-supervised approaches to efficient evaluation of model prediction performance

by Cai, Tianxi , Gronsbell, Jessica L. in Algorithms , Arthritis , artificial intelligence

2018

In many modern machine learning applications, the outcome is expensive or time consuming to collect whereas the predictor information is easy to obtain. Semi-supervised (SS) learning aims at utilizing large amounts of ‘unlabelled’ data along with small amounts of ‘labelled’ data to improve the efficiency of a classical supervised approach. Though numerous SS learning classification and prediction procedures have been proposed in recent years, no methods currently exist to evaluate the prediction performance of a working regression model. In the context of developing phenotyping algorithms derived from electronic medical records, we present an efficient two-step estimation procedure for evaluating a binary classifier based on various prediction performance measures in the SS setting. In step I, the labelled data are used to obtain a non-parametrically calibrated estimate of the conditional risk function. In step II, SS estimates of the prediction accuracy parameters are constructed based on the estimated conditional risk function and the unlabelled data. We demonstrate that, under mild regularity conditions, the estimators proposed are consistent and asymptotically normal. Importantly, the asymptotic variance of the SS estimators is always smaller than that of the supervised counterparts under correct model specification. We also correct for potential overfitting bias in the SS estimators in finite samples with cross-validation and we develop a perturbation resampling procedure to approximate their distributions. Our proposals are evaluated through extensive simulation studies and illustrated with two real electronic medical record studies aiming to develop phenotyping algorithms for rheumatoid arthritis and multiple sclerosis.
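The two-step idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian-kernel smoother, the fixed bandwidth, the simulated data, and the function names `calibrated_risk` and `ss_tpr_fpr` are all assumptions made for illustration, and the cross-validation bias correction and perturbation resampling mentioned in the abstract are omitted. The scores stand in for the output of a working regression model.

```python
import numpy as np

def calibrated_risk(scores_lab, y_lab, scores_new, bandwidth):
    """Step I (assumed form): kernel-smoothed estimate of P(Y=1 | score)
    fitted on the labelled data and evaluated at new scores."""
    diffs = (scores_new[:, None] - scores_lab[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)          # Gaussian kernel weights
    return weights @ y_lab / weights.sum(axis=1)

def ss_tpr_fpr(scores_lab, y_lab, scores_unlab, cutoff, bandwidth=0.3):
    """Step II (assumed form): plug the estimated risk into the true and
    false positive rates, averaging over the unlabelled scores."""
    risk = calibrated_risk(scores_lab, y_lab, scores_unlab, bandwidth)
    positive = scores_unlab >= cutoff           # classifier says "case"
    tpr = np.sum(risk * positive) / np.sum(risk)
    fpr = np.sum((1 - risk) * positive) / np.sum(1 - risk)
    return tpr, fpr

# Simulated data only: a small labelled set and a large unlabelled set.
rng = np.random.default_rng(0)

def simulate(n):
    s = rng.normal(size=n)                      # score from a working model
    y = rng.binomial(1, 1 / (1 + np.exp(-2 * s)))
    return s, y

s_lab, y_lab = simulate(200)
s_unlab, _ = simulate(20000)                    # outcomes discarded: unlabelled
tpr, fpr = ss_tpr_fpr(s_lab, y_lab, s_unlab, cutoff=0.0)
```

The labels enter only through the smoother in step I, while the large unlabelled sample supplies the averaging in step II.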

Journal Article

Changes in the top 25 reasons for primary care visits during the COVID-19 pandemic in a high-COVID region of Canada

by O’Neill, Braden , Butt, Debra A. , Tu, Karen in Anxiety , Canada , Chronic illnesses

2021

We aimed to determine the degree to which reasons for primary care visits changed during the COVID-19 pandemic. We used data from the University of Toronto Practice Based Research Network (UTOPIAN) to compare the most common reasons for primary care visits before and after the onset of the COVID-19 pandemic, focusing on the number of visits and the number of patients seen for each of the 25 most common diagnostic codes. The proportion of visits involving virtual care was assessed as a secondary outcome. UTOPIAN family physicians (N = 379) conducted 702,093 visits, involving 264,942 patients between March 14 and December 31, 2019 (pre-pandemic period), and 667,612 visits, involving 218,335 patients between March 14 and December 31, 2020 (pandemic period). Anxiety was the most common reason for visit, accounting for 9.2% of the total visit volume during the pandemic compared to 6.5% the year before. Diabetes and hypertension remained among the top 5 reasons for visit during the pandemic, but there were 23.7% and 26.2% fewer visits and 19.5% and 28.8% fewer individual patients accessing care for diabetes and hypertension, respectively. Preventive care visits were substantially reduced, with 89.0% fewer periodic health exams and 16.2% fewer well-baby visits. During the pandemic, virtual care became the dominant care format (77.5% virtual visits). Visits for anxiety and depression were the most common reasons for a virtual visit (90.6% virtual visits). The decrease in primary care visit volumes during the COVID-19 pandemic varied based on the reason for the visit, with increases in visits for anxiety and decreases for preventive care and visits for chronic diseases. Implications of increased demands for mental health services and gaps in preventive care and chronic disease management may require focused efforts in primary care.

Journal Article

Trends in pulmonary exercise testing utilization after the COVID-19 pandemic in Ontario: A population-cohort study

by Moineddin, Rahim , O’Neill, Braden , Butt, Debra A. in Adult , Aged , Asthma

2026

Pulmonary exercise testing, including six-minute walk tests, exercise oximetry, and independent exercise assessments, is a critical tool for managing chronic respiratory and cardiac conditions, evaluating treatment response, and determining long-term oxygen therapy needs. During the COVID-19 pandemic, testing was reduced to limit viral spread. This study aimed to evaluate post-pandemic trends of pulmonary exercise testing utilization in Ontario overall and across demographic groups. We conducted a population-based cohort study using Ontario administrative data between April 2015 and December 2023 to evaluate pulmonary exercise testing before, during, and after the COVID-19 pandemic. We used an autoregressive integrated moving average (ARIMA) model and incidence rate ratios (IRRs) to evaluate recovery trends. Subgroup analysis examined whether trends were similar across groups. During the study period, 505,902 tests were performed for 362,888 individuals. As of December 2023, testing rates were still 21% below pre-pandemic levels (IRR 0.79, 95% CI 0.70-0.89). Recovery was lower in males (IRR 0.76, 95% CI 0.66-0.86) and individuals living in lower socioeconomic status neighborhoods (IRR 0.71, 95% CI 0.58-0.86). Northern Ontario saw the most pronounced shortfall compared to other regions, with testing rates one-third of pre-pandemic levels (IRR 0.33, 95% CI 0.26-0.43). More than three years after the pandemic began, pulmonary exercise testing rates have yet to return to pre-pandemic levels, with certain groups disproportionately affected. This highlights a significant and ongoing disruption in diagnostic capacity and quality of care for people with respiratory and cardiac diseases.

Journal Article

Sociodemographic and Health Behaviour of Frequent, Avoidable Emergency Department Users in Ontario, Canada: A Population-based Descriptive Study

by Schull, Michael J , Thompson, Cameron , Rosella, Laura CA in Adolescent , Adult , Aged

2025

Introduction: Frequent users are a small but important group of patients in the emergency department (ED). This group is often the target of interventions that redirect visits to other areas of the healthcare system under the premise that some of these visits could be best managed elsewhere. Most existing interventions do not consider sociodemographic factors when targeting specific populations, while larger scale policy initiatives often do not reach those who would most benefit from alternative points of healthcare access. In this study we use population-level survey data linked to health administrative data to describe frequent ED users and those whose visits are potentially avoidable and could benefit from additional points of healthcare access. Methods: This was a population-based cohort study of responses from 18-74 year-old Ontario residents to the Canadian Community Health Survey from 2001–2014, which we linked to administrative health data for one-year following survey completion. We categorized participants according to the frequency of their ED use in the year following survey date and whether any of their visits were potentially avoidable. Associations between category of ED use and various sociodemographic, health, and behavioural factors were examined with multinomial logistic regression. Results: A total of 181,369 eligible respondents were included in this study. Of these, 1,460 (0.8%) were frequent users (four or more visits) with one or more potentially avoidable visits in the year following survey date. Compared to non-ED users, frequent users with avoidable visits were associated with the lowest quintile of household income (aOR: 1.91, 95% CI: 1.37, 2.65), rural-dwelling (aOR: 1.44, 95% CI: 1.18, 1.77), and the highest quintile of material resource deprived neighbourhoods (aOR: 2.23, 95% CI: 1.47, 3.36). 
They were more likely to have poor self-reported physical (17.2% vs 9.0%) and mental health (4.1% vs 2.7%) compared to total cohort, and more likely to have comorbidities (63.3% vs 48.7%), but less likely to access a usual provider of care for their healthcare needs (33.3% vs 28.2% without a usual provider of care). Conclusion: This study provides a novel description of frequent ED users for whom some of their visits were potentially avoidable. As efforts are made to redesign access to primary and community care, and with increasing emphasis on virtual care and other initiatives to reduce avoidable ED use, the healthcare system should ensure that these interventions are responsive to the needs of the people at higher likelihood of needing them.

Journal Article

Changes in primary care visits arising from the COVID-19 pandemic: an international comparative study by the International Consortium of Primary Care Big Data Researchers (INTRePID)

by Wong, William CW , Kim, Young Sik , Kristiansson, Robert Sarkadi in Big Data , Capitation , Chronic illnesses

2022

Introduction: Through the INTernational ConsoRtium of Primary Care BIg Data Researchers (INTRePID), we compared the pandemic impact on the volume of primary care visits and uptake of virtual care in Australia, Canada, China, Norway, Singapore, South Korea, Sweden, the UK and the USA. Methods: Visit definitions were agreed on centrally, implemented locally across the various settings in INTRePID countries, and weekly visit counts were shared centrally for analysis. We evaluated the weekly rate of primary care physician visits during 2019 and 2020. Rate ratios (RRs) of total weekly visit volume and the proportion of weekly visits that were virtual in the pandemic period in 2020 compared with the same prepandemic period in 2019 were calculated. Results: In 2019 and 2020, there were 80 889 386 primary care physician visits across INTRePID. During the pandemic, average weekly visit volume dropped in China, Singapore, South Korea, and the USA but was stable overall in Australia (RR 0.98 (95% CI 0.92 to 1.05, p=0.59)), Canada (RR 0.96 (95% CI 0.89 to 1.03, p=0.24)), Norway (RR 1.01 (95% CI 0.88 to 1.17, p=0.85)), Sweden (RR 0.91 (95% CI 0.79 to 1.06, p=0.22)) and the UK (RR 0.86 (95% CI 0.72 to 1.03, p=0.11)). In countries that had negligible virtual care prepandemic, the proportion of visits that were virtual was highest in Canada (77.0%) and Australia (41.8%). In Norway (RR 8.23 (95% CI 5.30 to 12.78, p<0.001)), the UK (RR 2.36 (95% CI 2.24 to 2.50, p<0.001)) and Sweden (RR 1.33 (95% CI 1.17 to 1.50, p<0.001)), where virtual visits existed prepandemic, the virtual share of visits increased significantly during the pandemic. Conclusions: The drop in primary care in-person visits during the pandemic was a global phenomenon across INTRePID countries. In several countries, primary care shifted to virtual visits, mitigating the drop in in-person visits.

Journal Article

The association between care modality and hospitalizations and emergency department visits for ambulatory care-sensitive conditions during and after the pandemic in Ontario, Canada

by Ortigoza, Angela , Moineddin, Rahim , Valencia, Javier Silva in Adult , Aged , Ambulatory care

2025

The COVID-19 pandemic required a rapid transition to virtual care as a key strategy to maintain healthcare access while minimizing virus transmission risks. However, the impact of this shift on hospitalizations and emergency department (ED) visits for ambulatory care-sensitive conditions (ACSCs) remains unclear. This study aims to assess the relationship between the modality of outpatient care for ACSCs and their outcomes in Ontario, Canada. In this population-based retrospective cohort study, we analyzed hospitalization and ED visit data for ACSCs, including diabetes, epilepsy, congestive heart failure, hypertension, and angina, during the pandemic (April 2020 to April 2023) and post-pandemic (May 2023 to August 2023) periods. Monthly trends in hospitalizations and ED visits were evaluated using Generalized Additive Models and Generalized Additive Mixed Models, accounting for the effects of virtual and in-person care within 30 days and 60 days preceding each event. Despite a notable decrease in virtual visits and a corresponding rise in in-person visits, overall hospitalizations and ED visits for ACSCs remained relatively stable. Our analysis found no significant association between care modality and changes in hospitalizations and ED visits, suggesting that virtual care, particularly during the early pandemic, effectively supported chronic disease management and contributed to the stability of acute care needs. In conclusion, virtual care proved to be a sustainable component of ACSC management during and after the COVID-19 pandemic, complementing in-person care.

Journal Article

Testing regular expression searches and machine learning models to determine housing instability and low income status from primary care electronic medical record data in Toronto, Ontario

by Weyman, Karen , Meaney, Christopher , Wang, Ri in Adult , Aged , Artificial intelligence

2025

Background: Housing and income are important social determinants of health (SDoH). Primary care providers often do not have information about these determinants, which could be used to support equitable health system planning and care delivery. The aim of this study was to use primary care electronic medical record (EMR) data to test two approaches (machine learning and regular expression searches) to obtain information about patients’ housing instability and low income status. Methods: We used de-identified EMR data from the St. Michael’s Hospital Academic Family Health Team (Toronto, Ontario, Canada). A Health Equity Questionnaire is also routinely distributed to patients and includes questions about income and housing status; this formed the reference standard. First, a regular expression (REGEX) classifier was created using key text terms and codes; the second approach used supervised machine learning models (XGBoost). Discrimination and calibration metrics were calculated as compared to the patient-reported responses. Results: 11,794 eligible patients were included in the housing cohort and 10,454 were in the income cohort. Overall, both approaches had poor sensitivity for determining both housing instability (XGBoost: 3.1%, REGEX: 29.0%) and low income status (XGBoost: 41.7%, REGEX: 17.6%). Positive predictive value (PPV) was satisfactory for the machine learning approach (83.3% for housing, 72.9% for income). Conclusion: While the machine learning approach demonstrated reasonable PPV, the overall metrics were poor and unlikely to be useful in a clinical setting for identifying patients with housing or economic needs. More robust analysis could be explored, but continued patient-captured SDoH information is necessary.
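As a minimal sketch of how the two headline metrics in this abstract are defined, the snippet below computes sensitivity and positive predictive value from a classifier's output and a reference standard. The function name `sensitivity_and_ppv` is hypothetical and the ten labels are made-up toy values, not study data; they were chosen only to show how a classifier can pair low sensitivity with high PPV.

```python
def sensitivity_and_ppv(y_true, y_pred):
    """Return (sensitivity, PPV) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv

# Toy data (illustrative only): 4 true cases, the classifier flags just one.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
sens, ppv = sensitivity_and_ppv(reference, predicted)
```

In this toy case every flagged patient is a true case, yet three of the four true cases go unflagged, which is the pattern the abstract describes for the machine learning approach.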

Journal Article

Generate Analysis-Ready Data for Real-world Evidence: Tutorial for Harnessing Electronic Health Records With Advanced Informatic Technologies

by Lin, Yucong , Jemielita, Thomas , Zhao, Rachel in Algorithms , Archives & records , Artificial intelligence

2023

Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
In addition to the workflow proposed in our pipeline, we also develop a reporting guideline for RWE that covers the necessary information to facilitate transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We also showcase our pipeline and provide guidance on the deployment of relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer. We also draw on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR.

Journal Article

When algorithms infer gender: revisiting computational phenotyping with electronic health records data

by Chaudhury, Diksha Sen , O’Neill, Braden , Bonneville, Rebecca in Accuracy , Algorithms , Asthma

2025

Computational phenotyping has emerged as a practical solution to the incomplete collection of data on gender in electronic health records (EHRs). This approach relies on algorithms to infer a patient’s gender using the available data in their health record, such as diagnosis codes, medication histories, and information in clinical notes. Although intended to improve the visibility of trans and gender-expansive populations in EHR-based biomedical research, computational phenotyping raises significant methodological and ethical concerns related to the potential misuse of algorithm outputs. In this paper, we provide a narrative review of computational phenotyping of gender and examine its challenges through a critical lens. We also highlight existing recommendations for biomedical researchers and propose priorities for future work in this domain. Highlights: Sex and gender are inconsistently recorded in electronic health records (EHRs), limiting the scope of biomedical research using these data. Computational phenotyping algorithms attempt to fill these gaps by inferring gender-related information from patients’ historical health data. While these approaches aim to improve the visibility of trans and gender-expansive people in biomedical research, they also introduce important methodological and ethical concerns, including (1) data quality issues, (2) underlying assumptions about gender, (3) bias in algorithm design and validation, and (4) potential for misuse. Future research should focus on building just and conceptually sound foundations for gender-based inquiry, such as creating and using measurement tools that accommodate fluidity, center lived experience rather than biological proxies, and allow for individualized data collection without defaulting to gender assignment.

Journal Article

A study protocol for a predictive model to assess population‑based risk of adverse pregnancy outcomes: The Adverse Pregnancy Outcomes Population Risk Tool (PregPoRT)

by Chiodo, Sabrina , Rosella, Laura C. , Grandi, Sonia M. in Adverse pregnancy outcomes , Alcohol use , Biomedicine

2026

Background: Adverse pregnancy outcomes (APOs), such as gestational diabetes, preeclampsia, and placental abruption, are major contributors to maternal and fetal morbidity and mortality, with implications for individual long-term health and health system performance. Existing prediction models for APOs rely primarily on clinical or biomarker data, with few incorporating social, behavioral, or environmental determinants that are critical for shaping perinatal outcomes. This study describes the development and validation protocol for the Adverse Pregnancy Outcomes Population Risk Tool (PregPoRT), a novel, population-based prediction model designed to estimate APO risk using population-based and routinely collected survey and administrative data in Canada. Methods: PregPoRT will be developed using a retrospective cohort of female-identifying individuals, aged 15–49, who participated in the Canadian Community Health Survey (CCHS) between 2000 and 2017, and had a subsequent delivery hospitalization within two years recorded in the Discharge Abstract Database (DAD). Pre-pregnancy predictors were selected according to a health equity-informed framework by Kramer and colleagues (2019), and include biomedical, behavioral, social, and environmental variables from the CCHS, the Canadian Marginalization Index (CAN-Marg), the Canadian Urban Environmental Health Research Consortium (CANUE), and the Canadian Active Living Environments (Can-ALE) dataset. The primary outcome is a composite measure of APOs (gestational diabetes, preeclampsia, or placental abruption), identified using validated ICD codes. A Weibull accelerated failure time model will be used to estimate the risk of experiencing an APO. Continuous variables will be modeled with restricted cubic splines. Variable selection will be performed using the Least Absolute Shrinkage and Selection Operator (LASSO), and model performance will be assessed via discrimination, calibration, and overall accuracy. Validation strategies include split-sample, bootstrap, and temporal validation using later CCHS cycles. Survey weights will be applied throughout to ensure national representativeness. Discussion: PregPoRT will be the first Canadian prediction model for APOs that leverages nationally representative, linked survey and administrative data and explicitly integrates social, behavioral, and environmental determinants of health, domains that have been largely absent from prior models. By incorporating modifiable and socially patterned risk factors, the tool is designed to support public health planning, resource allocation, and maternal health equity monitoring.

Journal Article
