104 result(s) for "Meaney, Christopher"
Predictors of burnout among academic family medicine faculty: Looking back to plan forward
To identify the prevalence and predictors of burnout among academic family medicine faculty. A comprehensive survey of academic family medicine faculty on burnout, perceptions of work life, and practice in 2011. A large, distributed Department of Family and Community Medicine at the University of Toronto. All 1029 faculty members were invited to participate. Maslach Burnout Inventory three subscales (emotional exhaustion, depersonalization, personal accomplishment). The survey response rate was 66.8% (687/1029). The prevalence of high emotional exhaustion scores was 27.0% and high depersonalization was 9.2%, whereas the prevalence of high personal accomplishment scores was 99.4%. Bivariate analyses identified 27 variables associated with emotional exhaustion and 18 variables associated with depersonalization, including: ratings of the practice setting; leadership and mentorship experiences; job satisfaction; health status; and demographic variables. Multivariate analyses found four predictors of emotional exhaustion: lower ratings of job satisfaction, poorer ratings of workplace quality, working ≥50 hrs/week, and poorer ratings of health status. Predictors of depersonalization included lower ratings of job satisfaction, ≤5 years in practice, lower ratings of health status, and poor ratings of mentorship received. This study describes the prevalence and predictors of burnout among physicians prior to the COVID-19 pandemic. Predictors that are potentially modifiable at local practice and systems levels include job satisfaction, workplace quality, hours worked, and mentorship received. New family physicians (≤5 years in practice) were at increased risk of depersonalization; strategies specific to this group may limit burnout and address the healthcare workforce crisis. Periodic studies are recommended to identify the impact of strategies implemented, emergent predictors, trends, and mitigating factors associated with burnout. 
The current crisis in family medicine indicates an urgent need to look back and plan forward.
Comparison of methods for tuning machine learning model hyper-parameters: with application to predicting high-need high-cost health care users
Background Supervised machine learning is increasingly being used to estimate clinical predictive models. Several supervised machine learning models involve hyper-parameters, whose values must be judiciously specified to ensure adequate predictive performance. Objective To compare nine hyper-parameter optimization (HPO) methods for tuning the hyper-parameters of an extreme gradient boosting model, with application to predicting high-need high-cost health care users. Methods Extreme gradient boosting models were estimated using a randomly sampled training dataset. Models were separately trained using nine different HPO methods: 1) random sampling, 2) simulated annealing, 3) quasi-Monte Carlo sampling, 4-5) two variations of Bayesian hyper-parameter optimization via tree-Parzen estimation, 6-7) two implementations of Bayesian hyper-parameter optimization via Gaussian processes, 8) Bayesian hyper-parameter optimization via random forests, and 9) the covariance matrix adaptation evolutionary strategy. For each HPO method, we estimated 100 extreme gradient boosting models at different hyper-parameter configurations and evaluated model performance using an AUC metric on a randomly sampled validation dataset. Using the best model identified by each HPO method, we evaluated generalization performance in terms of discrimination and calibration metrics on a randomly sampled held-out test dataset (internal validation) and a temporally independent dataset (external validation). Results The extreme gradient boosting model estimated using default hyper-parameter settings had reasonable discrimination (AUC=0.82) but was not well calibrated. Hyper-parameter tuning using any HPO algorithm/sampler improved model discrimination (AUC=0.84), resulted in models with near-perfect calibration, and consistently identified features predictive of high-need high-cost health care users.
Conclusions In our study, all HPO algorithms resulted in similar gains in model performance relative to baseline models. This finding likely relates to our study dataset having a large sample size, a relatively small number of features, and a strong signal to noise ratio; and would likely apply to other datasets with similar characteristics.
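The simplest of the nine approaches, random sampling, illustrates the general HPO loop: draw a configuration, score it on validation data, keep the best. The sketch below is illustrative only; the analytic validation_loss function is an assumed stand-in for the study's expensive step of refitting an extreme gradient boosting model and scoring it on a validation set.

```python
import random

# Toy stand-in for the expensive step: in the study, each configuration
# would refit an extreme gradient boosting model and be scored by AUC on
# a validation set. Here a smooth surface with a known optimum is assumed.
def validation_loss(learning_rate, max_depth):
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 4) ** 2

def random_search(n_trials=100, seed=0):
    """Sample configurations uniformly at random and keep the best."""
    rng = random.Random(seed)
    best_loss, best_cfg = float("inf"), None
    for _ in range(n_trials):
        cfg = {"learning_rate": rng.uniform(0.01, 0.5),
               "max_depth": rng.randint(2, 10)}
        loss = validation_loss(**cfg)
        if loss < best_loss:
            best_loss, best_cfg = loss, cfg
    return best_loss, best_cfg

best_loss, best_cfg = random_search()
```

The other eight HPO methods replace only the proposal step (how the next configuration is chosen); the evaluate-and-keep-best loop is shared.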
Emergency department boarding: a descriptive analysis and measurement of impact on outcomes
What is known about the topic? Bed boarding is one of the major contributors to emergency department overcrowding. What did this study ask? What are the characteristics of patients with prolonged boarding times, and what are the impacts on patient-oriented outcomes? What did this study find? Patients who were older, sicker, and had isolation and telemetry requirements experienced longer boarding times, and longer inpatient length of stay even after correcting for confounders. Why does this study matter to clinicians? Organization-wide interventions to improve efficiency and flow are required to mitigate the burden of bed boarding.
Testing regular expression searches and machine learning models to determine housing instability and low income status from primary care electronic medical record data in Toronto, Ontario
Background Housing and income are important social determinants of health (SDoH). Primary care providers often do not have information about these determinants, which could be used to support equitable health system planning and care delivery. The aim of this study was to use primary care electronic medical record (EMR) data to test two approaches (machine learning and regular expression searches) to obtain information about patients’ housing instability and low income status. Methods We used de-identified EMR data from the St. Michael’s Hospital Academic Family Health Team (Toronto, Ontario, Canada). A Health Equity Questionnaire is also routinely distributed to patients and includes questions about income and housing status; this formed the reference standard. First, a regular expression (REGEX) classifier was created using key text terms and codes; the second approach used supervised machine learning models (XGBoost). Discrimination and calibration metrics were calculated against the patient-reported responses. Results A total of 11,794 eligible patients were included in the housing cohort and 10,454 in the income cohort. Overall, both approaches had poor sensitivity for determining both housing instability (XGBoost: 3.1%, REGEX: 29.0%) and low income status (XGBoost: 41.7%, REGEX: 17.6%). Positive predictive value (PPV) was satisfactory for the machine learning approach (83.3% for housing, 72.9% for income). Conclusion While the machine learning approach demonstrated reasonable PPV, the overall metrics were poor and unlikely to be useful in a clinical setting for identifying patients with housing or economic needs. More robust analyses could be explored, but continued collection of patient-reported SDoH information remains necessary.
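The REGEX approach amounts to flagging notes that match any of a curated list of patterns, then scoring the flags against the patient-reported reference standard. A minimal sketch, with hypothetical trigger terms rather than the study's actual term list:

```python
import re

# Hypothetical trigger terms for illustration only; the study's REGEX
# classifier used its own curated list of text terms and codes.
HOUSING_PATTERNS = [re.compile(p, re.IGNORECASE) for p in
                    (r"\bhomeless", r"\bshelter\b",
                     r"\bcouch.?surf", r"\beviction\b")]

def regex_flag(note):
    """Flag a note as suggesting housing instability."""
    return any(p.search(note) for p in HOUSING_PATTERNS)

def sensitivity_ppv(notes, labels):
    """Score regex flags against reference labels (True = unstable)."""
    tp = sum(regex_flag(n) and y for n, y in zip(notes, labels))
    fn = sum((not regex_flag(n)) and y for n, y in zip(notes, labels))
    fp = sum(regex_flag(n) and not y for n, y in zip(notes, labels))
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sens, ppv
```

A pattern list like this can have high PPV (a match is strong evidence) yet poor sensitivity, since many at-risk patients are never described with the listed terms, which mirrors the study's pattern of results.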
Quality indices for topic model selection and evaluation: a literature review and case study
Background Topic models are a class of unsupervised machine learning models, which facilitate summarization, browsing and retrieval from large unstructured document collections. This study reviews several methods for assessing the quality of unsupervised topic models estimated using non-negative matrix factorization. Techniques for topic model validation have been developed across disparate fields. We synthesize this literature, discuss the advantages and disadvantages of different techniques for topic model validation, and illustrate their usefulness for guiding model selection on a large clinical text corpus. Design, setting and data Using a retrospective cohort design, we curated a text corpus containing 382,666 clinical notes collected from 01/01/2017 through 12/31/2020 from primary care electronic medical records in Toronto, Canada. Methods Several topic model quality metrics have been proposed to assess different aspects of model fit. We explored the following metrics: reconstruction error, topic coherence, rank biased overlap, Kendall’s weighted tau, partition coefficient, partition entropy and the Xie-Beni statistic. Depending on context, cross-validation and/or bootstrap stability analysis were used to estimate these metrics on our corpus. Results Cross-validated reconstruction error favored large topic models (K ≥ 100 topics) on our corpus. Stability analysis using topic coherence and the Xie-Beni statistic also favored large models (K = 100 topics). Rank biased overlap and Kendall’s weighted tau favored small models (K = 5 topics). Few model evaluation metrics suggested mid-sized topic models (25 ≤ K ≤ 75) as being optimal. However, human judgement suggested that mid-sized topic models produced expressive low-dimensional summarizations of the corpus. Conclusions Topic model quality indices are transparent quantitative tools for guiding model selection and evaluation.
Our empirical illustration demonstrated that different topic model quality indices favor models of different complexity; and may not select models aligning with human judgment. This suggests that different metrics capture different aspects of model goodness of fit. A combination of topic model quality indices, coupled with human validation, may be useful in appraising unsupervised topic models.
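For readers unfamiliar with the underlying model, non-negative matrix factorization approximates a non-negative document-term matrix V as the product W @ H, and reconstruction error is one of the quality metrics compared across topic counts K. A minimal sketch using Lee-Seung multiplicative updates on synthetic data (not the study's corpus or pipeline):

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0):
    """Factor non-negative V ~ W @ H (k topics) via Lee-Seung
    multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def reconstruction_error(V, W, H):
    """Frobenius norm of the residual V - W @ H."""
    return float(np.linalg.norm(V - W @ H))

# Synthetic rank-8 "document-term" matrix; the real input would be a
# (documents x terms) count matrix built from the clinical corpus.
rng = np.random.default_rng(1)
V = rng.random((60, 8)) @ rng.random((8, 40))
errors = {k: reconstruction_error(V, *nmf(V, k)) for k in (2, 5, 8)}
```

Because capacity grows with K, in-sample reconstruction error tends to fall as K increases, which is why the study pairs it with cross-validation, stability analysis, and other indices rather than using it alone.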
Doxylamine-pyridoxine for nausea and vomiting of pregnancy randomized placebo controlled trial: Prespecified analyses and reanalysis
Doxylamine-pyridoxine is recommended as a first-line treatment for nausea and vomiting during pregnancy and is commonly prescribed. We re-analysed the findings of a previously reported superiority trial of doxylamine-pyridoxine for the treatment of nausea and vomiting during pregnancy using the clinical study report obtained from Health Canada. We re-analysed individual level data for a parallel arm randomized controlled trial that was conducted in six outpatient obstetrical practices in the United States. Pregnant women between 7 and 14 weeks of gestation with moderate nausea and vomiting of pregnancy symptoms. The active treatment was a tablet containing both doxylamine 10 mg and pyridoxine 10 mg taken between 2 and 4 times per day for 14 days depending on symptoms. The control was an identical placebo tablet taken using the same instructions. The primary outcome measure was improvement in nausea and vomiting of pregnancy symptom scores on the 13-point pregnancy unique quantification of emesis scale between baseline and 14 days, analysed using ANCOVA. A total of 140 participants were randomized to each group. Data for 131 active treatment participants and 125 control participants were analysed. On the final day of the trial, 101 active treatment participants and 86 control participants provided primary outcome measures. There was greater improvement in symptom scores with doxylamine-pyridoxine compared with placebo (0.73 points; 95% CI 0.21 to 1.25) when last observation carried forward imputation was used for missing data, but the difference was not statistically significant using other approaches to missing data (e.g. 0.38; 95% CI -0.08 to 0.84 using complete data).
There is a trend towards efficacy for nausea and vomiting symptoms with doxylamine-pyridoxine compared with placebo, but the statistical significance of the difference depends on the method of handling missing data, and the magnitude of the difference suggests no clinically important benefit under the prespecified minimal clinically important difference, or "expected difference", of 3 points. Clinical Trial NCT00614445.
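The missing-data sensitivity turns on imputation choices such as last observation carried forward (LOCF), which fills each participant's missing daily score with the most recent observed one. A minimal sketch:

```python
def locf(series):
    """Last observation carried forward: replace each missing value
    (None) with the most recent observed value. Leading missing values
    stay None because nothing precedes them."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out
```

For example, a participant's daily symptom series of [9, 7, None, None, 5] becomes [9, 7, 7, 7, 5]. LOCF implicitly assumes dropouts' symptoms stay frozen at their last recorded level, which is why results under LOCF can diverge from complete-case or other analyses, as seen in this re-analysis.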
Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study
Large language model (LLM)-based chatbots are evolving at an unprecedented pace with the release of ChatGPT, specifically GPT-3.5, and its successor, GPT-4. Their capabilities in general-purpose tasks and language generation have advanced to the point of performing excellently on various educational examination benchmarks, including medical knowledge tests. Comparing the performance of these two LLMs to that of Family Medicine residents on a multiple-choice medical knowledge test can provide insights into their potential as medical education tools. This study aimed to quantitatively and qualitatively compare the performance of GPT-3.5, GPT-4, and Family Medicine residents on a multiple-choice medical knowledge test appropriate for the level of a Family Medicine resident. An official University of Toronto Department of Family and Community Medicine Progress Test consisting of multiple-choice questions was inputted into GPT-3.5 and GPT-4. The artificial intelligence chatbots' responses were manually reviewed to determine the selected answer, response length, response time, provision of a rationale for the outputted response, and the root cause of all incorrect responses (classified into arithmetic, logical, and information errors). The performance of the artificial intelligence chatbots was compared against a cohort of Family Medicine residents who concurrently attempted the test. GPT-4 performed significantly better than GPT-3.5 (difference 25.0%, 95% CI 16.3%-32.8%; McNemar test: P<.001); it correctly answered 89/108 (82.4%) questions, while GPT-3.5 answered 62/108 (57.4%) questions correctly. Further, GPT-4 scored higher across all 11 categories of Family Medicine knowledge. In 86.1% (n=93) of its responses, GPT-4 provided a rationale for why other multiple-choice options were not chosen, compared with 16.7% (n=18) for GPT-3.5.
Qualitatively, for both GPT-3.5 and GPT-4 responses, logical errors were the most common, while arithmetic errors were the least common. The average performance of Family Medicine residents was 56.9% (95% CI 56.2%-57.6%). The performance of GPT-3.5 was similar to that of the average Family Medicine resident (P=.16), while the performance of GPT-4 exceeded that of the top-performing Family Medicine resident (P<.001). GPT-4 significantly outperforms both GPT-3.5 and Family Medicine residents on a multiple-choice medical knowledge test designed for Family Medicine residents. GPT-4 provides a logical rationale for its response choice, ruling out other answer choices efficiently and with concise justification. Its high degree of accuracy and advanced reasoning capabilities facilitate its potential applications in medical education, including the creation of exam questions and scenarios as well as serving as a resource for medical knowledge or information on community services.
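The McNemar test used above compares paired accuracies through the discordant questions only: those answered correctly by exactly one of the two models. A minimal exact (binomial) version, assuming the discordant counts b and c are already tallied:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test. b = pairs where only the first
    model was correct, c = pairs where only the second was. Under the
    null hypothesis, min(b, c) ~ Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The 108 shared questions would be cross-tabulated by GPT-3.5 correctness against GPT-4 correctness; concordant pairs (both right or both wrong) carry no information about which model is stronger and drop out of the test. The study's reported statistic may use a large-sample chi-square version rather than this exact form.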
Sociodemographic differences in patient experience with primary care during COVID-19: results from a cross-sectional survey in Ontario, Canada
Purpose We sought to understand patients’ care-seeking behaviours early in the pandemic, their use and views of different virtual care modalities, and whether these differed by sociodemographic factors. Methods We conducted a multisite cross-sectional patient experience survey at 13 academic primary care teaching practices between May and June 2020. An anonymised link to an electronic survey was sent to a subset of patients with a valid email address on file; sampling was based on birth month. For each question, the proportion of respondents who selected each response was calculated, followed by a comparison by sociodemographic characteristics using χ2 tests. Results In total, 7532 participants responded to the survey. Most received care from their primary care clinic during the pandemic (67.7%, 5068/7482), the majority via phone (82.5%, 4195/5086). Among those who received care, 30.5% (1509/4943) stated that they delayed seeking care because of the pandemic. Most participants reported a high degree of comfort with phone (92.4%, 3824/4139), video (95.2%, 238/250) and email or messaging (91.3%, 794/870). However, those reporting difficulty making ends meet, poor or fair health, or arrival in Canada in the last 10 years reported lower levels of comfort with virtual care, and fewer wanted their practice to continue offering virtual options after the pandemic. Conclusions Our study suggests that newcomers, people living with a lower income, and those reporting poor or fair health have a stronger preference for, and comfort with, in-person primary care. Further research should explore potential barriers to virtual care and how these could be addressed.
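In the simplest 2x2 case, the χ2 comparisons by sociodemographic group reduce to Pearson's test of independence. A self-contained sketch with made-up counts, not the survey's data:

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df, no continuity
    correction) for the 2x2 table [[a, b], [c, d]], e.g. group
    membership (rows) by comfort with virtual care (columns)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2))
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts: comfortable vs not, by two respondent groups.
stat, p = chi2_2x2(10, 20, 20, 10)
```

Each question-by-characteristic comparison in the survey would produce one such table (possibly larger than 2x2, with correspondingly more degrees of freedom).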
Evaluating ChatGPT-4 in the development of family medicine residency examinations
Creating high-quality medical examinations is challenging due to time, cost, and training requirements. This study evaluates the use of ChatGPT 4.0 (ChatGPT-4) in generating medical exam questions for postgraduate family medicine (FM) trainees. Develop a standardized method for postgraduate multiple-choice medical exam question creation using ChatGPT-4 and compare the effectiveness of large language model (LLM) generated questions to those created by human experts. Eight academic FM physicians rated multiple-choice questions (MCQs) generated by humans and ChatGPT-4 across four categories: 1) human-generated, 2) ChatGPT-4 cloned, 3) ChatGPT-4 novel, and 4) ChatGPT-4 generated questions edited by a human expert. Raters scored each question on 17 quality domains. Quality scores were compared using linear mixed effect models. ChatGPT-4 and human-generated questions were rated as high quality, addressing higher-order thinking. Human-generated questions were less likely to be perceived as artificial intelligence (AI) generated, compared to ChatGPT-4 generated questions. For several quality domains ChatGPT-4 was non-inferior (at a 10% margin), but not superior, to human-generated questions. ChatGPT-4 can create medical exam questions that are high quality, and with respect to certain quality domains, non-inferior to those developed by human experts. LLMs can assist in generating and appraising educational content, leading to potential cost and time savings.
Effectiveness of advertising availability of prenatal ultrasound on uptake of antenatal care in rural Uganda: A cluster randomized trial
In rural Uganda pregnant women often lack access to health services, do not attend antenatal care, and tend to utilize traditional healers/birth attendants. We hypothesized that receiving a message advertising that "you will be able to see your baby by ultrasound" would motivate rural Ugandan women who otherwise might use a traditional birth attendant to attend antenatal care, and that those women would subsequently be more satisfied with care. A cluster randomized trial was conducted across eight rural sub-counties in southwestern Uganda. Sub-counties were randomized to a control arm, with advertisement of antenatal care and no mention of portable obstetric ultrasound (four communities, n = 59), or an intervention arm, with advertisement of portable obstetric ultrasound. The intervention arm was further divided into A) word-of-mouth advertisement of portable obstetric ultrasound and antenatal care (one community, n = 16), B) radio advertisement of antenatal care only plus word-of-mouth advertisement of antenatal care and portable obstetric ultrasound (one community, n = 7), or C) word-of-mouth and radio advertisement of both antenatal care and portable obstetric ultrasound (two communities, n = 75). The primary outcome was attendance at antenatal care. In total, 159 women presented to antenatal care across the eight sub-counties. The rate of attendance was 65.1 (per 1000 pregnant women, 95% CI 38.3-110.4) where portable obstetric ultrasound was advertised by radio and word of mouth, compared with a rate of 11.1 (95% CI 6.1-20.1) in control communities (rate ratio 5.9, 95% CI 2.6-13.0, p<0.0001). Attendance was also improved among women who had previously seen a traditional healer (13.0, 95% CI 5.4-31.2) compared with control (1.5, 95% CI 0.5-5.0, rate ratio 8.7, 95% CI 2.0-38.1, p = 0.004). Advertising antenatal care and portable obstetric ultrasound by radio significantly improved attendance.
This study suggests that women can be motivated to attend antenatal care when offered the concrete incentive of seeing their baby.
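The rate ratios reported above follow the standard log-scale Wald construction for Poisson event counts. A sketch with illustrative numbers, not the trial's counts or its estimated pregnant-women denominators:

```python
from math import exp, sqrt

def rate_ratio_ci(events1, persons1, events0, persons0, z=1.96):
    """Rate ratio with a Wald 95% CI built on the log scale, assuming
    the event counts in each arm are Poisson. persons1/persons0 are the
    (estimated) populations at risk in each arm."""
    rr = (events1 / persons1) / (events0 / persons0)
    se = sqrt(1 / events1 + 1 / events0)  # SE of log(rate ratio)
    return rr, rr * exp(-z * se), rr * exp(z * se)

# Illustrative: 20 attendances per 1000 in an intervention arm vs
# 10 per 2000 in a control arm.
rr, lo, hi = rate_ratio_ci(20, 1000, 10, 2000)
```

The asymmetry of the reported intervals (e.g. 5.9 with 95% CI 2.6-13.0) is characteristic of this construction: the interval is symmetric on the log scale, not the ratio scale.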