Catalogue Search | MBRL

Large language model performance versus human expert ratings in automated suicide risk assessment

by Meinlschmidt, Gunther; Thomas, Julia; Elyoseph, Zohar in 631/477, 692/700, Adolescent

2025

The potential of large language models (LLMs) for psychological diagnostics requires systematic evaluation. We aimed to investigate conditions for reliable and valid psychological assessments, focusing on suicide risk evaluation in clinical data by comparing LLM-generated ratings with human expert ratings across model configurations. We analyzed 100 youth crisis text line conversation transcripts rated by four experts using the Nurses’ Global Assessment of Suicide Risk (NGASR). Using Mixtral-8x7B-Instruct, we generated ratings across three temperature settings and three prompting styles (zero-shot, few-shot, chain-of-thought). Across configurations we compared (a) inter-rating reliability for AI-generated NGASR risk and sum scores, (b) LLM-to-human observer agreement regarding sum score, risk category, and item, using Krippendorff’s α, and (c) classification metrics of risk categories and individual items against human ratings. LLM configuration strongly influenced assessment reliability. Zero-shot prompting at temperature 0 yielded perfect inter-rating reliability (α = 1.00, 95% CI: [1–1] for high & very high risk), while few-shot prompting showed the best human-AI agreement for very high risk (α = 0.78, 95% CI: [0.67–0.89]) and the strongest classification performance (balanced accuracy 0.54–0.71). Lower temperatures consistently improved reliability and accuracy. However, critical clinical items showed poor validity. Our findings establish optimal conditions (zero temperature, task-specific prompting) for LLM-based psychological assessment. However, inconsistent clinical item performance and only moderate LLM-to-human observer agreement limit LLMs to initial screening rather than detailed assessment, requiring careful parameter control and validation.

Journal Article

Protocol of the digital long COVID study: A single-center, registry-based, feasibility and clinical evaluation study to investigate a 12-week digital intervention program for people affected by post-COVID-19 condition

by Meienberg, Andrea; Schaefert, Rainer; Bopp, Katrin in Advertising, Biology and Life Sciences, Coronaviruses

2026

Up to 400 million individuals globally are estimated to experience persistent symptoms, including fatigue, muscle pain, and brain fog, following severe acute respiratory syndrome coronavirus type 2 infection. These persistent symptoms are referred to as post-COVID-19 condition if they last for more than 12 weeks after infection and persist for at least 8 weeks, and they often cause significant distress and burden. The underlying pathological mechanisms have not yet been fully elucidated. Due to the heterogeneity of the disease, a multifactorial origin is highly likely. Overall, evidence on optimal management is limited, and no medication has yet proven to be effective. Current symptom management and treatment guidelines suggest a biopsychosocial perspective and emphasize multidisciplinary approaches. Comprehensive interventions, adequate treatment access, and appropriate resources remain insufficiently available, and implementing digital interventions might help mitigate these limitations. This protocol details a single-site feasibility and clinical evaluation study aiming to bridge this gap. By implementing an exploratory, open-label, digital interventional approach, this study investigates the feasibility and efficacy of a 12-week program delivered by a cloud-based application. The program consists of 13 modules encompassing a wide range of topics (e.g., energy management, self-care, stress management) and includes informational (e.g., psychoeducational content) and interactive (e.g., exercises, self-reflection diaries) components. Customization options align the material with participant needs. A dedicated feedback section in each module captures feedback regarding usability and feasibility. Participants are monitored and checked for adherence throughout the study. The primary outcome is the post-intervention change in functional capacity measured by the World Health Organization Disability Assessment Schedule 2.0. All participants provide written informed consent. Key results from the study will be published in peer-reviewed journals.

Journal Article

Chronology of Onset of Mental Disorders and Physical Diseases in Mental-Physical Comorbidity - A National Representative Survey of Adolescents

by Tegethoff, Marion; Meinlschmidt, Gunther; Stalujanis, Esther in Adolescent, Adolescents, Affective disorders

2016

The objective was to estimate temporal associations between mental disorders and physical diseases in adolescents with mental-physical comorbidities. This article is based on weighted data (N = 6483) from the National Comorbidity Survey Adolescent Supplement (participant age: 13-18 years), a nationally representative United States cohort. Onset of Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition lifetime mental disorders was assessed with the fully structured World Health Organization Composite International Diagnostic Interview, complemented by parent report. Onset of lifetime medical conditions and doctor-diagnosed diseases was assessed by self-report. The most substantial temporal associations with onset of mental disorders preceding onset of physical diseases included those between affective disorders and arthritis (hazard ratio (HR) = 3.36, 95% confidence interval (CI) = 1.95 to 5.77) and diseases of the digestive system (HR = 3.39, CI = 2.30 to 5.00), between anxiety disorders and skin diseases (HR = 1.53, CI = 1.21 to 1.94), and between substance use disorders and seasonal allergies (HR = 0.33, CI = 0.17 to 0.63). The most substantial temporal associations with physical diseases preceding mental disorders included those between heart diseases and anxiety disorders (HR = 1.89, CI = 1.41 to 2.52), epilepsy and eating disorders (HR = 6.27, CI = 1.58 to 24.96), and heart diseases and any mental disorder (HR = 1.39, CI = 1.11 to 1.74). Findings suggest that mental disorders are antecedent risk factors of certain physical diseases in early life, but also vice versa. Our results expand the relevance of mental disorders beyond mental to physical health care, and vice versa, supporting the concept of a more integrated mental-physical health care approach, and open new starting points for early disease prevention and better treatments, with relevance for various medical disciplines.

Journal Article

Total somatic symptom score as a predictor of health outcome in somatic symptom disorders

by Sumathipala, Athula; McBeth, John; Rosmalen, Judith in Adjustment, Adolescent, Adult

2013

The diagnosis of somatisation disorder in DSM-IV was based on 'medically unexplained' symptoms, which is unsatisfactory. We aimed to determine the value of a total somatic symptom score as a predictor of health status and healthcare use after adjustment for anxiety, depression and general medical illness. Data from nine population-based studies (total n = 28 377) were analysed. In all cross-sectional analyses, total somatic symptom score was associated with health status and healthcare use after adjustment for confounders. In two prospective studies, total somatic symptom score predicted subsequent health status. This association appeared stronger than that for medically unexplained symptoms. Total somatic symptom score provides a predictor of health status and healthcare use over and above the effects of anxiety, depression and general medical illnesses.

Journal Article

Evaluation of cross-ethnic emotion recognition capabilities in multimodal large language models using the reading the mind in the eyes test

by Refoua, Elad; Meinlschmidt, Gunther; Hadar Shoval, Dorit in 4014/477, 631/477, 639/705

2026

Accurate emotion recognition is a foundational component of social cognition, yet human biases can compromise its reliability. The emergent capabilities of multimodal large language models (MLLMs) offer a potential avenue for objective analysis, but their performance has been tested mainly with ethnically homogeneous stimuli. This study provides a systematic cross-ethnic evaluation of leading MLLMs on an emotion recognition task to assess their accuracy and consistency across diverse groups. We evaluated three leading MLLMs: ChatGPT-4, ChatGPT-4o, and Claude 3 Opus. Performance was tested twice using three “Reading the Mind in the Eyes Test” (RMET) versions featuring White, Black, and Korean faces. We analyzed accuracy against chance (25%) and compared scores to established human normative data for each ethnic version. ChatGPT-4o achieved performance significantly above chance levels across all tests (p < .001), with large effect sizes indicating robust performance (Cohen’s h = 1.253–1.619; RD = 0.583–0.694). The model obtained a mean accuracy of 83.3% (30/36) on the White RMET, 94.4% (34/36) on the Black RMET, and 86.1% (31/36) on the Korean RMET, placing it in the 85th, 94th, and 90th percentiles of human norms, respectively. This high accuracy remained consistent across ethnic stimuli. In contrast, ChatGPT-4 performed near the human average, while Claude 3 Opus performed near chance level. These preliminary findings highlight the rapid evolution of MLLMs, including a significant performance leap between consecutive versions. This study suggests that ChatGPT-4o exceeded average human accuracy on this specific task of recognizing complex emotions from static images of the eye region, with its performance remaining consistent across different ethnic groups. While these results are notable, the pronounced performance gaps between models and the inherent limitations of the RMET task underscore the need for continuous validation and careful, ethical consideration to fully understand the capabilities and boundaries of this technology.
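The effect sizes quoted in this abstract can be checked by hand. The sketch below assumes the standard arcsine definition of Cohen's h for two proportions and reads RD as the simple difference between observed accuracy and the 25% chance level; both readings are assumptions, since the abstract does not state its formulas. The counts (30, 34, and 31 correct out of 36) come from the abstract, while the function and variable names are illustrative only.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

CHANCE = 0.25   # four response options per RMET item
N_ITEMS = 36    # items per RMET version, as reported in the abstract

# Correct answers per RMET version, taken from the abstract.
correct = {"White": 30, "Black": 34, "Korean": 31}

results = {}
for version, k in correct.items():
    accuracy = k / N_ITEMS
    h = round(cohens_h(accuracy, CHANCE), 3)
    rd = round(accuracy - CHANCE, 3)  # assumed meaning of RD: accuracy minus chance
    results[version] = (h, rd)
    print(f"{version}: accuracy={accuracy:.3f}, h={h}, RD={rd}")
```

Under these assumptions, the White and Black versions give the two ends of the reported ranges (h of about 1.253 and 1.619, RD of about 0.583 and 0.694), with the Korean version falling in between.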

Journal Article

Prediction of treatment outcome in patients receiving internet-delivered cognitive behavioural therapy for depressive and anxiety symptoms: a machine learning analysis of data from a healthcare-embedded longitudinal study

by Lieb, Roselind; Meinlschmidt, Gunther; Bahmane, Sanaa in Adult, Aged, Anxiety

2025

Background: Digital therapeutics (DTx) show promise in bridging mental healthcare gaps. However, treatment selection often relies on availability and trial-and-error, prolonging suffering and increasing costs. Personalised prediction models could help identify individuals benefiting most from specific DTx. Objective: The aim of this secondary analysis was to establish a machine learning-based prediction model for positive treatment outcomes in patients with depressive or anxiety symptoms after 8 weeks of internet-delivered cognitive behavioural therapy (iCBT). Methods: We analysed a large real-world dataset of patients from the online therapy unit iCBT programme in Saskatchewan, Canada (2013–2021). Clinically significant changes in depressive symptoms or anxiety were measured using the Patient Health Questionnaire-9 (PHQ-9) and the Generalised Anxiety Disorder-7 (GAD-7). We trained six prediction models using sociodemographic and mental health-related factors at baseline, compared model performances and calculated Shapley values for feature importance. Findings: Data from 4175 patients using 34 features for prediction, identified by least absolute shrinkage and selection operator regression, showed the Gradient Boosted Model (gbm) and logistic regression (log) performed best, with balanced accuracies of 0.76, 95% CI (0.70 to 0.83) and 0.70, 95% CI (0.63 to 0.77). Shapley values indicated GAD-7 scores at baseline as the most important predictor of clinically significant improvement, along with mental health history and sociodemographic variables. Conclusions: The gbm and log models achieved comparable accuracy in predicting clinically significant improvement after iCBT, supporting the use of simpler, interpretable methods in clinical practice. Clinical implications: These findings could help improve mental health treatment selection and iCBT assignment, enhance effectiveness and optimise treatment for patients. Trial registration number: NCT05758285.

Journal Article

Treatment-associated network dynamics in patients with globus sensations: a proof-of-concept study

by Lieb, Roselind; Imperiale, Marina N.; Meinlschmidt, Gunther in 631/477, 631/477/2811, 692/700

2023

In this proof-of-concept study, we used a systems perspective to conceptualize and investigate treatment-related dynamics (temporal and cross-sectional associations) of symptoms and elements related to the manifestation of a common functional somatic syndrome (FSS), Globus Sensations (GS). We analyzed data from 100 patients (M = 47.1 years, SD = 14.4 years; 64% female) with GS who received eight sessions of group psychotherapy in the context of a randomized controlled trial (RCT). Symptoms and elements were assessed after each treatment session. We applied a multilevel graphical vector-autoregression (ml GVAR) model approach resulting in three separate, complementary networks (temporal, contemporaneous, and between-subject) for an affective, cognitive, and behavioral dimension, respectively. GS were not temporally associated with any affective, cognitive, and behavioral elements. Temporally, catastrophizing cognitions predicted bodily weakness (r = 0.14, p < 0.01, 95% confidence interval (CI) [0.04–0.23]) and GS predicted somatic distress (r = 0.18, p < 0.05, 95% CI [0.04–0.33]). Potential causal pathways between catastrophizing cognitions and bodily weakness as well as GS and somatic distress may reflect treatment-related temporal change processes in patients with GS. Our study illustrates how dynamic network analysis (NA) can be used in the context of outcome research.

Journal Article

Obsessive–compulsive disorder in the community: 12-month prevalence, comorbidity and impairment

by Lieb, Roselind; Adam, Yuki; Meinlschmidt, Gunther in Adolescent, Adult, Adult and adolescent clinical studies

2012

Background: Although subthreshold conditions are associated with impairment in numerous disorders, research on obsessive–compulsive disorder (OCD) below the diagnostic threshold of DSM-IV in the general population is limited. Purpose: To estimate the DSM-IV 12-month prevalence, comorbidity and impairment of OCD, subthreshold OCD (i.e., fulfilling some but not all core DSM-IV criteria), and obsessive–compulsive symptoms (OCS) (i.e., endorsement of OCS without fulfilling any core DSM-IV criteria) in a general population sample. Methods: Data from the German National Health Interview and Examination Survey–Mental Health Supplement (N = 4181, age 18–65 years), based on the standardized diagnostic Munich Composite International Diagnostic Interview. Results: The 12-month prevalence of OCD was 0.7%, subthreshold OCD was 4.5%, and OCS was 8.3%. Subjects in all three groups showed higher comorbidity (odds ratios [ORs] ≥ 3.3), compared to those without OCS. The OCD, subthreshold OCD and OCS were all associated with increased odds of substance abuse/dependence-, mood-, anxiety- and somatoform disorders, with especially strong associations with possible psychotic disorder (ORs ≥ 4.1) and bipolar disorders (ORs ≥ 4.7). Participants in all three groups showed higher impairment (ORs ≥ 3.1) and health-care utilization (ORs ≥ 2.4), compared to those without OCS, even after controlling for covariates. Conclusions: Individuals with subthreshold OCD and OCS, not currently captured by DSM-IV OCD criteria, nevertheless show substantial comorbidity, impairment and health-care utilization. This should be taken into account in future conceptualization and classification of OCD and clinical care.

Journal Article

Basel Long COVID Cohort Study (BALCoS): protocol of a prospective cohort study

by Schaefert, Rainer; Meienberg, Andrea; Bopp, Katrin in Anxiety, Biomarkers, Clinics

2025

Introduction: The recent pandemic caused by SARS-CoV-2 had a profound global impact. While many individuals recovered from COVID-19, some developed long-lasting symptoms that significantly disrupted daily life. The WHO defines this condition as post-COVID-19 condition (PCC). Common symptoms include fatigue, dyspnoea, sleep disturbances and cognitive difficulties. Increasing evidence suggests that PCC is a multifactorial condition, shaped not only by biomedical but also psychological and social factors. This article presents the protocol of the Basel Long COVID Cohort Study (BALCoS), which aims to improve understanding of PCC by capturing clinical, functional and psychosocial aspects through repeated assessments over the course of 1 year. Methods and analysis: BALCoS is a prospective, single-site cohort study. Inclusion criteria include either a probable or confirmed history of SARS-CoV-2 infection with persistent symptoms consistent with the WHO definition of PCC, sufficient German language skills and age ≥18 years. At baseline, we collected detailed information on previous SARS-CoV-2 infections, symptom history, reinfections, COVID-19 vaccination status and pre-existing medical conditions. The study includes standardised psychometric assessments, physical performance tests, ecological momentary assessments (EMAs), neurocognitive testing and blood sample collection. Assessments are scheduled at baseline and at 3-month, 6-month and 12-month follow-up. All participants complete psychometric assessments at each time point. Blood samples are only collected at baseline. Neurocognitive testing and physical performance measures are collected at baseline and 12-month follow-up for in-person participants only. Participants who are unable to attend in person complete a remote version of the study, excluding these in-clinic assessments. EMAs are initiated the day after each time point and consist of eight questions over 10 consecutive days. The study is exploratory in nature, with a target sample size of 120 participants. BALCoS is part of the Horizon Europe Long COVID project, a multinational interdisciplinary research consortium integrating mechanistic, clinical and interventional studies. Ethics and dissemination: The study was approved by the Ethics Commission of Northwest and Central Switzerland (BASEC-ID: 2023–00359) and is registered at ClinicalTrials.gov (ID: NCT05781893). All participants provide written informed consent. Study findings will be disseminated through peer-reviewed publications. Trial registration number: NCT05781893.

Journal Article

Stress during Pregnancy and Offspring Pediatric Disease: A National Cohort Study

by Greene, Naomi; Tegethoff, Marion; Olsen, Jørn in Adult, Biological and medical sciences, Child

2011

Identifying risk factors for adverse health outcomes in children is important. The intrauterine environment plays a pivotal role for health and disease across life. We conducted a comprehensive study to determine whether common psychosocial stress during pregnancy is a risk factor for a wide spectrum of pediatric diseases in the offspring. The study was conducted using prospective data in a population-based sample of mothers with live singleton births (n = 66,203; 71.4% of those eligible) from the Danish National Birth Cohort. We estimated the association between maternal stress during pregnancy (classified based on two a priori-defined indicators of common stress forms, life stress and emotional stress) and offspring diseases during childhood (grouped into 16 categories of diagnoses from the International Classification of Diseases, 10th Revision, based on data from national registries), controlling for maternal stress after pregnancy. Median age at end of follow-up was 6.2 (range, 3.6-8.9) years. Life stress (highest compared with lowest quartile) was associated with an increased risk of conditions originating in the perinatal period [odds ratio (OR) = 1.13; 95% confidence interval (CI): 1.06, 1.21] and congenital malformations (OR = 1.17; CI: 1.06, 1.28) and of the first diagnosis of infection [hazard ratio (HR) = 1.28; CI: 1.17, 1.39], mental disorders (age 0-2.5 years: HR = 2.03; CI: 1.32, 3.14), and eye (age 0-4.5 years: HR = 1.27; CI: 1.06, 1.53), ear (HR = 1.36; CI: 1.23, 1.51), respiratory (HR = 1.27; CI: 1.19, 1.35), digestive (HR = 1.23; CI: 1.11, 1.37), skin (HR = 1.24; CI: 1.09, 1.43), musculoskeletal (HR = 1.15; CI: 1.01, 1.30), and genitourinary diseases (HR = 1.25; CI: 1.08, 1.45). Emotional stress was associated with an increased risk for the first diagnosis of infection (HR = 1.09; CI: 1.01, 1.18) and a decreased risk for the first diagnosis of endocrine (HR = 0.81; CI: 0.67, 0.99), eye (HR = 0.84; CI: 0.71, 0.99), and circulatory diseases (age 0-3 years: HR = 0.63; CI: 0.42, 0.95). Maternal life stress during pregnancy may be a common risk factor for impaired child health. The results suggest new approaches to reduce childhood diseases.

Journal Article
