Catalogue Search | MBRL
519 result(s) for "Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health"
Large Language Model Synergy for Ensemble Learning in Medical Question Answering: Design and Evaluation Study
by Zhou, Huixue; Xiao, Yongkang; Yang, Han
in AI Language Models in Health Care; Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health; Care and treatment
2025
Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, including medical question answering (QA). However, individual LLMs often exhibit varying performance across different medical QA datasets. We benchmarked individual zero-shot LLMs (GPT-4, Llama2-13B, Vicuna-13B, MedLlama-13B, and MedAlpaca-13B) to assess their baseline performance. Within the benchmark, GPT-4 achieves the best accuracy, 71%, on MedMCQA (a medical multiple-choice question answering dataset); Vicuna-13B achieves 89.5% on PubMedQA (a dataset for biomedical question answering); and MedAlpaca-13B achieves the best accuracy among all models, 70%. No single model leads on every task, highlighting the need for strategies that can harness their collective strengths. Ensemble learning methods, which combine multiple models to improve overall accuracy and reliability, offer a promising approach to this challenge.
To develop and evaluate efficient ensemble learning approaches, we focus on improving performance across 3 medical QA datasets through two proposed ensemble strategies.
Our study uses 3 medical QA datasets: PubMedQA (1000 manually labeled and 11,269 test questions, each answered yes, no, or maybe), MedQA-USMLE (Medical Question Answering dataset based on the United States Medical Licensing Examination; 12,724 English board-style questions, of which 1272 are test questions with 5 options each), and MedMCQA (182,822 training and 4183 test questions, 4-option multiple choice). We introduce the LLM-Synergy framework, consisting of two ensemble methods: (1) a Boosting-based Weighted Majority Vote ensemble, which refines decision-making by adaptively weighting each LLM, and (2) a Cluster-based Dynamic Model Selection ensemble, which dynamically selects the optimal LLM for each query based on question-context embeddings and clustering.
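The core of the first ensemble method, a weighted majority vote over model answers, can be sketched in a few lines of Python. The model names, answers, and weights below are illustrative placeholders, not values from the study:

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Combine answers from several models, weighting each model's vote.

    predictions: {model_name: chosen_option}
    weights:     {model_name: vote_weight}
    """
    scores = defaultdict(float)
    for model, answer in predictions.items():
        scores[answer] += weights.get(model, 1.0)
    # The option with the highest total weighted support wins.
    return max(scores, key=scores.get)

# Illustrative only: hypothetical answers and weights for one question.
preds = {"gpt4": "B", "vicuna": "A", "medalpaca": "B"}
wts = {"gpt4": 0.9, "vicuna": 0.6, "medalpaca": 0.5}
print(weighted_majority_vote(preds, wts))  # prints B (1.4 vs 0.6)
```

In the boosting variant described by the abstract, the weights themselves would be learned adaptively from each model's accuracy rather than fixed as here.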
Both ensemble methods outperformed the individual LLMs on all 3 datasets. Compared with the best individual LLM, the Boosting-based Weighted Majority Vote achieved accuracies of 35.84% on MedMCQA (+3.81%), 96.21% on PubMedQA (+0.64%), and 37.26% (a tie) on MedQA-USMLE. The Cluster-based Dynamic Model Selection yielded even higher accuracies: 38.01% (+5.98%) on MedMCQA, 96.36% (+1.09%) on PubMedQA, and 38.13% (+0.87%) on MedQA-USMLE.
The LLM-Synergy framework, using 2 ensemble methods, represents a significant advancement in leveraging LLMs for medical QA tasks. Through effectively combining the strengths of diverse LLMs, this framework provides a flexible and efficient strategy adaptable to current and future challenges in biomedical informatics.
Journal Article
Psychometric Evaluation of Large Language Model Embeddings for Personality Trait Prediction
by Zhu, Jianfeng; Maharjan, Julina; Kenne, Deric
in Analysis; Artificial Intelligence; Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health
2025
Recent advancements in large language models (LLMs) have generated significant interest in their potential for assessing psychological constructs, particularly personality traits. While prior research has explored LLMs' capabilities in zero-shot or few-shot personality inference, few studies have systematically evaluated LLM embeddings within a psychometric validity framework or examined their correlations with linguistic and emotional markers. Additionally, the comparative efficacy of LLM embeddings against traditional feature engineering methods remains underexplored, leaving gaps in understanding their scalability and interpretability for computational personality assessment.
This study evaluates LLM embeddings for personality trait prediction through four key analyses: (1) performance comparison with zero-shot methods on PANDORA Reddit data, (2) psychometric validation and correlation with LIWC (Linguistic Inquiry and Word Count) and emotion features, (3) benchmarking against traditional feature engineering approaches, and (4) assessment of model size effects (OpenAI vs BERT vs RoBERTa). We aim to establish LLM embeddings as a psychometrically valid and efficient alternative for personality assessment.
We conducted a multistage analysis using 1 million Reddit posts from the PANDORA Big Five personality dataset. First, we generated text embeddings using 3 LLM architectures (RoBERTa, BERT, and OpenAI) and trained a custom bidirectional long short-term memory model for personality prediction. We compared this approach against zero-shot inference using prompt-based methods. Second, we extracted psycholinguistic features (LIWC categories and National Research Council emotions) and performed feature engineering to evaluate potential performance enhancements. Third, we assessed the psychometric validity of LLM embeddings: reliability validity using Cronbach α and convergent validity analysis by examining correlations between embeddings and established linguistic markers. Finally, we performed traditional feature engineering on static psycholinguistic features to assess performance under different settings.
LLM embeddings trained with simple deep learning techniques significantly outperform zero-shot approaches, by 45% on average across all personality traits. Although psychometric validation tests indicate moderate reliability, with an average Cronbach α of 0.63, correlation analyses reveal strong associations with key linguistic and emotional markers: openness correlates highly with social features (r=0.53), conscientiousness with linguistic features (r=0.46), extraversion with social features (r=0.41), agreeableness with pronoun usage (r=0.40), and neuroticism with politics-related text (r=0.63). Adding advanced feature engineering on linguistic features did not improve performance, suggesting that LLM embeddings inherently capture the key linguistic signals. Furthermore, larger models performed better, at the cost of greater computation.
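Cronbach α, the reliability statistic reported above, can be computed directly from item-level scores. The sketch below uses only the standard library and toy data, not the study's dataset:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for k items scored by the same respondents.

    items: list of k lists, each holding one item's score per respondent.
    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of totals)
    """
    k = len(items)
    n = len(items[0])

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(sample_var(it) for it in items) / sample_var(totals))

# Two perfectly consistent items give an alpha of exactly 1.0 (toy data).
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # prints 1.0
```

Values around the study's reported 0.63 would indicate that the items covary only moderately relative to their individual spread.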
Our findings demonstrate that LLM embeddings offer a robust alternative to zero-shot methods for personality trait analysis, capturing key linguistic patterns without requiring extensive feature engineering. The observed correlations with established psycholinguistic markers, together with the performance-versus-compute trade-off, offer guidance for future computational linguistics work applying LLMs to personality assessment. Further research should explore fine-tuning strategies to enhance psychometric validity.
Journal Article
Artificial Intelligence in Health Promotion and Disease Reduction: Rapid Review
by Ouellet, Steven; Ozkan, Marianne; Sasseville, Maxime
in Artificial Intelligence; Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health; Chronic Disease - prevention & control
2025
Chronic diseases represent a significant share of the global burden of mortality, exacerbated by behavioral risk factors. Artificial intelligence (AI) has transformed health promotion and disease reduction by improving early detection, encouraging healthy lifestyle modifications, and mitigating the economic strain on health systems.
The aim of this study is to investigate how AI contributes to health promotion and disease reduction among Organization for Economic Co-operation and Development countries.
We conducted a rapid review of the literature to identify the latest evidence on how AI is used in health promotion and disease reduction. We applied comprehensive search strategies formulated for MEDLINE (OVID) and CINAHL to locate studies published between 2019 and 2024. A pair of reviewers independently applied the inclusion and exclusion criteria to screen the titles and abstracts, assess the full texts, and extract the data. We synthesized extracted data from the study characteristics, intervention characteristics, and intervention purpose using structured narrative summaries of main themes, giving a portrait of the current scope of available AI initiatives used in promoting healthy activities and preventing disease.
We included 22 studies in this review (out of 3442 publications screened), most of which were conducted in the United States (10/22, 45%) and focused on health promotion by targeting lifestyle dimensions, such as dietary behavior (10/22, 45%), smoking cessation (6/22, 27%), physical activity (4/22, 18%), and mental health (3/22, 14%). Three studies targeted disease reduction related to metabolic health (eg, obesity, diabetes, hypertension). Most AI initiatives were AI-powered mobile apps. Overall, positive results were reported for process outcomes (eg, acceptability, engagement), cognitive and behavioral outcomes (eg, confidence, step count), and health outcomes (eg, glycemia, blood pressure). We categorized the challenges, benefits, and suggestions identified in the studies using a Strengths, Weaknesses, Opportunities, and Threats analysis to inform future developments. Key recommendations include conducting further investigations, taking into account the needs of end users, improving the technical aspect of the technology, and allocating resources.
These findings offer critical insights into the effective implementation of AI for health promotion and disease prevention, potentially guiding policymakers and health care practitioners in optimizing the use of AI technologies in supporting health promotion and disease reduction.
Journal Article
AI and Machine Learning Terminology in Medicine, Psychology, and Social Sciences: Tutorial and Practical Recommendations
by Sui, Jie; Cao, Bo; Greenshaw, Andrew
in AI Language Models in Health Care; Artificial Intelligence; Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health
2025
Recent applications of artificial intelligence (AI) and machine learning in medicine, psychology, and social sciences have led to common terminological confusions. In this paper, we review emerging evidence from systematic reviews documenting widespread misuse of key terms, particularly “prediction” being applied to studies merely demonstrating association or retrospective analysis. We clarify when “prediction” should be used and recommend using “prospective prediction” for future prediction; explain validation procedures essential for model generalizability; discuss overfitting and generalization in machine learning and traditional regression methods; clarify relationships between features, independent variables, predictors, risk factors, and causal factors; and clarify the hierarchical relationship between AI, machine learning, deep learning, large language models, and generative AI. We provide evidence-based recommendations for terminology use that can facilitate clearer communication among researchers from different disciplines and between the research community and the public, ultimately advancing the rigorous application of AI in medicine, psychology, and social sciences.
Journal Article
Public Medical Appeals and Government Online Responses: Big Data Analysis Based on Chinese Digital Governance Platforms
by Liu, Zhihan; Zhang, Ziyan; Li, Hebin
in Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health; Big Data; China
2025
In the era of internet-based governance, online public appeals, particularly those related to health care, have emerged as a crucial channel through which citizens articulate their needs and concerns.
This study aims to investigate the thematic structure, emotional tone, and underlying logic of governmental responses related to public medical appeals in China.
We collected messages posted on the "Message Board for Leaders" hosted by People's Daily Online between January 2022 and November 2023 to identify valid medical appeals for analysis. (1) Key themes of public appeals were identified using the term frequency-inverse document frequency model for feature word extraction, followed by hierarchical cluster analysis. (2) Sentiment classification was conducted using supervised machine learning, with additional validation through sentiment scores derived from a lexicon-based approach. (3) A binary logistic regression model was employed to examine the influence of textual, transactional, and macro-environmental factors on the likelihood of receiving a government response. Robustness was tested using a Probit model.
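The term frequency-inverse document frequency step in (1) can be sketched with the standard library alone. The two-message corpus below is an invented example, not text from the platform:

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for a tokenized corpus.

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: (c / len(doc)) * math.log(n / df[t])
                         for t, c in tf.items()})
    return weighted

# Invented corpus: a term appearing in every document gets weight 0,
# while terms unique to one document are up-weighted as feature words.
w = tfidf([["hospital", "fee"], ["hospital", "wait"]])
```

Feature words with high weights would then feed the hierarchical clustering that groups appeals into themes.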
From a total of 404,428 online appeals, 8864 valid public medical messages were retained after filtering. These primarily concerned pandemic control, fertility policies, health care institutions, and insurance issues. Negative sentiment predominated across message types, accounting for 3328 of 3877 (85.84%) complaint/help-seeking messages, 1666 of 2381 (69.97%) consultation messages, and 1710 of 2606 (65.62%) suggestions. Regression analysis revealed that textual features, issue complexity, and benefit attribution were not significantly associated with government responsiveness. Specifically, for textual features, taking the epidemic issue as the reference category for the appeal theme, the P values were as follows: fertility issue (P=.63), hospital issue (P=.63), security issue (P=.72), and other issues (P=.34). Other textual features included appeal content (P=.80), appeal sentiment (P=.64), and appeal title (P=.55). Regarding the difficulty of resolving incidents, with low difficulty as the reference category, the P values were moderate difficulty (P=.59) and high difficulty (P=.96). For benefit attribution, using individual interest as the reference, collective interest (P=.25) was not statistically significant. By contrast, macro-level factors, specifically internet penetration, education, economic development, and labor union strength, had significant effects. Compared with areas with lower levels, higher internet penetration (odds ratio 1.577-9.930, P=.004 to P<.001), education (odds ratio 2.497, P<.001), and gross domestic product (odds ratio 2.599, P<.001) were associated with increased responsiveness. Conversely, medium (odds ratio 0.565, P<.001) and high (odds ratio 0.116, P<.001) levels of labor union development were linked to lower response odds.
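The odds ratios quoted above are the exponentiated coefficients of the binary logistic regression; a one-line reminder of that relationship (the coefficient value shown is back-computed for illustration, not taken from the study):

```python
import math

def odds_ratio(beta):
    """Odds ratio implied by a logistic-regression coefficient beta."""
    return math.exp(beta)

# Back-computed illustration: a coefficient near 0.915 yields an odds
# ratio near 2.497, the value reported here for the education variable.
print(round(odds_ratio(0.915), 3))  # prints 2.497
```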
Public medical appeals exhibit 5 defining characteristics: urgency induced by pandemic conditions, connections to fertility policy reforms, tensions between the efficacy and costs of medical services, challenges related to cross-regional insurance coverage, and a predominance of negative sentiment. The findings indicate that textual features and issue-specific content exert limited influence on government responsiveness, likely due to the politically sensitive and complex nature of health care-related topics. Instead, macro-level environmental factors emerge as key determinants. These insights can inform the optimization of response mechanisms on digital health platforms and offer valuable theoretical and empirical contributions to the advancement of health information dissemination and digital governance within the health care sector.
Journal Article
Decoding HIV Discourse on Social Media: Large-Scale Analysis of 191,972 Tweets Using Machine Learning, Topic Modeling, and Temporal Analysis
by Song, Meijia; Zhan, Xiangming; Shrader, Cho Hee
in Analysis; Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health; Behavioral Surveillance for Population and Public Health Informatics
2025
HIV remains a global challenge, with stigma, financial constraints, and psychosocial barriers preventing people living with HIV from accessing health care services, driving them to seek information and support on social media. Despite the growing role of digital platforms in health communication, existing research often narrowly focuses on specific HIV-related topics rather than offering a broader landscape of thematic patterns. In addition, much of the existing research lacks large-scale analysis and predominantly predates COVID-19 and the platform's transition to X (formerly known as Twitter), limiting our understanding of the comprehensive, dynamic, and postpandemic HIV-related discourse.
This study aims to (1) observe the dominant themes in current HIV-related social media discourse, (2) explore similarities and differences between theory-driven (eg, literature-informed predetermined categories) and data-driven themes (eg, unsupervised Latent Dirichlet Allocation [LDA] without previous categorization), and (3) examine how emotional responses and temporal patterns influence the dissemination of HIV-related content.
We analyzed 191,972 tweets collected between June 2023 and August 2024 using an integrated analytical framework. This approach combined: (1) supervised machine learning for text classification, (2) comparative topic modeling with both theory-driven and data-driven LDA to identify thematic patterns, (3) sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner) and the NRC Emotion Lexicon to examine emotional dimensions, and (4) temporal trend analysis to track engagement patterns.
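At its simplest, the lexicon-based sentiment scoring in step (3) reduces to a lookup-and-average over tokens. The mini-lexicon below is hypothetical and far smaller than VADER or the NRC Emotion Lexicon, which also handle negation, intensifiers, and discrete emotion categories:

```python
# Hypothetical mini-lexicon mapping terms to polarity in [-1, 1].
LEXICON = {"support": 1.0, "hope": 0.5, "stigma": -1.0, "fear": -0.5}

def lexicon_sentiment(tokens):
    """Mean polarity of the tokens found in the lexicon (0.0 if none match)."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

print(lexicon_sentiment(["stigma", "and", "fear"]))  # prints -0.75
```

A score like this, computed per tweet, is what allows sentiment to be aggregated by theme and tracked over time in the temporal analysis.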
Theory-driven themes revealed that information and education content constituted the majority of HIV-related discourse (120,985/191,972, 63.02%), followed by opinions and commentary (23,863/191,972, 12.43%), and personal experiences and stories (19,672/191,972, 10.25%). The data-driven approach identified 8 distinct themes, some of which shared similarities with aspects from the theory-driven approach, while others were unique. Temporal analysis revealed 2 different engagement patterns: official awareness campaigns like World AIDS Day generated delayed peak engagement through top-down information sharing, while community-driven events like National HIV Testing Day showed immediate user engagement through peer-to-peer interactions.
HIV-related social media discourse on X reflects the dominance of informational content, the emergence of prevention as a distinct thematic focus, and the varying effectiveness of different timing patterns in HIV-related messaging. These findings suggest that effective HIV communication strategies can integrate medical information with community perspectives, maintain balanced content focus, and strategically time messages to maximize engagement. These insights provide valuable guidance for developing digital outreach strategies that better connect health care services with vulnerable populations in the post-COVID-19 pandemic era.
Journal Article
Harm Reduction Strategies for Thoughtful Use of Large Language Models in the Medical Domain: Perspectives for Patients and Clinicians
by Moëll, Birger; Sand Aronsson, Fredrik
in Analysis; Artificial Intelligence; Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health
2025
The integration of large language models (LLMs) into health care presents significant risks to patients and clinicians, inadequately addressed by current guidance. This paper adapts harm reduction principles from public health to medical LLMs, proposing a structured framework for mitigating these domain-specific risks while maximizing ethical utility. We outline tailored strategies for patients, emphasizing critical health literacy and output verification, and for clinicians, enforcing “human-in-the-loop” validation and bias-aware workflows. Key innovations include developing thoughtful use protocols that position LLMs as assistive tools requiring mandatory verification, establishing actionable institutional policies with risk-stratified deployment guidelines and patient disclaimers, and critically analyzing underaddressed regulatory, equity, and safety challenges. This research moves beyond theory to offer a practical roadmap, enabling stakeholders to ethically harness LLMs, balance innovation with accountability, and preserve core medical values: patient safety, equity, and trust in high-stakes health care settings.
Journal Article
Youth Perspectives on Generative AI and Its Use in Health Care
by Frank, Abby; Wong, Andrew; Bains, Manvir
in Adolescent; Artificial Intelligence; Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health
2025
A nationwide survey of youth aged 14 to 24 years on generative artificial intelligence (GAI) found that many youths are wary about the use of GAI in health care, suggesting that health professionals should acknowledge concerns about AI health tools and address them with adolescent patients as they become more pervasive.
Journal Article
Enhancing the Accuracy of Human Phenotype Ontology Identification: Comparative Evaluation of Multimodal Large Language Models
by Zhong, Wei; Liu, Yan; Yan, YouSheng
in Accuracy; AI Language Models in Health Care; Artificial Intelligence
2025
Identifying Human Phenotype Ontology (HPO) terms is crucial for diagnosing and managing rare diseases. However, clinicians, especially junior physicians, often face challenges due to the complexity of describing patient phenotypes accurately. Traditional manual search methods using HPO databases are time-consuming and prone to errors.
The aim of the study is to investigate whether the use of multimodal large language models (MLLMs) can improve the accuracy of junior physicians in identifying HPO terms from patient images related to rare diseases.
In total, 20 junior physicians from 10 specialties participated. Each physician evaluated 27 patient images sourced from publicly available literature, with phenotypes relevant to rare diseases listed in the Chinese Rare Disease Catalogue. The study was divided into 2 groups: the manual search group relied on the Chinese Human Phenotype Ontology website, while the MLLM-assisted group used an electronic questionnaire that included HPO terms preidentified by ChatGPT-4o as prompts, followed by a search using the Chinese Human Phenotype Ontology. The primary outcome was the accuracy of HPO identification, defined as the proportion of correctly identified HPO terms compared to a standard set determined by an expert panel. Additionally, the accuracy of outputs from ChatGPT-4o and 2 open-source MLLMs (Llama3.2:11b and Llama3.2:90b) was evaluated using the same criteria, with hallucinations for each model documented separately. Furthermore, participating physicians completed an additional electronic questionnaire regarding their rare disease background to identify factors affecting their ability to accurately describe patient images using standardized HPO terms.
A total of 270 descriptions were evaluated per group. The MLLM-assisted group achieved a significantly higher accuracy rate of 67.4% (182/270) compared to 20.4% (55/270) in the manual group (relative risk 3.31, 95% CI 2.58-4.25; P<.001). The MLLM-assisted group demonstrated consistent performance across departments, whereas the manual group exhibited greater variability. Among standalone MLLMs, ChatGPT-4o achieved an accuracy of 48% (13/27), while the open-source models Llama3.2:11b and Llama3.2:90b achieved 15% (4/27) and 18% (5/27), respectively. However, MLLMs exhibited a high hallucination rate, frequently generating HPO terms with incorrect IDs or entirely fabricated content. Specifically, ChatGPT-4o, Llama3.2:11b, and Llama3.2:90b generated incorrect IDs in 57.3% (67/117), 98% (62/63), and 82% (46/56) of cases, respectively, and fabricated terms in 34.2% (40/117), 41% (26/63), and 32% (18/56) of cases, respectively. Additionally, a survey on the rare disease knowledge of junior physicians suggests that participation in rare disease and genetic disease training may enhance the performance of some physicians.
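The relative risk quoted above follows directly from the two accuracy proportions; a quick check using the counts reported in this abstract:

```python
def relative_risk(events_a, n_a, events_b, n_b):
    """Risk ratio between two groups: (rate in a) / (rate in b)."""
    return (events_a / n_a) / (events_b / n_b)

# MLLM-assisted group: 182/270 correct; manual search group: 55/270 correct.
rr = relative_risk(182, 270, 55, 270)
print(round(rr, 2))  # prints 3.31, matching the reported relative risk
```

The 95% CI of 2.58-4.25 would come from the usual log-scale standard error for a risk ratio, which is not reproduced here.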
The integration of MLLMs into clinical workflows significantly enhances the accuracy of HPO identification by junior physicians, offering promising potential to improve the diagnosis of rare diseases and standardize phenotype descriptions in medical research. However, the notable hallucination rate observed in MLLMs underscores the necessity for further refinement and rigorous validation before widespread adoption in clinical practice.
Journal Article
AI Governance: A Challenge for Public Health
by Doerr, Megan; Wagner, Jennifer K; Schmit, Cason D
in Artificial Intelligence; Artificial Intelligence, Machine Learning, and Natural Language Processing for Public Health; Bias
2024
The rapid evolution of artificial intelligence (AI) is structuralizing social, political, and economic determinants of health into the invisible algorithms that shape all facets of modern life. Nevertheless, AI holds immense potential as a public health tool, enabling beneficial objectives such as precision public health and medicine. Developing an AI governance framework that can maximize the benefits and minimize the risks of AI is a significant challenge. The benefits of public health engagement in AI governance could be extensive. Here, we describe how several public health concepts can enhance AI governance. Specifically, we explain how (1) harm reduction can provide a framework for navigating the governance debate between traditional regulation and “soft law” approaches; (2) a public health understanding of social determinants of health is crucial to optimally weigh the potential risks and benefits of AI; (3) public health ethics provides a toolset for guiding governance decisions where individual interests intersect with collective interests; and (4) a One Health approach can improve AI governance effectiveness while advancing public health outcomes. Public health theories, perspectives, and innovations could substantially enrich and improve AI governance, creating a more equitable and socially beneficial path for AI development.
Journal Article