Catalogue Search | MBRL
Explore the vast range of titles available.
108 result(s) for "Development and Evaluation of Research Methods, Instruments and Tools"
The Impact of Individual Factors on Careless Responding Across Different Mental Disorder Screenings: Cross-Sectional Study
2025
Online questionnaires are widely used for large-scale screening. However, careless responding (CR) from participants can compromise the reliability of screening outcomes. Prior studies have focused on the effects of individual and environmental factors on CR, but the effect of questionnaire type remains underexplored.
This study investigates the individual factors influencing CR in online mental health screening and assesses how the effect of these factors varies across different psychological questionnaires.
This study analyzed data from 24,367 participants across 4 questionnaires (PHQ-9 [Patient Health Questionnaire-9], PSS [Perceived Stress Scale], ISI [Insomnia Severity Index], and GAD-7 [Generalized Anxiety Disorder-7 Scale]). CR was defined as the proportion of items completed in less than 2 seconds per item. We used a multiple linear regression model to examine the effect of individual factors (sex, age, education, smoking, and drinking) on CR across 4 questionnaires. In addition, response times were visualized to identify patterns between careless and careful responders.
Females demonstrate lower levels of CR than males when completing the PHQ-9 (β=-.172, 95% CI -0.104 to -0.089; P<.001), PSS (β=-.234, 95% CI -0.162 to -0.14; P<.001), ISI (β=-.207, 95% CI -0.13 to -0.114; P<.001), and GAD-7 (β=-.177, 95% CI -0.108 to -0.093; P<.001). Older participants demonstrated lower levels of CR on the PHQ-9 (β=-.036, 95% CI -0.007 to -0.003; P<.001), ISI (β=-.036, 95% CI -0.007 to -0.003; P<.001), and GAD-7 (β=-.053, 95% CI -0.009 to -0.005; P<.001), but their age was unrelated to CR on the PSS. Interestingly, compared with participants with an associate-level education, those with a high education (bachelor's, master's, or doctoral degree) demonstrated higher levels of CR, especially those with a master's degree (PHQ-9: β=.098, 95% CI 0.136 to 0.188; P<.001 and GAD-7: β=.091, 95% CI 0.125 to 0.178; P<.001). Smokers exhibited varied patterns, with current smokers demonstrating lower levels of CR on the PHQ-9 (β=-.022, 95% CI -0.064 to -0.016; P=.001) and GAD-7 (β=-.014, 95% CI -0.051 to -0.002; P=.03), whereas occasional smokers demonstrated higher levels of CR on the PSS (β=.019, 95% CI 0.010 to 0.050; P=.003) than nonsmokers. Drinkers demonstrated lower levels of CR than nondrinkers, with the strongest effect among occasional drinkers on the PHQ-9 (β=-.163, 95% CI -0.103 to -0.087; P<.001). Analysis of response times revealed that participants tended to spend less time on PHQ-9 and GAD-7 surveys, and CR on PSS and ISI surveys was characterized by skipping questions.
The effect of individual factors on CR varies across questionnaire types. These findings offer valuable insights for questionnaire designers and administrators, highlighting the need for targeted intervention.
Journal Article
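The careless responding (CR) measure described in the abstract above, the proportion of items completed in less than 2 seconds, can be illustrated with a minimal sketch. The function name, the sample response times, and the treatment of a time of exactly 2 seconds as not careless are assumptions for illustration; the abstract does not give the authors' implementation.

```python
def careless_responding_score(item_times_seconds, threshold_seconds=2.0):
    """Return the share of items answered faster than the threshold.

    Assumption: an item counts as careless only when its response time is
    strictly below the threshold (2 seconds, as stated in the abstract).
    """
    if not item_times_seconds:
        raise ValueError("need at least one item response time")
    fast_items = sum(1 for t in item_times_seconds if t < threshold_seconds)
    return fast_items / len(item_times_seconds)


# Hypothetical response times (in seconds) for one participant on a 9-item scale.
example_times = [1.2, 3.5, 0.9, 4.1, 2.0, 5.3, 1.8, 2.7, 6.0]
score = careless_responding_score(example_times)
print(f"CR score: {score:.3f}")
```

A score computed this way per participant and per questionnaire could then serve as the outcome in a regression on individual factors, as the study describes.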
Developing a Framework for Online Review-Based Health Care Service Quality Assessment: Text-Mining Study
by Sun, Jianshan; Zhang, Xue; Li, Chenwei
in Data mining; Data Mining - methods; Development and Evaluation of Research Methods, Instruments and Tools
2025
With the development of online health care platforms, patient reviews have become an important source for assessing medical service quality. However, the critical aspects of quality dimensions in textual reviews remain largely unexplored.
This study aims to establish a comprehensive medical service quality assessment framework by leveraging online review data. Such a framework would support large service providers, such as online platforms, to assess the quality of many doctors efficiently.
We adopted a text-mining approach with theory-driven topic extraction from online reviews to develop a service quality assessment framework. The framework is based on topic and sentiment classification methods. We conducted an empirical analysis to assess the validity of the framework. Specifically, we examined if patients' sentiments regarding our extracted dimensions affect demand (number of consultation requests) due to quality signals reflected in these dimensions.
We develop a 5-dimensional health care service quality framework (HSQ-5D model). In the empirical study, patient demand is affected by these dimensions, including expertise (coefficient=1.12; P<.001), service delivery process (coefficient=5.60; P<.001), attitude (coefficient=0.82; P<.001), empathy (coefficient=2.65; P<.001), and outcome (coefficient=0.26; P<.001; through patients' perceived quality from reviews). The 5 dimensions can explain 85.52% of the variance in patient demand, while all information from online reviews can explain 85.67%. The results show the validity and the potential practical value of the proposed HSQ-5D model.
This study explores how online reviews can be used to evaluate health care services, offering significant implications for health care management. Theoretically, we extend existing service quality frameworks by integrating text-mining analysis of online reviews, thereby enhancing the understanding of service quality assessment in the digital health context. Practically, the framework can allow health care platforms to identify and reveal doctors' service quality to reduce patients' information asymmetry and strengthen patient-provider relationships, ultimately contributing to a more effective and patient-centered health care system.
Journal Article
Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics
by Jiang, Liming; Sun, Luning; Wang, Xiting
in Accuracy; Alternative approaches; Applications of AI
2025
Rigorous evaluation of generalist medical artificial intelligence (GMAI) is imperative to ensure their utility and safety before implementation in health care. Current evaluation strategies rely heavily on benchmarks, which can suffer from issues with data contamination and cannot explain how GMAI might fail (lacking explanatory power) or in what circumstances (lacking predictive power). To address these limitations, we propose a new methodology to improve the quality of GMAI evaluation using construct-oriented processes. Drawing on modern psychometric techniques, we introduce approaches to construct identification and present alternative assessment formats for different domains of professional skills, knowledge, and behaviors that are essential for safe practice. We also discuss the need for human oversight in future GMAI adoption.
Journal Article
Developing an Evaluation Index System for Service Capability of Internet Hospitals in China: Mixed Methods Study
by Xia, Mingge; Li, Min; Ma, Li
in Adoption and Change Management of eHealth Systems; China; Contract manufacturing
2025
Rapid advancements in web-based technology have significantly transformed the health care landscape. In China, internet hospitals have emerged as vital components of the health care system. This rapid growth highlights the necessity for a thorough evaluation of internet hospitals within the health care system, as they operate under models distinct from traditional health care settings.
This study aimed to identify critical indicators that reflect the service capabilities of internet hospitals and to establish a comprehensive evaluation index system for their assessment.
This study initially compiled a pool of indicators through literature review and expert consultation. The final evaluation index system was established using the Delphi method, involving 2 rounds of expert consultation, and the index weights were determined using the Analytic Hierarchy Process.
In total, 21 experts from relevant fields, such as hospital management, clinical services, and health information management, were enrolled in the consultation. After 2 rounds of Delphi consultation, the experts' positive coefficients were 95.45% and 100%, and the authoritative coefficients were both >0.7. The final evaluation index system for the service capabilities of internet hospitals contained 3 first-level indicators, 9 second-level indicators, and 29 third-level indicators. The first-level indicators were categorized into 3 dimensions: "Internet hospital infrastructure," "Internet hospital services," and "Internet hospital management." "Internet hospital infrastructure" encompasses the essential conditions for service delivery, such as hardware and software resources, human capital, information security, and payment systems. "Internet hospital services" focuses on the scope and depth of services offered, such as "online medical services," "online pharmaceutical services," and "collaborative services." Finally, "Internet hospital management" is divided into "medical administration" and "general management." Weights were assigned to each indicator using the Analytic Hierarchy Process, revealing that "Internet hospital services" held the highest importance (0.573) among 3 first-level indicators, followed by "Internet hospital infrastructure" (0.239) and "Internet hospital management" (0.188). Among the second-level indicators, "Online medical service" emerged as the most critical (0.344), followed by "Medical administration" (0.140), "Online pharmaceutical service" (0.119), "Collaboration service" (0.110), and "Information security" (0.087). Among the third-level indicators, "Online health consultation" (0.092) had the highest weight, followed by "Online chronic disease management" (0.080), "Online pharmaceutical consultation" (0.076), "Consistency between online medical service and offline medical service" (0.071), and "Medical quality management" (0.071).
This study identified and established a comprehensive evaluation index system for assessing the service capabilities of internet hospitals in China. The resulting index system not only provides a valuable tool for evaluating and improving service delivery in internet hospitals but also serves as a foundation for future studies in this rapidly evolving field.
Journal Article
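The abstract above reports indicator weights derived with the Analytic Hierarchy Process but does not include the experts' pairwise comparison matrices. As a rough sketch of how such weights can be obtained, the example below applies the row geometric mean method, one common way to approximate AHP priority weights. The matrix values and the function name are hypothetical and do not reproduce the study's weights (0.573, 0.239, 0.188).

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j; a reciprocal matrix has pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    if any(len(row) != n for row in pairwise):
        raise ValueError("pairwise matrix must be square")
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical judgments (NOT from the study), in the order
# infrastructure, services, management: services is rated 3 times as
# important as each of the other two dimensions.
hypothetical_matrix = [
    [1.0, 1 / 3, 1.0],
    [3.0, 1.0, 3.0],
    [1.0, 1 / 3, 1.0],
]
weights = ahp_weights(hypothetical_matrix)
print([round(w, 3) for w in weights])
```

In practice a consistency check on the judgments usually accompanies the weights; it is omitted here to keep the sketch short.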
Developing Requirements for a Digital Self-Care Intervention for Adults With Heart Failure: Qualitative Workshop Study
2025
Heart failure is a complex syndrome that requires long-term management, including self-care, to prevent decompensation and hospitalization. Although a range of interventions exists, evidence supporting their effectiveness remains limited, prompting calls for more theory-informed and user-centered approaches. The rapid advancement of mobile and digital technologies offers new opportunities to improve self-care, particularly when interventions are grounded in behavioral theory and shaped by user input.
This study aimed to define user-centered, theory-informed requirements for a digital intervention to support self-care among people with heart failure. We combined the Behavior Change Wheel (BCW) with user-centered design (UCD) to explore self-care barriers and generate actionable intervention requirements.
A qualitative study was conducted, involving 4 workshops with people with heart failure (n=16) and informal caregivers (n=4) across metropolitan and regional Australia. Guided by UCD principles, the workshops explored self-care barriers and elicited ideas for a digital intervention. Barriers were coded using the capability, opportunity, motivation, and behavior (COM-B) model and the Theoretical Domains Framework to identify behavioral determinants and user needs. Ideas and preferences for the intervention were analyzed using requirements analysis and affinity mapping to generate themes describing intervention components ("what" the system should do) and user requirements ("how" it should operate). Intervention components were then mapped to relevant BCW intervention functions.
Participants identified self-care barriers across all 3 COM-B components and 11 of 14 Theoretical Domains Framework domains, including barriers related to capability (eg, lack of knowledge and forgetfulness), opportunity (eg, busy lifestyle and limited access to resources), and motivation (eg, emotional burden and lack of confidence). These were translated into 28 distinct user needs. From participants' ideas, 6 themes relating to intervention components were identified: education, monitoring and feedback, social connection and support, psychological and emotional support, planning and preparing, and health care support. These components mapped to 7 BCW functions: education, persuasion, incentivization, training, environmental restructuring, modeling, and enablement. Additionally, 6 user requirement themes were developed: physical design, accessibility and usability, personalization and control, engagement and user experience, support and implementation, and integration and system organization.
This study demonstrates the value of integrating UCD with the BCW to develop intervention requirements that are both user-centered and theoretically grounded. By exploring both what the intervention should do and how it should do it, we identified actionable requirements that bridge the gap between understanding behavior and developing effective solutions. Future work can focus on translating these requirements into prototype interventions and evaluating their feasibility, acceptability, and effectiveness.
Journal Article
Comparison of 3 Aging Metrics in Dual Declines to Capture All-Cause Dementia and Mortality Risk: Cohort Study
2025
The utility of aging metrics that incorporate cognitive and physical function is not fully understood.
We aim to compare the predictive capacities of 3 distinct aging metrics, namely motoric cognitive risk syndrome (MCR), physio-cognitive decline syndrome (PCDS), and cognitive frailty (CF), for incident dementia and all-cause mortality among community-dwelling older adults.
We used longitudinal data from waves 10-15 of the Health and Retirement Study. Cox proportional hazards regression analysis was employed to evaluate the effects of MCR, PCDS, and CF on incident all-cause dementia and mortality, controlling for socioeconomic and lifestyle factors, as well as medical comorbidities. Discrimination analysis was conducted to assess and compare the predictive accuracy of the 3 aging metrics.
A total of 2367 older individuals aged 65 years and older, with no baseline prevalence of dementia or disability, were ultimately included. The prevalence rates of MCR, PCDS, and CF were 5.4%, 6.3%, and 1.3%, respectively. Over a decade-long follow-up period, 341 cases of dementia and 573 deaths were recorded. All 3 metrics were predictive of incident all-cause dementia and mortality when adjusting for multiple confounders, with variations in the strength of their associations (incident dementia: MCR odds ratio [OR] 1.90, 95% CI 1.30-2.78; CF 5.06, 95% CI 2.87-8.92; PCDS 3.35, 95% CI 2.44-4.58; mortality: MCR 1.60, 95% CI 1.17-2.19; CF 3.26, 95% CI 1.99-5.33; and PCDS 1.58, 95% CI 1.17-2.13). The C-index indicated that PCDS and MCR had the highest discriminatory accuracy for all-cause dementia and mortality, respectively.
Despite the inherent differences among the aging metrics that integrate cognitive and physical functions, they consistently identified risks of dementia and mortality. This underscores the importance of implementing targeted preventive strategies and intervention programs based on these metrics to enhance the overall quality of life and reduce premature deaths in aging populations.
Journal Article
Evidence-Based Learning Strategies in Medicine Using AI
by Posso-Nuñez, Jose Alejandro; Arango-Ibanez, Juan Pablo; Cruz-Suárez, Gustavo
in Artificial Intelligence; Artificial Intelligence (AI) in Medical Education; Development and Evaluation of Research Methods, Instruments and Tools
2024
Large language models (LLMs), like ChatGPT, are transforming the landscape of medical education. They offer a vast range of applications, such as tutoring (personalized learning), patient simulation, generation of examination questions, and streamlined access to information. The rapid advancement of medical knowledge and the need for personalized learning underscore the relevance and timeliness of exploring innovative strategies for integrating artificial intelligence (AI) into medical education. In this paper, we propose coupling evidence-based learning strategies, such as active recall and memory cues, with AI to optimize learning. These strategies include the generation of tests, mnemonics, and visual cues.
Journal Article
Leveraging Large Language Models and Agent-Based Systems for Scientific Data Analysis: Validation Study
by Kuplicki, Rayus; Sen, Sandip; Peasley, Dale
in Application programming interface; Artificial Intelligence; Big Data
2025
Large language models have shown promise in transforming how complex scientific data are analyzed and communicated, yet their application to scientific domains remains challenged by issues of factual accuracy and domain-specific precision. The Laureate Institute for Brain Research-Tulsa University (LIBR-TU) Research Agent (LITURAt) leverages a sophisticated agent-based architecture to mitigate these limitations, using external data retrieval and analysis tools to ensure reliable, context-aware outputs that make scientific information accessible to both experts and nonexperts.
The objective of this study was to develop and evaluate LITURAt to enable efficient analysis and contextualization of complex scientific datasets for diverse user expertise levels.
An agent-based system based on large language models was designed to analyze and contextualize complex scientific datasets using a "plan-and-solve" framework. The system dynamically retrieves local data and relevant PubMed literature, performs statistical analyses, and generates comprehensive, context-aware summaries to answer user queries with high accuracy and consistency.
Our experiments demonstrated that LITURAt achieved an internal consistency rate of 94.8% and an external consistency rate of 91.9% across repeated and rephrased queries. Additionally, GPT-4 evaluations rated 80.3% (171/213) of the system's answers as accurate and comprehensive, with 23.5% (50/213) receiving the highest rating of 5 for completeness and precision.
These findings highlight the potential of LITURAt to significantly enhance the accessibility and accuracy of scientific data analysis, achieving high consistency and strong performance in complex query resolution. Despite existing limitations, such as model stability for highly variable queries, LITURAt demonstrates promise as a robust tool for democratizing data-driven insights across diverse scientific domains.
Journal Article
Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study
by Kaczmarczyk, Robert; Martin, Ron; Roos, Jonas
in Artificial Intelligence; Artificial Intelligence (AI) in Medical Education; Case reports
2024
The rapid development of large language models (LLMs) such as OpenAI's ChatGPT has significantly impacted medical research and education. These models have shown potential in fields ranging from radiological imaging interpretation to medical licensing examination assistance. Recently, LLMs have been enhanced with image recognition capabilities.
This study aims to critically examine the effectiveness of these LLMs in medical diagnostics and training by assessing their accuracy and utility in answering image-based questions from medical licensing examinations.
This study analyzed 1070 image-based multiple-choice questions from the AMBOSS learning platform, divided into 605 in English and 465 in German. Customized prompts in both languages directed the models to interpret medical images and provide the most likely diagnosis. Student performance data were obtained from AMBOSS, including metrics such as the "student passed mean" and "majority vote." Statistical analysis was conducted using Python (Python Software Foundation), with key libraries for data manipulation and visualization.
GPT-4 1106 Vision Preview (OpenAI) outperformed Bard Gemini Pro (Google), correctly answering 56.9% (609/1070) of questions compared to Bard's 44.6% (477/1070), a statistically significant difference (χ²₁=32.1, P<.001). However, GPT-4 1106 left 16.1% (172/1070) of questions unanswered, significantly higher than Bard's 4.1% (44/1070; χ²₁=83.1, P<.001). When considering only answered questions, GPT-4 1106's accuracy increased to 67.8% (609/898), surpassing both Bard (477/1026, 46.5%; χ²₁=87.7, P<.001) and the student passed mean of 63% (674/1070, SE 1.48%; χ²₁=4.8, P=.03). Language-specific analysis revealed both models performed better in German than English, with GPT-4 1106 showing greater accuracy in German (282/465, 60.65% vs 327/605, 54.1%; χ²₁=4.4, P=.04) and Bard Gemini Pro exhibiting a similar trend (255/465, 54.8% vs 222/605, 36.7%; χ²₁=34.3, P<.001). The student majority vote achieved an overall accuracy of 94.5% (1011/1070), significantly outperforming both artificial intelligence models (GPT-4 1106: χ²₁=408.5, P<.001; Bard Gemini Pro: χ²₁=626.6, P<.001).
Our study shows that GPT-4 1106 Vision Preview and Bard Gemini Pro have potential in medical visual question-answering tasks and to serve as a support for students. However, their performance varies depending on the language used, with a preference for German. They also have limitations in responding to non-English content. The accuracy rates, particularly when compared to student responses, highlight the potential of these models in medical education, yet the need for further optimization and understanding of their limitations in diverse linguistic contexts remains critical.
Journal Article
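The 1-degree-of-freedom chi-square comparisons reported in the abstract above can be illustrated with a small sketch for a 2x2 table. The abstract does not say which variant of the test was used; applying the Yates continuity correction is an assumption here, and the function name is hypothetical. The counts are those given for the overall GPT-4 versus Bard comparison (609 and 477 correct out of 1070 each), with unanswered questions counted as not correct.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    With yates=True the Yates continuity correction is applied
    (an assumption; the source does not state the variant used).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must have a nonzero total")
    return n * diff ** 2 / denominator


# Rows: GPT-4 1106 and Bard Gemini Pro; columns: correct and not correct.
gpt4_correct, bard_correct, total_questions = 609, 477, 1070
statistic = chi_square_2x2(
    gpt4_correct, total_questions - gpt4_correct,
    bard_correct, total_questions - bard_correct,
)
print(f"chi-square (1 df): {statistic:.2f}")
```

Under this assumption the statistic rounds to the 32.1 reported for that comparison; the same function can be pointed at the other count pairs in the abstract.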
Interpretable Machine Learning Model for Predicting Postpartum Depression: Retrospective Study
2025
Postpartum depression (PPD) is a prevalent mental health issue with significant impacts on mothers and families. Exploring reliable predictors is crucial for the early and accurate prediction of PPD, which remains challenging.
This study aimed to comprehensively collect variables from multiple aspects, develop and validate machine learning models to achieve precise prediction of PPD, and interpret the model to reveal clinical implications.
This study recruited pregnant women who delivered at the West China Second University Hospital, Sichuan University. Various variables were collected from electronic medical record data and screened using least absolute shrinkage and selection operator penalty regression. Participants were divided into training (1358/2055, 66.1%) and validation (697/2055, 33.9%) sets by random sampling. Machine learning-based predictive models were developed in the training cohort. Models were validated in the validation cohort with receiver operating curve and decision curve analysis. Multiple model interpretation methods were implemented to explain the optimal model.
We recruited 2055 participants in this study. The extreme gradient boosting model was the optimal predictive model with the area under the receiver operating curve of 0.849. Shapley Additive Explanation indicated that the most influential predictors of PPD were antepartum depression, lower fetal weight, elevated thyroid-stimulating hormone, declined thyroid peroxidase antibodies, elevated serum ferritin, and older age.
This study developed and validated a machine learning-based predictive model for PPD. Several significant risk factors and how they impact the prediction of PPD were revealed. These findings provide new insights into the early screening of individuals with high risk for PPD, emphasizing the need for comprehensive screening approaches that include both physiological and psychological factors.
Journal Article