Catalogue Search | MBRL

363 The art and science of data navigation for translational research

by Amburgey, Vanessa , Facelli, Julio , Gouripeddi, Ram in Informatics , Informatics, AI and Data Science , Population studies

2025

Objectives/Goals: Translational researchers spend significant amounts of time finding available datasets and other research data resources for their purposes. The objectives of this program are to develop and evaluate a multipronged approach to supporting researchers with existing data resources. Methods/Study Population: We established a dedicated service with expertise in data resources to increase awareness, understanding, and utilization of existing data resources. This program helps investigators and trainees discover appropriate data resources and formulate scientific problems in computable formats; advises on state-of-the-art data analytics and data management; builds collaborations; mentors data users; and develops a service pipeline for streamlined data resource project management. This is accomplished through these essential functions: (1) discover, catalog, document, and manage metadata resources, (2) train and present data resources to the research community, (3) provide individual consultations, and (4) explore and assess novel data resources. Results/Anticipated Results: In a phased approach, the data navigation program is performing outreach to the research community and integrating with existing data efforts on campus, presenting and demonstrating existing data resources, establishing a consultation service, and building core competencies in long-term usage and navigation of resources across campus. Monthly evaluation of the program has shown an increase in various metrics of commitment and engagement, including the number of requests for access to data resources, consultations, publications and presentations, co-authorships, and proposals. Unawareness and inappropriate use of data resources lead to delays in performing research and potentially unnecessary duplication of effort. Discussion/Significance of Impact: Our data navigation program has increased use of data resources in research. 
Next steps are to continue evaluation and further streamline informatics approaches to data discovery, abstraction, formulation, and analysis. Harmonized data resource programs are an important translational science approach to fostering the next generation of research.

Journal Article

27 Automated IRB compliance and secure data delivery in i2b2

by Usmani, Shakeel , Limbu, Saurav , Sherazi, Hira in Automation , Compliance , Informatics, AI and Data Science

2025

Objectives/Goals: To address the manual, time-consuming processes of validating IRB compliance and ensuring the secure delivery of i2b2 data, this project automates compliance checks, streamlines Protected Health Information (PHI) access, and provides timely, secure data availability while reducing administrative burdens and non-compliance risks. Methods/Study Population: This project enhances the i2b2 application to automate compliance processes and facilitate secure data delivery through integration with REDCap. By linking i2b2 with the IRB system, the application performs automatic compliance checks for project requests, verifying GCP and HIPAA certifications and allowing the release of only IRB-approved PHI variables, which safeguards against unauthorized data access. Manual signatures confirm non-automated compliance processes. Once verified, the application automatically creates a REDCap project, assigns user access, and securely delivers data, ensuring compliance with HIPAA regulations. Results/Anticipated Results: The automated system successfully streamlined IRB compliance checks and data delivery for i2b2 requests. Validation of certifications such as GCP and HIPAA now occurs automatically, significantly reducing the risk of non-compliance. Personnel access to data is limited to IRB-approved PHI, ensuring data security and adherence to institutional standards. The integration with REDCap has reduced manual processes, cutting data request processing time to approximately 30 minutes. Researchers and administrative staff experienced a notable decrease in administrative burden, with faster, more efficient access to approved data while maintaining full compliance with IRB and HIPAA regulations. Discussion/Significance of Impact: The lessons learned can be adapted by institutions to improve compliance efficiency and reduce administrative overhead. 
By implementing similar automation of certification checks and data delivery, sites can enhance data security, minimize errors, and ensure faster, compliant access to research data.
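
The compliance gate described in this abstract can be pictured with a minimal sketch. It is not the authors' implementation: the `Request` fields, the variable names, and the `check_request` helper are hypothetical, and no real i2b2, IRB-system, or REDCap API is called. Only the two certification names (GCP and HIPAA) come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical data request; all field names are illustrative assumptions."""
    user_certs: set          # certifications the requester currently holds
    requested_vars: set      # PHI variables the requester asks for
    irb_approved_vars: set   # PHI variables approved in the IRB protocol


REQUIRED_CERTS = {"GCP", "HIPAA"}  # certifications named in the abstract


def check_request(req: Request) -> dict:
    """Report missing certifications and which variables may be released."""
    missing = REQUIRED_CERTS - req.user_certs
    # Only variables that are both requested and IRB-approved are releasable.
    releasable = req.requested_vars & req.irb_approved_vars
    blocked = req.requested_vars - req.irb_approved_vars
    return {
        "compliant": not missing,
        "missing_certs": sorted(missing),
        # Nothing is released while a certification is missing.
        "releasable_vars": sorted(releasable) if not missing else [],
        "blocked_vars": sorted(blocked),
    }


# Hypothetical example request
req = Request(
    user_certs={"GCP", "HIPAA"},
    requested_vars={"mrn", "dob", "zip"},
    irb_approved_vars={"mrn", "dob"},
)
result = check_request(req)
print(result)
```

In this sketch a missing certification blocks the whole release, while an unapproved variable is simply withheld; whether the real system behaves that way is not stated in the abstract.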

Journal Article

378 Leveraging large language models to communicate translational science benefits at Weill Cornell Medicine Clinical and Translational Science Center

by Imperato-McGinley, J , Sholle, E , Campion Jr, TR in Informatics, AI and Data Science , Large language models , Population studies

2025

Objectives/Goals: This Weill Cornell Clinical and Translational Science Collaborative (CTSC) project evaluates whether large language models (LLMs) can generate accurate summaries of translational science benefits using the Translational Science Benefits Model (TSBM) framework, aiming to identify optimal LLMs and prompting strategies via expert review. Methods/Study Population: We are using prompt engineering to train multiple LLMs to generate one-page impact profiles based on the TSBM framework. LLMs will be selected via benchmarks, focusing on models excelling in information extraction. Leading LLMs (e.g., Llama 3.2, ChatGPT 4.0, Gemini 1.5 Pro, and Claude) and other high-performing models will be considered. Initial work has utilized Gemini 1.5 Pro. Models use data from CTSC-supported projects in WebCAMP, our local instantiation of a translational research activity tracking system used by >20 CTSA hubs, and manuscripts from the Overton database cited in policy documents. Human experts will evaluate the quality and accuracy of LLM-generated profiles. Results/Anticipated Results: Preliminary results using Gemini 1.5 Pro indicate that LLMs can generate coherent and informative impact profiles encompassing diverse areas within the TSBM. Face validity appears satisfactory, suggesting the outputs align with expectations. We anticipate that further exploration with other LLMs and expert validation will reveal strengths and weaknesses of the LLM approach, including the potential for inaccuracies (“hallucinations”), informing further refinement of models and prompting strategies. Analysis of manuscripts cited in policy will provide valuable insights into communicating policy-relevant benefits effectively, and benchmark comparisons will identify optimal LLMs for this use case. 
Discussion/Significance of Impact: This project demonstrates LLMs’ potential for streamlining and enhancing impact reporting in translational science, enabling broader dissemination of research outcomes and promoting better understanding among stakeholders. Future work will integrate LLM-based reporting into research infrastructure.

Journal Article

386 Developing an assessment tool for NIH data management and sharing plans to understand current data practices and needs

by Surkis, Alisa , LaPolla, Fred , Yee, Michelle in Experimental research , Informatics, AI and Data Science , Management

2025

Objectives/Goals: NIH requires researchers to submit Data Management and Sharing (DMS) Plans with their grant applications. Librarians developed an assessment tool for the plans and completed a pilot assessment in order to leverage the plans and understand current institutional research data management and sharing. Methods/Study Population: The assessment tool includes questions related to evaluations of DMS Plans as well as questions related to the content of the plans. Evaluation questions were adapted from the Federation of American Societies for Experimental Biology evaluation rubric developed for the DataWorks! Data Management Plan (DMP) Challenge. Fields were added to collect information on the content of DMS Plans, including data type, institutional resources, data repositories, data standards, and data dissemination timelines. The assessment tool was tested in a pilot implementation. Seven library workers were trained and completed paired review samples of 27 DMS Plans (54 evaluations total) in order to test for tool reliability. Results/Anticipated Results: Results include findings on the reliability of the tool as well as preliminary results from an assessment of DMS Plans. Findings on the reliability of the tool include assessments of the paired reviewers for each question included in the tool. Paired reviewers generally agreed, but tended to differ on specific questions, including questions pertaining to the data types generated or used in a research project. Questions with high levels of agreement included subjects of study and code sharing practices. Results on the content of the DMS Plans include information such as data repositories used, data oversight responsibilities, and data and metadata standards employed. Discussion/Significance of Impact: DMS Plans present an opportunity to better understand data management and sharing practices, and good data management supports high-quality, reproducible research. 
Developing and testing assessment tools for these plans is a key step toward understanding and improving current research data management practices.

Journal Article

358 Rare disease study identification (RDSI): A natural language processing assisted search and visualization tool for clinical studies of rare diseases

by Lin, Michael , Weis, Jennifer , Abdul Fattah, H M in Clinical trials , Informatics , Informatics, AI and Data Science

2025

Objectives/Goals: Identifying and indexing rare disease studies is labor intensive, especially in research centers with a large number of trials. To address this gap, we applied natural language processing (NLP) and visualization techniques to develop an efficient pipeline and user-friendly web interface. Our goal is to offer the rare disease study identification (RDSI) tool for adoption by other sites. Methods/Study Population: The RDSI retrieves study information (short and long titles, study abstract) from the IRB system. These descriptive fields are then processed by the MetaMap Lite NLP program for identifying disease terms and standardizing them to UMLS concepts. By terminology identifier mapping, the diseases intersecting with concepts in rare disease databases (Genetic and Rare Disease program and Orphanet) are further scored to pinpoint studies that focus on a rare disease. The web interface displays a scatter bubble chart as an overview of all the rare diseases, with each bubble size proportional to the number of studies for that disease. In addition to the visual navigation, users can search studies by disease name, PI, or IRB number. Search results contain detailed study information as well as the evidence used by algorithms of the pipeline. Results/Anticipated Results: The RDSI identification results and functions were verified manually and spot-checked by several study investigators. The web interface is a self-contained solution available to our staff for various use cases like reporting or environment scan. We have built in a versioning mechanism that logs the date of each major result in the process. Therefore, even as the rare disease data sources evolve over time, we will be able to preserve any historical context or perform updates as needed. The RDSI outputs are replicated to Mayo Clinic’s enterprise data warehouse daily, allowing tech-savvy users to leverage any useful intermediate results at the backend. 
We anticipate the performance of the rare disease identification to be further enhanced by employing the advancements in AI technology. Discussion/Significance of Impact: The RDSI represents an informatics solution that offers efficiency in identifying and navigating rare disease clinical studies. It features the use of public databases and open-source tools, manifesting return on investment from the broad translational science ecosystem. These considerations are informative and adoptable by other institutions.
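
The concept-intersection step described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption: the concept identifiers are placeholders rather than real UMLS concepts, the study IDs are invented, and the scoring rule (fraction of a study's disease concepts that appear in the rare disease list, with a 0.5 cutoff) is a stand-in, since the abstract does not describe how the RDSI scores studies. The NLP extraction step is assumed to have already run.

```python
from collections import Counter

# Placeholder identifiers standing in for concepts found in rare disease databases.
RARE_DISEASE_CONCEPTS = {"CUI_A", "CUI_B", "CUI_C"}


def rare_disease_score(study_concepts):
    """Return (matched rare concepts, fraction of study concepts that matched)."""
    concepts = set(study_concepts)
    matched = concepts & RARE_DISEASE_CONCEPTS
    score = len(matched) / len(concepts) if concepts else 0.0
    return sorted(matched), score


# Hypothetical output of the NLP step: study ID -> extracted disease concepts.
studies = {
    "IRB-0001": ["CUI_A", "CUI_X"],
    "IRB-0002": ["CUI_Y"],
    "IRB-0003": ["CUI_B", "CUI_C"],
}

THRESHOLD = 0.5  # assumed cutoff for "focuses on a rare disease"
flagged = {}
for study_id, concepts in studies.items():
    matched, score = rare_disease_score(concepts)
    if score >= THRESHOLD:
        flagged[study_id] = matched

# Studies per rare disease, the quantity a bubble chart could map to bubble size.
studies_per_disease = Counter(c for matched in flagged.values() for c in matched)
print(flagged)
print(studies_per_disease)
```

Keeping the matched concepts alongside each flagged study mirrors the abstract's point that search results show the evidence used by the pipeline.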

Journal Article

347 Modeling long-term environmental effects on discrete events using shapelets: An application for stillbirth

by Riches, Naomi , Facelli, Julio , Silver, Robert in Artificial intelligence , Environmental effects , Genetic transformation

2025

Objectives/Goals: To develop an informatics framework that will allow study of environmental effects on stillbirth at large scale (i.e., US-level) and leverage recent advances in machine learning and artificial intelligence to produce reproducible results that can be compared across multiple institutional settings. Methods/Study Population: Experimental exposure data are often available in “absolute time,” where a clinical event can be anchored using a timeline transformation. We associate each stillbirth event with a set of ti…ti+1L shapelets [1] associated with a location, L, and time intervals for the entire dataset. These shapelets are aggregated using a state-of-the-art shapelet classifier [2]. An autoencoder is used to reduce the dimensionality of the stillbirth classification and to cluster stillbirth events according to their corresponding exposure patterns. The stillbirth cluster can be analyzed for other nonexposure (i.e., genetic, SDoH, and demographics) factors, which may be enriched and/or depleted. Results/Anticipated Results: The framework we are developing leverages a shapelet-based approach to produce clusters of stillbirth events according to their corresponding exposure patterns. These clusters can be analyzed for depletion or enrichment of nonenvironmental factors. This analysis will inform how to formulate (or not) class models of exposure that can be more informative and have better predictive power than overall population models. Moreover, the finding of depletion and enrichment of physiological properties of the individuals may lead to novel physiological hypotheses to better understand the injury mechanisms that the environmental exposure profile produces. Discussion/Significance of Impact: Nearly 20,000 babies are stillborn in the USA each year [3]. Environmental exposures, usually studied as time averages over certain periods of time, have produced mixed results for stillbirth risk [4]. 
However, temporal profiles matter [1], and we argue that they can be assessed using shapelet technology.
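
As background on why temporal profiles can carry information that a time average discards, the sketch below computes the usual shapelet distance: the minimum Euclidean distance between a short pattern and every same-length window of a series. It is a generic illustration, not the classifier cited in the abstract, and the exposure values are invented.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


# Two hypothetical exposure series with the same mean but different shapes.
spiky = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
flat = [0.5] * 8
spike_shapelet = [1.0, 2.0, 1.0]

print(shapelet_distance(spiky, spike_shapelet))
print(shapelet_distance(flat, spike_shapelet))
```

Both series average 0.5, yet only the first contains the spike pattern, so its distance to the shapelet is zero while the flat series stays far from it.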

Journal Article

380 National trends in interventional clinical trial participation by race, gender, and age: Insights from EHR data on over 130 million patients

by Kaelber, David C. , Fry, Sarah , Terebuh, Pauline in Age composition , Clinical trials , Gender

2025

Objectives/Goals: To investigate interventional clinical trial participation overall and by race, gender, and age. Methods/Study Population: We used Epic Cosmos, an aggregated, de-identified EHR platform including over 270 million patients, to examine overall clinical trial participation and the race, gender, and age composition of participants versus non-participants. Patients ≥5 years old with known race and gender and at least one healthcare encounter between 2021 and 2024 were included. Interventional trial enrollment was identified by a “research flag” indicating current or past participation in an interventional study within an Epic system contributing data to Cosmos. Race was categorized as American Indian, Asian, Black, Native Hawaiian, or White. Age-adjusted relative representation (RR) ratios were used to compare participation, with RR >1 indicating over-representation and RR <1 indicating under-representation. Results/Anticipated Results: Of 130,455,189 patients meeting eligibility criteria, 0.52% (673,425) were active or inactive in an interventional clinical trial. Results are shown in the figure below. The poorest representation was from Asian and NH/PI persons. Representation was most similar to the patient population for White and AI/AN persons. Black males participated less, and women more, than predicted by patient composition. Older patients participated more frequently than younger patients (age, mean (SD), y, 53 (22) vs. 46 (23); p …). Discussion/Significance of Impact: This is the first study we know of describing interventional trial participation in the USA across more than 130 million patients. Further research is needed to clarify whether these differences are due to the nature of the studies themselves (e.g., OB/GYN trials including only women, etc.) versus disparities in recruitment or otherwise.
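
The arithmetic behind these figures can be sketched briefly. The first part reproduces the reported overall participation rate from the two counts in the abstract. The second part shows a crude relative representation ratio (a group's share of trial participants divided by its share of the eligible population); it is only an illustration, because the study used age-adjusted ratios whose exact method is not given here, and the group counts in the example are invented.

```python
ELIGIBLE_PATIENTS = 130_455_189   # from the abstract
TRIAL_PARTICIPANTS = 673_425      # from the abstract

participation_pct = 100 * TRIAL_PARTICIPANTS / ELIGIBLE_PATIENTS
print(f"{participation_pct:.2f}%")  # 0.52%


def relative_representation(group_participants, all_participants,
                            group_patients, all_patients):
    """Crude (not age-adjusted) ratio of trial share to population share."""
    share_in_trials = group_participants / all_participants
    share_in_population = group_patients / all_patients
    return share_in_trials / share_in_population


# Hypothetical group: 10% of participants but 20% of eligible patients.
rr = relative_representation(10, 100, 200, 1000)
label = "over-represented" if rr > 1 else "under-represented" if rr < 1 else "proportional"
print(rr, label)
```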

Journal Article

362 Evaluating the educational quality of ChatGPT as a health information resource for patients with acute myeloid leukemia (AML)

by Chinyengetere, Fadzai , Taylor, Allison O. , Branchaud, Brenda in Acute myeloid leukemia , Chatbots , Diagnosis

2025

Objectives/Goals: Upon diagnosis, patients with acute myeloid leukemia (AML) have significant information needs. Given its recent increase in popularity, patients may use ChatGPT to access information about AML. We will examine the quality, reliability, and readability of information that ChatGPT provides in response to frequently asked questions (FAQs) about AML. Methods/Study Population: From FAQs on the top 3 patient-facing websites about AML, we derived 26 questions, written in lay terms, about AML diagnosis, treatment, prognosis, and functional impact. We queried ChatGPT-4o on 10/14/2024 using a new Google account with no prior history. We asked each question in a separate chat window once, verbatim, and without prompt engineering. After calibration, 5 oncologists independently reviewed ChatGPT responses. We assessed quality via the Global Quality Scale (GQS), scored from 1 (poor) to 5 (excellent) based on flow, topic coverage, and usefulness. For reliability, we assessed whether each response addresses the query and is factually accurate, elaborating on specific inaccuracies. For readability, we assessed Flesch-Kincaid Grade Level, Gunning Fog Index, and Simple Measure of Gobbledygook. Results/Anticipated Results: This will be a descriptive analysis of ChatGPT responses. For quality and reliability assessments, we will report Fleiss’ kappa for inter-rater reliability and expect substantial agreement or greater (≥0.61). Per prior studies in other domains, we hypothesize that ChatGPT responses will have good quality on average (i.e., GQS score near 4). We hypothesize that nearly all responses will address their query and will mostly be accurate; a minority of responses may have partial inaccuracies. Finally, we hypothesize that readability metrics will suggest that a higher educational level (e.g., college-level education) is required for comprehension. 
Overall, these findings will help elucidate strengths and limitations of ChatGPT for AML and guide discussion of factors patients should be aware of when using ChatGPT. Discussion/Significance of Impact: No prior study has examined the educational quality of ChatGPT for AML. Our study will detail whether patients are receiving trustworthy and meaningful information, identify misinformation, and provide guidance to oncologists when recommending information resources to patients or fielding questions that patients may raise after using ChatGPT.
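
For readers unfamiliar with the inter-rater statistic named in this abstract, the following is a minimal sketch of Fleiss' kappa computed from a table of rating counts. The ratings are invented (four responses, five raters, GQS scores 1 to 5) and are not study data; the sketch also assumes every item is rated by the same number of raters and that ratings are not all in a single category, in which case the denominator would be zero.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters giving item i category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])
    total_ratings = n_items * n_raters

    # Proportion of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / total_ratings for j in range(n_categories)]
    # Observed agreement among rater pairs for each item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical GQS ratings: columns are scores 1..5, each row sums to 5 raters.
gqs_counts = [
    [0, 0, 0, 1, 4],
    [0, 0, 1, 4, 0],
    [0, 0, 0, 5, 0],
    [0, 1, 3, 1, 0],
]
kappa = fleiss_kappa(gqs_counts)
print(round(kappa, 3), "substantial or greater" if kappa >= 0.61 else "below substantial")
```

The 0.61 cutoff in the last line is the threshold for substantial agreement stated in the abstract.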

Journal Article

356 Usability, acceptability, and future opportunities of mobile health (mHealth) apps for caregiver health decision making: A scoping review

by Clarke, Martina , Kerns, Ellen , Dai, Jiayu in Caregivers , Decision making , Informatics, AI and Data Science

2025

Objectives/Goals: This study aims to evaluate common features of mobile health (mHealth) apps and their role in helping caregivers make health decisions for children. Methods/Study Population: A scoping review of literature on caregivers’ use of mHealth apps (published since 2008) was conducted across 5 databases (i.e., Embase, PubMed, CINAHL, Clinicaltrials.gov, and IEEE Xplore). Selected papers were categorized based on app purposes, target users, and mHealth agile development phases. Common features were also identified and analyzed along with the pros and cons reported by users. Further, primary feature requests were summarized to inform future development. Results/Anticipated Results: This review included 62 studies. Most apps were about maternity and infant care and specific diseases. Major users were caregivers and pregnant women. Around 20% of papers covered multiple phases in the mHealth agile development lifecycle. The effectiveness/clinical trial phase (phase III) was the most common. E-learning, personalization and customization, and health tracking were the three most common features of mHealth apps included in this review. Positive feedback about features was more common than concerns. Caregivers perceived the apps as helpful and as empowering them to make informed decisions. Concerns were mainly over 1) technical issues, 2) inappropriate design, and 3) ambiguous terms. Requested new features included content comprehensiveness, user engagement, and usage flexibility. Discussion/Significance of Impact: To our knowledge, this is the first review to investigate the usability of mHealth app features in this area. The results offer feasible strategies for developers to improve the effectiveness of apps for caregiver decision-making.

Journal Article

369 Precision education and generative AI in surgery utilization study: A framework for global surgical education

by Please, Helen , Nsubuga, Mike , Kintu, Timothy in Curricula , Epidemiology , Generative artificial intelligence

2025

Objectives/Goals: Global surgical education is largely driven by high-income countries (HICs), with curricula not tailored to the needs of low- and middle-income countries (LMICs). This study assessed country-specific needs for global surgical curricula and used generative AI to develop tailored curricula. Methods/Study Population: A curriculum framework was developed using expert opinion. Using a focused needs assessment survey, we evaluated international medical students’ and trainees’ needs for structured global surgery curricula, covering research, education, and data. Generative AI was then used to develop tailored curriculum templates for each country, ensuring alignment with the distinct needs of respective LMIC and HIC respondents. The AI-generated curricula were then compared across countries to identify variations in content and focus areas. Results/Anticipated Results: A total of 145 respondents from 18 countries and 6 continents participated, with 94 from LMICs and 51 from HICs. Four countries [Uganda (n = 31), Nigeria (n = 34), the USA (n = 23), and the UK (n = 23)] had more than 10 respondents, and a country-specific global surgery curriculum was created for each. Curricula developed for HIC trainees focused on access to resources and infrastructure, future directions of global surgical research, and the role of medical students and early career development, with a decreased focus on the history of global surgery. LMIC country-based curricula focused on introducing the concepts of global surgery and quantifying the burden and epidemiology of surgical disease, and had a greater emphasis on case studies and use cases, with decreased focus on resources and collaboration. Discussion/Significance of Impact: The research introduces a “precision education” approach that could help close the surgical education access gap globally. Further pilot and qualitative studies are necessary to validate the feasibility of AI-generated needs-based curricula.

Journal Article
