Catalogue Search | MBRL

Ensemble learning of foundation models for precision oncology

by Zhang, Xiaoming , Kelley, Yuan , Eweje, Feyisope in Artificial intelligence , Biomarkers , Ensemble learning

2025

Histopathology is essential for disease diagnosis and treatment decision-making. Recent advances in artificial intelligence (AI) have enabled the development of pathology foundation models that learn rich visual representations from large-scale whole-slide images (WSIs). However, existing models are often trained on disparate datasets using varying strategies, leading to inconsistent performance and limited generalizability. Here, we introduce ELF (Ensemble Learning of Foundation models), a novel framework that integrates five state-of-the-art pathology foundation models to generate unified slide-level representations. Trained on 53,699 WSIs spanning 20 anatomical sites, ELF leverages ensemble learning to capture complementary information from diverse models while maintaining high data efficiency. Unlike traditional tile-level models, ELF's slide-level architecture is particularly advantageous in clinical contexts where data are limited, such as therapeutic response prediction. We evaluated ELF across a wide range of clinical applications, including disease classification, biomarker detection, and prediction of response to major anticancer therapies (cytotoxic chemotherapy, targeted therapy, and immunotherapy) across multiple cancer types. ELF consistently outperformed all constituent foundation models and existing slide-level models, demonstrating superior accuracy and robustness. Our results highlight the power of ensemble learning for pathology foundation models and suggest ELF as a scalable and generalizable solution for advancing AI-assisted precision oncology.

Paper

Stress response silencing by an E3 ligase mutated in neurodegeneration

by Wernig, Marius , Haakonsen, Diane L. , Witus, Samuel R. in 13/100 , 13/109 , 14/1

2024

Stress response pathways detect and alleviate adverse conditions to safeguard cell and tissue homeostasis, yet their prolonged activation induces apoptosis and disrupts organismal health [1–3]. How stress responses are turned off at the right time and place remains poorly understood. Here we report a ubiquitin-dependent mechanism that silences the cellular response to mitochondrial protein import stress. Crucial to this process is the silencing factor of the integrated stress response (SIFI), a large E3 ligase complex mutated in ataxia and in early-onset dementia that degrades both unimported mitochondrial precursors and stress response components. By recognizing bifunctional substrate motifs that equally encode protein localization and stability, the SIFI complex turns off a general stress response after a specific stress event has been resolved. Pharmacological stress response silencing sustains cell survival even if stress resolution failed, which underscores the importance of signal termination and provides a roadmap for treating neurodegenerative diseases caused by mitochondrial import defects. The E3 ligase SIFI is identified as a dedicated silencing factor of the integrated stress response, a finding that has implications for the development of therapeutics for neurodegenerative diseases caused by mitochondrial protein import stress.

Journal Article

Piloting an automated clinical trial eligibility surveillance and provider alert system based on artificial intelligence and standard data models

by Cates, Andrew , Meystre, Stéphane M. , Bastian, Grace in Algorithms , Artificial Intelligence , Automation

2023

Background To advance new therapies into clinical care, clinical trials must recruit enough participants. Yet, many trials fail to do so, leading to delays, early trial termination, and wasted resources. Under-enrolling trials make it impossible to draw conclusions about the efficacy of new therapies. An oft-cited reason for insufficient enrollment is lack of study team and provider awareness about patient eligibility. Automating clinical trial eligibility surveillance and study team and provider notification could offer a solution. Methods To address this need for an automated solution, we conducted an observational pilot study of our TAES (TriAl Eligibility Surveillance) system. We tested the hypothesis that an automated system based on natural language processing and machine learning algorithms could detect patients eligible for specific clinical trials by linking the information extracted from trial descriptions to the corresponding clinical information in the electronic health record (EHR). To evaluate the TAES information extraction and matching prototype (i.e., TAES prototype), we selected five open cardiovascular and cancer trials at the Medical University of South Carolina and created a new reference standard of 21,974 clinical text notes from a random selection of 400 patients (including at least 100 enrolled in the selected trials), with a small subset of 20 notes annotated in detail. We also developed a simple web interface for a new database that stores all trial eligibility criteria, corresponding clinical information, and trial-patient match characteristics using the Observational Medical Outcomes Partnership (OMOP) common data model. Finally, we investigated options for integrating an automated clinical trial eligibility system into the EHR and for notifying health care providers promptly of potential patient eligibility without interrupting their clinical workflow. 
Results Although the rapidly implemented TAES prototype achieved only moderate accuracy (recall up to 0.778; precision up to 1.000), it enabled us to assess options for integrating an automated system successfully into the clinical workflow at a healthcare system. Conclusions Once optimized, the TAES system could exponentially enhance identification of patients potentially eligible for clinical trials, while simultaneously decreasing the burden on research teams of manual EHR review. Through timely notifications, it could also raise physician awareness of patient eligibility for clinical trials.
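The recall and precision figures quoted in this abstract follow from counts of true positives, false positives, and false negatives. A minimal sketch of that arithmetic is given below; the function name and the counts are illustrative assumptions chosen only to land on the reported upper values (recall 0.778, precision 1.000), since the record itself does not give the underlying counts.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw match counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts (not from the study): 7 eligible patients found,
# no false alerts, 2 eligible patients missed.
precision, recall, f1 = precision_recall_f1(tp=7, fp=0, fn=2)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these assumed counts, a precision of 1.000 means every alert was correct, while a recall of 0.778 means roughly two in nine eligible patients would still be missed.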

Journal Article

Leveraging Artificial Intelligence for Clinical Study Matching: Key Threads for Interweaving Data Science and Implementation Science

by Obeid, Jihad S , Heider, Paul M , Goodwin, Andrew James in Algorithms , Artificial Intelligence , Clinical Information and Decision Making

2025

Artificial intelligence holds the potential to enhance the efficiency of clinical research. Yet, like all innovations, its impact is dependent upon target user uptake and adoption. As efforts to leverage artificial intelligence for clinical trial screening become more widespread, it is imperative that implementation science principles be incorporated in both the design and roll-out of user-facing tools. We present and discuss implementation themes considered to be highly relevant by target users of artificial intelligence–enabled clinical trial screening platforms. The identified themes range from design features that optimize usability to collaboration with tool designers to improve transparency and trust. These themes were generally mapped to domains of existing implementation science frameworks such as the Consolidated Framework for Implementation Research. Designers should consider incorporating an implementation science framework early in the development process to not only ensure a user-centered design but to inform how tools are integrated into existing clinical research workflows.

Journal Article

AI approaches for phenotyping Alzheimer's disease and related dementias using electronic health records

by Obeid, Jihad S. , Scherbakov, Dmitry , Cutty, Maxwell in Alzheimer's disease and related dementias , artificial intelligence , digital phenotype

2025

INTRODUCTION The current standard electronic (e‐)phenotype for identifying patients with Alzheimer's disease and related dementias (ADRD) from medical claims data yields suboptimal diagnostic accuracy. This study leveraged artificial intelligence (AI)–based text‐classification methods to improve the identification of patients with dementia due to ADRD using clinical notes from electronic health records (EHRs). METHODS EHR data for patients aged ≥ 64 (N = 4000) from an academic medical center were used. The cohort included 1000 patients with ADRD per the Chronic Conditions Warehouse (CCW) algorithm for ADRD (i.e., at least one ADRD International Classification of Diseases, Tenth Revision codes [ICD‐10 code]) and 3000 matched controls without ADRD (i.e., no CCW codes). We trained several AI‐based text‐classification models, including bag‐of‐words models, deep learning, and large language models (LLMs), to make ADRD determinations from clinical notes. The performance of each model was evaluated against “gold standard” manual chart review. RESULTS A foundational LLM derived from Llama 2 demonstrated superior performance in identifying patients with ADRD (area under the curve [AUC] = 0.9534, F1 score 0.8571) compared to both the current standard CCW algorithm (AUC = 0.8482, F1 score 0.8323, although only the AUC was statistically significantly different) and other AI‐based models. Several of the AI‐based models, including convolutional neural networks, also outperformed the CCW algorithm. DISCUSSION These findings highlight the potential of AI‐based text‐classification methods to optimize the automated identification of patients with ADRD using rich EHR data. However, the success of this approach depends on the quality of clinical notes, and more work is needed to refine and validate these methods across more diverse data sets. 
Highlights The current e‐phenotype for patients with Alzheimer's disease and related dementias (ADRD) in electronic health records has suboptimal diagnostic accuracy. The study used artificial intelligence (AI)–based text classification methods to improve the detection of patients with ADRD. AI‐based models, including convolutional neural networks, outperformed the Chronic Conditions Warehouse algorithm. The current standard electronic (e‐) phenotype for identifying patients with Alzheimer's Disease and Related Dementias (ADRD), the Chronic Conditions Warehouse (CCW) algorithm for ADRD, yields suboptimal diagnostic accuracy. We leveraged Artificial Intelligence (AI)‐based text‐classification methods, to improve the identification of patients with dementia due to ADRD using clinical notes from electronic health records (EHR). Using EHR data for patients aged 64 and older (N = 4000) from an academic medical center, we trained several AI‐based text‐classification models, including bag‐of‐words models, deep learning, and large language models (LLMs), to make ADRD determinations from clinical notes. The performance of each model was evaluated against “gold standard” chart review. A foundational LLM derived from Llama 2 demonstrated superior performance in identifying patients with ADRD dementia (area under the curve: AUC 0.95) compared to the current standard CCW algorithm (AUC = 0.85) and other AI‐based models. Several of the AI‐based models, including convolutional neural networks, also outperformed the CCW algorithm. These findings highlight the potential of AI‐based text‐classification methods to optimize the automated identification of patients with ADRD using rich EHR data. However, this approach depends on the quality of clinical notes and more work is needed to refine and validate these methods across more diverse data sets.

Journal Article

Granulomas in Diagnostic Biopsies Associated With High Risk of Crohn’s Complications—But May Be Preventable

by Heider, Amer , Adler, Jeremy , Lawrence, Lindsey S in Biopsy , Crohn Disease - complications , Crohn Disease - drug therapy

2022

Abstract Background Granulomatous intestinal inflammation may be associated with aggressive Crohn’s disease (CD) behavior. However, this has not been confirmed, and it is unknown if associated disease complications are preventable. Methods This is a retrospective cohort of patients younger than 21 years at CD diagnosis (November 1, 2005 to November 11, 2015). Clinical information was abstracted, including dates of starting medications and the timing of perianal fistula or stricture development, if any. Diagnostic pathology reports were reviewed, and a subset of biopsy slides were evaluated by a blinded pathologist. Patients were excluded if perianal fistula or stricture developed within 30 days after CD diagnosis. Medications were included in analyses only if started >90 days before development of perianal fistula or stricture. Results In total, 198 patients were included. Half (54%) had granulomas at diagnosis. Granulomas were associated with a greater than 3-fold increased risk of perianal fistula (hazard ratio [HR] = 3.24; 95% confidence interval [CI], 1.40–7.48). Immunomodulator and anti-tumor necrosis factor-α (anti-TNF) therapy were associated with 90% (HR = 0.10; 95% CI, 0.03–0.42) and 98% (HR = 0.02; 95% CI, 0.01–0.10) reduced risk of perianal fistula, respectively. Patients with granulomatous inflammation preferentially responded to anti-TNF therapy with reduced risk of perianal fistula. The presence of granulomas was not associated with risk of stricture. Immunomodulator and anti-TNF therapy were associated with 96% (HR = 0.04; 95% CI, 0.01–0.22) and 94% (HR = 0.06; 95% CI, 0.02–0.20) reduced risk of stricture, respectively. Conclusions Granulomas are associated with increased risk of perianal fistula but not stricture. Steroid-sparing therapies seem to reduce the risk of both perianal fistula and stricture. For those with granulomas, anti-TNF-α therapy greatly reduced the risk of perianal fistula development, whereas immunomodulators did not.
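The percentage risk reductions in this abstract correspond to the quoted hazard ratios through the usual conversion (1 − HR) × 100. The short sketch below reproduces that arithmetic for the four protective hazard ratios reported; the function name is an assumption introduced here for illustration and is not part of the study.

```python
def percent_risk_reduction(hazard_ratio):
    """Convert a hazard ratio below 1 into a percentage reduction in hazard."""
    return (1 - hazard_ratio) * 100


# Hazard ratios as quoted in the abstract.
reported = {
    "immunomodulator, perianal fistula": 0.10,
    "anti-TNF, perianal fistula": 0.02,
    "immunomodulator, stricture": 0.04,
    "anti-TNF, stricture": 0.06,
}
for label, hr in reported.items():
    print(f"{label}: HR {hr:.2f} -> {percent_risk_reduction(hr):.0f}% reduced risk")
```

The same conversion does not apply to the granuloma finding, where the hazard ratio of 3.24 is above 1 and describes an increase in risk.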

Journal Article

Genome of the Tasmanian tiger provides insights into the evolution and demography of an extinct marsupial carnivore

by Cooper, Alan , Mitchell, Kieren J. , Soubrier, Julien in 631/181/2474 , 631/181/735 , 631/208/212/2304

2018

The Tasmanian tiger or thylacine (Thylacinus cynocephalus) was the largest carnivorous Australian marsupial to survive into the modern era. Despite last sharing a common ancestor with the eutherian canids ~160 million years ago, their phenotypic resemblance is considered the most striking example of convergent evolution in mammals. The last known thylacine died in captivity in 1936 and many aspects of the evolutionary history of this unique marsupial apex predator remain unknown. Here we have sequenced the genome of a preserved thylacine pouch young specimen to clarify the phylogenetic position of the thylacine within the carnivorous marsupials, reconstruct its historical demography and examine the genetic basis of its convergence with canids. Retroposon insertion patterns placed the thylacine as the basal lineage in Dasyuromorphia and suggest incomplete lineage sorting in early dasyuromorphs. Demographic analysis indicated a long-term decline in genetic diversity starting well before the arrival of humans in Australia. In spite of their extraordinary phenotypic convergence, comparative genomic analyses demonstrated that amino acid homoplasies between the thylacine and canids are largely consistent with neutral evolution. Furthermore, the genes and pathways targeted by positive selection differ markedly between these species. Together, these findings support models of adaptive convergence driven primarily by cis-regulatory evolution. The Tasmanian tiger is an extinct carnivorous marsupial. By sequencing the genome of a preserved specimen the authors show long-term population decline and reveal the genetic basis of the phenotypic convergence between Tasmanian tigers and canids.

Journal Article

Development and validation of natural language processing algorithms in the national ENACT network

by Garduno-Rapp, Nelly-Estefanie , Li, Chenyu , Xia, Zongqi in Algorithms , Artificial intelligence , Clinical trials

2025

Electronic Health Record (EHR) data are critical for advancing translational research and AI technologies. The ENACT network offers access to structured EHR data across 57 CTSA hubs. However, substantial information is contained in clinical narratives, requiring natural language processing (NLP) for research. The ENACT NLP Working Group was formed to make NLP-derived clinical information accessible and queryable across the network. We established the ENACT NLP Working Group with 13 sites selected based on criteria including clinical notes access, IT infrastructure, NLP expertise, and institutional support. We divided sites into five focus groups targeting clinical tasks within disease contexts. Each focus group consisted of two development sites and two validation sites. We extended the ENACT ontology to standardize NLP-derived data and conducted multisite evaluations using the Open Health Natural Language Processing (OHNLP) Toolkit. The working group achieved 100% site retention and deployed NLP infrastructure across all sites. We developed and validated NLP algorithms for rare disease phenotyping, social determinants of health, opioid use disorder, sleep phenotyping, and delirium phenotyping. Performance varied across sites (F1 scores 0.53-0.96), highlighting data heterogeneity impacts. We extended the ENACT common data model and ontology to incorporate NLP-derived data while maintaining Shared Health Research Informatics NEtwork (SHRINE) compatibility. This demonstrates feasibility of deploying NLP infrastructure across large, federated networks. The focus group approach proved more practical than general-purpose approaches. Key lessons include the challenge of data heterogeneity and importance of collaborative governance. This work also provides a foundation that other networks can build on to implement NLP capabilities for translational research.

Journal Article

Limited Genetic Diversity Preceded Extinction of the Tasmanian Tiger

by Pask, Andrew J. , Renfree, Marilyn B. , Heider, Thomas in Agriculture , Amino acids , Analysis

2012

The Tasmanian tiger or thylacine was the largest carnivorous marsupial when Europeans first reached Australia. Sadly, the last known thylacine died in captivity in 1936. A recent analysis of the genome of the closely related and extant Tasmanian devil demonstrated limited genetic diversity between individuals. While a similar lack of diversity has been reported for the thylacine, this analysis was based on just two individuals. Here we report the sequencing of an additional 12 museum-archived specimens collected between 102 and 159 years ago. We examined a portion of the mitochondrial DNA hyper-variable control region and determined that all sequences were on average 99.5% identical at the nucleotide level. As a measure of accuracy we also sequenced mitochondrial DNA from a mother and two offspring. As expected, these samples were found to be 100% identical, validating our methods. We also used 454 sequencing to reconstruct 2.1 kilobases of the mitochondrial genome, which shared 99.91% identity with the two complete thylacine mitochondrial genomes published previously. Our thylacine genomic data also contained three highly divergent putative nuclear mitochondrial sequences, which grouped phylogenetically with the published thylacine mitochondrial homologs but contained 100-fold more polymorphisms than the conserved fragments. Together, our data suggest that the thylacine population in Tasmania had limited genetic diversity prior to its extinction, possibly as a result of their geographic isolation from mainland Australia approximately 10,000 years ago.

Journal Article

Enhancing genome assemblies by integrating non-sequence based data

by Heider, Thomas N , Lindsay, James , Wang, Chenwei in Biomedicine , Medicine , Medicine & Public Health

2011

Introduction Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. Methods The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence-based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Results Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence-based data, 53% of the initial contigs were assigned to scaffolds. 
Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. Conclusions We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses.
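The N50 scaffold size used as the quality measure in this abstract is conventionally the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch of that standard calculation is shown below; the function name and the example lengths are illustrative assumptions and are not taken from the tammar wallaby assembly or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kilobases (not from the paper).
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))  # total is 300, so half is reached at the 70 KB scaffold
```

Because N50 depends on the longest scaffolds, joining a small number of contigs into much longer scaffolds can raise it sharply even when few additional contigs are placed, which is consistent with the pattern the abstract describes for the map data.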

Journal Article
