Catalogue Search | MBRL

A large language model for electronic health records

by PourNejatian, Nima; Lipori, Gloria; Martin, Cheryl in 692/308, 692/700, Artificial intelligence

2022

There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, and the largest one trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model, GatorTron, using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks: clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve performance on all five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og.

Journal Article

Predicting blood–brain barrier permeability of molecules with a large language model and machine learning

by Tseng, Warren C. W.; Yang, Jai-Sing; Liao, Ken Y. K. in 631/114, 639/705, Artificial intelligence

2024

Predicting the blood–brain barrier (BBB) permeability of small-molecule compounds using a novel artificial intelligence (AI) platform is necessary for drug discovery. Machine learning and large language models, used as AI tools, can improve accuracy and shorten the time needed for new drug development. The primary goal of this research is to develop AI computing models and novel deep learning architectures capable of predicting whether molecules can permeate the human BBB. The in silico (computational) and in vitro (experimental) results were validated by the Natural Products Research Laboratories (NPRL) at China Medical University Hospital (CMUH). The transformer-based MegaMolBART was used as the simplified molecular input line entry system (SMILES) encoder, with an XGBoost classifier, as an in silico method to predict whether a molecule could cross the BBB. As a baseline for comparison, we used Morgan (circular) fingerprints, which apply the Morgan algorithm to a set of atomic invariants, as the encoder, again with an XGBoost classifier. BBB permeability was assessed in vitro using three-dimensional (3D) human BBB spheroids (human brain microvascular endothelial cells, brain vascular pericytes, and astrocytes). Using multiple BBB databases, the final in silico transformer and XGBoost model achieved an area under the receiver operating characteristic curve of 0.88 on the held-out test dataset. Temozolomide (TMZ) and 21 randomly selected BBB-permeable compounds (Pred scores = 1, indicating BBB-permeable) from the NPRL penetrated human BBB spheroid cells. No evidence suggests that ferulic acid or five BBB-impermeable compounds (Pred scores < 1.29423E−05, which designate compounds predicted not to pass through the human BBB) can pass through the spheroid cells of the BBB. Our in vitro validation indicated that the in silico prediction of small-molecule permeation in the BBB model is accurate.
Transformer-based models like MegaMolBART, leveraging the SMILES representations of molecules, show great promise for applications in new drug discovery. These models have the potential to accelerate the development of novel targeted treatments for disorders of the central nervous system.

Journal Article
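
The abstract above reports an area under the receiver operating characteristic curve of 0.88 on held-out data. As a rough illustration of how that metric is computed from classifier scores, here is a minimal sketch using the pairwise ranking identity. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise ranking identity.

    The AUC equals the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative
    example, with ties counted as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = BBB-permeable, 0 = BBB-impermeable.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of permeable from impermeable compounds.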

Federated learning for predicting clinical outcomes in patients with COVID-19

by Kang, Min Kyu; de Antônio Corradi, Gustavo César; Sriswasdi, Sira in 631/114/2397, 692/699/255/2514, 692/700/1421/1770

2021

Federated learning (FL) is a method used for training artificial intelligence models with data from multiple sources while maintaining data anonymity, thus removing many barriers to data sharing. Here we used data from 20 institutes across the globe to train an FL model, called EXAM (electronic medical record (EMR) chest X-ray AI model), that predicts the future oxygen requirements of symptomatic patients with COVID-19 using inputs of vital signs, laboratory data and chest X-rays. EXAM achieved an average area under the curve (AUC) >0.92 for predicting outcomes at 24 and 72 h from the time of initial presentation to the emergency room, and it provided a 16% improvement in average AUC measured across all participating sites and an average increase in generalizability of 38% when compared with models trained at a single site using that site’s data. For prediction of mechanical ventilation treatment or death at 24 h at the largest independent test site, EXAM achieved a sensitivity of 0.950 and a specificity of 0.882. In this study, FL facilitated rapid data science collaboration without data exchange and generated a model that generalized across heterogeneous, unharmonized datasets for prediction of clinical outcomes in patients with COVID-19, setting the stage for the broader use of FL in healthcare. Federated learning, a method for training artificial intelligence algorithms that protects data privacy, was used to predict future oxygen requirements of symptomatic patients with COVID-19 using data from 20 different institutes across the globe.

Journal Article
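
The abstract above describes training one model across many sites without exchanging data, but it does not say how the site updates were aggregated. One common aggregation rule is federated averaging, in which each site's parameters are weighted by its number of training samples. The sketch below assumes that rule purely for illustration; the function name `federated_average` and the toy numbers are hypothetical and do not describe the EXAM implementation.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of per-client parameter vectors.

    client_weights: list of equal-length lists of floats, one per site.
    client_sizes:   number of local training samples at each site.
    Only parameters leave each site, never the underlying records.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share one parameter shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Toy round (hypothetical): two sites with 1 and 3 local samples.
site_params = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [1, 3]
global_params = federated_average(site_params, site_sizes)
```

In a full training loop the averaged parameters would be sent back to every site for the next round of local training.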

Radial Basis Functions for Combining Shape and Speckle Tracking in Echocardiography

by Compas, Colin B in Biomedical engineering, Medical imaging

2013

Heart disease is the number one cause of death in the United States, and the left ventricle is often studied as an overall indicator of heart health. Quantitative analysis of left ventricular deformation has been an active area of research within the medical imaging community for many years. Accurate motion tracking can provide quantitative information about the extent and location of myocardial injury. This information can be useful both in the diagnosis of disease and in studying the efficacy of disease treatments. Echocardiography provides a non-invasive, readily available method for generating real-time images of the left ventricle. Left ventricle deformation analysis often begins with some form of frame-to-frame displacement estimation. While there are a variety of strategies that might accomplish this, we chose two complementary approaches that can estimate information at both the myocardial boundaries (shape tracking) and mid-wall (RF-based speckle tracking). In shape tracking, features are generated directly from image intensity values or segmentation of the ventricle and then matched in neighboring frames using a point-matching technique. Speckle tracking is an alternative approach that tracks patterns in the ultrasound data in subsequent frames. These two methods provide complementary information. Shape tracking gives high accuracy on boundaries of the heart wall, while speckle tracking gives reliable displacements across the myocardium. This work focuses on combining these two methods to yield a more accurate representation of the overall ventricular deformation. The displacements generated from the two methods provide sparse information over the heart wall, and radial basis functions can be used to combine the two sources of information to generate a dense displacement field. From the displacement data, measures of deformation (cardiac strains) can be calculated to determine the condition of the heart.
In this work we develop an adaptive multilevel radial basis function approach to combine shape and speckle tracked displacements for both 2D+time and 3D+time data. These methods are evaluated on acute open-chest canines pre- and post-coronary artery occlusion to show the ability of the combined method to differentiate diseased tissue. The proposed methods are compared against magnetic resonance tagged images for both 2D+time and 3D+time data to validate the results. We further validate our findings by comparing functionally defined infarct zones found using our methods to post-mortem histology of the hearts. We show that our methods are able to identify normal and diseased tissue, as well as a functional border zone that is critical for treatment. We also show that our methods are able to define and track these zones in closed-chest chronic data acquired at four time points following coronary artery occlusion. The findings suggest that the combined method can be used to investigate new treatments for heart disease.

Dissertation
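
The abstract above describes pooling sparse displacements from shape tracking and speckle tracking and using radial basis functions to produce a dense displacement field. The sketch below shows the basic idea for one displacement component with a single-level Gaussian kernel. It is a simplification chosen for illustration: the dissertation's adaptive multilevel scheme is not reproduced, and the function names, kernel width, and sample points are all hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gaussian(r, width):
    """Gaussian radial basis function of distance r."""
    return math.exp(-((r / width) ** 2))


def fit_rbf(points, values, width=1.0):
    """Return a function that interpolates values given at 2D points."""
    kernel = [[gaussian(math.dist(p, q), width) for q in points] for p in points]
    coeffs = solve_linear(kernel, values)

    def field(x):
        return sum(c * gaussian(math.dist(x, p), width)
                   for c, p in zip(coeffs, points))

    return field


# Hypothetical samples: boundary points from shape tracking and one
# mid-wall point from speckle tracking, with x-displacements in mm.
shape_points, shape_dx = [(0.0, 0.0), (2.0, 0.0)], [0.10, 0.30]
speckle_points, speckle_dx = [(1.0, 0.5)], [0.20]
dx_field = fit_rbf(shape_points + speckle_points, shape_dx + speckle_dx)
dense_value = dx_field((1.0, 0.0))  # displacement estimate between samples
```

Exact interpolation as shown treats both sources as equally trustworthy. A practical method would likely weight or smooth the samples according to the confidence of each tracking source.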

Federated Learning for Breast Density Classification: A Real-World Implementation

by Flores, Mona; Gupta, Vikash; Yun, B Min in Breast, Classification, Density

2020

Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on Breast Imaging, Reporting & Data System (BI-RADS). We show that despite substantial differences among the datasets from all sites (mammography system, class distribution, and dataset size) and without centralizing data, we can successfully train AI models in federation. The results show that models trained using FL perform on average 6.3% better than their counterparts trained on an institute's local data alone. Furthermore, we show a 45.8% relative improvement in the models' generalizability when evaluated on the other participating sites' testing data.

Paper

GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records

by PourNejatian, Nima; Lipori, Gloria; Shenkman, Elizabeth A in Electronic health records, Health care, Mathematical models

2022

There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, and the largest one trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model, GatorTron, using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on 5 clinical NLP tasks: clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve performance on all 5 clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og.

Paper
