Catalogue Search | MBRL

A pathology foundation model for cancer diagnosis and prognosis prediction

by Zhao, Junhan , Dillon, Deborah , Li, Yu in 631/114/1305 , 631/114/1564 , 631/114/2397

2024

Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard artificial intelligence methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task [1, 2]. Although such methods have achieved some success, they often have limited generalizability to images generated by different digitization protocols or samples collected from different populations [3]. Here, to address this challenge, we devised the Clinical Histopathology Imaging Evaluation Foundation (CHIEF) model, a general-purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. We developed CHIEF using 60,530 whole-slide images spanning 19 anatomical sites. Through pretraining on 44 terabytes of high-resolution pathology imaging datasets, CHIEF extracted microscopic representations useful for cancer cell detection, tumour origin identification, molecular profile characterization and prognostic prediction. We successfully validated CHIEF using 19,491 whole-slide images from 32 independent slide sets collected from 24 hospitals and cohorts internationally. Overall, CHIEF outperformed the state-of-the-art deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations and processed by different slide preparation methods. CHIEF provides a generalizable foundation for efficient digital pathology evaluation for patients with cancer. A study describes the development of a generalizable foundation machine learning framework to extract pathology imaging features for cancer diagnosis and prognosis prediction.

Journal Article

Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images

by Xie, Ting , Meng, Xiang-He , Deng, Hong-Wen in 119/118 , 631/114/1305 , 631/67/2321

2021

Machine-assisted pathological recognition has been focused on supervised learning (SL) that suffers from a significant annotation bottleneck. We propose a semi-supervised learning (SSL) method based on the mean teacher architecture using 13,111 whole slide images of colorectal cancer from 8803 subjects from 13 independent centers. SSL (~3150 labeled, ~40,950 unlabeled; ~6300 labeled, ~37,800 unlabeled patches) performs significantly better than the SL. No significant difference is found between SSL (~6300 labeled, ~37,800 unlabeled) and SL (~44,100 labeled) at patch-level diagnoses (area under the curve (AUC): 0.980 ± 0.014 vs. 0.987 ± 0.008, P value = 0.134) and patient-level diagnoses (AUC: 0.974 ± 0.013 vs. 0.980 ± 0.010, P value = 0.117), which is close to human pathologists (average AUC: 0.969). The evaluation on 15,000 lung and 294,912 lymph node images also confirms that SSL can achieve performance similar to that of SL with massive annotations. SSL dramatically reduces the annotations, which has great potential to effectively build expert-level pathological artificial intelligence platforms in practice. Machine-assisted recognition of colorectal cancer has been mainly focused on supervised deep learning that suffers from a significant bottleneck of requiring massive amounts of labeled data. Here, the authors propose a semi-supervised model based on the mean teacher architecture that provides pathological predictions at both patch- and patient-levels.
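
The mean teacher architecture named in this abstract can be illustrated with a minimal plain-Python sketch. It assumes the commonly described formulation, in which the teacher's weights are an exponential moving average of the student's weights and a consistency term penalises disagreement between the two models' predictions. The abstract does not give these details, so the function names, the smoothing factor `alpha`, and the toy values are illustrative assumptions rather than the authors' implementation.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the matching student weight.

    alpha is a hypothetical smoothing factor: values near 1 make the
    teacher change slowly.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Toy weights and predictions, chosen only for illustration.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student, alpha=0.9)

# Unlabeled patches contribute only through this consistency term.
loss = consistency_loss([0.8, 0.2], [0.6, 0.4])
```

In this formulation the labeled patches would add an ordinary classification loss for the student, while the unlabeled patches need no annotation because the teacher's prediction serves as the target.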

Journal Article

Recommendations and future directions for supervised machine learning in psychiatry

by Baune, Bernhard T , Cearns, Micah , Hahn, Tim in Artificial intelligence , Machine learning , Psychiatry

2019

Machine learning methods hold promise for personalized care in psychiatry, demonstrating the potential to tailor treatment decisions and stratify patients into clinically meaningful taxonomies. Subsequently, publication counts applying machine learning methods have risen, with different data modalities, mathematically distinct models, and samples of varying size being used to train and test models with the promise of clinical translation. Consequently, and in part due to the preliminary nature of such works, many studies have reported largely varying degrees of accuracy, raising concerns over systematic overestimation and methodological inconsistencies. Furthermore, a lack of procedural evaluation guidelines for non-expert medical professionals and funding bodies leaves many in the field with no means to systematically evaluate the claims, maturity, and clinical readiness of a project. Given the potential of machine learning methods to transform patient care, albeit contingent on the rigor of employed methods and their dissemination, we deem it necessary to provide a review of current methods, recommendations, and future directions for applied machine learning in psychiatry. In this review, we will cover issues of best practice for model training and evaluation, sources of systematic error and overestimation, model explainability vs. trust, the clinical implementation of AI systems, and finally, future directions for our field.

Journal Article

Single-modal and multi-modal false arrhythmia alarm reduction using attention-based convolutional and recurrent neural networks

by Fotoohinasab, Atiyeh , Afghah, Fatemeh , Mousavi, Sajad in Alarms , Algorithms , Arrhythmia

2020

This study proposes a deep learning model that effectively suppresses the false alarms in the intensive care units (ICUs) without ignoring the true alarms using single- and multi-modal biosignals. Most of the current work in the literature is either rule-based, requiring prior knowledge of arrhythmia analysis to build rules, or based on classical machine learning approaches that depend on hand-engineered features. In this work, we apply convolutional neural networks to automatically extract time-invariant features, an attention mechanism to put more emphasis on the important regions of the segmented input signal(s) that are more likely to contribute to an alarm, and long short-term memory units to capture the temporal information presented in the signal segments. We trained our method efficiently using a two-step training algorithm (i.e., pre-training and fine-tuning the proposed network) on the dataset provided by the PhysioNet computing in cardiology challenge 2015. The evaluation results demonstrate that the proposed method obtains better results compared to other existing algorithms for the false alarm reduction task in ICUs. The proposed method achieves a sensitivity of 93.88% and a specificity of 92.05% for the alarm classification, considering three different signals. In addition, our experiments on 5 separate alarm types yield significant results when only a single-lead ECG is considered (e.g., a sensitivity of 90.71%, a specificity of 88.30%, an AUC of 89.51 for alarm type of Ventricular Tachycardia arrhythmia).

Journal Article

A supervised learning approach for diffusion MRI quality control with minimal training data

by Zhang, Hui , Graham, Mark S. , Drobnjak, Ivana in Artefacts , Automation , Brain Mapping - methods

2018

Quality control (QC) is a fundamental component of any study. Diffusion MRI has unique challenges that make manual QC particularly difficult, including a greater number of artefacts than other MR modalities and a greater volume of data. The gold standard is manual inspection of the data, but this process is time-consuming and subjective. Recently, supervised learning approaches based on convolutional neural networks have been shown to be competitive with manual inspection. A drawback of these approaches is that they still require a manually labelled dataset for training, which is itself time-consuming to produce and still introduces an element of subjectivity. In this work, we demonstrate that the need for manual labelling can be greatly reduced by training on simulated data and using a small amount of labelled data for a final calibration step. We demonstrate its potential for the detection of severe movement artefacts, and compare performance to a classifier trained on manually-labelled real data.

• We demonstrate a classifier for the quality control of DW-MRI data.
• The classifier is trained to spot movement-corrupted volumes.
• It greatly reduces the need for labelled training data by making use of simulation.
• The classifier performance is similar to a classifier trained entirely on real data.

Journal Article

Identification of relevant features using SEQENS to improve supervised machine learning models predicting AML treatment outcome

by Signol, François , Alvarez, Noemi , Arnal, Laura in Acute myeloid leukemia , Adult , Aged

2025

Background and objective: This study has two main objectives. First, to evaluate a feature selection methodology based on SEQENS, an algorithm for identifying relevant variables. Second, to validate machine learning models that predict the risk of complications in patients with acute myeloid leukemia (AML) using data available at diagnosis. Predictions are made at three time points: 90 days, six months, and one year post-diagnosis. These objectives represent fundamental steps toward the development of a tool to assist clinicians in therapeutic decision-making and provide insights into the risk factors associated with AML complications. Methods: A dataset of 568 patients, including demographic, clinical, genetic (VAF), and cytogenetic information, was created by combining data from Hospital 12 de Octubre (Madrid, Spain) and Instituto de Investigación Sanitaria La Fe (Valencia, Spain). Feature selection based on an enhanced version of SEQENS was conducted for each time point, followed by the comparison of four classifiers (XGBoost, Multi-Layer Perceptron, Logistic Regression and Decision Tree) to assess the impact of feature selection on model performance. Results: SEQENS identified different relevant features for each prediction horizon, with Age, TP53, −7/7Q, and EZH2 consistently relevant across all time points. The models were evaluated using 5-fold cross-validation, with XGBoost achieving the highest average ROC-AUC scores of 0.81, 0.84, and 0.82 for 90-day, 6-month, and 1-year predictions, respectively. Generally, performance remained stable or improved after applying SEQENS-based feature selection. Evaluation on an external test set of 54 patients yielded ROC-AUC scores of 0.72 (90-day), 0.75 (6-month), and 0.68 (1-year). Conclusions: The models achieved performance levels that suggest they could serve as therapeutic decision support tools at different times after diagnosis. The selected variables align with the European LeukemiaNet (ELN) 2022 risk classification, and the SEQENS-based feature selection effectively reduced the feature set while maintaining prediction accuracy.

Journal Article

Risk Prediction of Low Bone Density in Elderly Patients with Supervised Machine Learning Algorithms

by Karaismailoğlu, Eda , Karaismailoğlu, Serkan in Aged , Aged patients , Aged, 80 and over

2025

Low bone mineral density (BMD) is a common age-related condition that elevates the risk of fractures and mortality. Machine learning (ML) techniques offer a promising approach for early prediction using readily available clinical, biochemical, and demographic data. This cross-sectional study aimed to evaluate the predictive performance of eleven ML models in identifying low BMD and to determine the most influential risk factors using the best-performing model. Data were obtained from the National Health and Nutrition Examination Survey (2005-2010, 2013-2014, and 2017-2020), focusing on individuals aged ≥ 50 years with available femoral neck or total femur BMD data. After applying exclusion criteria, 12,108 participants were included. Supervised ML algorithms were trained using 57 clinical, biochemical, demographic, and behavioral features. Model performance was assessed using accuracy, area under the curve (AUC), recall, precision, and F1 score. SHAP analysis was employed to interpret model outputs and rank predictors. The extra trees classifier outperformed other ML methods, achieving an accuracy of 76.7% and an AUC of 0.85. Recursive Feature Elimination with Cross-Validation identified 14 key predictors of low BMD in descending order of importance: sex, age, body mass index, race, family income-to-poverty ratio, serum uric acid, diabetes status, HDL cholesterol, urinary creatinine, alkaline phosphatase, mean cell volume, lymphocyte count, diastolic blood pressure, and glycohemoglobin. Tree-based ML models, particularly Extra Trees, can effectively predict low BMD. The identified risk factors include both established and lesser-studied predictors. These findings support the use of ML for personalized osteoporosis and osteopenia screening and highlight its ability to capture complex, multifactorial relationships in population health data.

Journal Article

Semi-supervised machine learning for automated species identification by collagen peptide mass fingerprinting

by Buckley, Michael , Gu, Muxin in Algorithms , Analysis , Ancient bone identification

2018

Background: Biomolecular methods for species identification are increasingly being utilised in the study of changing environments, both at the microscopic and macroscopic levels. High-throughput peptide mass fingerprinting has been largely applied to bacterial identification, but is increasingly used to identify archaeological and palaeontological skeletal material to yield information on past environments and human-animal interaction. However, as applications move away from predominantly domesticates and the more abundant wild fauna to a much wider range of less common taxa that do not yet have genetically-derived sequence information, robust methods of species identification and biomarker selection need to be determined. Results: Here we developed a supervised machine learning algorithm for classifying the species of ancient remains based on collagen fingerprinting. The aim was to minimise requirements on prior knowledge of known species while yielding satisfactory sensitivity and specificity. The algorithm uses iterations of a modified random forest classifier with a similarity scoring system to expand its identified samples. We tested it on a set of 6805 spectra and found that a high level of accuracy can be achieved with a training set of five identified specimens per taxon. Conclusions: This method consistently achieves higher accuracy than two-dimensional principal component analysis and similar accuracy with hierarchical clustering using optimised parameters, which greatly reduces requirements for human input. Within the vertebrata, we demonstrate that this method was able to achieve the taxonomic resolution of family or sub-family level, whereas the genus- or species-level identification may require manual interpretation or further experiments. In addition, it also identifies species biomarkers beyond those previously published.

Journal Article

Scientific discovery in the age of artificial intelligence

by Liu, Ziming , Coley, Connor W. , Song, Le in 631/114/1305 , 639/705/117 , 639/705/531

2023

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation. The advances in artificial intelligence over the past decade are examined, with a discussion on how artificial intelligence systems can aid the scientific process and the central issues that remain despite advances.

Journal Article

Comparing different supervised machine learning algorithms for disease prediction

by Hossain, Md Ekramul , Uddin, Shahadat , Moni, Mohammad Ali in Algorithms , Analysis , Bayes Theorem

2019

Background: Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently emerged as a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. Methods: In this study, extensive research efforts were made to identify those studies that applied more than one supervised machine learning algorithm on single disease prediction. Two databases (i.e., Scopus and PubMed) were searched for different types of search items. Thus, we selected 48 articles in total for the comparison among variants of supervised machine learning algorithms for disease prediction. Results: We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed superior accuracy comparatively. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which topped in 41% of the studies in which it was considered. Conclusion: This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This important information of relative performance can be used to aid researchers in the selection of an appropriate supervised machine learning algorithm for their studies.
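
The percentages quoted in the Results can be checked with a few lines of Python. This is only an arithmetic sketch: the abstract gives the RF counts (9 of 17) directly, but it reports the SVM figure only as 41%, so the SVM win count computed here is an inference that assumes the 41% refers to the 29 studies in which SVM was applied. The helper name `share_percent` is hypothetical.

```python
def share_percent(wins, total):
    """Percentage of studies in which an algorithm had the highest accuracy."""
    return round(100 * wins / total)


# Random Forest: counts stated in the abstract.
rf_share = share_percent(9, 17)

# SVM: assumed denominator of 29 studies; the count is inferred, not reported.
svm_implied_wins = round(0.41 * 29)
```

Under that assumption, the reported 41% would correspond to roughly 12 of the 29 SVM studies.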

Journal Article
