Catalogue Search | MBRL
36 result(s) for "Xia, Yingda"
The Medical Segmentation Decathlon
2022
International challenges have become the de facto standard for comparative assessment of image analysis algorithms. Although segmentation is the most widely investigated medical image processing task, the various challenges have been organized to focus only on specific clinical tasks. We organized the Medical Segmentation Decathlon (MSD)—a biomedical image analysis challenge, in which algorithms compete in a multitude of both tasks and modalities to investigate the hypothesis that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. MSD results confirmed this hypothesis, moreover, MSD winner continued generalizing well to a wide range of other clinical problems for the next two years. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to scientists that are not versed in AI model training.
International challenges have become the de facto standard for comparative assessment of image analysis algorithms. Here, the authors present the results of a biomedical image segmentation challenge, showing that a method capable of performing well on multiple tasks will generalize well to a previously unseen task.
Journal Article
Large-scale pancreatic cancer detection via non-contrast CT and deep learning
2023
Pancreatic ductal adenocarcinoma (PDAC), the most deadly solid malignancy, is typically detected late and at an inoperable stage. Early or incidental detection is associated with prolonged survival, but screening asymptomatic individuals for PDAC using a single test remains unfeasible due to the low prevalence and potential harms of false positives. Non-contrast computed tomography (CT), routinely performed for clinical indications, offers the potential for large-scale screening, however, identification of PDAC using non-contrast CT has long been considered impossible. Here, we develop a deep learning approach, pancreatic cancer detection with artificial intelligence (PANDA), that can detect and classify pancreatic lesions with high accuracy via non-contrast CT. PANDA is trained on a dataset of 3,208 patients from a single center. PANDA achieves an area under the receiver operating characteristic curve (AUC) of 0.986–0.996 for lesion detection in a multicenter validation involving 6,239 patients across 10 centers, outperforms the mean radiologist performance by 34.1% in sensitivity and 6.3% in specificity for PDAC identification, and achieves a sensitivity of 92.9% and specificity of 99.9% for lesion detection in a real-world multi-scenario validation consisting of 20,530 consecutive patients. Notably, PANDA utilized with non-contrast CT shows non-inferiority to radiology reports (using contrast-enhanced CT) in the differentiation of common pancreatic lesion subtypes. PANDA could potentially serve as a new tool for large-scale pancreatic cancer screening.
A deep learning model provides high accuracy in detecting pancreatic lesions in multicenter data, outperforming radiology specialists.
Journal Article
Multi-modal AI for opportunistic screening, staging and progression risk stratification of steatotic liver disease
by Zhang, Xiaoming; Yao, Jiawen; Bai, Ruobing
in 692/4020/4021/1607/1605, 692/699/1503/1607/2750, 692/700/1421/1846/2771
2026
The global rise in steatotic liver disease poses a significant public health challenge. While non-contrast computed tomography scans hold promise for opportunistic detection of steatotic liver disease, their potential for staging and risk assessment remains underexplored. Here we present a multimodal AI model trained on a large dataset, comprising of (n=968) histopathologically and (n=1103) radiologically confirmed cases, validated against both histology (n=660) and MRI-PDFF (n=375) gold standards, demonstrating high accuracy in detecting mild to severe steatosis (AUC: 0.904–0.929) and clinically significant fibrosis (AUC: 0.824–0.888). Furthermore, integrating the model into the standard clinical pathway improves primary risk screening in a retrospective patient cohort (n=1192), identifying 36% more patients at risk of fibrosis progression. Using Cox proportional hazard model, we observe that the intermediate-high risk patients identified by the optimized clinical pathway exhibits a significantly higher incidence of cirrhosis (hazard ratio: 5.54: 2.69–11.42), showcasing the model’s potential for early detection and management of steatotic liver disease.
This study presents MAOSS, a multimodal AI model that repurposes non-contrast CT scans and leverages clinical features to detect and stage liver steatosis and fibrosis. Here the authors show MAOSS accurately stratifies cirrhosis progression risk when embedded into the standard clinical workflow, enabling scalable, opportunistic screening for early intervention of steatotic liver disease.
Journal Article
CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios
2025
3D medical vision-language (VL) pretraining has shown potential in radiology by leveraging large-scale multimodal datasets with CT-report pairs. However, existing methods primarily rely on a global VL alignment directly adapted from 2D scenarios. The entire 3D image is transformed into one global embedding, resulting in a loss of sparse but critical semantics essential for accurately aligning with the corresponding diagnosis. To address this limitation, we propose CT-GLIP, a 3D Grounded Language-Image Pretrained model that constructs fine-grained CT-report pairs to enhance grounded cross-modal contrastive learning, effectively aligning grounded visual features with precise textual descriptions. Leveraging the grounded cross-modal alignment, CT-GLIP improves performance across diverse downstream tasks and can even identify organs and abnormalities in a zero-shot manner using natural language. CT-GLIP is trained on a multimodal CT dataset comprising 44,011 organ-level CT-report pairs from 17,702 patients, covering 104 organs. Evaluation is conducted on four downstream tasks: zero-shot organ recognition (OR), zero-shot abnormality detection (AD), tumor detection (TD), and tumor segmentation (TS). Empirical results show that it outperforms its counterparts with global VL alignment. Compared to vanilla CLIP, CT-GLIP achieves average performance improvements of 15.1% of F1 score, 1.9% of AUC, and 3.2% of DSC for zero-shot AD, TD, and TS tasks, respectively. This study highlights the significance of grounded VL alignment in enabling 3D medical VL foundation models to understand sparse representations within CT scans.
RevSAM2: Prompt SAM2 for Medical Image Segmentation via Reverse-Propagation without Fine-tuning
2024
The Segment Anything Model 2 (SAM2) has recently demonstrated exceptional performance in zero-shot prompt segmentation for natural images and videos. However, when the propagation mechanism of SAM2 is applied to medical images, it often results in spatial inconsistencies, leading to significantly different segmentation outcomes for very similar images. In this paper, we introduce RevSAM2, a simple yet effective self-correction framework that enables SAM2 to achieve superior performance in unseen 3D medical image segmentation tasks without the need for fine-tuning. Specifically, to segment a 3D query volume using a limited number of support image-label pairs that define a new segmentation task, we propose reverse propagation strategy as a query information selection mechanism. Instead of simply maintaining a first-in-first-out (FIFO) queue of memories to predict query slices sequentially, reverse propagation selects high-quality query information by leveraging support images to evaluate the quality of each predicted query slice mask. The selected high-quality masks are then used as prompts to propagate across the entire query volume, thereby enhancing generalization to unseen tasks. Notably, we are the first to explore the potential of SAM2 in label-efficient medical image segmentation without fine-tuning. Compared to fine-tuning on large labeled datasets, the label-efficient scenario provides a cost-effective alternative for medical segmentation tasks, particularly for rare diseases or when dealing with unseen classes. Experiments on four public datasets demonstrate the superiority of RevSAM2 in scenarios with limited labels, surpassing state-of-the-arts by 12.18% in Dice. The code will be released.
Volumetric Medical Image Segmentation: A 3D Deep Coarse-to-fine Framework and Its Adversarial Examples
by Li, Yingwei; Zhou, Yuyin; Yuille, Alan L
in Artificial neural networks, Datasets, Image segmentation
2020
Although deep neural networks have been a dominant method for many 2D vision tasks, it is still challenging to apply them to 3D tasks, such as medical image segmentation, due to the limited amount of annotated 3D data and limited computational resources. In this chapter, by rethinking the strategy to apply 3D Convolutional Neural Networks to segment medical images, we propose a novel 3D-based coarse-to-fine framework to efficiently tackle these challenges. The proposed 3D-based framework outperforms their 2D counterparts by a large margin since it can leverage the rich spatial information along all three axes. We further analyze the threat of adversarial attacks on the proposed framework and show how to defense against the attack. We conduct experiments on three datasets, the NIH pancreas dataset, the JHMI pancreas dataset and the JHMI pathological cyst dataset, where the first two and the last one contain healthy and pathological pancreases respectively, and achieve the current state-of-the-art in terms of Dice-Sorensen Coefficient (DSC) on all of them. Especially, on the NIH pancreas segmentation dataset, we outperform the previous best by an average of over 2%, and the worst case is improved by 7% to reach almost 70%, which indicates the reliability of our framework in clinical applications.
CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios
2024
Medical Vision-Language Pretraining (Med-VLP) establishes a connection between visual content from medical images and the relevant textual descriptions. Existing Med-VLP methods primarily focus on 2D images depicting a single body part, notably chest X-rays. In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal dataset of CT images and reports. Compared with the 2D counterpart, 3D VLP is required to effectively capture essential semantics from significantly sparser representation in 3D imaging. In this paper, we introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning, aligning grounded visual features with precise diagnostic text. Additionally, we developed an abnormality dictionary to augment contrastive learning with diverse contrastive pairs. Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates it can identify organs and abnormalities in a zero-shot manner using natural languages. The performance of CT-GLIP is validated on a separate test set of 1,130 patients, focusing on the 16 most frequent abnormalities across 7 organs. The experimental results show our model's superior performance over the standard CLIP framework across zero-shot and fine-tuning scenarios, using both CNN and ViT architectures.
End-to-End Adversarial Shape Learning for Abdomen Organ Deep Segmentation
2019
Automatic segmentation of abdomen organs using medical imaging has many potential applications in clinical workflows. Recently, the state-of-the-art performance for organ segmentation has been achieved by deep learning models, i.e., convolutional neural network (CNN). However, it is challenging to train the conventional CNN-based segmentation models that aware of the shape and topology of organs. In this work, we tackle this problem by introducing a novel end-to-end shape learning architecture -- organ point-network. It takes deep learning features as inputs and generates organ shape representations as points that located on organ surface. We later present a novel adversarial shape learning objective function to optimize the point-network to capture shape information better. We train the point-network together with a CNN-based segmentation model in a multi-task fashion so that the shared network parameters can benefit from both shape learning and segmentation tasks. We demonstrate our method with three challenging abdomen organs including liver, spleen, and pancreas. The point-network generates surface points with fine-grained details and it is found critical for improving organ segmentation. Consequently, the deep segmentation model is improved by the introduced shape learning as significantly better Dice scores are observed for spleen and pancreas segmentation.
Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation
2020
Although having achieved great success in medical image segmentation, deep learning-based approaches usually require large amounts of well-annotated data, which can be extremely expensive in the field of medical image analysis. Unlabeled data, on the other hand, is much easier to acquire. Semi-supervised learning and unsupervised domain adaptation both take the advantage of unlabeled data, and they are closely related to each other. In this paper, we propose uncertainty-aware multi-view co-training (UMCT), a unified framework that addresses these two tasks for volumetric medical image segmentation. Our framework is capable of efficiently utilizing unlabeled data for better performance. We firstly rotate and permute the 3D volumes into multiple views and train a 3D deep network on each view. We then apply co-training by enforcing multi-view consistency on unlabeled data, where an uncertainty estimation of each view is utilized to achieve accurate labeling. Experiments on the NIH pancreas segmentation dataset and a multi-organ segmentation dataset show state-of-the-art performance of the proposed framework on semi-supervised medical image segmentation. Under unsupervised domain adaptation settings, we validate the effectiveness of this work by adapting our multi-organ segmentation model to two pathological organs from the Medical Segmentation Decathlon Datasets. Additionally, we show that our UMCT-DA model can even effectively handle the challenging situation where labeled source data is inaccessible, demonstrating strong potentials for real-world applications.
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
2026
Computed Tomography (CT) is one of the most widely used and diagnostically information-dense imaging modalities, covering critical organs such as the heart, lungs, liver, and colon. Clinical interpretation relies on both slice-driven local features (e.g., sub-centimeter nodules, lesion boundaries) and volume-driven spatial representations (e.g., tumor infiltration, inter-organ anatomical relations). However, existing Large Vision-Language Models (LVLMs) remain fragmented in CT slice versus volumetric understanding: slice-driven LVLMs show strong generalization but lack cross-slice spatial consistency, while volume-driven LVLMs explicitly capture volumetric semantics but suffer from coarse granularity and poor compatibility with slice inputs. The absence of a unified modeling paradigm constitutes a major bottleneck for the clinical translation of medical LVLMs. We present OmniCT, a powerful unified slice-volume LVLM for CT scenarios, which makes three contributions: (i) Spatial Consistency Enhancement (SCE): volumetric slice composition combined with tri-axial positional embedding that introduces volumetric consistency, and an MoE hybrid projection enables efficient slice-volume adaptation; (ii) Organ-level Semantic Enhancement (OSE): segmentation and ROI localization explicitly align anatomical regions, emphasizing lesion- and organ-level semantics; (iii) MedEval-CT: the largest slice-volume CT dataset and hybrid benchmark integrates comprehensive metrics for unified evaluation. OmniCT consistently outperforms existing methods with a substantial margin across diverse clinical tasks and satisfies both micro-level detail sensitivity and macro-level spatial reasoning. More importantly, it establishes a new paradigm for cross-modal medical imaging understanding. Our project is available at https://github.com/ZJU4HealthCare/OmniCT.