Catalogue Search | MBRL
5 result(s) for "Singi, Siddharth"
Interpretable multiple instance learning for hematologic diagnosis from peripheral blood smears
2026
Accurate diagnosis of hematologic malignancies from peripheral blood smears (PBSs) requires integrating cellular morphology and composition across numerous white blood cells. Existing computational approaches predominantly automate single-cell classifications and do not provide holistic, slide-level diagnostic predictions.
We present a framework that employs a high-performance cell-based encoder (DeepHeme) for feature extraction, integrated with our weakly supervised, attention-based multiple instance learning (MIL) model, termed CAREMIL (Cell AggRegation, Explainable, Multiple Instance Learning). Through comprehensive evaluations of leading image encoders and MIL architectures, the combination of DeepHeme and CAREMIL demonstrated superior performance on disease classification tasks. CAREMIL functions as a robust aggregation mechanism, consistently outperforming established slide-level MIL methods (gated MIL and Dual-stream MIL Network) across multiple encoder types. The most pronounced performance gains were observed with out-of-domain encoders, including ImageNet-pretrained and open-source pathology foundation models (UNI2 and Virchow2).
CAREMIL combined with DeepHeme achieves the highest diagnostic accuracy across acute myeloid leukemia (AML), myelodysplastic syndromes (MDS), and hairy cell leukemia (HCL), with AUROCs of 0.999, 0.891, and 0.945, respectively, and successfully identifies AML even in cases with minimal or absent circulating blasts. Attention values assigned by CAREMIL highlight diagnostically relevant cells and reveal disease-specific morphometric patterns, enabling biological interpretability and case-level insights. The framework remains resilient to individual cell misclassifications and does not require explicit cell-level supervision.
These findings establish CAREMIL as an effective and interpretable MIL framework for hematologic slide diagnosis, extendable to bone marrow aspirates, cytology, and other liquid biopsy specimens, supporting a shift toward quantitative, morphology-informed hematologic diagnostics.
Journal Article
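The attention-based aggregation described in the abstract above can be illustrated with a generic attention-pooling step. This is a minimal sketch of standard attention-based MIL pooling, not the CAREMIL implementation: the function name `attention_mil_pool`, the projection `V`, the scoring vector `w`, and the random inputs are all assumptions made for illustration.

```python
import math
import random


def attention_mil_pool(cell_features, V, w):
    """Pool per-cell feature vectors into one slide-level vector.

    cell_features: list of n feature vectors (one per cell), each of length d.
    V: hypothetical projection matrix, a list of rows of length d.
    w: hypothetical scoring vector with one weight per row of V.
    Returns (slide_vector, attention); the attention weights sum to 1.
    """
    scores = []
    for h in cell_features:
        # Score each cell as w . tanh(V h).
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))

    # Softmax over cells, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Slide-level representation: attention-weighted sum of cell features.
    dim = len(cell_features[0])
    slide_vector = [
        sum(a * h[j] for a, h in zip(attention, cell_features))
        for j in range(dim)
    ]
    return slide_vector, attention


random.seed(0)
features = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # 6 cells
V = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
w = [random.gauss(0, 1) for _ in range(3)]
slide_vector, attention = attention_mil_pool(features, V, w)
```

In a trained model, `V` and `w` would be learned from slide-level labels only, and the resulting `attention` values are what one would inspect to see which cells influenced a prediction.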
GOLDMARK: Governed Outcome-Linked Diagnostic Model Assessment Reference Kit
2026
Computational biomarkers (CBs) are histopathology-derived patterns extracted from hematoxylin-eosin (H&E) whole-slide images (WSIs) using artificial intelligence (AI) to predict therapeutic response or prognosis. Recently, slide-level multiple-instance learning (MIL) with pathology foundation models (PFMs) has become the standard baseline for CB development. While these methods, with architectural and optimization advances, have improved predictive performance, computational pathology lacks standardized intermediate data formats, provenance tracking, checkpointing conventions, and reproducible evaluation metrics required for clinical-grade deployment. Consequently, discipline-level standardization, including data representation, model versioning, evaluation protocols, and auditability, is essential to enable reliable, scalable, and regulatory-ready clinical translation of CBs.
We introduce GOLDMARK (www.artificialintelligencepathology.org), a standardized benchmarking framework built on a curated TCGA cohort with clinically anchored OncoKB level 1-3 biomarker labels. GOLDMARK distributes structured intermediate outputs, including tile coordinates, per-slide feature embeddings from canonical PFMs, embedding-level quality-control metadata, trained slide-level weights, and reference code. Multiple publicly available PFMs are benchmarked under a unified attention-based MIL head using predefined patient-level splits. Models are trained on TCGA and evaluated on an independent MSKCC cohort with reciprocal testing.
We evaluated 33 tumor-biomarker tasks; aggregate summaries over the 33 tasks with complete reciprocal metric coverage yielded mean AUROC of 0.689 (TCGA) and 0.630 (MSKCC). Restricting analysis to the eight highest-performing tasks yielded mean AUROCs of 0.831 and 0.801, respectively. These tasks correspond to established morphologic-genomic associations (e.g., LGG IDH1, COAD MSI/BRAF, THCA BRAF/NRAS, BLCA FGFR3, UCEC PTEN) and showed the most stable cross-site performance. Differences between canonical encoders were modest relative to task-specific variability.
Computational pathology is entering a translational phase in which reproducibility, transparency, and cross-institutional robustness are prerequisites for clinical trust. GOLDMARK establishes a reference framework that separates dataset curation from model evaluation and introduces structured intermediate artifacts, quality-control metadata, and symmetric cross-dataset testing as core components of benchmarking. Such infrastructure is essential for transforming computational biomarkers from research demonstrations into reproducible, clinically trusted workflows.
Journal Article
GOLDMARK: Governed Outcome-Linked Diagnostic Model Assessment Reference Kit
by Amir Momeni Boroujeni; Kumar, Neeraj; Singi, Siddharth
in Artificial intelligence, Benchmarks, Biomarkers
2026
Computational biomarkers (CBs) are histopathology-derived patterns extracted from hematoxylin-eosin (H&E) whole-slide images (WSIs) using artificial intelligence (AI) to predict therapeutic response or prognosis. Recently, slide-level multiple-instance learning (MIL) with pathology foundation models (PFMs) has become the standard baseline for CB development. While these methods have improved predictive performance, computational pathology lacks standardized intermediate data formats, provenance tracking, checkpointing conventions, and reproducible evaluation metrics required for clinical-grade deployment. We introduce GOLDMARK (https://artificialintelligencepathology.org), a standardized benchmarking framework built on a curated TCGA cohort with clinically actionable OncoKB level 1-3 biomarker labels. GOLDMARK releases structured intermediate representations, including tile coordinate maps, per-slide feature embeddings from canonical PFMs, quality-control metadata, predefined patient-level splits, trained slide-level models, and evaluation outputs. Models are trained on TCGA and evaluated on an independent MSKCC cohort with reciprocal testing. Across 33 tumor-biomarker tasks, mean AUROC was 0.689 (TCGA) and 0.630 (MSKCC). Restricting to the eight highest-performing tasks yielded mean AUROCs of 0.831 and 0.801, respectively. These tasks correspond to established morphologic-genomic associations (e.g., LGG IDH1, COAD MSI/BRAF, THCA BRAF/NRAS, BLCA FGFR3, UCEC PTEN) and showed the most stable cross-site performance. Differences between canonical encoders were modest relative to task-specific variability. GOLDMARK establishes a shared experimental substrate for computational pathology, enabling reproducible benchmarking and direct comparison of methods across datasets and models.
Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis
2025
Pathology foundation models (PFMs) have emerged as powerful tools for analyzing whole slide images (WSIs). However, adapting these pretrained PFMs for specific clinical tasks presents considerable challenges, primarily because only weak (WSI-level) labels are available for gigapixel images, which necessitates a multiple instance learning (MIL) paradigm for effective WSI analysis. This paper proposes a novel approach for single-GPU Task Adaptation of PFMs (TAPFM) that uses vision transformer (ViT) attention for MIL aggregation while optimizing both feature representations and attention weights. The proposed approach maintains separate computational graphs for the MIL aggregator and the PFM to create stable training dynamics that align with downstream task objectives during end-to-end adaptation. Evaluated on mutation prediction tasks for bladder cancer and lung adenocarcinoma across institutional and TCGA cohorts, TAPFM consistently outperforms conventional approaches, with H-Optimus-0 (TAPFM) outperforming the benchmarks. TAPFM also effectively handles multi-label classification of actionable mutations. Thus, TAPFM makes adaptation of powerful pretrained PFMs practical on standard hardware for various clinical applications.
Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning
by He, Zhanpeng; Robinson Piramuthu; Singi, Siddharth
in Confidence intervals, Decision making, Training
2023
In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed. However, knowing when to request such assistance is critical: too few requests can lead to the robot making mistakes, but too many requests can overload the expert. In this paper, we present a Reinforcement Learning based approach to this problem, where a semi-autonomous agent asks for external assistance when it has low confidence in the eventual success of the task. The confidence level is computed by estimating the variance of the return from the current state. We show that this estimate can be iteratively improved during training using a Bellman-like recursion. On discrete navigation problems with both fully- and partially-observable state information, we show that our method makes effective use of a limited budget of expert calls at run-time, despite having no access to the expert at training time.
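The idea of estimating the variance of the return with a Bellman-like recursion can be illustrated on a tiny tabular example. This is a sketch under stated assumptions, not the paper's algorithm: it uses the standard second-moment recursion for a fixed policy in a fully known Markov reward process, whereas the paper improves its estimate during training. The function names, the toy transition table, and the threshold rule in `should_ask_expert` are hypothetical.

```python
def return_mean_and_variance(transitions, gamma=0.9, sweeps=200):
    """Estimate the mean and variance of the discounted return per state.

    transitions[s] is a list of (prob, next_state, reward) tuples;
    next_state None marks a terminal transition.
    Recursions used (V = mean return, M = second moment of the return):
      V(s) = E[r + gamma * V(s')]
      M(s) = E[r^2 + 2 * gamma * r * V(s') + gamma^2 * M(s')]
      Var(s) = M(s) - V(s)^2
    """
    n_states = len(transitions)
    V = [0.0] * n_states
    M = [0.0] * n_states
    for _ in range(sweeps):
        new_V, new_M = [], []
        for s in range(n_states):
            v = 0.0
            m = 0.0
            for prob, nxt, reward in transitions[s]:
                v_next = 0.0 if nxt is None else V[nxt]
                m_next = 0.0 if nxt is None else M[nxt]
                v += prob * (reward + gamma * v_next)
                m += prob * (
                    reward ** 2
                    + 2 * gamma * reward * v_next
                    + gamma ** 2 * m_next
                )
            new_V.append(v)
            new_M.append(m)
        V, M = new_V, new_M
    variance = [m - v * v for v, m in zip(V, M)]
    return V, variance


def should_ask_expert(variance, state, threshold):
    """Hypothetical rule: request help when the return variance is high."""
    return variance[state] > threshold


# Toy chain: state 1 moves to state 0, which succeeds (reward 1) half the time.
toy = [
    [(0.5, None, 1.0), (0.5, None, 0.0)],  # state 0
    [(1.0, 0, 0.0)],                       # state 1
]
mean_return, variance = return_mean_and_variance(toy, gamma=0.9)
```

With a limited budget of expert calls, the threshold would be the tuning knob: a higher value spends fewer calls, at the cost of acting autonomously in more uncertain states.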