Catalogue Search | MBRL

scPRINT: pre-training on 50 million cells allows robust gene network predictions

by Cantini, Laura , Kalfon, Jérémie , Samaran, Jules in 45/91 , 631/114 , 631/114/1305

2025

A cell is governed by the interaction of myriads of macromolecules. Inferring such a network of interactions has remained an elusive milestone in cellular biology. Building on recent advances in large foundation models and their ability to learn without supervision, we present scPRINT, a large cell model for the inference of gene networks pre-trained on more than 50 million cells from the cellxgene database. Using innovative pretraining tasks and model architecture, scPRINT pushes large transformer models towards more interpretability and usability when uncovering the complex biology of the cell. Based on our atlas-level benchmarks, scPRINT demonstrates superior performance in gene network inference to the state of the art, as well as competitive zero-shot abilities in denoising, batch effect correction, and cell label prediction. On an atlas of benign prostatic hyperplasia, scPRINT highlights the profound connections between ion exchange, senescence, and chronic inflammation. Authors present a state-of-the-art cell foundation model trained on 50 million cells. They show that the model generates a meaningful gene network and has zero-shot ability in various tasks.

Journal Article

Repeat expansions confer WRN dependence in microsatellite-unstable cancers

by Meltzer, Paul S. , Kalfon, Jeremie , Walker, Robert L. in 13/1 , 13/106 , 13/109

2020

The RecQ DNA helicase WRN is a synthetic lethal target for cancer cells with microsatellite instability (MSI), a form of genetic hypermutability that arises from impaired mismatch repair [1–4]. Depletion of WRN induces widespread DNA double-strand breaks in MSI cells, leading to cell cycle arrest and/or apoptosis. However, the mechanism by which WRN protects MSI-associated cancers from double-strand breaks remains unclear. Here we show that TA-dinucleotide repeats are highly unstable in MSI cells and undergo large-scale expansions, distinct from previously described insertion or deletion mutations of a few nucleotides [5]. Expanded TA repeats form non-B DNA secondary structures that stall replication forks, activate the ATR checkpoint kinase, and require unwinding by the WRN helicase. In the absence of WRN, the expanded TA-dinucleotide repeats are susceptible to cleavage by the MUS81 nuclease, leading to massive chromosome shattering. These findings identify a distinct biomarker that underlies the synthetic lethal dependence on WRN, and support the development of therapeutic agents that target WRN for MSI-associated cancers. In cells with microsatellite instability, expanded TA-dinucleotide repeats form cruciform structures that stall replication forks and cause chromosome shattering in the absence of the WRN helicase.

Journal Article

Mapping the landscape of genetic dependencies in chordoma

by Root, David E. , Kazachkova, Mariya , Levy, Joan in 13/51 , 38/1 , 38/22

2023

Identifying the spectrum of genes required for cancer cell survival can reveal essential cancer circuitry and therapeutic targets, but such a map remains incomplete for many cancer types. We apply genome-scale CRISPR-Cas9 loss-of-function screens to map the landscape of selectively essential genes in chordoma, a bone cancer with few validated targets. This approach confirms a known chordoma dependency, TBXT (T; brachyury), and identifies a range of additional dependencies, including PTPN11, ADAR, PRKRA, LUC7L2, SRRM2, SLC2A1, SLC7A5, FANCM, and THAP1. CDK6, SOX9, and EGFR, genes previously implicated in chordoma biology, are also recovered. We find genomic and transcriptomic features that predict specific dependencies, including interferon-stimulated gene expression, which correlates with ADAR dependence and is elevated in chordoma. Validating the therapeutic relevance of dependencies, small-molecule inhibitors of SHP2, encoded by PTPN11, have potent preclinical efficacy against chordoma. Our results generate an emerging map of chordoma dependencies to enable biological and therapeutic hypotheses. Cancer cells possess unique molecular features that can confer an increased dependence on specific genes. Here, the authors use CRISPR-Cas9 screens to identify selectively essential genes and therapeutic targets in chordoma.

Journal Article

CaImAn an open source tool for scalable calcium imaging data analysis

by Giovannucci, Andrea , Gunn, Pat , Tank, David W in Algorithms , Animals , Batch processing

2019

Advances in fluorescence microscopy enable monitoring larger brain areas in-vivo with finer time resolution. The resulting data rates require reproducible analysis pipelines that are reliable, fully automated, and scalable to datasets generated over the course of months. We present CaImAn, an open-source library for calcium imaging data analysis. CaImAn provides automatic and scalable methods to address problems common to pre-processing, including motion correction, neural activity identification, and registration across different sessions of data collection. It does this while requiring minimal user intervention, with good scalability on computers ranging from laptops to high-performance computing clusters. CaImAn is suitable for two-photon and one-photon imaging, and also enables real-time analysis on streaming data. To benchmark the performance of CaImAn we collected and combined a corpus of manual annotations from multiple labelers on nine mouse two-photon datasets. We demonstrate that CaImAn achieves near-human performance in detecting locations of active neurons. The human brain contains billions of cells called neurons that rapidly carry information from one part of the brain to another. Progress in medical research and healthcare is hindered by the difficulty in understanding precisely which neurons are active at any given time. New brain imaging techniques and genetic tools allow researchers to track the activity of thousands of neurons in living animals over many months. However, these experiments produce large volumes of data that researchers currently have to analyze manually, which can take a long time and generate irreproducible results. There is a need to develop new computational tools to analyze such data. The new tools should be able to operate on standard computers rather than just specialist equipment as this would limit the use of the solutions to particularly well-funded research teams. 
Ideally, the tools should also be able to operate in real-time as several experimental and therapeutic scenarios, like the control of robotic limbs, require this. To address this need, Giovannucci et al. developed a new software package called CaImAn to analyze brain images on a large scale. Firstly, the team developed algorithms that are suitable to analyze large sets of data on laptops and other standard computing equipment. These algorithms were then adapted to operate online in real-time. To test how well the new software performs against manual analysis by human researchers, Giovannucci et al. asked several trained human annotators to identify active neurons that were round or donut-shaped in several sets of imaging data from mouse brains. Each set of data was independently analyzed by three or four researchers who then discussed any neurons they disagreed on to generate a ‘consensus annotation’. Giovannucci et al. then used CaImAn to analyze the same sets of data and compared the results to the consensus annotations. This demonstrated that CaImAn is nearly as good as human researchers at identifying active neurons in brain images. CaImAn provides a quicker method to analyze large sets of brain imaging data and is currently used by over a hundred laboratories across the world. The software is open source, meaning that it is freely-available and that users are encouraged to customize it and collaborate with other users to develop it further.

Journal Article

Germline variation contributes to false negatives in CRISPR-based experiments with varying burden across ancestries

by Boyle, Isabella , Huang, Katherine , McFarland, James M. in 45/23 , 45/43 , 45/47

2024

Reducing disparities is vital for equitable access to precision treatments in cancer. Socioenvironmental factors are a major driver of disparities, but differences in genetic variation likely also contribute. The impact of genetic ancestry on prioritization of cancer targets in drug discovery pipelines has not been systematically explored due to the absence of pre-clinical data at the appropriate scale. Here, we analyze data from 611 genome-scale CRISPR/Cas9 viability experiments in human cell line models to identify ancestry-associated genetic dependencies essential for cell survival. Surprisingly, we find that most putative associations between ancestry and dependency arise from artifacts related to germline variants. Our analysis suggests that for 1.2-2.5% of guides, germline variants in sgRNA targeting sequences reduce cutting by the CRISPR/Cas9 nuclease, disproportionately affecting cell models derived from individuals of recent African descent. We propose three approaches to mitigate this experimental bias, enabling the scientific community to address these disparities. The role of ancestry in target discovery remains to be systematically explored. Here, the authors analyse data from 611 genome scale CRISPR/Cas9 viability experiments in human cell line models as part of The Cancer Dependency Map and identify ancestry-associated genetic dependencies.

Journal Article

scPRINT-2: Towards the next-generation of cell foundation models and benchmarks

by Cantini, Laura , Kalfon, Jeremie , Peyre, Gabriel in Benchmarks , Cell culture , Embedding

2026

Cell biology has been booming with foundation models trained on large single-cell RNA-seq databases, but benchmarks and capabilities remain unclear. We propose an additive benchmark across a gymnasium of tasks to discover which features improve performance. From these findings, we present scPRINT-2, a single-cell Foundation Model pre-trained across 350 million cells and 16 organisms. Our contributions in pre-training tasks, tokenization, and losses made scPRINT-2 state-of-the-art in expression denoising, cell embedding, and cell type prediction. Furthermore, with our cell-level architecture, scPRINT-2 becomes generative, as demonstrated by our expression imputation and counterfactual reasoning results. Finally, thanks to our pre-training database, we uncover generalization to unseen modalities and organisms. These studies, together with improved abilities in gene embeddings and gene network inference, place scPRINT-2 as a next-generation cell foundation model. Competing Interest Statement The authors have declared no competing interest. Footnotes * update on conflict of interest missing information and funding related to computational resources used

Paper

Transcriptional plasticity drives leukemia immune escape

by Koren, Jost Vrabic , Dharia, Neekesh V , Harada, Taku in Acute myeloid leukemia , Bone marrow transplantation , CIITA protein

2021

Relapse of acute myeloid leukemia (AML) after allogeneic bone marrow transplantation (alloSCT) has been linked to immune evasion due to reduced expression of major histocompatibility complex class II (MHC-II) proteins through unknown mechanisms. We developed CORENODE, a computational algorithm for genome-wide transcription network decomposition, that identified the transcription factors (TFs) IRF8 and MEF2C as positive regulators and MYB and MEIS1 as negative regulators of MHC-II expression in AML cells. We show that reduced MHC-II expression at relapse is transcriptionally driven by combinatorial changes in the levels of these TFs, acting both independently and through the MHC-II coactivator CIITA. Beyond the MHC-II genes, MYB and IRF8 antagonistically regulate a broad genetic program responsible for cytokine signaling and T-cell stimulation that displays reduced expression at relapse. A small number of cells with altered TF levels and silenced MHC-II expression are present at the time of initial leukemia diagnosis, likely contributing to eventual relapse. Our findings reveal an adaptive transcriptional mechanism of AML evolution after allogenic transplantation whereby combinatorial fluctuations of TF levels under immune pressure result in selection of cells with a silenced T-cell stimulation program. Competing Interest Statement KS has consulted for KronosBio and Auron Therapeutics, has stock options with Auron Therapeutics, and received grant funding from Novartis on topics unrelated to this manuscript. NVD is a current employee of Genentech, Inc., a member of the Roche Group. KE has consulted for Third Rock Ventures on topics unrelated to this manuscript. The other authors have no competing interests to report.

Paper

Towards foundation models that learn across biological scales

by Cantini, Laura , Kalfon, Jeremie , Peyre, Gabriel in Bioinformatics

2025

We have reached a point where many bio foundation models exist across 4 different scales, from molecules to molecular chains, cells, and tissues. However, while related in many ways, these models do not yet bridge these scales. We present a framework and architecture called Xpressor that enables cross-scale learning by (1) using a novel cross-attention mechanism to compress high-dimensional gene representations into lower-dimensional cell-state vectors, and (2) implementing a multi-scale fine-tuning approach that allows cell models to leverage and adapt protein-level representations. Using a cell Foundation Model as an example, we demonstrate that our architecture improves model performance across multiple tasks, including cell-type prediction (+12%) and embedding quality (+8%). Together, these advances represent first steps toward models that can understand and bridge different scales of biological organization.

Paper

A distinct core regulatory module enforces oncogene expression in KMT2A-rearranged leukemia

by Zhu, Qian , Herbert, Zachary T , Harada, Taku in Acute myeloid leukemia , Biotechnology , Chromatin

2021

A small set of lineage-restricted transcription factors (TFs), termed core regulatory circuitry (CRC), control cell identity and malignant transformation. Here, we integrated gene dependency, chromatin architecture and TF perturbation datasets to characterize 31 core TFs in acute myeloid leukemia (AML). Contrary to a widely accepted model, we detected a modular CRC structure with hierarchically organized, partially redundant and only sparsely interconnected modules of core TFs controlling distinct genetic programs. Rapid TF degradation followed by measurement of genome-wide transcription rates revealed that core TFs directly regulate dramatically fewer genes than previously assumed. Leukemias carrying KMT2A (MLL) rearrangements depend on the IRF8/MEF2 axis to directly enforce expression of the key oncogenes MYC, HOXA9 and BCL2. Our datasets provide an evolving model of CRC organization in human cells, and a resource for further inquiries into and therapeutic targeting of aberrant transcriptional circuits in cancer. Competing Interest Statement K. Stegmaier has funding from Novartis Institute of Biomedical Research, consults for and has stock options in Auron Therapeutics, and has consulted for Kronos Bio and AstraZeneca on topics unrelated to this work. N.V. Dharia is a current employee of Genentech, Inc., a member of the Roche Group. J. Xavier Ferrucio is a current employee of Vor Biopharma. C.Y. Lin is a current employee of Kronos Bio. B. Nabet is an inventor on patent applications related to the dTAG system (WO/2017/024318, WO/2017/024319, WO/2018/148440, WO/2018/148443 and WO/2020/146250). K. Eagle has consulted for Third Rock Ventures and Flare Therapeutics on topics unrelated to this manuscript. All other authors declare no potential conflict of interest.

Paper

CaImAn: An open source tool for scalable Calcium Imaging data Analysis

by Giovannucci, Andrea , Gunn, Pat , Kalfon, Jeremie in Calcium imaging , Computers , Data analysis

2018

Advances in fluorescence microscopy enable monitoring larger brain areas in-vivo with finer time resolution. The resulting data rates require reproducible analysis pipelines that are reliable, fully automated, and scalable to datasets generated over the course of months. Here we present CaImAn, an open-source library for calcium imaging data analysis. CaImAn provides automatic and scalable methods to address problems common to pre-processing, including motion correction, neural activity identification, and registration across different sessions of data collection. It does this while requiring minimal user intervention, with good performance on computers ranging from laptops to high-performance computing clusters. CaImAn is suitable for two-photon and one-photon imaging, and also enables real-time analysis on streaming data. To benchmark the performance of CaImAn, we collected a corpus of ground truth annotations from multiple labelers on nine mouse two-photon datasets. We demonstrate that CaImAn achieves near-human performance in detecting locations of active neurons.

Paper
