Catalogue Search | MBRL

A comparison of automatic cell identification methods for single-cell RNA sequencing data

by Cats, Davy , Abdelaal, Tamim , Mei, Hailiang in Animal Genetics and Genomics , Annotations , Benchmark

2019

Background: Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growth in the number of cells and samples has prompted the adaptation and development of supervised classification methods for automatic cell identification. Results: Here, we benchmarked 22 classification methods that automatically assign cell identities, including single-cell-specific and general-purpose classifiers. The performance of the methods is evaluated using 27 publicly available single-cell RNA sequencing datasets of different sizes, technologies, species, and levels of complexity. We use 2 experimental setups to evaluate the performance of each method for within-dataset predictions (intra-dataset) and across datasets (inter-dataset) based on accuracy, percentage of unclassified cells, and computation time. We further evaluate the methods’ sensitivity to the input features, number of cells per population, and their performance across different annotation levels and datasets. We find that most classifiers perform well on a variety of datasets, with decreased accuracy for complex datasets with overlapping classes or deep annotations. The general-purpose support vector machine classifier has overall the best performance across the different experiments. Conclusions: We present a comprehensive evaluation of automatic cell identification methods for single-cell RNA sequencing data. All the code used for the evaluation is available on GitHub (https://github.com/tabdelaal/scRNAseq_Benchmark). Additionally, we provide a Snakemake workflow to facilitate the benchmarking and to support the extension of new methods and new datasets.

Journal Article

Consequences and opportunities arising due to sparser single-cell RNA-seq datasets

by Reinders, Marcel J. T. , Mahfouz, Ahmed , Bouland, Gerard A. in Animal Genetics and Genomics , Bioinformatics , Biomedical and Life Sciences

2023

With the number of cells measured in single-cell RNA sequencing (scRNA-seq) datasets increasing exponentially, and sparsity increasing concurrently because more zero counts are measured for many genes, we demonstrate here that downstream analyses on binary-based gene expression give results similar to those of count-based analyses. Moreover, a binary representation allows up to ~50-fold more cells to be analyzed using the same computational resources. We also highlight the possibilities provided by binarized scRNA-seq data. Development of specialized tools for bit-aware implementations of downstream analytical tasks will enable a more fine-grained resolution of biological heterogeneity.
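As a rough illustration of what a binary, bit-aware representation could look like, the sketch below binarizes a toy count matrix (detected versus not detected) and packs each cell into a single integer so that overlap between cells can be computed with a bitwise AND. It is not taken from the article: the function names and the toy matrix are hypothetical, and the example makes no attempt to reproduce the reported ~50-fold scaling.

```python
def binarize(counts):
    """Map each count to 1 if the gene was detected (count > 0), else 0."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def pack_bits(bits):
    """Pack a list of 0/1 values into one Python int, one bit per gene."""
    packed = 0
    for i, b in enumerate(bits):
        if b:
            packed |= 1 << i
    return packed


def shared_detected(a, b):
    """Count genes detected in both cells via bitwise AND on packed rows."""
    return bin(a & b).count("1")


# Toy data (hypothetical): 3 cells x 4 genes of raw counts.
counts = [
    [0, 3, 0, 1],
    [2, 0, 0, 5],
    [0, 7, 0, 2],
]
binary = binarize(counts)
packed = [pack_bits(row) for row in binary]
```

Here `shared_detected(packed[0], packed[2])` counts the genes detected in both the first and third cell without touching the original counts.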

Journal Article

Benchmarking variational AutoEncoders on cancer transcriptomics data

by Reinders, Marcel J. T. , Eltager, Mostafa , Mahfouz, Ahmed in Analysis , Benchmarks , Biology and Life Sciences

2023

Deep generative models, such as variational autoencoders (VAE), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks such as cancer type prediction or subtyping of cancer. However, these models are difficult to train due to the large number of hyperparameters that need to be tuned. To get a better understanding of the importance of the different hyperparameters, we examined six different VAE models when trained on TCGA transcriptomics data and evaluated on the downstream tasks of cluster agreement with cancer subtypes and survival analysis. We studied the effect of the latent space dimensionality, learning rate, optimizer, initialization and activation function on the quality of subsequent downstream tasks on the TCGA samples. We found β-TCVAE and DIP-VAE to have a good performance, on average, despite being more sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting the different hyperparameter settings. To ensure generalization, we tested all hyperparameter configurations on the GTEx dataset. We found a significant correlation (ρ = 0.7) between the hyperparameter effects on clustering performance in the TCGA and GTEx datasets. This highlights the robustness and generalizability of our recommendations. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we measured the correlation and mutual information of the different representations with various data characteristics such as gender, age, days to metastasis, immune infiltration, and mutation signatures. We found that, for all models, the latent factors in general neither correlate uniquely with one of the data characteristics nor capture separable information, even for models specifically designed for disentanglement.

Journal Article

Hierarchical progressive learning of cell identities in single-cell data

by Reinders, Marcel J. T. , Michielsen, Lieke , Mahfouz, Ahmed in 631/114/1305 , 631/114/1314 , 631/208/199

2021

Supervised methods are increasingly used to identify cell populations in single-cell data. Yet, current methods are limited in their ability to learn from multiple datasets simultaneously, are hampered by the annotation of datasets at different resolutions, and do not preserve annotations when retrained on new datasets. The latter point is especially important as researchers cannot rely on downstream analysis performed using earlier versions of the dataset. Here, we present scHPL, a hierarchical progressive learning method which allows continuous learning from single-cell data by leveraging the different resolutions of annotations across multiple datasets to learn and continuously update a classification tree. We evaluate the classification and tree learning performance using simulated as well as real datasets and show that scHPL can successfully learn known cellular hierarchies from multiple datasets while preserving the original annotations. scHPL is available at https://github.com/lcmmichielsen/scHPL. Classification methods for scRNA-seq data are limited in their ability to learn from multiple datasets simultaneously. Here the authors present scHPL, a hierarchical progressive learning method that automatically finds relationships between cell populations across multiple datasets and constructs a classification tree.

Journal Article

Decoding exon inclusion in the human brain reveals more divergent splicing mechanisms in neurons than glia

by Belchikov, Natan , Joglekar, Anoushka , Reinders, Marcel J. T. in Alternative Splicing , Animal Genetics and Genomics , Binding Sites

2026

Background: Alternative splicing contributes to molecular diversity across brain cell types. RNA-binding proteins (RBPs) regulate splicing, but the genome-wide mechanisms underlying cell-type-specific splicing remain poorly understood. Results: Here, we aim to unravel cell-type-specific splicing mechanisms by using RBP binding sites and/or the genomic sequence to predict exon inclusion in neurons and glia as measured by long-read single-cell data in the human hippocampus and frontal cortex. We found that exon inclusion of variable exons is harder to predict in neurons than in glia in both brain regions. Comparing neurons and glia, the positions of RBP binding sites in alternatively spliced exons in neurons differ more from those in non-variable exons, indicating distinct splicing mechanisms. Model interpretation pinpointed RBPs, including QKI, potentially regulating alternative splicing between neurons and glia. Finally, we accurately predict and prioritize the effect of splicing QTLs. Conclusions: Our results indicate that the splicing mechanisms in variable exons in neurons diverged more from the standard mechanisms. Splicing in neurons might be less sequence-dependent and influenced more by, for instance, chromatin accessibility or methylation. Taken together, these results provide new insights into the mechanisms regulating cell-type-specific alternative splicing in the brain.

Journal Article

An algorithm-based topographical biomaterials library to instruct cell fate

by Uetz, Marc , Stamatialis, Dimitrios , Unadkat, Hemant V in Algorithms , Biocompatible Materials , Biological Sciences

2011

It is increasingly recognized that material surface topography is able to evoke specific cellular responses, endowing materials with instructive properties that were formerly reserved for growth factors. This opens the window to improve upon, in a cost-effective manner, biological performance of any surface used in the human body. Unfortunately, the interplay between surface topographies and cell behavior is complex and still incompletely understood. Rational approaches to search for bioactive surfaces will therefore omit previously unperceived interactions. Hence, in the present study, we use mathematical algorithms to design nonbiased, random surface features and produce chips of poly(lactic acid) with 2,176 different topographies. With human mesenchymal stromal cells (hMSCs) grown on the chips and using high-content imaging, we reveal unique, formerly unknown, surface topographies that are able to induce MSC proliferation or osteogenic differentiation. Moreover, we correlate parameters of the mathematical algorithms to cellular responses, which yield novel design criteria for these particular parameters. In conclusion, we demonstrate that randomized libraries of surface topographies can be broadly applied to unravel the interplay between cells and surface topography and to find improved material surfaces.

Journal Article

2D Representation of Transcriptomes by t-SNE Exposes Relatedness between Human Tissues

by Reinders, Marcel J. T. , Taskesen, Erdogan in Algorithms , Bioinformatics , Biology and Life Sciences

2016

The GTEx Consortium reported that hierarchical clustering of RNA profiles from 25 unique tissue types among 1641 individuals accurately distinguished the tissue types, but that multidimensional scaling failed to generate a 2D projection of the data that separates tissue subtypes. In this study we show that a projection by t-Distributed Stochastic Neighbor Embedding is in line with the cluster analysis, which allows a more detailed examination and visualization of human tissue relationships.

Journal Article

Mapping AML heterogeneity - multi-cohort transcriptomic analysis identifies novel clusters and divergent ex-vivo drug responses

by van den Berg, Redmar R. , van der Reijden, Bert A. , Sánchez-López, Elena in 631/67/1990/283/1897 , 631/67/69 , 692/699/67/1059/602

2024

Subtyping of acute myeloid leukaemia (AML) is predominantly based on recurrent genetic abnormalities, but recent literature indicates that transcriptomic phenotyping holds immense potential to further refine AML classification. Here we integrated five AML transcriptomic datasets with corresponding genetic information to provide an overview (n = 1224) of the transcriptomic AML landscape. Consensus clustering identified 17 robust patient clusters which improved identification of CEBPA-mutated patients with favourable outcomes, and uncovered transcriptomic subtypes for KMT2A rearrangements (2), NPM1 mutations (5), and AML with myelodysplasia-related changes (AML-MRC) (5). Transcriptomic subtypes of KMT2A, NPM1 and AML-MRC showed distinct mutational profiles, cell type differentiation arrests and immune properties, suggesting differences in underlying disease biology. Moreover, our transcriptomic clusters show differences in ex-vivo drug responses, even when corrected for differentiation arrest, and capture differences in drug response better than genetic classification. In conclusion, our findings underscore the importance of transcriptomics in AML subtyping and offer a basis for future research and personalised treatment strategies. Our transcriptomic compendium is publicly available and we supply an R package to project clusters to new transcriptomic studies.

Journal Article

The transcriptional regulator c2h2 accelerates mushroom formation in Agaricus bisporus

by Vos, Aurin M. , Baars, Johan J. P. , Lugones, Luis G. in Agaricus , Agaricus - genetics , Agaricus - growth & development

2016

The Cys2His2 zinc finger protein gene c2h2 of Schizophyllum commune is involved in mushroom formation. Its inactivation results in a strain that is arrested at the stage of aggregate formation. In this study, the c2h2 orthologue of Agaricus bisporus was over-expressed in this white button mushroom-forming basidiomycete using Agrobacterium-mediated transformation. Morphology, cap expansion rate, and total number and biomass of mushrooms were not affected by over-expression of c2h2. However, yield per day of the c2h2 over-expression strains peaked 1 day earlier. These data and expression analysis indicate that C2H2 impacts timing of mushroom formation at an early stage of development, making its encoding gene a target for breeding of commercial mushroom strains.

Journal Article

All-atom protein sequence design using discrete diffusion models

by Villegas-Morcillo, Amelia , Reinders, Marcel J. T. , Admiraal, Gijs J. in All-atom representation , Amino acid composition , Amino acid sequence

2025

Advancing protein design is crucial for breakthroughs in medicine and biotechnology. Traditional approaches for protein sequence representation often rely solely on the 20 canonical amino acids, limiting the representation of non-canonical amino acids and residues that undergo post-translational modifications. This work explores discrete diffusion models for generating novel protein sequences using the all-atom chemical representation SELFIES. By encoding the atomic composition of each amino acid in the protein, this approach expands the design possibilities beyond standard sequence representations. Using a modified ByteNet architecture within the discrete diffusion D3PM framework, we evaluate the impact of this all-atom representation on protein quality, diversity, and novelty, compared to conventional amino acid-based models. To this end, we develop a comprehensive assessment pipeline to determine whether generated SELFIES sequences translate into valid proteins containing both canonical and non-canonical amino acids. Additionally, we examine the influence of two noise schedules within the diffusion process, uniform (random replacement of tokens) and absorbing (progressive masking), on generation performance. While models trained on the all-atom representation struggle to consistently generate fully valid proteins, the successfully generated proteins show improved novelty and diversity compared to their amino acid-based model counterparts. Furthermore, the all-atom representation achieves structural foldability results comparable to those of amino acid-based models. Lastly, our results highlight the absorbing noise schedule as the most effective for both representations. Data and code are available at https://github.com/Intelligent-molecular-systems/All-Atom-Protein-Sequence-Generation.
Scientific Contribution: This work introduces a discrete diffusion-based framework for protein sequence generation using an all-atom representation, laying the groundwork for extending to non-canonical amino acids and post-translational modifications. Additionally, it provides a comprehensive evaluation pipeline to assess the validity of generated proteins, demonstrating how noise schedules within the diffusion process impact sequence novelty, diversity, and structural foldability.
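The two corruption styles named in the abstract can be pictured with a minimal sketch, given here only as an illustration and not as the authors' code. It applies a single corruption step to a token list, whereas a full D3PM forward process uses per-timestep transition matrices and a schedule over many steps. The `MASK` token, the function names, the `rate` parameter, and the toy vocabulary are all assumptions made for this example.

```python
import random

MASK = "[MASK]"  # hypothetical absorbing-state token


def uniform_noise(tokens, vocab, rate, rng):
    """With probability `rate`, replace a token by one drawn uniformly from vocab."""
    return [rng.choice(vocab) if rng.random() < rate else t for t in tokens]


def absorbing_noise(tokens, rate, rng):
    """With probability `rate`, replace a token by the MASK token."""
    return [MASK if rng.random() < rate else t for t in tokens]


# Toy sequence over a small amino-acid alphabet (hypothetical data).
vocab = ["A", "C", "D", "E", "G"]
sequence = ["A", "C", "D", "E", "G", "A"]
rng = random.Random(0)

uniform_corrupted = uniform_noise(sequence, vocab, 0.5, rng)
absorbing_corrupted = absorbing_noise(sequence, 0.5, rng)
fully_masked = absorbing_noise(sequence, 1.0, rng)
untouched = uniform_noise(sequence, vocab, 0.0, rng)
```

Under uniform noise a corrupted position still looks like a valid token, while under absorbing noise every corrupted position is marked explicitly by `MASK`.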

Journal Article
