Catalogue Search | MBRL

Single sample scoring of molecular phenotypes

by Horan, Kristy; Foroutan, Momeneh; Lyu, Ruqian in Algorithms, Bias, Bioinformatics

2018

Background: Gene set scoring provides a useful approach for quantifying concordance between sample transcriptomes and selected molecular signatures. Most methods use information from all samples to score an individual sample, which leads to unstable scores in small data sets and introduces biases from sample composition (e.g. varying numbers of samples for different cancer subtypes). To address these issues, we have developed a truly single-sample scoring method and an associated R/Bioconductor package, singscore (https://bioconductor.org/packages/singscore).

Results: We use multiple cancer data sets to compare singscore against widely used methods, including GSVA, z-score, PLAGE and ssGSEA. Our approach does not depend on background samples, so scores are stable regardless of the composition and number of samples being scored. In contrast, scores obtained by GSVA, z-score, PLAGE and ssGSEA can be unstable when less data are available (N_S < 25). The singscore method performs as well as the best-performing methods in terms of power, recall, false positive rate and computational time, and provides consistently high and balanced performance across all these criteria. To enhance the impact and utility of our method, we have also included a set of functions implementing visual analysis and diagnostics to support the exploration of molecular phenotypes in single samples and across populations of data.

Conclusions: The singscore method described here functions independently of sample composition in gene expression data and thus provides stable scores, which are particularly useful for small data sets or data integration. Singscore performs well across all performance criteria and includes a suite of powerful visualization functions to assist in the interpretation of results.

This method performs as well as or better than other scoring approaches in terms of its power to distinguish samples with distinct biology and its ability to call true differential gene sets between two conditions. These scores can be used for dimensionality reduction of transcriptomic data, and the phenotypic landscapes obtained by scoring samples against multiple molecular signatures may provide insights for sample stratification.
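
The idea of a truly single-sample score can be illustrated with a short sketch: rank the genes within one sample, average the ranks of the gene-set members, and rescale that mean rank between its smallest and largest possible values. This is a simplified Python illustration, assuming an up-regulated gene set only; it is not the singscore R package's implementation, which may also handle cases such as paired up/down gene sets. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in ascending order (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def single_sample_score(expr: Dict[str, float], gene_set: Iterable[str]) -> float:
    """Hypothetical helper: score ONE sample against an up-regulated gene set.

    Only this sample's expression values are used, so the score cannot change
    when other samples are added to or removed from a data set. Returns a value
    in [0, 1]: 0 if the set's genes hold the lowest ranks, 1 if the highest.
    """
    genes = list(expr)
    ranks = dict(zip(genes, average_ranks([expr[g] for g in genes])))
    members = [g for g in set(gene_set) if g in ranks]
    n, total = len(members), len(genes)
    if n == 0 or n == total:
        raise ValueError("gene set must cover some, but not all, measured genes")
    mean_rank = sum(ranks[g] for g in members) / n
    lowest = (n + 1) / 2               # mean rank if members held ranks 1..n
    highest = (2 * total - n + 1) / 2  # mean rank if members held the top n ranks
    return (mean_rank - lowest) / (highest - lowest)


# Hypothetical expression values for a single sample.
sample = {"G1": 5.2, "G2": 0.4, "G3": 9.8, "G4": 7.1, "G5": 1.3}
print(single_sample_score(sample, ["G3", "G4"]))  # → 1.0 (the two top-ranked genes)
```

Because the score uses only within-sample ranks, it is unaffected by how many other samples, or which subtypes, are in the data set, which is the stability property the abstract describes.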

Journal Article

Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer

by Smyth, Gordon K.; Davis, Melissa J.; Cursons, Joseph in Animal Genetics and Genomics, Benchmarking, Benchmarking Studies

2019

Background: Elucidation of regulatory networks, including identification of regulatory mechanisms specific to a given biological context, is a key aim in systems biology. This has motivated the move from co-expression to differential co-expression analysis, and numerous methods have since been developed to address this task; however, evaluation of these methods and interpretation of the resulting networks have been hindered by the lack of known context-specific regulatory interactions.

Results: In this study, we develop a simulator based on dynamical systems modelling that is capable of simulating differential co-expression patterns. With the simulator and an evaluation framework, we benchmark and characterise the performance of inference methods. Defining three different levels of “true” networks for each simulation, we show that accurate inference of causation is difficult for all methods, compared with inference of associations. We show that a z-score-based method has the best general performance. Further, analysis of simulation parameters reveals five network and simulation properties that explain the performance of the methods. The evaluation framework and inference methods used in this study are available in the dcanr R/Bioconductor package.

Conclusions: Our analysis of networks inferred from simulated data shows that hub nodes are more likely to be differentially regulated targets than transcription factors. Based on this observation, we propose an interpretation of the inferred differential network that can reconstruct a putative causal network.

Journal Article

Construction of developmental lineage relationships in the mouse mammary gland by single-cell RNA profiling

by Visvader, Jane E.; Gordon, Lavinia; Chen, Yunshun in 631/136/2060, 631/337/2019, 631/80

2017

The mammary epithelium comprises two primary cellular lineages, but the degree of heterogeneity within these compartments and their lineage relationships during development remain an open question. Here we report single-cell RNA profiling of mouse mammary epithelial cells spanning four developmental stages in the post-natal gland. Notably, the epithelium undergoes a large-scale shift in gene expression from a relatively homogeneous basal-like program in pre-puberty to distinct lineage-restricted programs in puberty. Interrogation of single-cell transcriptomes reveals different levels of diversity within the luminal and basal compartments, and identifies an early progenitor subset marked by CD55. Moreover, we uncover a luminal transit population and a rare mixed-lineage cluster amongst basal cells in the adult mammary gland. Together these findings point to a developmental hierarchy in which a basal-like gene expression program prevails in the early post-natal gland prior to the specification of distinct lineage signatures, and the presence of cellular intermediates that may serve as transit or lineage-primed cells. The mammary epithelium comprises two cell lineages but the heterogeneity amongst these during development is unclear. Here, the authors report single-cell RNA sequencing of the mouse mammary epithelium at four developmental stages, revealing diversity in both compartments and a transcriptional shift with puberty onset.

Journal Article

PRMT1-mediated H4R3me2a recruits SMARCA4 to promote colorectal cancer progression by enhancing EGFR signaling

by Zeng, Xiangwei; Pan, Hua-Feng; Zhao, Quan in Adenosine triphosphatase, Analysis, Animals

2021

Background: Aberrant changes in epigenetic mechanisms such as histone modifications play an important role in cancer progression. PRMT1, which catalyses asymmetric dimethylation of histone H4 on arginine 3 (H4R3me2a), is upregulated in human colorectal cancer (CRC) and is essential for cell proliferation. However, how this dysregulated modification might contribute to malignant transitions of CRC remains poorly understood.

Methods: In this study, we integrated biochemical assays, including protein interaction studies and chromatin immunoprecipitation (ChIP); cellular analyses, including cell viability, proliferation, colony formation and migration assays; clinical sample analysis; microarray experiments; and ChIP-seq data to investigate the potential genomic recognition pattern of H4R3me2a in CRC cells and its effect on CRC progression.

Results: We show that PRMT1 and SMARCA4, an ATPase subunit of the SWI/SNF chromatin remodeling complex, act cooperatively to promote CRC progression. We find that SMARCA4 is a novel effector molecule of PRMT1-mediated H4R3me2a. Mechanistically, we show that H4R3me2a directly recruits SMARCA4 to promote the proliferative, colony-forming and migratory abilities of CRC cells by enhancing EGFR signaling. We found that EGFR and TNS4 were major direct downstream transcriptional targets of PRMT1 and SMARCA4 in colon cells and acted in a PRMT1 methyltransferase activity-dependent manner to promote CRC cell proliferation. In vivo, knockdown or inhibition of PRMT1 profoundly attenuated the growth of CRC cells in the C57BL/6J-Apc^Min/+ CRC mouse model. Importantly, elevated expression of PRMT1 or SMARCA4 in CRC patients was positively correlated with expression of EGFR and TNS4, and patients with elevated expression had shorter overall survival. These findings reveal a critical interplay between epigenetic and transcriptional control during CRC progression, suggesting that SMARCA4 is a novel key epigenetic modulator of CRC. Our findings thus highlight PRMT1/SMARCA4 inhibition as a potential therapeutic intervention strategy for CRC.

Conclusion: PRMT1-mediated H4R3me2a recruits SMARCA4, which promotes colorectal cancer progression by enhancing EGFR signaling.

Journal Article

SpaNorm: spatially-aware normalization for spatial transcriptomics data

by Yang, Pengyi; Yang, Jean Y. H.; Davis, Melissa J. in Advances in Spatial Transcriptomics for Understanding Development and Disease, Algorithms, Animal Genetics and Genomics

2025

Normalization of spatial transcriptomics data is challenging due to spatial association between region-specific library size and biology. We develop SpaNorm, the first spatially-aware normalization method that concurrently models library size effects and the underlying biology, segregates these effects, and thereby removes library size effects without removing biological information. Using 27 tissue samples from 6 datasets spanning 4 technological platforms, SpaNorm outperforms commonly used single-cell normalization approaches while retaining spatial domain information and detecting spatially variable genes. SpaNorm is versatile and works equally well for multicellular and subcellular spatial transcriptomics data with relatively robust performance under different segmentation methods.

Journal Article

vissE: a versatile tool to identify and visualise higher-order molecular phenotypes from functional enrichment analysis

by Liu, Ning; Papachristos, Nicholas; Whitfield, Holly J. in Adaptability, Algorithms, Bioinformatics

2024

Functional analysis of high-throughput experiments using pathway analysis is now ubiquitous. Though powerful, these methods often produce thousands of redundant results owing to redundancies in the upstream knowledgebases. This scale of results hinders extensive exploration by biologists and can lead to investigator biases due to previous knowledge and expectations. To address this issue, we present vissE, a flexible network-based analysis and visualisation tool that organises information into semantic categories and provides various visualisation modules to characterise them with respect to the underlying data, thus providing a comprehensive view of the biological system. We demonstrate vissE’s versatility by applying it to three different technologies: bulk, single-cell and spatial transcriptomics. Applying vissE to a factor analysis of a breast cancer spatial transcriptomics dataset, we identified stromal phenotypes that support tumour dissemination. Its adaptability allows vissE to enhance all existing gene-set enrichment and pathway analysis workflows, empowering biologists during molecular discovery.
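
One way to picture how redundant enrichment results can be organised through a network is to connect gene sets that share many genes and read off the connected groups. The Python sketch below is a stand-in under stated assumptions: it uses the Jaccard index with a hypothetical threshold and plain connected components, whereas vissE's actual similarity measure, clustering method and thresholds may differ. Gene-set names and members are illustrative only.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_groups(gene_sets: Dict[str, Set[str]], threshold: float = 0.25) -> List[Set[str]]:
    """Hypothetical helper: link gene sets whose Jaccard overlap reaches
    `threshold`, then return the connected components of that graph."""
    parent = {name: name for name in gene_sets}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(gene_sets, 2):
        if jaccard(gene_sets[a], gene_sets[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, Set[str]] = {}
    for name in gene_sets:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


enriched = {
    "CELL_CYCLE_A": {"CDK1", "CCNB1", "MKI67"},
    "CELL_CYCLE_B": {"CDK1", "CCNB1", "TOP2A"},
    "HYPOXIA_A": {"VEGFA", "CA9"},
    "HYPOXIA_B": {"VEGFA", "LDHA"},
}
print(overlap_groups(enriched))  # two groups: cell-cycle sets and hypoxia sets
```

Grouping first and then summarising each group lets a reader inspect a handful of themes rather than every redundant gene set individually.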

Journal Article

Library size confounds biology in spatial transcriptomics data

by Liu, Ning; Bhuva, Dharmesh D.; Chen, Jinjin in Advances in Spatial Transcriptomics for Understanding Development and Disease, Algorithms, Animal Genetics and Genomics

2024

Spatial molecular data have transformed the study of disease microenvironments; however, larger datasets pose an analytical challenge, prompting the direct adoption of single-cell RNA-sequencing (scRNA-seq) tools, including normalization methods. Here, we demonstrate that library size is associated with tissue structure and that normalizing these effects out using commonly applied scRNA-seq normalization methods negatively affects spatial domain identification. Spatial data should not be specifically corrected for library size prior to analysis, and algorithms designed for scRNA-seq data should be adopted with caution.
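
A toy example shows why library-size correction can erase biology when library size tracks tissue structure. The Python sketch below applies a simple global scaling (each spot divided by its library size relative to the mean library size), a generic scRNA-seq-style normalization chosen for illustration; it is not the analysis in this paper or SpaNorm. The counts are hypothetical, assuming a dense region whose spot has twice the total RNA of a sparse region's spot.

```python
from typing import List


def library_sizes(counts: List[List[int]]) -> List[int]:
    """Total counts per spot (one inner list of gene counts per spot)."""
    return [sum(spot) for spot in counts]


def scale_to_mean_library(counts: List[List[int]]) -> List[List[float]]:
    """Global scaling normalization: divide each spot by its size factor,
    where size factor = library size / mean library size."""
    libs = library_sizes(counts)
    if any(lib == 0 for lib in libs):
        raise ValueError("every spot needs a non-zero library size")
    mean_lib = sum(libs) / len(libs)
    return [[c * mean_lib / lib for c in spot] for spot, lib in zip(counts, libs)]


# Hypothetical spots: [gene A, gene B] counts in a sparse and a dense region.
sparse_spot = [10, 30]
dense_spot = [20, 60]
spots = [sparse_spot, dense_spot]
print(library_sizes(spots))          # [40, 80]: library size differs by region
print(scale_to_mean_library(spots))  # both spots become [15.0, 45.0]
```

If the doubled library size in the dense region reflects real biology, such as higher cellularity, the scaled values no longer carry that regional difference, which is the kind of loss of spatial-domain information the abstract warns about.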

Journal Article

Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT

by Li, Yuan; Pires, Douglas E. V.; Elangovan, Aparna in Algorithms, Analysis, Annotations

2022

Motivation: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTMs). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, and this annotation is mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by using deep learning, trained on distantly supervised data, to extract PPIs along with their pairwise PTMs from the literature and so aid human curation.

Method: We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high-confidence predictions.

Results and conclusion: The PPI-BioBERT-x10 model evaluated on the test set achieved a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high-quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts, extracted 1.6 million PTM-PPI predictions (546,507 unique PTM-PPI triplets), and filtered ≈ 5700 (4584 unique) high-confidence predictions. Human evaluation of a small, randomly sampled subset of these 5700 predictions shows that precision drops to 33.7% despite confidence calibration, highlighting the challenges of generalisability beyond the test set. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
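
The idea of keeping only predictions with high average confidence and low variation across ensemble members can be sketched as a simple filter. This Python example is an illustration under assumptions: the thresholds (`min_mean=0.9`, `max_std=0.05`), the helper name and the probabilities are hypothetical, and the paper's actual calibration and cut-offs may differ.

```python
from statistics import mean, pstdev
from typing import Dict, List


def high_confidence(
    ensemble_probs: Dict[str, List[float]],
    min_mean: float = 0.9,   # hypothetical threshold on average confidence
    max_std: float = 0.05,   # hypothetical threshold on spread across members
) -> List[str]:
    """Hypothetical filter: keep predictions on which ensemble members agree.

    ensemble_probs maps a prediction ID to the probability that each ensemble
    member assigns to its predicted class. A prediction is kept only if the
    average confidence is high AND the spread across members is low.
    """
    return [
        pred_id
        for pred_id, probs in ensemble_probs.items()
        if mean(probs) >= min_mean and pstdev(probs) <= max_std
    ]


predictions = {
    "p1": [0.95, 0.97, 0.96],  # confident and consistent
    "p2": [0.99, 0.80, 0.99],  # high average, but members disagree
    "p3": [0.50, 0.52, 0.49],  # consistent, but not confident
}
print(high_confidence(predictions))  # ['p1']
```

Requiring both conditions trades recall for precision: "p2" would pass an average-only filter, but its disagreement across members suggests it is less reliable.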

Journal Article

Dissection of the bone marrow microenvironment in hairy cell leukaemia identifies prognostic tumour and immune related biomarkers

by Koldej, Rachel M.; Ritchie, David S.; Ng, Ashley P. in 631/1647/48, 631/1647/664/1257, 692/308/53/2422

2021

Hairy cell leukaemia (HCL) is a rare CD20+ B cell malignancy characterised by rare “hairy” B cells and extensive bone marrow (BM) infiltration. Frontline treatment with the purine analogue cladribine (CDA) results in a highly variable response duration. We hypothesised that analysis of the BM tumour microenvironment would identify prognostic biomarkers of response to CDA. The BM immune microenvironment of HCL patients before and after CDA treatment, and of healthy controls, was analysed using Digital Spatial Profiling to assess the expression of 57 proteins with an immunology panel. A bioinformatics pipeline was developed to accommodate the more complex experimental design of a spatially resolved study. Treatment with CDA was associated with reduced expression of HCL tumour markers (CD20, CD11c) and increased expression of myeloid markers (CD14, CD68, CD66b, ARG1). Expression of HLA-DR, STING, CTLA4, VISTA and OX40L was dysregulated pre- and post-CDA. Duration of response to treatment was associated with a greater reduction in tumour burden and with infiltration by CD8 T cells into the BM post-CDA. This is the first study to provide a high-multiplex analysis of the HCL BM microenvironment, demonstrating significant immune dysregulation, and to identify biomarkers of response to CDA. With validation in future studies, prospective application of these biomarkers could allow early identification and closer monitoring of patients at increased risk of relapse post-CDA.

Journal Article

Type 2 Innate Lymphoid Cells Protect against Colorectal Cancer Progression and Predict Improved Patient Survival

by Hansbro, Philip M.; Davis, Melissa J.; McKenzie, Andrew N. J. in Animal models, Bacterial infections, Colorectal cancer

2021

Chronic inflammation of the gastrointestinal (GI) tract contributes to colorectal cancer (CRC) progression. While the role of adaptive T cells in CRC is now well established, the role of innate immune cells, specifically innate lymphoid cells (ILCs), is not well understood. To define the role of ILCs in CRC, we employed complementary heterotopic and chemically induced CRC mouse models. We discovered that ILCs were abundant in CRC tumours and contributed to anti-tumour immunity. We focused on ILC2 and showed that ILC2-deficient mice developed a higher tumour burden than littermate wild-type controls. We generated an ILC2 gene signature and, using machine learning models, showed that CRC patients with a high intratumoral ILC2 gene signature had a favourable clinical prognosis. Collectively, our results highlight a critical role for ILC2 in CRC, suggesting a potential new avenue to improve clinical outcomes through ILC2 agonist-based therapeutic approaches.

Journal Article
