Catalogue Search | MBRL

GPA: A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Annotation

by Zhao, Hongyu , Li, Cong , Chung, Dongjun in Annotations , Biology and Life Sciences , Bladder cancer

2014

Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identifications of these risk variants remain a very challenging problem. There is a need to develop more powerful statistical methods to leverage available information to improve upon traditional approaches that focus on a single GWAS dataset without incorporating additional data. In this paper, we propose a novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify risk variants through joint analysis of multiple GWAS data sets and annotation information because: (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA can integrate multiple GWAS datasets and functional annotations to seek association signals, and it can also perform hypothesis testing to test the presence of pleiotropy and enrichment of functional annotation. Statistical inference of the model parameters and SNP ranking is achieved through an EM algorithm that can handle genome-wide markers efficiently. When we applied GPA to jointly analyze five psychiatric disorders with annotation information, not only did GPA identify many weak signals missed by the traditional single phenotype analysis, but it also revealed relationships in the genetic architecture of these disorders. Using our hypothesis testing framework, statistically significant pleiotropic effects were detected among these psychiatric disorders, and the markers annotated in the central nervous system genes and eQTLs from the Genotype-Tissue Expression (GTEx) database were significantly enriched. We also applied GPA to a bladder cancer GWAS data set with the ENCODE DNase-seq data from 125 cell lines. 
GPA was able to detect cell lines that are biologically more relevant to bladder cancer. The R implementation of GPA is currently available at http://dongjunchung.github.io/GPA/.

Journal Article

LRT: Integrative analysis of scRNA-seq and scTCR-seq data to investigate clonal differentiation heterogeneity

by Jeon, Hyeongseon , Xie, Juan , Chung, Dongjun in Algorithms , Analysis , Bar codes

2023

Single-cell RNA sequencing (scRNA-seq) data has been widely used for cell trajectory inference, with the assumption that cells with similar expression profiles share the same differentiation state. However, the inferred trajectory may not reveal clonal differentiation heterogeneity among T cell clones. Single-cell T cell receptor sequencing (scTCR-seq) data provides invaluable insights into the clonal relationship among cells, yet it lacks functional characteristics. Therefore, scRNA-seq and scTCR-seq data complement each other in improving trajectory inference, where a reliable computational tool is still missing. We developed LRT, a computational framework for the integrative analysis of scTCR-seq and scRNA-seq data to explore clonal differentiation trajectory heterogeneity. Specifically, LRT uses the transcriptomics information from scRNA-seq data to construct overall cell trajectories and then utilizes both the TCR sequence information and phenotype information to identify clonotype clusters with distinct differentiation biasedness. LRT provides a comprehensive analysis workflow, including preprocessing, cell trajectory inference, clonotype clustering, trajectory biasedness evaluation, and clonotype cluster characterization. We illustrated its utility using scRNA-seq and scTCR-seq data of CD8+ T cells and CD4+ T cells with acute lymphocytic choriomeningitis virus infection. These analyses identified several clonotype clusters with distinct skewed distribution along the differentiation path, which cannot be revealed solely based on scRNA-seq data. Clones from different clonotype clusters exhibited diverse expansion capability, V-J gene usage pattern and CDR3 motifs. The LRT framework was implemented as an R package ‘LRT’, and it is now publicly accessible at https://github.com/JuanXie19/LRT.
In addition, it provides two Shiny apps ‘shinyClone’ and ‘shinyClust’ that allow users to interactively explore distributions of clonotypes, conduct repertoire analysis, implement clustering of clonotypes, trajectory biasedness evaluation, and clonotype cluster characterization.

Journal Article

Genome-scale Analysis of Escherichia coli FNR Reveals Complex Features of Transcription Factor Binding

by Yan, Huihuang , Chung, Dongjun , Landick, Robert in Anaerobiosis - genetics , Binding Sites , Binding sites (Biochemistry)

2013

FNR is a well-studied global regulator of anaerobiosis, which is widely conserved across bacteria. Despite the importance of FNR and anaerobiosis in microbial lifestyles, the factors that influence its function on a genome-wide scale are poorly understood. Here, we report a functional genomic analysis of FNR action. We find that FNR occupancy at many target sites is strongly influenced by nucleoid-associated proteins (NAPs) that restrict access to many FNR binding sites. At a genome-wide level, only a subset of predicted FNR binding sites were bound under anaerobic fermentative conditions and many appeared to be masked by the NAPs H-NS, IHF and Fis. Similar assays in cells lacking H-NS and its paralog StpA showed increased FNR occupancy at sites bound by H-NS in WT strains, indicating that large regions of the genome are not readily accessible for FNR binding. Genome accessibility may also explain our finding that genome-wide FNR occupancy did not correlate with the match to consensus at binding sites, suggesting that significant variation in ChIP signal was attributable to cross-linking or immunoprecipitation efficiency rather than differences in binding affinities for FNR sites. Correlation of FNR ChIP-seq peaks with transcriptomic data showed that less than half of the FNR-regulated operons could be attributed to direct FNR binding. Conversely, FNR bound some promoters without regulating expression presumably requiring changes in activity of condition-specific transcription factors. Such combinatorial regulation may allow Escherichia coli to respond rapidly to environmental changes and confer an ecological advantage in the anaerobic but nutrient-fluctuating environment of the mammalian gut.

Journal Article

Investigating the effects of chronic low-dose radiation exposure in the liver of a hypothermic zebrafish model

by Williamson, Tucker , Hardiman, Gary , Cahill, Thomas in 631/45 , 631/45/500 , 631/553/1833

2023

Mankind’s quest for a manned mission to Mars is placing increased emphasis on the development of innovative radio-protective countermeasures for long-term space travel. Hibernation confers radio-protective effects in hibernating animals, and this has led to the investigation of synthetic torpor to mitigate the deleterious effects of chronic low-dose-rate radiation exposure. Here we describe an induced torpor model we developed using the zebrafish. We explored the effects of radiation exposure on this model with a focus on the liver. Transcriptomic and behavioural analyses were performed. Radiation exposure resulted in transcriptomic perturbations in lipid metabolism and absorption, wound healing, immune response, and fibrogenic pathways. Induced torpor reduced metabolism and increased pro-survival, anti-apoptotic, and DNA repair pathways. Coupled with radiation exposure, induced torpor led to a stress response but also revealed maintenance of DNA repair mechanisms, pro-survival and anti-apoptotic signals. To further characterise our model of induced torpor, the zebrafish model was compared with hepatic transcriptomic data from hibernating grizzly bears (Ursus arctos horribilis) and active controls revealing conserved responses in gene expression associated with anti-apoptotic processes, DNA damage repair, cell survival, proliferation, and antioxidant response. Similarly, the radiation group was compared with space-flown mice revealing shared changes in lipid metabolism.

Journal Article

multi-GPA-Tree: Statistical approach for pleiotropy informed and functional annotation tree guided prioritization of GWAS results

by Wolf, Bethany J. , Yilmaz, Ayse Selen , Chung, Dongjun in Annotations , Autoimmune diseases , B cells

2023

Genome-wide association studies (GWAS) have successfully identified over two hundred thousand genotype-trait associations. Yet some challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), most with small or moderate effect sizes, making them difficult to detect. Second, many complex traits share a common genetic basis due to ‘pleiotropy’ and, though few methods consider it, leveraging pleiotropy can improve statistical power to detect genotype-trait associations with weaker effect sizes. Third, currently available statistical methods are limited in explaining the functional mechanisms through which genetic variants are associated with specific or multiple traits. We propose multi-GPA-Tree to address these challenges. The multi-GPA-Tree approach can identify risk SNPs associated with single as well as multiple traits while also identifying the combinations of functional annotations that can explain the mechanisms through which risk-associated SNPs are linked with the traits. First, we implemented simulation studies to evaluate the proposed multi-GPA-Tree method and compared its performance with existing statistical approaches. The results indicate that multi-GPA-Tree outperforms existing statistical approaches in detecting risk-associated SNPs for multiple traits. Second, we applied multi-GPA-Tree to a systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA), and to a Crohn’s disease (CD) and ulcerative colitis (UC) GWAS, and functional annotation data including GenoSkyline and GenoSkylinePlus. Our results demonstrate that multi-GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and potential mechanisms linking risk-associated SNPs with complex traits.

Journal Article

Multi-omics analysis for identifying cell-type-specific and bulk-level druggable targets in Alzheimer’s disease

by Liu, Shiwei , Nho, Kwangsik , Huang, Yen-Ning in 1-Phosphatidylinositol 3-kinase , AKT protein , Alzheimer Disease - drug therapy

2025

Background Analyzing disease-linked genetic variants via expression quantitative trait loci (eQTLs) helps identify potential disease-causing genes. Previous research prioritized genes by integrating Genome-Wide Association Study (GWAS) results with tissue-level eQTLs. Recent studies have explored brain cell type-specific eQTLs, but a systematic analysis across multiple Alzheimer’s disease (AD) genome-wide association study (GWAS) datasets or comparisons between tissue-level and cell type-specific effects remain limited. Here, we integrated brain cell type-level and bulk-level eQTL datasets with AD GWAS datasets to identify potential causal genes. Methods We used Summary Data-Based Mendelian Randomization (SMR) and Bayesian Colocalization (COLOC) to integrate AD GWAS summary statistics with eQTL datasets. Combining data from five AD GWAS, two single-cell eQTL datasets, and one bulk eQTL dataset, we identified novel candidate causal genes and further confirmed known ones. We investigated gene regulation through enhancer activity using H3K27ac and ATAC-seq data, performed protein–protein interaction (PPI) and pathway enrichment, and conducted a drug/compound enrichment analysis with Drug Signatures Database (DSigDB) to support drug repurposing for AD. Results We identified 28 candidate causal genes for AD, of which 12 were uniquely detected at the cell-type level, 9 were exclusive to the bulk level, and 7 were detected in both. Among the 19 cell-type level candidate causal genes, microglia contributed the highest number of candidate genes, followed by excitatory neurons, astrocytes, inhibitory neurons, oligodendrocytes, and oligodendrocyte precursor cells (OPCs). PABPC1 emerged as a novel candidate causal gene in astrocytes. We generated PPI networks for the candidate causal genes and found that pathways such as membrane organization, cell migration, and ERK1/2 and PI3K/AKT signaling were enriched.
The AD-risk variant associated with candidate causal gene PABPC1 is located near or within enhancers only active in astrocytes. We classified the 28 genes into three drug tiers and identified druggable interactions, with imatinib mesylate emerging as a key candidate. A drug-target gene network was created to explore potential drug targets for AD. Conclusions We systematically prioritized AD candidate causal genes based on cell type-level and bulk level molecular evidence. The integrative approach enhances our understanding of molecular mechanisms of AD-related genetic variants and facilitates interpretation of AD GWAS results.

Journal Article

Statistical Power Analysis for Designing Bulk, Single-Cell, and Spatial Transcriptomics Experiments: Review, Tutorial, and Perspectives

by Jeon, Yeseul , Gupta, Arkobrato , Jeon, Hyeongseon in Analysis , Cancer , Care and treatment

2023

Gene expression profiling technologies have been used in various applications such as cancer biology. The development of gene expression profiling has expanded the scope of target discovery in transcriptomic studies, and each technology produces data with distinct characteristics. In order to guarantee biologically meaningful findings using transcriptomic experiments, it is important to consider various experimental factors in a systematic way through statistical power analysis. In this paper, we review and discuss the power analysis for three types of gene expression profiling technologies from a practical standpoint, including bulk RNA-seq, single-cell RNA-seq, and high-throughput spatial transcriptomics. Specifically, we describe the existing power analysis tools for each research objective for each of the bulk RNA-seq and scRNA-seq experiments, along with recommendations. On the other hand, since there are no power analysis tools for high-throughput spatial transcriptomics at this point, we instead investigate the factors that can influence power analysis.

Journal Article

Multiplexed assay of variant effect reveals residues of functional importance in the BRCA1 coiled-coil and serine cluster domains

by Nagy, Gregory , Diabate, Mariame , Jeon, Hyeongseon in Amino acids , Analysis , Assaying

2023

Delineating functionally normal variants from functionally abnormal variants in tumor suppressor proteins is critical for cancer surveillance, prognosis, and treatment options. BRCA1 is a protein that has many variants of uncertain significance which are not yet classified as functionally normal or abnormal. In vitro functional assays can be used to identify the functional impact of a variant when the variant has not yet been categorized through clinical observation. Here we employ a homology-directed repair (HDR) reporter assay to evaluate over 300 missense and nonsense BRCA1 variants between amino acid residues 1280 and 1576, which encompasses the coiled-coil and serine cluster domains. Functionally abnormal variants tended to cluster in residues known to interact with PALB2, which is critical for homology-directed repair. Multiplexed results were confirmed by singleton assay and by ClinVar database variant interpretations. Comparison of multiplexed results to designated benign or likely benign or pathogenic or likely pathogenic variants in the ClinVar database yielded 100% specificity and 100% sensitivity of the multiplexed assay. Clinicians can reference the results of this functional assay for help in guiding cancer treatment and surveillance options. These results are the first to evaluate this domain of BRCA1 using a multiplexed approach and indicate the importance of this domain in the DNA repair process.

Journal Article

PALMER: improving pathway annotation based on the biomedical literature mining with a constrained latent block model

by Nam, Jin Hyun , Yu, Zhenning , Chung, Dongjun in Algorithms , Analysis , Annotations

2020

Background In systems biology, it is of great interest to identify previously unreported associations between genes. Recently, biomedical literature has been considered as a valuable resource for this purpose. While classical clustering algorithms have popularly been used to investigate associations among genes, they are not tuned for the literature mining data and are also based on strong assumptions, which are often violated in this type of data. For example, these approaches often assume homogeneity and independence among observations. However, these assumptions are often violated due to both redundancies in functional descriptions and biological functions shared among genes. Latent block models can be alternatives in this case but they also often show suboptimal performances, especially when signals are weak. In addition, they do not allow the use of valuable prior biological knowledge, such as that available in existing databases. Results In order to address these limitations, here we propose PALMER, a constrained latent block model that allows identification of indirect relationships among genes based on the biomedical literature mining data. By automatically associating relevant Gene Ontology terms, PALMER facilitates biological interpretation of novel findings without laborious downstream analyses. PALMER also allows researchers to utilize prior biological knowledge about known gene-pathway relationships to guide identification of gene–gene associations. We evaluated PALMER with simulation studies and applications to studies of pathway-modulating genes relevant to cancer signaling pathways, while utilizing biological pathway annotations available in the KEGG database as prior knowledge. Conclusions We showed that PALMER outperforms traditional latent block models and it provides reliable identification of novel gene–gene associations by utilizing prior biological knowledge, especially when signals are weak in the biomedical literature mining dataset.
We believe that PALMER and its relevant user-friendly software will be powerful tools that can be used to improve existing pathway annotations and identify novel pathway-modulating genes.

Journal Article

A Meta-Review of Spatial Transcriptomics Analysis Software

by Chung, Dongjun , Song, Min-Ae , Gillespie, Jessica in Accuracy , benchmarking , Benchmarks

2025

Spatial transcriptomics combines gene expression data with spatial coordinates to enable the discovery of detailed RNA localization, the study of development, the investigation of the tumor microenvironment, and the creation of tissue atlases. A large range of spatial transcriptomics software is available, with little information on which may be better suited for particular datasets or computing environments. A review was conducted to detail the useful metrics when choosing appropriate software for spatial transcriptomics analysis. Specifically, the results from benchmarking studies that compared software across four key areas of spatial transcriptomics analysis (tissue architecture identification, spatially variable gene discovery, cell–cell communication analysis, and deconvolution) were assimilated into a single review that can serve as guidance when choosing potential spatial transcriptomics analysis software.

Journal Article
