Catalogue Search | MBRL

by Jolanki, Otto , Lam, Bonita , Cherry, J. Michael in 631/114/129/2043 , 631/114/2164 , 631/1647/2217

2025

Spanning two decades, the collaborative ENCODE project aims to identify all the functional elements within human and mouse genomes. To best serve the scientific community, the comprehensive ENCODE data, including results from 23,000+ functional genomics experiments, 800+ functional elements characterization experiments and 60,000+ results from integrative computational analyses, are available on an open-access data portal (https://www.encodeproject.org/). The final phase of the project includes data from several novel assays aimed at characterization and validation of genomic elements. In addition to developing and maintaining the data portal, the Data Coordination Center (DCC) implemented and utilised uniform processing pipelines to generate uniformly processed data. Here we report recent updates to the data portal including a redesigned home page, an improved search interface, new custom-designed pages highlighting biologically related datasets and an enhanced cart interface for data visualisation plus user-friendly data download options. A summary of data generated using uniform processing pipelines is also provided.

Journal Article

Annotating and prioritizing human non-coding variants with RegulomeDB v.2

by Spragins, Emma , Kagda, Meenakshi S. , Jolanki, Otto in 631/1647/2217/2138 , 631/1647/48 , 631/208/212/2166

2023

[...] we included chromatin state annotations from chromHMM in EpiMap for 833 biosamples [8]. Transcription factor motifs and ChIP-seq data together provide evidence about how a variant is likely to affect phenotype in a cell-specific context. [...] rs72635708 is predicted as a regulatory variant by RegulomeDB with a high probability of 0.91 because its locus overlaps with DNase and ChIP-seq peaks and footprints, and because it is an eQTL that associates with LINC01714 gene expression in the right lobe of the liver. Because rs72635708 lies in the FOS motif, it is likely to be a functional variant in the liver by modulating the binding of the AP-1 complex [13].

Journal Article

Private information leakage from functional genomics data: Quantification with calibration experiments and reduction via data sanitization protocols

by Brannon, Charlotte , Prashant Siva Emani , Harmanci, Arif O in Bioinformatics , Experiments , Gene expression

2020

The generation of functional genomics datasets is surging, as they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intention of functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to share raw reads for better analyses and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, thus enabling principled privacy-utility trade-offs. It works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA-sequencing. The procedure depends on quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.

Paper

The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity

by Wold, Barbara J , Jolanki, Otto , Balderrama-Gutierrez, Gabriela in Genomics

2023

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.

Journal Article

The ENCODE Uniform Analysis Pipelines

by Jolanki, Otto , Bochkov, Ivan , Sloan, Cricket A in Bioinformatics

2023

The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the human and mouse genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud gives small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

Journal Article

Annotating and prioritizing human non-coding variants with RegulomeDB

by Spragins, Emma , Assis, Pedro R , Jolanki, Otto in Bioinformatics , Computer applications , Genome-wide association studies

2022

Nearly 90% of the disease risk-associated variants identified from genome-wide association studies (GWAS) are in non-coding regions of the genome. The annotations obtained from analyzing functional genomics assays can provide additional information to pinpoint causal variants, which are often not the lead variants identified from association studies. However, the lack of available annotation tools limits the use of such data. To address the challenge, we have previously built the RegulomeDB database for prioritizing and annotating variants in non-coding regions [1], which has been a highly utilized resource for the research community (Supplementary Fig. 1). RegulomeDB annotates a variant by intersecting its position with genomic intervals identified from functional genomic assays and computational approaches. It also incorporates those hits of a variant into a heuristic ranking score, representing its potential to be functional in regulatory elements. Here we present a newer version of the RegulomeDB web server, RegulomeDB v2.1 (http://regulomedb.org). We improve and boost annotation power by incorporating thousands of newly processed datasets from functional genomic assays in the GRCh38 assembly, and now include probabilistic scores from the SURF algorithm, which was the top-performing non-coding variant predictor in CAGI 5 [2]. We also provide interactive charts and genome browser views to allow users an easy way to perform exploratory analyses in different tissue contexts.

Paper

Data navigation on the ENCODE portal

by Jolanki, Otto , Sloan, Cricket A , Lam, Bonita in Alzheimer's disease , Annotations , Assaying

2023

Spanning two decades, the Encyclopaedia of DNA Elements (ENCODE) is a collaborative research project that aims to identify all the functional elements in the human and mouse genomes. To best serve the scientific community, all data generated by the consortium are shared through a web portal (https://www.encodeproject.org/) with no access restrictions. The fourth and final phase of the project added a diverse set of new samples (including those associated with human disease), and a wide range of new assays aimed at detection, characterization and validation of functional genomic elements. The ENCODE data portal hosts results from over 23,000 functional genomics experiments, over 800 functional elements characterization experiments (including in vivo transgenic enhancer assays, reporter assays and CRISPR screens) along with over 60,000 results of computational and integrative analyses (including imputations, predictions and genome annotations). The ENCODE Data Coordination Center (DCC) is responsible for development and maintenance of the data portal, along with the implementation and utilisation of the ENCODE uniform processing pipelines to generate uniformly processed data. Here we report recent updates to the data portal. Specifically, we have completely redesigned the home page, improved the search interface, added several new pages to highlight collections of biologically related data (deeply profiled cell lines, immune cells, Alzheimer's Disease, RNA-Protein interactions, degron matrix and a matrix of experiments organised by human donors), added single-cell experiments, and enhanced the cart interface for visualisation and download of user-selected datasets.

Paper

Narcolepsy risk loci are enriched in immune cells and suggest autoimmune modulation of the T cell receptor repertoire

by Jolanki, Otto , Dauvilliers, Yves , Pizza, Fabio in Antigen presentation , Cytotoxicity , Dendritic cells

2018

Type 1 narcolepsy (T1N) is a neurological condition in which the death of hypocretin-producing neurons in the lateral hypothalamus leads to excessive daytime sleepiness and symptoms of abnormal Rapid Eye Movement (REM) sleep. Known triggers for narcolepsy are influenza-A infection and associated immunization during the 2009 H1N1 influenza pandemic. Here, we genotyped all remaining consented narcolepsy cases worldwide and combined them with the existing genotyped individuals. We used this multi-ethnic sample in a genome-wide association study (GWAS) to dissect disease mechanisms and interactions with environmental triggers (5,339 cases and 20,518 controls). Overall, we found significant associations with HLA (2 GWA-significant subloci) and 11 other loci. Six of these other loci have been previously reported (TRA, TRB, CTSH, IFNAR1, ZNF365 and P2RY11) and five are new (PRF1, CD207, SIRPG, IL27 and ZFAND2A). Strikingly, in vaccination-related cases GWA-significant effects were found in HLA, TRA, and in a novel variant near SIRPB1. Furthermore, IFNAR1-associated polymorphisms regulated dendritic cell response to influenza-A infection in vitro (p-value = 1.92 × 10⁻²⁵). A partitioned heritability analysis indicated specific enrichment of functional elements active in cytotoxic and helper T cells. Furthermore, functional analysis showed that the genetic variants in the TRA and TRB loci act as remarkably strong chain usage QTLs for TRAJ*24 (p-value = 0.0017), TRAJ*28 (p-value = 1.36 × 10⁻¹⁰) and TRBV*4-2 (p-value = 3.71 × 10⁻¹¹⁷). This was further validated in TCR sequencing of 60 narcolepsy cases and 60 DQB1*06:02 positive controls, where chain usage effects were further accentuated. Together these findings show that the autoimmune component in narcolepsy is defined by antigen presentation, mediated through specific T cell receptor chains, and modulated by influenza-A as a critical trigger.

Paper
