Catalogue Search | MBRL

Towards a complete map of the human long non-coding RNA transcriptome

by Lagarde, Julien , Frankish, Adam , Johnson, Rory in Gene expression , Gene mapping , Genomes

2018

Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.

Journal Article

The GEM mapper: fast, accurate and versatile alignment by filtration

by Ribeca, Paolo , Sammeth, Michael , Guigó, Roderic in 631/1647/2217 , 631/1647/48 , 631/1647/514

2012

A sequence read mapper for versatile searching of genome space that returns all matches with precision and speed. Because of ever-increasing throughput requirements of sequencing data, most existing short-read aligners have been designed to focus on speed at the expense of accuracy. The Genome Multitool (GEM) mapper can leverage string matching by filtration to search the alignment space more efficiently, simultaneously delivering precision (performing fully tunable exhaustive searches that return all existing matches, including gapped ones) and speed (being several times faster than comparable state-of-the-art tools).
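The "string matching by filtration" idea mentioned in this abstract can be illustrated with the classic pigeonhole filter. This is a minimal sketch under stated assumptions, not the GEM implementation: it handles mismatches only (no gaps), uses a plain substring scan instead of an index, and the names `filtration_search` and `hamming` are hypothetical. If a pattern matches with at most k mismatches, then after cutting it into k + 1 pieces at least one piece must match exactly, so only positions seeded by an exact piece hit need full verification.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def filtration_search(text, pattern, k):
    """Start positions where pattern occurs in text with at most k mismatches.

    Pigeonhole filtration: split the pattern into k + 1 pieces, find exact
    occurrences of each piece, then verify only the implied candidates.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    last_start = len(text) - m
    if k >= m:
        # Every window trivially matches when k covers the whole pattern.
        return list(range(last_start + 1))

    pieces = k + 1
    # Piece boundaries; each piece has length >= 1 because pieces <= m.
    bounds = [i * m // pieces for i in range(pieces + 1)]

    candidates = set()
    for i in range(pieces):
        lo, hi = bounds[i], bounds[i + 1]
        seed = pattern[lo:hi]
        hit = text.find(seed)
        while hit != -1:
            start = hit - lo  # where the whole pattern would begin
            if 0 <= start <= last_start:
                candidates.add(start)
            hit = text.find(seed, hit + 1)

    # Verification step: full comparison only at filtered candidates.
    return sorted(
        p for p in candidates if hamming(text[p:p + m], pattern) <= k
    )


if __name__ == "__main__":
    print(filtration_search("ACGTACGTTACGAACGT", "ACGA", 1))
```

The filter never discards a true match, so the search remains exhaustive for the chosen k; the speed-up comes from verifying a small candidate set rather than every window.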

Journal Article

Comparative transcriptomics in human and mouse

by Breschi, Alessandra , Gingeras, Thomas R. , Guigó, Roderic in 631/1647/334/1874/345 , 631/1647/514/1949 , 631/1647/514/2254

2017

Key Points: The mouse is the most widely used model organism to study human disease, but often mouse biology cannot be extrapolated to humans. A deep comparison of mouse and human physiology at the molecular level is essential for understanding under which circumstances the mouse can be a suitable model of human biology and for creating better mouse models. Advances in next-generation sequencing technologies fostered genome-wide annotation of functional DNA elements, enabling extensive comparison of the human and mouse genomes. At the transcriptional level, human and mouse gene expression profiles are conserved overall, although the degree of conservation varies depending on the tissues and the genes that are compared. Therefore, the question of whether the human and mouse transcriptomes cluster preferentially by tissue or organ or by species does not have an answer overall, and it depends specifically on the genes being considered. Conservation of expression is not a direct consequence of conservation in regulatory sequences, including promoters and enhancers. Although gene regulatory networks are preserved overall between human and mouse, transcription factor binding sites are often not conserved. Inter-individual genetic variation can affect human gene expression, but such variation cannot be modelled in inbred strains of laboratory mice because their genetic variation is small compared to the human population. An expansion of the current studies on the relationship between genetic variation and gene expression in outbred mice might provide helpful insights to understand the same relationship in humans. Emerging technologies (such as single-cell genomics and single-cell spatial transcriptomics) and time series experiments will improve the annotation of human and mouse genomes, refine the current definitions of homologous cell types and homologous (molecular) phenotypes, and ultimately help scientists to identify which mouse models are the most appropriate to address a given biological question.

Next-generation sequencing technologies have enabled the comprehensive characterization of human and mouse genomes, including at the transcriptional level. This article reviews the degree of conservation of human and mouse transcriptomes, along with the challenges of identifying when the mouse is a suitable model of human physiology. Cross-species comparisons of genomes, transcriptomes and gene regulation are now feasible at unprecedented resolution and throughput, enabling the comparison of human and mouse biology at the molecular level. Insights have been gained into the degree of conservation between human and mouse at the level of not only gene expression but also epigenetics and inter-individual variation. However, a number of limitations exist, including incomplete transcriptome characterization and difficulties in identifying orthologous phenotypes and cell types, which are beginning to be addressed by emerging technologies. Ultimately, these comparisons will help to identify the conditions under which the mouse is a suitable model of human physiology and disease, and optimize the use of animal models.

Journal Article

Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome

by Reverter, Ferran , Garrido-Martín, Diego , Borsari, Beatrice in 631/114 , 631/337/2019 , Alternative Splicing

2021

Alternative splicing (AS) is a fundamental step in eukaryotic mRNA biogenesis. Here, we develop an efficient and reproducible pipeline for the discovery of genetic variants that affect AS (splicing QTLs, sQTLs). We use it to analyze the GTEx dataset, generating a comprehensive catalog of sQTLs in the human genome. Downstream analysis of this catalog provides insight into the mechanisms underlying splicing regulation. We report that a core set of sQTLs is shared across multiple tissues. sQTLs often target the global splicing pattern of genes, rather than individual splicing events. Many also affect the expression of the same or other genes, uncovering regulatory loci that act through different mechanisms. sQTLs tend to be located in post-transcriptionally spliced introns, which would function as hotspots for splicing regulation. While many variants affect splicing patterns by altering the sequence of splice sites, many more modify the binding sites of RNA-binding proteins. Genetic variants affecting splicing can have a stronger phenotypic impact than those affecting gene expression. The profiling of genetic variants affecting splicing can give insight into disease mechanisms. Here, the authors develop a pipeline for discovery of variants affecting splicing (sQTLs) and with application to the GTEx dataset they generate a catalog of human sQTLs.

Journal Article

ggsashimi: Sashimi plot revised for browser- and annotation-independent splicing visualization

by Garrido-Martín, Diego , Breschi, Alessandra , Palumbo, Emilio in Algorithms , Annotations , Bioinformatics

2018

We present ggsashimi, a command-line tool for the visualization of splicing events across multiple samples. Given a specified genomic region, ggsashimi creates sashimi plots for individual RNA-seq experiments as well as aggregated plots for groups of experiments, a feature unique to this software. Compared to the existing versions of programs generating sashimi plots, it uses popular bioinformatics file formats, is annotation-independent, and allows the visualization of splicing events even for large genomic regions by scaling down the genomic segments between splice sites. ggsashimi is freely available at https://github.com/guigolab/ggsashimi. It is implemented in Python, and internally generates R code for plotting.

Journal Article

Genomic and functional conservation of lncRNAs: lessons from flies

by Guigó, Roderic , Corominas, Montserrat , Camilleri-Robles, Carlos in Alzheimer's disease , Animal models , Annotations

2022

Over the last decade, the increasing interest in long non-coding RNAs (lncRNAs) has led to the discovery of these transcripts in multiple organisms. LncRNAs tend to be specifically, and often lowly, expressed in certain tissues, cell types and biological contexts. Although lncRNAs participate in the regulation of a wide variety of biological processes, including development and disease, most of their functions and mechanisms of action remain unknown. Poor conservation of the DNA sequences encoding for these transcripts makes the identification of lncRNAs orthologues among different species very challenging, especially between evolutionarily distant species such as flies and humans or mice. However, the functions of lncRNAs are unexpectedly preserved among different species supporting the idea that conservation occurs beyond DNA sequences and reinforcing the potential of characterising lncRNAs in animal models. In this review, we describe the features and roles of lncRNAs in the fruit fly Drosophila melanogaster, focusing on genomic and functional comparisons with human and mouse lncRNAs. We also discuss the current state of advances and limitations in the study of lncRNA conservation and future perspectives.

Journal Article

Overcoming the widespread flaws in the annotation of vertebrate selenoprotein genes in public databases

by Sullivan, Emerson , Ticó, Max , Mariotti, Marco in Analysis , Animals , Biocuration

2026

Genome annotations provide the essential framework for genomic analyses, capturing our current knowledge of gene structure and function as inferred from computational predictions and experimental evidence. Even as automated annotation pipelines become more sophisticated, their accuracy in representing unconventional gene expression events remains largely untested. Here, we address this gap by examining the most common form of translational recoding: the insertion of selenocysteine (Sec), a non-canonical amino acid incorporated into selenoproteins, oxidoreductase enzymes with essential roles in redox homeostasis. Sec insertion occurs in response to UGA, normally interpreted as a stop codon, but recoded in selenoprotein mRNAs. Owing to the dual function of UGA, the identification of selenoprotein genes poses a challenge. We show that the vertebrate selenoprotein genes are widely misannotated in major public databases. Only 11% and 5% of selenoprotein genes are well annotated in Ensembl and NCBI GenBank, respectively, due to the lack of dedicated selenoprotein annotation pipelines. In most cases (81% and 84%), overlapping flawed annotations are present which lack the Sec-encoding UGA. In contrast, NCBI RefSeq employs a dedicated selenoprotein pipeline, yet with some shortcomings: its selenoprotein annotations are correct in 77% of cases, and most errors affect families with a C-terminal Sec residue. We argue that selenoproteins must be correctly annotated in public databases and that this must occur via automated pipelines, to keep pace with genome sequencing. To facilitate this task, we present a new version of Selenoprofiles, a homology-based tool for selenoprotein prediction that produces predictions with accuracy comparable to manual curation, and can be easily deployed and integrated into existing annotation pipelines.

Journal Article

Day-night and seasonal variation of human gene expression across tissues

by Amador, Raziel , Guigó, Roderic , Wucher, Valentin in Adaptation, Physiological , Analysis , Animals

2023

Circadian and circannual cycles trigger physiological changes whose reflection on human transcriptomes remains largely uncharted. We used the time and season of death of 932 individuals from GTEx to jointly investigate transcriptomic changes associated with those cycles across multiple tissues. Overall, most variation across tissues during day-night and among seasons was unique to each cycle. Although all tissues remodeled their transcriptomes, brain and gonadal tissues exhibited the highest seasonality, whereas those in the thoracic cavity showed stronger day-night regulation. Core clock genes displayed marked day-night differences across multiple tissues, which were largely conserved in baboon and mouse, but adapted to their nocturnal or diurnal habits. Seasonal variation of expression affected multiple pathways, and it was enriched among genes associated with the immune response, consistent with the seasonality of viral infections. Furthermore, they unveiled cytoarchitectural changes in brain regions. Altogether, our results provide the first combined atlas of how transcriptomes from human tissues adapt to major cycling environmental conditions. This atlas may have multiple applications; for example, drug targets with day-night or seasonal variation in gene expression may benefit from temporally adjusted doses.

Journal Article

Computational identification of the selenocysteine tRNA (tRNASec) in genomes

by Mariotti, Marco , Santesmasses, Didac , Guigó, Roderic in Algorithms , Amino acids , Archaea

2017

Selenocysteine (Sec) is known as the 21st amino acid, a cysteine analogue with selenium replacing sulphur. Sec is inserted co-translationally in a small fraction of proteins called selenoproteins. In selenoprotein genes, the Sec specific tRNA (tRNASec) drives the recoding of highly specific UGA codons from stop signals to Sec. Although found in organisms from the three domains of life, Sec is not universal. Many species are completely devoid of selenoprotein genes and lack the ability to synthesize Sec. Since tRNASec is a key component in selenoprotein biosynthesis, its efficient identification in genomes is instrumental to characterize the utilization of Sec across lineages. Available tRNA prediction methods fail to accurately predict tRNASec, due to its unusual structural fold. Here, we present Secmarker, a method based on manually curated covariance models capturing the specific tRNASec structure in archaea, bacteria and eukaryotes. We exploited the non-universality of Sec to build a proper benchmark set for tRNASec predictions, which is not possible for the predictions of other tRNAs. We show that Secmarker greatly improves the accuracy of previously existing methods constituting a valuable tool to identify tRNASec genes, and to efficiently determine whether a genome contains selenoproteins. We used Secmarker to analyze a large set of fully sequenced genomes, and the results revealed new insights in the biology of tRNASec, led to the discovery of a novel bacterial selenoprotein family, and shed additional light on the phylogenetic distribution of selenoprotein containing genomes. Secmarker is freely accessible for download, or online analysis through a web server at http://secmarker.crg.cat.

Journal Article

Conserved long-range base pairings are associated with pre-mRNA processing of human genes

by Skvortsov, Dmitry , Mironov, Alexey , Kalmykova, Svetlana in 631/114/2397 , 631/114/2415 , 631/337/1645/1792

2021

The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge on functional RNA structures is focused on locally-occurring base pairs. However, crosslinking and proximity ligation experiments demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete to-date catalog of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. Double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, high abundance of RNA editing sites, and frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3’-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation. Functional RNA secondary structure is important for the pre-mRNA processing including splicing, cleavage and polyadenylation, and RNA editing. Here the authors present a catalog of conserved long-range RNA structures in the human transcriptome by defining pairs of conserved complementary regions (PCCR) in pre-aligned evolutionarily conserved regions.

Journal Article
