Catalogue Search | MBRL

A promoter-level mammalian expression atlas

by Jørgensen, Mette , Plessy, Charles , Chierici, Marco in 631/114/2114 , 631/208/200 , 631/337/2019

2014

Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly ‘housekeeping’, whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
A study from the FANTOM consortium using single-molecule cDNA sequencing of transcription start sites and their usage in human and mouse primary cells, cell lines and tissues reveals insights into the specificity and diversity of transcription patterns across different mammalian cell types.
Mapping the human transcription
FANTOM5 (standing for functional annotation of the mammalian genome 5) is the fifth major stage of a major international collaboration that aims to dissect the transcriptional regulatory networks that define every human cell type. Two Articles in this issue of Nature present some of the project's latest results.
The first paper uses the FANTOM5 panel of tissue and primary cell samples to define an atlas of active, in vivo bidirectionally transcribed enhancers across the human body. These authors show that bidirectional capped RNAs are a signature feature of active enhancers and identify more than 40,000 enhancer candidates from over 800 human cell and tissue samples. The enhancer atlas is used to compare regulatory programs between different cell types and identify disease-associated regulatory SNPs, and will be a resource for studies on cell-type-specific enhancers. In the second paper, single-molecule sequencing is used to map human and mouse transcription start sites and their usage in a panel of distinct human and mouse primary cells, cell lines and tissues to produce the most comprehensive mammalian gene expression atlas to date. The data provide a plethora of insights into open reading frames and promoters across different cell types in addition to valuable annotation of mammalian cell-type-specific transcriptomes.

Journal Article

Translation of 5′ leaders is pervasive in genes resistant to eIF2 repression

by Cormican, Paul , Kenny, Elaine M , Morris, Derek W in 5′ leader translation , Arsenite , Arsenites - pharmacology

2015

Eukaryotic cells rapidly reduce protein synthesis in response to various stress conditions. This can be achieved by the phosphorylation-mediated inactivation of a key translation initiation factor, eukaryotic initiation factor 2 (eIF2). However, the persistent translation of certain mRNAs is required for deployment of an adequate stress response. We carried out ribosome profiling of cultured human cells under conditions of severe stress induced with sodium arsenite. Although this led to a 5.4-fold general translational repression, the protein coding open reading frames (ORFs) of certain individual mRNAs exhibited resistance to the inhibition. Nearly all resistant transcripts possess at least one efficiently translated upstream open reading frame (uORF) that represses translation of the main coding ORF under normal conditions. Site-specific mutagenesis of two identified stress resistant mRNAs (PPP1R15B and IFRD1) demonstrated that a single uORF is sufficient for eIF2-mediated translation control in both cases. Phylogenetic analysis suggests that at least two regulatory uORFs (namely, in SLC35A4 and MIEF1) encode functional protein products. Proteins carry out essential tasks for living cells and genes contain the instructions to make proteins within their DNA. These instructions are copied to make a molecule of mRNA, and a molecular machine known as a ribosome then reads and translates the mRNA to build the protein. The first step in the translation process is called ‘initiation’ and requires a protein called eIF2 to work together with the ribosome. This step involves identifying an instruction called the start codon that marks the beginning of the mRNA's coding sequence. The section of an mRNA molecule before the start codon is not normally translated by the ribosome and is hence called the 5′ untranslated region. Building proteins requires energy and resources, and so it is carefully regulated. 
If a cell is stressed, such as by being exposed to harmful chemicals, it makes fewer proteins in order to conserve its resources. This down-regulation of protein production is achieved in part by the cell chemically modifying its eIF2 proteins to make them less able to initiate translation. However, stressed cells still continue to make more of certain proteins that help them to combat stress. The mRNA molecules for some of these proteins contain at least one other start codon in the 5′ untranslated region. The sequence that would be translated from such a start codon is known as an upstream open reading frame (or uORF for short)—and this feature is thought to help certain proteins to still be expressed despite low levels of active eIF2. Andreev, O'Connor et al. have now analysed which mRNAs are translated in human cells that have been treated with a chemical that induces stress and makes the eIF2 protein less able to initiate translation. To do so, a technique called ribosome profiling was used to identify all of the mRNA molecules bound to ribosomes shortly after treatment with this chemical. Overall translation of most mRNAs in stressed cells was reduced to a quarter of the normal level. However, Andreev, O'Connor et al. observed that the translation of a few mRNAs continued almost as normal, or even increased, after the chemical treatment. Notably, most of these mRNAs encoded regulatory proteins, which are not required in large amounts. With one exception, all of these resistant mRNAs contained uORFs. In unstressed cells, these uORFs were efficiently translated, while the same mRNA's coding sequences were translated less efficiently. Andreev, O'Connor et al. suggest that these two features could be used to identify mRNAs that are still translated into working proteins when cells are stressed. Further work is now needed to explore the mechanisms by which translation of these uORFs allows mRNAs to resist the stress.

Journal Article

Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures

by Low, Teck Yew , Pung, Yuh-Fen , Lee, Pey Yee in Biomedical and Life Sciences , Biomedicine , Computational Biology

2022

A short open reading frame (sORF) comprises ≤ 300 bases and encodes a microprotein or sORF-encoded protein (SEP) of ≤ 100 amino acids. Traditionally dismissed by genome annotation pipelines as meaningless noise, sORFs were found to possess coding potential with ribosome profiling (RIBO-Seq), which unveiled sORF-based transcripts at various genome locations. Nonetheless, the existence of corresponding microproteins that are stable and functional was initially little substantiated by experimental evidence. With recent advancements in multi-omics, the identification, validation, and functional characterisation of sORFs and microproteins have become feasible. In this review, we discuss the history and development of an emerging research field of sORFs and microproteins. In particular, we focus on an array of bioinformatics and OMICS approaches used for predicting, sequencing, validating, and characterizing these recently discovered entities. These strategies include RIBO-Seq, which detects sORF transcripts via ribosome footprints, and mass spectrometry (MS)-based proteomics for sequencing the resultant microproteins. Subsequently, our discussion extends to the functional characterisation of microproteins by incorporating CRISPR/Cas9 screens and protein–protein interaction (PPI) studies. Our review not only discusses detection methodologies but also highlights the challenges and potential solutions in identifying and validating sORFs and their microproteins. The novelty of this review lies within its validation for the functional role of microproteins, which could contribute towards the future landscape of microproteomics.

Journal Article

The Minimum Open Reading Frame, AUG-Stop, Induces Boron-Dependent Ribosome Stalling and mRNA Degradation

by Tanaka, Mayuki , Yamashita, Yui , Murota, Katsunori in Aquaporins - genetics , Aquaporins - metabolism , Arabidopsis

2016

Upstream open reading frames (uORFs) are often translated ahead of the main ORF of a gene and regulate gene expression, sometimes in a condition-dependent manner, but such a role for the minimum uORF (hereafter referred to as AUG-stop) in living organisms is currently unclear. Here, we show that AUG-stop plays an important role in the boron (B)-dependent regulation of NIP5;1, encoding a boric acid channel required for normal growth under low B conditions in Arabidopsis thaliana. High B enhanced ribosome stalling at AUG-stop, which was accompanied by the suppression of translation and mRNA degradation. This mRNA degradation was promoted by an upstream conserved sequence present near the 5′-edge of the stalled ribosome. Once ribosomes translate a uORF, reinitiation of translation must take place in order for the downstream ORF to be translated. Our results suggest that reinitiation of translation at the downstream NIP5;1 ORF is enhanced under low B conditions. A genome-wide analysis identified two additional B-responsive genes, SKU5 and the transcription factor gene ABS/NGAL1, which were regulated by B-dependent ribosome stalling through AUG-stop. This regulation was reproduced in both plant and animal transient expression and cell-free translation systems. These findings suggest that B-dependent AUG-stop-mediated regulation is common in eukaryotes.
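
The AUG-stop element described above is simply a start codon followed directly by a stop codon. As a minimal sketch of how such sites could be located in a 5′ leader sequence, the Python snippet below scans a string for AUG immediately followed by UAA, UAG or UGA. The function name `find_aug_stop` and the example sequences are hypothetical and chosen for illustration only; they are not taken from the study, and the scan ignores sequence context such as the upstream conserved element.

```python
# Stop codons in RNA notation.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_aug_stop(leader: str) -> list[int]:
    """Return 0-based positions of every AUG that is directly followed by a stop codon.

    Hypothetical helper: accepts RNA or DNA letters in any case and
    converts T to U before scanning.
    """
    rna = leader.upper().replace("T", "U")
    hits = []
    # A six-nucleotide window is needed: AUG plus the stop codon.
    for i in range(len(rna) - 5):
        if rna[i:i + 3] == "AUG" and rna[i + 3:i + 6] in STOP_CODONS:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Made-up leader sequence containing one AUG-stop (AUG UAA).
    print(find_aug_stop("GGAUGUAAGG"))
```

A start codon followed by a sense codon, as in `AUGGCCUAA`, is an ordinary uORF rather than an AUG-stop and is not reported.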

Journal Article

Extensive translation of small Open Reading Frames revealed by Poly-Ribo-Seq

by Couso, Juan-Pablo , Amin, Unum , Phillips, Rose J in 3' Untranslated Regions - genetics , Amino acids , Animals

2014

Thousands of small Open Reading Frames (smORFs) with the potential to encode small peptides of fewer than 100 amino acids exist in our genomes. However, the number of smORFs actually translated, and their molecular and functional roles are still unclear. In this study, we present a genome-wide assessment of smORF translation by ribosomal profiling of polysomal fractions in Drosophila. We detect two types of smORFs bound by multiple ribosomes and thus undergoing productive translation. The ‘longer’ smORFs of around 80 amino acids resemble canonical proteins in translational metrics and conservation, and display a propensity to contain transmembrane motifs. The ‘dwarf’ smORFs are in general shorter (around 20 amino-acid long), are mostly found in 5′-UTRs and non-coding RNAs, are less well conserved, and have no bioinformatic indicators of peptide function. Our findings indicate that thousands of smORFs are translated in metazoan genomes, reinforcing the idea that smORFs are an abundant and fundamental genome component. To produce a protein, a stretch of DNA must first be transcribed to produce a molecule of messenger RNA (mRNA). The genetic information copied from the DNA is then read three letters at a time, in groups called codons. Each codon either encodes a particular amino acid to be added into a protein or provides further instructions: ‘start codons’ mark the beginning of a protein; ‘stop codons’ mark its end. The DNA between these two points is called an open reading frame (or ORF)—however, not all ORFs produce proteins. Most proteins are made of several hundred amino acids, but the genomes of animals contain thousands of ORFs that would generate much smaller proteins made of fewer than 100 amino acids, if they were translated. It is, however, unclear how many of these small ORFs are converted into mRNA molecules and functional proteins. 
Ribosomes are large molecular machines that translate the code in mRNA molecules and join together the appropriate amino acids in the right order to make a protein. Ribosome profiling is a technique that identifies which mRNA molecules are translated into proteins by determining the sequences of all the mRNA molecules bound to ribosomes at a particular moment. The mRNA sequences can then be compared with the sequence of the whole genome to work out which ORFs they correspond to. Ribosome profiling has been used to detect translated small ORFs, but the method yields a relatively high false positive rate as some mRNAs can bind to ribosomes without being translated. To better detect small protein-producing ORFs, Aspden et al. developed a technique based on ribosome profiling called Poly-Ribo-Seq. The method takes advantage of the fact that during active translation, clusters of multiple ribosomes, called polysomes, bind mRNAs. Poly-Ribo-Seq isolates these polysomes and determines the sequence bound by each of the ribosomes, thereby reducing the number of false positives. Applying Poly-Ribo-Seq to cells from the fruit fly Drosophila allowed Aspden et al. to identify two types of short ORF. The first type codes for proteins that are around 80 amino acids long and are translated with the same efficiency as larger ORFs. The sequences of these ORFs are found in other species, match at least in part sequences of known functional ORFs, and the proteins produced are found in specific locations inside cells. These small proteins may contribute to membrane integrity or function. Together, these properties suggest that these mRNAs create functional small proteins. The second pool consists of very small ORFs (‘dwarf smORFs’) that code for around 20 amino acids, which are not translated so often and do not show many similarities with other species. As the findings of Aspden et al. suggest that a large fraction of Drosophila small ORFs are translated into proteins, the next challenge will be to determine the roles of these small proteins in cells.
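
The description of codons, start codons and stop codons above can be illustrated with a small Python sketch that reads a DNA string three letters at a time and reports open reading frames, then keeps those encoding fewer than 100 amino acids. This is only an illustration under stated assumptions, not the method used in the study: the names `find_orfs` and `small_orfs` are hypothetical, only the three forward-strand frames are scanned, each ORF starts at the first ATG after the previous stop in its frame, and the example sequence is made up.

```python
# Stop codons in DNA notation.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> list[tuple[int, int, int]]:
    """Return (start, end, n_amino_acids) for each forward-strand ORF.

    start is the 0-based index of the ATG, end is the index just past
    the stop codon, and n_amino_acids counts codons from the ATG up to
    (not including) the stop codon.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the frame one codon (three letters) at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, (i - start) // 3))
                start = None
    return sorted(orfs)


def small_orfs(dna: str, max_aa: int = 99) -> list[tuple[int, int, int]]:
    """Keep ORFs encoding at most max_aa amino acids (fewer than 100 by default)."""
    return [orf for orf in find_orfs(dna) if orf[2] <= max_aa]


if __name__ == "__main__":
    # Made-up sequence: ATG GCC AAA TAG encodes three amino acids.
    print(small_orfs("ATGGCCAAATAG"))
```

A scan like this only lists candidate ORFs from sequence alone; as the abstract notes, whether a candidate is actually translated has to be established experimentally, for example by ribosome profiling.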

Journal Article

Unveiling conserved HIV-1 open reading frames encoding T cell antigens using ribosome profiling

by Esclatine, Audrey , Bertrand, Lisa , Verdier, Yann in 13/31 , 14/1 , 38/91

2025

The development of ribosomal profiling (Riboseq) revealed the immense coding capacity of human and viral genomes. Here, we used Riboseq to delineate the translatome of HIV-1 in infected CD4+ T cells. In addition to canonical viral protein coding sequences (CDSs), we identify 98 alternative open reading frames (ARFs), corresponding to small Open Reading Frames (sORFs) that are distributed across the HIV genome including the UTR regions. Using a database of HIV genomes, we observe that most ARF amino-acid sequences are likely conserved among clade B and C of HIV-1, with 8 ARF-encoded amino-acid sequences being more conserved than the overlapping CDSs. Using T cell-based assays and mass spectrometry-based immunopeptidomics, we demonstrate that ARFs encode viral polypeptides. In the blood of people living with HIV, ARF-derived peptides elicit potent poly-functional T cell responses mediated by both CD4+ and CD8+ T cells. Our discovery expands the list of conserved viral polypeptides that are targets for vaccination strategies and might reveal the existence of viral microproteins or pseudogenes. Here, using ribosomal profiling, the authors characterize the translatome of HIV-1 revealing tens of alternative open reading frames (ARF) that encode conserved viral antigens and show that ARF-derived peptides elicit potent HIV-specific poly-functional immune responses mediated by both CD4+ and CD8+ T cells.

Journal Article

Mutational constraint analysis workflow for overlapping short open reading frames and genomic neighbors

by Elbracht, Miriam , Kurth, Ingo , Danner, Martin in Animal Genetics and Genomics , Biomedical and Life Sciences , Computational genomics

2025

Understanding the dark genome is a priority task following the complete sequencing of the human genome. Short open reading frames (sORFs) are a group of largely unexplored elements of the dark genome with the potential for being translated into microproteins. The definitive number of coding and regulatory sORFs is not known; however, they could account for up to 1–2% of the human genome. This corresponds to an order of magnitude in the range of canonical coding genes. For a few sORFs a clinical relevance has already been demonstrated, but for the majority of potential sORFs the biological function remains unclear. A major limitation in predicting their disease relevance using large-scale genomic data is the fact that no population-level constraint metrics for genetic variants in sORFs are yet available. To overcome this, we used the recently released gnomAD 4.0 dataset and analyzed the constraint of a consensus set of sORFs and their genomic neighbors. We demonstrate that sORFs are mostly embedded into a moderately constrained genomic context, but within the gencode dataset we identified a subset of highly constrained sORFs comparable to highly constrained canonical genes.

Journal Article

Distinguishing protein-coding and noncoding genes in the human genome

by Clamp, Michele , Fry, Ben , Kellis, Manolis in Animals , Base Sequence , Biological Sciences

2007

Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of ≈24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs--specifically, their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to ≈20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.

Journal Article

Microprotein-encoding RNA regulation in cells treated with pro-inflammatory and pro-fibrotic stimuli

by Vaughan, Joan M. , Pinto, Antonio M. , Pai, Victor J. in Algorithms , Analysis , Animal Genetics and Genomics

2024

Background: Recent analysis of the human proteome via proteogenomics and ribosome profiling of the transcriptome revealed the existence of thousands of previously unannotated microprotein-coding small open reading frames (smORFs). Most functional microproteins were chosen for characterization because of their evolutionary conservation. However, one example of a non-conserved immunomodulatory microprotein in mice suggests that strict sequence conservation misses some intriguing microproteins. Results: We examine the ability of gene regulation to identify human microproteins with potential roles in inflammation or fibrosis of the intestine. To do this, we collected ribosome profiling data of intestinal cell lines and peripheral blood mononuclear cells and used gene expression of microprotein-encoding transcripts to identify strongly regulated microproteins, including several examples of microproteins that are only conserved with primates. Conclusion: This approach reveals a number of new microproteins worthy of additional functional characterization and provides a dataset that can be queried in different ways to find additional gut microproteins of interest.

Journal Article

Computational discovery and annotation of conserved small open reading frames in fungal genomes

by Mat-Sharani, Shuhaila , Firdaus-Raih, Mohd in Algorithms , Amino Acid Sequence , Amino acids

2019

Background: Small open reading frames (smORF/sORFs) that encode short protein sequences are often overlooked during the standard gene prediction process, thus leading to many sORFs being left undiscovered and/or misannotated. For many genomes, a second round of sORF targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding for 80 amino acid residues or less from 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes. Results: A first set of sORFs was identified from existing annotations that fitted the maximum of 80 residues criterion. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or less in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized. Conclusions: It is evident that numerous open reading frames that could potentially encode for polypeptides consisting of 80 amino acid residues or less are overlooked during standard gene prediction and annotation. From our results, additional targeted reannotation of genomes is clearly able to complement standard genome annotation to identify sORFs. Due to the lack of, and limitations with experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.

Journal Article
