Catalogue Search | MBRL

by Van Oss, Stephen Branden , Carvunis, Anne-Ruxandra in Animals , Base sequence , Biology and Life Sciences

2019

A 2009 report identified the first three de novo human genes, one of which is a therapeutic target in chronic lymphocytic leukemia [45]. Since then, a plethora of genome-level studies have identified large numbers of orphan genes in many organisms (Table 1), although the extent to which they arose de novo remains debated. Phylogenetic trees are limited by the set of closely related genomes that are available, and results are dependent on BLAST search criteria [48]. Because phylostratigraphy is based on sequence similarity, it often struggles to determine whether a novel gene has emerged de novo or has diverged from an ancestral gene beyond recognition, for instance following a duplication event. [...] the discovery of de novo gene birth has also led to a questioning of what constitutes a gene, with some models establishing a strict dichotomy between genic and non-genic sequences, and others proposing a more fluid continuum (see below). [...] it remains debated whether duplication and divergence or de novo gene birth represents the dominant mechanism for the emergence of new genes [63, 65, 73, 75–77], in part because de novo genes are likely both to emerge and to be lost more frequently than other young genes (see below).

Journal Article

Biological factors and statistical limitations prevent detection of most noncanonical proteins by mass spectrometry

by Wacholder, Aaron , Carvunis, Anne-Ruxandra in Analysis , Biological Factors , Biology and Life Sciences

2023

Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry (MS) experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here, we leveraged recent advances in ribosome profiling and MS to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly expressed to be detected by shotgun MS at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for 4 noncanonical proteins in MS data, which were also supported by evolution and translation data. These results illustrate the power of MS to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly expressed proteins.

Journal Article

Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes

by Vakirlis, Nikolaos , Carvunis, Anne-Ruxandra , McLysaght, Aoife in Analysis , Animals , Computational and Systems Biology

2020

The origin of ‘orphan’ genes, species-specific sequences that lack detectable homologues, has remained mysterious since the dawn of the genomic era. There are two dominant explanations for orphan genes: complete sequence divergence from ancestral genes, such that homologues are not readily detectable; and de novo emergence from ancestral non-genic sequences, such that homologues genuinely do not exist. The relative contribution of the two processes remains unknown. Here, we harness the special circumstance of conserved synteny to estimate the contribution of complete divergence to the pool of orphan genes. By separately comparing yeast, fly and human genes to related taxa using conservative criteria, we find that complete divergence accounts, on average, for at most a third of eukaryotic orphan and taxonomically restricted genes. We observe that complete divergence occurs at a stable rate within a phylum but at different rates between phyla, and is frequently associated with gene shortening akin to pseudogenization.

Journal Article

Integrative approaches for finding modular structure in biological networks

by Mitra, Koyel , Ramesh, Sanath Kumar , Ideker, Trey in 631/114/2114 , 631/553 , 639/638/92/475/2290

2013

Key Points: Bioinformatics approaches for integrating molecular networks across various types of interaction data, omics profiles, conditions or species have demonstrated considerable power for the detection and interpretation of biological modules. Module-discovery approaches are broadly classified into four categories: identification of 'active modules' through the integration of networks and molecular profiles, identification of 'conserved modules' across multiple species, identification of 'differential modules' across different conditions and identification of 'composite modules' through the integration of different interaction types. Active modules mark regions of a network that are most active during a given cellular or disease response and can identify important biomarkers, disease mechanisms and therapeutic targets. Conserved modules are revealed through the alignment or comparison of networks across multiple species. Such modules reflect biologically important pathways that have been conserved over long evolutionary periods. Differential modules are identified through differential analyses of experimentally mapped interactions across multiple conditions. Composite modules are detected through the simultaneous integration of diverse types of molecular interactions. Such integrative approaches reviewed here substantially increase the scope, scale and depth of bioinformatics analysis, by permitting joint interpretation of ensembles of distinct biological information.

The recent proliferation of omics data has required a toolbox of integrative systems biology bioinformatics approaches to elucidate functional relationships between molecules. Here the authors explain the principles behind these approaches and discuss their applications.

A central goal of systems biology is to elucidate the structural and functional architecture of the cell. To this end, large and complex networks of molecular interactions are being rapidly generated for humans and model organisms. A recent focus of bioinformatics research has been to integrate these networks with each other and with diverse molecular profiles to identify sets of molecules and interactions that participate in a common biological function, that is, 'modules'. Here, we classify such integrative approaches into four broad categories, describe their bioinformatic principles and review their applications.

Journal Article

Massively integrated coexpression analysis reveals transcriptional regulation, evolution and cellular implications of the yeast noncanonical translatome

by Acar, Omer , Rich, April , Carvunis, Anne-Ruxandra in Animal Genetics and Genomics , Bioinformatics , Biomedical and Life Sciences

2024

Background: Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origins, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae.

Results: Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors’ promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging near genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface (https://carvunislab.csb.pitt.edu/shiny/coexpression/) to efficiently query, visualize, and download our coexpression inferences.

Conclusions: Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.

Journal Article

De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences

by Acar, Omer , Hines, Cameron P. , Wacholder, Aaron in 14/63 , 631/181/2474 , 631/181/735

2020

Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection. There is increasing evidence that protein-coding genes can emerge de novo from noncoding genomic regions. Vakirlis et al. propose that sequences encoding transmembrane polypeptides can emerge de novo in thymine-rich genomic regions and provide organisms with fitness benefits.

Journal Article

Of mice, men and immunity: a case for evolutionary systems biology

by Ernst, Peter B. , Carvunis, Anne-Ruxandra in 631/1647/2017 , 631/1647/767/1424 , 631/250/2520

2018

Animal models have been tremendously useful to translational research, but there is a need to maximize their predictive value to human disease. This Comment proposes novel strategies that consider evolutionary history and the presence, absence or modification of molecular networks in one species that are being studied in the other. Mice are generally the ‘go-to’ organism for modeling of the human immune system, but this often leads to inaccurate interpretations. Ernst and Carvunis argue in this Comment that taking into account the evolutionary and environmental context can generate better models of disease.

Journal Article

The meanings of 'function' in biology and the problematic case of de novo gene emergence

by Garza, Patricia , Nartey, Charisse Michelle , Keeling, Diane Marie in Binding sites , Biological Factors - metabolism , Biologists

2019

The word function has many different meanings in molecular biology. Here we explore the use of this word (and derivatives like functional) in research papers about de novo gene birth. Based on an analysis of 20 abstracts we propose a simple lexicon that, we believe, will help scientists and philosophers discuss the meaning of function more clearly.

Journal Article

Genome-Wide Identification of Pseudomonas aeruginosa Virulence-Related Genes Using a Caenorhabditis elegans Infection Model

by Feinbaum, Rhonda L. , Urbach, Jonathan M. , Carvunis, Anne-Ruxandra in ABC transporter , Adenosinetriphosphatase , Aminopeptidase

2012

Pseudomonas aeruginosa strain PA14 is an opportunistic human pathogen capable of infecting a wide range of organisms including the nematode Caenorhabditis elegans. We used a non-redundant transposon mutant library consisting of 5,850 clones corresponding to 75% of the total and approximately 80% of the non-essential PA14 ORFs to carry out a genome-wide screen for attenuation of PA14 virulence in C. elegans. We defined a functionally diverse 180 mutant set (representing 170 unique genes) necessary for normal levels of virulence that included both known and novel virulence factors. Seven previously uncharacterized virulence genes (ABC transporters PchH and PchI, aminopeptidase PepP, ATPase/molecular chaperone ClpA, cold shock domain protein PA0456, putative enoyl-CoA hydratase/isomerase PA0745, and putative transcriptional regulator PA14_27700) were characterized with respect to pigment production and motility and all but one of these mutants exhibited pleiotropic defects in addition to their avirulent phenotype. We examined the collection of genes required for normal levels of PA14 virulence with respect to occurrence in P. aeruginosa strain-specific genomic regions, location on putative and known genomic islands, and phylogenetic distribution across prokaryotes. Genes predominantly contributing to virulence in C. elegans showed neither a bias for strain-specific regions of the P. aeruginosa genome nor for putatively horizontally transferred genomic islands. Instead, within the collection of virulence-related PA14 genes, there was an overrepresentation of genes with a broad phylogenetic distribution that also occur with high frequency in many prokaryotic clades, suggesting that in aggregate the genes required for PA14 virulence in C. elegans are biased towards evolutionarily conserved genes.

Journal Article

Elastic network modeling of cellular networks unveils sensor and effector genes that control information flow

by Acar, Omer , Bahar, Ivet , Zhang, She in Allosteric properties , Analysis , Biology and Life Sciences

2022

The high-level organization of the cell is embedded in indirect relationships that connect distinct cellular processes. Existing computational approaches for detecting indirect relationships between genes typically consist of propagating abstract information through network representations of the cell. However, the selection of genes to serve as the source of propagation is inherently biased by prior knowledge. Here, we sought to derive an unbiased view of the high-level organization of the cell by identifying the genes that propagate and receive information most effectively in the cell, and the indirect relationships between these genes. To this aim, we adapted a perturbation-response scanning strategy initially developed for identifying allosteric interactions within proteins. We deployed this strategy onto an elastic network model of the yeast genetic interaction profile similarity network. This network revealed a superior propensity for information propagation relative to simulated networks with similar topology. Perturbation-response scanning identified the major distributors and receivers of information in the network, named effector and sensor genes, respectively. Effectors formed dense clusters centrally integrated into the network, whereas sensors formed loosely connected antenna-shaped clusters and contained genes with previously characterized involvement in signal transduction. We propose that indirect relationships between effector and sensor clusters represent major paths of information flow between distinct cellular processes. Genetic similarity networks for fission yeast and human displayed similarly strong propensities for information propagation and clusters of effector and sensor genes, suggesting that the global architecture enabling indirect relationships is evolutionarily conserved across species. Our results demonstrate that elastic network modeling of cellular networks constitutes a promising strategy to probe the high-level organization and cooperativity in the cell.

Journal Article
