Catalogue Search | MBRL

Using networks to analyze and visualize the distribution of overlapping genes in virus genomes

by Muñoz-Baena, Laura , Poon, Art F. Y. in Biology and Life Sciences , Comparative analysis , Comparative studies

2022

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (−0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
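
The per-pair measurements mentioned in this abstract (length and frameshift of an overlap) can be illustrated with a minimal Python sketch. The `ORF` class, the 0-based half-open coordinates, and the choice to measure the shift relative to the upstream ORF are assumptions made for illustration, not the authors' implementation. Antisense overlaps are deliberately not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ORF:
    """Hypothetical ORF record: 0-based start (inclusive), end (exclusive)."""
    start: int
    end: int
    strand: int  # +1 for forward, -1 for reverse


def overlap_length(a: ORF, b: ORF) -> int:
    """Number of nucleotides shared by the two coordinate intervals."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def same_strand_frameshift(a: ORF, b: ORF) -> int:
    """Frame offset (0, 1 or 2) of the downstream ORF relative to the
    upstream ORF, for two ORFs on the same strand (assumed convention)."""
    if a.strand != b.strand:
        raise ValueError("this sketch only handles ORFs on the same strand")
    if a.strand == 1:
        # Forward strand: codons are anchored at the start coordinate.
        up, down = sorted((a, b), key=lambda o: o.start)
        return (down.start - up.start) % 3
    # Reverse strand: codons are anchored at the end coordinate.
    down, up = sorted((a, b), key=lambda o: o.end)
    return (up.end - down.end) % 3


if __name__ == "__main__":
    x = ORF(0, 30, 1)
    y = ORF(22, 52, 1)
    print(overlap_length(x, y), same_strand_frameshift(x, y))
```

Measuring the shift from the upstream ORF keeps the label independent of the order in which the two ORFs are passed in.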

Journal Article


Inheritance of Rootstock Effects in Avocado (Persea americana Mill.) cv. Hass

by Díaz-Diez, Cipriano A. , Cortés, Andrés J. , Reyes-Herrera, Paula H. in Agricultural production , Breeding , Crop yield

2020

Grafting is typically utilized to merge adapted seedling rootstocks with highly productive clonal scions. This process implies the interaction of multiple genomes to produce a unique tree phenotype. However, the interconnection of both genotypes obscures individual contributions to phenotypic variation (rootstock-mediated heritability), hampering tree breeding. Therefore, our goal was to quantify the inheritance of seedling rootstock effects on scion traits using avocado (Persea americana Mill.) cv. Hass as a model fruit tree. We characterized 240 diverse rootstocks from 8 avocado cv. Hass orchards with similar management in three regions of the province of Antioquia, northwest Andes of Colombia, using 13 microsatellite markers (simple sequence repeats, SSRs). In parallel, we recorded 20 phenotypic traits (including morphological, biomass/reproductive, and fruit yield and quality traits) in the scions for 3 years (2015–2017). Relatedness among rootstocks was inferred from the genetic markers and input into a “genetic prediction” model to calculate narrow-sense heritabilities (h²) of scion traits. We used three different randomization tests to highlight traits with consistently significant heritability estimates. This strategy allowed us to capture five traits with significant heritability values that ranged from 0.33 to 0.45 and model fits (r) that ranged between 0.58 and 0.73 across orchards. The results showed significant rootstock effects for four complex harvest and quality traits (i.e., total number of fruits, number of fruits with exportation quality, and number of fruits discarded because of low weight or thrips damage), whereas the only morphological trait that had a significant heritability value was overall trunk height (an emergent property of the rootstock–scion interaction). These findings suggest the inheritance of rootstock effects, beyond root phenotype, on a surprisingly wide spectrum of scion traits in “Hass” avocado. They also reinforce the utility of polymorphic SSRs for relatedness reconstruction and genetic prediction of complex traits. This research is, to date, the most cohesive evidence of narrow-sense inheritance of rootstock effects in a tropical fruit tree crop. Ultimately, our work highlights the importance of considering the rootstock–scion interaction to broaden the genetic basis of fruit tree breeding programs while enhancing our understanding of the consequences of grafting.

Journal Article


Genome Sequencing of Potato yellow vein virus (PYVV) Strain Infecting Solanum lycopersicum in Colombia

by Laura Muñoz Baena , Mauricio Marín Montoya , Pablo Andrés Gutiérrez Sánchez in Crinivirus , RT-PCR , RT-qPCR

2017

Potato yellow vein virus (PYVV) is one of the most important pathogens of potato in the Andean region. Although it has been detected in tomato crops in Colombia, knowledge of the biological characteristics of PYVV in this host is limited. In this study, next-generation sequencing (NGS) of a PYVV strain infecting tomato in Marinilla District (Antioquia) was performed; additionally, three primer sets useful for RT-PCR and RT-qPCR detection were tested. The consensus genome consisted of three RNA segments of 8043 nt (RNA1), 5346 nt (RNA2) and 3896 nt (RNA3) encoding ten ORFs with slightly lower sequence identity relative to PYVV isolates from potato. Sequence analysis suggests the presence of regions potentially undergoing positive selection in the ORFs coding for MET/HEL and CPm, possibly as a result of host adaptation. Experimental validation of the primers yielded amplicons of the expected size, while melting temperature analysis and sequencing suggest the presence of at least two PYVV variants infecting S. lycopersicum in east Antioquia, in agreement with the NGS data.

Journal Article


Genome sequencing of two Bell pepper endornavirus (BPEV) variants infecting Capsicum annuum in Colombia

by Marín-Montoya, Mauricio , Gutiérrez, Pablo A. , Muñoz-Baena, Laura in AGRONOMY , Amino acids , Capsicum annuum

2017

Transcriptome analysis of chili and bell pepper samples from commercial plots in the municipalities of Santa Fe de Antioquia and El Peñol in the province of Antioquia revealed the presence of viral sequences with significant similarity to genomes of members of the genus Endornavirus. Assembly of the chili and bell pepper transcriptomes resulted in consensus sequences of 14,727 nt and 14,714 nt that were identified as Bell pepper endornavirus (BPEV). The two sequences were 99.9% identical at both the nucleotide and amino acid levels. The presence of BPEV was confirmed by RT-qPCR, RT-PCR and Sanger sequencing using RdRp-specific primers designed from the assembled sequences in ten independent random samples taken from the investigated bell pepper stands. The phylogenetic analysis of both BPEV variants and their affiliation within the genus Endornavirus is discussed. To our knowledge, this is the first study on this group of viruses in Colombia.

Journal Article


CoVizu: Rapid analysis and visualization of the global diversity of SARS-CoV-2 genomes

by Ferreira, Roux-Cil , Baena, Laura Muñoz , Wong, Emmanuel in Data visualization , Genomes , Phylogenetics

2021

Phylogenetics has played a pivotal role in the genomic epidemiology of severe acute respiratory syndrome coronavirus 2, such as tracking the emergence and global spread of variants and scientific communication. However, the rapid accumulation of genomic data from around the world (with over two million genomes currently available in the Global Initiative on Sharing All Influenza Data database) is testing the limits of standard phylogenetic methods. Here, we describe a new approach to rapidly analyze and visualize large numbers of SARS-CoV-2 genomes. Using Python, genomes are filtered for problematic sites, incomplete coverage, and excessive divergence from a strict molecular clock. All differences from the reference genome, including indels, are extracted using minimap2 and compactly stored as a set of features for each genome. For each Pango lineage (https://cov-lineages.org), we collapse genomes with identical features into ‘variants’, generate 100 bootstrap samples of the feature set union to generate weights, and compute the symmetric differences between the weighted feature sets for every pair of variants. The resulting distance matrices are used to generate neighbor-joining trees in RapidNJ that are converted into a majority-rule consensus tree for each lineage. Branches with support values below 50 per cent or mean lengths below 0.5 differences are collapsed, and tip labels on affected branches are mapped to internal nodes as directly sampled ancestral variants. Currently, we process about 2 million genomes in approximately 9 h on 52 cores. The resulting trees are visualized using the JavaScript framework D3.js as ‘beadplots’, in which variants are represented by horizontal line segments, annotated with beads representing samples by collection date. Variants are linked by vertical edges to represent branches in the consensus tree. These visualizations are published at https://filogeneti.ca/CoVizu. All source code was released under an MIT license at https://github.com/PoonLab/covizu.
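
The collapsing and distance steps described in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the CoVizu code: the feature encoding (tuples of type, position and base), the genome names, and the function names are hypothetical, and the bootstrap weighting of features is omitted, so distances here are plain unweighted counts.

```python
from collections import defaultdict
from itertools import combinations


def collapse_variants(genomes):
    """Group genome names by their (hashable) set of features.

    `genomes` maps a genome name to an iterable of features, each feature
    being a hashable difference from the reference (assumed encoding).
    """
    variants = defaultdict(list)
    for name, features in genomes.items():
        variants[frozenset(features)].append(name)
    return dict(variants)


def pairwise_distances(variants):
    """Unweighted symmetric-difference distance for every pair of variants."""
    keys = list(variants)
    return {
        (i, j): len(keys[i] ^ keys[j])  # features present in exactly one set
        for i, j in combinations(range(len(keys)), 2)
    }


if __name__ == "__main__":
    # Hypothetical features: (type, position, alternative base).
    genomes = {
        "g1": [("sub", 240, "T"), ("sub", 3036, "T")],
        "g2": [("sub", 3036, "T"), ("sub", 240, "T")],
        "g3": [("sub", 240, "T"), ("del", 11287, 9)],
    }
    variants = collapse_variants(genomes)
    print(len(variants), pairwise_distances(variants))
```

Using `frozenset` as the dictionary key makes genomes with the same features collapse regardless of the order in which the features were recorded.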

Journal Article


Complete genome sequencing of Potato yellow vein virus (PYVV) in tomato (Solanum lycopersicum) in Colombia

by Gutiérrez Sánchez, Pablo Andrés , Marín Montoya, Mauricio , Muñoz Baena, Laura in BIOLOGY , Potatoes

2017

Potato yellow vein virus (PYVV) is one of the most limiting plant pathogens for potato production in the Andean region. Although it has been detected infecting tomato in Colombia, knowledge of the biological characteristics of the strains present in this host is very limited. In this study, next-generation sequencing (NGS) was used to obtain the complete sequence of the three genomic segments of PYVV from tomato plants in Marinilla (Antioquia), and the usefulness of three primer sets for its detection by conventional RT-PCR and real-time RT-PCR (RT-qPCR) was evaluated. The consensus genome had segment sizes of 8043 nt (RNA1), 5346 nt (RNA2) and 3896 nt (RNA3), and the ten ORFs previously reported for this virus were identified, although in general they showed lower identity levels than those recorded among PYVV strains from potato. Variation and selection analyses identified two regions in the MET/HEL and CPm ORFs that show positive selection, which could be associated with host adaptation. The three primer sets amplified the expected regions of the PYVV capsid, and differences in melting temperature (Tm) values together with Sanger sequencing made it possible to identify the occurrence of at least two main variants of this virus in eastern Antioquia, which is consistent with the moderate levels of polymorphism found in the sequences obtained by NGS.

Journal Article


Phylogenetic Reconstruction and Functional Characterization of the Ancestral Nef Protein of Primate Lentiviruses

by Olabode, Abayomi S , Poon, Art F Y , Wild, Tristan A in Analysis , CD4 antigen , Cell surface

2023

Nef is an accessory protein unique to the primate HIV-1, HIV-2, and SIV lentiviruses. During infection, Nef functions by interacting with multiple host proteins within infected cells to evade the immune response and enhance virion infectivity. Notably, Nef can counter immune regulators such as CD4 and MHC-I, as well as the SERINC5 restriction factor in infected cells. In this study, we generated a posterior sample of time-scaled phylogenies relating SIV and HIV Nef sequences, followed by reconstruction of ancestral sequences at the root and internal nodes of the sampled trees up to the HIV-1 Group M ancestor. Upon expression of the ancestral primate lentivirus Nef protein within CD4+ HeLa cells, flow cytometry analysis revealed that the primate lentivirus Nef ancestor robustly downregulated cell-surface SERINC5, yet only partially downregulated CD4 from the cell surface. Further analysis revealed that the Nef-mediated CD4 downregulation ability evolved gradually, while Nef-mediated SERINC5 downregulation was recovered abruptly in the HIV-1/M ancestor. Overall, this study provides a framework to reconstruct ancestral viral proteins and enable the functional characterization of these proteins to delineate how functions could have changed throughout evolutionary history.

Journal Article


HexSE: Simulating evolution in overlapping reading frames

by Poon, Art F Y , Wade, Kaitlyn E , Muñoz-Baena, Laura in Genomes , Resources

2023

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may provide a mechanism to increase the information content of compact genomes. The presence of overlapping reading frames (OvRFs) can skew estimates of selection based on the rates of non-synonymous and synonymous substitutions, since a substitution that is synonymous in one reading frame may be non-synonymous in another and vice versa. To understand the impact of OvRFs on molecular evolution, we implemented a versatile simulation model of nucleotide sequence evolution along a phylogeny with any distribution of open reading frames in linear or circular genomes. We use a custom data structure to track the substitution rates at every nucleotide site, which are determined by the stationary nucleotide frequencies, transition bias and the distribution of selection biases (dN/dS) in the respective reading frames. Our simulation model is implemented in the Python scripting language. All source code is released under the GNU General Public License version 3 and is available at https://github.com/PoonLab/HexSE.
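
The point that one substitution can be synonymous in one reading frame and non-synonymous in another can be checked with a short Python sketch. This is an illustration only and is unrelated to the HexSE code base: the function name, the toy sequence, and the restriction to forward-strand frames of a linear sequence are assumptions. The standard genetic code is used.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def effect_in_frame(seq, pos, new_base, frame):
    """Classify a substitution at 0-based `pos` for a forward-strand
    reading frame whose codons start at positions frame, frame + 3, ...

    Returns "synonymous", "non-synonymous", or None when the position is
    not covered by a complete codon in that frame.
    """
    codon_start = pos - ((pos - frame) % 3)
    if codon_start < 0 or codon_start + 3 > len(seq):
        return None
    old = seq[codon_start:codon_start + 3]
    offset = pos - codon_start
    new = old[:offset] + new_base + old[offset + 1:]
    same = CODON_TABLE[old] == CODON_TABLE[new]
    return "synonymous" if same else "non-synonymous"


if __name__ == "__main__":
    seq = "ATGCTGGCA"  # toy sequence, not taken from any real genome
    for frame in (0, 1):
        print(frame, effect_in_frame(seq, 5, "C", frame))
```

In the toy sequence, the G to C change at position 5 turns CTG into CTC in frame 0 (both leucine) but TGG into TCG in frame 1 (tryptophan to serine).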

Journal Article


Evolution of Overlapping Reading Frames in Virus Genomes

by Baena, Laura Muñoz in Bioinformatics , Genetics , Genomes

2023

Viruses are formidable pathogens that represent the majority of biological entities on our planet, and their genomes are a source of interesting enigmas. One feature in which virus genomes are usually rich is the presence of overlapping reading frames (OvRFs): portions of the genome where the same nucleotide sequence encodes more than one protein. OvRFs are hypothesized to be used by viruses to encode proteins more compactly and to regulate transcription. In addition, OvRFs might be a source of gene novelty, facilitating the creation of new open reading frames (ORFs) within the transcriptional context of existing ones. To characterize the distribution of OvRFs in viruses, I analyzed 12,609 reference genomes from the NCBI virus database and discovered that, while the number of OvRFs increases with genome length, the overlapping regions tend to be shorter in longer genomes. I also demonstrated that different frameshifts have distinct patterns in OvRFs. For example, +2 frameshifts are predominantly found in dsDNA viruses, whereas +0 frameshifts in RNA viruses tend to involve longer overlaps, which may increase the selective burden of the same nucleotide positions within codons. Further, I retrieved n = 8,586 protein-coding sequences from n = 1,224 reference genomes, and used an alignment-free method to cluster these sequences within virus families. I used these clusters to develop a new network-based representation of the distribution of OvRFs, which provides a means of visualizing and analyzing these genome features for each virus family. I also used these networks to generate a high-level visualization of how overlapping genes are distributed among virus genomes in the same family. Evolution in overlapping genes is complicated because the effect of a nucleotide substitution has multiple contexts. To unravel the effects of OvRFs on virus evolution, I developed HexSE, a simulation model of nucleotide sequence evolution along a phylogeny. In HexSE, I implemented a customized data structure to efficiently track the substitution rates at every nucleotide site. These rates are determined by the stationary nucleotide frequencies, transition bias, and the distribution of selection biases (dN and dS) in the respective reading frames. Next, I compared HexSE simulations under varying settings to an alignment of actual hepatitis B virus (HBV) genomes, which revealed consistent drops in synonymous substitution rates (dS) in association with overlapping regions of an ORF. This thesis explores the cryptic information contained in viral genomes to help explain the evolutionary processes that shape them. In particular, understanding the impact of OvRFs on the evolution of virus genomes will provide us with crucial pieces of a significant puzzle: understanding the origin of new genes in virus genomes, and thereby virus diversity.
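
The abstract does not say which alignment-free method was used for clustering. As one common possibility, offered purely as an assumption, sequences can be compared by the sets of k-mers they contain. The sketch below computes a Jaccard distance between two protein sequences; the function names and the default k = 3 are illustrative choices.

```python
def kmer_set(seq, k=3):
    """All distinct substrings of length k in `seq` (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=3):
    """1 - |A ∩ B| / |A ∪ B| for the k-mer sets of two sequences.

    Returns 0.0 when both sequences are too short to contain any k-mer.
    """
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Toy peptide strings, not taken from any real genome.
    print(jaccard_distance("MKTAYIAKQR", "MKTAYIAKQR"))
    print(jaccard_distance("MKTAYIAKQR", "MKTAYLAKQR"))
```

A distance matrix built this way could be passed to any standard clustering routine; the threshold or algorithm used for that step would be a further assumption.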

Dissertation


CoVizu: Rapid analysis and visualization of the global diversity of SARS-CoV-2 genomes

by Ferreira, Roux-Cil , Abayomi, Samuel Olabode , Connor Chato in Bioinformatics , Cores , Coverage

2021

Phylogenetics has played a pivotal role in the genomic epidemiology of SARS-CoV-2, such as tracking the emergence and global spread of variants, and scientific communication. However, the rapid accumulation of genomic data from around the world - with over two million genomes currently available in the GISAID database - is testing the limits of standard phylogenetic methods. Here, we describe a new approach to rapidly analyze and visualize large numbers of SARS-CoV-2 genomes. Using Python, genomes are filtered for problematic sites, incomplete coverage, and excessive divergence from a strict molecular clock. All differences from the reference genome, including indels, are extracted using minimap2, and compactly stored as a set of features for each genome. For each Pango lineage (https://cov-lineages.org), we collapse genomes with identical features into 'variants', generate 100 bootstrap samples of the feature set union to generate weights, and compute the symmetric differences between the weighted feature sets for every pair of variants. The resulting distance matrices are used to generate neighbor-joining trees in RapidNJ, which are converted into a majority-rule consensus tree for the lineage. Branches with support values below 50% or mean lengths below 0.5 differences are collapsed, and tip labels on affected branches are mapped to internal nodes as directly-sampled ancestral variants. Currently, we process about 1.6 million genomes in approximately nine hours on 34 cores. The resulting trees are visualized using the JavaScript framework D3.js as 'beadplots', in which variants are represented by horizontal line segments, annotated with beads representing samples by collection date. Variants are linked by vertical edges to represent branches in the consensus tree. These visualizations are published at https://filogeneti.ca/CoVizu. All source code was released under an MIT license at https://github.com/PoonLab/covizu. 
Competing Interest Statement The authors have declared no competing interest.
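
The branch-collapsing rule in this abstract (support below 50% or mean length below 0.5 differences) can be sketched on a toy tree structure. The `Node` class and function name are hypothetical, support is stored as a fraction, and only internal branches are collapsed here; relabelling tips as sampled ancestors, which the abstract also describes, is left out of this simplified example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `length` and `support` describe the branch
    leading to this node. Tips have no children and no support value."""
    name: str
    length: float = 0.0
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def collapse_weak_branches(node, min_support=0.5, min_length=0.5):
    """Remove weak internal branches below `node`, in place.

    The children of a collapsed branch are attached directly to its
    parent, which produces a polytomy.
    """
    kept = []
    for child in node.children:
        collapse_weak_branches(child, min_support, min_length)
        is_internal = bool(child.children)
        weak = (
            (child.support is not None and child.support < min_support)
            or child.length < min_length
        )
        if is_internal and weak:
            kept.extend(child.children)  # promote grandchildren
        else:
            kept.append(child)
    node.children = kept
    return node


if __name__ == "__main__":
    weak = Node("x", length=2.0, support=0.3,
                children=[Node("a", 1.0), Node("b", 1.0)])
    root = Node("root", children=[weak, Node("c", 1.0)])
    collapse_weak_branches(root)
    print([child.name for child in root.children])
```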

Paper
