Catalogue Search | MBRL

Modeling of the GC content of the substituted bases in bacterial core genomes

by Alfsnes, Kristian; Brynildsrud, Ola; Bohlin, Jon in Analysis, Animal Genetics and Genomics, Bacteria

2018

Background: The purpose of the present study was to examine the GC content of substituted bases (sbGC) in the core genomes of 35 bacterial species. Each species, or core genome, comprised genomes from at least 10 strains. We also wanted to explore whether sbGC for each strain was associated with the corresponding species' core genome GC content (cgGC). We present a simple mathematical model that estimates sbGC from cgGC. The model assumes only that the estimated sbGC is a function of cgGC proportional to fixed AT → GC (α) and GC → AT (β) mutation rates. Non-linear regression was used to estimate parameters α and β from the empirical data described above.

Results: We found that sbGC for each strain showed a non-linear association with the corresponding cgGC, with a bias towards higher GC content for most core genomes (66.3% of the strains), assuming as a null hypothesis that sbGC should be approximately equal to cgGC. The most GC-rich core genomes (approximately > 60% GC), on the other hand, exhibited slightly less GC-biased sbGC than expected. The best fitted regression model indicates that GC → AT mutation rates, β = 1.91 ± 0.13 (p < 0.001), are on average approximately 1.91/0.79 = 2.42 times as high as AT → GC mutation rates, α = −0.79 ± 0.25 (p < 0.001). Whether the observed sbGC GC-bias for all but the most GC-rich prokaryotic species is due to selection compensating for the GC → AT mutation bias, to selectively neutral processes, or to both is currently debated. The residual standard error was σ = 0.076, indicating that estimated errors of sbGC are approximately within ±15.2% GC (95% confidence interval) for the strains of all species in the study.

Conclusion: Not only did our mathematical model give reasonable estimates of sbGC, it also provides further support for previous observations that mutation rates in prokaryotes exhibit a universal GC → AT bias that appears to be remarkably consistent between taxa.
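The rate ratio and the error interval quoted in this abstract follow from simple arithmetic on the fitted values. A minimal Python sketch, assuming the ratio is taken between the magnitudes of the two estimates and that the 95% interval is approximated as ±2σ (both are readings of the abstract, not formulas it states); the function names are hypothetical:

```python
# Point estimates quoted in the abstract.
ALPHA = -0.79   # AT -> GC parameter (alpha)
BETA = 1.91     # GC -> AT parameter (beta)
SIGMA = 0.076   # residual standard error, in GC fraction units


def rate_ratio(beta: float, alpha: float) -> float:
    """Ratio of GC->AT to AT->GC, using magnitudes (assumption)."""
    return abs(beta) / abs(alpha)


def approx_95_halfwidth(sigma: float) -> float:
    """Approximate 95% half-width as two standard errors (assumption)."""
    return 2.0 * sigma


ratio = rate_ratio(BETA, ALPHA)
halfwidth = approx_95_halfwidth(SIGMA)

print(f"GC->AT / AT->GC ratio: {ratio:.2f}")          # 2.42
print(f"approx. 95% half-width: {halfwidth * 100:.1f}% GC")  # 15.2% GC
```

The functional form of the regression model itself is not given in the abstract, so it is deliberately left out of this sketch.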

Journal Article

Hidden Compositional Heterogeneity of Fish Chromosomes in the Era of Polished Genome Assemblies

by Symonová, Radka; Vohnoutová, Marta; Žifčáková, Lucia in Anabas testudineus, Artificial chromosomes, chromosome visualisation

2023

Fish chromosomes are considered homogeneous in their AT/GC nucleotide composition, and banding patterns enabling identification of homologs are largely missing. While cytogenomic approaches try to compensate for this issue by virtual karyotyping, they rely on the quality of genome assemblies available. Recently, soft-masked genome assemblies combining costly and arduous long- and short-read sequencing and new generation assemblers became available for two teleost fish species, climbing perch (Anabas testudineus) and channel bull blenny (Cottoperca gobio). Soft-masking turns repetitive sequences in a genome assembly into lower case letters, leaving unique sequences in upper case. This enables investigators to assess the proportion of guanine and cytosine nucleotides (GC%) of transposable elements as an indicator of AT/GC homogenisation in fish. We have developed a new version of our Python tool Evan, which utilises chromosome-level genome assemblies and combines the profiles of GC% and the proportion of repeats (rep%) along chromosomes. Our profiles of both of those fishes showed clear and abrupt but small-scale fluctuations in GC% along otherwise compositionally homogenised sequences. Our study also highlights the key role of the sliding window size in determining the resolution of GC% profiling. While the quality of the genome assemblies appeared to be sufficient for GC%/rep% profiling, more effective repeat masking is necessary to better distinguish to what extent repeats compositionally homogenize fish genomes.
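The idea of profiling GC% and rep% along a soft-masked sequence can be sketched in a few lines of Python. This is not the Evan tool's implementation; it is a minimal illustration that assumes lower-case letters mark repeats (as described above), skips non-ACGT characters such as N, and uses a hypothetical function name and toy sequence:

```python
def window_profiles(seq, window, step=None):
    """Return (start, GC%, rep%) for each full window along seq.

    Lower-case bases are counted as masked repeats; characters other
    than A, C, G, T (either case) are ignored.
    """
    step = step or window
    profiles = []
    for start in range(0, len(seq) - window + 1, step):
        bases = [c for c in seq[start:start + window] if c in "ACGTacgt"]
        if not bases:
            continue  # window contains only N or other symbols
        gc = sum(c in "GCgc" for c in bases)
        rep = sum(c.islower() for c in bases)
        profiles.append((start, 100 * gc / len(bases), 100 * rep / len(bases)))
    return profiles


toy = "ACGTacgt" + "GGGGGGcc"   # toy sequence, not real data

small = window_profiles(toy, window=8)
large = window_profiles(toy, window=16)
print(small)
print(large)
```

With the toy sequence, the 8-base windows show two distinct GC% values, while the single 16-base window averages them out, which illustrates why the window size determines the resolution of the profile.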

Journal Article

GC content elevates mutation and recombination rates in the yeast Saccharomyces cerevisiae

by Sheng, Ziwei; Petes, Thomas D.; Kiktev, Denis A. in Amino acid sequence, Amino acids, Baking yeast

2018

The chromosomes of many eukaryotes have regions of high GC content interspersed with regions of low GC content. In the yeast Saccharomyces cerevisiae, high-GC regions are often associated with high levels of meiotic recombination. In this study, we constructed URA3 genes that differ substantially in their base composition [URA3-AT (31% GC), URA3-WT (43% GC), and URA3-GC (63% GC)] but encode proteins with the same amino acid sequence. The strain with URA3-GC had an approximately sevenfold elevated rate of ura3 mutations compared with the strains with URA3-WT or URA3-AT. About half of these mutations were single-base substitutions and were dependent on the error-prone DNA polymerase ζ. About 30% were deletions or duplications between short (5–10 base) direct repeats resulting from DNA polymerase slippage. The URA3-GC gene also had elevated rates of meiotic and mitotic recombination relative to the URA3-AT or URA3-WT genes. Thus, base composition has a substantial effect on the basic parameters of genome stability and evolution.

Journal Article

A positive correlation between GC content and growth temperature in prokaryotes

by Gao, Jie; Niu, Deng-Ke; Lan, Xin-Ran in Animal Genetics and Genomics, Archaea, Archaea - genetics

2022

Background: GC pairs are generally more stable than AT pairs, so GC-rich genomes were proposed to be better adapted to high temperatures than AT-rich genomes. Previous studies consistently showed positive correlations between growth temperature and the GC contents of structural RNA genes. However, for whole genome sequences and the silent sites of codons in protein-coding genes, the relationship between GC content and growth temperature has long been debated.

Results: With a dataset much larger than those of previous studies (681 bacteria and 155 archaea with completely assembled genomes), our phylogenetic comparative analyses showed positive correlations between optimal growth temperature (Topt) and GC content both in bacterial and archaeal structural RNA genes and in bacterial whole genome sequences, chromosomal sequences, plasmid sequences, core genes, and accessory genes. However, in the 155 archaea, we did not observe a significant positive correlation of Topt with whole-genome GC content (GCw) or GC content at four-fold degenerate sites. We randomly drew 155 samples from the 681 bacteria for 1000 rounds. In most cases (> 95%), the positive correlations between Topt and genomic GC contents became statistically nonsignificant (P > 0.05). This result suggested that small sample sizes might account for the lack of positive correlations between growth temperature and genomic GC content in the 155 archaea and in the bacterial samples of previous studies. Comparing the GC content among four categories (psychrophiles/psychrotrophiles, mesophiles, thermophiles, and hyperthermophiles) also revealed a positive correlation between GCw and growth temperature in bacteria. By including the GCw of incompletely assembled genomes, we expanded the sample size of archaea to 303. Positive correlations between GCw and Topt appear especially after excluding the halophilic archaea, whose GC contents might be strongly shaped by intense UV radiation.

Conclusions: This study explains the previous contradictory observations and ends a long debate. Prokaryotes growing at high temperatures have higher GC contents. Thermal adaptation is one possible explanation for the positive association. Meanwhile, we propose that the elevated efficiency of DNA repair in response to heat mutagenesis might have the by-product of increasing GC content, as happens in intracellular symbionts and marine bacterioplankton.
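The subsampling step described in the Results (155 draws from 681 genomes, repeated 1000 times) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it uses a plain Pearson correlation with a t-test approximation rather than the phylogenetic comparative analyses of the study, the critical value of about 1.975 (two-tailed, 153 degrees of freedom) is hard-coded, and the data are synthetic values generated only to make the example run. All function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, t_crit=1.975):
    """Two-tailed test at roughly P < 0.05 via the t statistic of r."""
    if abs(r) >= 1.0:
        return True
    t = abs(r) * math.sqrt((n - 2) / (1.0 - r * r))
    return t > t_crit


def nonsignificant_fraction(xs, ys, k=155, rounds=1000, seed=0):
    """Fraction of random size-k subsamples with a nonsignificant r."""
    rng = random.Random(seed)
    indices = range(len(xs))
    misses = 0
    for _ in range(rounds):
        pick = rng.sample(indices, k)
        r = pearson_r([xs[i] for i in pick], [ys[i] for i in pick])
        if not is_significant(r, k):
            misses += 1
    return misses / rounds


# Synthetic stand-in data (NOT from the study): a weak positive trend.
gen = random.Random(42)
topt = [gen.uniform(10, 100) for _ in range(681)]
gc = [50 + 0.03 * t + gen.gauss(0, 10) for t in topt]

fraction = nonsignificant_fraction(topt, gc)
print(f"nonsignificant in {fraction:.1%} of subsamples")
```

The printed fraction depends entirely on the synthetic effect size and noise chosen here, so it says nothing about the real dataset; the sketch only shows the mechanics of the resampling argument.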

Journal Article

On the length, weight and GC content of the human genome

by Vitale, Lorenza; Antonaros, Francesca; Piovesan, Allison in Analysis, Biomedical and Life Sciences, Biomedicine

2019

Objective: Basic parameters commonly used to describe genomes, including length, weight and relative guanine-cytosine (GC) content, are widely cited in the absence of a primary source. Using updated data and original software, we determined these values to the best of our knowledge as a standard reference for the whole human nuclear genome, for each chromosome and for mitochondrial DNA. We also devised a method to calculate the relative GC content in the whole messenger RNA sequence set and in transcriptomes by multiplying the GC content of each gene by its mean expression level.

Results: The male nuclear diploid genome extends for 6.27 gigabase pairs (Gbp), is 205.00 centimeters (cm) long and weighs 6.41 picograms (pg). Female values are 6.37 Gbp, 208.23 cm and 6.51 pg. The individual variability and the implication for the DNA informational density in terms of bits/volume were discussed. The genomic GC content is 40.9%. Following analysis in different transcriptomes and species, we showed that the greatest deviation was observed in the pathological condition analysed (trisomy 21 leukaemic cells) and in Caenorhabditis elegans. Our results may represent a solid basis for further investigation of human structural and functional genomics while also providing a framework for other comparative genome analyses.
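The transcriptome calculation mentioned in the Objective, where each gene's GC content is multiplied by its mean expression level, can be read as an expression-weighted mean. A minimal Python sketch under that reading; dividing by the total expression is an assumption made here to return a percentage, and the function name and input values are hypothetical:

```python
def expression_weighted_gc(genes):
    """Expression-weighted mean GC%.

    genes: iterable of (gc_percent, mean_expression) pairs.
    Normalising by total expression is an assumption of this sketch.
    """
    genes = list(genes)
    total_expression = sum(expr for _, expr in genes)
    if total_expression <= 0:
        raise ValueError("total expression must be positive")
    weighted_sum = sum(gc * expr for gc, expr in genes)
    return weighted_sum / total_expression


# Illustrative values only: one AT-rich gene at low expression and
# one GC-rich gene at three times that expression.
example = [(40.0, 1.0), (60.0, 3.0)]
weighted = expression_weighted_gc(example)
print(weighted)  # 55.0
```

In this toy case the unweighted mean would be 50%, so the weighted value shifts toward the more highly expressed gene.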

Journal Article

SEQUENCE DIVERSITY AND MOLECULAR EVOLUTION ANALYSIS OF INTERNATIONAL TOMATO BASED ON THE ENTIRE ITS REGION

by Hawash, Mohammed M.; A. Hajeej, Thaer Hamid; Al-Shahwany, Ayyad W. in GC content, Haplotypes, Mismatch

2026

This study aimed to assess the genetic diversity and molecular evolution of tomato (Solanum lycopersicum) populations from various countries, which hold significant potential for future breeding strategies and germplasm conservation. To achieve this, a total of 15 sequences deposited in GenBank were analyzed using the complete internal transcribed spacer (ITS) region of the nuclear ribosomal DNA (nrDNA). The spacer sequence lengths ranged from 156 base pairs (bp) in the Swedish tomato to 713 bp in the Palestinian tomato. A notable variation in GC content was observed, with the Thai tomato exhibiting the highest value (67.48%) and the South Korean tomato the lowest (49.56%). Phylogenetic trees were constructed using both the distance-based Neighbor-Joining (NJ) and Maximum Parsimony (MP) methods. Sequence analysis revealed 38 monomorphic (invariable) sites and 15 polymorphic sites, of which 14 were singleton variable sites and one was parsimony-informative. Alignment of the 15 sequences enabled the identification of five haplotypes. The estimated transition/transversion bias (R) was 17.339, indicating a greater frequency of transitions over transversions in this region. Neutrality tests, including Tajima’s D and Fu and Li’s statistics, produced statistically significant results. The highest levels of genetic diversity were observed in South Korean, Iraqi, and Indian tomato samples. 
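For readers unfamiliar with the transition/transversion distinction used in this abstract, the classification can be sketched in Python. This counts raw differences between two aligned sequences; it is an illustration only and does not reproduce the reported R value, which was presumably estimated with a substitution model in dedicated software. The function names and the toy alignment are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'transition', 'transversion', or None if no change."""
    if a == b:
        return None
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"      # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"        # purine<->pyrimidine


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Columns with gaps or ambiguous symbols are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        kind = classify_substitution(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


counts = count_ts_tv("ACGTAC-", "GCGTCTA")   # toy alignment
print(counts)  # (2, 1)
```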

Journal Article

Codon Usage Bias in Animals: Disentangling the Effects of Natural Selection, Effective Population Size, and GC-Biased Gene Conversion

by Duret, Laurent; Roux, Camille; Figuet, Emeric in Animal species, Animals, Bias

2018

Selection on codon usage bias is well documented in a number of microorganisms. Whether codon usage is also generally shaped by natural selection in large organisms, despite their relatively small effective population size (Ne), is unclear. In animals, the population genetics of codon usage bias has only been studied in a handful of model organisms so far, and can be affected by confounding, nonadaptive processes such as GC-biased gene conversion and experimental artefacts. Using population transcriptomics data, we analyzed the relationship between codon usage, gene expression, allele frequency distribution, and recombination rate in 30 nonmodel species of animals, each from a different family, covering a wide range of effective population sizes. We disentangled the effects of translational selection and GC-biased gene conversion on codon usage by separately analyzing GC-conservative and GC-changing mutations. We report evidence for effective translational selection on codon usage in large-Ne species of animals, but not in small-Ne ones, in agreement with the nearly neutral theory of molecular evolution. C- and T-ending codons tend to be preferred over synonymous G- and A-ending ones, for reasons that remain to be determined. In contrast, we uncovered a conspicuous effect of GC-biased gene conversion, which is widespread in animals and the main force determining the fate of AT↔GC mutations. Intriguingly, the strength of its effect was uncorrelated with Ne.
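The split between GC-conservative and GC-changing mutations mentioned in this abstract can be made concrete with a small Python sketch. It assumes the usual grouping of A and T as "weak" bases and G and C as "strong" bases; the function name and labels are hypothetical and not taken from the study's pipeline.

```python
WEAK = {"A", "T"}
STRONG = {"G", "C"}


def classify_mutation(ancestral, derived):
    """Classify a point mutation by its effect on GC content."""
    a, d = ancestral.upper(), derived.upper()
    if a not in WEAK | STRONG or d not in WEAK | STRONG:
        raise ValueError("bases must be one of A, C, G, T")
    if a == d:
        raise ValueError("ancestral and derived bases are identical")
    if a in WEAK and d in STRONG:
        return "AT->GC"            # GC-changing, raises GC content
    if a in STRONG and d in WEAK:
        return "GC->AT"            # GC-changing, lowers GC content
    return "GC-conservative"       # A<->T or G<->C


labels = [classify_mutation(a, d) for a, d in [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]]
print(labels)
```

Because GC-conservative mutations do not alter GC content, they are not expected to be affected by GC-biased gene conversion, which is the property the abstract's separate analysis relies on.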

Journal Article

GC content shapes mRNA storage and decay in human cells

by Yi, Zhou; Morillon, Antonin; Brest, Patrick in Base Composition, Base Composition - genetics, Chromosomes and Gene Expression

2019

mRNA translation and decay appear often intimately linked although the rules of this interplay are poorly understood. In this study, we combined our recent P-body transcriptome with transcriptomes obtained following silencing of broadly acting mRNA decay and repression factors, and with available CLIP and related data. This revealed the central role of GC content in mRNA fate, in terms of P-body localization, mRNA translation and mRNA stability: P-bodies contain mostly AU-rich mRNAs, which have a particular codon usage associated with a low protein yield; AU-rich and GC-rich transcripts tend to follow distinct decay pathways; and the targets of sequence-specific RBPs and miRNAs are also biased in terms of GC content. Altogether, these results suggest an integrated view of post-transcriptional control in human cells where most translation regulation is dedicated to inefficiently translated AU-rich mRNAs, whereas control at the level of 5’ decay applies to optimally translated GC-rich mRNAs.

Journal Article

Diversity in genome size and GC content shows adaptive potential in orchids and is closely linked to partial endoreplication, plant life-history traits and climatic conditions

by Trávníček, Pavel; Chumová, Zuzana; Jersáková, Jana in Angiosperms, Brownian motion, Climatic conditions

2019

• In angiosperms, genome size and nucleobase composition (GC content) exhibit pronounced variation with possible adaptive consequences. The hyperdiverse orchid family, which possesses the unique phenomenon of partial endoreplication (PE), provides a great opportunity to search for interactions of both genomic traits with the evolutionary history of the family.
• Using flow cytometry, we report values of both genomic traits and the type of endoreplication for 149 orchid species and compare these with a suite of life-history traits and climatic niche data using phylogeny-based statistics. The evolution of genomic traits was further studied using the Brownian motion (BM) and Ornstein–Uhlenbeck (OU) models to assess their adaptive potential.
• Pronounced variation in genome size (341–54 878 Mb), and especially in GC content (23.9–50.5%), was detected among orchids. Diversity in both genomic traits was closely related to the type of endoreplication, plant growth form and climatic conditions. GC content was also associated with the type of dormancy. In all tested scenarios, OU models outperformed BM models.
• Unparalleled GC content variation was discovered in orchids, setting new limits for plants. Our study indicates that diversity in both genome size and GC content has adaptive consequences and is tightly linked with evolutionary transitions to PE.

Journal Article

A Molecular Portrait of De Novo Genes in Yeasts

by Opulente, Dana A; Achaz, Guillaume; Lafontaine, Ingrid in Developmental biology, Evolution, Genes

2018

New genes, with novel protein functions, can evolve “from scratch” out of intergenic sequences. These de novo genes can integrate the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast evolving sequences and erroneously annotated protein coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, only a few being ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.

Journal Article
