Catalogue Search | MBRL

FLAVI: An Amino Acid Substitution Model for Flaviviruses

by Le, Thu Kim; Vinh, Le Sy in Amino acid substitution, Amino acids, Animal Genetics and Genomics

2020

Amino acid substitution models represent substitution rates among amino acids during evolution. The models play an important role in analyzing protein sequences, especially in inferring phylogenies. The rapid evolution of flaviviruses is expanding the threat to public health. A number of models have been estimated for some viruses; however, they are unable to properly represent the amino acid substitution patterns of flaviviruses. In this study, we collected protein sequences from the flavivirus genus to estimate an amino acid substitution model, called FLAVI, specifically for flaviviruses. Experiments showed that the collected dataset was sufficient to estimate a stable model. More importantly, the FLAVI model was remarkably better than other existing models in analyzing flavivirus protein sequences. We recommend that researchers use the FLAVI model when studying protein sequences of flaviviruses or closely related viruses.

Journal Article

Genetic landscape of autism spectrum disorder in Vietnamese children

by Ly, Ha Thi Thanh; Ha, Lien Thi; Le, Vinh Sy in 45/22, 45/23, 45/29

2020

Autism spectrum disorder (ASD) is a complex disorder with an unclear aetiology and an estimated global prevalence of 1%. However, studies of ASD in the Vietnamese population are limited. Here, we first conducted whole exome sequencing (WES) of 100 children with ASD and their unaffected parents. Our stringent analysis pipeline was able to detect 18 unique variants (8 de novo and 10 X-linked, all validated), including 12 newly discovered variants. Interestingly, a notable number of X-linked variants were detected (56%), and all of them were found in affected males but not in affected females. We uncovered 17 genes from our ASD cohort in which CHD8, DYRK1A, GRIN2B, SCN2A, OFD1 and MDB5 have been previously identified as ASD risk genes, suggesting the universal aetiology of ASD for these genes. In addition, we identified six genes that have not been previously reported in any autism database: CHM, ENPP1, IGF1, LAS1L, SYP and TBX22. Gene ontology and phenotype-genotype analysis suggested that variants in IGF1, SYP and LAS1L could plausibly confer risk for ASD. Taken together, this study adds to the genetic heterogeneity of ASD and is the first report elucidating the genetic landscape of ASD in Vietnamese children.
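
The distinction the abstract draws between de novo and X-linked variants in parent-child trios can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `classify_variant`, the genotype encoding (alternate-allele counts), and the category labels are all assumptions made for illustration, and a real pipeline would also check read depth, genotype quality and population frequency.

```python
def classify_variant(child, father, mother, chrom, child_sex):
    """Classify one variant site in a trio (hypothetical, simplified rule).

    Genotypes are alternate-allele counts: 0 (absent), 1 or 2.
    """
    if child == 0:
        return "absent"  # the child does not carry the variant
    if father == 0 and mother == 0:
        return "de novo"  # present in the child, absent in both parents
    if chrom == "X" and child_sex == "male" and mother > 0 and father == 0:
        # a son's single X chromosome comes from his mother
        return "X-linked (maternally inherited)"
    return "inherited"


# Example trio sites (made-up genotypes)
print(classify_variant(1, 0, 0, "2", "female"))
print(classify_variant(1, 0, 1, "X", "male"))
```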

Journal Article

Novel findings from family-based exome sequencing for children with biliary atresia

by Tran, Quynh Anh; Le, Vinh Sy; Tran, Kien Trung in 631/208, 631/208/1516, 631/208/737

2021

Biliary atresia (BA) is a progressive inflammation and fibrosis of the biliary tree characterized by the obstruction of bile flow, which results in liver failure, scarring and cirrhosis. This study aimed to explore the elusive aetiology of BA by conducting whole exome sequencing for 41 children with BA and their parents (35 trios, including 1 family with 2 BA-diagnosed children and 5 child-mother cases). We exclusively identified and validated a total of 28 variants (17 X-linked, 6 de novo and 5 homozygous) in 25 candidate genes from our BA cohort. These variants were among the 10% most deleterious and had a low minor allele frequency against the employed databases: Kinh Vietnamese (KHV), GnomAD and 1000 Genomes Project. Interestingly, AMER1, INVS and OCRL variants were found in unrelated probands and were first reported in a BA cohort. Liver specimens and blood samples showed identical variants, suggesting that somatic variants were unlikely to occur during morphogenesis. Consistent with earlier attempts, this study implicated genetic heterogeneity and non-Mendelian inheritance of BA.

Journal Article

UFBoot2: Improving the Ultrafast Bootstrap Approximation

by Arndt von Haeseler; Chernomor, Olga; Bui, Quang Minh in Approximation, Computing time, Resampling

2018

The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving more unbiased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. UFBoot2 is 778 times (median) faster than SBS and 8.4 times (median) faster than RAxML rapid bootstrap on tested data sets. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.
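
For orientation, the standard bootstrap (SBS) that the abstract uses as its baseline can be sketched as follows. This shows only the column resampling step and the support calculation of the classical nonparametric bootstrap; it is not UFBoot2 or the IQ-TREE implementation. The function names, the dictionary alignment format and the representation of a tree as a set of splits are assumptions for illustration, and tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def branch_support(split, replicate_splits):
    """Percentage of replicate trees (given as sets of splits) containing `split`."""
    hits = sum(1 for splits in replicate_splits if split in splits)
    return 100.0 * hits / len(replicate_splits)


# Toy alignment (made-up data)
aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGTAC"}
replicate = bootstrap_alignment(aln, random.Random(1))

# Splits recovered from four hypothetical replicate trees
ab = frozenset({"A", "B"})
trees = [{ab}, {ab}, {frozenset({"A", "C"})}, {ab}]
print(branch_support(ab, trees))  # 75.0
```

The cost of SBS comes from repeating a full tree search on every replicate, which is the step the abstract says UFBoot and UFBoot2 are designed to avoid.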

Journal Article

QMaker

by Lanfear, Robert; Vinh, Le Sy; Minh, Bui Quang in Amino acid substitution, Amino acids, Heterogeneity

2021

Amino acid substitution models play a crucial role in phylogenetic analyses. Maximum likelihood (ML) methods have been proposed to estimate amino acid substitution models; however, they are typically complicated and slow. In this article, we propose QMaker, a new ML method to estimate a general time-reversible Q matrix from a large protein data set consisting of multiple sequence alignments. QMaker combines an efficient ML tree search algorithm, a model selection for handling the model heterogeneity among alignments, and the consideration of rate mixture models among sites. We provide QMaker as a user-friendly function in the IQ-TREE software package (http://www.iqtree.org) supporting the use of multiple CPU cores so that biologists can easily estimate amino acid substitution models from their own protein alignments. We used QMaker to estimate new empirical general amino acid substitution models from the current Pfam database as well as five clade-specific models for mammals, birds, insects, yeasts, and plants. Our results show that the new models considerably improve the fit between model and data and in some cases influence the inference of phylogenetic tree topologies.
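
As background on what a "general time-reversible Q matrix" is, the sketch below assembles one from symmetric exchangeabilities and stationary frequencies using the usual textbook construction (off-diagonal entry = exchangeability times target-state frequency, rows summing to zero, scaled to one expected substitution per unit time). It is not QMaker's estimation procedure; the function name `build_gtr_q` and the three-state toy values are assumptions, and a real amino acid model has 20 states.

```python
def build_gtr_q(exchange, freqs):
    """Assemble a time-reversible rate matrix Q.

    exchange: symmetric matrix of exchangeabilities (diagonal ignored)
    freqs:    stationary frequencies, summing to 1
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchange[i][j] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so each row sums to zero
    # Scale so the expected substitution rate at stationarity equals 1
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[value / rate for value in row] for row in q]


# Toy three-state example (made-up numbers)
exchange = [[0.0, 1.0, 2.0],
            [1.0, 0.0, 3.0],
            [2.0, 3.0, 0.0]]
freqs = [0.5, 0.3, 0.2]
q = build_gtr_q(exchange, freqs)
```

Time reversibility shows up as detailed balance: `freqs[i] * q[i][j]` equals `freqs[j] * q[j][i]` for every pair of states, which holds here because the exchangeability matrix is symmetric.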

Journal Article

MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation

by Hoang, Diep Thi; Stamatakis, Alexandros; Vinh, Le Sy in Analysis, Animal Systematics/Taxonomy/Biogeography, Biomedical and Life Sciences

2018

Background: The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony. Results: To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses on biological DNA and protein showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2–20.7) faster than the standard parsimony bootstrap implemented in PAUP*; but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3–63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear more unbiased. Conclusions: MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy-to-use, open-source and available at http://www.cibiv.at/software/mpboot.

Journal Article

Improved mitochondrial amino acid substitution models for metazoan evolutionary studies

by Le, Quang Si; Le, Vinh Sy; Dang, Cuong Cao in Advantages, Amino acid sequence, Amino Acid Substitution

2017

Background: Amino acid substitution models play an essential role in inferring phylogenies from mitochondrial protein data. However, only a few empirical models have been estimated, from restricted mitochondrial protein data of a hundred species. The existing models are unlikely to appropriately represent the amino acid substitutions in hundreds of thousands of metazoan mitochondrial protein sequences. Results: We selected 125,935 mitochondrial protein sequences from 34,448 species in the metazoan kingdom to estimate new amino acid substitution models targeting metazoa, vertebrates and invertebrate groups. The new models help to find phylogenies with significantly better likelihoods than the existing models do. We noted remarkable distances from phylogenies inferred with the existing models to the maximum likelihood phylogenies, which indicate a considerable number of incorrect bipartitions in phylogenies inferred with the existing models. Finally, we used the new models and mitochondrial protein data to certify that Testudines, Aves, and Crocodylia form one separate clade within amniotes. Conclusions: We introduced new mitochondrial amino acid substitution models for metazoan mitochondrial proteins. The new models outperform the existing models in inferring phylogenies from metazoan mitochondrial protein data. We strongly recommend that researchers use the new models in analysing metazoan mitochondrial protein data.

Journal Article

AMRomics: a scalable workflow to analyze large microbial genome collections

by Ho, TH; Nguyen, T; Vo, NS in Analysis, Animal Genetics and Genomics, Antibiotics

2024

Whole genome analysis for microbial genomics is critical to studying and monitoring antimicrobial resistance strains. The exponential growth of microbial sequencing data necessitates a fast and scalable computational pipeline to generate the desired outputs in a timely and cost-effective manner. Recent methods have been implemented to integrate individual genomes into large collections of specific bacterial populations and are widely employed for systematic genomic surveillance. However, they do not scale well when the population expands, and turnaround time remains the main issue for this type of analysis. Here, we introduce AMRomics, an optimized microbial genomics pipeline that can work efficiently with big datasets. We use different bacterial data collections to compare AMRomics against competitive tools and show that our pipeline can generate similar results of interest but with better performance. The software is open source and is publicly available at https://github.com/amromics/amromics under an MIT license.

Journal Article

De novo copy number variations in candidate genomic regions in patients of severe autism spectrum disorder in Vietnam

by Nguyen, Kien Trung; Ly, Ha Thi Thanh; Pham, Linh Thi Dieu in Autism Spectrum Disorder - diagnosis, Autism Spectrum Disorder - genetics, Biology and Life Sciences

2024

Autism spectrum disorder (ASD) is a developmental disorder with a prevalence of around 1% of children worldwide, characterized by patient behaviour (communication, social interaction, and personal development). The reported efficacy of diagnostic tests using copy number variations (CNVs) in candidate genes in ASD is currently around 10%, but the underlying data overrepresent patients of Caucasian background. We report here that the diagnostic success of de novo candidate CNVs in Vietnamese ASD patients is around 6%. We recruited one hundred trios (both parents and a child) in which the child was clinically diagnosed with ASD while the parents were not affected. We performed genetic screening to exclude Rett syndrome and Fragile X syndrome and performed genome-wide DNA microarray (aCGH) analysis on all probands and their parents to identify de novo CNVs. We detected 1708 non-redundant CNVs in 100 patients, and 118 (7%) of them were de novo. Using the filter for known CNVs from the Simons Foundation Autism Research Initiative (SFARI) database, we identified six CNVs (one gain and five loss CNVs) in six patients (3 males and 3 females). Notably, 3 of our patients had a deletion involving the SHANK3 gene, which is the highest compared to previous reports. This is the first report of candidate CNVs in ASD patients from Vietnam and provides the framework for building a CNV-based test as the first-tier screening for clinical management.

Journal Article

Loss of matK RNA editing in seed plant chloroplasts

by Schulerowitz, Katrin; Tillich, Michael; Maier, Uwe G in Base Sequence, Biological Evolution, Chloroplasts

2009

RNA editing in chloroplasts of angiosperms proceeds by C-to-U conversions at specific sites. Nuclear-encoded factors are required for the recognition of cis-elements located immediately upstream of editing sites. The ensemble of editing sites in a chloroplast genome differs widely between species, and editing sites are thought to evolve rapidly. However, large-scale analyses of the evolution of individual editing sites have not yet been undertaken. Here, we analyzed the evolution of two chloroplast editing sites, matK-2 and matK-3, for which DNA sequences from thousands of angiosperm species are available. Both sites are found in most major taxa, including deep-branching families such as the Nymphaeaceae. However, 36 isolated taxa scattered across the entire tree lack a C at one of the two matK editing sites. Tests of several exemplary species from this in silico analysis of matK processing unexpectedly revealed that one of the two sites remains unedited in almost half of all species examined. A comparison of sequences between editors and non-editors showed that specific nucleotides co-evolve with the C at the matK editing sites, suggesting that these nucleotides are critical for editing-site recognition. (i) Both matK editing sites were present in the common ancestor of all angiosperms and have been independently lost multiple times during angiosperm evolution. (ii) The editing activities corresponding to matK-2 and matK-3 are unstable. (iii) A small number of third-codon positions in the vicinity of editing sites are selectively constrained independent of the presence of the editing site, most likely because of interacting RNA-binding proteins.

Journal Article
