Search Results Heading

MBRLSearchResults

mbrl.module.common.modules.added.book.to.shelf
Title added to your shelf!
View what I already have on My Shelf.
Oops! Something went wrong.
Oops! Something went wrong.
While trying to add the title to your shelf something went wrong :( Kindly try again later!
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
    Done
    Filters
    Reset
  • Discipline
      Discipline
      Clear All
      Discipline
  • Is Peer Reviewed
      Is Peer Reviewed
      Clear All
      Is Peer Reviewed
  • Item Type
      Item Type
      Clear All
      Item Type
  • Subject
      Subject
      Clear All
      Subject
  • Year
      Year
      Clear All
      From:
      -
      To:
  • More Filters
52 result(s) for "Si, Quang Le"
Sort by:
Phylogenetic mixture models for proteins
Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code; solvent exposure; secondary and tertiary structure; protein function; etc. These impact the substitution pattern and, in most cases, a single replacement matrix is not enough to represent all the complexity of the evolutionary processes. This paper explores in maximum-likelihood framework phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP, and test the performance using independent alignments from TreeBase. We compare unsupervised learning approaches, where the site categories are unknown, to supervised ones, where in estimations we use the known category of each site, based on its exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained when using mixture models compared with the best available single replacement matrices. Mixtures of matrices also improve over mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and highly sensitive to the starting values of the parameters, meaning that the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving three matrices, the average AIC gain per site with TreeBase test alignments is 0.31, 0.49 and 0.61 compared with LG (named after Le & Gascuel 2008 Mol. Biol. Evol. 25, 1307-1320), WAG and JTT, respectively. This three-matrix model is significantly better than LG for 34 alignments (among 57), and significantly worse for 1 alignment only. Moreover, tree topologies inferred with our mixture models frequently differ from those obtained with single matrices, indicating that using these mixtures impacts not only the likelihood value but also the output tree. All our models and a PhyML implementation are available from http://atgc.lirmm.fr/mixtures.
Accounting for Solvent Accessibility and Secondary Structure in Protein Phylogenetics Is Clearly Beneficial
Amino acid substitution models are essential to most methods to infer phylogenies from protein data. These models represent the ways in which proteins evolve and substitutions accumulate along the course of time. It is widely accepted that the substitution processes vary depending on the structural configuration of the protein residues. However, this information is very rarely used in phylogenetic studies, though the 3-dimensional structure of dozens of thousands of proteins has been elucidated. Here, we reinvestigate the question in order to fill this gap. We use an improved estimation methodology and a very large database comprising 1471 nonredundant globular protein alignments with structural annotations to estimate new amino acid substitution models accounting for the secondary structure and solvent accessibility of the residues. These models incorporate a confidence coefficient that is estimated from the data and reflects the reliability and usefulness of structural annotations in the analyzed sequences. Our results with 300 independent test alignments show an impressive likelihood gain compared with standard models such as JTT or WAG. Moreover, the use of these models induces significant topological changes in the inferred trees, which should be of primary interest to phylogeneticists. Our data, models, and software are available for download from http://atgc.lirmm.fr/phyml-structure/.
Cardiovascular magnetic resonance by non contrast T1-mapping allows assessment of severity of injury in acute myocardial infarction
Current cardiovascular magnetic resonance (CMR) methods, such as late gadolinium enhancement (LGE) and oedema imaging (T2W) used to depict myocardial ischemia, have limitations. Novel quantitative T1-mapping techniques have the potential to further characterize the components of ischemic injury. In patients with myocardial infarction (MI) we sought to investigate whether state-of the art pre-contrast T1-mapping (1) detects acute myocardial injury, (2) allows for quantification of the severity of damage when compared to standard techniques such as LGE and T2W, and (3) has the ability to predict long term functional recovery. 3T CMR including T2W, T1-mapping and LGE was performed in 41 patients [of these, 78% were ST elevation MI (STEMI)] with acute MI at 12-48 hour after chest pain onset and at 6 months (6M). Patients with STEMI underwent primary PCI prior to CMR. Assessment of acute regional wall motion abnormalities, acute segmental damaged fraction by T2W and LGE and mean segmental T1 values was performed on matching short axis slices. LGE and improvement in regional wall motion at 6M were also obtained. We found that the variability of T1 measurements was significantly lower compared to T2W and that, while the diagnostic performance of acute T1-mapping for detecting myocardial injury was at least as good as that of T2W-CMR in STEMI patients, it was superior to T2W imaging in NSTEMI. There was a significant relationship between the segmental damaged fraction assessed by either by LGE or T2W, and mean segmental T1 values (P < 0.01). The index of salvaged myocardium derived by acute T1-mapping and 6M LGE was not different to the one derived from T2W (P = 0.88). Furthermore, the likelihood of improvement of segmental function at 6M decreased progressively as acute T1 values increased (P < 0.0004). In acute MI, pre-contrast T1-mapping allows assessment of the extent of myocardial damage. T1-mapping might become an important complementary technique to LGE and T2W for identification of reversible myocardial injury and prediction of functional recovery in acute MI.
An Improved General Amino Acid Replacement Matrix
Amino acid replacement matrices are an essential basis of protein phylogenetics. They are used to compute substitution probabilities along phylogeny branches and thus the likelihood of the data. They are also essential in protein alignment. A number of replacement matrices and methods to estimate these matrices from protein alignments have been proposed since the seminal work of Dayhoff et al. (1972). An important advance was achieved by Whelan and Goldman (2001) and their WAG matrix, thanks to an efficient maximum likelihood estimation approach that accounts for the phylogenies of sequences within each training alignment. We further refine this method by incorporating the variability of evolutionary rates across sites in the matrix estimation and using a much larger and diverse database than BRKALN, which was used to estimate WAG. To estimate our new matrix (called LG after the authors), we use an adaptation of the XRATE software and 3,912 alignments from Pfam, comprising ~50,000 sequences and ~6.5 million residues overall. To evaluate the LG performance, we use an independent sample consisting of 59 alignments from TreeBase and randomly divide Pfam alignments into 3,412 training and 500 test alignments. The comparison with WAG and JTT shows a clear likelihood improvement. With TreeBase, we find that 1) the average Akaike information criterion gain per site is 0.25 and 0.42, when compared with WAG and JTT, respectively; 2) LG is significantly better than WAG for 38 alignments (among 59), and significantly worse with 2 alignments only; and 3) tree topologies inferred with LG, WAG, and JTT frequently differ, indicating that using LG impacts not only the likelihood value but also the output tree. Results with the test alignments from Pfam are analogous. LG and a PHYML implementation can be downloaded from http://atgc.lirmm.fr/LG. [PUBLICATION ABSTRACT]
FLU, an amino acid substitution model for influenza proteins
Background The amino acid substitution model is the core component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Although several general amino acid substitution models have been estimated from large and diverse protein databases, they remain inappropriate for analyzing specific species, e.g., viruses. Emerging epidemics of influenza viruses raise the need for comprehensive studies of these dangerous viruses. We propose an influenza-specific amino acid substitution model to enhance the understanding of the evolution of influenza viruses. Results A maximum likelihood approach was applied to estimate an amino acid substitution model (FLU) from ~113, 000 influenza protein sequences, consisting of ~20 million residues. FLU outperforms 14 widely used models in constructing maximum likelihood phylogenetic trees for the majority of influenza protein alignments. On average, FLU gains ~42 log likelihood points with an alignment of 300 sites. Moreover, topologies of trees constructed using FLU and other models are frequently different. FLU does indeed have an impact on likelihood improvement as well as tree topologies. It was implemented in PhyML and can be downloaded from ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/FLU or included in PhyML 3.0 server at http://www.atgc-montpellier.fr/phyml/ . Conclusions FLU should be useful for any influenza protein analysis system which requires an accurate description of amino acid substitutions.
Modeling Protein Evolution with Several Amino Acid Replacement Matrices Depending on Site Rates
Most protein substitution models use a single amino acid replacement matrix summarizing the biochemical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors that influence the substitution patterns. In this paper, we investigate the use of different substitution matrices for different site evolutionary rates. Indeed, the variability of evolutionary rates corresponds to one of the most apparent heterogeneity factors among sites, and there is no reason to assume that the substitution patterns remain identical regardless of the evolutionary rate. We first introduce LG4M, which is composed of four matrices, each corresponding to one discrete gamma rate category (of four). These matrices differ in their amino acid equilibrium distributions and in their exchangeabilities, contrary to the standard gamma model where only the global rate differs from one category to another. Next, we present LG4X, which also uses four different matrices, but leaves aside the gamma distribution and follows a distribution-free scheme for the site rates. All these matrices are estimated from a very large alignment database, and our two models are tested using a large sample of independent alignments. Detailed analysis of resulting matrices and models shows the complexity of amino acid substitutions and the advantage of flexible models such as LG4M and LG4X. Both significantly outperform single-matrix models, providing gains of dozens to hundreds of log-likelihood units for most data sets. LG4X obtains substantial gains compared with LG4M, thanks to its distribution-free scheme for site rates. Since LG4M and LG4X display such advantages but require the same memory space and have comparable running times to standard models, we believe that LG4M and LG4X are relevant alternatives to single replacement matrices. Our models, data, and software are available from http://www.atgc-montpellier.fr/models/lg4x.
Improved mitochondrial amino acid substitution models for metazoan evolutionary studies
Background Amino acid substitution models play an essential role in inferring phylogenies from mitochondrial protein data. However, only few empirical models have been estimated from restricted mitochondrial protein data of a hundred species. The existing models are unlikely to represent appropriately the amino acid substitutions from hundred thousands metazoan mitochondrial protein sequences. Results We selected 125,935 mitochondrial protein sequences from 34,448 species in the metazoan kingdom to estimate new amino acid substitution models targeting metazoa, vertebrates and invertebrate groups. The new models help to find significantly better likelihood phylogenies in comparison with the existing models. We noted remarkable distances from phylogenies with the existing models to the maximum likelihood phylogenies that indicate a considerable number of incorrect bipartitions in phylogenies with the existing models. Finally, we used the new models and mitochondrial protein data to certify that Testudines, Aves, and Crocodylia form one separated clade within amniotes. Conclusions We introduced new mitochondrial amino acid substitution models for metazoan mitochondrial proteins. The new models outperform the existing models in inferring phylogenies from metazoan mitochondrial protein data. We strongly recommend researchers to use the new models in analysing metazoan mitochondrial protein data.
Admixture into and within sub-Saharan Africa
Similarity between two individuals in the combination of genetic markers along their chromosomes indicates shared ancestry and can be used to identify historical connections between different population groups due to admixture. We use a genome-wide, haplotype-based, analysis to characterise the structure of genetic diversity and gene-flow in a collection of 48 sub-Saharan African groups. We show that coastal populations experienced an influx of Eurasian haplotypes over the last 7000 years, and that Eastern and Southern Niger-Congo speaking groups share ancestry with Central West Africans as a result of recent population expansions. In fact, most sub-Saharan populations share ancestry with groups from outside of their current geographic region as a result of gene-flow within the last 4000 years. Our in-depth analysis provides insight into haplotype sharing across different ethno-linguistic groups and the recent movement of alleles into new environments, both of which are relevant to studies of genetic epidemiology. Our genomes contain a record of historical events. This is because when groups of people are separated for generations, the DNA sequence in the two groups’ genomes will change in different ways. Looking at the differences in the genomes of people from the same population can help researchers to understand and reconstruct the historical interactions that brought their ancestors together. The mixing of two populations that were previously separate is known as admixture. Africa as a continent has few written records of its history. This means that it is somewhat unknown which important movements of people in the past generated the populations found in modern-day Africa. Busby et al. have now attempted to use DNA to look into this and reconstruct the last 4000 years of genetic history in African populations. As has been shown in other regions of the world, the new analysis showed that all African populations are the result of historical admixture events. However, Busby et al. could characterize these events to unprecedented level of detail. For example, multiple ethnic groups from The Gambia and Mali all show signs of sharing the same set of ancestors from West Africa, Europe and Asia who mixed around 2000 years ago. Evidence of a migration of people from Central West Africa, known as the Bantu expansion, could also be detected, and was shown to carry genes to the south and east. An important next step will be to now look at the consequences of the observed gene-flow, and ask if it has contributed to spreading beneficial, or detrimental, mutations around Africa.
FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets
Background Amino acid replacement rate matrices are a crucial component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Ideally, the rate matrix reflects the mutational behavior of the actual data under study; however, estimating amino acid replacement rate matrices requires large protein alignments and is computationally expensive and complex. As a compromise, sub-optimal pre-calculated generic matrices are typically used for protein-based phylogeny. Sequence availability has now grown to a point where problem-specific rate matrices can often be calculated if the computational cost can be controlled. Results The most time consuming step in estimating rate matrices by maximum likelihood is building maximum likelihood phylogenetic trees from protein alignments. We propose a new procedure, called FastMG, to overcome this obstacle. The key innovation is the alignment-splitting algorithm that splits alignments with many sequences into non-overlapping sub-alignments prior to estimating amino acid replacement rates. Experiments with different large data sets showed that the FastMG procedure was an order of magnitude faster than without splitting. Importantly, there was no apparent loss in matrix quality if an appropriate splitting procedure is used. Conclusions FastMG is a simple, fast and accurate procedure to estimate amino acid replacement rate matrices from large data sets. It enables researchers to study the evolutionary relationships for specific groups of proteins or taxa with optimized, data-specific amino acid replacement rate matrices. The programs, data sets, and the new mammalian mitochondrial protein rate matrix are available at http://fastmg.codeplex.com .
Whole genome analysis of a Vietnamese trio
ABSTRACT We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5%) were large indels. There were 6,681 large indels in the range 0.1–100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length ≥300 bp. There were 235 contigs from the child genome of which 199 (84.7%) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam.