Catalogue Search | MBRL
101 result(s) for "Golding, G Brian"
Pervasive Cryptic Epistasis in Molecular Evolution
by Golding, G. Brian; Dean, Antony M.; Lunzer, Mark
in 3-Isopropylmalate Dehydrogenase - classification, 3-Isopropylmalate Dehydrogenase - genetics, 3-Isopropylmalate Dehydrogenase - metabolism
2010
The functional effects of most amino acid replacements accumulated during molecular evolution are unknown, because most are not observed naturally and the possible combinations are too numerous. We created 168 single mutations in wild-type Escherichia coli isopropylmalate dehydrogenase (IMDH) that match the differences found in wild-type Pseudomonas aeruginosa IMDH. 104 mutant enzymes performed similarly to E. coli wild-type IMDH, one was functionally enhanced, and 63 were functionally compromised. The transition from E. coli IMDH, or an ancestral form, to the functional wild-type P. aeruginosa IMDH requires extensive epistasis to ameliorate the combined effects of the deleterious mutations. This result stands in marked contrast with a basic assumption of molecular phylogenetics, that sites in sequences evolve independently of each other. Residues that affect function are scattered haphazardly throughout the IMDH structure. We screened for compensatory mutations at three sites, all of which lie near the active site and all of which are among the least active mutants. No compensatory mutations were found at two sites, indicating that a single site may engage in compound epistatic interactions. One complete and three partial compensatory mutations of the third site are remote and lie in a different domain. This demonstrates that epistatic interactions can occur between distant (>20 Å) sites. Phylogenetic analysis shows that incompatible mutations were fixed in different lineages.
Journal Article
A draft genome of Yersinia pestis from victims of the Black Death
by McPhee, Joseph B.; Wood, James; Burbano, Hernán A.
in 631/181/19, 631/208/212, 692/699/255/1318
2011
Reconstruction of Black Death genome
The latest DNA recovery and sequencing technologies have been used to reconstruct the genome of the Yersinia pestis bacterium responsible for the Black Death pandemic of bubonic plague that spread across Europe in the fourteenth century. The genome was pieced together from total DNA extracted from the skeletal remains of four individuals excavated from a large cemetery on the site of the Royal Mint in East Smithfield in London, where more than 2,000 plague victims were buried in 1348 and 1349. The draft genome sequence does not differ substantially from modern Y. pestis strains, providing no answer to the question of why the Black Death was more deadly than modern bubonic plague outbreaks.
Technological advances in DNA recovery and sequencing have drastically expanded the scope of genetic analyses of ancient specimens to the extent that full genomic investigations are now feasible and are quickly becoming standard [1]. This trend has important implications for infectious disease research because genomic data from ancient microbes may help to elucidate mechanisms of pathogen evolution and adaptation for emerging and re-emerging infections. Here we report a reconstructed ancient genome of Yersinia pestis at 30-fold average coverage from Black Death victims securely dated to episodes of pestilence-associated mortality in London, England, 1348–1350. Genetic architecture and phylogenetic analysis indicate that the ancient organism is ancestral to most extant strains and sits very close to the ancestral node of all Y. pestis commonly associated with human infection. Temporal estimates suggest that the Black Death of 1347–1351 was the main historical event responsible for the introduction and widespread dissemination of the ancestor to all currently circulating Y. pestis strains pathogenic to humans, and further indicate that contemporary Y. pestis epidemics have their origins in the medieval era. Comparisons against modern genomes reveal no unique derived positions in the medieval organism, indicating that the perceived increased virulence of the disease during the Black Death may not have been due to bacterial phenotype. These findings support the notion that factors other than microbial genetics, such as environment, vector dynamics and host susceptibility, should be at the forefront of epidemiological discussions regarding emerging Y. pestis infections.
Journal Article
A new way to contemplate Darwin's tangled bank: how DNA barcodes are reconnecting biodiversity science and biomonitoring
by Hajibabaei, Mehrdad; Fahner, Nicole A.; Golding, G. Brian
in Biodiversity, Computational Biology - methods, Conservation of Natural Resources - methods
2016
Encompassing the breadth of biodiversity in biomonitoring programmes has been frustrated by an inability to simultaneously identify large numbers of species accurately and in a timely fashion. Biomonitoring infers the state of an ecosystem from samples collected and identified using the best available taxonomic knowledge. The advent of DNA barcoding has now given way to the extraction of bulk DNA from mixed samples of organisms in environmental samples through the development of high-throughput sequencing (HTS). This DNA metabarcoding approach allows an unprecedented view of the true breadth and depth of biodiversity, but its adoption poses two important challenges. First, bioinformatics techniques must simultaneously perform complex analyses of large datasets and translate the results of these analyses to a range of users. Second, the insights gained from HTS need to be amalgamated with concepts such as Linnaean taxonomy and indicator species, which are less comprehensive but more intuitive. It is clear that we are moving beyond proof-of-concept studies to address the challenge of implementation of this new approach for environmental monitoring and regulation. Interpreting Darwin's ‘tangled bank’ through a DNA lens is now a reality, but the question remains: how can this information be generated and used reliably, and how does it relate to accepted norms in ecosystem study?
This article is part of the themed issue ‘From DNA barcodes to biomes’.
Journal Article
Spatial Patterns of Gene Expression in Bacterial Genomes
by Golding, G. Brian; Lato, Daniella F.
in Animal Genetics and Genomics, Bacteria, Biomedical and Life Sciences
2020
Gene expression in bacteria is a remarkably controlled and intricate process impacted by many factors. One such factor is the genomic position of a gene within a bacterial genome. Genes located near the origin of replication generally have a higher expression level, increased dosage, and are often more conserved than genes located farther from the origin of replication. The majority of the studies involved with these findings have only noted this phenomenon in a single gene or cluster of genes that was re-located to pre-determined positions within a bacterial genome. In this work, we look at the overall expression levels from eleven bacterial data sets from Escherichia coli, Bacillus subtilis, Streptomyces, and Sinorhizobium meliloti. We have confirmed that gene expression tends to decrease when moving away from the origin of replication in the majority of the replicons analysed in this study. This study sheds light on the impact of genomic location on molecular trends such as gene expression and highlights the importance of accounting for spatial trends in bacterial molecular analysis.
Journal Article
Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform
2015
Genetic information is a valuable component of biosystematics, especially specimen identification through the use of species-specific DNA barcodes. Although many genomics applications have shifted to High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies, sample identification (e.g., via DNA barcoding) is still most often done with Sanger sequencing. Here, we present a scalable double dual-indexing approach using an Illumina MiSeq platform to sequence DNA barcode markers. We achieved 97.3% success by using half of an Illumina MiSeq flowcell to obtain 658 base pairs of the cytochrome c oxidase I DNA barcode in 1,010 specimens from eleven orders of arthropods. Our approach recovers a greater proportion of DNA barcode sequences from individuals than does conventional Sanger sequencing, while at the same time reducing both per-specimen costs and labor time by nearly 80%. In addition, the use of HTS allows the recovery of multiple sequences per specimen, for deeper analysis of genetic variation in target gene regions.
Journal Article
Eighteenth century Yersinia pestis genomes reveal the long-term persistence of an historical plague focus
2016
The 14th–18th century pandemic of Yersinia pestis caused devastating disease outbreaks in Europe for almost 400 years. The reasons for plague’s persistence and abrupt disappearance in Europe are poorly understood, but could have been due to either the presence of now-extinct plague foci in Europe itself, or successive disease introductions from other locations. Here we present five Y. pestis genomes from one of the last European outbreaks of plague, from 1722 in Marseille, France. The lineage identified has not been found in any extant Y. pestis foci sampled to date, and has its ancestry in strains obtained from victims of the 14th century Black Death. These data suggest the existence of a previously uncharacterized historical plague focus that persisted for at least three centuries. We propose that this disease source may have been responsible for the many resurgences of plague in Europe following the Black Death.
A bacterium called Yersinia pestis is responsible for numerous human outbreaks of plague throughout history. It is carried by rats and other rodents and can spread to humans, causing what we conventionally refer to as plague. The most notorious of these plague outbreaks – the Black Death – claimed millions of lives in Europe in the mid-14th century. Several other plague outbreaks emerged in Europe over the next 400 years. Then, there was a large gap before the plague re-emerged as a threat in the 19th century, and it continues to infect humans today, though on a smaller scale. Scientists have extensively studied Y. pestis to understand its origin and how it evolved to become such a deadly threat. These studies led to the assumption that the plague outbreaks of the 14th–18th centuries likely originated in rodents in Asia and spread along trade routes to other parts of the world. However, it is not clear why the plague persisted in Europe for 400 years after the Black Death.
Could the bacteria have gained a foothold in local rodents instead of being reintroduced from Asia each time? If they did, why did they then disappear for such a long period from the end of the 18th century? To help answer these questions, Bos, Herbig et al. sequenced the DNA of Y. pestis samples collected from the teeth of five individuals who died of plague during the last major European outbreak of plague in 1722 in Marseille, France. The DNA sequences of these bacterial samples were then compared with the DNA sequences of modern-day Y. pestis and other historical samples of the bacteria. The results showed the bacteria in the Marseille outbreak likely evolved from the strain that caused the Black Death back in the 14th century. The comparisons showed that the strain isolated from the teeth is not found today, and may be extinct. This suggests that a historical reservoir for plague existed somewhere, perhaps in Asia, or perhaps in Europe itself, and was able to cause outbreaks up until the 18th century. Bos, Herbig et al.’s findings may help researchers trying to control the current outbreaks of the plague in Madagascar and other places.
Journal Article
Prediction of plant lncRNA by ensemble machine learning classifiers
by Weretilnyk, Elizabeth A.; Golding, G. Brian; Simopoulos, Caitlin M. A.
in Animal Genetics and Genomics, Biomedical and Life Sciences, Classifier
2018
Background
In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation.
Results
Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83%, with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical proteins. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified.
Conclusions
This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
Journal Article
Are similarity- or phylogeny-based methods more appropriate for classifying internal transcribed spacer (ITS) metagenomic amplicons?
2011
The internal transcribed spacer (ITS) of the nuclear ribosomal DNA region is a widely used species marker for plants and fungi. Recent metagenomic studies using next-generation sequencing, however, generate only partial ITS sequences. Here we compare the performance of partial and full-length ITS sequences with several classification methods. We compiled a full-length ITS data set and created short fragments to simulate the read lengths commonly recovered from current next-generation sequencing platforms. We compared recovery, erroneous recovery, and coverage for the following methods: best BLAST hit classification, MEGAN classification, and automated phylogenetic assignment using the Statistical Assignment Program (SAP). We found that summarizing results with more inclusive taxonomic ranks increased recovery and reduced erroneous recovery. The similarity-based methods BLAST and MEGAN performed consistently across most fragment lengths. Using a phylogeny-based method, SAP runs with queries 400 bp or longer worked best. Overall, BLAST had the highest recovery rates and MEGAN had the lowest erroneous recovery rates. A high-throughput ITS classification method should be selected, taking into consideration read length, an acceptable tradeoff between maximizing the total number of classifications and minimizing the number of erroneous classifications, and the computational speed of the assignment method.
Journal Article
Explainability of Protein Deep Learning Models
2025
Protein embeddings are the new main source of information about proteins, producing state-of-the-art solutions to many problems, including protein interaction prediction, a fundamental issue in proteomics. Understanding the embeddings and what causes the interactions is very important, as these models lack transparency due to their black-box nature. In the first study of its kind, we investigate the inner workings of these models using XAI (explainable AI) approaches. We perform extensive testing (3.3 TB of total data) involving nine of the best-known XAI methods on two problems: (i) the prediction of protein interaction sites using the current top method, Seq-InSite, and (ii) the production of protein embedding vectors using three methods, ProtBERT, ProtT5, and Ankh. The results are evaluated in terms of their ability to correlate with six basic amino acid properties—aromaticity, acidity/basicity, hydrophobicity, molecular mass, van der Waals volume, and dipole moment—as well as the propensity for interaction with other proteins, the impact of distant residues, and the infidelity scores of the XAI methods. The results are unexpected. Some XAI methods are much better than others at discovering essential information. Simple methods can be as good as advanced ones. Different protein embedding vectors can capture distinct properties, indicating significant room for improvement in embedding quality.
Journal Article
LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data
by Porter, Teresita M.; Hajibabaei, Mehrdad; Rudar, Josip
in Algorithms, Analysis, Approximation
2022
Background
Identification of biomarkers, which are measurable characteristics of biological datasets, can be challenging. Although amplicon sequence variants (ASVs) can be considered potential biomarkers, identifying important ASVs in high-throughput sequencing datasets is challenging. Noise, algorithmic failures to account for specific distributional properties, and feature interactions can complicate the discovery of ASV biomarkers. In addition, these issues can impact the replicability of various models and elevate false-discovery rates. Contemporary machine learning approaches can be leveraged to address these issues. Ensembles of decision trees are particularly effective at classifying the types of data commonly generated in high-throughput sequencing (HTS) studies due to their robustness when the number of features in the training data is orders of magnitude larger than the number of samples. In addition, when combined with appropriate model introspection algorithms, machine learning algorithms can also be used to discover and select potential biomarkers. However, the construction of these models could introduce various biases which potentially obfuscate feature discovery.
Results
We developed a decision tree ensemble, LANDMark, which uses oblique and non-linear cuts at each node. In synthetic and toy tests LANDMark consistently ranked as the best classifier and often outperformed the Random Forest classifier. When trained on the full metabarcoding dataset obtained from Canada’s Wood Buffalo National Park, LANDMark was able to create highly predictive models and achieved an overall balanced accuracy score of 0.96 ± 0.06. The use of recursive feature elimination did not impact LANDMark’s generalization performance and, when trained on data from the BE amplicon, it was able to outperform the Linear Support Vector Machine, Logistic Regression models, and Stochastic Gradient Descent models (p ≤ 0.05). Finally, LANDMark distinguishes itself due to its ability to learn smoother non-linear decision boundaries.
Conclusions
Our work introduces LANDMark, a meta-classifier which blends the characteristics of several machine learning models into a decision tree and ensemble learning framework. To our knowledge, this is the first study to apply this type of ensemble approach to amplicon sequencing data and we have shown that analyzing these datasets using LANDMark can produce highly predictive and consistent models.
Journal Article