Catalogue Search | MBRL

Identification of the Pangenome and Its Components in 14 Distinct Aggregatibacter actinomycetemcomitans Strains by Comparative Genomic Analysis

by Asikainen, Sirkka; Chen, Casey; Bumgarner, Roger E. in Actinobacillus - genetics; Actinobacillus - pathogenicity; Aggregatibacter actinomycetemcomitans

2011

Aggregatibacter actinomycetemcomitans is genetically heterogeneous and comprises distinct clonal lineages that may have different virulence potentials. However, limited information on the strain-to-strain genomic variations is available. The genome sequences of 11 A. actinomycetemcomitans strains (serotypes a-f) were generated de novo, annotated and combined with three previously sequenced genomes (serotypes a-c) for comparative genomic analysis. Two major groups were identified: serotypes a, d, e, and f, and serotypes b and c. A serotype e strain was found to be distinct from both groups. The size of the pangenome was 3,301 genes, which included 2,034 core genes and 1,267 flexible genes. The number of core genes is estimated to stabilize at 2,060, while the size of the pangenome is estimated to increase by 16 genes with every additional strain sequenced in the future. Within each strain 16.7-29.4% of the genome belonged to the flexible gene pool. Between any two strains 0.4-19.5% of the genomes were different. The genomic differences were occasionally greater for strains of the same serotype than for strains of different serotypes. Furthermore, 171 genomic islands were identified. Cumulatively, 777 strain-specific genes were found on these islands and represented 61% of the flexible gene pool. Substantial genomic differences were detected among A. actinomycetemcomitans strains. Genomic islands account for more than half of the flexible genes. The phenotype and virulence of A. actinomycetemcomitans may not be defined by any single strain. Moreover, the genomic variation within each clonal lineage of A. actinomycetemcomitans (as defined by serotype grouping) may be greater than between clonal lineages. The large genomic data set in this study will be useful to further examine the molecular basis of variable virulence among A. actinomycetemcomitans strains.
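
The gene counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names and the `projected_pangenome` helper are not from the paper, and the linear projection simply applies the stated estimate of 16 new genes per additional strain, which is an assumption about how that estimate is meant to be used.

```python
# Figures quoted in the abstract.
core_genes = 2034
flexible_genes = 1267
island_genes = 777  # strain-specific genes found on genomic islands

# Pangenome = core genes + flexible genes.
pangenome_size = core_genes + flexible_genes

# Share of the flexible gene pool located on genomic islands.
island_share = island_genes / flexible_genes


def projected_pangenome(extra_strains, growth_per_strain=16):
    """Hypothetical helper: linear extrapolation of pangenome size.

    Assumes a constant gain of `growth_per_strain` genes per newly
    sequenced strain, as the abstract's estimate suggests.
    """
    return pangenome_size + growth_per_strain * extra_strains


print(pangenome_size)
print(round(island_share * 100))
print(projected_pangenome(10))
```

The sum reproduces the reported pangenome size of 3,301 genes, and the island share rounds to the reported 61%.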

Journal Article

Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks

by Drees, Becky; Schadt, Eric E; Smith, Erin N in Agriculture; Animal Genetics and Genomics; Bayes Theorem

2008

Eric Schadt and colleagues report the construction of yeast regulatory networks from multiple sources of large-scale functional genomic data, and show that a network constructed from the integration of genotypic, transcription factor binding site, and protein–protein interaction data is the most predictive. A key goal of biology is to construct networks that predict complex system behavior. We combine multiple types of molecular data, including genotypic, expression, transcription factor binding site (TFBS), and protein–protein interaction (PPI) data previously generated from a number of yeast experiments, in order to reconstruct causal gene networks. Networks based on different types of data are compared using metrics devised to assess the predictive power of a network. We show that a network reconstructed by integrating genotypic, TFBS and PPI data is the most predictive. This network is used to predict causal regulators responsible for hot spots of gene expression activity in a segregating yeast population. We also show that the network can elucidate the mechanisms by which causal regulators give rise to larger-scale changes in gene expression activity. We then prospectively validate predictions, providing direct experimental evidence that predictive networks can be constructed by integrating multiple, appropriate data types.

Journal Article

Stitching together Multiple Data Dimensions Reveals Interacting Metabolomic and Transcriptomic Networks That Modulate Cell Regulation

by Schadt, Eric E.; Brem, Rachel B.; Bumgarner, Roger E. in Biology; Biosynthetic Pathways - genetics; Cellular control mechanisms

2012

Cells employ multiple levels of regulation, including transcriptional and translational regulation, that drive core biological processes and enable cells to respond to genetic and environmental changes. Small-molecule metabolites are one category of critical cellular intermediates that can influence as well as be a target of cellular regulations. Because metabolites represent the direct output of protein-mediated cellular processes, endogenous metabolite concentrations can closely reflect cellular physiological states, especially when integrated with other molecular-profiling data. Here we develop and apply a network reconstruction approach that simultaneously integrates six different types of data: endogenous metabolite concentration, RNA expression, DNA variation, DNA-protein binding, protein-metabolite interaction, and protein-protein interaction data, to construct probabilistic causal networks that elucidate the complexity of cell regulation in a segregating yeast population. Because many of the metabolites are found to be under strong genetic control, we were able to employ a causal regulator detection algorithm to identify causal regulators of the resulting network that elucidated the mechanisms by which variations in their sequence affect gene expression and metabolite concentrations. We examined all four expression quantitative trait loci (eQTL) hot spots with colocalized metabolite QTLs, two of which recapitulated known biological processes, while the other two elucidated novel putative biological mechanisms for the eQTL hot spots.

Journal Article

Variable Nitrogen Fixation in Wild Populus

by Sher, Andrew W.; Fleck, Neil D.; Bumgarner, Roger E. in Acetylene; Acetylene reduction; Bacteria

2016

The microbiome of plants is diverse, and like that of animals, is important for overall health and nutrient acquisition. In legumes and actinorhizal plants, a portion of essential nitrogen (N) is obtained through symbiosis with nodule-inhabiting, N2-fixing microorganisms. However, a variety of non-nodulating plant species can also thrive in natural, low-N settings. Some of these species may rely on endophytes, microorganisms that live within plants, to fix N2 gas into usable forms. Here we report the first direct evidence of N2 fixation in the early successional wild tree, Populus trichocarpa, a non-leguminous tree, from its native riparian habitat. In order to measure N2 fixation, surface-sterilized cuttings of wild poplar were assayed using both 15N2 incorporation and the commonly used acetylene reduction assay. The 15N label was incorporated at high levels in a subset of cuttings, suggesting a high level of N-fixation. Similarly, acetylene was reduced to ethylene in some samples. The microbiota of the cuttings was highly variable, both in numbers of cultured bacteria and in genetic diversity. Our results indicated that associative N2-fixation occurred within wild poplar and that a non-uniformity in the distribution of endophytic bacteria may explain the variability in N-fixation activity. These results point to the need for molecular studies to decipher the required microbial consortia and conditions for effective endophytic N2-fixation in trees.

Journal Article

Genomic Islands Shape the Genetic Background of Both JP2 and Non-JP2 Aggregatibacter actinomycetemcomitans

by Bumgarner, Roger E.; Chen, Casey; Kittichotirat, Weerayuth in A. actinomycetemcomitans; Actinobacillus actinomycetemcomitans; Aggregatibacter actinomycetemcomitans

2022

Aggregatibacter actinomycetemcomitans is a periodontal pathogen associated with periodontitis. This species exhibits substantial variations in gene content among different isolates and has different virulence potentials. This study examined the distribution of genomic islands and their insert sites among genetically diverse A. actinomycetemcomitans strains by comparative genomic analysis. The results showed that some islands, presumably more ancient, were found across all genetic clades of A. actinomycetemcomitans. In contrast, other islands were specific to individual clades or a subset of clades and may have been acquired more recently. The islands for the biogenesis of serotype-specific antigens comprise distinct genes located in different loci for serotype a and serotype b–f strains. Islands that encode the same cytolethal distending toxins appear to have been acquired via distinct mechanisms in different loci for clade b/c and for clade a/d/e/f strains. The functions of numerous other islands remain to be elucidated. JP2 strains represent a small branch within clade b, one of the five major genetic clades of A. actinomycetemcomitans. In conclusion, the complex process of genomic island acquisition, deletion, and modification is a significant force in the genetic divergence of A. actinomycetemcomitans. Assessing the genetic distinctions between JP2 and non-JP2 strains must consider the landscape of genetic variations shaped by evolution.

Journal Article

Construction of regulatory networks using expression time-series data of a genotyped population

by Mittler, John E; Schadt, Eric E; Yeung, Ka Yee in Algorithms; Bayes Theorem; Bayesian networks

2011

The inference of regulatory and biochemical networks from large-scale genomics data is a basic problem in molecular biology. The goal is to generate testable hypotheses of gene-to-gene influences and subsequently to design bench experiments to confirm these network predictions. Coexpression of genes in large-scale gene-expression data implies coregulation and potential gene–gene interactions, but provides little information about the direction of influences. Here, we use both time-series data and genetics data to infer directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events and genetics data allow us to map DNA variations to variations at the RNA level. We generate microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. We develop a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We show that our inferred network recovers existing and novel regulatory relationships. Following network construction, we generate independent microarray data on selected deletion mutants to prospectively test network predictions. We demonstrate the potential of our network to discover de novo transcription-factor binding sites. Applying our construction method to previously published data demonstrates that our method is competitive with leading network construction algorithms in the literature.

Journal Article

Temporal genetic association and temporal genetic causality methods for dissecting complex networks

by Schadt, Eric E.; Bumgarner, Roger E.; Hirsch, Jeanne P. in 631/114/2401; 631/553/2711; Algorithms

2018

A large amount of panomic data has been generated in populations for understanding causal relationships in complex biological systems. Both genetic and temporal models can be used to establish causal relationships among molecular, cellular, or phenotypical traits, but with limitations. To fully utilize high-dimension temporal and genetic data, we develop a multivariate polynomial temporal genetic association (MPTGA) approach for detecting temporal genetic loci (teQTLs) of quantitative traits monitored over time in a population and a temporal genetic causality test (TGCT) for inferring causal relationships between traits linked to the locus. We apply MPTGA and TGCT to simulated data sets and a yeast F2 population in response to rapamycin, and demonstrate increased power to detect teQTLs. We identify a teQTL hotspot locus interacting with rapamycin treatment, infer putative causal regulators of the teQTL hotspot, and experimentally validate RRD1 as the causal regulator for this teQTL hotspot. Temporal omics data have the potential to dissect complex biological networks. Here the authors develop methods for detecting temporal genetic loci (teQTLs) of quantitative traits monitored over time and inferring causal relationships between traits linked to the locus.

Journal Article

Comparison of Major and Minor Viral SNPs Identified through Single Template Sequencing and Pyrosequencing in Acute HIV-1 Infection

by Deng, Wenjie; Iyer, Shyamala; Rolland, Morgane in Algorithms; Conserved sequence; Deoxyribonucleic acid

2015

Massively parallel sequencing (MPS) technologies, such as 454-pyrosequencing, allow for the identification of variants in sequence populations at lower levels than consensus sequencing and most single-template Sanger sequencing experiments. We sought to determine if the greater depth of population sampling attainable using MPS technology would allow detection of minor variants in HIV founder virus populations very early in infection in instances where Sanger sequencing detects only a single variant. We compared single nucleotide polymorphisms (SNPs) during acute HIV-1 infection from 32 subjects using both single-template Sanger sequencing and 454-pyrosequencing. Pyrosequences from a median of 2400 viral templates per subject, encompassing 40% of the HIV-1 genome, were compared to a median of five individually amplified near full-length viral genomes sequenced using Sanger technology. There was no difference in the consensus nucleotide sequences over the 3.6 kb compared in 84% of the subjects infected with single founders and 33% of subjects infected with multiple founder variants; among the subjects with disagreements, mismatches were found in less than 1% of the sites evaluated (of a total of nearly 117,000 sites across all subjects). The majority of the SNPs observed only in pyrosequences were present at less than 2% of the subject's viral sequence population. These results demonstrate the utility of the Sanger approach for the study of early HIV infection, provide guidance regarding the design, utility and limitations of population sequencing from variable template sources, and emphasize parameters for improving the interpretation of massively parallel sequencing data to address important questions regarding target sequence evolution.

Journal Article

Cellular Transcriptional Profiling in Influenza A Virus-Infected Lung Epithelial Cells: The Role of the Nonstructural NS1 Protein in the Evasion of the Host Innate Defense and Its Potential Contribution to Pandemic Influenza

by García-Sastre, Adolfo; Geiss, Gary K.; Palese, Peter in Antivirals; Biological Sciences; Cell Line

2002

The NS1 protein of influenza A virus contributes to viral pathogenesis, primarily by enabling the virus to disarm the host cell type IFN defense system. We examined the downstream effects of NS1 protein expression during influenza A virus infection on global cellular mRNA levels by measuring expression of over 13,000 cellular genes in response to infection with wild-type and mutant viruses in human lung epithelial cells. Influenza A/PR/8/34 virus infection resulted in a significant induction of genes involved in the IFN pathway. Deletion of the viral NS1 gene increased the number and magnitude of expression of cellular genes implicated in the IFN, NF-κB, and other antiviral pathways. Interestingly, different IFN-induced genes showed different sensitivities to NS1-mediated inhibition of their expression. A recombinant virus with a C-terminal deletion in its NS1 gene induced an intermediate cellular mRNA expression pattern between wild-type and NS1 knockout viruses. Most significantly, a virus containing the 1918 pandemic NS1 gene was more efficient at blocking the expression of IFN-regulated genes than its parental influenza A/WSN/33 virus. Taken together, our results suggest that the cellular response to influenza A virus infection in human lung cells is significantly influenced by the sequence of the NS1 gene, demonstrating the importance of the NS1 protein in regulating the host cell response triggered by virus infection.

Journal Article

Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data

by Annest, Amalia; Yeung, Ka Yee; Raftery, Adrian E in Algorithms; Bayes Theorem; Bayesian statistical decision theory

2009

Background: Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes. Results: We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139). Conclusion: The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.
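
The abstract describes combining contending models by a weighted average based on their posterior probabilities. A minimal sketch of that general model-averaging idea follows. It is not the authors' iterative BMA implementation: the function name `bma_average` and the example risk scores and posterior probabilities are hypothetical, chosen only to illustrate the weighting step.

```python
def bma_average(predictions, posterior_probs):
    """Average per-model predictions, weighted by posterior model probability.

    `predictions` and `posterior_probs` are equal-length sequences, one
    entry per contending model. Probabilities are renormalised so the
    weights sum to 1 over the retained models.
    """
    if len(predictions) != len(posterior_probs):
        raise ValueError("need one posterior probability per model")
    total = sum(posterior_probs)
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    weights = [p / total for p in posterior_probs]
    return sum(w * y for w, y in zip(weights, predictions))


# Hypothetical example: three models' risk scores for one patient.
model_risk_scores = [0.2, 0.5, 0.8]
model_posteriors = [0.5, 0.3, 0.2]
averaged_risk = bma_average(model_risk_scores, model_posteriors)
print(averaged_risk)
```

Weighting by posterior probability means that no single selected model determines the prediction, which is one way to reflect the model uncertainty the abstract refers to.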

Journal Article
