Catalogue Search | MBRL

A robust benchmark for detection of germline large deletions and insertions

by Alkan, Can , Hajirasouliha, Iman , Ghaffari, Noushin in Benchmarks , Consortia , Deoxyribonucleic acid

2020

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping. Detection of structural variants in the human genome is facilitated by a benchmark set of large deletions and insertions.

Journal Article

Speeding genomic island discovery through systematic design of reference database composition

by Ghaffari, Noushin , Yu, Steven L. , Mageeney, Catherine M. in Archaeal taxonomy , Bacterial genomics , BASIC BIOLOGICAL SCIENCES

2024

Genomic islands (GIs) are mobile genetic elements that integrate site-specifically into bacterial chromosomes, bearing genes that affect phenotypes such as pathogenicity and metabolism. GIs typically occur sporadically among related bacterial strains, enabling comparative genomic approaches to GI identification. For a candidate GI in a query genome, the number of reference genomes with a precise deletion of the GI serves as a support value for the GI. Our comparative software for GI identification was slowed by our original use of large reference genome databases (DBs). Here we explore smaller species-focused DBs. With increasing DB size, recovery of our reliable prophage GI calls reached a plateau, while recovery of less reliable GI calls (FPs) increased rapidly as DB sizes exceeded ~500 genomes; i.e., overlarge DBs can increase FP rates. Paradoxically, relative to prophages, FPs were both more frequently supported only by genomes outside the species and more frequently supported only by genomes inside the species; this may be due to their generally lower support values. Setting a DB size limit for our SMAll Ranked Tailored (SMART) DB design speeded runtime ~65-fold. Strictly intra-species DBs would tend to lower yields of prophages for small species (with few genomes available); simulations with large species showed that this could be partially overcome by reaching outside the species to closely related taxa, without an FP burden. Employing such taxonomic outreach in DB design generated redundancy in the DB set; as few as 2984 DBs were needed to cover all 47894 prokaryotic species. Runtime decreased dramatically with SMART DB design, with only minor losses of prophages. We also describe potential utility in other comparative genomics projects.

Journal Article

Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare

by Ghaffari, Noushin , Sawyer, Jason , Dindot, Scott V in Animal Genetics and Genomics , Animals , Biomedical and Life Sciences

2012

Background: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Results: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. Conclusions: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

Journal Article

HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment

by Ghaffari, Noushin , Johnson, Charles D. , Datta, Aniruddha in Algorithms , Animal Genetics and Genomics , Assemblies

2017

Background: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers. Methods: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology. Results: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources. Conclusions: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.

Journal Article

Author Correction: A robust benchmark for detection of germline large deletions and insertions

by Alkan, Can , Hajirasouliha, Iman , Ghaffari, Noushin

2020

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Journal Article

Inter-Individual Variation in DNA Methylation Patterns across Two Tissues and Leukocytes in Mature Brahman Cattle

by Littlejohn, Brittni P. , Riley, David G. , Cilkiz, Kubra Z. in Amygdala , Analysis , anterior pituitary

2023

Quantifying the natural inter-individual variation in DNA methylation patterns is important for identifying its contribution to phenotypic variation, but also for understanding how the environment affects variability, and for incorporation into statistical analyses. The inter-individual variation in DNA methylation patterns in female cattle and the effect that a prenatal stressor has on such variability have yet to be quantified. Thus, the objective of this study was to utilize methylation data from mature Brahman females to quantify the inter-individual variation in DNA methylation. Pregnant Brahman cows were transported for 2 h durations at days 60 ± 5; 80 ± 5; 100 ± 5; 120 ± 5; and 140 ± 5 of gestation. A non-transport group was maintained as a control. Leukocytes, amygdala, and anterior pituitary glands were harvested from eight cows born from the non-transport group (Control) and six from the transport group (PNS) at 5 years of age. The DNA harvested from the anterior pituitary contained the greatest variability in DNA methylation of cytosine-phosphate-guanine (mCpG) sites from both the PNS and Control groups, and the amygdala had the least. Numerous variable mCpG sites were associated with retrotransposable elements and highly repetitive regions of the genome. Some of the genomic features that had high variation in DNA methylation are involved in immune responses, signaling, responses to stimuli, and metabolic processes. The small overlap of highly variable CpG sites and features between tissues and leukocytes supports the role of variable DNA methylation in regulating tissue-specific gene expression. Many of the CpG sites that exhibited high variability in DNA methylation were common between the PNS and Control groups within a tissue, but there was little overlap in genomic features with high variability. The interaction between the prenatal environment and the genome could be responsible for the differences in location of the variable DNA methylation.

Journal Article

Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens

by Braga-Neto, Ulisses M , Johnson, Charles D , Wang, Hui in Algorithms , Animals , Bioinformatics

2011

Background: RNA-Seq is a recently developed high-throughput sequencing technology for profiling the entire transcriptome in any organism. It has several major advantages over current hybridization-based approaches such as microarrays. However, the cost per sample by RNA-Seq is still prohibitive for most laboratories. With continued improvement in sequence output, it would be cost-effective if multiple samples are multiplexed and sequenced in a single lane with sufficient transcriptome coverage. The objective of this analysis is to evaluate what sequencing depth might be sufficient to interrogate gene expression profiling in the chicken by RNA-Seq. Results: Two cDNA libraries from chicken lungs were sequenced initially, and 4.9 million (M) and 1.6 M (60 bp) reads were generated, respectively. With significant improvements in sequencing technology, two technical replicate cDNA libraries were re-sequenced. Totals of 29.6 M and 28.7 M (75 bp) reads were obtained with the two samples. More than 90% of annotated genes were detected in the data sets with 28.7-29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442. To evaluate the appropriate depth needed for mRNA profiling, a random sampling method was used to generate different numbers of reads from each sample. There was a significant increase in correlation coefficients from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant genes. No significant improvement was observed from the depth of 10 M to 20 M (75 bp) reads. Conclusion: The analysis from the current study demonstrated that 30 M (75 bp) reads are sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes, and RNA-Seq at this depth can serve as a replacement for microarray technology. Furthermore, the depth of sequencing had a significant impact on measuring gene expression of low-abundance genes. Finally, the combination of experimental and simulation approaches is a powerful approach to address the relationship between the depth of sequencing and transcriptome coverage.

Journal Article

Novel transcriptome assembly and improved annotation of the whiteleg shrimp (Litopenaeus vannamei), a dominant crustacean in global seafood mariculture

by Garcia-Orozco, Karina D. , Blood, Philip D. , Johnson, Charles D.

2014

We present a new transcriptome assembly of the Pacific whiteleg shrimp (Litopenaeus vannamei), the species most farmed for human consumption. Its functional annotation, a substantial improvement over previous ones, is provided freely. RNA-Seq with Illumina HiSeq technology was used to analyze samples extracted from shrimp abdominal muscle, hepatopancreas, gills and pleopods. We used the Trinity and Trinotate software suites for transcriptome assembly and annotation, respectively. The quality of this assembly and the affiliated targeted homology searches greatly enrich the curated transcripts currently available in public databases for this species. Comparison with the model arthropod Daphnia allows some insights into defining characteristics of decapod crustaceans. This large-scale gene discovery gives the broadest depth yet to the annotated transcriptome of this important species and should be of value to ongoing genomics and immunogenetic resistance studies in this shrimp of paramount global economic importance.

Journal Article

Evaluation of Prenatal Transportation Stress on DNA Methylation Axis Tissues of Mature Brahman Cows

by Cilkiz, Kubra Z , Riggs, Penny K , Randel, Ronald D in Analysis , Beef cattle , Brain

2025

Background/Objectives: The experience of prenatal stress results in various physiological disorders due to an alteration of an offspring’s methylome and transcriptome. The objective of this study was to determine whether PNS affects DNA methylation (DNAm) and gene expression in the stress axis tissues of mature Brahman cows. Methods: Samples were collected from the paraventricular nucleus (PVN), anterior pituitary (PIT), and adrenal cortex (AC) of 5-year-old Brahman cows that were prenatally exposed to either transportation stress (PNS, n = 6) or were not transported (Control, n = 8). The isolated DNA and RNA samples were, respectively, used for methylation and RNA-Seq analyses. A gene ontology and KEGG pathway enrichment analysis of each data set within each sample tissue was conducted with the DAVID Functional Annotation Tool. Results: The DNAm analysis revealed 3, 64, and 99 hypomethylated and 2, 93, and 90 hypermethylated CpG sites (FDR < 0.15) within the PVN, PIT, and AC, respectively. The RNA-Seq analysis revealed 6, 25, and 5 differentially expressed genes (FDR < 0.15) in the PVN, PIT, and AC, respectively, that were up-regulated in the PNS group relative to the Control group, as well as 24 genes in the PIT that were down-regulated. Based on the enrichment analysis, several developmental and cellular processes, such as maintenance of the actin cytoskeleton, cell motility, signal transduction, neurodevelopment, and synaptic function, were potentially modulated. Conclusions: The methylome and transcriptome were altered in the stress axis tissues of mature cows that had been exposed to prenatal transportation stress. These findings are relevant to understanding how prenatal experiences may affect postnatal neurological functions.

Journal Article

A Colletotrichum graminicola mutant deficient in the establishment of biotrophy reveals early transcriptional events in the maize anthracnose disease interaction

by Johnson, Charles D. , Buiate, Ester A. S. , Schwartz, Scott in Animal Genetics and Genomics , Ascomycota , Biomedical and Life Sciences

2016

Background: Colletotrichum graminicola is a hemibiotrophic fungal pathogen that causes maize anthracnose disease. It progresses through three recognizable phases of pathogenic development in planta: melanized appressoria on the host surface prior to penetration; biotrophy, characterized by intracellular colonization of living host cells; and necrotrophy, characterized by host cell death and symptom development. A “Mixed Effects” Generalized Linear Model (GLM) was developed and applied to an existing Illumina transcriptome dataset, substantially increasing the statistical power of the analysis of C. graminicola gene expression during infection and colonization. Additionally, the in planta transcriptome of the wild-type was compared with that of a mutant strain impaired in the establishment of biotrophy, allowing detailed dissection of events occurring specifically during penetration, and during early versus late biotrophy. Results: More than 2000 fungal genes were differentially transcribed during appressorial maturation, penetration, and colonization. Secreted proteins, secondary metabolism genes, and membrane receptors were over-represented among the differentially expressed genes, suggesting that the fungus engages in an intimate and dynamic conversation with the host, beginning prior to penetration. This communication process probably involves reception of plant signals triggering subsequent developmental progress in the fungus, as well as production of signals that induce responses in the host. Later phases of biotrophy were more similar to necrotrophy, with increased production of secreted proteases, inducers of plant cell death, hydrolases, and membrane bound transporters for the uptake and egress of potential toxins, signals, and nutrients. Conclusions: This approach revealed, in unprecedented detail, fungal genes specifically expressed during critical phases of host penetration and biotrophic establishment. Many encoded secreted proteins, secondary metabolism enzymes, and receptors that may play roles in host-pathogen communication necessary to promote susceptibility, and thus may provide targets for chemical or biological controls to manage this important disease. The differentially expressed genes could be used as ‘landmarks’ to more accurately identify developmental progress in compatible versus incompatible interactions involving genetic variants of both host and pathogen.

Journal Article
