Catalogue Search | MBRL

Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis

by Zhang, Xiao , Hou, Lifang , Du, Pan in Algorithms , Bioinformatics , Biomedical and Life Sciences

2010

Background High-throughput profiling of the DNA methylation status of CpG islands is crucial to understanding the epigenetic regulation of genes. The microarray-based Infinium methylation assay by Illumina is one platform for low-cost, high-throughput methylation profiling. Both Beta-value and M-value statistics have been used as metrics to measure methylation levels. However, there are no detailed studies of their relationship or of their respective strengths and limitations. Results We demonstrate that the Beta-value and M-value are related by a logit transformation, and show that the Beta-value method has severe heteroscedasticity for highly methylated or unmethylated CpG sites. To evaluate the performance of the Beta-value and M-value methods for identifying differentially methylated CpG sites, we designed a methylation titration experiment. The evaluation results show that the M-value method provides much better performance in terms of Detection Rate (DR) and True Positive Rate (TPR) for both highly methylated and unmethylated CpG sites. Imposing a minimum threshold on the methylation difference can improve the performance of the M-value method but not the Beta-value method. We also provide guidance on how to select the threshold of methylation differences. Conclusions The Beta-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels. Therefore, we recommend using the M-value method for conducting differential methylation analysis and including the Beta-value statistics when reporting the results to investigators.

Journal Article
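
The abstract states that the Beta-value and the M-value are related by a logit transformation. The following is a minimal Python sketch of that relationship, assuming the common intensity-based definitions Beta = meth / (meth + unmeth + offset) and M = log2((meth + offset) / (unmeth + offset)). The offset values (100 and 1) and all helper names are assumptions for illustration, not an Illumina or Bioconductor API. Once the offsets are ignored, the M-value reduces to log2(Beta / (1 - Beta)).

```python
import math

# Hypothetical helpers illustrating the Beta-value / M-value relationship.
# The default offsets are assumed values chosen for illustration only.


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Fraction of signal from the methylated probe; offset stabilises low totals."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(meth: float, unmeth: float, offset: float = 1.0) -> float:
    """Base-2 log ratio of methylated to unmethylated intensity."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return math.log2((m + offset) / (u + offset))


def beta_to_m(beta: float) -> float:
    """Base-2 logit transform; ignores the offsets used above."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse of beta_to_m."""
    return 2.0 ** m / (2.0 ** m + 1.0)


if __name__ == "__main__":
    print(beta_to_m(0.5))  # equal methylated and unmethylated signal
    print(m_to_beta(2.0))  # methylated signal four times the unmethylated
    # The same 0.05 Beta step is much larger on the M scale near the extremes.
    print(beta_to_m(0.95) - beta_to_m(0.90))
    print(beta_to_m(0.55) - beta_to_m(0.50))
```

The logit stretches the scale near 0 and 1. As a result, equal-sized Beta differences at the extremes correspond to larger M-value differences than they do mid-range. This is one way to see why variance is compressed on the Beta scale for highly methylated or unmethylated sites.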

ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data

by Pagès, Hervé , Green, Michael R , Lapointe, David S in Algorithms , Applications software , Binding Sites

2010

Background Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. 
Allowing users to pass their own annotation data, such as a different chromatin immunoprecipitation (ChIP) preparation or a dataset from the literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database.

Journal Article
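
One output described above is the distribution of distances from each peak to its nearest gene. The following is a minimal Python sketch of that idea, assuming peaks are (chrom, start, end) tuples and genes are (chrom, TSS, name) tuples. The function names and data layout are hypothetical and do not reflect ChIPpeakAnno's actual R interface or its annotation options. The distance reported is the peak midpoint minus the TSS in genomic coordinates, ignoring gene strand.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical sketch of nearest-TSS peak annotation; not ChIPpeakAnno's API.


def build_tss_index(genes):
    """genes: iterable of (chrom, tss, name). Returns chrom -> (positions, names), sorted by TSS."""
    by_chrom = defaultdict(list)
    for chrom, tss, name in genes:
        by_chrom[chrom].append((tss, name))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([tss for tss, _ in entries], [name for _, name in entries])
    return index


def nearest_gene(peak, index):
    """Return (gene name, peak midpoint - TSS) for the closest TSS, or None.

    The distance is in genomic coordinates and ignores gene strand.
    """
    chrom, start, end = peak
    if chrom not in index:
        return None
    positions, names = index[chrom]
    mid = (start + end) // 2
    i = bisect_left(positions, mid)
    # Only the TSSs immediately left and right of the midpoint can be closest.
    best = min(range(max(i - 1, 0), min(i + 1, len(positions))),
               key=lambda j: abs(positions[j] - mid))
    return names[best], mid - positions[best]


if __name__ == "__main__":
    genes = [("chr1", 1000, "GENE_A"), ("chr1", 5000, "GENE_B"), ("chr2", 300, "GENE_C")]
    index = build_tss_index(genes)
    peaks = [("chr1", 4000, 4200), ("chr2", 100, 200), ("chr3", 10, 20)]
    print([nearest_gene(p, index) for p in peaks])
```

Binary search keeps each lookup logarithmic in the number of genes on a chromosome. The collected distances could then be passed to any plotting library to draw the histogram described in the abstract.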

Genome-Wide DNA Methylation Indicates Silencing of Tumor Suppressor Genes in Uterine Leiomyoma

by Du, Pan , Navarro, Antonia , Lin, Simon M. in Abnormalities , Adult , African Americans

2012

Uterine leiomyomas, or fibroids, represent the most common benign tumor of the female reproductive tract. Fibroids become symptomatic in 30% of all women and up to 70% of African American women of reproductive age. Epigenetic dysregulation of individual genes has been demonstrated in leiomyoma cells; however, the in vivo genome-wide distribution of such epigenetic abnormalities remains unknown. We characterized and compared genome-wide DNA methylation and mRNA expression profiles in uterine leiomyoma and matched adjacent normal myometrial tissues from 18 African American women. We found 55 genes with differential promoter methylation and concomitant differences in mRNA expression in uterine leiomyoma versus normal myometrium. Eighty percent of the identified genes showed an inverse relationship between DNA methylation status and mRNA expression in uterine leiomyoma tissues, and the majority of genes (62%) displayed hypermethylation associated with gene silencing. We selected three genes, the known tumor suppressors KLF11, DLEC1, and KRT19, and verified promoter hypermethylation, mRNA repression, and protein expression using bisulfite sequencing, real-time PCR, and western blot. Incubation of primary leiomyoma smooth muscle cells with a DNA methyltransferase inhibitor restored KLF11, DLEC1, and KRT19 mRNA levels. These results suggest a possible functional role of promoter DNA methylation-mediated gene silencing in the pathogenesis of uterine leiomyoma in African American women.

Journal Article

Annotating the human genome with Disease Ontology

by Osborne, John D , Holko, Michelle , Chisholm, Rex L in Animal Genetics and Genomics , Biomedical and Life Sciences , Computational Biology - methods

2009

Background The human genome has been extensively annotated with Gene Ontology for biological functions, but has been only minimally annotated computationally for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive, disease-focused subset of UMLS, structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows comparisons between disease-containing databases and increases the accuracy of disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF dramatically increases the coverage of disease annotation of the human genome.

Journal Article
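
The validation above reports recall and precision against a reference gene collection. The following is a minimal Python sketch of those two standard measures, assuming annotations are represented as (gene, disease) pairs. The toy pairs are hypothetical and do not reproduce the paper's validation protocol or data.

```python
# Standard precision/recall over sets of (gene, disease) annotation pairs.
# The example pairs below are hypothetical toy data.


def precision_recall(predicted, gold):
    """Precision = TP / |predicted|; recall = TP / |gold|, where TP = |predicted & gold|."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = {("G1", "D1"), ("G2", "D1"), ("G3", "D2")}
    gold = {("G1", "D1"), ("G3", "D2"), ("G4", "D3"), ("G5", "D3")}
    print(precision_recall(predicted, gold))
```

In this toy case, two of the three predicted pairs are in the reference set, so precision is 2/3. Two of the four reference pairs are recovered, so recall is 0.5. A method can therefore have high precision but low recall, as the abstract reports for OMIM.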

Technical Reproducibility of Genotyping SNP Arrays Used in Genome-Wide Association Studies

by Vega, Silvia C. , Lin, Simon M. , Park, Kyunghee in Algorithms , Analysis , Arrays

2012

During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that have successfully identified common genetic variants associated with a variety of phenotypes. However, each identified genetic variant explains only a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordant results between independent GWAS indicate the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in SNP data and in the interpretation of GWAS results. Therefore, the reproducibility of two widely used genotyping platforms, from Affymetrix and Illumina, was assessed by analyzing four technical replicates from each of six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low-quality data were detected when comparing genotyping data from technical replicates, but they could not be detected using the vendors' quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating technical replicates into genotyping QC to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e., the odds ratio) and the minor allele frequencies are low.

Journal Article
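
The reproducibility assessment above rests on genotype concordance between technical replicates. The following is a minimal Python sketch of one common way to compute it, assuming genotype calls are already in canonical form (for example, "AA", "AB", "BB") and that SNPs with a missing call on either array are excluded. The no-call handling, the function names, and the example calls are assumptions; the abstract does not state the exact definition used.

```python
from itertools import combinations

# Hypothetical sketch of replicate concordance; excluding no-calls is an assumed convention.


def genotype_concordance(calls_a, calls_b, no_call=None):
    """Fraction of SNPs with identical calls among SNPs called on both arrays."""
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs in the same order")
    compared = agree = 0
    for a, b in zip(calls_a, calls_b):
        if a == no_call or b == no_call:
            continue  # skip SNPs without a call on either array
        compared += 1
        agree += a == b
    return agree / compared if compared else float("nan")


def mean_pairwise_concordance(replicates, no_call=None):
    """Average concordance over all pairs of technical replicates."""
    pairs = list(combinations(replicates, 2))
    return sum(genotype_concordance(a, b, no_call) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rep1 = ["AA", "AB", "BB", "AB", None]
    rep2 = ["AA", "AB", "AB", "AB", "BB"]
    rep3 = ["AA", "AB", "BB", "AB", "BB"]
    print(genotype_concordance(rep1, rep2))
    print(mean_pairwise_concordance([rep1, rep2, rep3]))
```

A replicate pair with unusually low concordance relative to the others is one signal of a low-quality array. This matches the abstract's point that technical replicates can expose problems that vendor QC metrics miss.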

A Statistical Framework for Accurate Taxonomic Assignment of Metagenomic Sequencing Reads

by An, Lingling , Lin, Simon M. , Qiu, Yuqing in Abundance , Algorithms , Bioinformatics

2012

The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics, which studies genetic material recovered directly from an environment. Characterization of the genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantified through homology searches of sequence reads against known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, which undermines accurate estimation of the relative abundance of the genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After the proportion of reads generated by each genome is estimated, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability, taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets, and it assigns reads to low taxonomic ranks very accurately. Our statistical approach to taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.

Journal Article
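
The framework above first estimates the proportion of reads from each candidate genome and then assigns reads using both alignment evidence and that abundance. The following is a minimal Python sketch of this general idea as a generic expectation-maximization (EM) mixture model. It is not TAMER's actual model. It assumes that each read comes with a positive likelihood weight per candidate genome, already derived from its alignment score in some unspecified way. The functions `estimate_abundance` and `assign_reads` are hypothetical.

```python
# Generic EM mixture sketch (an assumed stand-in, not TAMER's published model).
# Each read is a dict: candidate genome -> positive likelihood weight derived
# (by some unspecified rule) from that read's alignment score to the genome.


def estimate_abundance(hits, n_iter=100, tol=1e-8):
    """Estimate genome proportions pi by EM over reads with one or more hits."""
    genomes = sorted({g for read in hits for g in read})
    pi = {g: 1.0 / len(genomes) for g in genomes}
    for _ in range(n_iter):
        # E-step: expected number of reads generated by each genome.
        counts = dict.fromkeys(genomes, 0.0)
        for read in hits:
            denom = sum(pi[g] * w for g, w in read.items())
            for g, w in read.items():
                counts[g] += pi[g] * w / denom
        # M-step: proportions are the expected counts divided by the read total.
        new_pi = {g: counts[g] / len(hits) for g in genomes}
        delta = max(abs(new_pi[g] - pi[g]) for g in genomes)
        pi = new_pi
        if delta < tol:
            break
    return pi


def assign_reads(hits, pi):
    """Assign each read to the genome with the highest posterior pi[g] * weight."""
    return [max(read, key=lambda g: pi[g] * read[g]) for read in hits]


if __name__ == "__main__":
    # Six reads unique to A, two unique to B, two ambiguous between A and B.
    reads = [{"A": 1.0}] * 6 + [{"B": 1.0}] * 2 + [{"A": 1.0, "B": 1.0}] * 2
    pi = estimate_abundance(reads)
    print(pi)
    print(assign_reads(reads, pi)[-2:])
```

In the toy example, the ambiguous reads are split in proportion to abundance. This drives the estimate toward pi_A = 0.75 and pi_B = 0.25, and both ambiguous reads are then assigned to genome A rather than pushed up to a shared higher taxonomic rank.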

Windows into the past: lake sediment phosphorus trajectories act as integrated archives of watershed disturbance legacies over centennial scales

by Lin, Simon G M , Bhattacharya, Ruchi , Basu, Nandita B in Accumulation , Anthropogenic factors , Archives & records

2022

Historic land alterations and agricultural intensification have resulted in legacy phosphorus (P) accumulations within lakes and reservoirs. Internal loading from such legacy stores can be a major driver of future water quality degradation. Yet little is known about the magnitude and spatial patterns of legacy P accumulation in lentic systems, or about how watershed disturbance trajectories drive these patterns. Here, we used a meta-analysis of 113 paleolimnological studies across 124 lakes and four reservoirs (hereafter referred to as lakes) in 20 countries to quantify the linkages between 100-year trajectories of P concentrations in lake sediments, watershed inputs, and lake morphology. We find five distinct clusters of lake sediment P trajectories, with lakes in the developing and developed world showing distinctly different patterns. Lakes in the developed world (Europe and North America) with early agricultural intensification had the highest sediment P concentrations (1176–1628 mg kg⁻¹), with a peak in the 1970s–1980s and a decline since then, while lakes in the developing world, specifically China, showed monotonically increasing sediment P concentrations (857–1603 mg kg⁻¹). Sediment P trajectories reflected watershed disturbance patterns and were driven by a combination of anthropogenic drivers (fertilizer input and population density) and lake morphology (watershed-to-lake area ratio). Specifically, we found the largest legacy accumulation rates in shallow lakes experiencing long-term land-use disturbances. These links between land-use change and P accumulation in lentic systems can provide insights into inland water quality response and help develop robust predictive models for resource managers and decision-makers.

Journal Article

Transcriptional Profiling of the Sonic Hedgehog Response: A Critical Role for N-myc in Proliferation of Neuronal Precursors

by Kaiser, Constanze , Grasfeder, Linda L. , Scott, Matthew P. in Animals , Biological Sciences , Brain

2003

Cerebellar granule cells are the most abundant neurons in the brain, and granule cell precursors (GCPs) are a common target of transformation in the pediatric brain tumor medulloblastoma. Proliferation of GCPs is regulated by the secreted signaling molecule Sonic hedgehog (Shh), but the mechanisms by which Shh controls proliferation of GCPs remain inadequately understood. We used DNA microarrays to identify targets of Shh in these cells and found that Shh activates a program of transcription that promotes cell cycle entry and DNA replication. Among the genes most robustly induced by Shh are cyclin D1 and N-myc. N-myc transcription is induced in the presence of the protein synthesis inhibitor cycloheximide, so it appears to be a direct target of Shh. Retroviral transduction of N-myc into GCPs induces expression of cyclin D1, E2F1, and E2F2, and promotes proliferation. Moreover, dominant-negative N-myc substantially reduces Shh-induced proliferation, indicating that N-myc is required for the Shh response. Finally, cyclin D1 and N-myc are overexpressed in murine medulloblastoma. These findings suggest that cyclin D1 and N-myc are important mediators of Shh-induced proliferation and tumorigenesis.

Journal Article

A Framework for Annotating Human Genome in Disease Context

by Fu, Dong , Lin, Simon M. , Cheng, Wenqing in Annotations , Anopheles , Bioinformatics

2012

Identification of gene-disease associations is crucial to understanding disease mechanisms. A rapid increase in the biomedical literature, driven by advances in genome-scale technologies, makes it difficult for manually curated annotation databases to characterize gene-disease associations effectively and in a timely manner. We propose an automatic method, the Disease Ontology Annotation Framework (DOAF), to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service, and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing an automatic pipeline that re-annotates the human genome using the latest DO and GeneRIF releases at any frequency, such as daily or monthly. Further, DOAF provides a computable and programmable environment that enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidence.

Journal Article
