Catalogue Search | MBRL

Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute

by Sale, Kevin; Clapham, Peter; Qi, Guoying in Academies and Institutes, Algorithms, Applications software

2011

Background: Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data. A good data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data. Results: We have chosen a data grid system, iRODS (Rule-Oriented Data management systems), to act as the data management system for the WTSI. iRODS provides a rule-based system management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data. The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced. Conclusions: iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata.

Journal Article


Pan-cancer analysis of whole genomes

by Dagg, Rebecca A.; Wu, Guanming; Viksna, Juris in 45/23, 631/67/69, 692/699/67/69

2020

Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale 1 – 3 . Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter 4 ; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation 5 , 6 ; analyses timings and patterns of tumour evolution 7 ; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity 8 , 9 ; and evaluates a range of more-specialized features of cancer genomes 8 , 10 – 18 . 
The flagship paper of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium describes the generation of the integrative analyses of 2,658 cancer whole genomes and their matching normal tissues across 38 tumour types, the structures for international data sharing and standardized analyses, and the main scientific findings from across the consortium studies.

Journal Article


The repertoire of mutational signatures in human cancer

by Covington, Kyle R.; Bergstrom, Erik N.; Mustonen, Ville in 45/23, 631/208/737, 631/67/68

2020

Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature 1 . Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium 2 of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we characterized mutational signatures using 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences that encompass most types of cancer. We identified 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution and 17 small insertion-and-deletion signatures. The substantial size of our dataset, compared with previous analyses 3 – 15 , enabled the discovery of new signatures, the separation of overlapping signatures and the decomposition of signatures into components that may represent associated—but distinct—DNA damage, repair and/or replication mechanisms. By estimating the contribution of each signature to the mutational catalogues of individual cancer genomes, we revealed associations of signatures to exogenous or endogenous exposures, as well as to defective DNA-maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes that contribute to the development of human cancer. The characterization of 4,645 whole-genome and 19,184 exome sequences, covering most types of cancer, identifies 81 single-base substitution, doublet-base substitution and small-insertion-and-deletion mutational signatures, providing a systematic overview of the mutational processes that contribute to cancer development.

Journal Article


SARS-CoV-2 evolution during treatment of chronic infection

by Ceron-Gutierrez, Lourdes; Gayed, Salma; Illingworth, Christopher J. R. in 13/1, 13/31, 45/22

2021

The spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is critical for virus infection through the engagement of the human ACE2 protein 1 and is a major antibody target. Here we show that chronic infection with SARS-CoV-2 leads to viral evolution and reduced sensitivity to neutralizing antibodies in an immunosuppressed individual treated with convalescent plasma, by generating whole-genome ultra-deep sequences for 23 time points that span 101 days and using in vitro techniques to characterize the mutations revealed by sequencing. There was little change in the overall structure of the viral population after two courses of remdesivir during the first 57 days. However, after convalescent plasma therapy, we observed large, dynamic shifts in the viral population, with the emergence of a dominant viral strain that contained a substitution (D796H) in the S2 subunit and a deletion (ΔH69/ΔV70) in the S1 N-terminal domain of the spike protein. As passively transferred serum antibodies diminished, viruses with the escape genotype were reduced in frequency, before returning during a final, unsuccessful course of convalescent plasma treatment. In vitro, the spike double mutant bearing both ΔH69/ΔV70 and D796H conferred modestly decreased sensitivity to convalescent plasma, while maintaining infectivity levels that were similar to the wild-type virus. The spike substitution mutant D796H appeared to be the main contributor to the decreased susceptibility to neutralizing antibodies, but this mutation resulted in an infectivity defect. The spike deletion mutant ΔH69/ΔV70 had a twofold higher level of infectivity than wild-type SARS-CoV-2, possibly compensating for the reduced infectivity of the D796H mutation. These data reveal strong selection on SARS-CoV-2 during convalescent plasma therapy, which is associated with the emergence of viral variants that show evidence of reduced susceptibility to neutralizing antibodies in immunosuppressed individuals.
Chronic infection with SARS-CoV-2 leads to the emergence of viral variants that show reduced susceptibility to neutralizing antibodies in an immunosuppressed individual treated with convalescent plasma.

Journal Article


Sensitivity of SARS-CoV-2 B.1.1.7 to mRNA vaccine-elicited antibodies

by Snell, Gyorgy; Lanzavecchia, Antonio; Ceron-Gutierrez, Lourdes in 13/106, 631/250/590, 631/326/596

2021

Transmission of SARS-CoV-2 is uncontrolled in many parts of the world; control is compounded in some areas by the higher transmission potential of the B.1.1.7 variant 1 , which has now been reported in 94 countries. It is unclear whether the response of the virus to vaccines against SARS-CoV-2 on the basis of the prototypic strain will be affected by the mutations found in B.1.1.7. Here we assess the immune responses of individuals after vaccination with the mRNA-based vaccine BNT162b2 2 . We measured neutralizing antibody responses after the first and second immunizations using pseudoviruses that expressed the wild-type spike protein or a mutated spike protein that contained the eight amino acid changes found in the B.1.1.7 variant. The sera from individuals who received the vaccine exhibited a broad range of neutralizing titres against the wild-type pseudoviruses that were modestly reduced against the B.1.1.7 variant. This reduction was also evident in sera from some patients who had recovered from COVID-19. Decreased neutralization of the B.1.1.7 variant was also observed for monoclonal antibodies that target the N-terminal domain (9 out of 10) and the receptor-binding motif (5 out of 31), but not for monoclonal antibodies that recognize the receptor-binding domain that bind outside the receptor-binding motif. Introduction of the mutation that encodes the E484K substitution in the B.1.1.7 background to reflect a newly emerged variant of concern (VOC 202102/02) led to a more-substantial loss of neutralizing activity by vaccine-elicited antibodies and monoclonal antibodies (19 out of 31) compared with the loss of neutralizing activity conferred by the mutations in B.1.1.7 alone. The emergence of the E484K substitution in a B.1.1.7 background represents a threat to the efficacy of the BNT162b2 vaccine. 
Sera from vaccinated individuals and some monoclonal antibodies show a modest reduction in neutralizing activity against the B.1.1.7 variant of SARS-CoV-2; but the E484K substitution leads to a considerable loss of neutralizing activity.

Journal Article


The evolutionary history of 2,658 cancers

by Demeulemeester, Jonas; Boutros, Paul C.; Lee, Juhee in 45/23, 631/114, 631/181/735

2020

Cancer develops through a process of somatic evolution 1 , 2 . Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes 3 . Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) 4 , we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection. Whole-genome sequencing data for 2,778 cancer samples from 2,658 unique donors across 38 cancer types is used to reconstruct the evolutionary history of cancer, revealing that driver mutations can precede diagnosis by several years to decades.

Journal Article


Patterns of somatic structural variation in human cancer genomes

by Weischenfeldt, Joachim; Korbel, Jan O.; Schumacher, Steven E. in 45/23, 631/208/211, 631/67/69

2020

A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes 1 – 7 . Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types 8 . Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution, but are enriched in early-replicating regions—as are unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2–7 templates copied from distinct regions of the genome strung together within one locus. Such cycles of templated insertions correlate with tandem duplications, and—in liver cancer—frequently activate the telomerase gene TERT. A wide variety of rearrangement processes are active in cancer, which generate complex configurations of the genome upon which selection can act. Whole-genome sequencing data from more than 2,500 cancers of 38 tumour types reveal 16 signatures that can be used to classify somatic structural variants, highlighting the diversity of genomic rearrangements in cancer.

Journal Article


Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing

by Yang, Lixing; Jain, Dhawal; Klimczak, Leszek J. in 45/23, 631/114, 631/208/212

2020

Chromothripsis is a mutational phenomenon characterized by massive, clustered genomic rearrangements that occurs in cancer and other diseases. Recent studies in selected cancer types have suggested that chromothripsis may be more common than initially inferred from low-resolution copy-number data. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we analyze patterns of chromothripsis across 2,658 tumors from 38 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of more than 50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy-number states, a considerable fraction of events involve multiple chromosomes and additional structural alterations. In addition to non-homologous end joining, we detect signatures of replication-associated processes and templated insertions. Chromothripsis contributes to oncogene amplification and to inactivation of genes such as mismatch-repair-related genes. These findings show that chromothripsis is a major process that drives genome evolution in human cancer. Analysis of whole-genome sequencing data across 2,658 tumors spanning 38 cancer types shows that chromothripsis is pervasive, with a frequency of more than 50% in several cancer types, contributing to oncogene amplification, gene inactivation and cancer genome evolution.

Journal Article


Analyses of non-coding somatic drivers in 2,658 cancer whole genomes

by Tubio, Jose M. C.; Sander, Chris; Herrmann, Carl in 3' Untranslated regions, 45/23, 631/114

2020

The discovery of drivers of cancer has traditionally focused on protein-coding genes 1 – 4 . Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium 5 of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers 6 , 7 , raise doubts about others and identify novel candidates, including point mutations in the 5′ region of TP53, in the 3′ untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available. Analyses of 2,658 whole genomes across 38 types of cancer identify the contribution of non-coding point mutations and structural variants to driving cancer.

Journal Article


Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition

by Ju, Young Seok; Burns, Kathleen H.; Tubio, Jose M. C. in 45/23, 631/208/212, 692/699/67

2020

About half of all cancers have somatic integrations of retrotransposons. Here, to characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 38 histological cancer subtypes within the framework of the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We identified 19,166 somatically acquired retrotransposition events, which affected 35% of samples and spanned a range of event types. Long interspersed nuclear element (LINE-1; L1 hereafter) insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, which sometimes leads to the removal of tumor-suppressor genes, and can induce complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage–fusion–bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications for the development of human tumors. An analysis of 2,954 genomes from 38 cancer subtypes identified 19,166 retrotransposition events in 35% of samples. Aberrant LINE-1 retrotranspositions can lead to the deletion of tumor-suppressor genes as well as the amplification of oncogenes.

Journal Article
