Catalogue Search | MBRL

The Dfam community resource of transposable element families, sequence models, and genome annotations

by Storer, Jessica , Smit, Arian F. , Rosen, Jeb in Animal Genetics and Genomics , Annotations , Biomedical and Life Sciences

2021

Dfam is an open access database of repetitive DNA families, sequence models, and genome annotations. The 3.0–3.3 releases of Dfam (https://dfam.org) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species, and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of organisms represented in the resource. The latest release includes 266,740 new de novo generated transposable element families from 336 species contributed by the EBI. This expansion demonstrates the utility of many of Dfam’s new features and provides insight into the long-term challenges ahead for improving de novo generated transposable element datasets.

Journal Article

A beginner’s guide to manual curation of transposable elements

by Bilat, Agustin F. , Craig, Rory J. , Peona, Valentina in Algorithms , Animal Genetics and Genomics , Automation

2022

Background In the study of transposable elements (TEs), the generation of a high confidence set of consensus sequences that represent the diversity of TEs found in a given genome is a key step in the path to investigate these fascinating genomic elements. Many algorithms and pipelines are available to automatically identify putative TE families present in a genome. Despite the availability of these valuable resources, producing a library of high-quality full-length TE consensus sequences largely remains a process of manual curation. This know-how is often passed on from mentor to mentee within research groups, making it difficult for those outside the field to access this highly specialised skill. Results Our manuscript attempts to fill this gap by providing a set of detailed computer protocols, software recommendations and video tutorials for those aiming to manually curate TEs. Detailed step-by-step protocols, aimed at the complete beginner, are presented in the Supplementary Methods. Conclusions The proposed set of programs and tools presented here will make the process of manual curation achievable and amenable to all researchers, especially those new to the field of TEs.

Journal Article

Tools and best practices for retrotransposon analysis using high-throughput sequencing data

by Bourc’his, Deborah , Servant, Nicolas , Teissandier, Aurélie in Algorithms , Animal Genetics and Genomics , Best practices

2019

Background Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is to map millions of reads to a reference genome. This problem is exacerbated when dealing with repetitive sequences such as transposable elements that occupy half of the mammalian genome mass. Sequenced reads coming from these regions introduce ambiguities in the mapping step. Therefore, applying dedicated parameters and algorithms has to be taken into consideration when transposable element regulation is investigated with sequencing datasets. Results Here, we used simulated reads on the mouse and human genomes to define the best parameters for aligning transposable element-derived reads on a reference genome. The efficiency of the most commonly used aligners was compared and we further evaluated how transposable element representation should be estimated using available methods. The mappability of the different transposon families in the mouse and the human genomes was calculated, giving an overview of their evolution. Conclusions Based on simulated data, we provided recommendations on the alignment and the quantification steps to be performed when transposon expression or regulation is studied, and identified the limits in detecting specific young transposon families of the mouse and human genomes. These principles may help the community to adopt standard procedures and raise awareness of the difficulties encountered in the study of transposable elements.

Journal Article

Identification of transposable element families from pangenome polymorphisms

by Durbin, Richard , Sierra, Pío in Animal behavior , Animal Genetics and Genomics , Biomedical and Life Sciences

2024

Background Transposable Elements (TEs) are segments of DNA, typically a few hundred base pairs up to several tens of thousands of bases long, that have the ability to generate new copies of themselves in the genome. Most existing methods used to identify TEs in a newly sequenced genome are based on their repetitive character, together with detection based on homology and structural features. As new high quality assemblies become more common, including the availability of multiple independent assemblies from the same species, an alternative strategy for identification of TE families becomes possible in which we focus on the polymorphism at insertion sites caused by TE mobility. Results We develop the idea of using the structural polymorphisms found in pangenomes to create a library of the TE families recently active in a species, or in a closely related group of species. We present a tool, pantera, that achieves this task, and illustrate its use both on species with well-curated libraries, and on new assemblies. Conclusions Our results show that pantera is sensitive and accurate, tending to correctly identify complete elements with precise boundaries, and is particularly well suited to detect larger, low copy number TEs that are often undetected with existing de novo methods.

Journal Article

TE-Seq: a transposable element annotation and RNA-Seq pipeline

by Sedivy, John M. , Kelsey, Maxfield M. G. , Kalekar, Radha L. in Analysis , Animal Genetics and Genomics , Annotations

2025

Background The recognition that transposable elements (TEs) play important roles in many biological processes has elicited growing interest in analyzing sequencing data derived from this ‘dark genome’. This goal is complicated by the highly repetitive nature of these sequences in genomes, requiring the deployment of several problem-specific tools as well as the curation of appropriate genome annotations. This pipeline aims to make the analysis of TE sequences and their expression more generally accessible. Results The TE-Seq pipeline conducts an end-to-end analysis of RNA sequencing data, examining both genes and TEs, and is compatible with most eukaryotic species. It implements computational methods tailor-made for TEs, and produces a comprehensive analysis of TE expression at both the level of the individual element and at the TE clade level. Furthermore, if supplied with long-read DNA sequencing data, it is able to assess TE expression from non-reference (polymorphic) loci. As a demonstration, we analyzed proliferating, early senescent, and late senescent human lung fibroblast RNA-Seq data, and created a custom reference genome and annotations for this cell strain using Nanopore sequencing data. We found that several retrotransposable element clades were upregulated in senescence, which included non-reference, intact, and potentially active elements. Conclusions TE-Seq is made available as a Snakemake pipeline which can be obtained at https://github.com/maxfieldk/TE-Seq .

Journal Article

A benchmark of transposon insertion detection tools using real data

by Vendrell-Mir, Pol , Merenciano, Miriam , Castanera, Raúl in Animal Genetics and Genomics , Benchmark , Benchmarking

2019

Background Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understand the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms as they are easier to reliably detect using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from resequencing data using short reads. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions. Results We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that in general their sensitivity was previously overestimated when using simulated data. Our results also show that increasing coverage leads to better sensitivity but at a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on a specific type of TEs. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend is maintained in genomes of different size and complexity. Conclusions We discuss the possible choice of tools depending on the goals of the study and show that the appropriate combination of tools could be an option for most approaches, increasing the sensitivity while maintaining a good precision.

Journal Article

The UCSC repeat browser allows discovery and visualization of evolutionary conflict across repeat families

by Kent, W. James , Haussler, David , Clawson, Hiram in Animal Genetics and Genomics , Annotations , Biomedical and Life Sciences

2020

Background Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of which play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, as short repeat sequences can map to multiple genomic loci resulting in their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. Therefore, there is a continued need for standardized tools and techniques to interpret genomic data of repeats. Results We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from annotations made by the commonly used program RepeatMasker. The UCSC Repeat Browser also provides an alignment from the human genome to these references, uses it to map the standard human genome annotation tracks, and presents all of them as a comprehensive interface to facilitate work with repetitive elements. It also provides processed tracks of multiple publicly available datasets of particular interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs) – a family of proteins known to bind and repress certain classes of repeats. We used the UCSC Repeat Browser in combination with these datasets, as well as RepeatMasker annotations in several non-human primates, to trace the independent trajectories of species-specific evolutionary battles between LINE 1 retroelements and their repressors. Furthermore, we document at https://repeatbrowser.ucsc.edu how researchers can map their own human genome annotations to these reference repeat sequences. Conclusions The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome. By developing a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate specific LINE-1 families across apes, and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.

Journal Article

Orthoptera-TElib: a library of Orthoptera transposable elements for TE annotation

by Zhao, Lina , Liu, Xuanzeng , Huang, Yuan in Analysis , Animal Genetics and Genomics , Annotations

2024

Transposable elements (TEs) are a major component of eukaryotic genomes and are present in almost all eukaryotic organisms. TEs are highly dynamic between and within species, which significantly affects the general applicability of the TE databases. Orthoptera is the only known group in the class Insecta with a significantly enlarged genome (0.93-21.48 Gb). When analyzing such large genomes using the existing public TE databases, the efficiency of TE annotation is not satisfactory. To address this limitation, it becomes imperative to continually update the available TE resource libraries, and the need for an Orthoptera-specific library grows as more insect genomes become publicly available. Here, we used the complete genome data of 12 Orthoptera species to de novo annotate TEs, then manually re-annotated the unclassified TEs to construct a non-redundant Orthoptera-specific TE library: Orthoptera-TElib. Orthoptera-TElib contains 24,021 TE entries including the re-annotated results of 13,964 unknown TEs. The naming of TE entries in Orthoptera-TElib adopts the same naming as RepeatMasker and Dfam and is encoded as the three-level form of “level1/level2-level3”. Orthoptera-TElib can be directly used as an input reference database and is compatible with mainstream repetitive sequence analysis software such as RepeatMasker and dnaPipeTE. When analyzing TEs of Orthoptera species, Orthoptera-TElib performs better TE annotation as compared to Dfam and Repbase regardless of using low-coverage sequencing or genome assembly data. The most improved TE annotation result is for Angaracris rhodopa, which increased from 7.89% of the genome to 53.28%. Finally, Orthoptera-TElib is stored in Sqlite3 for the convenience of data updates and user access.

Journal Article

Teaching transposon classification as a means to crowd source the curation of repeat annotation – a tardigrade perspective

by Hałakuc, Paweł , DeVries, Jon , Potente, Giacomo in Animal behavior , Animal Genetics and Genomics , Annotation

2024

Background The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences. Results Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries. Conclusions The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.

Journal Article

ColabCuraTE: an easy-to-use, web-based pipeline for the manual curation of transposable elements

by Khansa, Abbas , Ellison, Christopher E. , Travers, Scott L. in Analysis , Animal Genetics and Genomics , Annotations

2025

Background Transposable elements (TEs) are widespread mobile DNA sequences that shape genome structure, function, and evolution. Although automated tools exist for the de novo identification and classification of TEs, their output often requires manual refinement to generate accurate consensus sequences for individual TE families. This curation process is essential but remains time-consuming and inaccessible to many researchers, particularly those without bioinformatics expertise or access to sufficient computing resources. To address this gap, we developed ColabCuraTE, a web-based, user-friendly pipeline implemented in Google Colaboratory that enables manual curation of TEs without the need for local software installation or advanced programming skills. Results ColabCuraTE includes built-in visualization tools and guides users through a streamlined workflow, from TE copy identification, alignment extension, and refinement, to consensus sequence generation and TE family analysis. We validated the pipeline using both megabase-sized and gigabase-sized genomes and found that it reliably improves the quality and completeness of TE consensus sequences compared to outputs from automated de novo TE annotation tools. Conclusions ColabCuraTE enables easier participation in TE curation by removing infrastructure and expertise requirements that typically limit participation in genomic research. It excels at the targeted curation of individual TE families but can also be used for large-scale curation efforts when deployed via a course or workshop. Its accessibility, intuitive interface, and compatibility with existing tools make it a valuable resource for both researchers and educators. ColabCuraTE enables broader participation in TE annotation efforts and supports the integration of undergraduates in genomics research.

Journal Article
