Catalogue Search | MBRL
30 result(s) for "Mobile DNA Tools"
The Dfam community resource of transposable element families, sequence models, and genome annotations
by Storer, Jessica; Smit, Arian F.; Rosen, Jeb
in Animal Genetics and Genomics, Annotations, Biomedical and Life Sciences
2021
Dfam is an open access database of repetitive DNA families, sequence models, and genome annotations. The 3.0–3.3 releases of Dfam (https://dfam.org) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species, and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of organisms represented in the resource. The latest release includes 266,740 new de novo generated transposable element families from 336 species contributed by the EBI. This expansion demonstrates the utility of many of Dfam’s new features and provides insight into the long term challenges ahead for improving de novo generated transposable element datasets.
Journal Article
A beginner’s guide to manual curation of transposable elements
by Bilat, Agustin F.; Craig, Rory J.; Peona, Valentina
in Algorithms, Animal Genetics and Genomics, Automation
2022
Background
In the study of transposable elements (TEs), the generation of a high confidence set of consensus sequences that represent the diversity of TEs found in a given genome is a key step in the path to investigate these fascinating genomic elements. Many algorithms and pipelines are available to automatically identify putative TE families present in a genome. Despite the availability of these valuable resources, producing a library of high-quality full-length TE consensus sequences largely remains a process of manual curation. This know-how is often passed on from mentor-to-mentee within research groups, making it difficult for those outside the field to access this highly specialised skill.
Results
Our manuscript attempts to fill this gap by providing a set of detailed computer protocols, software recommendations and video tutorials for those aiming to manually curate TEs. Detailed step-by-step protocols, aimed at the complete beginner, are presented in the Supplementary Methods.
Conclusions
The proposed set of programs and tools presented here will make the process of manual curation achievable and amenable to all researchers, and in particular to those new to the field of TEs.
Journal Article
Tools and best practices for retrotransposon analysis using high-throughput sequencing data
by Bourc’his, Deborah; Servant, Nicolas; Teissandier, Aurélie
in Algorithms, Animal Genetics and Genomics, Best practices
2019
Background
Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is to map millions of reads to a reference genome. This problem is exacerbated when dealing with repetitive sequences such as transposable elements that occupy half of the mammalian genome mass. Sequenced reads coming from these regions introduce ambiguities in the mapping step. Therefore, dedicated parameters and algorithms have to be considered when transposable element regulation is investigated with sequencing datasets.
Results
Here, we used simulated reads on the mouse and human genomes to define the best parameters for aligning transposable element-derived reads on a reference genome. The efficiency of the most commonly used aligners was compared and we further evaluated how transposable element representation should be estimated using available methods. The mappability of the different transposon families in the mouse and the human genomes was calculated giving an overview into their evolution.
Conclusions
Based on simulated data, we provided recommendations on the alignment and the quantification steps to be performed when transposon expression or regulation is studied, and identified the limits in detecting specific young transposon families of the mouse and human genomes. These principles may help the community to adopt standard procedures and raise awareness of the difficulties encountered in the study of transposable elements.
Journal Article
Identification of transposable element families from pangenome polymorphisms
by Durbin, Richard; Sierra, Pío
in Animal behavior, Animal Genetics and Genomics, Biomedical and Life Sciences
2024
Background
Transposable Elements (TEs) are segments of DNA, typically a few hundred base pairs up to several tens of thousands of bases long, that have the ability to generate new copies of themselves in the genome. Most existing methods used to identify TEs in a newly sequenced genome are based on their repetitive character, together with detection based on homology and structural features. As new high quality assemblies become more common, including the availability of multiple independent assemblies from the same species, an alternative strategy for identification of TE families becomes possible in which we focus on the polymorphism at insertion sites caused by TE mobility.
Results
We develop the idea of using the structural polymorphisms found in pangenomes to create a library of the TE families recently active in a species, or in a closely related group of species. We present a tool, pantera, that achieves this task, and illustrate its use both on species with well-curated libraries, and on new assemblies.
Conclusions
Our results show that pantera is sensitive and accurate, tending to correctly identify complete elements with precise boundaries, and is particularly well suited to detect larger, low copy number TEs that are often undetected with existing de novo methods.
Journal Article
TE-Seq: a transposable element annotation and RNA-Seq pipeline
by Sedivy, John M.; Kelsey, Maxfield M. G.; Kalekar, Radha L.
in Analysis, Animal Genetics and Genomics, Annotations
2025
Background
The recognition that transposable elements (TEs) play important roles in many biological processes has elicited growing interest in analyzing sequencing data derived from this ‘dark genome’. This goal is complicated by the highly repetitive nature of these sequences in genomes, requiring the deployment of several problem-specific tools as well as the curation of appropriate genome annotations. This pipeline aims to make the analysis of TE sequences and their expression more generally accessible.
Results
The TE-Seq pipeline conducts an end-to-end analysis of RNA sequencing data, examining both genes and TEs, and is compatible with most eukaryotic species. It implements computational methods tailor-made for TEs, and produces a comprehensive analysis of TE expression at both the level of the individual element and at the TE clade level. Furthermore, if supplied with long-read DNA sequencing data, it is able to assess TE expression from non-reference (polymorphic) loci. As a demonstration, we analyzed proliferating, early senescent, and late senescent human lung fibroblast RNA-Seq data, and created a custom reference genome and annotations for this cell strain using Nanopore sequencing data. We found that several retrotransposable element clades were upregulated in senescence, which included non-reference, intact, and potentially active elements.
Conclusions
TE-Seq is made available as a Snakemake pipeline which can be obtained at https://github.com/maxfieldk/TE-Seq.
Journal Article
A benchmark of transposon insertion detection tools using real data
by Vendrell-Mir, Pol; Merenciano, Miriam; Castanera, Raúl
in Animal Genetics and Genomics, Benchmark, Benchmarking
2019
Background
Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understand the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms as they are easier to reliably detect using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from resequencing data using short reads. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions.
Results
We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that in general their sensitivity was previously overestimated when using simulated data. Our results also show that increasing coverage leads to better sensitivity but at a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on a specific type of TEs. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend is maintained in genomes of different size and complexity.
Conclusions
We discuss the possible choice of tools depending on the goals of the study and show that the appropriate combination of tools could be an option for most approaches, increasing the sensitivity while maintaining a good precision.
Journal Article
The UCSC repeat browser allows discovery and visualization of evolutionary conflict across repeat families
by Kent, W. James; Haussler, David; Clawson, Hiram
in Animal Genetics and Genomics, Annotations, Biomedical and Life Sciences
2020
Background
Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of which play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, as short repeat sequences can map to multiple genomic loci, resulting in their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. Therefore, there is a continued need for standardized tools and techniques to interpret genomic data of repeats.
Results
We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from annotations made by the commonly used program RepeatMasker. The UCSC Repeat Browser also provides an alignment from the human genome to these references, uses it to map the standard human genome annotation tracks, and presents all of them as a comprehensive interface to facilitate work with repetitive elements. It also provides processed tracks of multiple publicly available datasets of particular interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs) – a family of proteins known to bind and repress certain classes of repeats. We used the UCSC Repeat Browser in combination with these datasets, as well as RepeatMasker annotations in several non-human primates, to trace the independent trajectories of species-specific evolutionary battles between LINE 1 retroelements and their repressors. Furthermore, we document at https://repeatbrowser.ucsc.edu how researchers can map their own human genome annotations to these reference repeat sequences.
Conclusions
The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome. By developing a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate specific LINE-1 families across apes, and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.
Journal Article
Orthoptera-TElib: a library of Orthoptera transposable elements for TE annotation
by Zhao, Lina; Liu, Xuanzeng; Huang, Yuan
in Analysis, Animal Genetics and Genomics, Annotations
2024
Transposable elements (TEs) are a major component of eukaryotic genomes and are present in almost all eukaryotic organisms. TEs are highly dynamic between and within species, which significantly affects the general applicability of the TE databases. Orthoptera is the only known group in the class Insecta with a significantly enlarged genome (0.93-21.48 Gb). When analyzing such large genomes using the existing public TE databases, the efficiency of TE annotation is not satisfactory. To address this limitation, the available TE resource library must be continually updated, and an Orthoptera-specific library is needed as more insect genomes become publicly available. Here, we used the complete genome data of 12 Orthoptera species to de novo annotate TEs, then manually re-annotated the unclassified TEs to construct a non-redundant Orthoptera-specific TE library: Orthoptera-TElib. Orthoptera-TElib contains 24,021 TE entries including the re-annotated results of 13,964 unknown TEs. The naming of TE entries in Orthoptera-TElib adopts the same naming as RepeatMasker and Dfam and is encoded as the three-level form of “level1/level2-level3”. Orthoptera-TElib can be directly used as an input reference database and is compatible with mainstream repetitive sequence analysis software such as RepeatMasker and dnaPipeTE. When analyzing TEs of Orthoptera species, Orthoptera-TElib yields better TE annotation than Dfam and Repbase, regardless of whether low-coverage sequencing or genome assembly data are used. The greatest improvement is for Angaracris rhodopa, whose annotated TE content increased from 7.89% of the genome to 53.28%. Finally, Orthoptera-TElib is stored in SQLite3 for the convenience of data updates and user access.
Journal Article
Teaching transposon classification as a means to crowd source the curation of repeat annotation – a tardigrade perspective
by Hałakuc, Paweł; DeVries, Jon; Potente, Giacomo
in Animal behavior, Animal Genetics and Genomics, Annotation
2024
Background
The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences.
Results
Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries.
Conclusions
The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.
Journal Article
ColabCuraTE: an easy-to-use, web-based pipeline for the manual curation of transposable elements
by Khansa, Abbas; Ellison, Christopher E.; Travers, Scott L.
in Analysis, Animal Genetics and Genomics, Annotations
2025
Background
Transposable elements (TEs) are widespread mobile DNA sequences that shape genome structure, function, and evolution. Although automated tools exist for the de novo identification and classification of TEs, their output often requires manual refinement to generate accurate consensus sequences for individual TE families. This curation process is essential but remains time-consuming and inaccessible to many researchers, particularly those without bioinformatics expertise or access to sufficient computing resources. To address this gap, we developed ColabCuraTE, a web-based, user-friendly pipeline implemented in Google Colaboratory that enables manual curation of TEs without the need for local software installation or advanced programming skills.
Results
ColabCuraTE includes built-in visualization tools and guides users through a streamlined workflow, from TE copy identification, alignment extension, and refinement, to consensus sequence generation and TE family analysis. We validated the pipeline using both megabase-sized and gigabase-sized genomes and found that it reliably improves the quality and completeness of TE consensus sequences compared to outputs from automated de novo TE annotation tools.
Conclusions
ColabCuraTE enables easier participation in TE curation by removing infrastructure and expertise requirements that typically limit participation in genomic research. It excels at the targeted curation of individual TE families but can also be used for large-scale curation efforts when deployed via a course or workshop. Its accessibility, intuitive interface, and compatibility with existing tools make it a valuable resource for both researchers and educators. ColabCuraTE enables broader participation in TE annotation efforts and supports the integration of undergraduates in genomics research.
Journal Article