Database Commons

a catalog of worldwide biological databases

Database Profile

RepeatExplorer

General information

URL: http://repeatexplorer.org/
Full name:
Description: RepeatExplorer is a computational pipeline designed to identify and characterize repetitive DNA elements in next-generation sequencing data from plant and animal genomes. It employs graph-based clustering of sequence reads to identify repetitive elements and a number of additional programs that aid in their annotation and quantification.
Year founded: 2013
Last update:
Version:
Accessibility:
Accessible
Country/Region: Czech Republic

Classification & Tag

Data type:
DNA
Data object:
Database category:
Major species:
NA
Keywords:

Contact information

University/Institution: Institute of Plant Molecular Biology
Address: Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
City:
Province/State:
Country/Region: Czech Republic
Contact name (PI/Team): Jiří Macas
Contact email (PI/Helpdesk): macas@umbr.cas.cz

Publications

Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. [PMID: 30622655]
Pavel Neumann, Petr Novák, Nina Hoštáková, Jiří Macas

Background: Plant LTR-retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy. They are further divided into an enormous number of families which are, due to the high diversity of their nucleotide sequences, usually specific to a single species or a group of closely related species. Previous attempts to group these families into broader categories reflecting their phylogenetic relationships were limited either to analyzing a narrow range of plant species or to analyzing a small number of elements. Furthermore, there is no reference database that allows for similarity-based classification of LTR-retrotransposons.
Results: We have assembled a database of retrotransposon-encoded polyprotein domain sequences extracted from 5410 Ty1/copia elements and 8453 Ty3/gypsy elements sampled from 80 species representing major groups of green plants (Viridiplantae). Phylogenetic analysis of the three most conserved polyprotein domains (RT, RH and INT) led to dividing Ty1/copia and Ty3/gypsy retrotransposons into 16 and 14 lineages, respectively. We also characterized various features of LTR-retrotransposon sequences including additional polyprotein domains, extra open reading frames and primer binding sites, and found that the occurrence and/or type of these features correlates with phylogenies inferred from the three protein domains.
Conclusions: We have established an improved classification system applicable to LTR-retrotransposons from a wide range of plant species. This system reflects phylogenetic relationships as well as distinct sequence and structural features of the elements. A comprehensive database of retrotransposon protein domains (REXdb) that reflects this classification provides a reference for efficient and unified annotation of LTR-retrotransposons in plant genomes. Access to REXdb-related tools is implemented in the RepeatExplorer web server (https://repeatexplorer-elixir.cerit-sc.cz/) or through a standalone version of REXdb that can be downloaded separately from the RepeatExplorer web page (http://repeatexplorer.org/).

Mob DNA. 2019:10() | 274 Citations (from Europe PMC, 2025-12-13)
RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. [PMID: 23376349]
Novák P, Neumann P, Pech J, Steinhaisl J, Macas J.

Motivation: Repetitive DNA makes up large portions of plant and animal nuclear genomes, yet it remains the least-characterized genome component in most species studied so far. Although the recent availability of high-throughput sequencing data provides necessary resources for in-depth investigation of genomic repeats, its utility is hampered by the lack of specialized bioinformatics tools and appropriate computational resources that would enable large-scale repeat analysis to be run by biologically oriented researchers.
Results: Here we present RepeatExplorer, a collection of software tools for characterization of repetitive elements, which is accessible via a web interface. A key component of the server is the computational pipeline using a graph-based sequence clustering algorithm to facilitate de novo repeat identification without the need for reference databases of known elements. Because the algorithm uses short sequences randomly sampled from the genome as input, it is ideal for analyzing next-generation sequence reads. Additional tools are provided to aid in classification of identified repeats, investigate phylogenetic relationships of retroelements and perform comparative analysis of repeat composition between multiple species. The server allows the analysis of several million sequence reads, which typically results in identification of most high and medium copy repeats in higher plant genomes.

Bioinformatics. 2013:29(6) | 540 Citations (from Europe PMC, 2025-12-13)
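
The graph-based clustering idea described in the abstract above can be illustrated with a toy sketch. This is not the RepeatExplorer implementation: the function names, the shared k-mer similarity test, and the use of connected components are all assumptions chosen for brevity. Reads are treated as graph nodes, an edge joins two reads that share enough k-mers, and each connected component is reported as one candidate repeat cluster.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=8, min_shared=3):
    """Group reads into connected components of a similarity graph.

    Hypothetical similarity rule (an assumption, not the published method):
    two reads are linked when they share at least `min_shared` k-mers.
    Returns lists of read indices, largest cluster first.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    kmer_sets = [kmers(r, k) for r in reads]
    # All-to-all comparison: quadratic, acceptable only for a toy example.
    for a, b in combinations(range(len(reads)), 2):
        if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


reads = ["AAAACCCCGGGG", "AACCCCGGGGTT", "TGTGTGTGTGTG"]
print(cluster_reads(reads, k=4, min_shared=2))
```

In this sketch the first two reads overlap and fall into one cluster, while the third stays on its own. With reads sampled randomly from a genome, the share of reads in a cluster gives a rough indication of how abundant that repeat is.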

Ranking

All databases:
256/6895 (96.302%)
Gene genome and annotation:
96/2021 (95.299%)
Literature:
30/577 (94.974%)
Total Rank: 256
Citations: 788
z-index: 65.667
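
The ranking figures above can be reproduced with simple arithmetic. The formulas below are inferred from the displayed numbers and are not stated in this record, so treat them as assumptions: the percentage appears to be the share of databases ranked at or below this one, and the z-index appears to be total citations divided by the database's age in years. The function names and the reference year 2025 (taken from the citation retrieval date) are likewise assumptions.

```python
def rank_percentile(rank, total):
    """Percentage of databases ranked at or below `rank` (inferred formula)."""
    return round((total - rank + 1) / total * 100, 3)


def z_index(citations, year_founded, reference_year):
    """Citations per year since founding (inferred formula)."""
    return round(citations / (reference_year - year_founded), 3)


print(rank_percentile(256, 6895))   # All databases
print(rank_percentile(96, 2021))    # Gene genome and annotation
print(rank_percentile(30, 577))     # Literature
print(z_index(788, 2013, 2025))     # z-index
```

Under these assumptions the four calls return 96.302, 95.299, 94.974 and 65.667, which matches the values listed in the Ranking section.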

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:


Record metadata

Created on: 2019-09-25
Curated by:
furrukh mehmood [2019-10-30]
Ghulam Abbas [2019-09-25]