Database Commons

a catalog of worldwide biological databases

Database Profile

EID

General information

URL: http://mcb.harvard.edu/gilbert/EID
Full name: the Exon-Intron Database
Description: The database is derived from GenBank release 112, and it contains 51 289 protein-coding genes (287 209 exons) that harbor introns, along with extensive descriptions of each gene and its DNA and protein sequences, as well as splice motif information.
Year founded: 2000
Last update:
Version:
Accessibility:
Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object:
Database category:
Major species:
NA
Keywords:

Contact information

University/Institution: Harvard University
Address: Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
City:
Province/State:
Country/Region: United States
Contact name (PI/Team): Walter Gilbert
Contact email (PI/Helpdesk): gilbert@nucleus.harvard.edu

Publications

EID: the Exon-Intron Database - an exhaustive database of protein-coding intron-containing genes. [PMID: 10592221]
Saxonov S, Daizadeh I, Fedorov A, Gilbert W.

To aid studies of molecular evolution and to assist in gene prediction research, we have constructed an Exon-Intron Database (EID) in FASTA format. Currently, the database is derived from GenBank release 112, and it contains 51 289 protein-coding genes (287 209 exons) that harbor introns, along with extensive descriptions of each gene and its DNA and protein sequences, as well as splice motif information. There is 17% redundancy inherited from GenBank; a purge at the 99% identity level reduced the database to 42 460 genes (243 589 exons). We have created subdatabases of genes whose intron positions have been experimentally determined. One such database, constructed by comparing genomic and mRNA sequences, contains 11 242 genes (62 474 exons). A larger database of 22 196 genes (105 595 exons) was constructed by selecting on keywords to eliminate computer-predicted genes. By examining the two nucleotides adjacent to the intron boundary, we infer that there is a 2% rate of errors or other deviations from the standard GT...AG motif in nuclear genes. This criterion can be used to eliminate 4921 genes from the overall database. Various tools are provided to enable generation of user-specific subsets of the EID. The EID distribution can be obtained from http://mcb.harvard.edu/gilbert/EID

Nucleic Acids Res. 2000:28(1) | 84 Citations (from Europe PMC, 2025-12-13)
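
The abstract above describes screening genes by the two nucleotides at each intron boundary (the GT...AG motif). A minimal Python sketch of that kind of check follows. It is an illustration only, not the EID authors' tooling: the function names and the example sequences are hypothetical, and it assumes intron sequences are already available as plain strings rather than parsed from the EID FASTA files.

```python
def has_canonical_splice_motif(intron: str) -> bool:
    """Return True if the intron starts with GT and ends with AG."""
    seq = intron.upper()
    # Require at least 4 bases so the donor and acceptor dinucleotides do not overlap.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def fraction_noncanonical(introns) -> float:
    """Fraction of introns that deviate from the GT...AG motif."""
    introns = list(introns)
    if not introns:
        return 0.0
    deviating = sum(1 for s in introns if not has_canonical_splice_motif(s))
    return deviating / len(introns)


# Hypothetical intron sequences, made up for illustration (not taken from EID).
example_introns = [
    "GTAAGTCTTTCAG",  # canonical GT...AG
    "gtgagtttttag",   # canonical, lower case
    "GCAAGTCCCCAG",   # deviates at the 5' boundary
    "ATATCCTTTTAC",   # deviates at both boundaries
]

if __name__ == "__main__":
    print(fraction_noncanonical(example_introns))
```

A gene-level filter like the one the abstract mentions could then drop any gene with at least one deviating intron; whether EID applies exactly that rule is not stated in this record.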

Ranking

All databases:
3122/6895 (54.735%)
Gene genome and annotation:
975/2021 (51.806%)
Total Rank: 3122
Citations: 83
z-index: 3.32

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2018-02-08
Curated by:
Zhaohua Li [2018-02-24]
Pei Wang [2018-02-08]