Database Commons

a catalog of worldwide biological databases

Database Profile

Wikidata

General information

URL: https://www.wikidata.org/
Full name:
Description: a free and open knowledge base that can be read and edited by both humans and machines
Year founded: 2016
Last update: 2017-03-05
Version:
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object:
Database category:
Major species:
Keywords:

Contact information

University/Institution: Scripps Research
Address:
City: La Jolla
Province/State: California
Country/Region: United States
Contact name (PI/Team): Andrew I Su
Contact email (PI/Helpdesk): asu@scripps.edu

Publications

Wikidata as a semantic framework for the Gene Wiki initiative. [PMID: 26989148]
Burgstaller-Muehlbacher S, Waagmeester A, Mitraka E, Turner J, Putman T, Leong J, Naik C, Pavlidis P, Schriml L, Good BM, Su AI.

Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/. © The Author(s) 2016. Published by Oxford University Press.

Database (Oxford). 2016:2016() | 27 Citations (from Europe PMC, 2025-12-13)

Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes. [PMID: 27022157]
Putman TE, Burgstaller-Muehlbacher S, Waagmeester A, Wu C, Su AI, Good BM.

The last 20 years of advancement in sequencing technologies have led to sequencing thousands of microbial genomes, creating mountains of genetic data. While efficiency in generating the data improves almost daily, applying meaningful relationships between taxonomic and genetic entities on this scale requires a structured and integrative approach. Currently, knowledge is distributed across a fragmented landscape of resources from government-funded institutions such as National Center for Biotechnology Information (NCBI) and UniProt to topic-focused databases like the ODB3 database of prokaryotic operons, to the supplemental table of a primary publication. A major drawback to large scale, expert-curated databases is the expense of maintaining and extending them over time. No entity apart from a major institution with stable long-term funding can consider this, and their scope is limited considering the magnitude of microbial data being generated daily. Wikidata is an openly editable, semantic web compatible framework for knowledge representation. It is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of information about microbial genomics. We are developing a microbial specific data model, based on Wikidata's semantic web compatibility, which represents bacterial species, strains and the gene and gene products that define them. Currently, we have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogenic bacteria Chlamydia trachomatis. Using this pathogen as an example, we explore complex interactions between the pathogen, its host, associated genes, other microbes, disease and drugs using the Wikidata SPARQL endpoint. In our next phase of development, we will add another 99 bacterial genomes and their gene and gene products, totaling ~900,000 additional entities. This aggregation of knowledge will be a platform for community-driven collaboration, allowing the networking of microbial genetic data through the sharing of knowledge by both the data and domain expert. © The Author(s) 2016. Published by Oxford University Press.

Database (Oxford). 2016:2016() | 6 Citations (from Europe PMC, 2025-12-13)

Ranking

All databases: 2930/6895 (57.52%)
Metadata: 301/719 (58.275%)
Total Rank: 2930
Citations: 33
z-index: 3.667

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2017-03-07
Curated by:
Lina Ma [2017-06-02]
Shixiang Sun [2017-03-27]
Shixiang Sun [2017-03-07]