The phenotypic data includes breeds, diseases, and genotype-to-phenotype (G2P) pairs.
For the breed data, information was collected from the AKC, CKC, UKC, FCI and Wikipedia. The original breed information was first obtained from these public websites by web crawling. We then defined a breed nomenclature that standardizes breed names by considering the characteristics, origins, and popularity of the names. Based on this nomenclature, we combined all the data, normalized both the breed names and the descriptions of characteristics, and masked any redundant information.
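The normalization and merging step described above can be illustrated with a minimal sketch. It is not the iDog pipeline: the alias table, the function names, and the sample descriptions are hypothetical placeholders, and the real nomenclature also weighs characteristics, origins, and name popularity, which this sketch does not model.

```python
import re
from collections import defaultdict

# Hypothetical alias table: maps a normalized spelling to one standard name.
ALIASES = {
    "alsatian": "German Shepherd Dog",
    "german shepherd": "German Shepherd Dog",
    "german shepherd dog": "German Shepherd Dog",
}


def normalize_key(name: str) -> str:
    """Lowercase the name, replace punctuation with spaces, collapse whitespace."""
    key = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(key.split())


def standard_name(name: str) -> str:
    """Return the standard name for a known alias, else the trimmed input."""
    return ALIASES.get(normalize_key(name), name.strip())


def merge_records(records):
    """Merge (source, name, description) tuples under their standard name.

    Repeated sources and verbatim duplicate descriptions are kept only once.
    """
    merged = defaultdict(lambda: {"sources": [], "descriptions": []})
    for source, name, description in records:
        entry = merged[standard_name(name)]
        if source not in entry["sources"]:
            entry["sources"].append(source)
        if description not in entry["descriptions"]:
            entry["descriptions"].append(description)
    return dict(merged)


# Made-up sample records, for illustration only.
records = [
    ("AKC", "German Shepherd Dog", "Confident and smart."),
    ("Wikipedia", "Alsatian", "A working dog of medium to large size."),
    ("UKC", "German  Shepherd", "Confident and smart."),
]
breeds = merge_records(records)
```

Here the three source records collapse into a single entry keyed by the standard name, with each source listed once and the repeated description stored only once.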
For disease data, original information was sourced from CIDD, OMIA and The DogPlace. After curation and merging, we identified 897 diseases. To help dog owners and veterinarians understand and manage dog diseases more effectively, this section includes detailed information integrated from various sites, covering disease descriptions, symptoms, causes, inheritance modes, diagnostic methods, treatments, and advisory tips for veterinarians or breeders.
For G2P pairs, we acquired data from Genome-Wide Association Study (GWAS) papers published and available in PubMed until 2024. Additionally, two ontologies, the Dog Breed Ontology (DBO) and the Dog Disease and Trait Ontology (DDTO), have been constructed for data standardization. Finally, we obtained 3,207 G2P pairs.
The overall data processing procedure is as follows.
1. Breed
iDog collects breed information from five sources: AKC, UKC, CKC, FCI and Wikipedia. The name of each breed is recorded and its basic information is curated manually. The breed module offers three ways to browse the data: by breed group, by breed name from A to Z, and by the geographic distribution of breeds.
1.1 Breed Group
iDog provides 10 breed groups. The group information is curated manually from the five sources mentioned above. Users can click a group to show the list of breeds in it.
1.1.1 Dog Breeds A to Z
The breed names are ordered from A to Z. Users can click a letter of interest to show all breed names starting with that letter. For example, clicking “A” shows all breed names starting with “A”.
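As a rough illustration of this kind of alphabetical index (a sketch only; the function name and the sample breed list are assumptions, not iDog's implementation), breed names can be grouped by their first letter:

```python
from collections import defaultdict


def build_letter_index(breed_names):
    """Group breed names by uppercase first letter, sorted case-insensitively."""
    index = defaultdict(list)
    for name in sorted(breed_names, key=str.casefold):
        if name:  # skip empty strings
            index[name[0].upper()].append(name)
    return dict(index)


# Hypothetical sample input.
index = build_letter_index(["Beagle", "Akita", "Afghan Hound", "Borzoi"])
print(index["A"])  # the breeds shown when the user clicks "A"
```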
1.1.2 Dog Breed of the World
This view covers 5 continents: Europe, America, Asia, Africa and Oceania. Clicking a country within a continent shows the list of breeds that originate from that country.
1.1.3 Detailed Breed Information
Clicking a breed name shows detailed information in 6 categories.
a) General information: introduces the personality, energy level, shedding, grooming and other basic information of the breed. Other names of the breed are also included.
b) Breed registries: shows whether the breed is recognized by any international kennel clubs.
c) Associated disease: lists the diseases associated with the breed, including the disease name, the associated gene and disease level information.
d) Associated SNP information: includes the SNP information from DogSD associated with the breed.
e) Breed standards: includes the breed standard files provided by the international kennel clubs with which the breed is registered.
f) Reference: lists the information sources for the breed.
2. Disease
iDog provides dog disease information for users. The disease information is integrated from several websites, including OMIA, CIDD and The DogPlace.
The disease list includes the disease name, the disease description, the associated gene and reference papers.
2.1 The Disease Detail Information
The disease detail information includes the basic information, the associated disease entries in other databases such as OMIA, and the associated breeds.
Basic Information provides the disease description, inheritance mode, symptoms, causes and other information curated manually from public disease resources. Users can click the link in the disease description to access the source.
Associated Disease provides the disease information curated from the OMIA database.
Associated Breeds lists the breeds in which this disease has been reported.
3. G2P pairs
The phenotype, allele and disease information is curated from published papers. Each record includes the chromosome, position, dbSNP rsID, species/population, disease trait, effect allele, odds ratio (OR), P value, reported gene symbol and PMID.
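The fields listed above map naturally onto a typed record. The following is a minimal sketch under stated assumptions: the class name, attribute names, input column keys, and the treatment of "NA" as missing are all hypothetical, and the sample row contains placeholder values rather than real iDog data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class G2PPair:
    """One genotype-to-phenotype record (attribute names are assumptions)."""
    chromosome: str
    position: int
    rsid: Optional[str]
    population: str
    trait: str
    effect_allele: str
    odds_ratio: Optional[float]
    p_value: float
    gene_symbol: Optional[str]
    pmid: str


def _optional(value):
    """Treat empty strings, None and 'NA' as missing (an assumed convention)."""
    return None if value in ("", None, "NA") else value


def parse_g2p_row(row: dict) -> G2PPair:
    """Convert a dict of strings (hypothetical column keys) into a G2PPair."""
    odds_ratio = _optional(row.get("or"))
    return G2PPair(
        chromosome=row["chromosome"],
        position=int(row["position"]),
        rsid=_optional(row.get("rsid")),
        population=row["population"],
        trait=row["trait"],
        effect_allele=row["effect_allele"],
        odds_ratio=float(odds_ratio) if odds_ratio is not None else None,
        p_value=float(row["p_value"]),
        gene_symbol=_optional(row.get("gene")),
        pmid=row["pmid"],
    )


# Placeholder values only; not taken from any real study.
example = parse_g2p_row({
    "chromosome": "1", "position": "12345", "rsid": "NA",
    "population": "Example Breed", "trait": "example trait",
    "effect_allele": "A", "or": "NA", "p_value": "1e-8",
    "gene": "", "pmid": "00000000",
})
```

Keeping the odds ratio, rsID, and gene symbol optional reflects the assumption that not every curated paper reports all three.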