This workflow was adapted from Battey et al. (2020). Battey CJ, Ralph PL, Kern AD. Predicting geographic location from genetic variation with deep neural networks. eLife. 2020;9:e54507. https://doi.org/10.7554/eLife.54507
LOCATOR-Based Geographic Origin Inference Using Genome-Wide Variants
Predict geographic coordinates for pangolin samples of unknown origin from whole-genome genetic variation data. LOCATOR is a deep learning-based tool for geographic-origin inference. It learns the relationship between unphased diploid genotypes and sampling locations from reference samples with known coordinates, then predicts coordinates for query samples. This service accepts whole-genome VCF files containing biallelic SNPs. LOCATOR converts each genotype into an allele count (0, 1, or 2 at each biallelic site) and passes the resulting vector to a deep neural network for inference. The method does not require an explicit model of spatial allele-frequency variation. LOCATOR can also generate predictions across genomic windows, which helps characterize uncertainty in the estimate and shows how inferred geographic ancestry varies along the genome. Predicted coordinates should be interpreted as estimates of geographic origin rather than exact locations, because accuracy depends on the number, geographic coverage, and genetic representativeness of the reference samples.
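The allele-count encoding described above can be illustrated with a minimal Python sketch. It is not LOCATOR's own code: the function names `allele_count` and `encode_sample` are hypothetical, and returning `None` for missing or non-biallelic calls is an assumption made for illustration only.

```python
from typing import List, Optional


def allele_count(gt_field: str) -> Optional[int]:
    """Count alternate alleles in one diploid VCF sample field, e.g. '0/1' or '1|1:35'."""
    gt = gt_field.split(":")[0]                # keep only the GT subfield
    alleles = gt.replace("|", "/").split("/")  # phasing is ignored (unphased input)
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        # Assumption for this sketch: missing ('./.') or multi-allelic calls
        # are left unencoded rather than imputed.
        return None
    return sum(int(a) for a in alleles)


def encode_sample(gt_fields: List[str]) -> List[Optional[int]]:
    """Encode one sample's genotypes across sites as a 0/1/2 allele-count vector."""
    return [allele_count(g) for g in gt_fields]


if __name__ == "__main__":
    calls = ["0/0", "0/1", "1|1", "./.", "1/0"]
    print(encode_sample(calls))  # [0, 1, 2, None, 1]
```

Each position in the resulting vector corresponds to one biallelic site, so a heterozygote is encoded as 1 regardless of allele order or phasing.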