Single nucleotide polymorphism marker combinations for classifying Yeonsan Ogye chicken using a machine learning approach.
Eunjin Cho, Sunghyun Cho, Minjun Kim, Thisarani Kalhari Ediriweera, Dongwon Seo, Seung-Sook Lee, Jihye Cha, Daehyeok Jin, Young-Kuk Kim, Jun Heon Lee
Author Information
Eunjin Cho: Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea.
Sunghyun Cho: Research and Development Center, Insilicogen Inc., Yongin 19654, Korea.
Minjun Kim: Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea.
Thisarani Kalhari Ediriweera: Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea.
Dongwon Seo: Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea.
Seung-Sook Lee: Yeonsan Ogye Foundation, Nonsan 32910, Korea.
Jihye Cha: Animal Genome & Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju 55365, Korea.
Daehyeok Jin: Animal Genetic Resources Research Center, National Institute of Animal Science, Rural Development Administration, Hamyang 50000, Korea.
Young-Kuk Kim: Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea.
Jun Heon Lee: Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea.
Genetic analysis has great potential as a tool for distinguishing livestock species and breeds. In this study, the optimal combinations of single nucleotide polymorphism (SNP) markers for discriminating the Yeonsan Ogye chicken breed were identified using high-density 600K SNP array data. Across 3,904 individuals from 198 chicken breeds, SNP markers specific to the target population were discovered through a case-control genome-wide association study (GWAS) and then filtered on the basis of linkage disequilibrium (LD) blocks. Significant SNP markers were then chosen by feature selection with two machine learning algorithms, Random Forest (RF) and AdaBoost (AB). The resulting optimal combinations of 38 SNP markers (RF) and 43 SNP markers (AB) classified the Yeonsan Ogye chicken population with 100% accuracy. Hence, the GWAS and machine learning models used in this study can efficiently identify the optimal combination of markers for discriminating target populations using multiple SNP markers.
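The marker-discovery step described in the abstract (a case-control association scan followed by LD-based filtering) can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded as minor-allele counts (0/1/2), association is tested with a 1-df allelic chi-square test, and LD filtering is approximated by greedy r² clumping on genotype correlations rather than the LD-block definition used in the study. The function names (`allelic_chi2`, `r_squared`, `select_markers`), the significance threshold `alpha`, and the `r2_max` cutoff are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: genotypes are minor-allele counts (0, 1, 2). The allelic
# chi-square test and greedy r^2 clumping below are illustrative stand-ins,
# not the exact GWAS model or LD-block method used in the study.


def allelic_chi2(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) of a 1-df allelic case-control test."""
    a = sum(cases)                 # minor alleles in cases
    b = 2 * len(cases) - a         # major alleles in cases
    c = sum(controls)              # minor alleles in controls
    d = 2 * len(controls) - c      # major alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                 # monomorphic SNP or empty group: no information
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation of genotype codes, a common proxy for LD r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) * (xi - mx) for xi in x)
    syy = sum((yi - my) * (yi - my) for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def select_markers(genotypes: Dict[str, List[int]], is_case: List[bool],
                   alpha: float = 1e-5, r2_max: float = 0.8) -> List[str]:
    """Keep significant SNPs, then drop any SNP in high LD with a more significant one."""
    scored = []
    for snp, g in genotypes.items():
        cases = [gi for gi, flag in zip(g, is_case) if flag]
        controls = [gi for gi, flag in zip(g, is_case) if not flag]
        _, p = allelic_chi2(cases, controls)
        if p < alpha:
            scored.append((p, snp))
    scored.sort()                  # most significant first (ties broken by name)
    kept: List[str] = []
    for _, snp in scored:
        if all(r_squared(genotypes[snp], genotypes[k]) < r2_max for k in kept):
            kept.append(snp)
    return kept


# Toy example: 4 target-breed birds (cases) vs 4 birds from other breeds (controls).
is_case = [True] * 4 + [False] * 4
genotypes = {
    "snpA": [2, 2, 2, 2, 0, 0, 0, 0],  # strongly associated with the target breed
    "snpB": [2, 2, 2, 2, 0, 0, 0, 0],  # in perfect LD with snpA, so redundant
    "snpC": [1, 0, 1, 0, 1, 0, 1, 0],  # same allele frequency in both groups
}
print(select_markers(genotypes, is_case, alpha=0.05))  # ['snpA']
```

Greedy clumping keeps the most significant SNP in each correlated group, which mirrors the intent of LD-based filtering: removing redundant markers before feature selection. The downstream Random Forest and AdaBoost feature selection is not shown here.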