IC4R009-GWAS-2016-26860200
Contents
Project Title
- Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement
The Background of This Project
- To address the multiple challenges to food security posed by global climate change, population growth and rising incomes, plant breeders are developing new crop varieties that can enhance both agricultural productivity and environmental sustainability. Current breeding practices, however, are unable to keep pace with demand. Genomic selection (GS) is a new technique that helps accelerate the rate of genetic gain in breeding by using whole-genome data to predict the breeding value of offspring. In this project, the researchers describe a new GS model that combines RR-BLUP with markers fit as fixed effects selected from the results of a genome-wide-association study (GWAS) on the RR-BLUP training data. The researchers term this model GS + de novo GWAS.
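The model structure described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation. It makes several simplifying assumptions: a plain single-marker correlation scan stands in for the GWAS, a fixed ridge penalty `lam` stands in for the variance components that an RR-BLUP mixed model would estimate, only one SNP is fit as a fixed effect, that SNP is dropped from the penalized marker set, and the genotypes and phenotypes are simulated. The function names `gwas_top_snp`, `fit_gs_fixed` and `predict` are hypothetical.

```python
import numpy as np

def gwas_top_snp(M, y):
    """Stand-in for GWAS: index of the SNP most correlated with the trait."""
    Mc = M - M.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Mc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf          # monomorphic markers score zero
    r = (Mc.T @ yc) / denom
    return int(np.argmax(r ** 2))

def fit_gs_fixed(M, y, fixed_idx, lam=1.0):
    """Ridge fit in which the intercept and the chosen SNPs are unpenalized
    (fixed effects) and all remaining markers share the penalty lam."""
    n, p = M.shape
    rand_idx = np.setdiff1d(np.arange(p), fixed_idx)
    X = np.column_stack([np.ones(n), M[:, fixed_idx]])   # fixed part
    Z = M[:, rand_idx]                                   # shrunken part
    W = np.hstack([X, Z])
    penalty = np.concatenate([np.zeros(X.shape[1]),
                              np.full(Z.shape[1], lam)])
    coef = np.linalg.solve(W.T @ W + np.diag(penalty), W.T @ y)
    return coef, rand_idx

def predict(M, coef, fixed_idx, rand_idx):
    """Predicted phenotypes for new genotypes."""
    X = np.column_stack([np.ones(M.shape[0]), M[:, fixed_idx]])
    return np.hstack([X, M[:, rand_idx]]) @ coef

# Simulated data (assumption): 120 lines, 300 SNPs coded 0/1/2,
# one large-effect locus at column 7 plus many small effects.
rng = np.random.default_rng(0)
n, p = 120, 300
M = rng.integers(0, 3, size=(n, p)).astype(float)
effects = rng.normal(0, 0.05, size=p)
effects[7] = 2.0
y = M @ effects + rng.normal(0, 0.5, size=n)

train, test = np.arange(90), np.arange(90, n)

# "De novo": the scan is run on the training data only, and its top hit
# becomes a fixed effect in the prediction model fit to the same data.
top = gwas_top_snp(M[train], y[train])
coef, rand_idx = fit_gs_fixed(M[train], y[train], [top], lam=10.0)
pred = predict(M[test], coef, [top], rand_idx)
accuracy = np.corrcoef(pred, y[test])[0, 1]
print("fixed SNP:", top, "validation correlation:", round(accuracy, 2))
```

Running the scan only on the training lines keeps the validation lines out of marker selection, which is the point of selecting the fixed-effect markers from the training data itself.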
Plant Culture & Treatment
- In all, 369 elite breeding lines (F6–F7) were selected for genotyping from the IRRI irrigated rice breeding program based on the planned inclusion of the lines in the 2011 Multi-Environment Testing Program and presence in the 2011 (Los Baños).
- Two phenotype data sets were used in this study: (1) the RYT data set, consisting of field data from 2009–2012, two seasons per year (dry season (DS) and wet season (WS)), collected in a single field at IRRI in Los Baños, Philippines, and (2) the MET data set, consisting of field data from 2011 and 2012, two seasons per year (dry and wet), at a total of eight sites in SE Asia (Table 1).
- For the RYT data set, phenotypes collected included plant height, flowering time, maturity date, number of effective tillers or panicles per plant, lodging score, grain yield and rep number (Supplementary Materials and Methods).
- For the MET data set, in addition to the phenotypes collected for the RYT data set, data were collected on field row, field column, phenotypic acceptability score for whole plant, phenotypic acceptability score for panicle and phenotypic acceptability score for grain (Supplementary Materials and Methods). The eight sites at which the MET data were collected compose IRRI's target population for irrigated rice in SE Asia: IRRI/Los Baños (‘MET field’), Isabela, Nueva Ecija, Agusan del Norte, Bohol and Midsayap (all in the Philippines); Batalagoda, Sri Lanka; and Hai Dong, Vietnam. Data were highly unbalanced (Table 1).
Research Findings
- The results of the GS + de novo GWAS were compared with (1) GS + historical GWAS models, in which the markers fit as fixed effects were selected from previously published GWAS data, and (2) the five other genotype-based prediction methods previously tested in this population: RR-BLUP without any fixed effects, RKHS, random forest (RF), Bayesian LASSO and multiple linear regression. Across all traits and experiments, the most accurate statistical methods of those tested were the GS + de novo GWAS models (Figure 1).
Figure 1 The plots show the results using the optimized training population for prediction of each trait in the RYT 2012 dry season (DS) and RYT 2012 wet season (WS) (that is, the cross-validation experiment that resulted in the best prediction accuracy for each trait in each validation season; see Supplementary Table S2A). GWAS for the GS + de novo GWAS models were run using both the RYT 2012 DS data (light blue) and the RYT 2012 WS data (dark blue). The percent decrease in accuracy of the RR-BLUP and RF models versus the average of the two GS + de novo GWAS models (FLW), or the GS + de novo GWAS WS model, is shown over the RR-BLUP and RF bars, respectively. Bars not labeled with the same letter (pairwise Student's t-test) indicate a significant difference in accuracy of the statistical methods across all experiments. Red X's mapped to the right axis = −log of the average P value (Wald test) of the SNPs fit as fixed effects in the GS + de novo GWAS models, after FDR multiple-test correction.
- To determine whether significant GWAS-SNPs identified for the same traits but in different germplasm would be equally useful as fixed variables in the GS models, the researchers compared prediction accuracies for flowering time and plant height of the above GS + de novo GWAS models with three additional RR-BLUP+fixed effects GS models in which the fixed SNPs were selected using GWAS data from Zhao et al. (2011). In this experiment, the previously published GWAS data never improved model accuracy over the GS + de novo GWAS models, so there appears to be no reason to pursue this strategy (Figure 2).
Figure 2 Comparison of GS + de novo GWAS with GS + historical GWAS models for flowering time (FLW, top) and plant height (PH, bottom). Graphs show the results using the optimized training population for prediction of each trait in the RYT 2012 dry season (DS) and RYT 2012 wet season (WS) (that is, the cross-validation experiment that resulted in the best prediction accuracy for each trait in each validation season; see Supplementary Table S2A). GS + GWAS models differed in the GWAS data used to select the SNPs fit as fixed effects. GS + de novo GWAS: 2012 DS (light blue) = de novo GWAS run using 2012 DS data on training population individuals; GS + de novo GWAS: 2012 WS (dark blue) = de novo GWAS run using 2012 WS data on training population individuals; GS + historical GWAS: 44K all (red) = the 'all subpopulations' results from Zhao et al. (2011) were used; GS + historical GWAS: 44K indica (burnt orange) = the indica subpopulation results from Zhao et al. (2011) were used; GS + historical GWAS: 44K TRJ (green) = the tropical japonica results from Zhao et al. (2011) were used. Bars not labeled with the same letter indicate a significant difference in model accuracies across all experiments.
- The researchers also tested the accuracy of the best performing GS + de novo GWAS models at decreasing numbers of genome-wide markers. (The GWAS were also run with the decreasing numbers of markers, because the GWAS step is fully integrated into the RR-BLUP+fixed effects GS model.) Consistent with previous results, they found that ~5000 SNPs were as effective for prediction as the full marker set of 108 005 SNPs (Figure 3).
Figure 3 Mean accuracies of cross-validation for prediction of flowering time (FLW, top), plant height (PH, middle) and grain yield (YLD, bottom) in the 2012 dry season (left) and the 2012 wet season (right), using 10 selections of SNP subsets chosen to be either distributed evenly throughout the genome (light shades) or chosen at random (dark shades); left axis. The best performing GS + de novo GWAS models (blues), as well as RR-BLUP models (oranges) and previous best performing CV experiments were run for each trait; see Supplementary Table S2A. Right axis (blue X's) = −log of the average P value (Wald test) of the SNPs fit as fixed effects in the GS + de novo GWAS models, after FDR multiple-test correction. All error bars were constructed using 1 s.e. of the mean.
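The two ways of thinning the marker set compared in Figure 3 can be sketched as follows. This is only an illustration under stated assumptions: markers are assumed to be stored in genome order, so "evenly distributed" is approximated by even spacing along the marker index rather than by physical position, and the repeated selections are generated by shifting a starting offset. The function names `even_subset` and `random_subset` are hypothetical.

```python
import numpy as np

def even_subset(n_markers, k, offset=0):
    """k marker indices at regular intervals along a genome-ordered list.
    offset shifts the starting point to obtain a different selection."""
    step = n_markers / k
    if not 0 <= offset < step:
        raise ValueError("offset must lie in [0, n_markers / k)")
    return (np.arange(k) * step + offset).astype(int)

def random_subset(n_markers, k, rng):
    """k marker indices drawn at random without replacement."""
    return np.sort(rng.choice(n_markers, size=k, replace=False))

n_markers, k = 108005, 5000        # full set and reduced set from the text
rng = np.random.default_rng(1)

# Ten selections of each kind, mirroring the 10 selections in Figure 3.
even_sets = [even_subset(n_markers, k, offset=int(o))
             for o in rng.integers(0, n_markers // k, size=10)]
random_sets = [random_subset(n_markers, k, rng) for _ in range(10)]

print(len(even_sets), len(random_sets), even_sets[0][:5])
```

Each index array can then be used to subset the columns of a genotype matrix before refitting the prediction model.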
- The researchers propose a two-stream/two-part GS breeding schema in which under-utilized germplasm is systematically incorporated into a GS breeding pipeline to test for and predict the presence of new, highly effective allele combinations (Figure 4). The approach would enable the breeder to learn directly from data on new and diverse germplasm and make rapid genetic gain in a way that would not be possible using simple RR-BLUP models. It is the GS + de novo GWAS strategy that makes it possible to extract the information needed to fix valuable exotic alleles during model development while also enhancing prediction accuracy.
Labs working on this Project
- Department of Plant Breeding and Genetics, 240 Emerson Hall, Cornell University, Ithaca, NY, USA;
- Department of Plant Breeding, Genetics and Biotechnology, International Rice Research Institute, Los Baños, Philippines
- USDA-ARS, North Atlantic Area, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, USA
- Current address: Bangladesh Rice Research Institute, Gazipur 1701, Bangladesh.
- Current address: Delta Research and Extension Center, 82 Stoneville Road, PO Box 197, Stoneville, MS 38776, USA.
Corresponding Author
- Professor S McCouch: srm4@cornell.edu

