Using Random Forests as a prescreening tool for genomic prediction: impact of subsets of SNPs on prediction accuracy of total genetic values
Published by CCSD; Massey University
International audience.
Machine learning methods have been shown to be superior to conventional statistical methods for predicting phenotypic values when epistatic effects of SNPs play a key role in controlling complex traits. However, it is unknown whether the non-additive effects captured by machine learning methods contribute to the prediction accuracy of total genetic values. In this study, using 5-fold cross-validation and a dataset of 2,109 Brahman cattle genotyped for 651,253 SNPs, we applied a machine learning method, Random Forests (RF), as a prescreening tool to identify subsets of SNPs for genomic prediction of total genetic values for yearling weight (YWT). Both additive and dominance effects of the selected SNPs were included in the genomic models, with subsets of 500, 1,000, 5,000, 10,000 and 50,000 SNPs. The results were compared with those from all SNPs and from same-size subsets of SNPs evenly spaced along the genome. Including dominance variation in the genomic model had no impact on the estimates of additive variance, heritability or genomic prediction accuracy. However, the subsets of SNPs identified by RF gave significantly higher genomic prediction accuracy than both the evenly spaced SNPs and the whole SNP panel.
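The comparison described in the abstract (RF-prescreened SNP subsets versus evenly spaced subsets of the same size, each fitted with additive and dominance effects under 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not taken from the study: SNPs are ranked by scikit-learn's impurity-based `feature_importances_`; genotypes are coded 0/1/2 with columns in map order; dominance is coded as a heterozygote indicator; ridge regression on the additive and dominance codes stands in for a GBLUP model with additive and dominance relationship matrices; SNP selection is redone inside each training fold to avoid leakage; and accuracy is the Pearson correlation between predicted and observed values. The function names (`rf_top_snps`, `evenly_spaced_snps`, `ad_codes`, `ridge_fit_predict`, `cv_accuracy`) and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def rf_top_snps(G, y, k, seed=0):
    """Indices of the k SNPs with the highest RF impurity-based importance (assumed ranking)."""
    rf = RandomForestRegressor(n_estimators=100, max_features="sqrt", random_state=seed)
    rf.fit(G, y)
    return np.sort(np.argsort(rf.feature_importances_)[::-1][:k])


def evenly_spaced_snps(n_snps, k):
    """k indices spread evenly along the panel (assumes columns are in map order)."""
    return np.unique(np.linspace(0, n_snps - 1, k).round().astype(int))


def ad_codes(G):
    """Additive (allele count 0/1/2) and dominance (heterozygote indicator) codes."""
    return np.hstack([G.astype(float), (G == 1).astype(float)])


def ridge_fit_predict(Z_train, y_train, Z_test, lam=1.0):
    """Ridge regression on training-centred codes: a stand-in for additive + dominance GBLUP."""
    mu_z, mu_y = Z_train.mean(axis=0), y_train.mean()
    Zc = Z_train - mu_z
    beta = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Zc.shape[1]), Zc.T @ (y_train - mu_y))
    return mu_y + (Z_test - mu_z) @ beta


def cv_accuracy(G, y, k, selector, n_folds=5, seed=0):
    """Mean Pearson r over k-fold CV; SNPs are selected using the training fold only."""
    rs = []
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in folds.split(G):
        cols = selector(G[train], y[train], k)
        Z = ad_codes(G[:, cols])
        pred = ridge_fit_predict(Z[train], y[train], Z[test])
        rs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(rs))


if __name__ == "__main__":
    # Simulated toy data: 10 causal SNPs with additive and dominance effects.
    rng = np.random.default_rng(1)
    n, p = 300, 500
    G = rng.binomial(2, 0.5, size=(n, p))
    causal = rng.choice(p, 10, replace=False)
    y = (G[:, causal] @ rng.normal(0, 1, 10)
         + (G[:, causal] == 1) @ rng.normal(0, 0.5, 10)
         + rng.normal(0, 1, n))
    even = lambda G_, y_, k_: evenly_spaced_snps(G_.shape[1], k_)
    for k in (20, 50):
        print(f"k={k}: RF r={cv_accuracy(G, y, k, rf_top_snps):.3f}, "
              f"evenly spaced r={cv_accuracy(G, y, k, even):.3f}")
```

Re-running the selection inside each training fold keeps the test animals out of the prescreening step; ranking SNPs on the full dataset before splitting would tend to inflate the RF accuracy.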