Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts

Open archive

Bouwman, Aniek C. | Hayes, Ben J. | Calus, Mario P. L.

Published by CCSD; BioMed Central

International audience.

Abstract

Background: Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs, but also to estimate allele substitution effects (ASE) of single nucleotide polymorphisms (SNPs). Scaling of allele counts influences the estimated ASE, because scaling of allele counts results in less shrinkage towards the mean for low minor allele frequency (MAF) variants. Scaling may become relevant for estimating ASE as more low MAF variants will be used in genomic evaluations. We show the impact of scaling on estimates of ASE using real data and a theoretical framework, and in terms of power, model fit and predictive performance.

Results: In a dairy cattle dataset with 630 K SNP genotypes, the correlation between DGV for stature from a random regression model using centered allele counts (RRc) and centered and scaled allele counts (RRcs) was 0.9988, whereas the overall correlation between ASE using RRc and RRcs was 0.27. The main difference in ASE between both methods was found for SNPs with a MAF lower than 0.01. Both the ratio (ASE from RRcs/ASE from RRc) and the regression coefficient (regression of ASE from RRcs on ASE from RRc) were much higher than 1 for low MAF SNPs. Derived equations showed that scenarios with a high heritability, a large number of individuals and a small number of variants have lower ratios between ASE from RRc and RRcs. We also investigated the optimal scaling parameter [from −1 (RRcs) to 0 (RRc) in steps of 0.1] in the bovine stature dataset. We found that the log-likelihood was maximized with a scaling parameter of −0.8, while the mean squared error of prediction was minimized with a scaling parameter of −1, i.e., RRcs.

Conclusions: Large differences in estimated ASE were observed for low MAF SNPs when allele counts were scaled or not scaled because there is less shrinkage towards the mean for scaled allele counts. We derived a theoretical framework that shows that the difference in ASE due to shrinkage is heavily influenced by the power of the data. Increasing the power results in smaller differences in ASE whether allele counts are scaled or not.
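
To make the difference between centered (RRc) and centered and scaled (RRcs) allele counts concrete, the following is a minimal Python sketch for a single SNP. It rests on an assumption that the abstract does not state: that the scaling parameter acts as an exponent on the expected heterozygosity, so the covariate is (x − 2p)·[2p(1 − p)]^(s/2), with s = 0 giving RRc and s = −1 giving RRcs. The function name `scale_allele_counts` and the toy genotypes are hypothetical and serve illustration only.

```python
def scale_allele_counts(counts, scaling=-1.0):
    """Center and scale the allele counts (0/1/2) of one SNP.

    Assumed parameterization (not given in the abstract):
        z = (x - 2p) * [2p(1 - p)] ** (scaling / 2)
    scaling = 0 leaves the counts centered only (RRc);
    scaling = -1 divides by the standard deviation sqrt(2p(1 - p)) (RRcs).
    """
    n = len(counts)
    p = sum(counts) / (2.0 * n)      # allele frequency estimated from the sample
    het = 2.0 * p * (1.0 - p)        # expected heterozygosity 2p(1 - p)
    if het == 0.0:
        raise ValueError("monomorphic SNP: scaling is undefined")
    factor = het ** (scaling / 2.0)
    return [(x - 2.0 * p) * factor for x in counts]


# Hypothetical low-frequency SNP: 2 heterozygous carriers among 10 animals.
toy_counts = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
centered = scale_allele_counts(toy_counts, scaling=0.0)           # RRc covariate
centered_scaled = scale_allele_counts(toy_counts, scaling=-1.0)   # RRcs covariate
print(centered[-1], centered_scaled[-1])
```

Under this assumed parameterization, the scaled covariate of a rare-allele carrier is larger in absolute value than the centered one, because 2p(1 − p) is well below 1 at low MAF. Intermediate values such as −0.8 fall between the two extremes.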

