Genomic prediction based on data from three layer lines using non-linear regression models

Archive ouverte

Huang, Heyun | Windig, Jack J | Vereijken, Addie | Calus, Mario Pl

Edité par CCSD ; BioMed Central -

International audience. AbstractBackgroundMost studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods.MethodsIn an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values.ResultsWhen the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction.ConclusionsLinear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.

Suggestions

Du même auteur

Genomic prediction based on data from three layer lines: a comparison between linear methods

Archive ouverte | Calus, Mario Pl | CCSD

International audience. Background The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic predic...

A comparison of principal component regression and genomic REML for genomic prediction across populations

Archive ouverte | Dadousis, Christos | CCSD

International audience. Background Genomic prediction faces two main statistical problems: multicollinearity and n ≪ p (many fewer observations than predictor variables). Principal component (PC) analysis is a mult...

Impact of QTL properties on the accuracy of multi-breed genomic prediction

Archive ouverte | Wientjes, Yvonne Cj | CCSD

International audience. Background Although simulation studies show that combining multiple breeds in one reference population increases accuracy of genomic prediction, this is not always confirmed in empirical stud...

Chargement des enrichissements...