Optimizing artificial neural network methodologies for enhanced genomic predictions: a case study with oil palm (Elaeis guineensis) data

Open archive

Cros, David | Rouan, Lauriane | Navratil, Daphné | Tchounke, Billy | Leroy, Nicolas | Le Squin, Sandrine | Ulfah, Najelaa | Nodichao, Léifi | Beurier, Grégory

Published by CCSD

Genomic selection (GS) has revolutionized animal and crop breeding by enabling the prediction of genetic values for unphenotyped individuals. Artificial neural networks (ANNs) have given promising results for GS, but their optimal implementation remains challenging. This study investigates critical factors influencing ANN performance in genomic prediction, using an oil palm dataset comprising two sites (Site 1 for training, Site 2 for testing performance on different crosses). For ANNs, Site 1 was divided into training and validation subsets, to control overfitting and to optimize the architecture and hyperparameters of the models. We compared multi-layer perceptrons (MLP), convolutional neural networks (CNN), gated recurrent unit (GRU) networks and conventional statistical methods (GBLUP and Bayesian approaches). Our results revealed substantial variability in MLP performance depending on architecture and hyperparameters, highlighting the necessity of model optimization. The prediction accuracy in the validation subsets was sufficiently correlated with the prediction accuracy in the test set (r_test) to enable effective optimization and the identification of ANN models with high performance in the test set. Bayesian optimization was superior to random search, as it allowed a major reduction in computation time. Optimized MLP models increased r_test by up to 32.8% for total bunch production and 5.1% for bunch number compared to the best conventional method, with similar r_test for height increment. ANN type did not significantly affect prediction accuracy, as Bayesian-optimized models performed similarly across MLP, CNN, and GRU. Well-performing ANNs had satisfactory repeatability, similar to that of Bayesian GS methods. Replicates remain essential for accurately evaluating the performance of ANNs. Computation time can be reduced by simplifying the search space for model optimization and by reducing the complexity of the SNP data.
This study confirmed the great potential of ANNs for genomic prediction and identified critical factors for their optimal application. Future research should explore how to design validation subsets that enable optimization to identify models with improved performance on the test set. Additionally, broader evaluations of machine learning tools, such as stacking, alternative types of ANNs, and other machine learning models, should be conducted.
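
The selection principle described in the abstract (choose a model by its accuracy on a validation subset, then report its accuracy on the separate test site) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that prediction accuracy is measured as the Pearson correlation between predicted and observed values, and the candidate names (`mlp_a`, `mlp_b`), the helper `select_by_validation`, and all numbers are hypothetical toy values invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select_by_validation(candidates, y_val):
    """Return the name of the candidate with the highest validation accuracy.

    `candidates` maps a (hypothetical) model name to a dict holding its
    predictions on the validation subset under the key "val_pred".
    """
    return max(
        candidates,
        key=lambda name: pearson(candidates[name]["val_pred"], y_val),
    )


# Toy observed values (made up): validation subset and independent test set.
y_val = [1.0, 2.0, 3.0, 4.0]
y_test = [2.0, 1.0, 4.0, 3.0]

# Toy predictions from two hypothetical model configurations.
candidates = {
    "mlp_a": {"val_pred": [1.1, 1.9, 3.2, 3.8], "test_pred": [1.8, 1.2, 3.9, 3.1]},
    "mlp_b": {"val_pred": [2.0, 1.0, 2.5, 3.0], "test_pred": [3.0, 2.5, 2.0, 1.0]},
}

# Model choice uses validation data only; the test set is touched once, at the end.
best = select_by_validation(candidates, y_val)
r_test = pearson(candidates[best]["test_pred"], y_test)
print(best, round(r_test, 3))
```

The point of the design is that the test set plays no role in choosing the model, so r_test remains an honest estimate; how well this works in practice depends on how closely validation accuracy tracks test accuracy, which is the relationship the study examines.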
