Optimizing artificial neural network methodologies for enhanced genomic predictions: a case study with oil palm (Elaeis guineensis) data
Open archive
Published by CCSD
Genomic selection (GS) has revolutionized animal and crop breeding by enabling the prediction of genetic values for unphenotyped individuals. Artificial neural networks (ANNs) have given promising results for GS, but their optimal implementation remains challenging. This study investigates critical factors influencing ANN performance in genomic prediction, using an oil palm dataset comprising two sites (Site 1 for training, Site 2 for testing performance on different crosses). For ANNs, Site 1 was divided into training and validation subsets to control overfitting and to optimize the architecture and hyperparameters of the models. We compared multi-layer perceptrons (MLP), convolutional neural networks (CNN), and gated recurrent unit (GRU) networks with conventional statistical methods (GBLUP and Bayesian approaches). Our results revealed substantial variability in MLP performance depending on architecture and hyperparameters, highlighting the necessity of model optimization. The prediction accuracy in the validation subsets correlated sufficiently with the prediction accuracy in the test set (r_test) to enable effective optimization and the identification of ANN models with high performance in the test set. Bayesian optimization outperformed random search by substantially reducing computation time. Optimized MLP models increased r_test by up to 32.8% for total bunch production and 5.1% for bunch number compared to the best conventional method, with similar r_test for height increment. ANN type did not significantly affect prediction accuracy, as Bayesian-optimized models performed similarly across MLP, CNN, and GRU. Well-performing ANNs had satisfactory repeatability, similar to Bayesian GS methods. Replicates remain essential for accurately evaluating the performance of ANNs. Computation time can be reduced by simplifying the search space for model optimization and by reducing the complexity of the SNP data.
This study confirmed the great potential of ANNs for genomic predictions and identified critical factors for their optimal application. Future research should explore how to design validation subsets that enable optimization to identify models with improved performance on the test set. Additionally, broader evaluations of machine learning tools, such as stacking, alternative types of ANNs, and other machine learning models, should be conducted.
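The evaluation scheme summarized above (train and validate on Site 1, test on Site 2, select the model with the best validation accuracy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the study's pipeline: the data are simulated, a closed-form ridge regression stands in for the ANN, a plain random search over a single penalty replaces the architecture and hyperparameter search, and the names `prediction_accuracy` and `fit_ridge` are hypothetical. Prediction accuracy is taken here to be the Pearson correlation between observed and predicted values.

```python
import numpy as np


def prediction_accuracy(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])


def fit_ridge(X, y, penalty):
    """Closed-form ridge regression (a stand-in for any tunable predictor)."""
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_markers), X.T @ y)


# Simulated data (assumption): SNP genotypes coded 0/1/2 and an additive trait.
rng = np.random.default_rng(0)
n_site1, n_site2, n_markers = 200, 80, 50
effects = rng.normal(size=n_markers)
X_site1 = rng.integers(0, 3, size=(n_site1, n_markers)).astype(float)
X_site2 = rng.integers(0, 3, size=(n_site2, n_markers)).astype(float)
y_site1 = X_site1 @ effects + rng.normal(scale=5.0, size=n_site1)
y_site2 = X_site2 @ effects + rng.normal(scale=5.0, size=n_site2)

# Split Site 1 into training and validation subsets.
order = rng.permutation(n_site1)
train_idx, valid_idx = order[:150], order[150:]

# Random search: sample candidate penalties on a log scale and score each one.
candidates = 10 ** rng.uniform(-2, 3, size=20)
results = []
for penalty in candidates:
    beta = fit_ridge(X_site1[train_idx], y_site1[train_idx], penalty)
    r_valid = prediction_accuracy(y_site1[valid_idx], X_site1[valid_idx] @ beta)
    r_test = prediction_accuracy(y_site2, X_site2 @ beta)
    results.append((penalty, r_valid, r_test))

# Select on validation accuracy only; r_test is reported, never used to choose.
best_penalty, best_r_valid, best_r_test = max(results, key=lambda t: t[1])

# How closely validation accuracy tracks test accuracy across candidates.
r_valid_all = np.array([r[1] for r in results])
r_test_all = np.array([r[2] for r in results])
valid_test_agreement = prediction_accuracy(r_valid_all, r_test_all)

print(f"selected penalty: {best_penalty:.3g}")
print(f"r_valid = {best_r_valid:.3f}, r_test = {best_r_test:.3f}")
print(f"correlation between r_valid and r_test: {valid_test_agreement:.3f}")
```

Selecting on the validation subset alone keeps Site 2 as an independent test, and the final correlation is one simple way to check whether validation accuracy is informative enough about test accuracy to guide the search.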