Evaluating validation strategies on the performance of soil property prediction from regional to continental spectral data

Open archive

Chen, Songchao | Xu, Hanyi | Xu, Dongyun | Ji, Wenjun | Li, Shuo | Yang, Meihua | Hu, Bifeng | Zhou, Yin | Wang, Nan | Arrouays, Dominique | Shi, Zhou

Published by CCSD; Elsevier

International audience. Visible-near infrared (vis–NIR) spectroscopy has been widely used to characterize soil properties from field to global scales. Before a calibrated spectral predictive model is applied to acquire soil information, either independent validation or k-fold cross-validation is used to evaluate model performance. However, there is no consensus on which validation strategy is more suitable and robust for evaluating model performance in studies at different scales. The objective of this study is to evaluate and compare the model performance of two validation strategies combined with different calibration sizes (calibration-to-validation ratios of 2:1, 4:1 and 9:1) and calibration sampling strategies (random sampling (RS), rank, Kennard-Stone (KS), rank-Kennard-Stone (RKS) and conditioned Latin hypercube sampling (cLHS)) across scales. A total of 17,272 vis–NIR spectra of mineral soils from the LUCAS dataset (continental scale), together with their soil organic carbon (SOC) and clay contents, were used in this study; the dataset was further split into a national dataset (2761 samples in France) and five regional datasets (110 to 248 samples from five French administrative regions). To eliminate the effect of a changing validation set on model performance, a consistent test set (20% of total samples at each scale) was held out to evaluate all the combinations involved in the two validation strategies. Lin's concordance correlation coefficient (CCC) of the Cubist model was stable for both SOC and clay across calibration sizes, calibration sampling strategies and validation strategies when the calibration size was large (>1400), i.e. at the national and continental scales. At the regional scale, where datasets are small (<300 samples), a larger calibration size can potentially improve model performance, and a wider calibration range would result in better model performance. No single calibration sampling strategy consistently outperformed the others at the regional scale.
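The calibration-set selection described above (a fixed 20% test set, a calibration:validation ratio applied to the remaining samples, and Kennard-Stone selection of the calibration set) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the test set is assumed to be drawn at random (the abstract only gives its size), Kennard-Stone is assumed to use Euclidean distance on the raw spectra (the study may have used transformed spectra or principal-component scores), and the helper names (`split_fixed_test`, `calibration_size`, `kennard_stone`) and toy data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_fixed_test(n_samples, test_fraction=0.2, seed=0):
    """Hold out a fixed test set; return (pool, test) index lists.

    Assumption (hypothetical helper): the test set is drawn at random.
    """
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = round(test_fraction * n_samples)
    return idx[n_test:], idx[:n_test]


def calibration_size(pool_size, ratio):
    """Calibration samples for a calibration:validation ratio such as (4, 1)."""
    cal, val = ratio
    return round(pool_size * cal / (cal + val))


def kennard_stone(spectra, n_select):
    """Indices of n_select samples chosen by the Kennard-Stone algorithm."""
    n = len(spectra)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Start from the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: euclidean(spectra[p[0]], spectra[p[1]]),
    )
    selected = [first, second]
    # Distance from every unselected sample to its nearest selected sample.
    nearest = {
        k: min(euclidean(spectra[k], spectra[first]),
               euclidean(spectra[k], spectra[second]))
        for k in range(n) if k not in (first, second)
    }
    while len(selected) < n_select:
        # Add the sample lying farthest from the current selection.
        best = max(nearest, key=nearest.get)
        selected.append(best)
        del nearest[best]
        for k in nearest:
            nearest[k] = min(nearest[k], euclidean(spectra[k], spectra[best]))
    return selected


# Hypothetical toy "spectra": 50 samples with three wavelengths each.
spectra = [[random.Random(i).random() for _ in range(3)] for i in range(50)]
pool, test = split_fixed_test(len(spectra))
n_cal = calibration_size(len(pool), (4, 1))  # 4:1 calibration:validation
picked = kennard_stone([spectra[i] for i in pool], n_cal)
calibration = [pool[i] for i in picked]
cal_set = set(calibration)
validation = [i for i in pool if i not in cal_set]
```

Kennard-Stone deliberately picks samples that span the spectral space, which tends to widen the calibration range. A random-sampling (RS) variant would instead take `n_cal` indices at random from `pool`. The test set stays the same in both cases.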
For the five French regions (small datasets), we found a high variation (95th percentile minus 5th percentile) in the CCC among the models built from 50 repeated RS (0.10–0.44 for SOC, 0.16–0.52 for clay) and cLHS (0.08–0.40 for SOC, 0.12–0.36 for clay) calibration sets. This finding indicates that selecting the calibration set by a one-time RS or cLHS carries high uncertainty in model evaluation for a small dataset and should be used with caution. We therefore suggest the following: (1) for a large dataset (thousands of samples), either one-time random sampling for independent validation or k-fold cross-validation would be appropriate; (2) for a small dataset (dozens to hundreds of samples), k-fold cross-validation and/or repeated random sampling for independent validation would be more robust for evaluating spectral predictive models.
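The spread statistic reported above (the 95th minus the 5th percentile of the CCC over 50 repeated random calibration sets, all scored on the same fixed test set) can be illustrated with a small, standard-library-only sketch. Lin's CCC is computed as 2·s_xy / (s_x² + s_y² + (mean_x - mean_y)²) with population (1/n) moments. The synthetic data, the helper names (`lins_ccc`, `fit_line`, `repeated_rs_ccc`) and the one-variable least-squares model, which stands in for the Cubist model used in the study, are all hypothetical.

```python
import random
import statistics


def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    n = len(observed)
    mx = sum(observed) / n
    my = sum(predicted) / n
    vx = sum((x - mx) ** 2 for x in observed) / n
    vy = sum((y - my) ** 2 for y in predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted)) / n
    return 2 * sxy / (vx + vy + (mx - my) ** 2)


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (hypothetical stand-in for Cubist)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def repeated_rs_ccc(x, y, test_idx, n_cal, repeats=50, seed=0):
    """CCC on a fixed test set for models calibrated on repeated random subsets.

    Returns all CCC values and their spread (95th minus 5th percentile).
    """
    rng = random.Random(seed)
    held_out = set(test_idx)
    pool = [i for i in range(len(x)) if i not in held_out]
    observed = [y[i] for i in test_idx]
    scores = []
    for _ in range(repeats):
        cal = rng.sample(pool, n_cal)  # one random calibration set
        a, b = fit_line([x[i] for i in cal], [y[i] for i in cal])
        predicted = [a + b * x[i] for i in test_idx]
        scores.append(lins_ccc(observed, predicted))
    # 19 cut points at 5%, 10%, ..., 95%
    cuts = statistics.quantiles(scores, n=20, method="inclusive")
    return scores, cuts[-1] - cuts[0]


# Hypothetical synthetic data: one spectral feature x and a soil property y.
gen = random.Random(42)
x = [gen.uniform(0.0, 1.0) for _ in range(200)]
y = [2.0 * xi + gen.gauss(0.0, 0.3) for xi in x]
test_idx = list(range(160, 200))  # fixed 20% test set, reused for every repeat
scores, spread = repeated_rs_ccc(x, y, test_idx, n_cal=20)
print(f"{len(scores)} CCC values, 95th-5th percentile spread = {spread:.3f}")
```

Increasing `n_cal` should typically narrow the spread. This mirrors the abstract's point that a single random split is least reliable for small datasets. The printed values come from toy data and say nothing about the study's results.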


Suggestions

By the same author

Monitoring soil organic carbon in alpine soils using in situ vis‐NIR spectroscopy and a multilayer perceptron

Open archive | Chen, Songchao | CCSD

Spatio-temporal variation and source changes of potentially toxic elements in soil on a typical plain of the Yangtze River Delta, China (2002–2012)

Open archive | Hu, Bifeng | CCSD

International audience. The spatio-temporal variation and temporal changes in the sources of Cr, Pb, Cd, Hg, and As in soil on the Hangzhou-Jiaxing-Huzhou (H-J-H) Plain were analysed based on 4,359 soil samples coll...

Improved Mapping of Potentially Toxic Elements in Soil via Integration of Multiple Data Sources and Various Geostatistical Methods

Open archive | Xia, Fang | CCSD

International audience. Soil pollution by potentially toxic elements (PTEs) has become a core issue around the world. Knowledge of the spatial distribution of PTEs in soil is crucial for soil remediation. Portable X...
