Reproducing Deep Learning experiments: common challenges and recommendations for improvement
IDW 2022 was hosted in Seoul, the Republic of Korea, by the Korea Institute of Science and Technology Information (KISTI), committed by the Ministry of Science and ICT, Seoul Metropolitan Government, National Library of Korea, and National Assembly Library, with the support of the Korea Research Institute of Standards and Science (KRISS), Sungkyunkwan University (SKKU), the Korea Institute of Oriental Medicine, and the Korea Institute of Geoscience and Mineral Resources. This landmark event brought together data scientists, researchers, industry leaders, entrepreneurs, policymakers, and data stewards from disciplines across the globe to explore how best to exploit the data revolution to improve science and society through data-driven discovery and innovation. IDW 2022 combined the 19th RDA Plenary Meeting, the biannual meeting of this international member organization working to develop and support global infrastructure facilitating data sharing and reuse, and SciDataCon 2022, the scientific conference addressing the frontiers of data in research organized by CODATA and WDS.

International audience.

One of the challenges in Machine Learning research is to ensure that presented and published results are sound and reliable. Reproducibility is an important step in promoting open and accessible research, allowing the scientific community to quickly integrate new findings and convert ideas into practice. We already went through the path of darkness: we proposed a set of recommendations ('fixes') to overcome the reproducibility challenges that a researcher may encounter, in order to improve Reproducibility and Replicability (R&R) and reduce the likelihood of wasted effort. These strategies can be used as a "Swiss army knife" to move from DL to more general areas, as they are organized around (i) the quality of the dataset (and associated metadata), (ii) the Deep Learning method, and (iii) the implementation and the infrastructure used.
We identified the main challenges and constraints from these papers and presented them accordingly. Finally, with the lessons learned in the previous step, we propose a set of mitigation strategies to overcome the main reproducibility challenges and help researchers achieve their goals.