D2.2 Quality annotation protocols for phenotypic platform data
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 731013. This publication reflects only the view of the author, and the European Commission cannot be held responsible for any use which may be made of the information contained therein.
EPPN 2020. The present deliverable addresses quality annotation protocols for phenotyping platform data. We first explain what cleaning phenotypic data involves, why it matters, and why the cleaning procedure itself should be recorded. We then provide platform users with clearly defined rules for identifying and annotating outliers in an automatic and traceable way. An outlier is usually defined as an observation that appears to be inconsistent with the remainder of the dataset. After visiting a number of facilities and discussing with platform users, we defined three types of outliers to annotate in phenotypic data: (1) time points within a time course, (2) whole time courses of one or more variables, and (3) a whole plant, defined here as a biological replicate deviating from the overall distribution of plants on a multi-criteria basis. The consortium partners confirmed the relevance of this classification. In this document, we propose procedures to identify each type. For the first two types, statistical methods already exist and have been adapted and applied to datasets from different platforms and species. The «plant outlier» type is new, and a method has recently been published (Alvarez Prado et al., 2019). The common idea is to provide annotated data to the user, who ultimately decides whether to keep the annotated points, time courses or plants for further analyses.
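To make the "annotate, do not delete" principle concrete for the first outlier type (single time points within a time course), here is a minimal Python sketch. It is not the procedure defined in the deliverable: the local median and MAD rule, the function name `annotate_time_points`, the parameters `half_window` and `threshold`, and the sample values are all assumptions chosen for illustration only.

```python
from statistics import median


def annotate_time_points(values, half_window=3, threshold=3.5):
    """Flag points that lie far from the median of their neighbours.

    Illustrative rule only (an assumption, not the deliverable's method):
    a point is flagged when its distance to the local median exceeds
    `threshold` times the scaled median absolute deviation (MAD).
    Every point is returned; nothing is removed from the series.
    """
    annotated = []
    n = len(values)
    for i, value in enumerate(values):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        # Neighbours inside the window, excluding the point being tested.
        neighbours = values[lo:i] + values[i + 1:hi]
        flagged = False
        if neighbours:
            local_median = median(neighbours)
            mad = median(abs(x - local_median) for x in neighbours)
            scale = 1.4826 * mad  # MAD scaled to match a normal std. dev.
            # With zero spread there is no basis for a decision: not flagged.
            flagged = scale > 0 and abs(value - local_median) > threshold * scale
        annotated.append({"index": i, "value": value, "outlier": flagged})
    return annotated


# Hypothetical time course of one variable with one suspicious reading.
series = [1.0, 1.2, 1.5, 1.9, 9.0, 2.6, 3.0, 3.5, 4.1, 4.6]
result = annotate_time_points(series)
flagged_indices = [row["index"] for row in result if row["outlier"]]
print(flagged_indices)
```

The output keeps every observation alongside a boolean flag, so the user remains free to retain or discard the annotated points in later analyses.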