Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool

Open archive

Mariette, Jérôme, J. | Noirot, Céline | Klopp, Christophe

Published by CCSD ; BioMed Central

International audience. Background: The Roche 454 pyrosequencing platform is often considered the most versatile of the Next Generation Sequencing technology platforms, permitting the sequencing of large genomes, the analysis of variations or the study of transcriptomes. A recently reported bias leads to the random production of multiple reads for a single DNA fragment within a run. This bias directly affects the quality of any measurement that uses read counts to estimate how well each fragment is represented. Other cleaning steps are usually performed on the reads before assembly or alignment. Findings: PyroCleaner is a software module intended to clean 454 pyrosequencing reads in order to ease the assembly process. This program is free software and is distributed under the terms of the GNU General Public License as published by the Free Software Foundation. It implements several filters using criteria such as read duplication, length, complexity, base-pair quality and number of undetermined bases. It can also clean flowgram files (.sff) of paired-end sequences, generating a file of validated paired-ends on the one hand and a file of single reads on the other. Conclusions: Read cleaning has always been an important step in sequence analysis. The pyrocleaner Python module is a Swiss Army knife dedicated to cleaning 454 reads. It includes commonly used filters as well as specialised ones such as duplicated read removal and paired-end read verification.
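
The filter criteria listed in the abstract (duplication, length, complexity, base-pair quality, undetermined bases) can be illustrated with a minimal Python sketch. This is not the pyrocleaner API: the `Read` type, the function names, the thresholds, the k-mer complexity measure and the shared-prefix definition of a duplicate are all hypothetical choices made for illustration only.

```python
from typing import Dict, Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical in-memory read: name, bases, one quality score per base."""
    name: str
    seq: str
    quals: List[int]


def n_fraction(seq: str) -> float:
    """Fraction of undetermined bases (N); an empty read counts as fully undetermined."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def kmer_complexity(seq: str, k: int = 3) -> float:
    """Distinct k-mers divided by the number of k-mer windows (assumed measure).

    Values near 0 indicate a repetitive, low-complexity read.
    """
    windows = len(seq) - k + 1
    if windows <= 0:
        return 0.0
    return len({seq[i:i + k] for i in range(windows)}) / windows


def mean_quality(quals: List[int]) -> float:
    """Average base quality; an empty list scores 0."""
    return sum(quals) / len(quals) if quals else 0.0


def keep_read(read: Read, min_len: int = 10, max_len: int = 600,
              max_n: float = 0.04, min_complexity: float = 0.4,
              min_qual: float = 20.0) -> bool:
    """Apply the length, undetermined-base, complexity and quality filters.

    All default thresholds are illustrative assumptions.
    """
    return (min_len <= len(read.seq) <= max_len
            and n_fraction(read.seq) <= max_n
            and kmer_complexity(read.seq) >= min_complexity
            and mean_quality(read.quals) >= min_qual)


def drop_duplicates(reads: Iterable[Read], prefix_len: int = 20) -> List[Read]:
    """Treat reads sharing the same first `prefix_len` bases as duplicates
    (a simplifying assumption) and keep the longest read of each group."""
    best: Dict[str, Read] = {}
    for read in reads:
        key = read.seq[:prefix_len].upper()
        if key not in best or len(read.seq) > len(best[key].seq):
            best[key] = read
    return list(best.values())


def clean(reads: Iterable[Read], **thresholds) -> List[Read]:
    """Run the per-read filters first, then collapse duplicates."""
    return drop_duplicates([r for r in reads if keep_read(r, **thresholds)])
```

Calling `clean(reads)` on a list of `Read` tuples returns the reads that pass every filter, with each group of assumed duplicates reduced to its longest member. Filtering before deduplication is a design choice of this sketch: it avoids a low-quality read being retained as the representative of its group.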

Suggestions

By the same author

Unsupervised multiple kernel learning for heterogeneous data integration

Open archive | Mariette, Jérôme, J. | CCSD

International audience. Motivation: Recent high-throughput sequencing advances have expanded the breadth of available omics datasets and the integrated analysis of multiple datasets obtained on the same samples has ...

Des noyaux pour les omiques

Open archive | Mariette, Jérôme, J. | CCSD

International audience. The development of high-throughput sequencing techniques generates a rapidly growing volume of data at relatively low cost. These data are often very high-dimensional, h...

Aggregating Self-Organizing Maps with Topology Preservation

Open archive | Mariette, Jérôme, J. | CCSD

International audience. In the online version of Self-Organizing Maps, the results obtained from different instances of the algorithm can be rather different. In this paper, we explore a novel approach which aggrega...
