Interrupted coding sequences in Mycobacterium smegmatis: authentic mutations or sequencing errors?

Open archive

Deshayes, Caroline | Perrodou, Emmanuel | Gallien, Sebastien | Euphrasie, Daniel | Schaeffer, Christine | Van-Dorsselaer, Alain | Poch, Olivier | Lecompte, Odile | Reyrat, Jean-Marc

Published by CCSD; BioMed Central

BACKGROUND: In silico analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may be genuinely present in the organism or may result from misannotation based on sequencing errors. Whether these sequences are authentic has major implications for all subsequent functional characterization steps, including module prediction, comparative genomics and high-throughput proteomic projects.

RESULTS: We show here, using Mycobacterium smegmatis as a model species, that a significant proportion of these ICDSs result from sequencing errors. We used a resequencing procedure and mass spectrometry analysis to determine the nature of a number of ICDSs in this organism. We found that 28 of the 73 ICDSs investigated correspond to sequencing errors.

CONCLUSION: Correcting these errors modifies the predicted amino acid sequences of the corresponding proteins and changes their annotation. We suggest that each bacterial ICDS should be investigated individually, to determine its true status and to ensure that the genome sequence is appropriate for comparative genomics analyses.


Suggestions

By the same author

Detecting the molecular scars of evolution in the Mycobacterium tuberculosis complex by analyzing interrupted coding sequences.

Open archive | Deshayes, Caroline | CCSD

BACKGROUND: Computer-assisted analyses have shown that all bacterial genomes contain a small percentage of open reading frames with a frameshift or in-frame stop codon. We report here a comparative analysis of these interrupted cod...

Ortho-proteogenomics: multiple proteomes investigation through orthology and a new MS-based protocol.

Open archive | Gallien, Sébastien | CCSD

International audience. Progress in sequencing technologies is supplying biology with an ever-increasing number of genome sequences. In most cases, the gene repertoire is predicted in silico and conceptually transl...

ICDS database: interrupted CoDing sequences in prokaryotic genomes.

Open archive | Perrodou, Emmanuel | CCSD

International audience. Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequences (ICDSs) that can seriously affect all subsequent steps of functional characterization, ...
