Theoretical and empirical quality assessment of transcription factor-binding motifs

Open archive

Medina-Rivera, Alejandra | Abreu-Goodger, Cei | Thomas-Chollier, Morgane | Salgado, Heladia | Collado-Vides, Julio | van Helden, Jacques

Published by CCSD; Oxford University Press

International audience. Position-specific scoring matrices (PSSMs) are routinely used to predict transcription factor (TF)-binding sites in genome sequences. However, their reliability for predicting novel binding sites can be far from optimal, owing to a small number of training sites or to an inappropriate choice of parameters when building the matrix or when scanning sequences with it. Measures of matrix quality such as E-value and information content rely on theoretical models, and may fail in the context of full genome sequences. We propose a method, implemented in the program 'matrix-quality', that combines theoretical and empirical score distributions to assess the reliability of PSSMs for predicting TF-binding sites. We applied 'matrix-quality' to estimate the predictive capacity of matrices for bacterial, yeast and mouse TFs. The evaluation of matrices from RegulonDB revealed some poorly predictive motifs, and allowed us to quantify the improvements obtained by applying multi-genome motif discovery. Interestingly, the method reveals differences between global and specific regulators. It also highlights the enrichment of binding sites in sequence sets obtained from high-throughput ChIP-chip experiments (bacterial and yeast TFs) and ChIP-seq experiments (mouse TFs). The method presented here has many applications, including selecting reliable motifs before scanning sequences, improving motif collections in TF databases, and evaluating motifs discovered using high-throughput data sets.
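To make the idea of comparing theoretical and empirical score distributions concrete, here is a minimal Python sketch. It is not the 'matrix-quality' implementation: the log-odds scoring, the pseudocount, the uniform background, the function names and the toy sites and sequence are all assumptions made for illustration. It builds a PSSM from aligned sites, derives the theoretical probability of reaching a score under the background model by enumerating every word (feasible only for short motifs), and measures the fraction of windows in a sequence set that reach the same score.

```python
import math
from itertools import product

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}  # assumed background model


def build_pssm(sites, background=UNIFORM, pseudocount=1.0):
    """Build a log-odds matrix (natural log) from aligned sites of equal length."""
    n = len(sites)
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        weights = {}
        for b in BASES:
            # Pseudocount distributed according to the background frequencies.
            freq = (column.count(b) + pseudocount * background[b]) / (n + pseudocount)
            weights[b] = math.log(freq / background[b])
        pssm.append(weights)
    return pssm


def score(pssm, word):
    """Sum the position-specific weights of a word as long as the matrix."""
    return sum(col[base] for col, base in zip(pssm, word))


def theoretical_distribution(pssm, background=UNIFORM):
    """Return (score, probability) for every possible word under the background."""
    dist = []
    for word in product(BASES, repeat=len(pssm)):
        prob = math.prod(background[b] for b in word)
        dist.append((score(pssm, word), prob))
    return dist


def theoretical_pvalue(dist, threshold):
    """Probability that a random background word scores at least `threshold`."""
    return sum(p for s, p in dist if s >= threshold)


def empirical_fraction(pssm, sequences, threshold):
    """Fraction of windows in `sequences` scoring at least `threshold`."""
    w = len(pssm)
    scores = [score(pssm, seq[i:i + w])
              for seq in sequences
              for i in range(len(seq) - w + 1)]
    return sum(s >= threshold for s in scores) / len(scores)


# Toy data, made up for this example only.
sites = ["TGACA", "TGACA", "TGACT", "TGTCA"]
pssm = build_pssm(sites)
dist = theoretical_distribution(pssm)
threshold = score(pssm, "TGACA")
print(theoretical_pvalue(dist, threshold))
print(empirical_fraction(pssm, ["TGACATGACA"], threshold))
```

When the empirical fraction in a set of known or putative target sequences is much higher than the theoretical probability at the same score, the sequence set is enriched in high-scoring sites, which is the kind of contrast the abstract describes.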

Suggestions

By the same author

RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond

Open archive | Gama-Castro, Socorro | CCSD

International audience. RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-charac...

RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding

Open archive | Santana-Garcia, Walter | CCSD

International audience. Gene regulatory regions contain short and degenerated DNA binding sites recognized by transcription factors (TFBS). When TFBS harbor SNPs, the DNA binding site may be affected, thereby alteri...

RSAT 2022: regulatory sequence analysis tools

Open archive | Santana-Garcia, Walter | CCSD

International audience. RSAT (Regulatory Sequence Analysis Tools) enables the detection and the analysis of cis-regulatory elements in genomic sequences. This software suite performs (i) de novo motif discovery (inc...
