Watch out for a second SNP: focus on multi-nucleotide variants in coding regions and rescued stop-gained

Open archive

Degalez, Fabien | Jehl, Frédéric | Muret, Kévin | Bernard, Maria | Lecerf, Frédéric | Lagoutte, Laetitia | Désert, Colette | Pitel, Frederique | Klopp, Christophe | Lagarrigue, Sandrine

Published by CCSD; Frontiers Media

International audience. Most single-nucleotide polymorphisms (SNPs) are located in non-coding regions, but the fraction usually studied lies in protein-coding regions, because potential impacts on proteins are relatively easy to predict with popular tools such as the Variant Effect Predictor. These tools annotate variants independently, without considering the potential effect of grouped or haplotypic variations, often called "multi-nucleotide variants" (MNVs). Here, we surveyed MNVs in a large RNA-seq dataset comprising 382 chicken samples from 11 populations, analyzed in the companion paper, in which 9.5M SNPs (including 3.3M SNPs with reliable genotypes) were detected. We focused on in-codon MNVs and evaluated their potential mis-annotation. Using GATK HaplotypeCaller read-based phasing results, we identified 2,965 MNVs, each observed in at least five individuals, located in 1,792 genes. Of these, 41.1% showed a novel impact compared with the effects of their constituent SNPs analyzed separately. The largest shift in annotation concerned consequences originally annotated as stop-gained, of which around 95% were rescued; it was followed by missense consequences, of which 37% were reannotated with a different amino acid. We then present the rescued stop-gained MNVs in more depth and give an illustration in the SLC27A4 gene. As previously shown in human datasets, our results in chicken demonstrate the value of haplotype-aware variant annotation and of considering MNVs in coding regions, particularly when searching for severe functional consequences such as stop-gained variants.
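To make the idea of a "rescued" stop-gained concrete, here is a minimal Python sketch of how two SNPs in the same codon can be annotated separately and then jointly. It is an illustration only, not the authors' pipeline: the function names (`consequence`, `apply_snps`, `annotate_in_codon_mnv`), the consequence labels, and the example codon are all hypothetical, and the sketch assumes the two alternative alleles sit on the same haplotype, which is what read-based phasing is meant to establish.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def consequence(ref_codon, alt_codon):
    """Return a simplified consequence label for a codon change (hypothetical labels)."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def apply_snps(ref_codon, snps):
    """Apply SNPs given as {position_in_codon (0-2): alternative_base}."""
    codon = list(ref_codon)
    for pos, alt in snps.items():
        codon[pos] = alt
    return "".join(codon)


def annotate_in_codon_mnv(ref_codon, snps):
    """Compare per-SNP annotation with haplotype-aware (joint) annotation.

    Assumes all alternative alleles in `snps` are carried by the same haplotype.
    """
    # Annotate each SNP as if it were the only change in the codon.
    separate = {
        pos: consequence(ref_codon, apply_snps(ref_codon, {pos: alt}))
        for pos, alt in snps.items()
    }
    # Annotate the codon with all SNPs applied together.
    mnv_codon = apply_snps(ref_codon, snps)
    combined = consequence(ref_codon, mnv_codon)
    return {
        "separate": separate,
        "mnv_codon": mnv_codon,
        "combined": combined,
        "rescued_stop_gained": (
            "stop_gained" in separate.values() and combined != "stop_gained"
        ),
    }


# Toy example: reference codon CAG (Gln) with SNPs C>T at position 0 and G>C at position 2.
result = annotate_in_codon_mnv("CAG", {0: "T", 2: "C"})
print(result)
```

In this toy case the first SNP alone turns CAG into the stop codon TAG, while the two SNPs together give TAC (Tyr), so the joint annotation is a missense change and the stop-gained is rescued.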

Suggestions

By the same author

RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species

Open archive | Jehl, Frédéric | CCSD

International audience. In addition to their common usages to study gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different popu...

An integrative atlas of chicken long non-coding genes and their annotations across 25 tissues

Open archive | Jehl, Frédéric | CCSD

International audience. Long non-coding RNAs (LNC) regulate numerous biological processes. In contrast to human, the identification of LNC in farm species, such as chicken, remains incomplete. We propose a catalogue of 5...

RNA-seq data for detecting reliable SNPs & genotypes in livestock species: interest for coding variant characterization and cis-regulation analysis by allele-specific expression !

Open archive | Jehl, Frédéric | CCSD

International audience. Context: For detecting polymorphisms in the whole genome of a population, DNA-seq data analyzed with the GATK bioinformatics tool is the standard approach. DNA-seq data are expensive to ge...
