Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions

Open archive

Denoeud, France | Kapranov, Philipp | Ucla, Catherine | Frankish, Adam | Castelo, Robert | Drenkow, Jorg | Lagarde, Julien | Alioto, Tyler | Manzano, Caroline | Chrast, Jacqueline | Dike, Sujit | Wyss, Carine | Henrichsen, Charlotte N | Holroyd, Nancy | Dickson, Mark C | Taylor, Ruth | Hance, Zahra | Foissac, Sylvain | Myers, Richard M | Rogers, Jane | Hubbard, Tim | Harrow, Jennifer | Guigó, Roderic | Gingeras, Thomas R | Antonarakis, Stylianos E | Reymond, Alexandre

Published by CCSD; Cold Spring Harbor Laboratory Press

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA Elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.

Suggestions

By the same author

Evidence for transcript networks composed of chimeric RNAs in human cells

Open archive | Djebali, Sarah | CCSD

The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to hum...

Efficient targeted transcript discovery via array-based normalization of RACE libraries

Open archive | Djebali, Sarah | CCSD

Rapid amplification of cDNA ends (RACE) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundan...

Landscape of transcription in human cells

Open archive | Djebali, Sarah | CCSD

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic su...
