Statistics about the GENCODE Release M5

The statistics are derived from the GTF file that contains only the annotation of the main chromosomes.

For details about how these statistics are calculated, please see the README_stats.txt file.
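
As a rough illustration of how per-biotype gene counts can be obtained from a GTF file, the sketch below tallies gene records by type. It assumes the usual GENCODE GTF layout (nine tab-separated columns, the feature type in column 3, a gene_type attribute in column 9). The function name count_gene_types and the sample records are hypothetical and only serve the example; the procedure actually used for this page is the one described in README_stats.txt.

```python
import re
from collections import Counter

# Matches the gene_type attribute in column 9, e.g. gene_type "protein_coding";
GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')


def count_gene_types(lines):
    """Count 'gene' records per gene_type in an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are counted
        match = GENE_TYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up records for illustration only (not taken from the release).
sample = [
    "##description: example\n",
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n',
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n',
    'chr2\tENSEMBL\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_type "miRNA";\n',
]
print(count_gene_types(sample))
```

For a real file, the same function could be fed an open file handle instead of the sample list.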

General stats

Total No of Genes                                    45232
Protein-coding genes                                 21953
Long non-coding RNA genes                             7989
Small non-coding RNA genes                            6109
Pseudogenes                                           8687
- processed pseudogenes                               6077
- unprocessed pseudogenes                             2235
- unitary pseudogenes                                   15
- polymorphic pseudogenes                               19
- pseudogenes                                          136
Immunoglobulin/T-cell receptor gene segments
- protein coding segments                              494
- pseudogenes                                          205
Total No of Transcripts                             107842
Protein-coding transcripts                           49145
- full length protein-coding                         38869
- partial length protein-coding                      10276
Nonsense mediated decay transcripts                   4800
Long non-coding RNA loci transcripts                 11206
 
Total No of distinct translations                    39706
Genes that have more than one distinct translation    8580
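
The figures in the list above can be cross-checked with simple arithmetic. The sketch below is one reading of how the categories nest, not an official definition: it assumes that the "Pseudogenes" total includes the immunoglobulin/T-cell receptor pseudogene segments, and that the gene total is the sum of the four main categories plus the immunoglobulin/T-cell receptor protein coding segments. The variable names are hypothetical. Under these assumptions the published totals are reproduced.

```python
# Figures copied from the "General stats" list above.
total_genes = 45232
main_categories = {
    "protein_coding": 21953,
    "long_noncoding_rna": 7989,
    "small_noncoding_rna": 6109,
    "pseudogenes": 8687,
}
ig_tr_coding_segments = 494

# Assumption: the pseudogene total also covers the 205 IG/TR pseudogenes.
pseudogene_parts = {
    "processed": 6077,
    "unprocessed": 2235,
    "unitary": 15,
    "polymorphic": 19,
    "pseudogene": 136,
    "ig_tr_pseudogenes": 205,
}

protein_coding_transcripts = 49145
full_length, partial_length = 38869, 10276

genes_ok = sum(main_categories.values()) + ig_tr_coding_segments == total_genes
pseudogenes_ok = sum(pseudogene_parts.values()) == main_categories["pseudogenes"]
transcripts_ok = full_length + partial_length == protein_coding_transcripts

print(genes_ok, pseudogenes_ok, transcripts_ok)  # True True True
```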

Further details on this version's gene and transcript types

biotype                             genes  transcripts
3prime_overlapping_ncrna                2            3
antisense                            2000         2925
IG_C_gene                              13           20
IG_C_pseudogene                         1            1
IG_D_gene                              19           20
IG_D_pseudogene                         4            4
IG_J_gene                              14           18
IG_LV_gene                              4            4
IG_V_gene                             218          306
IG_V_pseudogene                       156          156
lincRNA                              3297         4912
macro_lncRNA                            1            2
miRNA                                2202         2202
misc_RNA                              564          566
Mt_rRNA                                 2            2
Mt_tRNA                                22           22
non_stop_decay                          0           10
nonsense_mediated_decay                 0         4800
polymorphic_pseudogene                 19           24
processed_pseudogene                 5920         5921
processed_transcript                  751        13406
protein_coding                      21953        49145
pseudogene                            136          136
retained_intron                         0        15101
ribozyme                               22           22
rRNA                                  354          354
scaRNA                                 51           51
sense_intronic                        231          251
sense_overlapping                      22           44
snoRNA                               1508         1508
snRNA                                1382         1382
sRNA                                    2            2
TEC                                  1685         1752
TR_C_gene                               8           11
TR_D_gene                               4            5
TR_J_gene                              70           76
TR_J_pseudogene                        10           10
TR_V_gene                             144          194
TR_V_pseudogene                        34           34
transcribed_processed_pseudogene      156          160
transcribed_unprocessed_pseudogene    129          139
translated_processed_pseudogene         1           13
translated_unprocessed_pseudogene       1            1
unitary_pseudogene                     15           15
unprocessed_pseudogene               2105         2112
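
The immunoglobulin/T-cell receptor figures in the general stats (494 protein coding segments, 205 pseudogenes) can be recovered from the IG_ and TR_ rows of the table above. The sketch below shows one way to do this, assuming that rows ending in _pseudogene count as pseudogenes and all other IG_/TR_ rows count as coding segments. The helper name segment_totals is hypothetical.

```python
# IG/TR rows copied from the biotype table above: biotype, genes, transcripts.
ROWS = """\
IG_C_gene 13 20
IG_C_pseudogene 1 1
IG_D_gene 19 20
IG_D_pseudogene 4 4
IG_J_gene 14 18
IG_LV_gene 4 4
IG_V_gene 218 306
IG_V_pseudogene 156 156
TR_C_gene 8 11
TR_D_gene 4 5
TR_J_gene 70 76
TR_J_pseudogene 10 10
TR_V_gene 144 194
TR_V_pseudogene 34 34
"""


def segment_totals(text):
    """Sum the gene counts of IG_/TR_ rows, split by segment kind."""
    totals = {"coding_segments": 0, "pseudogenes": 0}
    for line in text.splitlines():
        biotype, genes, _transcripts = line.split()
        if not biotype.startswith(("IG_", "TR_")):
            continue  # ignore any row that is not an IG/TR biotype
        if biotype.endswith("_pseudogene"):
            totals["pseudogenes"] += int(genes)
        else:
            totals["coding_segments"] += int(genes)
    return totals


print(segment_totals(ROWS))  # {'coding_segments': 494, 'pseudogenes': 205}
```

Because the rows are split on whitespace, the same parsing works whether the table columns are separated by single spaces or padded for alignment.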