Gene/Transcript Biotypes in GENCODE & Ensembl
Please also compare to the VEGA descriptions.
Further details about the annotation of non-coding RNAs are listed on this Ensembl page.
GENCODE GTF format description.
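In GTF files these biotypes appear as attribute values. GENCODE writes them to the `gene_type` and `transcript_type` attributes, while Ensembl GTFs use `gene_biotype` and `transcript_biotype`. The following is a minimal Python sketch for tallying them. It assumes well-formed, tab-separated GTF lines with `key "value";` attributes. The function names and the toy records are hypothetical and are not part of any GENCODE or Ensembl tool.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Matches quoted `key "value"` pairs in GTF column 9; unquoted values
# such as `level 2` are deliberately skipped by this sketch.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field: str) -> dict[str, str]:
    """Map each attribute key to its first value (repeated keys like `tag` keep only the first)."""
    attrs: dict[str, str] = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs


def count_biotypes(lines: Iterable[str], feature: str = "gene") -> Counter:
    """Count biotypes of `feature` records ("gene" or "transcript") in GTF lines."""
    gencode_key = f"{feature}_type"     # GENCODE: gene_type / transcript_type
    ensembl_key = f"{feature}_biotype"  # Ensembl: gene_biotype / transcript_biotype
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        attrs = parse_attributes(cols[8])
        biotype = attrs.get(gencode_key, attrs.get(ensembl_key))
        if biotype is not None:
            counts[biotype] += 1
    return counts


# Toy records with hypothetical IDs, for illustration only.
SAMPLE = [
    "##description: toy example",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "GENE1"; gene_type "lncRNA"; level 2;',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1"; gene_type "lncRNA"; transcript_type "lncRNA";',
    'chr1\tensembl\tgene\t1000\t5000\t.\t-\t.\tgene_id "GENE2"; gene_biotype "protein_coding";',
]

print(count_biotypes(SAMPLE))  # gene-level biotype counts
```

Falling back from the GENCODE key to the Ensembl key lets the same function read either flavour of GTF. Real files should be streamed line by line rather than loaded into memory.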
Biotype | Definition |
---|---|
IG_C_gene, IG_D_gene, IG_J_gene, IG_LV_gene, IG_V_gene, TR_C_gene, TR_J_gene, TR_V_gene, TR_D_gene | Immunoglobulin (Ig) and T-cell receptor (TcR) gene segments of the constant (C), diversity (D), joining (J) and variable (V) regions, imported or annotated according to IMGT. |
IG_pseudogene, IG_C_pseudogene, IG_J_pseudogene, IG_V_pseudogene, TR_V_pseudogene, TR_J_pseudogene | Inactivated immunoglobulin or T-cell receptor gene. |
Mt_rRNA, Mt_tRNA, miRNA, misc_RNA, rRNA, scRNA, snRNA, snoRNA, ribozyme, sRNA, scaRNA | Non-coding RNA predicted using sequences from Rfam and miRBase. |
lncRNA | Generic long non-coding RNA biotype that replaced the following biotypes: 3prime_overlapping_ncRNA, antisense, bidirectional_promoter_lncRNA, lincRNA, macro_lncRNA, non_coding, processed_transcript, sense_intronic and sense_overlapping. |
Mt_tRNA_pseudogene, tRNA_pseudogene, snoRNA_pseudogene, snRNA_pseudogene, scRNA_pseudogene, rRNA_pseudogene, misc_RNA_pseudogene, miRNA_pseudogene | Non-coding RNA predicted to be a pseudogene by the Ensembl pipeline. |
TEC | To be Experimentally Confirmed. This is used for non-spliced EST clusters that have polyA features. This category has been specifically created for the ENCODE project to highlight regions that could indicate the presence of protein coding genes that require experimental validation, either by 5' RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively-encoded peptide with specific antibodies. |
nonsense_mediated_decay | If the coding sequence (following the appropriate reference) of a transcript ends more than 50 bp upstream of a downstream splice site, the transcript is tagged as NMD. If the variant does not cover the full reference coding sequence, it is annotated as NMD only if NMD is unavoidable, i.e. the transcript will be subject to NMD whatever the exon structure of the missing portion. |
non_stop_decay | Transcript that has polyA features (including signal) without a prior stop codon in the CDS, i.e. a non-genomic polyA tail attached directly to the CDS without 3' UTR. These transcripts are subject to degradation. |
retained_intron | Alternatively spliced transcript believed to contain intronic sequence relative to other, coding, variants. |
protein_coding | Contains an open reading frame (ORF). |
protein_coding_LoF | Not translated in the reference genome owing to a SNP/DIP, but translated in other individuals/haplotypes/strains. Replaces the polymorphic_pseudogene transcript biotype. |
protein_coding_CDS_not_defined | Transcript that belongs to a protein_coding gene and doesn't contain an ORF. Replaces the processed_transcript transcript biotype in protein_coding genes. |
processed_transcript | Doesn't contain an ORF. |
non_coding | Transcript which is known from the literature to not be protein coding. |
ambiguous_orf | Transcript believed to be protein coding, but with more than one possible open reading frame. |
sense_intronic | Long non-coding transcript in introns of a coding gene that does not overlap any exons. |
sense_overlapping | Long non-coding transcript that contains a coding gene in its intron on the same strand. |
antisense/antisense_RNA | Has transcripts that overlap the genomic span (i.e. exon or introns) of a protein-coding locus on the opposite strand. |
known_ncrna | |
pseudogene | Pseudogenes have homology to proteins but generally have a disrupted coding sequence, and an active homologous gene can be found at another locus. Sometimes these entries have an intact coding sequence or an open but truncated ORF, in which case other evidence (for example genomic polyA stretches at the 3' end) is used to classify them as pseudogenes. They can be further classified as one of the following. |
processed_pseudogene | Pseudogene that lacks introns and is thought to arise from reverse transcription of mRNA followed by reinsertion of DNA into the genome. |
polymorphic_pseudogene | Pseudogene in the reference genome owing to a SNP/DIP, but translated in other individuals/haplotypes/strains. |
retrotransposed | Pseudogene owing to a reverse transcribed and re-inserted sequence. |
transcribed_processed_pseudogene, transcribed_unprocessed_pseudogene, transcribed_unitary_pseudogene | Pseudogene where protein homology or genomic structure indicates a pseudogene, but the presence of locus-specific transcripts indicates expression. |
translated_processed_pseudogene, translated_unprocessed_pseudogene | Pseudogene with mass spectrometry data suggesting that it is also translated. |
unitary_pseudogene | A species-specific unprocessed pseudogene without a parent gene, as it has an active orthologue in another species. |
unprocessed_pseudogene | Pseudogene that can contain introns because it arises from gene duplication. |
artifact | Annotated on an artifactual region of the genome assembly. |
lincRNA | Long, intervening noncoding (linc) RNA that can be found in evolutionarily conserved, intergenic regions. |
macro_lncRNA | Unspliced lncRNA that is several kb in size. |
3prime_overlapping_ncRNA | Transcript where ditag and/or published experimental data strongly supports the existence of short non-coding transcripts transcribed from the 3'UTR. |
disrupted_domain | Otherwise viable coding region omitted from this alternatively spliced transcript because the splice variation affects a region coding for a protein domain. |
vaultRNA/vault_RNA | Short non-coding RNA gene that forms part of the vault ribonucleoprotein complex. |
bidirectional_promoter_lncRNA | A non-coding locus that originates from within the promoter region of a protein-coding gene, with transcription proceeding in the opposite direction on the other strand. |
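The 50 bp rule in the `nonsense_mediated_decay` row reduces to a distance check. The following Python sketch assumes a simplified input model. The transcript is described by its exon lengths in 5'→3' order and by the 1-based transcript (spliced mRNA) position of the last base of the stop codon. The sketch covers only the first sentence of the rule, which assumes a complete reference CDS, and the function name is hypothetical. The final exon-exon junction is the furthest downstream splice site, so checking that junction alone gives the same result as checking every downstream junction.

```python
def is_nmd_candidate(exon_lengths: list[int], cds_end: int, threshold: int = 50) -> bool:
    """Return True if the CDS ends more than `threshold` bp upstream of the last exon-exon junction.

    exon_lengths: exon lengths in 5'->3' transcript order (simplified, hypothetical input model).
    cds_end: 1-based transcript position of the last base of the stop codon.
    """
    transcript_length = sum(exon_lengths)
    if not 1 <= cds_end <= transcript_length:
        raise ValueError("cds_end must lie within the transcript")
    if len(exon_lengths) < 2:
        return False  # single-exon transcript: no downstream splice site
    # The last junction lies immediately after this transcript position.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - cds_end > threshold


# Exons of 100, 200 and 150 bp: the final junction follows position 300.
print(is_nmd_candidate([100, 200, 150], cds_end=240))  # True (stop ends 60 bp upstream)
print(is_nmd_candidate([100, 200, 150], cds_end=260))  # False (stop ends 40 bp upstream)
```

The comparison is strict (`>`), matching the ">50bp" wording, so a stop codon ending exactly 50 bp before the junction is not flagged.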
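Older annotation files may still carry biotypes that the table lists as replaced. The following Python sketch normalizes them using only the replacements stated in the table: the generic `lncRNA`, `protein_coding_LoF` and `protein_coding_CDS_not_defined`. The function name is hypothetical. It also assumes that a `processed_transcript` outside a `protein_coding` gene becomes `lncRNA`. That is a simplification and may not match how every release was migrated.

```python
# Legacy biotypes that the generic lncRNA biotype replaced.
LEGACY_LNCRNA_BIOTYPES = frozenset({
    "3prime_overlapping_ncRNA",
    "antisense",
    "bidirectional_promoter_lncRNA",
    "lincRNA",
    "macro_lncRNA",
    "non_coding",
    "processed_transcript",
    "sense_intronic",
    "sense_overlapping",
})


def normalize_transcript_biotype(transcript_biotype: str, gene_biotype: str) -> str:
    """Map a legacy transcript biotype to its replacement (simplified sketch)."""
    if transcript_biotype == "polymorphic_pseudogene":
        return "protein_coding_LoF"
    if transcript_biotype == "processed_transcript" and gene_biotype == "protein_coding":
        # Transcript of a coding gene that has no ORF of its own.
        return "protein_coding_CDS_not_defined"
    if transcript_biotype in LEGACY_LNCRNA_BIOTYPES:
        # Assumption: every other legacy lncRNA-type label collapses to lncRNA.
        return "lncRNA"
    return transcript_biotype  # current biotypes pass through unchanged


print(normalize_transcript_biotype("lincRNA", "lincRNA"))  # lncRNA
print(normalize_transcript_biotype("processed_transcript", "protein_coding"))  # protein_coding_CDS_not_defined
```

The `protein_coding` check runs before the lncRNA lookup. Otherwise `processed_transcript` would always fall into the lncRNA set.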