Ribo-seq ORFs
In recent years, Ribosome Profiling (Ribo-seq) has been used to detect thousands of non-canonical – i.e. unannotated – translated open reading frames (ORFs) in the human genome. GENCODE are working on a long-term community-driven project to incorporate these features into reference gene annotation. This pioneering work is being done in collaboration with the UniProtKB, HUPO-PP, PeptideAtlas and HGNC annotation projects, alongside a variety of experimental and analytical research groups from across the globe.
The initial stage of this work involved making a first consensus set of Ribo-seq ORFs identified by seven recent experimental publications mapped to GENCODE version 35 annotations. A manuscript detailing this work is available here. A supplementary file containing these 7,264 Ribo-seq ORFs, plus other data, is attached to the publication. A bigBed file is available here, and the URL https://ftp.ebi.ac.uk/pub/databases/gencode/riboseq_orfs/data/Ribo-seq_ORFs.bb can be used to create a custom track on the Ensembl or UCSC genome browsers.
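As a minimal sketch of how the bigBed URL above might be turned into a custom track, the snippet below builds a UCSC-style track definition line that could be pasted into the UCSC "Add Custom Tracks" form. The function name, track name and description are illustrative assumptions and not part of the GENCODE release; only the URL comes from the text above.

```python
# Sketch only: builds a UCSC custom-track definition line for a remote bigBed file.
# The track name and description are hypothetical labels chosen for illustration.

RIBOSEQ_ORFS_URL = (
    "https://ftp.ebi.ac.uk/pub/databases/gencode/riboseq_orfs/data/Ribo-seq_ORFs.bb"
)


def ucsc_bigbed_track_line(
    url: str,
    name: str = "Ribo-seq ORFs",
    description: str = "GENCODE consensus Ribo-seq ORFs",
) -> str:
    """Return a single-line UCSC track definition pointing at a bigBed URL."""
    return (
        f'track type=bigBed name="{name}" '
        f'description="{description}" '
        f"visibility=pack bigDataUrl={url}"
    )


if __name__ == "__main__":
    # Print the line so it can be copied into the genome browser form.
    print(ucsc_bigbed_track_line(RIBOSEQ_ORFS_URL))
```

Because the ORFs were mapped to GENCODE version 35 annotations, the track should be loaded against the matching human assembly (GRCh38/hg38).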
The second part of this work addresses the question of which Ribo-seq ORFs are supported by peptide evidence, and our consortium has now posted a preprint to this effect. Official GENCODE annotation files will be released to accompany the final publication of this work. In the meantime, Supplementary Table S3 of the preprint contains a list of Ribo-seq ORFs from the prior set of 7,264 that are supported by peptide evidence from regular tryptic digests, while Supplementary Table S5 lists Ribo-seq ORFs supported by immunopeptidomics data.
We strongly recommend reading these manuscripts before exploring the annotations, as the biological interpretation of Ribo-seq ORFs is not straightforward. In particular, we emphasise that thus far GENCODE have annotated only a modest number of these ORFs as protein-coding genes. Furthermore, as we start to establish which Ribo-seq ORFs are translated into stable proteins, we are also questioning whether the experimental detection of a corresponding protein is sufficient to support protein-coding gene annotation. This question is especially relevant for the substantial number of Ribo-seq ORFs that are supported by peptide data from immunopeptidomics datasets, as this technique can likely detect translation products that are not fully stable. We also note that translation can instead impart function through gene regulation, and it is plausible that certain Ribo-seq ORFs do not make important contributions to cellular physiology. We hope that these preliminary annotations will help researchers interested in addressing such questions, and anticipate that the answers they provide will lead to the further improvement and refinement of our catalogue.