The GENCODE Project: Encyclopædia of genes and gene variants
Background
The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003 to identify all functional elements in the human genome sequence. After a successful pilot phase on 1% of the genome, the scale-up to the entire genome is now underway. The Wellcome Sanger Institute was awarded a grant to carry out a scale-up of the GENCODE project for integrated annotation of gene features.
Having been involved in successfully delivering the definitive annotation of functional elements in the human genome, the GENCODE group were awarded a second grant in 2013 in order to continue their human genome annotation work and expand GENCODE to include annotation of the mouse genome. A third grant was awarded in 2017 for the continued improvement of the annotation of the human and mouse genomes.
The GENCODE gene sets are used as reference gene sets by the entire ENCODE consortium and by many other projects, e.g. Genotype-Tissue Expression (GTEx), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC), NIH Roadmap Epigenomics Mapping Consortium, Blueprint Epigenome Project, Exome Aggregation Consortium (ExAC), Genome Aggregation Database (gnomAD), 1000 Genomes Project and the Human Cell Atlas (HCA).
Current GENCODE Goals
The aims of the current GENCODE phase running from 2017 to 2021 are:
- To continue to improve the coverage and accuracy of the GENCODE human and mouse gene sets by enhancing and extending the annotation of all evidence-based gene features in the human genome with high accuracy, including protein-coding loci with alternatively spliced variants, non-coding loci and pseudogenes.
The process to create this annotation involves manual curation, computational analysis and targeted experimental approaches.
The human and mouse GENCODE resources will continue to be available to the research community with regular releases of the Ensembl genome browser, and the UCSC genome browser will continue to present the current release of the GENCODE gene set.
Participants, PI & Co-PIs
- Paul Flicek (Lead PI), EMBL European Bioinformatics Institute, Cambridge, UK
- Roderic Guigo (PI), Centre de Regulació Genòmica (CRG), Barcelona, Catalonia, Spain
- Manolis Kellis (PI), Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Mark Gerstein (PI), Yale University, New Haven, USA
- Benedict Paten (PI), University of California, Santa Cruz, California, USA
- Michael Tress, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Jyoti Choudhary, Institute of Cancer Research (ICR), London, UK
GENCODE collaborators
We work in close collaboration with other research groups around the world. These include the NCBI (e.g. Terence Murphy, CCDS project), CSHL (Tom Gingeras group) and others.
Please contact us if you would like to start a collaboration with the GENCODE project.
Acknowledgements
The GENCODE project is funded by the National Human Genome Research Institute (NHGRI) (2U41HG007234) and the European Molecular Biology Laboratory.
When referencing, please use "Frankish A, et al. (2018) GENCODE reference annotation for the human and mouse genomes" (PubMed).