RNA-seq Genome Annotation Assessment Project (1/2)
Data Sets
Usage restrictions: Data should not be used for publications without written permission; see https://www.genome.gov/ENCODE/#3.
Round 1
A readme file is available on our FTP site.
- Illumina fastq files:
Insert sizes: 200 nucleotides for paired reads, 100 nucleotides for single reads.
The 151-nucleotide sequences consist of 2x75 reads; the last nucleotide is discarded.
- Human polyA+ total RNA, single reads, K562
- Human polyA+ total RNA, single reads, GM12878
- Human polyA+ total RNA, paired reads, K562
- Human polyA+ total RNA, paired reads, GM12878
- Human polyA+ cytosolic RNA, single reads, stranded, K562
- SOLiD cfasta files:
- Human cytosolic long polyA+, K562
- Human cytosolic long polyA+, GM12878
- Helicos fasta file:
- Human cytosolic long polyA+, K562
- modENCODE Drosophila data:
- fastq files from cell lines S2-DRSC, CME_W1_CI, Kc167, ML-DmBG3-c2
- modENCODE C.elegans data:
- fastq files from 6 different stages
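The note on the 151-nucleotide Illumina sequences in Round 1 can be illustrated with a minimal sketch. It assumes the layout is read 1 (75 nucleotides), then read 2 (75 nucleotides), then one trailing nucleotide to discard; the readme on the FTP site is the authority on the actual layout. The function name `split_concatenated_read` is hypothetical.

```python
def split_concatenated_read(seq, qual, read_len=75):
    """Split one concatenated record into two mates.

    Assumption (not confirmed by the data description): the first
    `read_len` characters are mate 1, the next `read_len` are mate 2,
    and anything after that (here, the 151st nucleotide) is dropped.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < 2 * read_len:
        raise ValueError("record is shorter than two reads")
    mate1 = (seq[:read_len], qual[:read_len])
    mate2 = (seq[read_len:2 * read_len], qual[read_len:2 * read_len])
    return mate1, mate2


# Toy record: 75 A's, 75 C's, and one trailing G that is discarded.
toy_seq = "A" * 75 + "C" * 75 + "G"
toy_qual = "I" * 151
mate1, mate2 = split_concatenated_read(toy_seq, toy_qual)
```

The quality string is split at the same positions as the sequence, so each mate keeps its own per-base scores.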
Round 2
A readme file and the data are available on our FTP site.
- experiment: Homo sapiens polyA+ total RNA, paired reads, HepG2
- lab: Wold lab, Caltech
- format: fastq, tar archive with bzipped files
- other details: 75mer sequences, the last base has been removed
- _1 & _2 are the corresponding pairs
- includes spike-in sequences for quantification
- quality scores are Sanger rather than Illumina
- fragment length is 200bp with a standard deviation of 34
- experiment: Caenorhabditis elegans polyA+ total RNA, paired reads, L3 phase
- lab: Sternberg lab/Wold lab, Caltech
- format: fastq, tar archive with bzipped files
- other details: 75mer sequences, the last base has been removed
- _1 & _2 are the corresponding pairs
- includes spike-in sequences for quantification
- quality scores are Sanger rather than Illumina
- fragment length is 165bp with a standard deviation of 28
- experiment: Drosophila melanogaster polyA+ total RNA, paired reads, L3 stage larvae
- lab: Celniker lab, Lawrence Berkeley National Laboratory
- format: fastq, tar archive with gzipped files
- other details: 76mer sequences
- _1 & _2 are the corresponding pairs
- produced on an Illumina Genome Analyzer II
- fragment length is 250-300bp
- low quality reads have been filtered out
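Since the Wold lab files state that quality scores are Sanger rather than Illumina, the following sketch shows what that difference means when decoding a fastq quality string. It assumes the usual conventions: Sanger encoding uses an ASCII offset of 33, and the Illumina encoding of that period (pipeline 1.3 to 1.7) uses an offset of 64. The function names are hypothetical, and the encoding of the Celniker lab file is not stated here, so check it before decoding.

```python
SANGER_OFFSET = 33    # Sanger / Phred+33 encoding
ILLUMINA_OFFSET = 64  # Illumina 1.3-1.7 / Phred+64 encoding (assumed convention)


def decode_quality(qual, offset=SANGER_OFFSET):
    """Convert a fastq quality string into a list of integer Phred scores."""
    scores = [ord(ch) - offset for ch in qual]
    if any(score < 0 for score in scores):
        raise ValueError("negative score: wrong offset for this quality string?")
    return scores


def illumina_to_sanger(qual):
    """Re-encode a Phred+64 quality string as Phred+33."""
    return "".join(chr(ord(ch) - ILLUMINA_OFFSET + SANGER_OFFSET) for ch in qual)


sanger_scores = decode_quality("I5!")        # Sanger-encoded string
converted = illumina_to_sanger("hT@")        # same scores, Illumina-encoded
```

Decoding a Sanger string with the Illumina offset produces negative scores, which is why the sketch raises an error rather than returning them silently.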
Spike-in Data for Quantification
To provide a more precise control for (human) RNA-Seq quantification, we will test control sequences of defined concentrations in the NanoString experiments for the datasets from the Wold lab (fastq files 1-4).
A fasta file with the spiked-in sequences is available for download.
Please make sure you submit your quantification for these as well!
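One way to check that a submission covers the spike-in sequences is sketched below. It assumes the spike-in fasta file uses standard `>` header lines whose first whitespace-delimited token is the sequence identifier, and that your quantification results are keyed by the same identifiers. The function names and the toy identifiers are hypothetical.

```python
def fasta_ids(lines):
    """Return sequence identifiers from fasta header lines, in file order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare ">" header with no identifier
                ids.append(fields[0])
    return ids


def missing_spike_ins(spike_ids, quantified_ids):
    """List spike-in identifiers that have no entry in the quantification."""
    quantified = set(quantified_ids)
    return [sid for sid in spike_ids if sid not in quantified]


# Toy data standing in for the spike-in fasta file and a results table.
toy_fasta = [">spike1 example control\n", "ACGT\n", ">spike2\n", "GGCC\n"]
toy_results = {"spike2": 12.5, "geneX": 3.1}

spikes = fasta_ids(toy_fasta)
missing = missing_spike_ins(spikes, toy_results)
```

With a real file, the lines can be supplied by iterating over an open file handle, for example `with open(path) as handle: spikes = fasta_ids(handle)`.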