I am running a splice variant analysis. I wanted to use the NCBI genome, but the program works better with Ensembl. I am a bit confused about whether to use the primary assembly or the top-level sequence as my reference genome. I also do not know which annotation file to use, and I am not seeing a reference RNA download. If anyone can help, that would be immensely appreciated.
1 Answer
Ensembl provides gene annotation files in both GTF and GFF3 format via the FTP site: http://ftp.ensembl.org/pub/
The GTF and GFF3 files are named according to whether they contain data relating to the primary assembly or to all top-level sequences, including unplaced scaffolds, patches and haplotypes. More information can be found in the README files within each folder. Which genome sequence file and annotation file you use will depend on the details of your study, and this choice has been discussed in detail in other posts.
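As a rough illustration of the naming described above, here is a minimal Python sketch that builds the download URLs for a genome FASTA and its matching GTF. The directory layout and file name pattern are my assumptions based on how recent Ensembl releases are organised, and `ensembl_urls` is a hypothetical helper, not an Ensembl tool. Check the README in each folder for the release you actually use.

```python
BASE = "http://ftp.ensembl.org/pub"


def ensembl_urls(release, species="homo_sapiens", assembly="GRCh38",
                 kind="primary_assembly"):
    """Build assumed Ensembl FTP URLs for a genome FASTA and its GTF.

    kind is "primary_assembly" or "toplevel". The path pattern is an
    assumption about the FTP layout; verify it against the README files.
    """
    if kind not in ("primary_assembly", "toplevel"):
        raise ValueError("kind must be 'primary_assembly' or 'toplevel'")

    # File names use the species name with a capital first letter,
    # e.g. homo_sapiens -> Homo_sapiens.
    name = species.capitalize()

    genome = (f"{BASE}/release-{release}/fasta/{species}/dna/"
              f"{name}.{assembly}.dna.{kind}.fa.gz")
    # Taking genome and annotation from the same release keeps the
    # sequence names and coordinates consistent.
    gtf = (f"{BASE}/release-{release}/gtf/{species}/"
           f"{name}.{assembly}.{release}.gtf.gz")
    return {"genome": genome, "gtf": gtf}


if __name__ == "__main__":
    for label, url in ensembl_urls(104).items():
        print(label, url)
```

The point of the sketch is simply that the genome and the annotation should come from the same release and assembly, whichever of primary assembly or top-level you settle on.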
Ensembl also provides reference RNAseq data aligned to the reference genome sequence as BAM files with corresponding .bai index files on the FTP site: http://ftp.ensembl.org/pub/current_bamcov/homo_sapiens/genebuild/
More information can be found in the README: http://ftp.ensembl.org/pub/current_bamcov/homo_sapiens/genebuild/README

For your reference genome question, see: https://bioinformatics.stackexchange.com/questions/540/what-ensembl-genome-version-should-i-use-for-alignments-e-g-toplevel-fa-vs-p?rq=1 Generally, using the primary assembly is the best option.
– story Jul 30 '21 at 10:40