I have a bunch of VCF files (v4.1) with structural variations from several non-model organisms (i.e. there are no known variants). I found that there are quite a few tools for manipulating VCF files, such as VCFtools, the R package vcfR, or the Python library PyVCF. However, none of them seems to provide a quick summary, something like this (preferably categorised by size as well):
type count
DEL x
INS y
INV z
....
Is there any tool or a function I overlooked that produces summaries of this style?
I know that a VCF file is just a plain text file, and that if I dissect the REF and ALT columns I should be able to write a script that will do the job, but I hoped that I could avoid writing my own parser.
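For reference, the kind of script I have in mind could be sketched roughly like this in Python. It is only a minimal sketch under several assumptions: the type is taken from INFO/SVTYPE when present, otherwise from a symbolic ALT such as <DEL>, otherwise from comparing REF and ALT lengths; the size comes from SVLEN, or END minus POS, or the REF/ALT length difference; only the first ALT allele is considered. The function names and the size bins are made up for illustration and do not come from any existing tool.

```python
from collections import Counter

# Size bins are an arbitrary illustrative choice (exclusive upper bounds in bp).
SIZE_BINS = [(100, "<100bp"), (1_000, "100bp-1kb"), (10_000, "1kb-10kb")]


def parse_info(info):
    """Turn 'A=1;B=2;FLAG' into a dict; flags map to True."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def sv_type(ref, alt, info):
    """Prefer INFO/SVTYPE, then a symbolic ALT, then REF/ALT lengths."""
    if "SVTYPE" in info:
        return str(info["SVTYPE"])
    if alt.startswith("<") and alt.endswith(">"):
        return alt[1:-1].split(":")[0]  # <DUP:TANDEM> -> DUP
    if "[" in alt or "]" in alt:
        return "BND"  # breakend notation
    if len(ref) > len(alt):
        return "DEL"
    if len(ref) < len(alt):
        return "INS"
    return "OTHER"


def sv_size(pos, ref, alt, info):
    """Best-effort size in bp, or None when it cannot be determined."""
    try:
        if "SVLEN" in info:
            return abs(int(str(info["SVLEN"]).split(",")[0]))
        if "END" in info:
            return abs(int(str(info["END"])) - pos)
    except ValueError:
        return None
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return None
    return abs(len(ref) - len(alt))


def size_bin(size):
    """Map a size to the label of the first bin it falls into."""
    if size is None:
        return "unknown"
    for upper, label in SIZE_BINS:
        if size < upper:
            return label
    return ">=10kb"


def summarise(lines):
    """Count records per (type, size bin) from an iterable of VCF lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        pos, ref = int(fields[1]), fields[3]
        alt = fields[4].split(",")[0]  # simplification: first ALT allele only
        info = parse_info(fields[7])
        counts[(sv_type(ref, alt, info), size_bin(sv_size(pos, ref, alt, info)))] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for (kind, size), n in sorted(summarise(handle).items()):
            print(kind, size, n, sep="\t")
```

Run as a script with the path to an uncompressed VCF as the only argument, it would print one tab-separated line per type and size bin. Callers that encode structural variants differently would need their own rules in `sv_type` and `sv_size`.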
--- edit ---
So far it seems that the only tool that aims to produce summaries (@gringer's answer) does not work on VCF v4.1. Other tools provide only a partial solution by filtering for a certain variant type. Therefore I am accepting my own parser solutions (Perl/R) until there is a working tool for statistics of VCF files with structural variants.
vcftools has a sibling called bcftools, which has a query function that allows users to query a VCF/BCF to pull out fields and information and output their own format. It might not do exactly what you want, but might get you very close (enough to maybe just need a little post-processing in R?). – Sam Nicholls May 28 '17 at 22:05