Using bcftools to merge and filter VCF files

GOALS

  1. Merge genotype calls from separate VCF files (e.g. one VCF file per sample) into one master VCF file with a column for each sample.
  2. Filter this master VCF file and extract regions of interest.

EDIT: I have edited this post to include a workflow using conda’s bcftools within my GWrangle Docker image (see bottom of post).

Input files

For this example, I have 7 VCF files – one for each sample. All of these files are identical, except for the genotype columns (i.e. the CHROM, POS, RSID, REF etc. columns are all the same).

Note that the VCF files have been compressed using bgzip, and indexed using tabix:

$ bgzip *.vcf
$ parallel tabix -p vcf ::: *vcf.gz
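
If GNU parallel is not installed, a plain shell loop does the same job, just one file at a time. Here is a minimal sketch wrapped in a function; the name compress_and_index is made up for this post, and it assumes bgzip and tabix are on your PATH.

```shell
# Hypothetical helper (the name is an assumption, not part of htslib):
# bgzip-compress and tabix-index every VCF passed as an argument.
compress_and_index() {
    for vcf in "$@"; do
        # bgzip replaces sample.vcf with sample.vcf.gz
        bgzip "$vcf" || return 1
        # tabix writes the index alongside it as sample.vcf.gz.tbi
        tabix -p vcf "$vcf.gz" || return 1
    done
}

# Example call:
#   compress_and_index *.vcf
```

The function stops at the first file that fails, so a truncated or malformed VCF does not go unnoticed.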

Merging genotypes

Here we will create a master VCF file that contains the genotype calls for each sample in separate columns. There are a number of tools for this (vcfintersect, vcftools, vcf-merge, bcftools); we are going to use bcftools as it is faster than most of the older toolsets.

First, we will create a tab-separated file with the regions that we want to extract:

19 10000 50000
19 100000 150000
19 200000 250000
19 300000 350000
19 400000 450000

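We have saved this file as regions.bed. Note that bcftools reads a regions file with a .bed extension as 0-based, half-open coordinates; a tab-delimited file with any other extension is read as 1-based, inclusive.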
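
Tabs are easy to lose when copying and pasting, so it can be safer to generate the file from the command line. This is one possible way to do it, using only printf and awk; the coordinates are the five regions listed above.

```shell
# Write the five regions as CHROM<TAB>START<TAB>END, one per line.
# printf reuses the format string for each pair of arguments.
printf '19\t%s\t%s\n' \
    10000 50000 \
    100000 150000 \
    200000 250000 \
    300000 350000 \
    400000 450000 > regions.bed

# Sanity check: every line should have exactly three tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad + 0 }' regions.bed \
    && echo "regions.bed looks OK"
```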

And now it is a simple command to a) extract our regions and b) merge all the input VCF files:

$ bcftools merge -R regions.bed *vcf.gz > combined_regions_genotypes.vcf
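
The command above writes an uncompressed VCF. If you want to keep working on the merged file with bcftools, it may be more convenient to write bgzip-compressed output and index it straight away. The sketch below shows one way to do that; merge_regions is a made-up name for this post, and the file names in the example call are the ones used above.

```shell
# Hypothetical wrapper (the name is an assumption): merge per-sample VCFs
# over the regions in a file, write bgzip-compressed output, index it,
# and list the sample columns of the result.
# Arguments: regions file, output path, then the input VCFs.
merge_regions() {
    regions=$1
    out=$2
    shift 2
    # -O z asks for compressed VCF output, -o names the output file
    bcftools merge -R "$regions" -O z -o "$out" "$@" || return 1
    # -t builds a tabix (.tbi) index rather than the default .csi
    bcftools index -t "$out" || return 1
    # print one sample name per line, i.e. one per genotype column
    bcftools query -l "$out"
}

# Example call:
#   merge_regions regions.bed combined_regions_genotypes.vcf.gz *vcf.gz
```

The last step should list seven sample names for the seven input files. If two input files share a sample name, bcftools merge stops with an error unless you pass --force-samples.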

Docker, gwrangle, bioconda bcftools

I have created a docker image which contains a whole lot of tools for the wrangling, manipulation and analysis of genomic datasets. One of these tools is bioconda’s bcftools. To install this (from a root session of the image):

# conda install -c bioconda bcftools=1.3.1

Then, to index a VCF file:

$ bcftools index <vcf_file>

And you can then subset the file by region:

$ bcftools view -r <chr:position> <vcf_file>

Nice and easy.
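
The second goal at the top of this post mentioned filtering. As a rough illustration only, region subsetting can be combined with a simple filter in the same bcftools view call. The sketch below keeps biallelic SNPs; the function name subset_biallelic_snps and the choice of filter are my assumptions, so swap in whatever criteria suit your data.

```shell
# Hypothetical helper (name and filter choice are assumptions):
# keep only biallelic SNPs within one region.
# Arguments: region (e.g. 19:10000-50000), input VCF (bgzipped and indexed),
# output path for the compressed result.
subset_biallelic_snps() {
    # -m2 -M2 keeps sites with exactly two alleles, -v snps keeps SNPs only
    bcftools view -r "$1" -m2 -M2 -v snps -O z -o "$3" "$2"
}

# Example call:
#   subset_biallelic_snps 19:10000-50000 combined_regions_genotypes.vcf.gz subset.vcf.gz
```

The input needs to be compressed and indexed for -r to work, as described above.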
