A first go at using Docker for data science

So excited! I have been hearing the buzz about Docker and how awesome it is, so I thought I would give it a go. It was so easy! Here are some screenshots from my first afternoon playing with Docker.


I began with Docker’s very own “Getting Started” guide. It took roughly 5 minutes to be up and running, and here is my slightly smarter whale:



I live and breathe R. So, can I get an RStudio image up and running? I quickly jumped onto Docker Hub and searched for “Rstudio”:


I downloaded the rocker/rstudio image, fired it up and started to play. Below I have loaded the iris dataset, used k-nearest neighbours to predict the species, and plotted the predicted species alongside the uncertainties in the predictions:
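For anyone wanting to follow along, getting the RStudio container running looks roughly like the sketch below. The port (8787), the default `rstudio` user and the `PASSWORD` environment variable come from the rocker/rstudio image's documentation rather than from my screenshots, and the container name and password value are placeholders I made up.

```shell
# Placeholder values: change the name and (especially) the password.
IMAGE="rocker/rstudio"
NAME="my-rstudio"               # hypothetical container name
RSTUDIO_PASSWORD="changeme"     # hypothetical password for the 'rstudio' user

# Only talk to Docker if the CLI is actually installed.
if command -v docker >/dev/null 2>&1; then
  # Download the image from Docker Hub.
  docker pull "$IMAGE"
  # Start it in the background and publish RStudio Server on port 8787.
  docker run -d --rm --name "$NAME" -p 8787:8787 \
    -e PASSWORD="$RSTUDIO_PASSWORD" "$IMAGE"
  echo "RStudio should be at http://localhost:8787 (user: rstudio)"
else
  echo "docker not found on PATH" >&2
fi
```

Once it is up, the iris and k-nearest neighbours experiments happen inside the browser session, exactly as they would in a local RStudio.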



The next steps…

(1) Build my own Docker image with some of the useful command-line and genomics tools that we rely on.

(2) Figure out whether I can get different Docker containers to communicate efficiently. If I can, then I should be able to build pipelines where each component is isolated and contained. But I wonder whether I would be better off building a single self-sufficient container that can execute the whole (or most) of a pipeline?
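Both next steps can be sketched in a few lines of shell. This is only a rough outline under assumptions of mine: the image name `my-genomics-tools`, the network name `pipeline-net`, the container name `step1`, and the choice of `samtools` and `bcftools` as the "genomics tools" are all hypothetical, and I am assuming those packages can be installed with apt on the Debian/Ubuntu base that rocker/rstudio uses.

```shell
# --- Step 1: describe a custom image in a Dockerfile ---
BUILD_DIR="$(mktemp -d)"          # scratch directory used as the build context
cat > "$BUILD_DIR/Dockerfile" <<'EOF'
# Start from the RStudio image and add command-line tools on top.
FROM rocker/rstudio
# samtools/bcftools are example tools only; swap in whatever you rely on.
RUN apt-get update \
 && apt-get install -y --no-install-recommends samtools bcftools \
 && rm -rf /var/lib/apt/lists/*
EOF

# Only talk to Docker if the CLI is actually installed.
if command -v docker >/dev/null 2>&1; then
  # Build the image from the Dockerfile above (hypothetical image name).
  docker build -t my-genomics-tools "$BUILD_DIR"

  # --- Step 2: let containers find each other by name ---
  # Containers on the same user-defined network can resolve each other's names.
  docker network create pipeline-net
  docker run -d --name step1 --network pipeline-net alpine sleep 300
  # A second, throwaway container reaches the first one by its name.
  docker run --rm --network pipeline-net alpine ping -c 1 step1

  # Clean up the demo container and network.
  docker rm -f step1
  docker network rm pipeline-net
fi
```

This only shows that two containers can see each other. Whether separate containers beat one big self-sufficient container probably depends on how much data each pipeline step has to hand to the next.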

Useful Docker commands:

Start a container from an image:

docker run <image name>

Show docker images:

docker images

Show running containers:

docker ps

Stop a docker container:

docker stop <container id>
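Put together, a short session with these commands might look like the following. The container name `demo` and the use of the small `alpine` image are my own placeholders, and `docker rm` is an extra command not in the list above.

```shell
NAME="demo"   # hypothetical container name

# Only talk to Docker if the CLI is actually installed.
if command -v docker >/dev/null 2>&1; then
  docker run -d --name "$NAME" alpine sleep 600   # start a container in the background
  docker images                                   # list the images downloaded so far
  docker ps                                       # list running containers
  docker stop "$NAME"                             # stop by name (an ID works too); can take a few seconds
  docker rm "$NAME"                               # remove the stopped container
fi
```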

Docker cheat sheet:






