Trial & Error – creating my first R package

 

Creating my very first R package has been an interesting, frustrating and challenging experience. Overall, though, the high quality of the documentation (largely from Hadley Wickham and co.) helped me overcome the frustrations and the challenges without too much pain. This blog post is simply to note down some of the issues I had and (hopefully) provide myself with a quick reference for the future.

The Motivation

I am juggling a number of different projects at the moment, all of which have a common theme: visualising and exploring patterns in interesting areas of the human genome. Functions that I wrote 2 months ago have been reproduced 3 times over in different projects. Over time they have slowly evolved and I am struggling to maintain all the different versions of them. And so I decided to make an R package for the core elements.

Wrapping critical functions up in an R package is a great idea. It means there is only one code-base to maintain. All of my projects will share the same features. And I can share it with the rest of the lab for testing, continued development and improvements.

The Difficult Parts

Hilary Parker’s ‘cats’ example is probably one of the best “here’s the basics, and look how easy they are” blog posts out there. And it really is that easy. At least… it is if all you want to do is put a simple function in a package.

Despite what everyone says, making an R package is not a straightforward process. This is because it involves a lot more than simply wrapping up your own R functions. Keep in mind that you are primarily creating a package so that you can share it with other people (this includes yourself in 3 months’ time :p ). So you have to put a lot of thought into how you abstract features of your program and how you can smooth the interface between users and the package.

Here are the things that I battled with the most, and which I will make notes about below:

  1. including executable bash scripts
  2. including external reference data files (NOT Rdata files, but text files that will be used by the bash scripts mentioned above)
  3. accessing the functions from GLIDA after I had built and loaded it (thankfully a very easy fix)
  4. getting namespaces right! Oh my – this will forever change the way I write my R code.
  5. Error in fetch(key): lazy-load database ‘…glida.rdb’ is corrupt (which, it turned out, was related to (3))

 

Handy references

The documentation that surrounds the art of R packages is really fantastic. In particular: Hadley Wickham’s superb guide to R packages.

I would recommend having a quick read of Wickham’s book before you begin. It is the most ‘usable’ & ‘practical’ reference out there. And it covers nearly EVERYTHING – it even covers the bits that I didn’t think it did, but which I glossed over in my first quick skim. The moral of the story then is to read it once and then keep going back to it every time you have a problem.

A slightly more detailed reference is here: Writing R Extensions. This covers everything. But it isn’t as easy to read as Wickham’s book.

 

Problem 1: Including bash scripts

The official solution here is to create a folder, inst/bash. And so I have a directory structure like the following: package/inst/bash, which contains all my bash scripts. This is different to the exec/ folder which Wickham promotes in his book.

Note that the inst/* directory structure resulted in my bash scripts being executable, which is exactly what I wanted. So this is good.

Wickham talks about the inst/* structure here. And it is also described in the R extensions doco here.

What neither of these does is explain how to call the bash scripts from within your package functions. However, this is sort of explained in the “Raw Data” section of Wickham’s book here. But to summarise:

You can access / call bash scripts (and access additional data files) using the system.file() function:


# syntax: system.file(<subdirectory>, <file name>, package = <package name>)
myBashScript <- system.file("bash", "bashScriptOne.sh", package = "glida")

As you can imagine, the system.file() function will search for the file within the package directories. Nice and easy (once you know how).
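system.file() only returns a path, so the script still has to be run somehow. Here is a minimal sketch of one way to do that with base R’s system2(). The helper name run_bash_script() is made up for illustration (it is not part of glida), and the throwaway script only exists so the example runs without the package installed.

```r
# Hypothetical helper: run a script through bash and capture its
# standard output as a character vector (one element per line).
run_bash_script <- function(path, args = character()) {
  # system.file() returns "" when nothing is found, so check first
  if (!nzchar(path) || !file.exists(path)) {
    stop("Script not found: '", path, "'")
  }
  system2("bash", args = c(shQuote(path), shQuote(args)), stdout = TRUE)
}

# Intended use inside the package (assumes glida is installed):
# script <- system.file("bash", "bashScriptOne.sh", package = "glida")
# run_bash_script(script, args = c("arg1", "arg2"))

# Self-contained demonstration with a throwaway script:
demo_script <- tempfile(fileext = ".sh")
writeLines(c("#!/bin/bash", "echo \"hello $1\""), demo_script)
demo_output <- run_bash_script(demo_script, args = "world")
print(demo_output)
```

Calling bash explicitly means the sketch does not depend on the script’s executable bit, which may or may not matter for your setup.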

 

Problem 2: Including additional reference data sets

Adding data files that are used as datasets within your functions is well explained in Wickham’s section on data. You can save() datasets as Rdata files, and you can load them with the data() function.

But what about other data files? For example, the bash scripts I have use a couple of external data files to filter 1000 Genomes data by nationality. I don’t need to read these into R, and if I did, I would have to completely redo the logic around the bash scripts – which comes with some complications (invoking PLINK for example).

The answer is exactly the same as it was for calling bash scripts. Create a directory: inst/extdata/ and place the files in here. Then to access them:


system.file("extdata", "myDataFile.panel", package="glida")
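One thing worth knowing: when the file is not found, system.file() quietly returns an empty string rather than raising an error, and that empty string could then be handed to a bash script unnoticed. The sketch below shows that behaviour and the mustWork = TRUE argument that turns it into an error. It uses the utils package as a stand-in so that it runs without glida installed, and "noSuchFile.panel" is a made-up name.

```r
# A file that does not exist yields an empty string, not an error.
missing_path <- system.file("extdata", "noSuchFile.panel", package = "utils")
print(nzchar(missing_path))

# With mustWork = TRUE the same lookup stops with an error instead.
result <- tryCatch(
  system.file("extdata", "noSuchFile.panel", package = "utils",
              mustWork = TRUE),
  error = function(e) "lookup failed"
)
print(result)

# A file that does exist: installed packages ship a DESCRIPTION file.
description_path <- system.file("DESCRIPTION", package = "utils",
                                mustWork = TRUE)
print(file.exists(description_path))
```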

 

Problem 3: Accessing functions from my package

This is just a rookie mistake. But it annoyed me for a while, so I will make a note.

Naturally, I have been using RStudio to develop this package. All my code exists in a local directory: ~/Documents/GitHub/GLIDA/. When I installed it, it was installed under my library directory. But when I loaded it (with ~/Documents/GitHub/ as my working directory), it loaded the local path rather than the library path. Let me show you:


setwd("~/Documents/GitHub")

# --------------------------------------------------------
#
# Method 1: The normal way
#
library(glida)
find.package()
# [1] "/home/nickb/R/x86_64-pc-linux-gnu-library/3.2/glida"
# ...

detach(package:glida, unload=TRUE)

# --------------------------------------------------------
#
# Method 2: Used for testing
#
library(devtools)
devtools::load_all("glida")  # loads the package straight from the local source directory
find.package()
# [1] "/home/nickb/Documents/GitHub/glida"
# ...

This doesn’t really matter. But when I was testing my package using Method 2, I was able to access all of glida’s functions easily enough:


?glida::ldRead
# -> loads the help file :)

And so everything looked perfectly fine. However, I found that once I installed the package properly and used Method 1:


library(glida)

?glida::ldRead
# Error: 'ldRead' is not an exported object from 'namespace:glida'

# ----------------------------------------------------------------
#
# but this works...
#
?glida:::ldRead
# -> loads the help file :)

So the usual ‘::’ didn’t work, but ‘:::’ did. Very odd.

The fix is very simple: you have to mark functions with the roxygen2 @export tag if you want them to be accessible in the usual fashion, and then regenerate the NAMESPACE file (for example with devtools::document()):


#' This is a great function
#'
#' @param awesome Logical. Defaults to TRUE.
#' @return A character string.
#' @export
great_function <- function(awesome = TRUE) {
  # isTRUE() guards against NA or non-logical input
  something_cool <- if (isTRUE(awesome)) {
    "This is something cool"
  } else {
    "Not so much."
  }
  return(something_cool)
}

I was very happy that this was such an easy fix. And I just have to learn a bit more about namespaces in R.
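To see the difference between ‘::’ and ‘:::’ for yourself, you can list which objects a namespace exports and which it keeps internal. The sketch below is just one way to do that with base R. export_status() is a made-up helper name, and it is run against the stats package so that it works without glida installed.

```r
# Hypothetical helper: split a package's objects into those reachable
# with pkg::name (exported) and those that need pkg:::name (internal).
export_status <- function(package) {
  ns <- asNamespace(package)
  all_objects <- ls(ns, all.names = TRUE)
  exported <- getNamespaceExports(ns)
  list(
    exported = sort(intersect(all_objects, exported)),
    internal = sort(setdiff(all_objects, exported))
  )
}

status <- export_status("stats")
print("median" %in% status$exported)  # median is reachable as stats::median
print(length(status$internal) > 0)    # stats also has internal-only objects
```

Running export_status("glida") before and after adding @export to a function should show it moving from the internal list to the exported one.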

 

Problem 4: Getting namespaces right

Have you ever noticed that R scripts typically make very poor use of namespaces? And so, I too have fallen into this pattern. Which is slightly ironic, since The Zen of Python states that: “Namespaces are one honking great idea — let’s do more of those!”.

So from now on, I will use them! Everywhere. Every time. Because once again, The Zen of Python teaches us that “Explicit is better than implicit”.
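In R terms, one reading of “use namespaces everywhere” is to qualify each call with its package via ‘::’ rather than relying on whatever library() calls happen to have attached. A minimal sketch follows. summarise_values() is a made-up example function, not something from glida.

```r
# Hypothetical helper: every non-base call names its package explicitly,
# so the function does not depend on what the user has attached.
summarise_values <- function(x) {
  c(
    median = stats::median(x, na.rm = TRUE),
    sd     = stats::sd(x, na.rm = TRUE)
  )
}

values <- c(1, 2, 3, NA)
summary_stats <- summarise_values(values)
print(summary_stats)
```

Inside a package, anything called this way also needs to be listed under Imports in the DESCRIPTION file so that it is installed alongside your package.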

 

Problem 5: Error in fetch(key): lazy-load database ‘glida.rdb’ is corrupt

I had an interesting error, which was actually related to the previous 2 problems. There was one function which I hadn’t exported, and this caused extreme grief for RStudio’s autocomplete. Every time I started to type a similarly-named function (there are about 6 functions which begin with “ld”), the autocomplete dialog would try to load, but it would crash with the following error:


# error corrupt glida.rdb help file.
# Error in fetch(key)
# lazy-load database 'glida.rdb' is corrupt

This was simply fixed by adding an @export comment to the function, which put it in the exported namespace of glida, and hey presto, all was good.
