Category Archives: Science@Work

Logistic regression: moving from significant results to meaningful insight

Association studies are a favourite tool for geneticists to understand the genetics that determine our health. It is now routine to test every mutation among your patients for association with a given trait (for example, BMI or breast cancer).

SQL-like power with R’s data.table package

I had an interesting little problem today that involved extracting data from one table based on information from another table. In SQL-speak, it was a full cross join with a GROUP BY and a HAVING clause. It is a job that

Clustering Gene Expression Data: adding layers to a heatmap

So you have some data, perhaps a lot of data, but you’re not quite sure what to do with it… where do you start? Is it interesting data? Would it tell a good story, or just be a confusing mess? This

Trial & Error – creating my first R package

Creating my very first R package has been an interesting, frustrating and challenging experience. Overall, though, the high level of documentation (largely from Hadley Wickham and co.) helped me overcome the frustrations and the challenges without too much pain.

Data warehousing & breaking the rules of a star schema

I would really appreciate some feedback on this. If you have a few minutes to spare and don’t mind sharing your thoughts, then I would like to hear from you. The Schema: We are creating a data warehouse to store

SQL vs. BioMart for querying the human genome

A huge part of my job is to add context and build layers of information on top of the genetic mutation datasets that we have amongst our groups. If we want to understand the importance of genetic mutations on human

Databases for finding human protein-coding genes

After approx. 4 months with the Merriman group, I am beginning to get a handle on how they operate and the typical questions that they are interested in. At a very high level, a typical workflow might involve finding interesting

Debugging GenomeSIMLA

Simulated datasets get a hard time in the real world, as it is difficult to build a simulation which accurately captures the range of values and “dirtiness” of real data. However, simulated sets cannot be beaten when testing out new

Statisticians are a gloomy bunch!

This is a really exciting week, as we have ResBaz here in Dunedin! I didn’t know what I was signing up for, but it has been a great opportunity to meet a whole lot of really interesting people. I

Data Analytics are not the answer

Data analytics is now so ubiquitous that it is a requirement for success in a fiercely competitive global economy. Unfortunately, there is so much hype around analytics that people often expect that analytics is the answer, that it can somehow
