Thesis: Beginning to read…


The rapid ‘datafication’ of our world has led to some of the biggest advances in science and business, and yet future progress is severely hampered by our still-developing understanding of the science of data (Ahalt et al., 2014). Given this need for data science, it is somewhat ironic that finding a project and defining my core research question has been such a challenge! I believe this is because data science is necessarily interdisciplinary and does not exist in an area of its own. This means that, as thesis students, we need to look beyond the boundaries of information science and statistics and find specific applications for our skills.

Of all the potential research areas, genomics seems to best exemplify the challenges of big data. Genomics has all the hallmarks of a great research area, with significant challenges and opportunities in data collection, management, exploration and analysis, as well as broader ethical issues related to privacy (Stephens et al., 2015). So I feel very fortunate to be working alongside Assoc. Prof. Mik Black and Assoc. Prof. Tony Merriman, who are both heavily involved in genomics. However, finding a supervisor is only the first hurdle: I now have to define my own research question.

Coming up with a research question has been harder than I thought it would be. Mainly, I have been held back by my own lack of experience in genetics and genomics. I have been on a steep learning curve to try to bridge this gap and get enough of an overview to see where best to direct my research. This has meant getting to know Tony’s research into gout and where it fits into the bigger context of genomics in general. Unfortunately, Mik has been incredibly busy, with 8 students under his wing, seats on multiple advisory boards and a conference in Queensland (poor guy, what a hardship!), so I haven’t been able to rely too heavily on him to just hand me a problem 🙂 Of course this is a good thing: I have had to roll up my sleeves and read.

There is simply no substitute for reading. A lot of research students groan when it comes to the inevitable literature review, but it is essential. The readings can be tough initially, especially in a totally new area. There is the jargon, which is familiar to people who have spent years in the field but is a significant hurdle in those first few weeks (months?). And with so many publications, each with its own application to the field, the sheer scope of a subject area can be a lot to wrap your head around. But it is fair to say it all becomes a little clearer eventually. And among all that reading there will be one little niggling problem that really grabs your attention and that you think you can contribute to. This is when it gets exciting.

So my research area is genomics and, more specifically, the underlying genetic basis for gout. This is fundamentally what Tony Merriman’s lab investigates. Over the next few posts I will summarise some of the readings I have done and slowly build up context around my specific research question. Finally, I will describe my research in more detail and how I hope it will add to some recent research into machine learning and advanced analytical techniques in genomics.


Ahalt, S., Bizon, C., Evans, J., Erlich, Y., Ginsberg, G., Krishnamurthy, A., … & Wilhelmsen, K. (2014). Data to Discovery: Genomes to Health. A White Paper from the National Consortium for Data Science. RENCI, University of North Carolina at Chapel Hill.

Stephens, Z. D., Lee, S. Y., Faghri, F., Campbell, R. H., Zhai, C., Efron, M. J., … & Robinson, G. E. (2015). Big Data: Astronomical or Genomical? PLoS Biol, 13(7), e1002195.

