Exploring & Extending IBM’s SlamTracker

The 2015 Australian Open was a fantastic event! From the Australian perspective, it was encouraging to see so many Australian players go deep into the tournament. And Djokovic was a class act: he appeared to be in complete control, dug deep through the critical moments and stepped up to an aggressive style to close out the tournament. But the best thing was that every shot, game, set and match was recorded by the IBM SlamTracker.

In this series of posts we extend the basic IBM SlamTracker and explore the data with some basic data analysis and data mining using Python and AzureML. As we go, the code will be publicly available via IPython notebooks on GitHub: https://github.com/nickb-/AOStats .

PART 1: Data Scraping

The Australian Open match statistics are publicly available via the AO Website: http://www.ausopen.com/en_AU/scores/index.html

IBM’s SlamTracker displays a range of summary statistics for each match, along with the keys to the match. In this first part, we scrape the match statistics from the AO website using Python (requests, pattern.web).

PART 2: Data Cleaning & Exploratory Data Analysis (EDA)

In Part 1 we scraped the match statistics from the AO website. However, we made no effort to clean the raw data. In this part we clean the raw data and populate a Pandas DataFrame for further analysis.

To do this effectively we need to normalise the data, parsing aggregate statistics (such as 1st Serves In) into useful atomic measures. For example:

raw data:

1st Serves In “81/136 (60 %)”

normalise to:

total_first_serves 136
first_serve_faults 55
first_serve_percentage 60
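
As a rough illustration of this normalisation step, here is a minimal sketch using only Python's standard library. The function name `parse_first_serves` and the regular expression are assumptions made for this example, not code from the AOStats notebooks. It also assumes the raw value always follows the "in/total (percent %)" shape shown above.

```python
import re

# Assumed shape of the raw value: "<serves in>/<total> (<percent> %)"
RATIO_PATTERN = re.compile(r"(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_first_serves(raw):
    """Split a raw '1st Serves In' string into atomic measures.

    Hypothetical helper for illustration only.
    """
    match = RATIO_PATTERN.search(raw)
    if match is None:
        raise ValueError("Unrecognised statistic format: %r" % raw)

    serves_in, total, percentage = (int(group) for group in match.groups())
    return {
        "total_first_serves": total,
        # Faults are the first serves that did not land in.
        "first_serve_faults": total - serves_in,
        "first_serve_percentage": percentage,
    }


stats = parse_first_serves("81/136 (60 %)")
print(stats)
```

The percentage is taken as reported on the website rather than recomputed, so any rounding applied upstream is preserved.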

Once the data has been normalised, we will use Python’s Pandas library to store the statistics and perform an initial EDA.

PART 3: Data Mining

Hopefully, the EDA from Part 2 will have exposed some interesting trends, patterns or questions for further analysis. In Part 3 we will explore the data further using AzureML.

While we can’t really define this part until after Part 2, we will hopefully be able to identify styles of play, or typical measures that distinguish success deep into the Australian Open from early-round losses.

Future Work & General Hypothesis

Long term, it would be very interesting to collect similar statistics across a large number of professional tennis tournaments with a view to being able to predict or forecast a player’s success.

  • We would expect players such as Djokovic and Nadal to express quite dominant statistical patterns.
  • Players like Sam Stosur (whose success rate is highly inconsistent, and therefore statistically very interesting) should be interesting to watch.
  • We imagine that young players, like Nick Kyrgios and Borna Coric, would be particularly interesting to watch and probably difficult to forecast.

Follow the journey on GitHub: https://github.com/nickb-/AOStats. Or follow this blog series via the AOStats tag.

