Exploring & Extending IBM’s SlamTracker

By nickburns2013 / February 6, 2015 / AOStats, ML in Practice, Small Steps to Big Data

The 2015 Australian Open was a fantastic event! From the Australian perspective, it was encouraging to see so many Australian players go deep into the tournament. And Djokovic was a class act: he appeared to be in complete control, dug deep in the critical moments and stepped up to an aggressive style to close out the tournament. But the best thing was that every shot, game, set and match was recorded by the IBM SlamTracker.

(source: http://www.ibm.com)

In this series of posts we extend the basic IBM SlamTracker and explore the data with some basic data analysis and data mining using Python and AzureML. As we go, the code will be publicly available via IPython notebooks on GitHub: https://github.com/nickb-/AOStats.

PART 1: Data Scraping

(source: http://www.firstdistribution.co.za/big-data-and-the-australian-open-2014/)

The Australian Open match statistics are publicly available via the AO Website: http://www.ausopen.com/en_AU/scores/index.html

IBM’s SlamTracker displays a range of summary statistics for each match, along with the keys to the match. In this first part, we scrape the match statistics from the AO website using Python (requests, pattern.web).
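To give a feel for the scraping step, here is a minimal sketch. It makes several assumptions: it uses only the Python standard library (urllib and html.parser) rather than requests and pattern.web, so that it runs without extra installs; it assumes each statistic sits in a table row of three cells (player 1 value, statistic name, player 2 value), which may not match the real AO page markup; and every value in the sample HTML other than "81/136 (60 %)" is an invented placeholder. The names StatTableParser, parse_match_stats and fetch_match_stats are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class StatTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_match_stats(html):
    """Return {statistic name: (player 1 value, player 2 value)}.

    Assumes the hypothetical three-cell row layout described above.
    """
    parser = StatTableParser()
    parser.feed(html)
    parser.close()
    return {row[1]: (row[0], row[2]) for row in parser.rows if len(row) == 3}


def fetch_match_stats(url):
    """Download one match page and parse it (defined but not called here)."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_match_stats(html)


# Placeholder markup standing in for a real match page.
sample_html = """
<table>
  <tr><td>81/136 (60 %)</td><td>1st Serves In</td><td>70/120 (58 %)</td></tr>
  <tr><td>12</td><td>Aces</td><td>7</td></tr>
</table>
"""
stats = parse_match_stats(sample_html)
print(stats["1st Serves In"])
```

Keeping the parsing separate from the download means the parser can be tried out on a saved page or a small sample string before any requests are made to the live site.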

PART 2: Data Cleaning & Exploratory Data Analysis (EDA)

(source: IBM via Twitter: https://twitter.com/ibm/status/557233565398994944)

In Part 1 we scraped the match statistics from the AO website. However, we made no effort to clean the raw data. In this part we clean the raw data and populate a Pandas DataFrame for further analysis.

To do this effectively we need to normalise the data, parsing aggregate statistics (such as 1st Serves In) into useful atomic measures. For example:

raw data:

1st Serves In

“81/136 (60 %)”

normalise to:

total_first_serves	136
first_serve_faults	55
first_serve_percentage	60
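The normalisation above could be sketched with a regular expression along the following lines. This is only an illustration: the function name parse_first_serves and the pattern are assumptions, and the raw strings on the real site may vary in spacing or format.

```python
import re

# Matches strings such as "81/136 (60 %)": serves in / total (percentage %).
RATIO_PATTERN = re.compile(r"(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_first_serves(raw):
    """Split a raw "1st Serves In" string into atomic measures."""
    match = RATIO_PATTERN.search(raw)
    if match is None:
        raise ValueError(f"unrecognised statistic format: {raw!r}")
    serves_in, total, percentage = (int(group) for group in match.groups())
    return {
        "total_first_serves": total,
        # Every first serve that did not go in counts as a fault.
        "first_serve_faults": total - serves_in,
        "first_serve_percentage": percentage,
    }


result = parse_first_serves("81/136 (60 %)")
print(result)
```

Each call returns a plain dictionary, so a list of these (one per player per match) can be passed straight to pandas.DataFrame to build the table used for the EDA.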

Once the data has been normalised, we will use Python’s Pandas library to store the statistics and perform an initial EDA.

PART 3: Data Mining

Hopefully, the EDA from Part 2 will have exposed some interesting trends / patterns or questions for further analysis. In Part 3 we will explore the data further using AzureML.

While we can’t really define this part until after Part 2, we will hopefully be able to identify styles of play, or typical measures that indicate success deep into the Australian Open vs. early round losses.

Future Work & General Hypothesis

Long term, it would be very interesting to collect similar statistics across a large number of professional tennis tournaments with a view to being able to predict or forecast a player’s success.

We would expect players such as Djokovic and Nadal to exhibit quite dominant statistical patterns.
Players like Sam Stosur (whose success rate is highly inconsistent, and therefore statistically very interesting) should be interesting to watch.
We imagine that young players, like Nick Kyrgios and Borna Coric, would be particularly interesting to watch and probably difficult to forecast.

Follow the journey on GitHub: https://github.com/nickb-/AOStats, or follow this blog series via the AOStats tag.
