What is R?
First off, let’s look at R itself. R is a programming language designed specifically for data analysis. It allows you to manipulate, calculate and graph data, build statistical models, and save your results to many standard file types, e.g. PDF, JPEG, etc.
It’s completely open source and free, unlike equivalent commercial statistical packages such as SAS. There’s also a thriving R community to support you and to expand what’s possible in R. 1
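Here is a minimal sketch of the “save your results” point. The R snippet below draws a bar chart and writes it to a PDF file. It also writes a JPEG file, but only if the R build supports JPEG output. The player names, values and file names are hypothetical placeholders, not the data used later in this post.

```r
# Hypothetical values, for illustration only
titles <- c("Player A" = 5, "Player B" = 3, "Player C" = 8)

# Open a PDF graphics device (width/height in inches), draw, then close it
pdf_path <- file.path(tempdir(), "titles.pdf")
pdf(pdf_path, width = 6, height = 4)
barplot(titles, main = "Titles per player (illustrative)", ylab = "Titles")
invisible(dev.off())  # the file is only complete once the device is closed

# JPEG output depends on how R was built, so check the capability first
if (capabilities("jpeg")) {
  jpeg_path <- file.path(tempdir(), "titles.jpg")
  jpeg(jpeg_path, width = 600, height = 400)  # width/height in pixels
  barplot(titles, main = "Titles per player (illustrative)", ylab = "Titles")
  invisible(dev.off())
}
```

The pattern is always the same: open a device, draw the chart, then close the device with `dev.off()`.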
Where to start with R?
The R Graphics Cookbook, or its website (http://www.cookbook-r.com/), was also useful when starting off.
However, I found the only way to get going was to choose some data, download R and try it out.
Stackoverflow.com and many other R advice websites were hugely helpful whenever I got stuck. The main thing I discovered is that there is never just one way to do something in R, so keep trying and learning!
Let’s try it!
As I write this blog (5 August 2015), Serena Williams has recently won Wimbledon again, and the US Open is coming up. Numerous media outlets and blogs are asking the same question: who is the greatest female tennis player of all time? The consensus seems to be that it is between Serena Williams and Steffi Graf, but are they right? 2 3 4
Generally, it’s hard to judge players from different eras against each other. They may not have had the same opportunities to travel and compete, or their competitors may not have been as strong as those in other eras. However, we do have win records, so maybe the statistics can throw some light on the argument. Or could they muddy the waters further? Let’s see what R can tell us…
To start, I needed data. I searched online and came up with the names of 11 female tennis players who regularly appear in ‘Top 10 Best Female Players’ type posts. Using that list, I went to the WTA (Women’s Tennis Association) website and retrieved the player stats for each of them. These were then saved to a CSV file and loaded into R.
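The CSV step can be sketched as follows. This is an illustration under assumptions, not the author’s actual file. The file name `players.csv`, the column names (`Player`, `SinglesSlams`, `DoublesSlams`, `MixedSlams`, `Wins`, `Losses`, `PrizeMoney`) and the values are all hypothetical. The snippet writes a tiny file to a temporary folder only so that it runs on its own. With real data you would call `read.csv()` on your own file.

```r
# Write a small example CSV so the snippet is self-contained.
# File name, columns and values are hypothetical placeholders.
csv_path <- file.path(tempdir(), "players.csv")
writeLines(c(
  "Player,SinglesSlams,DoublesSlams,MixedSlams,Wins,Losses,PrizeMoney",
  "Player A,10,2,1,600,150,2000000",
  "Player B,6,8,4,500,120,1500000",
  "Player C,3,0,0,400,200,9000000"
), csv_path)

# Load the CSV into a data frame; keep player names as plain text
players <- read.csv(csv_path, stringsAsFactors = FALSE)

str(players)   # check that the numeric columns were read as numbers
```

Running `str()` straight after loading is a cheap way to catch columns that were read as text instead of numbers.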
What can we tell from the data?
How do we classify “the Best”?
Let’s look first at who won the most Grand Slam Singles titles. These are generally considered the benchmark for the best players. So, who won the most?
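Here is one way to answer this in R, sketched on hypothetical placeholder data rather than the WTA figures. The snippet sorts the data frame by the titles column with `order()` and plots it with base R’s `barplot()`.

```r
# Hypothetical Grand Slam singles counts, not the WTA figures
players <- data.frame(
  Player       = c("Player A", "Player B", "Player C", "Player D"),
  SinglesSlams = c(10, 6, 3, 12),
  stringsAsFactors = FALSE
)

# Sort from most to fewest titles
ranked <- players[order(players$SinglesSlams, decreasing = TRUE), ]
print(ranked$Player[1])   # the player with the most titles

# Bar chart in ranked order; las = 2 turns the axis labels vertical
barplot(ranked$SinglesSlams, names.arg = ranked$Player,
        main = "Grand Slam singles titles", ylab = "Titles", las = 2)
```

Sorting before plotting makes the leader obvious at a glance. The same approach works for any other single column, such as prize money.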
Margaret Court won 24, ahead of all the competition. Case closed, correct? The data is irrefutable?! However, nearly half of her titles (11) came at the Australian Open, in an era when much of her competition didn’t travel that far. So, if the best of her competition was not even playing, can those titles really be held up against the current standard? Possibly not. So, what else can we look at?
Total Prize Winnings
In the modern game, the amount of prize money a player has won could be considered a true marker of greatness. In golf they even have the ‘Race to Dubai’ every year, which is based purely on prize money won during the season. Let’s see if that gives us a true answer.
Serena Williams is the outright winner but, let’s face it, this is another unfair chart. The prize money on offer when Billie Jean King was playing was nowhere near current levels. Even in Steffi Graf’s era, women’s prize money was not on a par with men’s. The only legitimate comparison we could make here is between Serena and her sister Venus, as they are competing in the same era, prize-money wise.
Career Win Stats
Another line of comparison is each player’s career win stats, that is, their Win/Loss record expressed as the percentage of matches won.
As you can see, this proves Venus Williams is the worst player on the list! Obviously, the statistics don’t lie. But… how can we stand over that, especially when Venus has won more Slams and more prize money than Martina Hingis, her closest competitor? It doesn’t seem right. Maybe there is no “best” player in this case.
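Here is a sketch of the calculation, assuming hypothetical `Wins` and `Losses` columns. The win percentage is wins divided by total matches played (wins plus losses), times 100.

```r
# Hypothetical career records, for illustration only
players <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  Wins   = c(600, 500, 400),
  Losses = c(150, 120, 200),
  stringsAsFactors = FALSE
)

# Win percentage = wins / (wins + losses) * 100, rounded to one decimal place
players$WinPct <- round(100 * players$Wins / (players$Wins + players$Losses), 1)

# Highest win percentage first
players[order(players$WinPct, decreasing = TRUE), c("Player", "WinPct")]
```

With these made-up numbers, Player B (80.6%) edges Player A (80.0%) despite winning fewer matches. A ratio like this ignores volume, which is one reason a win-percentage chart can rank players in a surprising order.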
However, let’s have one more stab at this.
All-round player
As it currently stands, fewer and fewer players are competing in Doubles matches as well as the Singles competitions. 5 There are a few reasons for this: less prize money, players concentrating their effort on the Singles prizes, and the different skill set required. Doubles certainly demands more Serve & Volley skill than the average Singles match. There is an argument that, to be the best all-round tennis player, you should be winning both Singles and Doubles matches. Have we a player who stands out in both?
This graph groups the Slam wins by player:
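A grouped chart like this can be sketched with base R’s `barplot()`. Pass it a matrix with one row per event type and one column per player, and set `beside = TRUE`. The counts and player names are hypothetical placeholders.

```r
# Hypothetical Slam counts: one row per event type, one column per player
slams <- rbind(
  Singles = c(10, 6, 3),
  Doubles = c(2, 8, 0),
  Mixed   = c(1, 4, 0)
)
colnames(slams) <- c("Player A", "Player B", "Player C")

# beside = TRUE draws each player's event types as a group of side-by-side bars
mids <- barplot(slams, beside = TRUE, legend.text = rownames(slams),
                main = "Grand Slam wins by player", ylab = "Titles")
```

`barplot()` invisibly returns the bar midpoints, which is handy if you later want to write values on top of the bars with `text()`.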
Finally, this graph stacks each player’s Slam wins into a single bar:
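A stacked chart can be sketched by passing a matrix of counts (one row per event type, one column per player) to base R’s `barplot()`, keeping its default `beside = FALSE`. Ordering the columns by `colSums()` puts the player with the most Slams first. The sketch also flags players who have won at least one Singles and one Doubles Slam, which is one simple reading of the “all-round” argument above. The values are hypothetical.

```r
# Hypothetical Slam counts: one row per event type, one column per player
slams <- rbind(
  Singles = c(10, 6, 3),
  Doubles = c(2, 8, 0),
  Mixed   = c(1, 4, 0)
)
colnames(slams) <- c("Player A", "Player B", "Player C")

# Order players by total Slams, most first
totals <- colSums(slams)
slams  <- slams[, order(totals, decreasing = TRUE)]

# Default beside = FALSE stacks the event types into one bar per player
barplot(slams, legend.text = rownames(slams),
        main = "Total Grand Slam wins by player", ylab = "Titles")

# "All-round" flag: at least one Singles and one Doubles Slam
all_round <- slams["Singles", ] > 0 & slams["Doubles", ] > 0
print(all_round)
```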
So, Martina Navratilova is the winner for Best Player! Serena Williams comes a close second, and as she is still playing, she could yet reach the top. I’m happy with that result, but that may just reflect my personal preferences.
As you can see, depending on how we ask, we get a different answer. The phrase “Lies, damned lies and statistics” comes to mind. Let’s look at a summary of the players:
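A summary like this can be sketched in two steps. First, find the player with the highest value in each numeric column using `which.max()`. Then count how many categories each player tops with `table()`. The column names and values below are hypothetical placeholders, and ties go to whichever player appears first.

```r
# Hypothetical summary data, for illustration only
players <- data.frame(
  Player       = c("Player A", "Player B", "Player C"),
  SinglesSlams = c(10, 6, 3),
  DoublesSlams = c(2, 8, 0),
  WinPct       = c(80.0, 80.6, 66.7),
  PrizeMoneyM  = c(2.0, 1.5, 9.0),   # prize money in millions
  stringsAsFactors = FALSE
)

categories <- setdiff(names(players), "Player")

# Leader in each category (which.max picks the first row on ties)
leaders <- sapply(categories, function(col) players$Player[which.max(players[[col]])])
print(leaders)

# Number of categories each player tops; levels keep players with zero wins
counts <- table(factor(leaders, levels = players$Player))
print(counts)
```

Counting categories like this treats every category as equally important, which is itself a judgement call.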
With the data I’ve entered, there is no single correct answer. Martina Navratilova comes out top in more categories than Serena Williams, with Chris Evert and Margaret Court next behind them. Surprisingly, Steffi Graf is a little further back. However, that may say more about the fact that she gave up tennis at a relatively young age, or about the quality of her opposition; who’s to say? There are other possible ways of getting an answer, though I won’t be taking this analysis any further:
1. You could look at total career titles, not just the Slams, covering each player’s entire career rather than only the headline-grabbing main competitions.
2. You could look at each player’s whole career, rank the quality of their opposition, and use the resulting quality scores to analyse who was more successful.
3. You could look at match stats, such as unforced errors, serving stats, etc.
I may not have confirmed who the best ever female tennis player was, but I gained a good understanding of part of R. The TryR course was a good starting point, but I didn’t feel very confident with my knowledge immediately afterwards. As with most programming languages, actually working with real-world data makes it easier to learn. You also learn a lot from working through the frustration of figuring out why something won’t work. The R community’s willingness to share help makes it even easier, as long as you put the work in.
I feel I’ve only scratched the surface of what is possible in R. It’s worth considering other R courses or online training to advance your knowledge. For example, there is a free R Programming course from Johns Hopkins on Coursera that covers more of the options available within the environment. Interesting assignment. Thank you.