Clustering Players For Betting

Head-to-head records are an interesting topic when it comes to betting. You will often see people mention the H2h when they are presenting their case as to why a selection is value. For example, 'Player X has a 2-0 head-to-head record against Player Y, therefore they are value'. However, many other people will tend to disregard them, believing that the sample sizes are far too small to draw any meaningful conclusions.

In the majority of cases, I would agree with the latter view. Head-to-head records of just two or three matches are relatively useless when trying to determine whether there are any tactical or stylistic issues for one of the players in the match. Admittedly, H2h records such as Pennetta's 7-0 against Stosur or Tomas Berdych's 12-0 against Kevin Anderson probably do suggest there is something there, even if it is purely mental after so many consecutive defeats.

Ideally, we would be able to see the outcome if two players played against each other ten times. Or even better, fifty or one hundred or even one thousand times. This is clearly nothing more than a pipe dream. However, what if we could group similar players together and look at H2h records between the two groups? Might this act as a way of determining what types of players thrive against other types of players?

Can H2h records for similar players like Sara Errani and Monica Niculescu be combined?

So, the first thing to do is group players. Rather than doing it manually, which would introduce inherent biases, I used a process called k-means clustering. It grouped every WTA player that has played at least 25 matches on hard courts over the past two seasons based on a number of statistics, including aces per service point, points won on first and second serve, and points won returning first and second serves. The 93 players were clustered into 15 groups.
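To make the clustering step concrete, here is a minimal sketch of k-means in plain Python. It is not the code used for this post. The player names and numbers are invented placeholders, the five columns simply mirror the statistics listed above, and two choices are my own assumptions: each statistic is standardised first so that no single column dominates the distance, and the starting centroids are just the first k rows (a real run would normally try several random starts).

```python
import statistics

def standardise(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    # Assumption: seed the centroids with the first k points (deterministic).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [statistics.mean(col) for col in zip(*members)]
    return labels

# Invented placeholder stats, NOT real player data. Columns: aces per service
# point, 1st serve pts won, 2nd serve pts won, 1st return pts won, 2nd return pts won.
players = {
    "Big server 1": [0.10, 0.74, 0.52, 0.30, 0.50],
    "Big server 2": [0.09, 0.72, 0.51, 0.31, 0.51],
    "Big server 3": [0.11, 0.73, 0.50, 0.29, 0.49],
    "Returner 1": [0.01, 0.58, 0.42, 0.42, 0.58],
    "Returner 2": [0.02, 0.59, 0.43, 0.41, 0.59],
    "Returner 3": [0.01, 0.60, 0.41, 0.43, 0.57],
}

labels = kmeans(standardise(list(players.values())), k=2)
groups = {}
for name, label in zip(players, labels):
    groups.setdefault(label, []).append(name)
print(groups)
```

On this toy data the big servers and the returners end up in separate clusters. With 93 players and 15 groups the result will depend more heavily on the starting centroids, so it is worth checking that the groups are stable across runs.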

The first thing to look at is whether the groups make sense. For example, if we found Serena Williams grouped with Annika Beck and Sorana Cirstea, we might conclude that the groupings are nonsense.

However, this was not the case. Looking at certain groups, we find Sara Errani, Monica Niculescu and Annika Beck forming one group, which makes sense given their weaknesses on serve and strong return games. Another group contains Coco Vandeweghe, Julia Goerges, Karolina Pliskova, Lucie Safarova, Madison Keys, Petra Kvitova and Samantha Stosur, players who have strong serves, both first and second, but who particularly struggle returning opponents' first serves. Interestingly, I expected Serena Williams to be in a group of her own given her statistical dominance, but she is actually grouped with her sister, Venus. For those that are interested, the full list of groups is at the end of the post.

Now that we are relatively content that the groups are reasonably accurate, we can start to look at how they might perform against each other. I will look at four pairings in particular, which seemed likely to throw up some interesting results.

The first is Group 12, containing Annika Beck, Monica Niculescu and Sara Errani, against Group 2, containing Coco Vandeweghe, Julia Goerges, Karolina Pliskova, Lucie Safarova, Madison Keys, Petra Kvitova and Samantha Stosur. Here, we have a group that is bottom of the averages for points won on both first and second serve, but that tops the averages for first serve return points won and is close to the top for second serve return points won. They are up against a group of top-level servers who struggle on return (particularly against first serves).

Looking at all the matches in the past three seasons between the two groups, we find that there have been 20 completed matches. Group 12 have won 6 of them, while Group 2 have won 14. Now, given the names in Group 2, we would expect them to have won more matches. With the likes of Safarova, Kvitova, Pliskova and Stosur, who have all spent time inside the Top 10, we would have expected to see a fairly one-sided record and many of them would have been short-priced favourites.
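Pooling H2h records between two groups is simple bookkeeping, and could be sketched as below. The function name, the player names and the results are all hypothetical placeholders that exist only to show the counting; I am assuming each completed match is stored as a (winner, loser) pair and that each player maps to a single group number.

```python
from collections import Counter

def group_record(matches, group_of, group_a, group_b):
    """Return (wins for group_a, wins for group_b) in matches between the two groups."""
    wins = Counter()
    for winner, loser in matches:
        pairing = (group_of.get(winner), group_of.get(loser))
        if pairing == (group_a, group_b):
            wins[group_a] += 1
        elif pairing == (group_b, group_a):
            wins[group_b] += 1
        # Matches involving any other group (or unknown players) are ignored.
    return wins[group_a], wins[group_b]

# Hypothetical players and results, invented purely to exercise the function.
group_of = {
    "Returner A": 12,
    "Returner B": 12,
    "Server X": 2,
    "Server Y": 2,
    "Outsider": 7,
}
matches = [
    ("Server X", "Returner A"),
    ("Returner B", "Server Y"),
    ("Server X", "Returner B"),
    ("Server X", "Outsider"),  # not a Group 12 v Group 2 match, so skipped
]

record = group_record(matches, group_of, 12, 2)
print(record)  # (1, 2): one win for Group 12, two for Group 2
```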

Having said that, we find that of those 20 matches, only 10 were completed in straight sets, while the other 10 went the distance. Certainly with multiple short-priced favourites in these matches, the odds on the Group 12 player on the +1.5 set handicap might have been rather tempting. Looking at the game handicaps, we find that the Group 12 players went 11-8-1 on the handicap. Still small samples admittedly, but it is a promising start for the Group 12 players.

Next up, let us look at Group 6 against Group 4. Group 6 includes Alize Cornet, Tsvetana Pironkova and Victoria Azarenka as a slightly unexpected trio with decent first serve stats, poor second serve, good first serve return stats and excellent second serve return stats. Group 4 contains the rather uninspiring selection of Bojana Jovanovski, Donna Vekic, Elena Vesnina, Lauren Davis, Misaki Doi, Shelby Rogers and Sorana Cirstea.

Again, we might expect Group 6 to dominate the H2h record here, particularly with the presence of former world number one, Victoria Azarenka. The results back this up - Group 6 leads the H2h against Group 4 12-3. Impressive, but with plenty of short-odds favourites, maybe nothing to shout about. However, if we look at the performance against the handicap, we find that Group 6 has gone 11-3-1 against the Pinnacle games handicap that was closest to 50-50. In other words, if you had backed the Group 6 player in all 15 matches, you would have won on the games handicap in 11 of those and got your money back in one. That is quite an impressive return.
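For anyone unfamiliar with how a games handicap record such as 11-3-1 is built up, the sketch below shows one way to settle each bet and tally the results. The function names and the example scorelines are my own illustrations, not real matches, and choosing which Pinnacle line was closest to 50-50 is assumed to have been done already.

```python
def settle_games_handicap(games_for, games_against, handicap):
    """Settle a games-handicap bet on one player as 'cover', 'fail' or 'push'."""
    margin = games_for + handicap - games_against
    if margin > 0:
        return "cover"
    if margin < 0:
        return "fail"
    return "push"  # stake returned

def handicap_record(results):
    """Tally (games_for, games_against, handicap) results into a record."""
    tally = {"cover": 0, "fail": 0, "push": 0}
    for games_for, games_against, handicap in results:
        tally[settle_games_handicap(games_for, games_against, handicap)] += 1
    return tally

# Invented scorelines from the backed player's point of view.
results = [
    (12, 7, -4.5),   # wins 6-4 6-3 giving up 4.5 games: covers
    (12, 7, -5),     # same score on a -5 line: push
    (13, 15, +1.5),  # loses 4-6 6-3 3-6 receiving 1.5 games: fails
]

record = handicap_record(results)
print(record)  # {'cover': 1, 'fail': 1, 'push': 1}
```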

Finally, we shall focus on Group 10, which contains Angelique Kerber, Dominika Cibulkova, Eugenie Bouchard and Li Na. Against Group 4 from earlier, this quartet is 9-5 in the H2h, but also an impressive 9-5 against the handicap, with no fewer than 11 of the meetings being concluded in straight sets. Against Group 6, we find that they have an identical 9-5 H2h record and are 9-5 against the handicap again.

The results are summarised below:

Group A  Group B  Matches  Won  Lost  A Covered  A Failed  Push  2 Sets  3 Sets
12       2        20       6    14    11         8         1     10      10
6        4        15       12   3     11         3         1     12      3
10       4        14       9    5     9          5         0     11      3
10       6        14       9    5     9          5         0     10      4

There are obviously many other combinations that we could look at, but merely by focusing on those four combinations, we find that they would have gone 40-21-2 against the closest Pinnacle line to 50-50 on the games handicap. That is a 63.5% winning record (40 of 63, counting the two pushes as non-wins). By grouping players by the most basic of stats, we can find certain types of player that thrive against other particular types of players, which could give us an advantage in the future.
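The combined record can be checked directly from the four rows of the summary table. The short sketch below adds up the handicap columns and computes the winning percentage two ways: over all 63 bets, which matches the figure quoted above, and over the 61 bets that were not pushes. The variable names are my own.

```python
# (Group A, Group B, A covered, A failed, push), copied from the summary table.
rows = [
    (12, 2, 11, 8, 1),
    (6, 4, 11, 3, 1),
    (10, 4, 9, 5, 0),
    (10, 6, 9, 5, 0),
]

covered = sum(row[2] for row in rows)
failed = sum(row[3] for row in rows)
pushes = sum(row[4] for row in rows)
total = covered + failed + pushes

rate_all_bets = 100 * covered / total            # pushes counted as non-wins
rate_decided = 100 * covered / (covered + failed)  # pushes excluded

print(f"{covered}-{failed}-{pushes} from {total} bets")
print(f"{rate_all_bets:.1f}% of all bets, {rate_decided:.1f}% of decided bets")
```

Excluding the two pushes, where the stake is simply returned, the record is 40 wins from 61 decided bets, or 65.6%.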

Summary of Groups

Group 1
Agnieszka Radwanska, Ana Ivanovic, Camila Giorgi, Caroline Wozniacki, Ekaterina Makarova, Garbine Muguruza, Jelena Jankovic, Maria Sharapova and Simona Halep

Group 2
Coco Vandeweghe, Julia Goerges, Karolina Pliskova, Lucie Safarova, Madison Keys, Petra Kvitova and Samantha Stosur

Group 3
Alexandra Dulgheru, Alison Riske, Jana Cepelova, Kiki Bertens, Kimiko Date Krumm and Shahar Peer

Group 4
Bojana Jovanovski, Donna Vekic, Elena Vesnina, Lauren Davis, Misaki Doi, Shelby Rogers and Sorana Cirstea

Group 5
Alison van Uytvanck, Caroline Garcia, Irina-Camelia Begu, Magdalena Rybarikova, Polona Hercog and Silvia Soler-Espinosa

Group 6
Alize Cornet, Tsvetana Pironkova and Victoria Azarenka

Group 7
Serena Williams and Venus Williams

Group 8
Ajla Tomljanovic, Anastasia Pavlyuchenkova, Francesca Schiavone, Jarmila Wolfe, Kaia Kanepi, Kristina Mladenovic, Marina Erakovic, Mirjana Lucic-Baroni, Mona Barthel, Monica Puig and Sabine Lisicki

Group 9
Andrea Petkovic, Carla Suarez Navarro, Johanna Larsson, Kurumi Nara, Saisai Zheng, Timea Bacsinszky and Varvara Lepchenko

Group 10
Angelique Kerber, Dominika Cibulkova, Eugenie Bouchard and Li Na

Group 11
Anna Schmiedlova, Klara Koukalova, Shuai Zhang, Urszula Radwanska and Zarina Diyas

Group 12
Annika Beck, Monica Niculescu and Sara Errani

Group 13
Barbora Zahlavova, Daria Gavrilova, Lesia Tsurenko and Madison Brengle

Group 14
Belinda Bencic, Casey Dellacqua, Elina Svitolina, Flavia Pennetta, Roberta Vinci, Shuai Peng, Sloane Stephens and Svetlana Kuznetsova

Group 15
Bethanie Mattek-Sands, Christina McHale, Daniela Hantuchova, Heather Watson, Karin Knapp, Katerina Siniakova, Kirsten Flipkens, Stefanie Voegele, Vera Zvonareva, Yanina Wickmayer and Yaroslava Shvedova

