Predicting the Brownlow is hard.
Since 2016, Fat Stats has been building statistical models to predict the Brownlow medal. Every year we tweak the models and datasets, aiming to emulate the decisions the umpires make on game day and (hopefully) improve our predictions. It is a hard task - every year we review our results and something we hadn't thought of pops up. We thought we would run through 5 things that make this particular prediction problem difficult. All of the plots shown are generated in the chaRlie app here: https://fatstats.shinyapps.io/charlie2019/
Umpires are humans too
Most umpires must love footy, otherwise they wouldn’t put up with the crap that’s thrown at them every weekend. You know what they say - love is blind. Like all of us, umpires are going to have things they like in the way a player plays and things they don’t care much about. My favourite thing in footy is a full forward being hit on the chest by a 40m dart on a full speed lead. Other people like big tackles, a defender blanketing someone or in the case of GWS supporters - unnecessary contact to the face and eyes of star opposition players. Try as they might, these biases are impossible to fully suppress on game day. Brownlow votes are allocated based on the collective decision of the umpires on the day, without seeing statistics. Every game has different configurations of umpires, in different moods with different likes and dislikes - this creates a lot of variance!
One famous example of an umpire WTF moment occurred last year in round 2, when Marley Williams managed to poll 3 votes in a game where he gathered 14 disposals, three marks, one clearance, three clangers and one free kick. Not surprisingly, the model gave him a 0% chance of polling at all, and likewise for Jake Carlisle, who also managed to scrape a vote. In fact, the model was very clear and confident about where it thought the votes should go! Shaun Higgins and Ben Brown probably feel pretty hard done by for this game.
The times they are-a-changin’
Football evolves over time, and what the umpires deem important when handing out votes is not an exception. The way Richmond won its flag in 2017 was very different to the Bulldogs the year before, and even more so than West Coast and Sydney in 2006. Like opposition coaches and players, the umpires react to these trends in their Brownlow vote allocation.
The above plots show two things. Firstly, players reach 35 possessions or more much more frequently post-2005 than they did in the 90s (the Greg Williams-influenced 1990 - 1992 aside). Last year, 102 players achieved the 35-disposal mark in the regular season, with Tom Mitchell accounting for 10 of those.
The bottom plot shows that the umpires rewarded this feat for a while, but since a peak of 60% in 2013 (Gary Ablett driven), the percentage of 35-disposal games that receive three votes has dropped to almost 1 in 3 (the bottom plot's y axis is a percentage). There are a few potential reasons for this - changes in game plan, a focus on efficiency, Gazza getting old, multiple players in a game getting 35 touches, and so on - but when you use the previous 3-5 years of data to build a machine learning model, it may not account for these changes, leading to over-prediction on high-disposal games. This has definitely happened to chaRlie in recent years. Certain players can even skew the models - if ruckmen who poll votes are present in the training dataset, the model is much more likely to predict votes for ruckmen; if not, the opposite can occur. Balancing the dataset is key to a good prediction.
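As a rough sketch of the drift check described above, here is one way to compute the share of 35-plus disposal performances awarded three votes per season. The dataframe and column names here are made up for illustration and are not the actual chaRlie schema:

```python
import pandas as pd

# Hypothetical per-player, per-game records (columns are assumptions).
games = pd.DataFrame({
    "season":    [2012, 2012, 2013, 2013, 2018, 2018],
    "disposals": [36,   38,   35,   40,   41,   37],
    "votes":     [3,    0,    3,    3,    1,    0],
})

# Keep only the big-possession games, then take the per-season mean of a
# "did this game poll 3 votes?" flag and scale it to a percentage.
big_games = games[games["disposals"] >= 35]
share_three = (
    big_games.assign(three=big_games["votes"] == 3)
             .groupby("season")["three"]
             .mean() * 100
)
print(share_three)
```

Plotting `share_three` over real seasons would reproduce the bottom plot; a falling curve is exactly the drift that makes a model trained on older seasons over-predict high-disposal games.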
In Data We Trust
Not every component of an AFL game can be captured by data - far from it at the moment. When Jeremy Howe stands on Aaron Sandilands' head for a mark, the degree of difficulty is not reflected in the statistics - he is allocated a mark and a contested mark, and potentially an intercept mark, the same result he would get if he had bullied Caleb Daniel one on one on the wing. The timing of that mark is not currently captured either. Not that they voted on the game, but you know that if the umpires had walked into the rooms after the 2005 grand final with Leo Barry's mark fresh in their minds, they would have been a lot more likely to give him votes - the statistics as they stand do not capture the fact that the mark decided the game.
Research by Michael Bailey, later written about in the great Footballistics book, showed that players with bald heads or tattoos are up to 2 times more likely to receive votes than their more nondescript team mates. Not many people can be bothered creating that dataset, and as such the models suffer from this measurable bias in the umpires' attention. Data is steadily improving, with the AFL/Champion Data slowly releasing more and more for public consumption. This will lead to improved models and insights into the game, but it will never capture every nuance of the complex and evolving game of AFL.
“Whoever controls the media, controls the mind” - Jim Morrison
The above plot shows the Google Trends results for the previous month (leading up to the 2019 preliminary final weekend). It shows that Toby Greene has had up to 75-100 times the search interest of Brodie Grundy, who also seems to be in the news constantly at the moment. It is impossible for this saturation not to have an effect on umpires, whether positive or negative. Being finals, we won't know for sure, but it's very unlikely the umpires are going to look at Toby favourably the next time he plays.
One famous example of this pre-existing bias due to a media situation is James Hird in 2004. After publicly (and rightfully) criticising umpire Scott McLaren on The Footy Show after the round 2 debacle, Hird was fined $20,000 and agreed to promote umpiring. The next round, the umpires awarded Hird zero votes after one of the greatest last quarters of all time against West Coast: 34 disposals (14 in the last quarter, 20 contested) and 3 goals, including the iconic match-winning snap from the pocket after which he hugged a fan. Umpires hold a grudge.
Zero Sum Game
One of the things that makes the Brownlow challenging is that in every game someone has to get three votes, someone two and someone one. If you predict player A to get 3 votes and they get 1, and player B to get 1 vote and they get 3, you aren't two votes out - you are four. In games where two players from the same team are thereabouts at the pointy end of the night, these swaps can have a huge effect on the final prediction.
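The arithmetic of that swap can be made concrete with a minimal sketch (the function name and dict layout are just for illustration):

```python
def total_vote_error(predicted: dict, actual: dict) -> int:
    """Sum of |predicted - actual| votes across all players in a game."""
    players = set(predicted) | set(actual)
    return sum(abs(predicted.get(p, 0) - actual.get(p, 0)) for p in players)

# Player A predicted 3 but polled 1; player B predicted 1 but polled 3.
predicted = {"A": 3, "B": 1}
actual    = {"A": 1, "B": 3}
print(total_vote_error(predicted, actual))  # prints 4 - the swap counts twice
```

Because votes are zero-sum within a game, every vote you wrongly give player A is a vote wrongly taken from player B, so each misallocation is penalised on both sides of the swap.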
In 2018, the chaRlie model over-predicted Jack Macrae by 11 votes, its second biggest stuff-up. A lot of the issue with the Dogs is that when the midfield is on, they are all on, making it difficult to figure out how the umpires will rank them. The example above shows that Jack Macrae was predicted to get 5 or 6 votes between rounds 19 and 21, and Marcus Bontempelli was expected to get around 3. The Bont polled 6 and Macrae 1, effectively a 7-8 vote turnaround! Understanding the uncertainty the model has with particular games and players is key to using it successfully.
Good luck, and may the Brownlow gods be with you like they were with Marley.