Updated: Aug 30, 2018
Welcome to the revamped Fat Stats website, complete with a new and improved Brownlow prediction model named chaRlie! For those who have followed our predictions in previous years, you will see that this year we have put substantial effort into making these predictions easy to understand, transparent and interactive.
chaRlie lives as an online app here so feel free to check it out (and let us know if you find any bugs).
Previously, we have focussed on comparing our model to betting odds to try and find value; however, we will not be doing that this year - too stressful! If you want to use the model for those purposes, go ahead, but don’t blame us if you don’t become millionaires. If you do, feel free to thank us with huge donations, a beer or shout outs on Twitter.
The aim of this blog post is to summarise the new model and app so that people have a good understanding of how it works, but also understand its limitations. We try to improve it every year, so any feedback on the maths, visualisations or just life advice is welcomed. Over the next few weeks leading up to Brownlow night, we will be putting out small blog posts focussing on specific topics, so let us know if you have anything of interest you would like us to look into.
There have been some significant changes to the model since last year, leading to a substantial drop in model error. (For error, we calculate the mean absolute difference between the actual votes and the predicted votes for the top 100 players. Typically, error values range from 2 to 4 on this metric.) Below is a plot showing the differences between this year’s model applied to 2017 (red), the released model from last year (blue) and the actual results (green). The plot is filtered to players in the top 100 actual results whose predictions differed by more than 5 votes. As you can see, the red dots are generally much closer to the green dots. Our overall error for 2017 dropped from approximately 3.5 to 1.7 - more than halved!
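As a rough illustration, the error metric described above can be sketched as follows (the player names and vote tallies here are invented for the example, and the exact implementation inside chaRlie may differ):

```python
def top100_error(actual, predicted):
    """Mean absolute difference between actual and predicted vote
    tallies, taken over the top 100 players by actual votes."""
    ranked = sorted(actual, key=actual.get, reverse=True)[:100]
    return sum(abs(actual[p] - predicted.get(p, 0)) for p in ranked) / len(ranked)

# Invented tallies for illustration
actual = {"Player A": 33, "Player B": 25, "Player C": 21}
predicted = {"Player A": 30.1, "Player B": 26.8, "Player C": 18.4}
print(round(top100_error(actual, predicted), 2))  # -> 2.43
```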
List of changes:
1. Game Normalisation. Normalising the statistics on a game-by-game basis helps to highlight players who had anomalous games relative to everyone else on the field - the point of the Brownlow. This makes the model focus on relative performance within a single game against your competitors, which is at the heart of how the Brownlow is awarded. For example, in a game where both teams were kick-heavy, 15-20 kicks might not be anomalous - there might be 5-10 players who managed that - but in late August in torrential rain at the G it might be a great effort.
2. PAV (Player Approximate Value). As mentioned in previous years, it is a struggle to get the algorithm to notice good games by defenders and ruckmen. We came across PAV last year, created by the excellent folk at Hurling People Now (HPN). PAV is a set of statistical measures designed to estimate a player’s value both independently and relative to overall team performance. It provides metrics for offensive, defensive, midfield and total impact on a game, which helps us identify strong games by defenders and ruckmen. HPN provided their formulas, which we adapted to a game-by-game basis and included as input variables.
3. Historical Skew. For the first time, the model was run over 8 years of historical data (2010-2017). We observed that some players have quite significant historical skews. For example, Buddy Franklin has been consistently over-estimated by the model for his entire career. We can use this information to “correct” the model by using the average of the previous three years’ skew. For example, Buddy was overestimated by 3.2, 7.9 and 2.8 votes over 2015-2017, so in 2018 we can subtract the average, about 4.6 votes, from his predicted tally. It’s not a perfect system; if you had removed the skew from 2011 to 2013, Buddy’s 2014 total of 22 votes would have been underestimated by about 7 votes. Use with care!
4. Risk Ratings. In some games, the best player - or three players - on the ground is obvious. In others, there are 5-10 players you could throw a blanket over. The output of the chaRlie algorithm has the same problem, often giving multiple players similar probabilities of polling votes. To quantify this, we calculate the Gini coefficient of the top 6 predicted players for each game and use that value to classify the game as "High", "Moderate" or "Low" risk. These ratings can be seen in the data table on the Round tab, or in the Player tab.
5. Projections. Seeing as the model was completed prior to round 23 this year, we added a simple projection tab where players’ final tallies can be projected using a linear regression, with a sliding bar controlling how many games to include. Please note that there are no skew corrections on these numbers.
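The game-by-game normalisation in change 1 amounts to standardising each statistic across the players in a single game, so "anomalous" is always judged relative to that game. A minimal sketch (the kick counts are invented, and chaRlie's exact scaling may differ):

```python
from statistics import mean, stdev

def normalise_within_game(values):
    """Z-score one statistic (e.g. kicks) across every player in a
    single game: (value - game mean) / game standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

# The same 20-kick game is unremarkable in a kick-heavy match...
kick_heavy = normalise_within_game([20, 19, 18, 17, 16, 15])
# ...but stands out in a low-possession wet-weather slog
wet_slog = normalise_within_game([20, 9, 8, 8, 7, 6])
```

The 20-kick player scores a noticeably higher z-score in the second game, even though the raw statistic is identical.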
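The skew correction in change 3 is just an average of recent over- or under-estimation, applied as an offset to the raw prediction. Using the Buddy Franklin figures quoted above:

```python
def skew_correction(recent_skews):
    """Average of the last few seasons' (predicted - actual) skew;
    subtract this from the raw predicted tally."""
    return sum(recent_skews) / len(recent_skews)

buddy_skews = [3.2, 7.9, 2.8]           # 2015-2017 over-estimations
print(round(skew_correction(buddy_skews), 1))  # -> 4.6
```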
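For change 4, the Gini coefficient measures how unevenly the predicted vote probabilities are spread across the top 6 players in a game: a high value means one or two clear standouts (low risk), a low value means a blanket finish (high risk). A sketch of the idea - note the 0.20/0.35 thresholds and the probability values are invented for illustration, not the ones chaRlie actually uses:

```python
def gini(values):
    """Gini coefficient via mean absolute difference: 0 when all
    values are equal, approaching 1 when one value dominates."""
    n = len(values)
    mu = sum(values) / n
    pairwise = sum(abs(a - b) for a in values for b in values)
    return pairwise / (2 * n * n * mu)

def risk_rating(top6_probs, low=0.35, moderate=0.20):
    """Map the Gini of the top-6 probabilities to a risk category
    (threshold values here are illustrative assumptions)."""
    g = gini(top6_probs)
    if g >= low:
        return "Low"
    if g >= moderate:
        return "Moderate"
    return "High"

print(risk_rating([0.70, 0.10, 0.06, 0.05, 0.05, 0.04]))  # clear standout -> Low
print(risk_rating([0.20, 0.18, 0.17, 0.16, 0.15, 0.14]))  # blanket finish -> High
```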
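The projection in change 5 can be sketched as an ordinary least-squares line fitted to a player's cumulative tally over the last few rounds and extrapolated to round 23. The `window` parameter mirrors the app's sliding bar; the tally below is invented, and the app's regression may be set up differently:

```python
def project_tally(cumulative, total_rounds=23, window=5):
    """Fit a least-squares line to the last `window` points of a
    player's cumulative vote tally and extrapolate to the final round."""
    n = len(cumulative)
    xs = list(range(n - window + 1, n + 1))   # round numbers
    ys = cumulative[-window:]
    mx, my = sum(xs) / window, sum(ys) / window
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope * total_rounds + intercept

# Invented cumulative tally after 21 rounds
tally = [3, 6, 6, 9, 12, 12, 15, 18, 18, 21, 24,
         24, 27, 27, 27, 30, 30, 33, 33, 36, 36]
print(round(project_tally(tally), 1))  # -> 39.6
```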
List of limitations:
1. Statistical measures. This one is quite obvious - we don’t have access to every statistic available (our pockets aren't deep enough to afford Champion Data). Even if we did, statistics never capture the full essence of a game of football. Even the advanced statistics available at Footywire (e.g. metres gained) aren’t included in this model, as they are only available from 2015 onwards and would limit the training data. Players like Cyril Rioli and Alex Rance are never going to have their performances captured well by current statistical measures, and this needs to be taken into account. It’s not only that they might score more votes than predicted; the votes they get are votes someone else cannot.
2. Umpire Bias over time. The game of AFL is renowned for changing rapidly, due to both innovative coaches and an overly trigger-happy administration. Game styles change from free-flowing to contested, and from ruck-dominated to no ruck at all, sometimes in the span of a single season. It’s not a big leap to assume that what an umpire considers important also changes over time. We will release a blog post later in the year showing these trends, but to compensate for this phenomenon we only train on the most recent 5 years of data. The risk is that any abrupt change in umpire voting bias will not be reflected in historical data, and therefore not in our training dataset.
3. The Brownlow is an inherently subjective award. Similar to point 2, at the end of the day the votes are awarded by humans, and as such are affected by a whole range of difficult-to-quantify psychological factors. In addition, the umpires reportedly do not look at statistics before the votes are cast. It’s important to keep in mind that no prediction is going to get it entirely right - and sometimes it will get it catastrophically wrong - even if its historical testing suggests otherwise. If you are going to use the model for gambling, be sensible: do not bet more than you can afford, and do your own research and de-risking.
All in all, we are quite happy with the improvements to chaRlie in 2018. We think we have improved the algorithm significantly, we are using smarter inputs, and people who want to analyse our predictions can now do so interactively, in a considerably easier and more transparent manner, via the app. Whether the model is right remains to be seen (there are still 2 rounds to go), but at the time of writing it seems very unlikely Tom Mitchell will lose. If he were to inexplicably throw a left hook and get himself suspended, it would be a two-horse race between Clayton Oliver and reigning medallist Dustin Martin (assuming Buddy will be overestimated again). Please contact us with any questions, ideas, feedback or criticism of the model - a lot of the improvements have come from outside advice. Happy Brownlowing!
Visit the app to see the latest predictions here.