Whilst no statistical model can provide any meaningful perspective on whether a single outcome was or was not due solely or mostly to chance, a model can provide some quantitative information about how plausible a chance explanation might be.
I have in the past written about this topic in the context of the proportion of games finishing with a given margin (or less) that a team of specified pre-game superiority might be expected to win. There I found, for example, that a 3-goal pre-game favourite should be expected to win games decided by 3-goals or less about 55% of the time.
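As a rough sanity check on that 55% figure, here's a minimal sketch under an assumed margin distribution. The Normal shape and the 36-point standard deviation are my assumptions for illustration, not figures from the original analysis:

```python
from statistics import NormalDist

# Assumed distribution of final margins for a 3-goal (18-point) favourite;
# the 36-point SD is a stylised figure, not one from the original analysis
margin = NormalDist(mu=18, sigma=36)

# P(favourite wins by 1 to 18 points) / P(margin within +/- 18 points)
p_win_close = margin.cdf(18) - margin.cdf(0)
p_close = margin.cdf(18) - margin.cdf(-18)
print(round(p_win_close / p_close, 3))  # comes out near the quoted 55%
```

That this toy calculation lands so close to the empirical figure is reassuring, though of course real margin distributions aren't exactly Normal.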
That analysis provided a post hoc view, looking at games that finished as close games. For today's post on this topic I'm going to adopt an in-running view, drawing on a simplified version of a model similar to the one I derived in this post, looking at games that were close with only limited time remaining.
The simplified model is also built using data from the period 2008 to 2016 and produces fitted values that correlate about +0.99 with those from the more complex model at all points in the game.
It is, like the more complex model, a quantile regression model of the form:
Predicted Final Home Margin = a_{0} + a_{1} Current Home Lead / (1 - Game Fraction)^{0.03} + a_{2} (1 - Game Fraction)^{1.16} + a_{3} Pre-Game Home Probability (1 - Game Fraction)^{1.20}
The exponents in this model were selected to minimise the Akaike Information Criterion (which was 1,643,364 for the simpler model compared to 1,643,134 for the more complex model). The fitted values of a0, a1, a2, and a3 depend on the particular quantile chosen, and the model was fitted to quantiles in 1% increments from 1% to 99%. In fitting the model, the pre-game home team probabilities were estimated from the pre-game TAB head-to-head prices (using the overround equalising methodology).
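For readers unfamiliar with quantile regression, here's a minimal illustration of the "pinball" loss it minimises. The margins and the grid search are invented for illustration; the real model, of course, fits coefficients on the covariates above:

```python
# Pinball (quantile) loss: minimising it over a constant recovers the
# empirical quantile of the data. The margins here are invented.
def pinball_loss(tau, y, pred):
    return sum(tau * (yi - pred) if yi >= pred else (tau - 1) * (yi - pred)
               for yi in y)

margins = [-25, -10, -4, 0, 3, 8, 12, 20, 33, 41]

# Grid-search the constant minimising the 50% (median) pinball loss
best = min(range(-30, 45), key=lambda c: pinball_loss(0.5, margins, c))
print(best)  # lands in the flat minimum between the 5th and 6th order statistics
```

Fitting the same loss at tau = 0.01, 0.02, ..., 0.99 is what yields the family of quantile-specific coefficient sets described above.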
Using this model we can ask the following question: for a home team with a specified pre-game probability of victory and a given lead with X% of the game to go, what is the fitted probability that the home team wins?
By constraining the size of the given lead and the fraction of the game remaining to relatively small values, we can explore the in-running model's views of what would be considered "close games".
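Concretely, with the model fitted at 99 quantiles, the home team's win probability can be read off as the share of predicted quantiles that are positive. A sketch, standing in a Normal for the fitted quantile function purely for illustration:

```python
from statistics import NormalDist

taus = [t / 100 for t in range(1, 100)]  # the 1% to 99% quantiles

# Stand-in for the fitted quantile predictions of the final home margin;
# the N(4, 30) here is illustrative, not the fitted model
pred = NormalDist(mu=4, sigma=30)
quantile_margins = [pred.inv_cdf(t) for t in taus]

# Home win probability: the share of quantiles with a positive predicted margin
p_home_win = sum(m > 0 for m in quantile_margins) / len(taus)
print(round(p_home_win, 2))
```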
Specifically, we're going to look at games where the lead, from the home team perspective, was between -3 and +3 goals, and where there was either 5% (about 6 minutes) or 10% (about 12 minutes) of the game remaining.
The in-running model outputs for these scenarios appear below.
On the left we have the results for games with about 12 minutes remaining and where the lines relate to home teams assessed as having pre-game victory probabilities of either 10%, 30%, 50%, 70% or 90%. We see there, for example, that a 70% pre-game home team favourite tied with 12 minutes to go has about a 55% probability of victory.
As well, we see that a 90% pre-game favourite trailing by a couple of points with 12 minutes remaining is still a narrow favourite.
With only 6 minutes remaining, however, leads become far more important, even for underdogs. For example, a home team assessed as just a 10% chance pre-game is over a 70% favourite if it leads by a goal at that point, and about a 90% favourite if it leads by 2 goals.
The broader conclusions from the chart are that:
In short, better teams will tend to win more close games than weaker teams, but at an incremental rate lower than their pre-game probability would have indicated, much lower if the game remains close as it enters its final minutes, and lower still if the better team trails. A tied game, however, is only a 50:50 proposition for a team that was a 50:50 proposition when the game commenced.
After all my kvetching and worrying last week, and my expectation to be making wagers progressively from Friday to Tuesday next week, Centrebet framed all nine over/under markets earlier today (Wednesday), even a couple of hours before the TAB framed theirs.
Sure, they were only at $1.87, which means I'm paying a slightly higher premium than I would usually with Centrebet, but experience has suggested that MoSSBODS tends to be better when it wagers on early rather than late markets (though, to be honest, I probably need a few more counterfactuals to assert that confidently).
So, here's what the TAB and Centrebet markets looked like just before 4pm today.
All four forecasters agree that the Suns v Crows game is likely to be the round's highest-scoring game, and all four agree that the Power v Blues game is likely to be one of the lowest-scoring games (though the TAB and Centrebet expect the Sydney v GWS game to be even lower-scoring).
They also concur that the Dogs are most likely to be the round's high-scoring team, registering anywhere between 113 and 127 points, and that the Dogs' opponents, the Lions, will be the lowest- or equal lowest-scoring team.
But, importantly for the prospect of any wagering, there are some sufficiently large divergences of opinion between MoSSBODS and one or both of the bookmakers in terms of the expected totals in some games.
In four games, in fact, MoSSBODS differs in its opinion by more than the requisite minimum 6 points, which has induced an:
That level of activity is much lower than last week's seven wagers, but more in keeping with what we saw in earlier, often profitable, rounds.
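The 6-point overlay trigger mentioned above can be sketched as a simple rule. The boundary convention and all the figures here are my illustration, not a transcription of the actual wagering code:

```python
# Hypothetical sketch of the totals-wagering trigger: bet overs (unders)
# when the model's expected total exceeds (falls below) the bookmaker's
# posted total by at least 6 points. Totals are invented.
MIN_OVERLAY = 6.0

def totals_wager(model_total, market_total):
    overlay = model_total - market_total
    if overlay >= MIN_OVERLAY:
        return "over"
    if overlay <= -MIN_OVERLAY:
        return "under"
    return None

print(totals_wager(192.0, 184.5))  # over
print(totals_wager(170.0, 184.5))  # under
print(totals_wager(181.0, 184.5))  # None: overlay too small
```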
In reviewing last week's performance, bear in mind that Centrebet had a decided advantage over the TAB in its totals (and margin) forecasts because it posted them much nearer game time.
Given that, the TAB's superior performance on totals is even more impressive. Centrebet did, however, record a better mean absolute error (MAE) than the TAB for game margins.
Ignoring the bookmakers for a moment, we see that MoSHBODS outperformed MoSSBODS on every metric in Round 4 except game margin MAE where MoSSBODS pipped it by just 0.3 points per game.
Across the season so far:
MoSSBODS has not just outperformed MoSHBODS in terms of game margins, but both the TAB and Centrebet as well. History shows that these types of lead dissipate quickly, however.
Looking lastly at the Errors section we see that:
Now, I can relax until Friday ...
Anyway, the head-to-head and line markets are up and here we are with Round 5 just three days away, so it's time to lock in the weekend's forecasts.
Only five of the Head-to-Head Tipsters have opted for even a single underdog this week, the two RSMPs tipping the Swans to produce the upset against the Giants, Consult The Ladder and C_Marg tipping the Dons to do the same against the Pies, and Home Sweet Home tipping both those upsets plus three more in the form of Gold Coast, St Kilda, and Hawthorn.
That level of broad agreement has made for a small overall Disagreement Index - it's at 19%, the second-lowest it's been this season, and would probably have been at a record low but for Home Sweet Home's Index of 50%.
Amongst the Margin Predictors, C_Marg, for only the second time this season, does not have the week's highest mean absolute deviation (MAD). Its 5.0 points per game MAD is, in fact, only 4th-highest, behind, in order, MoSSBODS_Marg (6.7), Bookie_9 (5.2), and MoSHBODS_Marg (5.1).
The week's most extreme tips are spread across seven different Predictors, which is far more dispersed than we're used to seeing. As such, it's not entirely surprising that the all-Predictor MAD of only 4 points per game per Predictor this week is a season-record low, lowering the mark of 4.5 set in Round 2.
Across the nine games, only the Sydney v GWS (5.8 points per Predictor), Western Bulldogs v Brisbane Lions (5.5), and Gold Coast v Adelaide (5.4) games have elicited MADs of greater than 5 points per Predictor.
There are record levels of agreement too amongst the Head-to-Head Probability Predictors, the all-Predictor MAD of 3.3% points per Predictor per game lowering the previous record of 4.6% set only last week.
MoSSBODS_Prob has the week's highest MAD (5.5% points per game), ahead of C_Prob (4.2%), and MoSHBODS_Prob (4.0%). MoSSBODS_Prob has also produced the most extreme forecast in seven of the nine games.
Probability estimates vary most for the Essendon v Collingwood game where the MAD has come in at 5.9% points per Predictor.
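For anyone wanting to replicate figures of this kind, here's a sketch of the per-game MAD calculation: the mean absolute deviation of each Predictor's home-win probability from the all-Predictor average for that game. The probabilities are invented:

```python
# Hypothetical Predictor home-win probability estimates for a single game
probs = [0.52, 0.48, 0.58, 0.45, 0.61, 0.50]

avg = sum(probs) / len(probs)
mad = sum(abs(p - avg) for p in probs) / len(probs)
print(round(100 * mad, 1))  # in percentage points per Predictor
```

The same arithmetic, applied to margin predictions rather than probabilities, yields the points-per-game MAD figures quoted for the Margin Predictors.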
The table below provides the round-by-round and overall average disagreement values for every forecaster, and gives a good overall perspective on how Round 5 sits within the season.
At some point I plan to perform an analysis looking at the relationship between levels of disagreement and forecasting accuracy at the individual forecaster and forecaster-grouped-by-type levels.
The relatively high levels of agreement have made for relatively low levels of wagering this week, MoSHBODS suggesting only five head-to-head wagers totalling just over 10% of the original Fund, and MoSSBODS suggesting only five line wagers totalling just under 8% of the original Fund. That's the lowest single round aggregate wagering we've seen in the head-to-head market, and the second-lowest in the line market.
Together, these 10 bets put about 5% of the original Overall Portfolio at risk, almost 60% of which is in the hands of the Dogs and the Pies.
The Dogs could knock 1.8c off the value of the Overall Portfolio should they fail to cover their roughly 7-goal spread in taking on the Lions, while the Pies could snip 1c off the value simply by losing to Essendon. Adelaide, the Kangaroos, and St Kilda control most of the remaining risk.
In terms of upside, about three-quarters of it is in the hands of the Dogs, Roos, and Pies - a disconcerting situation when you think about it, because none of those animals have hands.
Anyway, a set of best-case results would add just over 4c to the Overall Portfolio price, and a set of worst-case results would knock just under 5c off that price.
Last week I reflected on the conundrum posed by the recalcitrance of Centrebet in posting totals markets. On the assumption that Centrebet will continue to post these markets only on the day of the game, from this week onwards I plan to:
Note, however, that I will only be making wagers once both the TAB and Centrebet markets have been opened so, in some cases, I'll be posting about them here post hoc. You will, however, know MoSSBODS' views in advance, and hence will have the basis on which I'll be making wagers should appropriately attractive totals be set.
So, here are MoSSBODS' and MoSHBODS' opinions on scoring for this week.
I'll refrain from any comment or analysis here, reserving that possibility for later in the week when I post the first Over/Under market update.
Whilst both, on balance, felt that the Crows underperformed in their victory over the Dons, MoSSBODS felt that a mere 10 Scoring Shot victory merited a relatively significant deduction while MoSHBODS felt that a smaller deduction was in order.
GWS' performance, on the other hand, was treated even more differently by the two Systems, MoSSBODS deeming their 6 Scoring Shot victory worthy of a small increment in Combined Rating, while MoSHBODS felt the win insufficiently impressive and thus deserving of a smallish reduction in Combined Rating.
Those changes flipped the ordering of the Crows and the Giants for MoSSBODS, who now ranks the Giants in 1st ahead of the Crows, but have left MoSHBODS still ranking the Crows in 1st ahead of the Giants. On both Systems, however, these two teams are rated considerably higher than all other teams in the competition.
Further down the table we find just three teams ranked more than a single place differently by the two Systems:
Looking next at the MoS brothers' component ratings, we find that GWS is now ranked 1st on offence and on defence by MoSSBODS, while MoSHBODS ranks Adelaide 1st on offence and the Western Bulldogs 1st on defence.
There are now just two teams ranked more than one place differently by MoSSBODS and MoSHBODS on offence:
On defence there's a slightly higher level of disagreement between the two Systems about team rankings, with six being ranked two places or more differently:
MoSSBODS currently rates 11 teams as above-average offensively and 10 as above-average defensively. MoSHBODS also rates 11 teams as above-average offensively, but just nine as above-average defensively.
Regular MoS readers will know that I inadvertently misled ChiPS last week by failing to advise it of the result in the Gold Coast v Hawthorn game. One consequence of this was that ChiPS, once informed, ranked the Hawks as only 12th, which left it much less surprised by this week's result in their game against the Cats.
That said, the Hawks' thumping was still enough for ChiPS to demote them another four places, leaving them ahead now only of Carlton and the Brisbane Lions.
MARS also slipped the Hawks down multiple places, from 8th to 13th.
Higher up the table we still find ChiPS and MARS rating the Crows as the top team, ahead of GWS in ChiPS' case, and Geelong in MARS'.
Across the entire suite of teams there are relatively high levels of disagreement this week between ChiPS and MARS, eight teams now being ranked two places or more differently by these two systems:
Despite all that disagreement, however, ChiPS and MARS rate the same nine teams as the only teams that are currently better than an average team.
Looking across all of the MoS Team Rating Systems, four teams seem to be causing the most dispute in that their minimum and maximum rankings differ by more than three places:
Who's right? Who knows, but if forced to make a choice I'd probably go with MoSSBODS.
So far, both of the MoS brothers have acquitted themselves well, though moreso in the field of margin prediction than in head-to-head tipping or probability estimation.
This week, best amongst the Head-to-Head Tipsters were the six of them that correctly predicted the outcome in seven of the games, among them ENS_Linear and the two RSMP Tipsters, who remain tied in first place now on 24 from 36 (67%).
C_Marg, despite managing only six from nine, did at least drag itself one tip ahead of Home Sweet Home, who now sits last on the ladder with an 18 from 36 (50%) record. There really is only so much that a home ground advantage can do for some teams ...
The all-Tipster average for the round was 6.3 correct tips from 9 games.
C_Marg did even better as a margin predictor this week, recording the round's best mean absolute error (MAE) of 22.6 points per game, which was more than 3 points per game better than the second-best Predictor, MoSHBODS_Marg.
Overall, the week's results saw an all-Predictor average MAE of 26.8 points per game, and left MoSSBODS_Marg at the top of the Leaderboard for the second successive week, now 16 points clear of the field.
MoSHBODS_Marg remains in 2nd, and Bookie_Hcap in 3rd, ENS_Linear the big mover of the week, climbing two places from 6th into 4th.
Amongst the Head-to-Head Probability Predictors, C_Prob struggled again this week, recording the round's worst (and only negative) probability score. The three bookmaker-based Predictors did best, foremost amongst them Bookie_OE who heads the Leaderboard. MoSSBODS_Prob did better than MoSHBODS_Prob, and remains ahead on the full-season view.
MoSHBODS_Prob's relatively lacklustre round spelled trouble for the Head-to-Head Fund, which landed only 2 of its 6 wagers on the weekend, falling by almost 11c as a consequence. It remains up on the season, however, though now only by about 2c.
MoSSBODS' opinions led to a 1 in 4 record for the Line Fund, though the sizeable bet on the Crows was enough for that single successful bet to more than wipe out the losses on the other three. So, the Line Fund edged up - by just 0.6c to finish the round up by 8c on the season.
The Over/Under Fund is also powered by MoSSBODS, and it collected on only 3 of its 7 wagers, dropping 3c as a result to end the round up by only 4c on the season. One of the four losses was by half a point however, and would have been another collect had I waited for the more-generous total offered by Centrebet.
In total, the Overall Portfolio lost 3c on the round to finish Round 4 up by 5% on the season.
It's still tipping the Blues to win, but now by only 2 points, and it's swapping its allegiance to the Cats over the Hawks, which it now tips to win by 13 points.
Data input folks: it's key ...
Those changes mean that the all-forecaster disagreement metrics for the Head-to-Head Tipsters and Margin Predictors for Round 4 remain as the second-highest of the season, but that the MAD for the Head-to-Head Probability Predictors in Round 4 is now the lowest of the season for a single round.
These objectives are:
What's causing the conflict is that, this week, Centrebet has been dragging its feet on posting markets, putting up the Eagles v Swans total only on Thursday morning, followed shortly after by the total for the North v Dogs game on Friday. It's now late on Thursday and there are no further Centrebet markets available. (This despite the fact that the TAB and Pinnacle markets have been available for days. That's pretty disappointing to be honest; I had hoped Centrebet would be different.)
For this week then, in an effort to meet the objective of only publishing confirmed wagers, I decided to lock in wagers using only the TAB markets late on Wednesday night. I'll need to review that approach for future rounds; even this week that approach has meant that I've taken a poorer proposition in the North v Dogs game than I could have had I waited for the Centrebet market.
What I'm thinking is that, in future, I'll post MoSSBODS' and MoSHBODS' opinions generally on the Wednesday night including any wagers that have been made contingent on both the TAB and Centrebet markets having been posted, and then make subsequent wagers on each game at or around the time that both the TAB and Centrebet markets become available. That means, unfortunately, that I'll be posting about some of the actual wagers after they've been determined, but you will at least have the basis on which those wagers had been made.
That's not perfect, but I want this blog to reflect real world experience rather than some artificial, academic version of what might have been possible given ideal (seemingly prescient) timing.
Anyway, to this week then, with the market information such as it is.
MoSSBODS and MoSHBODS again have very similar opinions about the totals in each game, the largest difference coming in the Collingwood v St Kilda game, where MoSHBODS has the Saints scoring 5 points fewer than MoSSBODS' forecast, which is enough to flip the game to being a Pies win in MoSHBODS' eyes.
Both MoSSBODS and MoSHBODS have that Pies v Saints game as the high-scoring game of the round, very much at odds with the TAB, which has the Crows v Dons game in that position. One sign of just how different MoSSBODS' and MoSHBODS' opinions are from the TAB's is the fact that they have that Crows v Dons game as their lowest-scoring game.
An even starker sign of those differences is the fact that MoSSBODS has seen fit to make wagers in seven of the nine games, four overs and three unders wagers with overlays ranging from about 2 to 4 goals. That is, I'll admit, enough to make me nervous, but was also one reason that I decided to take the TAB totals on offer and not wait for Centrebet markets to be posted. If these are genuine opportunities, surely they'll disappear over time.
Such nervousness as I have is tempered a little by the recent performance of MoSSBODS relative to the TAB and Centrebet, which shows that MoSSBODS now has a better margin MAE than the TAB or Centrebet, and a better MAE for Away team scoring too.
It's also not all that far behind on MAE for Home team scoring, and on MAE for game totals.
I recognise that we're only three rounds in, but it's a promising start ...
In its Head-to-Head Tipster guise, it's opted for four upset wins, the equal-highest number of any Tipster (the other being, of course, Home Sweet Home). Amongst the other Tipsters, only Consult The Ladder (2), ENS_Linear (2), MoSSBODS_Marg (2), and MoSHBODS_Marg (1) have also found reason to tip against Bookie Knows Best in any of the games.
One-third of the underdog tips have come in the Carlton v Gold Coast game where the aggregate has finished 5-4 to the underdog Blues. Four more have come for the Saints who face the Pies. Both the Blues and the Saints, however, are priced close to $2 in the head-to-head markets and so are barely underdogs.
Such disagreement as there is has seen the all-Tipster Disagreement Index coming in at just 26%, its second-lowest value for the season, behind only the 14% of Round 2.
C_Marg, for the third consecutive round, has produced the largest mean absolute deviation (MAD) amongst the Margin Predictors, its 12.6 points per game figure a full 6 points per game higher than the next-highest, which is MoSSBODS_Marg. Elevating C_Marg's MAD most of all are its predictions of a 10-point upset Roos victory, a 17-point Blues victory, a 69-point Crows victory, a 30-point Tigers victory, and a 2-point upset Hawks victory.
C_Marg is on the extreme end of margin predictions in seven games, Bookie_9 in four, Bookie_3 in three, MoSSBODS_Marg in two, MoSHBODS_Marg in one, and RSMP_Simple also in one.
The all-Predictor average MAD of 5.2 points per game per Predictor is the second-lowest for the season, behind the 4.5 points per game per Predictor in Round 2.
Partly because of C_Marg's prediction of such a large win by the Crows, the Adelaide v Essendon game has come in with the highest MAD for the round at 11 points per Predictor, ahead of the Roos v Dogs game (7.3 points), and the Dees v Dockers game (6.3 points).
C_Prob also stands out amongst the Head-to-Head Probability Predictors with a MAD of over 10% points per game. The next-highest Predictor's MAD is MoSSBODS_Prob's at just 4.9% points per game.
C_Prob aside, we could see some significant moves on the Leaderboard mainly on the basis of the results in the:
The all-Predictor average MAD this week is 5.3% points per game per Predictor. As we saw for the other two all-forecaster metrics, this value is the second-lowest for the season, behind only the result in Round 2 (4.5% points per game per Predictor).
It's another active round of wagering, but with fewer bets and less money at stake across the nine games than we had last week.
Altogether, that's six head-to-head and four line bets totalling just under 10% of the original Overall Portfolio. And, we've still managed to leave three games wager-free.
Most important to Investor fortune this week is the result in the Adelaide v Essendon game where the difference between a comfortable Adelaide win and an Adelaide loss will translate into more than a 7% point difference in the value of the Portfolio.
Only two other games have a range of outcomes that span more than a couple of percentage points: the Melbourne v Fremantle game (4.5% points), and the Carlton v Gold Coast game (2.8% points).
The profile of outcomes in the GWS v Port Adelaide game is an interesting one this week. It shows the maximum profit being achieved if the Giants win by anywhere between 1 and 22 points, but by no more or no less. That bounded optimum comes about because MoSHBODS rates the Giants as 82% chances, making the $1.30 head-to-head price on offer for them just attractive enough to induce a wager, while MoSSBODS is slightly less enamoured of the Giants' chances and thinks they'll win by only 13 points, making the 22.5 points start for the Power in the line market sufficient for a wager there too.
(By playing around with a number of scenarios where MoSSBODS and MoSHBODS wind up recommending wagers on different teams, such as is the case this week in the GWS v Port Adelaide game, I think I've convinced myself that it's impossible for them to place a combination of bets that renders a profit unattainable regardless of the outcome. Put another way, I think there will always be some range of outcomes that will see both the MoSSBODS and MoSHBODS wagers as winners, even in the unusual case where they wind up betting with different bookmakers and so obtain a head-to-head price of over $2 but also give a start in the line market.)
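To make that bounded optimum concrete, here's a hedged sketch of the payoff profile for the GWS v Port Adelaide scenario: a head-to-head bet on the Giants at $1.30 plus a line bet on the Power receiving 22.5 points. The stake sizes and the $1.90 line price are my assumptions for illustration:

```python
# Invented stakes; the $1.30 head-to-head price and 22.5-point start are
# from the scenario above, the $1.90 line price is an assumed standard.
H2H_STAKE, H2H_PRICE = 100, 1.30
LINE_STAKE, LINE_PRICE, START = 100, 1.90, 22.5

def profit(giants_margin):
    """Combined profit as a function of the Giants' final margin."""
    p = (H2H_PRICE - 1) * H2H_STAKE if giants_margin > 0 else -H2H_STAKE
    # The Power cover the line whenever the Giants win by less than 22.5
    p += (LINE_PRICE - 1) * LINE_STAKE if giants_margin < START else -LINE_STAKE
    return p

print(profit(13))   # Giants win by 1 to 22: both bets collect
print(profit(40))   # Giants win by more than 22.5: h2h wins, line bet loses
print(profit(-10))  # Giants lose: h2h loses, the line bet still collects
```

Note that no outcome loses both bets outright here, which is the "profit never unattainable" property described above.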
Combined, the 10 wagers have the ability to slice almost 10c off the Overall Portfolio price or to add almost 8c to it.
That said, the Crows' Combined Rating fell a little on the basis of the weekend's result, as their 17-point and 2 Scoring Shot defeat of Port Adelaide was assessed as being slightly less decisive than it should have been. Rating Systems, like the teachers and professors that demanded the most of you, can be very hard markers ...
More broadly, the weekend's results have left MoSSBODS and MoSHBODS agreeing about the Top 4 teams, and disagreeing most of all about:
For no other team do MoSSBODS and MoSHBODS disagree by more than a single ladder position about their most appropriate overall ranking.
Next, let's look at the Offensive and Defensive ratings and rankings of the two Systems.
Here, we find that:
MoSSBODS now rates 12 teams as better than average on offence, and nine teams as better than average on defence. MoSHBODS rates only 11 teams as better than average on offence, and eight teams as better than average on defence.
On MoSSBODS, the difference between the best and worst teams offensively is 8.8 Scoring Shots, and between the best and worst teams defensively is 8.5 Scoring Shots. On MoSHBODS, the difference between the best and worst teams offensively is 33.7 points, and between the best and worst teams defensively is 33.2 points.
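Out of curiosity, the two scales can be loosely reconciled: a goal is worth 6 points and a behind 1, so with a conversion rate of c an average scoring shot is worth 6c + (1 - c) = 5c + 1 points. This is my back-of-envelope arithmetic, not anything the Systems themselves compute:

```python
# Figures quoted above: spread between best and worst teams on each System
shots_spread, points_spread = 8.8, 33.7

# The conversion rate that maps one spread onto the other
pts_per_shot = points_spread / shots_spread
c = (pts_per_shot - 1) / 5
print(round(c, 2))  # lands near typical league-wide conversion rates
```

The two spreads needn't be directly comparable, of course, but it's mildly reassuring that the implied conversion rate is in a plausible range.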
After last weekend's results, I'll admit that I'm feeling a little more predisposed to take notice of MoSSBODS and MoSHBODS.
ChiPS, clearly, has been struggling for form a bit this season, perhaps a victim of some over-zealous optimisation in the off-season by someone whose name is best left unstated. That said, its errors are probably more in its estimates of home ground advantage than in its estimates of underlying team abilities, which this week see it also ranking the Crows as the best team, and the Giants as second-best.
MARS agrees with that assessment, though it and ChiPS now disagree most about:
For no other team is ChiPS' and MARS' ranking different by more than two places.
ChiPS now rates 12 teams as better than average, while MARS does the same for only eight teams, it excluding Richmond, Collingwood, Melbourne, and the Kangaroos from ChiPS' list.
Across the teams, scoring shot conversion rates are widely spread, spanning the range from Geelong's 70.5% to St Kilda's 42.6%. Three other teams are also converting at below 50% (Hawthorn, Collingwood, and Fremantle) and eight more are converting at above 55% (Adelaide, Brisbane Lions, Essendon, Gold Coast, GWS, Kangaroos, Port Adelaide, and West Coast).
A few other things to note:
With all the other Tipsters scoring 4, the all-Tipster average was 4.3 from 9.
Amongst the Margin Predictors it was the MoS brothers that finished 1st and 2nd this weekend, theirs the only mean absolute errors (MAEs) to come in under 26 points per game. That performance has lifted them into 1st and 2nd on the Leaderboard, MoSHBODS_Marg nudging Bookie_Hcap into 3rd by just 0.4 points.
C_Marg continued its poor start to the season with a 36.3 MAE, leaving it now over 30 goals off the lead.
The all-Predictor average MAE for the round came in at 29.2 points per game - a very respectable result given the number of upsets in the round.
MoSSBODS_Prob recorded comfortably the round's best probability score, so much so that it was catapulted into the top spot on the Leaderboard, ahead of the three bookie-based Head-to-Head Probability Predictors. MoSHBODS_Prob recorded only the fifth-best probability score for the round, though it was again positive, and much better than C_Prob's large, negative score.
I'm beginning to think a steward's enquiry into the off-season recalibration of ChiPS is in order ...
Simply put, it was an extraordinary weekend's wagering, the Head-to-Head Fund landing five from six bets to grow by 26c, the Line Fund also landing five from six to grow by 9c, and the Over/Under Fund landing three from four to grow by 3c.
In total, that added almost 10c to the value of the Overall Portfolio, which now stands up by 8% on the season.
There'll not be many more rounds like that one this year, I'd wager.
MoSSBODS and MoSHBODS concur that this game is most likely to be the highest-scoring game of the round, and that it will probably produce an aggregate score in excess of 200. Neither of them, however, foresees 200 being broken in any other game, which has left their all-game average totals about 8 points below those of the two bookmakers.
The biggest difference of opinion comes in the Geelong v Melbourne clash, where MoSSBODS and MoSHBODS expect the total to be around 175 points, while the two bookmakers expect the total to be about 4 goals higher. There's also about a 2 to 3 goal difference in the Fremantle v Western Bulldogs game, which sees MoSSBODS and MoSHBODS predicting totals of around 170 points to the bookmakers' 185 points.
MoSSBODS, in fact, has forecast lower totals than the TAB and Centrebet in every game, and in four of those games the overlay is sufficiently large to warrant wagers.
MoSSBODS' lower expected totals are almost entirely the result of its lower expected scores for Away teams. Compared to the TAB, it expects about 8.5 fewer points per Away team per game, and compared to Centrebet about 7.8 fewer points per Away team per game.
Over the first two weeks of the season, MoSSBODS has done fairly well relative to the bookmakers, though moreso in Round 2 than in Round 1.
If we look at the Absolute Errors portion of the table above we see that MoSSBODS recorded the smallest mean absolute error (MAE) for Away team scores in Round 1, and the smallest MAE for Home team scores and final game margins in Round 2.
Averaged across the two rounds, however, it's the TAB that's done best on Totals, and Centrebet on Margins, Home team scores, and Away team scores.
Only four Tipsters have opted for underdogs in any of the games this week, Home Sweet Home (HSH) doing this six times, Consult The Ladder (CTL) four times, C_Marg twice, and MoSSBODS_Marg once.
That leaves the Tigers v Eagles game as the only one where there are more than two Tipsters on the underdog, CTL, HSH and MoSSBODS_Marg all plumping for the Tigers in that contest.
The all-Tipster Disagreement Index comes in at 29% for the week, eclipsing the previous high of 28%, set in Round 1, by a single percentage point.
The Margin Predictors have also, narrowly, set a record all-Predictor MAD. It's at 5.4 points per game per Predictor this week, edging out the value of 5.2 recorded in Round 1.
As previously indicated, C_Marg has contributed heavily to this high MAD, its own value of 13.6 points per game a record high for any Predictor this season, breaking its own mark of 11.2 points set in Round 2. It has the most extreme margin prediction in eight of the games, six times at the high end from the home team's perspective, and twice at the low end.
MoSSBODS_Marg has the next-highest MAD (7.2 points) and is Predictor Most Extreme in four games, ahead of Bookie_3 (6.8 points) who is in the top 3 for the first time this season and who has the low margin prediction in three games.
Looking across the games, we see that it's the Carlton v Essendon game that has the largest MAD (9.9 points per Predictor), this metric driven up most by C_Marg's prediction of a narrow Blues upset, and MoSHBODS_Marg's and MoSSBODS_Marg's predictions of narrow Dons victories.
The Roos v Giants game has also produced relatively high levels of disagreement (7.0 point MAD), book-ended by Bookie_3's prediction of a 37-point Giants win, and C_Marg's prediction of an 8-point Giants win.
C_Prob has also set a record MAD amongst the Head-to-Head Probability Predictors, its home team probability estimates differing, on average and in absolute terms, by 13.5% points from the all-Predictor average. MoSSBODS_Prob has the second-highest MAD of 7.0% points per game, and MoSHBODS_Prob the third-highest of 5.8% points per game.
The all-Predictor average of 6.4% points per game per Predictor is the second-highest this season behind Round 1's 7.3% points per game per Predictor.
As for the Margin Predictors, it's the Blues v Dons (13.4% points), and Roos v Giants (9.5% points) games that are contributing most to the elevated overall MAD value, though the Tigers v Eagles 7.5% result is also quite high.
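For anyone wanting to replicate it, the per-Predictor MAD calculation can be sketched as follows (the helper name and the sample figures are mine, purely for illustration):

```python
def predictor_mad(preds_by_predictor):
    """Per-Predictor MAD: the average absolute difference between a
    Predictor's forecasts and the all-Predictor average, taken game by
    game. Lower values mean a Predictor sits nearer the consensus."""
    names = list(preds_by_predictor)
    games = list(zip(*(preds_by_predictor[n] for n in names)))
    consensus = [sum(g) / len(g) for g in games]
    return {
        n: sum(abs(p - c) for p, c in zip(preds_by_predictor[n], consensus))
           / len(consensus)
        for n in names
    }

# Two Predictors, two games, margins in points (made-up figures)
predictor_mad({"A": [10, 20], "B": [20, 30]})  # {'A': 5.0, 'B': 5.0}
```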
I'm expecting a round of divergent probability scores this week.
MoSSBODS and MoSHBODS have conspired to produce six head-to-head and six line bets this week, all but one of them on home teams and all but three of them on underdogs.
As a result, there are four games this week where the difference between best-case and worst-case scenarios represents more than a 2.5% point swing in the price of the Overall Portfolio.
The Carlton v Essendon game has the largest potential swing, a Blues win providing a gain of almost 5% and a Blues loss by 28 points or more resulting in a loss of almost 3%.
Next most critical to Investors' fortunes is the Roos v Giants game, where a Roos win would bump the Overall Portfolio by 3.5% and a Roos loss by 32 points or more would lop 1.4% points from its original value.
In total, about 16% of the Head-to-Head Fund is at risk (somewhat startlingly, the smallest proportion in any single round this season), and about 13% of the Line Fund. In weighted terms that means about 8.5% of the original Overall Portfolio is at risk across the nine games and the aggregate potential swing spans almost 22% points, from a gain of 13.4% to a loss of that 8.5%.
Below the Crows there was a lot of activity in both Systems, all of which served to completely align their top 6 rankings.
(Note that some minor adjustments were made to the ratings of Hawthorn, Essendon, the Brisbane Lions, and Gold Coast after I discovered an error in the data inputs from their Round 1 games.)
For MoSSBODS, the big movers were Port Adelaide, up four places into 5th, and St Kilda, up three places into 10th. Only one team fell by more than two places: West Coast, down three places into 9th, despite their win over the Saints.
MoSHBODS moved no team up or down by more than two places. It also rewarded the Saints (and punished the Eagles) on the basis of their 32 Scoring Shots to 26 "victory" against a team assessed pre-game as considerably stronger than them, and whom they faced away from home.
These re-rankings have left the two Systems differing by more than one place on the rankings of:
Looking at the offensive and defensive components of the ratings, we find that Adelaide has retained the number 1 offensive rating on both Systems, and the Western Bulldogs has retained the number 1 defensive rating on both Systems.
Port Adelaide is the big mover on both Systems offensively and defensively and is now ranked 3rd by both on offence, and 6th by both on defence.
On MoSSBODS, Port Adelaide aside, the Western Bulldogs (up 3) and St Kilda (up 6) were the only other teams rising significantly offensively, while Collingwood (down 4) and West Coast (down 3) were the teams falling significantly.
MoSHBODS also elevated the Dogs considerably (up 4), and dropped Collingwood (down 5) and Hawthorn (down 3).
Defensively, the major climber for MoSSBODS was Carlton (up 4), and the major decliners were Hawthorn (down 3), West Coast (down 5), and Fremantle (down 3). For MoSHBODS, West Coast (down 3), and Fremantle (down 3) were the only teams moving by more than two places defensively.
The two Systems now rank a number of teams quite differently on offence:
On defence, however, their rankings differ by no more than two places for any team.
MoSSBODS now rates 12 teams as above-average offensively, but only 7 as above-average defensively. MoSHBODS has the same 12 teams rated above-average offensively, and the same 7 teams rated above-average defensively, but adds West Coast to the latter list, albeit as a tenuous member.
On Combined Ratings, nine teams are rated above-average by both MoSSBODS and MoSHBODS (Adelaide, GWS, Western Bulldogs, Geelong, Port Adelaide, Sydney, Collingwood, Hawthorn, and West Coast). MoSSBODS has St Kilda as its tenth member of that list; MoSHBODS has the Kangaroos instead.
Adelaide also remains ranked 1st on both ChiPS and MARS, but the two Systems disagree significantly about the team ordering after that.
The largest differences are for Port Adelaide, who ChiPS ranks 3rd and MARS ranks 7th, and for Geelong, who ChiPS ranks 7th and MARS 4th. For no other teams do the rankings of ChiPS and MARS differ by more than two places.
ChiPS now rates 11 teams as above-average, MARS only eight and excluding the Kangaroos, Richmond, and Melbourne.
(In case you missed the update, please note that these results reflect the corrected forecasts for the Friday through Sunday games as flagged in this post on Friday afternoon.)
All but two of the Head-to-Head Tipsters bagged 8 from 9 this week, which left the all-Tipster average at 7.8 and sees ENS_Linear remain atop the bunch, now on 13 from 18 (72%). Home Sweet Home and C_Marg were the only Tipsters to score less than 8.
C_Marg also struggled as a margin predictor, its mean absolute error (MAE) of 32 points per game comfortably the worst of all the Predictors and leaving it 90 points adrift of Bookie_Hcap, which is the current leader after two rounds.
Very pleasingly, best MAE for the week belonged to MoSSBODS_Marg (24.1), this result leaving it now less than 3 points behind Bookie_Hcap. Encouragingly too, MoSHBODS_Marg (25.6) returned the second-best MAE for the round, but its inaccurate forecasting in Round 1 has left it just over 6 goals behind Bookie_Hcap. There is though, as any footy coach would tell you, still a long way to go in the season.
In other news, the all-Predictor average MAE for the round was 28 points per game per Predictor, which is good, and only two Predictors now have profitable, season-long line market records: MoSSBODS_Marg and Bookie_9.
It was also a very good round for the six Head-to-Head Probability Predictors, though most definitively for the three bookmaker-based Predictors, which still fill the top three spots on the ladder, Bookie-RE pre-eminent amongst them. The three other Predictors, including C_Prob, all recorded positive probability scores as well this week, which has left all three with positive averages for the season too.
There are two pieces of good news for Investors this week: firstly that the Portfolio returned a profit this week, and secondly that the results from Round 1 were misstated in such a way that the loss was magnified.
The reason for the Round 1 error is simple: I applied last season's 40% weighting to the Head-to-Head Fund's result rather than this season's 20% weighting. You'll have to forgive me for being so obviously inept with numbers ...
Anyway, this week's results saw the Head-to-Head Fund land just 2 from 5 results - though all of the losing wagers on the Lions, Roos and Blues looked reasonable until fairly late in all three games - to finish just on the wrong side of break even. The Over/Under Fund also made a small loss, choosing correctly on 2 from 4 occasions to shed 0.5c.
The Line Fund, however, was right 4 times in 6 attempts, climbing 4.8c and propelling the Overall Portfolio to a 1.7c increase. That leaves the Portfolio now down by just 1.8c after last week's restated loss of 3.5c.
I reiterate though, there's a long way to go - for better, or for worse.
Today, while I was running another analysis, I spotted an error in the data input for last week. Essentially, the Gold Coast v Brisbane, and the Essendon v Hawthorn results were flipped in the inputs to MoSSBODS and MoSHBODS.
Fortunately, the changes are small but, since the relevant games have not been played, I've decided to provide the updated forecasts, which appear below.
First, tips, margin predictions and head-to-head probability estimates.
The major changes here for MoSHBODS and MoSSBODS are that:
Next, team and total scoring.
In terms of game totals, we see:
At the team level, the changes are mostly only of a point or two, the exception being GWS, whom MoSSBODS now sees as scoring 111 points rather than 116 points.
For completeness, here's the updated comparison of MoSSBODS with the bookmakers (showing the over/under bets as already made).
The major implications for wagering are that:
Let's start with the term overround, which is defined (for a market where there is only one winner) as the sum of the inverse of all the prices in that market, minus 1.
So, for example, in a head-to-head contest where draws are not possible, if the prices were $2.20 and $1.68, the overround in that market would be calculated by:
Overround = 1/2.20 + 1/1.68 - 1 = 4.98%.
The overround is a measure of how profitable a market is to the bookmaker so, from a bettor's perspective, a market with lower overround should be preferred to one with higher overround.
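That calculation is simple enough to sketch in a few lines of code (the function name is mine):

```python
def market_overround(prices):
    """Overround of a winner-takes-all market: the sum of the
    inverses of all the prices in that market, minus 1."""
    return sum(1 / p for p in prices) - 1

# The head-to-head example above: prices of $2.20 and $1.68
market_overround([2.20, 1.68])  # about 0.0498, ie roughly 5%
```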
What we really care about as a bettor, however, is the overround embedded only in the prices for the wagers we plan to make. In the absence of any knowledge - actual or inferred - about a bookmaker's raw probability estimates, we normally assume that the overround embedded in each price is the same as the overround in the market as a whole (though the case for other assumptions about the spread of overround across prices, which usually imply that the prices of favourites carry less overround than those of underdogs, can be made and empirically tested).
If we did know what a bookmaker's assessment of the true probability of an outcome was, we could calculate the overround embedded in any single price by starting from

Price Offered = Fair Price / (1 + Overround)

which gives us

Overround = Fair Price / Price Offered - 1 = 1 / (Price Offered x Probability) - 1
Now the fair price for any event is the price at which a wager on it would be expected, in the long run, to break even, and this can be shown to be equal to the inverse of that event's probability. So, for example, the fair price for a $1 wager on a toss of an unbiased coin would be 1/0.5, or $2.
With that in mind, the first equation tells us that overround can be thought of as deflating the price offered away from the fair price by multiplying that fair price by 1/(1+Overround).
We can see how overround affects the expected return from a bet by using the table below, which considers a range of event probabilities.
Let's walk through the first row.
It relates to an event that the bookmaker assesses as having a 5% chance of occurring. Given that, his (let me assume it's a he for simplicity of exposition) fair price is 1/0.05 or $20 for a $1 bet. If he chooses to embed 1% overround in the final price, that price will instead be $19.80.
Now, if the bookmaker's assessment of the true probability of the event is accurate then the bettor's expected return on every dollar wagered on this outcome is:
5% x ($19.80 - $1) - 95% x $1 = -$0.01 [the $1 terms reflect the initial wager, which is retained by the bookmaker whatever the outcome]
In other words, the bettor should expect to lose about 1c in the dollar on wagers like this.
If, instead, the bookmaker had embedded an overround of 6% by setting a price of $18.87, the bettor's expected return would be a loss of 5.7%.
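For anyone wanting to reproduce the figures in the table, here's a minimal sketch (the function names are mine):

```python
def price_with_overround(prob, overround):
    """Deflate the fair price, 1/prob, by the factor 1/(1 + overround)."""
    return (1 / prob) / (1 + overround)

def expected_return(true_prob, price):
    """Expected profit per $1 wagered at the given price."""
    return true_prob * (price - 1) - (1 - true_prob)

# First row of the table: a 5% event with 1% overround
price = price_with_overround(0.05, 0.01)  # about $19.80
expected_return(0.05, price)              # about -0.01, ie a 1c loss per $1

# The same event with 6% overround
expected_return(0.05, price_with_overround(0.05, 0.06))  # about -0.057
```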
Roughly speaking, the overround embedded in a price is equal to the bettor's expected loss on that wager, assuming the bookmaker's assessment of the true probability is accurate. As we'll see later, however, vig is a more accurate - indeed, exact - measure of this.
In the columns on the far right of the table we consider a situation where the bookmaker underestimates the probability of events by exactly 5% points. So, for example, for the fourth row where he's assessed an event as having a 35% probability, the true probability is actually 40%.
Here, even for an overround as high as 6%, a bettor would have a positive expectation from a wager on the event. In the 6% overround case, we calculate that expected return as
40% x ($2.70 - $1) - 60% x $1 = +$0.08
Notice that a given sized underestimation on the part of the bookmaker is more disadvantageous to him if it's made about an underdog (eg a team with probability of 25%) than it is about a favourite (eg a team with a probability of 75%). This might in part explain why bookmakers seem to embed more overround in the prices of underdogs than in the prices of favourites (the so-called 'favourite-longshot bias').
Vig (or vigorish) is related to overround but is more directly related to the expected return on a wager. In fact, that's exactly what vig is, by definition: the size of the expected loss per dollar from a wager.
We can derive an equation for vig using a generalised form of the expected return calculations we've already been using earlier.
So, we have
Expected Return = (Price Offered - 1) x Estimated Probability - (1 - Estimated Probability)
The vig of a wager at the given price and probability estimate is the negative of this quantity, which simplifies to

Vig = 1 - Price Offered x Estimated Probability

A bit of maths then yields the following identities with which we can derive vig from overround, and vice versa:

Vig = Overround / (1 + Overround), or equivalently, Overround = Vig / (1 - Vig)
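In code (helper names are mine; the conversions follow from substituting the overround-deflated price into the vig definition):

```python
def vig(price, prob):
    """Expected loss per $1 wagered: 1 - price x probability."""
    return 1 - price * prob

def vig_from_overround(overround):
    return overround / (1 + overround)

def overround_from_vig(v):
    return v / (1 - v)

# A 6% overround is equivalent to about a 5.7% vig, matching the
# expected loss we calculated earlier for the $18.87 price
vig_from_overround(0.06)  # about 0.0566
```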
We can create a similar table for the effects of vig on expected returns as we did for overround.
Notice that, in this table, the expected returns shown in the middle section are exactly equal to the amount of vig assumed.
So, what does this all mean?
The major implication is that, if you're looking to wager, you should prefer bookmakers who you assess as having embedded less overround or vig in their markets in general (better yet, if you can, in the individual prices for the wagers you're interested in, but that requires some empirical analysis to estimate because you'll almost never know a bookmaker's raw probability estimates).
Here on MoS my major aim is to pit my predictive models against prices offered by widely-available and reputable Australian bookmakers. I don't have the time or inclination to open accounts with a wide range of bookmakers - and I pay a price for that in terms of overround.
Consider, for example, the over/under markets typically posted in 2017 at the opening by the TAB and Centrebet, the prices for which are $1.87 and $1.88 respectively.
In the table at right I've calculated the overround and vig in these markets (assuming that the bookmakers assess the probability of either outcome as 50%).
So, at $1.87, I'm suffering a 6.5% vig, and at $1.88, a 6% vig. Compared to a bookmaker offering $1.93, that means I'm giving up between 2.5c and 3c on every dollar wagered.
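(A quick sanity check of those numbers, assuming the 50% probability:)

```python
# Vig at each price, assuming a 50% chance for either outcome
for price in (1.87, 1.88, 1.93):
    print(price, round(1 - price * 0.5, 3))
# 1.87 -> 0.065, 1.88 -> 0.06, 1.93 -> 0.035
```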
Those calculations all assume that the bookmaker's 50% probability is accurate (in which case I'll still lose money with the bookmaker offering $1.93 prices, but I'll do so more slowly than with a bookmaker offering $1.88). If the bookmaker is miscalibrated, as we saw in the tables above, and if we're able to detect this reliably, then the true overround or vig can be far less than the estimates shown here. In fact, they might even be negative, signifying that a wager has a positive expectation for us.
That aside, however, if you are ever motivated to place wagers based on what you see on MoS, you should be mindful of the vig that you're paying with your bookmaker of choice, and strive to find one with the smallest vig possible - certainly smaller than the apparent vig that MoS Funds are paying if you possibly can.
The main difference this has produced is a 7-point gap between the TAB's and Centrebet's margin in the Port Adelaide v Fremantle game.
That aside, we see only a couple of points - at most - difference between the two bookmakers' Totals and similarly sized differences in their implied team scores.
Both bookmakers have the Hawks v Crows, and GWS v Gold Coast games as the round's likely highest-scoring contests, and the Port Adelaide v Fremantle game as the likely lowest-scoring. MoSSBODS and MoSHBODS disagree with the bookmakers about which game will be highest-scoring, preferring instead the Lions v Dons game, and disagree too about which will be lowest-scoring, opting for the Dogs v Swans matchup.
All four agree that the Giants are likely to be the round's highest-scoring team, but only MoSSBODS, MoSHBODS and Centrebet have the Dockers as low scorers, while the TAB casts Carlton in this role.
MoSSBODS has the average total per game coming in at 184 points, MoSHBODS at 185 points, the TAB at 188 points, and Centrebet at 189 points.
MoSSBODS' different opinions have seen the Over/Under Fund make four wagers this week, split between unders and overs.
For two of those bets, the apparent overlay is fairly high: about 3 goals in the Dogs v Swans game, and 13.5 points in the Lions v Dons game. The overlays in the Cats v Roos, and Dees v Blues contests are a bit smaller and around the 1 to 2-goal mark.
MoSSBODS has seven of the home teams scoring fewer points than do TAB and Centrebet, but only three or four of the away teams doing the same.
In the table below we look at MoSSBODS', MoSHBODS' and the two bookmakers' performances in Round 1 compared to actual scores.
On the right of the table we have the forecasters' mean absolute errors (MAEs), which shows that:
Looking at the left-hand side of the table, which provides raw error data (in terms of actual less expected scores), we see that all four forecasters overestimated home team margins despite underestimating home team scores. This was because they all underestimated away team scores even more. They also all underestimated total scores by about 28 to 29 points.
Consult The Ladder (whose tips I have this week reported correctly) is the only other Tipster going contrarian in more than a single game, and the Tigers v Pies, and Lions v Dons matchups are the only ones with more than a single dissenting voice. In five games, the Tipsters are unanimously behind the favourite.
As a result, the all-Tipster Disagreement Index is only 14%, exactly one-half of what it was in Round 1.
Amongst the Margin Predictors, it's C_Marg, MoSSBODS_Marg and MoSHBODS_Marg that are contributing much of the variability, C_Marg to a larger extent than it did last week, but MoSSBODS_Marg and MoSHBODS_Marg to a lesser extent.
C_Marg's high mean absolute difference (MAD) value of 11.2 points per game is elevated by its bold margin predictions for GWS, the Brisbane Lions, West Coast and Geelong in particular, where it finds itself tipping the highest winning margin of all the Predictors by between about 10 and 25 points. It's also at the extreme on its Collingwood tip, though there its prediction is not all that far from the consensus.
Other Predictors with the highest or lowest home team margins are:
The round's low MAD belongs to RSMP_Weighted at just 1.7 points per game, which is a season record low.
Games generating the greatest spread of opinion are Port Adelaide v Fremantle (7 points per Predictor) and Geelong v Kangaroos (6.4 points per Predictor), while the narrowest spread comes in the Dogs v Swans game, where the entire set of margin predictions spans only 8 points.
C_Marg's sibling, C_Prob, has the highest MAD amongst the Head-to-Head Probability Predictors (7.5% points per game) and has the most extreme probability estimates in five games. MoSSBODS_Prob has the next-highest MAD, followed by MoSHBODS_Prob. Bookie-RE has the round's lowest MAD of just 3.3% points per game, which is the lowest MAD recorded by any Head-to-Head Probability Predictor so far this season.
Three games have elicited relatively large MADs across the Predictors: Brisbane Lions v Essendon (8.6% points per Predictor), Port Adelaide v Fremantle (7.3%), and Geelong v Kangaroos (6.0%). The Dogs v Swans game has the round's lowest MAD of just 2.2% points per Predictor.
MoSHBODS' and MoSSBODS' relative temperance this week has meant vastly diminished wagering activity in the head-to-head and line markets compared to last week.
(NB: An earlier version of this table showed incorrect handicaps for some line bets.)
In all, there are five head-to-head wagers totalling a bit over 16% of the Fund, almost half of that in a single wager on the short-priced Port Adelaide, and six line wagers totalling about 7% of that Fund, spread much more evenly. Last weekend, the Head-to-Head Fund risked over 30% and the Line Fund almost 20%.
The Ready Reckoner reveals that, not surprisingly, the Port Adelaide v Fremantle game carries the highest level of risk, with the difference between best- and worst-case outcomes spanning 3.5% of the entire initial Overall Portfolio.
The Hawthorn v Adelaide and Brisbane Lions v Essendon games both carry the next-highest level of risk with potential swings of 2.7% points, though the Melbourne v Carlton game isn't far behind with 2.1% points.
Three games carry no head-to-head or line bets, and the lone line bet in the Richmond v Collingwood game represents little more than a rounding error.
All together, six favourable results would add 6.4c to the Overall Portfolio price, while six unfavourable results would knock 6.1c off that price.
Let's start by looking at the Combined (ie Offensive plus Defensive) ratings and rankings of MoSSBODS and MoSHBODS.
Both Systems rank Adelaide as the number 1 team, MoSSBODS assessing them as over 7 Scoring Shots better than an average team when playing at a neutral venue, and MoSHBODS assessing them as almost 4.5 goals better than the same such average team.
(For details on how MoSSBODS and MoSHBODS have been built and how to interpret their ratings, follow this link).
MoSSBODS has the Bulldogs in 2nd, then Geelong, GWS and Sydney. MoSHBODS puts the Dogs at the bottom of that grouping, and ranks Geelong, GWS and Sydney in the same order above them.
This similarity of team ranking persists as we move down the list. In fact, the Western Bulldogs is the only team for which the two Systems' rankings differ by more than one spot. All this despite the fact that 16 teams changed ranking on MoSSBODS and 12 on MoSHBODS on the basis of the weekend's results.
Perhaps the only other comparison of interest is that Melbourne is rated a very slightly above-average team by MoSSBODS and a very slightly below-average team by MoSHBODS.
Turning next to the component ratings we find that Adelaide is rated as the best team offensively by both MoSSBODS and MoSHBODS, and the Western Bulldogs are rated as the best team defensively, albeit to a lesser extent after the Dogs allowed the Pies 26 Scoring Shots on Friday night.
Looking just at the Offensive rankings, we see that only the Western Bulldogs are ranked significantly differently by the two Systems, MoSSBODS putting them 7th and MoSHBODS 10th.
On Defence, no team is ranked more than a single spot differently by the two Systems.
I'll admit to being a bit surprised about the very high levels of agreement between MoSSBODS and MoSHBODS at this early point in the season, but hope this can be interpreted as convergent validity rather than shared miscalibration.
Lastly for this week, let's take a look at MoS' veteran Team Rating Systems, MARS and ChiPS.
They also have the Crows in top spot, but rank West Coast a little more highly (2nd and 3rd) than do MoSSBODS and MoSHBODS (6th and 6th).
Again though, the overall picture is one of broad agreement where the only other mild divergences are Sydney's #2 ranking and Collingwood's #11 ranking by MARS.
At this point of the season it seems that one hymn book is sufficient.
As we get further into the season and trends begin to emerge, we'll start to do some comparative analyses using the data from the Dashboard. For this week, though, I'll just provide the Dashboard itself.
Best was the rebuilt ENS_Linear Tipster, which bagged five from nine, making it the only Tipster to end the round above 50%. BKB, who got Consult The Ladder's incorrect tip in the Saints v Dees game where there were equal-favourites, scored just four, as did Consult The Ladder, and the two RSMP Tipsters.
(Note that I provided the wrong predictions for Consult The Ladder - I wound up basing its predictions on what it came up with for Round 1 of last year. There's always one error like this in the early rounds. Anyway, the score of four is what it should have received.)
Amongst the remaining Tipsters, MoSHBODS_Marg and MoSSBODS_Marg managed only three correct tips each, and Home Sweet Home and C_Marg managed only two. The all-Tipster average was 3.4 from 9.
The TAB Bookmaker's line market handicaps, which are listed as Bookie_Hcap, proved closest amongst the MoS Margin Predictors, the mean absolute error (MAE) of these forecasts coming in at 31.8 points per game. Bookie_3 and Bookie_LPSO were next-best, less than a goal further back, while ENS_Linear grabbed fourth, just 14 points adrift of Bookie_Hcap.
Worst was MoSHBODS_Marg's 38.1 points per game, which left it almost 10 goals behind Bookie_Hcap. In three games it was further from the final margin than all other Predictors (hence the 3 alongside its name in the column with the sand bunker at its top). Bookie_3 had the highest number of tips "nearest the pin" (3).
The all-Predictor average MAE was 34.2 points per game.
Only four Predictors tipped more than half of the line market results, Bookie_LPSO, ENS_Linear, Bookie_9, and RSMP_Simple all snagging 5 from 9.
It was an especially tough week for probability estimation, with none of the Head-to-Head Probability Predictors returning a positive log probability score for the round. Bookie-OE did best, slightly ahead of Bookie-RE and Bookie-LPSO. MoSHBODS_Prob recorded the round's worst probability score.
Hopefully, the relatively large rating adjustment factors used by ChiPS, MoSSBODS and MoSHBODS in the early parts of the season will help improve their performances in Round 2.
Whilst the Overall Portfolio finished down almost 7c at the end of the round, for much of it I feared that loss would be considerably larger. Adelaide's and Geelong's victories in the last two games of the round were the major results that helped drag the Portfolio back closer to break-even.
It was the Head-to-Head Fund that caused most of the damage, its 2 from 8 performance seeing that Fund shed over 12c. Based, as it is, on MoSHBODS' opinions, that loss is not surprising.
The Line Fund also shed value, though only about half as much after a 3 from 6 performance.
Encouragingly, the Under/Over Fund started the season off well, landing both of its wagers to rise by 3.5c.
In total, the Overall Portfolio fell by 6.7c after collecting on just 7 of its 16 bets.
Last year, the Portfolio was up by 4.8c at the same point. Different year, different trajectory - hopefully same or better outcome.
The TAB's markets opened at $1.87, Centrebet's at $1.88, which means we'll need to be right more than 53.5% of the time with the TAB's lines, and more than 53.2% of the time with Centrebet's lines, if we're to turn a profit at those prices. At the $1.90 prices on offer last season, 52.6% would have been good enough to break even.
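Those break-even rates are simply the inverse of the prices on offer; a quick sketch (the function name is mine):

```python
def breakeven_win_rate(price):
    """Proportion of level-staked bets that must win to break even."""
    return 1 / price

round(breakeven_win_rate(1.87), 3)  # 0.535
round(breakeven_win_rate(1.88), 3)  # 0.532
round(breakeven_win_rate(1.90), 3)  # 0.526
```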
Anyway, for this week, I've locked in two wagers at Centrebet, both overs bets, one on the Swans v Port game, and the other on the Suns v Lions game. Both are for 2% of the Fund since we're level-staking again in this market this season.
These wagers were informed by MoSSBODS' opinions, which are tabulated below alongside MoSHBODS', the TAB's, and Centrebet's.
Across the full suite of games, all four forecasters have similar average views, though MoSSBODS and MoSHBODS both see higher home team scores, on average, than do the two bookmakers. They both have four teams scoring more than 100 points, three of them home teams, while the two bookmakers have only two teams achieving this mark (two home teams for the TAB and only one for Centrebet).
MoSSBODS has Fremantle as the round's low-scorers, while MoSHBODS, the TAB and Centrebet all have Port Adelaide. All four forecasters have Gold Coast on their list of high-scoring teams, but MoSSBODS adds Sydney and Adelaide as well.
(Note that, for MoSHBODS, Home Score + Away Score < Total Score because historical data suggest that the best estimate of the Away team's score is two points less than the base MoSHBODS prediction, while the best estimate of the Total Score is the sum of the unadjusted MoSHBODS Home team and Away team scores.)
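To make that adjustment concrete with some made-up base predictions:

```python
# Made-up numbers, purely to illustrate the adjustment described above
base_home, base_away = 98, 85

away_estimate = base_away - 2            # best single-team Away estimate
total_estimate = base_home + base_away   # best Total uses unadjusted scores

# Hence the published Home + Away is always 2 points short of the Total
assert base_home + away_estimate == total_estimate - 2
```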
Comparing MoSSBODS' forecasts directly with the TAB's and with Centrebet's, we see that MoSSBODS' Totals are within a goal of the TAB's in all but one game, and within a goal of Centrebet's in all but two games. Those are the two games on which the Over/Under Fund has wagered, with overlays of 8 and 16 points.
Had we been using MoSHBODS rather than MoSSBODS to inform our over/under wagering, and assuming we'd imposed the same 6-point minimum overlay, we'd have made one additional bet this week: an unders bet on the Roos v Eagles game against Centrebet's offer of 182.5.
It's going to be a fascinating year watching MoSSBODS and MoSHBODS match up.
It was for this reason that, last year, I altered the methodology used for projections by incorporating a random perturbation into the team ratings used to simulate each result. Specifically, I added a N(0,0.5) random variable to each team's offensive and defensive MoSSBODS ratings in each game, these alterations feeding directly through into the expected Scoring Shot production of the competing teams. These perturbations serve to increase the chances of upset victories, since additional variability generally helps weaker teams.
(I'm not a fan of the alternative method of introducing greater intra-season randomness - that is, by adjusting team ratings on the basis of within-simulation results - for reasons that I outlined in the blog post linked earlier. In short, my rationale is that it makes no sense at all to change the rating of a team on the basis of a simulated result that was entirely consistent with its original rating. That's like knowing that you're simulating an unbiased die but changing the probability of rolling a six on the second toss just because one happened to show up on the first.)
The standard deviation of 0.5 was justified (lovely passive construction there) on the basis that it was "approximately equal to the standard deviation of MoSSBODS team component rating changes across the last eight or so home-and-away rounds for seasons from 2000 to 2015".
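As a toy illustration of why perturbation favours the weaker team - it drags win probabilities toward 50% - here's a stylised margin model with made-up numbers (this is not the MoSSBODS scoring model, and the SD here is in points of margin rather than Scoring Shots):

```python
from statistics import NormalDist

def home_win_prob(expected_margin, game_sd=36.0, perturb_sd=0.0):
    """Stylised sketch: the final margin is N(expected_margin, total_sd),
    and perturbing each team's rating by an independent N(0, perturb_sd)
    adds 2 * perturb_sd**2 to the margin variance."""
    total_sd = (game_sd ** 2 + 2 * perturb_sd ** 2) ** 0.5
    return NormalDist(0, total_sd).cdf(expected_margin)

home_win_prob(12)                 # about 0.63 for a 12-point favourite
home_win_prob(12, perturb_sd=20)  # about 0.60 - dragged toward parity
```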
In today's blog I'll be revisiting this approach to season projections and looking to provide a stronger empirical basis to the size of the rating perturbations, if any, to be used in the projections at different points in a season.
For the analysis I'll be using the new MoSHBODS Team Rating System to provide the basic inputs into the projections - that is, to provide the teams' relative offensive and defensive strengths and the sizes of any venue effects. These will be used to form an initial set of expected scores for both teams in every contest.
The specific questions I'll be looking for the analysis to answer are:
With those objectives in mind, the simulations proceed as follows:
In Step 3 above we simulate the result of every remaining game in the season by:
There will, therefore, be 35 distinct simulations (5 SD values by 7 starting Rounds), each of 1,000 replicates, for each of the 17 seasons. That's almost 600,000 in total.
We'll use the following metrics to compare the efficacy of different SD choices at different points in the season:
For all of these metrics except the Log Probability Score and average number of Finalists, lower is better since we'd like to be nearer to the teams' actual winning rates, nearer to their actual points for and against tallies, nearer to their final ladder positions, and better calibrated in terms of their ladder-finish probabilities.
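For reference, the two probability scores can be computed as below, assuming the usual definitions: Brier as the mean squared error against 0/1 outcomes, and a 1 + log2(p) form of the log probability score under which a coin-flip forecast scores exactly zero (the helper names are mine):

```python
from math import log2

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and the
    0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def log_prob_score(probs, outcomes):
    """Mean of 1 + log2(probability assigned to the actual result);
    higher is better."""
    return sum(1 + log2(p if o else 1 - p)
               for p, o in zip(probs, outcomes)) / len(probs)
```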
I think the best way of conveying the outputs of the analysis here is as a table.
Let's go through the top five rows.
Each row relates to a different SD being used to perturb team Scoring Shot production, with a given SD used to simulate each season 1,000 times, giving 17,000 replicates in all across the 17 seasons. In this first block, all seasons were simulated from Round 1 to the end of the relevant home-and-away season.
The results were that, on average, teams':
Next, looking at the Brier and Log Probability Scores, we find that:
Within this block (and every other block) of the table, items shaded in green are the best for a given starting round, while those in red are the worst.
Looking across all the blocks, we see that, within blocks:
In short, there's little convincing evidence to move away from an SD of (effectively) zero at any point in the season if our base assumptions are informed by MoSHBODS and our aim is to predict team winning rates, final scores for and against, final ladder positions, or to select the highest number of Finalists.
Conversely, we'd be better off using quite large SDs if our aim is to calculate the probabilities of teams' finishing in the Top 8 or Top 4, or as Minor Premier.
What this suggests is that the raw ratings do a good job of ordering the teams and simulating the difficulty of their remaining schedules, but also that they provide probabilities that are relatively poorly calibrated.
One way we can improve the calibration of these probabilities is by perturbing the raw Scoring Shot expectations, which tends to move all the probabilities nearer to parity for every team.
Below, for example, are the results of 1,000 replicates of the 2016 season, starting in Round 1, using an SD of 0.001 on the left and 7 on the right. You can see that SD = 0.001 provides superior results for all metrics except the Brier and Log Probability Scores.
If you compare the relevant probabilities from which the Scores are calculated, you can see what I mean about them generally being dragged closer to parity when an SD of 7 is used compared to an SD of 0.001. Sydney, for example, were assessed as a 93.4% chance of making the 8 when the SD was 0.001, but only an 85.7% chance when the SD was 7. Collingwood, on the other hand, moved from a 22.8% to a 32.5% chance.
For anyone who's curious, I've provided the season-by-season results below. You'll find that the pattern is fairly consistent, and that the choice of a zero SD proves best or near-best in most seasons for all metrics except the probability scores.
(On a technical note, in a few cases, log probability scores were undefined for projections starting at Round 1 or Round 4. This occurs when a team finishes in (say) the Top 8 but the simulations have assigned a probability of zero to that event. In these cases I've arbitrarily recorded a log probability score of -18.)
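That flooring rule can be expressed in a few lines (a sketch; the use of natural logs here is my assumption, as the base, like the -18 floor itself, is a convention rather than anything fundamental):

```python
import math

FLOOR = -18  # arbitrary floor used when the simulations assigned zero probability

def log_prob_score(p_assigned_to_outcome):
    """Log probability score for the outcome that actually occurred,
    floored so that events the simulations deemed impossible still
    produce a finite (if heavily penalised) score."""
    if p_assigned_to_outcome <= 0:
        return FLOOR
    return max(FLOOR, math.log(p_assigned_to_outcome))
```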
So, what to make of all this?
The most obvious conclusion is that our choice of SD is heavily dependent on the purpose to which we plan to put our season projections. If we're after win percentages, points for and against, ladder finishes of all the teams, or predicting as many finalists as possible, then we should not perturb the raw Scoring Shot expectations at all.
If, instead, we want probability estimates for particular ladder finishes - say top 8, top 4, or minor premier - then we need to perturb the expectations by quite a large amount, possibly by even more than the highest value we've tested here (viz, 7 Scoring Shots). This is a surprising amount of on-the-day variability in Scoring Shot generation. Bear in mind as well that this is in addition to the variability that's already embedded in the team scoring model, both for Scoring Shot production and for conversion rates.
A case could be made, I'd suggest, that probability estimates should be avoided entirely during the earliest parts of the season. Whilst the use of larger SDs for projections commencing in Round 1 does produce probability estimates that are better than those of a naive forecaster (and better than those that use smaller SDs), they are only slightly better.
The case for making probability estimates from about Round 4 is slightly more compelling, but it's not until we get to Round 8 that the gap between optimal projections and naivety becomes very substantial. Information about the quality of bookmakers' probability estimates at various points in the season would also help in deciding when to start making probability estimates.
I feel as though there's still quite a bit more to explore in this area - for example, whether there's a more elegant and statistically superior way of introducing the extra variability needed - but for today's blog, let's leave it there.
]]>As well as using the same dataset, I'll also use the same statistical algorithm (quantile regression), and a somewhat similar set of regressors and functional form.
Specifically, the variables I'll use are:
The quantile regression algorithm returns a separate regression output for each of the quantiles for which output is requested. I requested outputs for the 5th, 25th, 50th, 75th, and 95th percentiles.
The equation for the 50th percentile - which we can treat as our model for the "most likely" final total score - appears in the diagram at right.
(Producing it has given me a new-found disdain for LaTeX.)
It comes from fitting a model to 99 points in each game, these relating to the situation after 1%, 2%, 3%, 4%, and so on up to 99% of the game has been completed. Each quarter of a game is deemed to represent 25% of the entire contest, with points then equally-spaced within each quarter. With this approach, in a typical game we are sampling the score at intervals of roughly 70 seconds.
The equation tells us how to project the final total score for a game after a proportion GameFraction of it has been completed, given that we know the Home and Away team scores at that point, the length of the Home or Away team's Scoring Shot run at that point, and MoSHBODS' pre-game prediction.
Because we multiply each of the terms in the equation by (1-GameFraction) raised to some power, a term's influence diminishes as the game progresses if the exponent on (1-GameFraction) is positive, and increases as the game progresses if that exponent is negative. In this equation, only the CurrentTotalScore term carries a negative sign, and a very small one at that, which, along with a coefficient near 1, ensures that the output of the equation tends to equal the CurrentTotalScore as we near the end of the game. Which is, of course, what we'd want.
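A quick numerical check of that decay behaviour, using two illustrative exponents (one positive, one slightly negative, as in the equations discussed here):

```python
def term_weight(game_fraction, exponent):
    """Weight applied to a regressor that enters the model multiplied
    by (1 - GameFraction) ** exponent."""
    return (1 - game_fraction) ** exponent

# Positive exponent: the term's influence fades as the game progresses.
assert term_weight(0.9, 1.16) < term_weight(0.1, 1.16)

# Small negative exponent: influence grows slightly toward game's end.
assert term_weight(0.9, -0.03) > term_weight(0.1, -0.03)
```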
Below is an example output for all fitted percentiles for the Collingwood v Gold Coast game from late in 2016.
(Please click on it to access a larger version).
In the chart we map the fitted values for all five of the percentiles for which an equation was fitted (5th, 25th, 50th, 75th and 95th) across the entire game, and overlay the MoSHBODS pre-game expectation of the total score (blue line), the actual final total score (black line), and a projected score, calculated simply by dividing the current total score by the fraction of the game played (green dotted-line). We extrapolate the final score only from quarter-time onwards because extrapolations for earlier points in the game tend to be highly variable (for example, if a goal were scored in the first 30 seconds, the extrapolated final total would be 600 points.)
The coefficients in the regression were chosen to minimise the Akaike Information Criterion (AIC) for the model fitted to the 50th percentile. In this limited sense then, the model is relatively good. That optimised AIC value reveals nothing, however, about the practical utility of that model as an estimator of the 50th percentile.
One way that we can measure the model's performance in a useful and intuitive way is to proceed as we did in the previous blog and estimate the model's calibration. If the model for the 50th percentile is well-calibrated, we'd expect the actual final total score to fall below it 50% of the time, and to rise above it 50% of the time. We can perform similar calculations for the models for the four other percentiles.
If we do that, and consider the results for each of the quarters separately, we obtain the results as shown at left.
A perfectly-calibrated model for the Xth percentile would provide estimates such that the final total score would fall below them exactly X% of the time across some sufficiently large set of estimates. We see that all five models seem well-calibrated across all four quarters. In particular, the models for the 5th, 50th and 95th percentiles appear to be especially good.
We can also use this approach to estimate the models' calibration for each of the 1,786 games in the sample. For this purpose we calculate the proportion of times during the game that the 5th percentile model provided a projection of the final total that was above the actual final total, and then do the same for the four other percentiles. Lastly, we sum the absolute differences between these proportions and the relevant percentages. So, for example, if the 5th percentile model provided projections that were too high 6% of the time in a game, its contribution to the sum would be 1% point, since it would, ideally, have provided such projections only 5% of the time. Similarly, if the 50th percentile model provided estimates that were too high 55% of the time, its contribution would be 5% points.
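The per-game calibration sum just described might be coded as follows (the function and argument names are mine):

```python
def game_calibration_score(projections_by_pct, actual_final_total):
    """Sum of absolute calibration errors for one game.

    projections_by_pct maps a percentile (5, 25, 50, 75, 95) to that
    model's in-game projections of the final total for this game.
    A perfectly calibrated game scores 0; higher is worse."""
    score = 0.0
    for pct, projections in projections_by_pct.items():
        # Proportion of projections that overshot the actual final total.
        prop_above = sum(1 for p in projections if p > actual_final_total) / len(projections)
        score += abs(prop_above - pct / 100)
    return score
```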
Proceeding in this way allows us to rank the models' performance on every game and identify the games for which calibration was poorest.
The chart below is for the game where calibration was poorest of all.
It's not difficult to see why the models struggled with this game: an 83-point final term after three quarters averaging just 48 points each.
By way of contrast, the chart below is the game where calibration was highest, there being just enough variability in the scoring rate to bounce the model estimates around sufficiently for them to all be too high in almost exactly the right proportions - a "Goldilocks" game, if you like.
Note in particular how much better the 50th percentile model performed than naively projecting the final total based on the current scoring rate. The game produced only 6 goals in the first half - which was well-below MoSHBODS' pre-game expectation - but 16 in the second. The 50th percentile model reacted well to the lower-than-expected scoring, though it did overshoot a little.
There's one final performance metric worth estimating for the 50th percentile model and that is its mean absolute error (MAE) relative to that of naive extrapolation of the score. To calculate this we take the 99 estimates provided by the 50th percentile model for a game and then sum the absolute difference between each of these and the actual final total score. We do this for the 50th percentile model for all 1,786 games and then take the average.
Next we perform the same set of calculations for our naive model for which final score projections are made at the same 99 points by dividing the actual total score at that point by the fraction of the game completed. So, for example, if 47 points had been scored to quarter time, the naive model projection of the total would be 47/0.25 = 188 points.
If you look back at either of the model output charts above, what we're calculating for the 50th percentile model is the average distance between the heavy, jagged black line and the straight black line of the actual final total score. For the naive model we're measuring the average distance between the green dotted-line and the straight black line.
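Both error calculations are straightforward to express in code (names are mine; the quarter-time example is the one used below):

```python
def mae(projections, actual_final_total):
    """Mean absolute error of a set of final-total projections."""
    return sum(abs(p - actual_final_total) for p in projections) / len(projections)

def naive_projection(current_total, game_fraction):
    """Extrapolate the final total from the current scoring rate."""
    return current_total / game_fraction

# 47 points by quarter time projects naively to 47 / 0.25 = 188 points.
assert naive_projection(47, 0.25) == 188
```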
Doing this, separately for projections made in each quarter, and excluding those made in the 1st quarter altogether, yields the results in the table shown at left.
We see two things here. Firstly that, as we'd expect, the size of the absolute errors of both models declines as the game gets nearer to concluding.
Secondly, and more importantly, we see that the 50th percentile model outpredicts the naive model in all three quarters, though by progressively smaller amounts in each quarter.
We can see the diminishing superiority even more clearly if we plot the mean absolute errors of the two models at each value of GameFraction from 25% to 99%, which is what we've done in the chart at right.
One interesting feature of this chart is the behaviour of the average absolute errors for the two models across time. The naive model's errors decline in three distinct phases: steeply from the 25% point to about the 35% point, then more slowly until about three-quarter time, then more rapidly again across the final quarter.
The average absolute errors for the 50th percentile model decline at a fairly constant (and lower) rate until just after three-quarter time, and then match the trajectory of the errors for the naive model.
What this flags is variability in the actual scoring rate across the course of a typical AFL game. This variability "fools" the naive model, but is relatively effectively embedded in the 50th percentile model.
That post has cropped up occasionally in a few other places and discussions since, most recently at the start of the month in this post on the FOX sports website. I had no idea the FOX piece was being written, nor that my blog post was to be linked in it, so it was one of those times where you see a sudden, pleasant spike in your website traffic and wonder where it's coming from.
If I had (or was even vaguely entitled to have) a Wikipedia page, I could now, at least, provide a few citations to support my claim of being a data scientist and football statistician.
(NB I'm pleased to confirm that the information in the blog post was not Fake Analysis.)
The quintessential "good" answer to a question in Pointless is one that is "pointless" - that is, correct, but provided by none of the 100 people who were given 100 seconds to come up with their own answers. In the first three rounds (the first two of which each comprise two "passes") of the show, during which an initial set of four pairs is whittled down to just one, pointless answers add £250 to an ongoing jackpot. That jackpot resets to £1,000 for the next show after it is won, and increments by £1,000 from one show to the next when it is not won.
In the first two rounds (four passes), pairs work independently to provide one answer each, and the scores are summed within pairs. The pair with the highest sum is eliminated. In the third round, when just two pairs remain, the pairs can collaborate and the winning pair is the one which wins two out of three points by providing the lower-scoring answers.
The final round has the sole surviving pair conferring for 60 seconds in an attempt to find a pointless answer within any of three specified, usually quite narrowly defined, questions. If they find such a pointless answer, they take the jackpot. If they do not, they win a "coveted Pointless trophy" but no money, and do not return for the next show.
Pairs can appear on a maximum of two shows, and will only fail to do this if they are successful in reaching the final pass in their first appearance.
As I think I've said before here on the MoS site, analysts are driven by curiosity. We strive to answer questions that, sometimes for no objectively good reason, we genuinely need to answer once we've thought of them.
In the case of Pointless, I got to wondering about the average number of returning pairs from one show to the next. Absent any analysis, I'd have guessed it was about two, maybe a little less.
Working through the game mechanics, it's not hard to convince yourself that the number of returning pairs can range anywhere from zero to three, with four returning pairs being an impossibility because the pair competing for the jackpot in one show cannot return for the next. Zero returning pairs, though logically feasible, seems empirically rare, as does three returning pairs (though of course this must have been the case for, at least, the second game in Pointless history under the four-team format).
A more definitive answer to the average number of returning pairs can be obtained through simulation based on the game mechanics as already described. For that simulation we'll also need to make assumptions about:
As well as answering our original question about the average number of returning pairs, the simulations will also allow us to investigate how the average amount won per show and average jackpot size vary as we change the two input probabilities.
With four different values for the probability of providing a pointless answer, and 10 for the probability of the jackpot being claimed, we've forty different scenarios to simulate. We'll simulate each of them for 1,000 shows.
Let's firstly look at how average winnings per show varies with the probability of providing a pointless answer.
(Please click on this and any subsequent image to access a larger version).
We see that the average winnings per show increases by about £125 for every 2.5% point increase in the probability of providing a pointless answer. In each show (ignoring the possibility of tie-breaking extra guesses) there are either 18 or 20 opportunities for a pointless answer to be provided in the first three passes - eight opportunities in the first pass, six in the second pass, and either four or six in the third pass depending on whether a third question is required.
Assuming that all opportunities are independent (which is almost certainly not true, because some questions have no or few pointless answers, and answers cannot be repeated), the number of pointless answers in a show can be thought of as a Binomial random variable with two parameters:
To model the number of pointless answers in each simulation replicate I randomly chose 18 or 20 for the number of trials, assigning a 50% probability to each (assuming that a third question will be required half the time in the head-to-head pass).
Under these assumptions we can calculate:
On average then, we add about 0.5 more pointless answers every time we increase the probability of a pointless answer by 2.5% points. Since each pointless answer is worth £250, we'd expect that to translate into a £125 increase in average winnings, which is what we observe in the simulation results.
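That arithmetic can be checked directly:

```python
def expected_pointless_answers(p):
    """Expected pointless answers per show: a Binomial mean, with the
    number of trials averaging 19 (18 or 20, each half the time)."""
    return p * (0.5 * 18 + 0.5 * 20)

# A 2.5 percentage-point rise in p adds about 0.475 pointless answers,
# worth about GBP 119 per show at GBP 250 each - in line with the
# roughly GBP 125 per 2.5 points seen in the simulation results.
extra_winnings = expected_pointless_answers(0.025) * 250
```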
From this chart it appears that the probability of winning the jackpot has little or no effect on the average winnings per show - whilst there is some spread of average winnings as we vary the jackpot probability for a given pointless answer probability, there appears to be no systematic relationship with the jackpot probability. The following chart, which presents the same data in a slightly different way, makes this even clearer.
That bobbling about as we move from left to right, increasing the jackpot probability for a given pointless answer probability, looks to be no more than random noise.
If we think about the game mechanics we can see why this must be true. Across any given series of shows in which the jackpot is won at least once, the total amount won will be equal to £1,000 x number of shows + £250 x number of pointless answers given in those shows. Whether this whole amount is won only in the very last show, or several pairs each win some proportion of it across a number of shows, the average winnings per show will simply be the sum of the available winnings divided by the number of shows. That average will certainly vary with the probability of providing a pointless answer, but not with the probability of the jackpot being claimed.
That latter, jackpot probability will, however, determine the variability of jackpot sizes, since harder-to-win jackpots will tend to be larger when they are eventually won.
We can see this if we investigate the average jackpot size as we vary the probability of winning it.
Here we see quite a sharp decline in the average jackpot size as we assume that it is easier to win. At a probability of 10%, typical jackpots are in the £12,000 to £15,000 range, depending on how often pointless answers are given, whereas at 30% typical jackpots are only around £3,000 to £5,000.
There is also, of course, a systematic relationship between the average jackpot size and the probability of providing a pointless answer, as reflected in the ordering of the points for each given jackpot probability. As we'd expect, we generally see higher average jackpots when pointless answers are more probable.
Time then to investigate the question that led us here: on average, how many returning pairs will there be in a show?
Here, a moment's reflection on the game mechanics makes it obvious that the answer doesn't depend on the pointless answer and jackpot probabilities since:
So, we can estimate the average number of returning pairs by averaging our simulation results across all 40 scenarios. That gives an answer of about 1.72 returning pairs per game.
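A minimal simulation of just the pair flow reproduces this figure, under the simplifying assumption that the pair reaching the final is chosen uniformly at random each show (in reality, of course, it depends on the answers given):

```python
import random

def average_returning_pairs(n_shows=200_000, seed=0):
    """Track the show-to-show flow of pairs: the finalist pair never
    returns, other first-appearance pairs return, and all
    second-appearance pairs leave (two shows is the maximum)."""
    rng = random.Random(seed)
    returning = 0          # pairs on their second appearance this show
    total_returning = 0
    for _ in range(n_shows):
        total_returning += returning
        new = 4 - returning
        # One of the four pairs reaches the final and never returns.
        finalist_is_new = rng.random() < new / 4
        # First-appearance pairs that missed the final come back.
        returning = new - (1 if finalist_is_new else 0)
    return total_returning / n_shows
```

The steady state of this process satisfies r = 3(4 - r)/4, giving r = 12/7 ≈ 1.71, which accords with the simulated figure of about 1.72.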
We can also estimate probabilities for each of the four possible outcomes for the number of returning pairs:
The 1.7 average returning pair figure and the probabilities of the different outcomes accord well with observation.
In summary, we can glean from this analysis that:
It would be fascinating to compare the results found here by simulation with the actual results from Pointless history. I'll be hunting around the internet to see if anyone has created a database of individual show results, or has even generated summary tables.
If you're aware of any, please let me know.
]]>Briefly, the theory involves "nodes" (also called vertices), which are entities like books, teams or streets, and "edges", which signify relationships between the nodes - such as, in the books example, having the same author. Edges can denote present/absent relationships such as friendship, or they can carry a cardinality such as the number of times a pair of teams have played. Where the relationship between nodes runs between them rather than from one to the other (eg friendship), the edges are said to be undirected; where they flow from one node to another (eg Team A defeated Team B), they're said to be directed.
In today's blog we'll use graph theory to depict the Twitter Follower networks of 30 Twitter accounts that I follow or am aware of, that have at least 200 Followers (as at 19 February 2017), and that Tweet regularly, if not exclusively, about the AFL competition. The nodes then will be Twitter accounts, with the edges between them based on followership relationships. Strictly speaking, followership edges are directed, but for the purposes of today's analysis I'll be ignoring that.
The objective of the analysis is to explore the nature of the follower groups for each of the 30 accounts, in particular the extent to which they are disjoint or shared, and to investigate whether some higher order structure might be revealed in the pattern of the sets of accounts that followers tend to follow in common. Well, that and to give me a chance to create some colourful and interesting charts using a technique I haven't got to use much ...
In broad terms, the analysis proceeds as follows:
Altogether, just over 18,000 accounts follow at least one of the 30 selected accounts, with the individual follower counts for any single account ranging from around 200 to just over 2,000.
We can get an idea of the raw follower counts and the pairwise co-follower numbers by visualising the cross-tab of the counts as below.
(Please click on this, and on other images in this blog, to access larger versions.)
In this visualisation, larger dots connote a larger number of cross-followers (or, on the diagonals, followers).
We see, for example, that Arwon, BetDetective, and TheArc all have relatively large follower bases, and that the co-followings of TheArc and MoS, JoshPinn and FootyGospel, as well as InsightLane and FiguringFooty are all relatively large. (NB Links to the Twitter profiles of all 30 accounts used in the analysis appear at the bottom of this blog.)
To get a sense of how significant these cross-followings are in terms of the follower bases of the 30 accounts, we can convert these raw counts into proportions, which we depict in the visualisation below.
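The conversion from raw co-follower counts to proportions might be sketched as follows, using hypothetical accounts and follower IDs:

```python
def cofollower_proportions(followers):
    """Given a dict mapping account -> set of follower IDs, return, for
    each (row, column) pair, the proportion of the row account's
    followers who also follow the column account."""
    accounts = sorted(followers)
    return {
        (row, col): len(followers[row] & followers[col]) / len(followers[row])
        for row in accounts for col in accounts
    }

# Toy example: half of A's followers also follow B, and the diagonal
# entries are always 100%, as in the visualisation.
follows = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}}
props = cofollower_proportions(follows)
assert props[("A", "B")] == 0.5
assert props[("A", "A")] == 1.0
```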
Here we connote proportions by dot size (larger is higher) and also colour (lighter blue is higher), with the light-blue dots along the diagonal representing a proportion of 100%. Specifically, the size (and colour) of a particular dot represents the proportion of the followers of the account named in the row who also follow the account named in the column.
So, for example, a relatively large proportion of the followers of plusSixOne also follow FiguringFooty.
If we're interested in the actual proportions, we can simply spit out the cross-tab and colour-code it by value (leaving out the 100s on the diagonal to allow for a slightly wider range of colours).
This view, I think, makes clearest of all the surprisingly low levels of cross-followership amongst these accounts, given that they all have in common, at the very least, an interest in AFL. Even amongst accounts that are subjectively more similar in "content", such as TheArc and FiguringFooty, the co-following rates are only 65% (from FiguringFooty's viewpoint) and 23% (from TheArc's).
Recall that we are, in graph theory terms, defining nodes as Twitter accounts and edges as the relationship "follows" (so that, for example, the node FMI will have an edge to it from node User_ID_12345 if the User_ID_12345 account follows FMI). Building our network on this basis, running it through our spinglass clustering routine, laying it out using the Kamada-Kawai algorithm, and then prettying it up in ggnet2, we obtain the visualisation below.
Perhaps the most important part of network analysis is finding a layout that subjectively "works" for the data. Layout algorithms are responsible for moving the nodes around in an attempt to reveal the underlying relationships in the data, and the Kamada-Kawai algorithm seems to have done a reasonable job for us here, but igraph offers a number of alternative layout algorithms that we might also have tried. There is no such thing as an objectively "correct" layout of a network, but some layouts clearly work poorly for some networks.
Here we do have some objective sense of the efficacy of the layout algorithm, however, in that it has performed well in separating the nodes of many of the clusters defined by cluster_spinglass (igraph also offers other clustering, or 'community detection', algorithms) and in highlighting some of the more distinct co-follower groups.
For example, we can clearly see in red at the bottom of the visualisation the accounts that follow only ASpeedingCar, as well as the individual and shared followers of DownIsNewUp and BetDetective at the top of the visualisation.
In this visualisation, I've also coloured the node labels for the 30 selected accounts with the colour of the cluster to which the account belongs. As such, we can see the commonality of the RankSW, InsightLane, FiguringFooty, HPN, plusSixOne, MoS, SgtB and RyanB follower bases. As you'll see if you review their tweeting history, all of these accounts have a highly quantitative approach in their coverage and discussion of AFL.
A similar, though arguably prettier version of the network emerges once we port the igraph network into Gephi and use some of its layout algorithms as described earlier.
(You can access a PDF version of this image here.)
I gave the spinglass algorithm licence to create up to 50 clusters (or communities), but it stopped after building just 19 of them.
Consistent with my earlier comment about the relatively disjoint nature of the follower bases, many of the 30 selected Tweeter accounts see a large proportion of their followers coming from a single community.
TheArc, for example, has a large proportion of its follower base in Community 9. No other account sources any significant proportion of its base from this Community. Most accounts, in fact, can be said to exclusively "own" a particular community, the obvious exception being the 10 accounts that seem to "share" Community 8.
This notion of "ownership" is also revealed via a by-community analysis looking at the proportion of each community that follows a particular account.
Limited in scope as it is, I think this analysis shows promise for wider application. I might, for example, redo it with a larger set of accounts or start to investigate the wider Twitter behaviour of some of the followers in the identified communities.
More broadly, I think there are other interesting possibilities in applying network analysis and graph theory to aspects of sports analytics, such as team-versus-team result histories in the home-and-away season, or just in finals. I plan to investigate some of these over the next few weeks and during the season.
Below are links to the Twitter profiles of the 30 accounts used in this analysis, in case you'd like to check a few of them out.
TheArc
DownIsTheNewUp
FMI
BetDetective
TheHolyBoot
JoshCPinn
FootyGospel
Gigs
Arwon
FreoPope
RudiEdsall
MoS
InsightLane
LucasGarth
Harri_Chas_17
ASpeedingCar
FiguringFooty
RyanBuckland
4Boat
DemP
Swishtter
RankingSw
NABFW
HPN
DiogenesBrown
SgtButane
NickTheStatsGuy
DadAndMog
plusSixOneBlog
CapitalCityCody
]]>
Whenever we convert bookmaker prices into probabilities, we have a decision to make about how to unwind the overround in them. In the past, I've presented (at least) three plausible methods for doing this:
These methods yield the following equations:
In all cases, Prob(Away win) = 1 - Prob(Home win), as we ignore the roughly 1% chance of a draw.
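For reference, the overround-equalising method - the one adopted in the previous blog - amounts to inverting the head-to-head prices and normalising so the probabilities sum to one (a sketch; the function name is mine):

```python
def overround_equalising_probs(home_price, away_price):
    """Overround-equalising probabilities: invert the head-to-head
    prices and normalise, so that each team's price carries the same
    proportional share of the overround."""
    inv_home, inv_away = 1 / home_price, 1 / away_price
    total = inv_home + inv_away  # 1 + overround
    return inv_home / total, inv_away / total

# Equal prices imply equal chances, whatever the overround.
p_home, p_away = overround_equalising_probs(1.90, 1.90)
assert p_home == p_away == 0.5
```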
For the previous blog, I simply chose to adopt the overround-equalising method for my purposes. The question for this blog is how well calibrated, empirically, these three approaches are at the start of the game across all games from 2008 to 2016, and how they compare to the probability estimates we can derive from MoSSBODS and MoSHBODS.
To make that assessment we'll proceed as we did in the previous blog, first establishing probability bins and then estimating the winning rate of home teams within each bin. For this purpose I've created 11 bins, each of which contains roughly the same number of forecasts across the five forecasters.
The calibration chart that results is as below:
Recall that a well-calibrated forecaster will see teams win about X% of the time when he or she rates them X% chances, so the closer are the points in the chart above to the dotted line, the better the forecaster.
Some observations on the results:
Across all forecasts, the forecasters demonstrate fairly similar levels of calibration - a conclusion borne out by their Brier Scores for these games (recall that lower scores are better):
The Log Probability Scores - which I define as 1 + the standard definition, to avoid zero being best - tell a similar story (recall that higher scores are better):
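For concreteness, here's how those two scores might be computed for a single game, treating it as a binary home-win/home-loss event (the base-2 log and the single-probability Brier form are my assumptions about the exact conventions used):

```python
import math

def brier_score(p_home, home_won):
    """Brier score for one game (lower is better), scoring the
    home-win probability against the 0/1 outcome."""
    outcome = 1.0 if home_won else 0.0
    return (p_home - outcome) ** 2

def log_probability_score(p_home, home_won):
    """1 + log2 of the probability assigned to the actual winner
    (higher is better); a 50:50 forecast scores exactly zero."""
    p_winner = p_home if home_won else 1 - p_home
    return 1 + math.log2(p_winner)
```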
All of which gives me some pause that I'm using MoSHBODS rather than MoSSBODS to power the Head-to-Head Fund this year, but the empirical data does suggest slightly superior returns to using MoSHBODS, so I'll not be reversing that decision. At least, for now ...
]]>In response, TheArc footy wrote a piece earlier this week in defence of the practice of using and creating such models, which I'd recommend you read and on which I'd like to build, touching on some similar areas, but also delving a little more into the technical side (which, I hope by now, you've come to expect from here).
In-running probability models are designed for one, single purpose: to provide an estimate of the likelihood of some final outcome during the course of a sporting contest. That's it. If you've ever watched a game involving your team and wondered about how safe their 20-point lead was, or how likely it was that they'd be able to run down a 13-point deficit, then, at some level, you've naturally craved an in-running model.
You might merely have thought it "unlikely" that your team would surrender their lead, or "virtually impossible" that they'd make up the gap - which is fine if you're happy with qualitative assessments, but less so if you want to say how "unlikely" and whether it's more or less unlikely than the time they previously surrendered a similar lead (which floods all-too-readily into consciousness).
And that's what in-running probability models do - provide a numerical answer to a perfectly legitimate question: what's the best estimate of my team's chances now?
Can these models be used for gambling? Sure. Are they? Of course. But just because they can be used for one purpose doesn't mean they can't be used for others. In fact, the entire field of probability was founded on the need to estimate the in-running chances of an interrupted dice game, and this field does seem to have turned out to be useful for a couple of other things. Writing football-based blogs, for one ...
So, in short, creating in-running probability models seems like a sensible thing to do even if you have quite reasonably-held qualms about the social effects of gambling.
One other implicit criticism of in-running models stems, I think, from a belief that they don't actually provide what they promise anyway. If a team rated a near-zero chance winds up winning, how could the model possibly have been correct?
Whilst it is never possible - or, I'd add, sensible - to argue the efficacy of a forecaster on the basis of a single forecast, or even those of a single game, it is possible to define an intuitive metric and measure a forecaster using that metric across a sufficiently large set of forecasts.
Exactly what that measure might be I'll come back to in a later section.
How then might we go about building an in-running model of AFL games for ourselves?
I've written about this a number of times in the pages of the MoS website, most recently (I think) in this post from 2013, so here I'll spend only a few paragraphs describing my most-recent attempt.
The latest models once again draw on the score progression data from the remarkable afltables site, which provides the time for every scoring and quarter-end event in every game since 2008. Altogether, that spans 1,786 games. We use this data to create 200 observations for every game - the score at 0.5% intervals throughout the game, comprising 50 equally-spaced points within any given quarter. Roughly speaking, at an average of 1,800 seconds per quarter, that means we're sampling the score every 36 seconds. This discretised data is used for modelling, rather than the raw event-based scoring data, because it ensures that games with fewer scoring events carry the same weight in our modelling as games with more scoring events.
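The discretisation step can be sketched as follows. This is illustrative Python only - the data, margins, and function name are hypothetical, not the code behind the published models:

```python
import numpy as np

def resample_progression(event_fracs, event_margins, points_per_quarter=50):
    """Sample the running home-team margin at equally spaced game fractions.

    event_fracs: fraction of the game elapsed at each scoring event (sorted)
    event_margins: running home-team margin after each event
    Returns 4 * points_per_quarter samples (200 by default, one every 0.5%).
    """
    grid = np.arange(1, 4 * points_per_quarter + 1) / (4 * points_per_quarter)
    # The margin is a step function: at each grid point, take the margin
    # after the most recent scoring event (0 before the first event).
    idx = np.searchsorted(event_fracs, grid, side="right") - 1
    return np.where(idx >= 0, np.asarray(event_margins)[np.maximum(idx, 0)], 0)

# Hypothetical mini-game: goal to the home team at 10% played,
# behind to the away team at 60% played
samples = resample_progression([0.10, 0.60], [6, 5])
```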
Accurate pre-game estimates of the teams' chances are exceptionally important because, in essence, an in-running model can be thought of as a continuous updating of this pre-game Bayesian prior. To provide these estimates we'll use either the TAB bookmaker's pre-game head-to-head prices (converted to a probability using the overround-equalising method) or the pre-game probability estimates derived from the MoSHBODS Team Rating System. We'll build two models, one TAB-based, the other MoSHBODS-based.
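For concreteness, the overround-equalising method assumes the bookmaker has levied the same amount of overround on each team, so we subtract an equal share of the total margin from each inverse price. A minimal sketch (the prices shown are hypothetical):

```python
def overround_equalising_probs(home_price, away_price):
    """Convert head-to-head prices to probabilities, assuming the
    bookmaker's overround is spread equally across the two teams."""
    inv = [1 / home_price, 1 / away_price]
    overround = sum(inv) - 1          # total margin embedded in the prices
    return [p - overround / 2 for p in inv]

# Hypothetical head-to-head prices of $1.60 (home) and $2.45 (away)
home_p, away_p = overround_equalising_probs(1.60, 2.45)
```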
Unlike the previous in-running models I've built, for which I've used binary logit or probit formulations, today's models employ an algorithm called quantile regression, which I used for a similar purpose way back in 2014. What's appealing about the quantile regression approach is that it yields not just a simple point-estimate of a team's victory probability, but an entire distribution of the expected final margin (from which such a victory probability point-forecast can be estimated, as discussed later).
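As a loose illustration of what quantile regression optimises: it minimises the "pinball" loss, under which under-predictions cost q per point of error and over-predictions cost (1 - q). For an intercept-only model the minimiser is just the sample quantile, which we can verify on synthetic data (this is a teaching sketch, not the fitted models):

```python
import numpy as np

def pinball_loss(y, pred, q):
    """Loss minimised by quantile regression at quantile q."""
    e = y - pred
    return np.mean(np.maximum(q * e, (q - 1) * e))

rng = np.random.default_rng(0)
y = rng.normal(loc=10, scale=30, size=10_000)   # stand-in for final margins

# Scan candidate intercepts and keep the pinball-loss minimiser
candidates = np.linspace(-100, 120, 2201)
q = 0.75
losses = [pinball_loss(y, c, q) for c in candidates]
best = candidates[np.argmin(losses)]
```

The value of `best` lands essentially on the 75th sample percentile of `y`, which is the sense in which fitting all 99 quantiles traces out a full distribution of final margins.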
Now the efficacy of an in-running probability model is highly dependent on an appropriate definition of regressors or "features" as the machine learning adherents seem to prefer calling them. We would expect, for example, that the range of feasible final margins would contract as the game progresses, but that characteristic won't necessarily emerge from our model without some thoughtful feature design.
Which is why we end up with a quite odd-looking functional form for the in-running model, the details of which I'll spare you from today - partly because, if I'm honest, I'm just not up to wrestling with MS Word's Equation editor right now - but which includes a number of terms involving (1-Game Fraction) as a multiplier or a divider. This ensures that particular components have a diminishing effect as the fraction of the game completed goes toward 1, and provides the narrowing in the projected final margin estimates that we desire.
Variables other than Game Fraction that appear in the models are the current home team lead and the pre-game home team probability estimate. The full set of features, transformed and combined in various ways, is used to explain variability in the final game margin from the Home team's perspective.
As noted earlier, two models were fitted, one using TAB-based pre-game probability estimates, and the other using MoSHBODS-based estimates. We can measure the relative fit of the two variants using the Akaike Information Criterion, which reveals that the model using the TAB-based estimates is marginally superior.
The chart below summarises the fitted results for a single game, here the 2016 Grand Final.
In the top chart we have the score progression data itself, tracking the lead from the home team's perspective, here considered to be the Western Bulldogs (though that designation changed in the week leading up to the Grand Final).
Beneath that we have the in-running probability estimates of the two models, the one built using MoSHBODS in black, and the other built using the TAB bookmaker data in red. These probabilities are derived from the quantile regression outputs, which appear in the two lowest sections. Note that both models provide very similar in-running estimates throughout the game largely because MoSHBODS' (39%) and the TAB's (40%) pre-game estimates of a Dogs' win were very similar.
In the bottom section we have estimates for the projected final margin, these provided by the quantile regression that used the TAB prices for initial probability estimates. We've chosen to map the values of the 5th, 25th, 50th, 75th, and 95th percentiles across the game. This output allows us to say, for example, that at quarter time the model attached a 90% confidence level to the final margin lying somewhere in the -57 to +50 range, and a 50% confidence level to the final margin lying somewhere in the -23 to +19 range.
Armed with the model's fitted values for all percentiles, we derive an in-running probability estimate at any point in time by determining the percentile whose associated final margin value is nearest to 0. So, for example, at quarter time, that was the 49th percentile for the TAB-based model, so our in-running estimate is, accordingly, 49%. At the same point in time, the MoSHBODS-based model assigned the Dogs a 46% chance of victory.
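That percentile-nearest-zero reading can be sketched like this (the fitted quantile values below are invented for illustration, and the reading of the nearest percentile as the win probability follows the article's convention):

```python
import numpy as np

def in_running_estimate(quantile_margins):
    """quantile_margins: fitted final-margin values for the 1st..99th
    percentiles at some point in the game. Returns the percentile whose
    fitted margin is closest to zero, read as the home team's in-running
    probability estimate."""
    percentiles = np.arange(1, 100)
    nearest = percentiles[np.argmin(np.abs(quantile_margins))]
    return nearest / 100

# Hypothetical fitted quantiles: margins rising linearly from -50 to +48,
# so the margin nearest zero sits at the 51st percentile
fitted = np.linspace(-50, 48, 99)
est = in_running_estimate(fitted)
```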
To give you an idea of how in-running estimates look when MoSHBODS and the TAB have differing pre-game assessments, here's the output for an early-season 2014 game where MoSHBODS rated the Saints more than 40% points better chances than did the TAB bookmaker. The MoSHBODS-based model had the Saints as comfortable favourites all game, but it took the TAB-based model until the second half of Quarter 2 before it nudged them over 50% (and even then not for the remainder of the game).
From fairly early in the final term, however, the two models had become much more aligned.
I have created these in-running outputs for every game since 2008, copies of which can be downloaded from the Links & Archives page under the Downloads - In-Running Charts section.
Also, team-by-team in-running charts for all 22 home-and-away games of the 2016 season are available at the bottom of this page from the Static Charts - Score Progressions page.
As I mentioned earlier, I think some of the criticisms of in-running models come from people witnessing their occasional spectacular failures. To get a more balanced perspective of a probability forecaster, however, we need to look at more than just these top-of-mind examples.
The measure we'll use is called calibration, the intuition for which is that events assessed as having an X% chance of occurring should, in the long run, eventuate about X% of the time if the forecaster is any good. If you think about that measure for a moment, you'll realise that events assessed as having, say, a 99% chance of occurrence should fail to occur about 1% of the time.
If, instead, events rated as 99% chances by a probability forecaster never occurred, then that forecaster would not be well-calibrated. As much as they might be embarrassing for the forecaster at the time, occasional "mistakes" of this kind are actually a sign of his or her ability.
So let's assess how well calibrated are our two in-running models. We'll do this firstly by "binning" the models' probability estimates into 1% point buckets, and then calculating for each bucket the proportion of times that the home team went on to win. Ideally, for example, looking at all the occasions where a model made an assessment that the home team had a 25% chance of victory at some point in the game, we'd find that the home team actually went on to win about 25% of the time.
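The binning step can be sketched as follows. The forecaster simulated here is hypothetical and perfectly calibrated by construction, so its buckets should track the 45-degree line:

```python
import numpy as np

def calibration_table(probs, home_won, bin_width=0.01):
    """Bin probability estimates into 1%-point buckets and return, per
    integer-percent bucket, the proportion of times the home team won."""
    probs = np.asarray(probs)
    home_won = np.asarray(home_won, dtype=float)
    bins = np.floor(probs / bin_width).astype(int)
    table = {}
    for b in np.unique(bins):
        table[int(b)] = home_won[bins == b].mean()
    return table

# Hypothetical well-calibrated forecaster: wins occur at the quoted rate
rng = np.random.default_rng(1)
p = rng.uniform(size=50_000)
won = rng.uniform(size=50_000) < p
tab = calibration_table(p, won)
```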
That desirable behaviour would manifest in this chart as the black (TAB-based) and red (MoSHBODS-based) lines tracking a 45 degree course from (0,0) to (100,100). We see that this is almost the case, though there is some suggestion that both models are slightly too pessimistic about home teams' chances when their estimates are in roughly the 15% to 40% range, where they seem to be about 5% points too low.
Overall though, across the full range of estimates, the calibration looks fairly good - certainly good enough to suggest that the models have practical efficacy. On average, when they estimate a team has an X% chance of winning at some point during a game, that team will go on to win about X% of the time.
You really can't ask for much more than that from an in-running model.
Now it might be the case that the models are better at certain times of the game than at others. For example, their estimates might be better in one Quarter than in another. That information would also be useful to know. To obtain it, we proceed to bin estimates and calculate winning rate in the same fashion as we just did, but look separately at forecasts made during Quarter 1, Quarter 2, Quarter 3, and Quarter 4.
What we find is that the home team pessimism we saw in the overall chart is largely driven by estimates from Quarter 1. Knowing that, we might choose to go back and review the model, looking in particular at some of the terms involving (1 - Game Fraction), which could be running off at the wrong rate in the first Quarter when Game Fraction is low. As it stands, home team underdogs or home teams whose estimated probability drifts into the 15 to 40% range during Quarter 1 according to both models, do a little better than the models forecast.
To summarise then:
I don't have bookmaker data that I trust going back beyond 2006, so I thought I'd use MoSSBODS to assess the level of pre-game favouritism for the full historical analysis.
Now, in the cross-tab I'd like to record the number of teams whose results fell into the relevant category, but also how many of the results we might have expected to fall into that category given MoSSBODS' pre-game assessment of the relative team strengths, adjusted for venue.
In this first table I do this by assuming that actual margins follow a Normal Distribution with a mean equal to MoSSBODS' expected margin, and a standard deviation calculated from MoSSBODS' margin errors in the relevant year (which range from about 26.5 points in the very early VFL years to the mid 40s around the 1980s.)
The result is shown below.
We see then that, for example, of the 4,285 teams that have started as more than 4-goal underdogs, 2,703 of them ended up losing by more than 4 goals. That's about 30 teams more than we'd have expected given our assumption about the distribution of actual margins around their expected value.
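The Normal-distribution calculation behind these expected counts can be sketched with nothing more than the Normal CDF. The expected margin and standard deviation below are purely illustrative, not MoSSBODS' actual values for any game:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def margin_band_prob(expected_margin, sigma, lo, hi):
    """P(actual margin falls in (lo, hi]) assuming margins are Normal
    around the pre-game expected margin."""
    return (normal_cdf(hi, expected_margin, sigma)
            - normal_cdf(lo, expected_margin, sigma))

# Hypothetical 3-goal favourite (expected margin +18), sigma of 36 points:
# chance the game lands in the 2-to-4-goal (12 to 24 point) win band
p_band = margin_band_prob(18, 36, 12, 24)
```

Summing these per-game band probabilities across all games gives the "Expected" count for each cell of the cross-tab.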
Now if we focus on the bottom row we can see that:
Looking up and down the column of data for the 2-4 goal wins (and losses) reveals that the shortfall is largely independent of how strong or weak the pre-game favourite was. The exceptions are games where there were equal favourites - which I've defined as games where MoSSBODS' expected margin was under half a point - and games where there was a better than 4-goal favourite. In those cases we saw roughly as many 2-4 goal victories (losses) as we'd have expected.
There are other features of this table that I think are interesting and probably worth exploring another day, but for today I'm going to stay with the apparently "missing" 2-4 goal margins.
Okay, maybe the result we've just seen is somehow an artefact of the early parts of VFL history - I can't posit exactly how or why, but let's just say it'd be a less interesting phenomenon if that were the case. So, let's constrain our analysis then to only those seasons from 2000 to 2016.
We now have that:
Far from eliminating the "missing margins" phenomenon, focussing on the modern era exacerbates it. And, now, the phenomenon occurs for games with every level of pre-game favouritism excepting equal favourites.
Curious.
But, there is another way of constructing the distribution of actual margins given MoSSBODS' pre-game views about the strength of the teams, and that is by using the team scoring model I fitted back in 2014, which allows us to create a distribution of margins for every combination of pre-game expected home and away team scoring shot levels. That model produces distributions that are very Normal-like, though they're not Normal and they are also heteroskedastic in that the standard deviation of margins around the expected value tends to increase with the number of scoring shots.
So, if we use this model to simulate 10,000 games under all combinations of home team and away team scoring shot levels, we can then estimate for every game how likely was each victory margin range. This will allow us to fill in the values in the "Expected" columns of the table. (For these simulations, for simplicity and for some other uninteresting reasons, I used the same parameter values for home and away teams, but that makes little difference to the outputs.)
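A rough sketch of a simulation in this spirit appears below. Note that the distributional form, standard deviation, correlation, and conversion rate are illustrative placeholders, not the fitted 2014 team scoring model:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_margins(exp_home_ss, exp_away_ss, n=10_000, sd_ss=5.5,
                     rho=-0.25, conv=0.53):
    """Draw (negatively) correlated home and away scoring-shot counts,
    convert shots to points at a common conversion rate, and return
    simulated final margins. All parameter values are illustrative."""
    cov = [[sd_ss**2, rho * sd_ss**2], [rho * sd_ss**2, sd_ss**2]]
    ss = rng.multivariate_normal([exp_home_ss, exp_away_ss], cov, size=n)
    ss = np.maximum(np.round(ss), 0)
    points_per_shot = conv * 6 + (1 - conv) * 1   # expected points per shot
    return (ss[:, 0] - ss[:, 1]) * points_per_shot

margins = simulate_margins(26, 24)
share_2_to_4_goal_wins = np.mean((margins > 12) & (margins <= 24))
```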
This new approach yields the next table, which suggests that:
That's not helping at all. Those 2-4 goal margins are still missing.
Using a higher level of correlation (about double, in fact) between home team and away team scoring shots improves the agreement between actual and expected wins (and losses) of more than 4 goals (now showing as a 0.7% excess), reduces the size of the missing 2-4 goal games (now a 12% deficit), but at the expense of driving up the apparent oversupply of 1 to 11 point wins (now an 11% excess).
That change, on balance, appears to mostly drive up the proportion of blowout victories (as you'd expect), and isn't based on the latest empirical data anyway. The correlation between the modelled errors in home team and away team scoring shots for all games in the 2000-2016 period is -0.25 - virtually identical to the -0.24 from the model fitted in 2014.
Still missing then.
MoSSBODS is good, but far from perfect at estimating pre-game team scores and margins, and the TAB Bookmaker (amongst a long list of others) is undeniably better.
Lastly then, let's use the data I have for the TAB from 2006 to 2016 to, firstly, estimate the expected margin in each game. For the most part, I've simply used the negative of the handicap in the line market for this purpose, though I've made adjustments in games where the handicap is under 7 points (because, previously, the TAB would set a minimum handicap of 6.5 points and would adjust the prices on offer, which meant that they didn't really expect a 6.5 point margin) and where the difference between the handicap and that implied by the head-to-head prices I have for the same game is too large.
In total, less than one game in five had its handicap adjusted and, in those games where it was adjusted, the average absolute change was about 2.4 points.
These expected margins have been converted to actual margin distributions in the same way that they were for the MoSSBODS' margin opinions in the very first table - by assuming that actual margins follow a Normal Distribution with a mean equal to TAB's expected margin, and a standard deviation calculated from the TAB's margin errors in the relevant year (which range from about 33 to 39 points across the 11 seasons.)
This approach suggests that:
A range of approaches then suggest the apparently missing 2 to 4-goal margins might really be missing.
However we analyse it - whether we use MoSSBODS or the TAB to set estimated margins, and whether, when using MoSSBODS, we assume that actual margins follow a Normal distribution or are best modelled based on an individual team scoring model - historical results seem to have too few margins of between 2 and 4 goals.
There are at least two reasons why this might be the case:
Readers, of course, might be able to come up with other explanations, which I'd love to hear.
But for now, for me, it remains an unresolved and curious issue ...
As the 'SS' in its name reflects, MoSSBODS is based solely on teams' scoring shot performances. It doesn't distinguish between goals and behinds, so a team registering 15.2 is considered to have performed equally as well as one that registered 7.10 against the same opposition. Its Ratings are also measured in terms of scoring shots - above or below average - and so, for example, a team with a +3 Offensive Rating is expected to create 3 more scoring shots than the (previous season) all-team average when playing a zero-rated team at a neutral venue.
The rationale for using a team's scoring shots rather than its score in determining ratings is the fact that a team's accuracy or conversion rate - the proportion of its scoring shots that it converts into goals - appears to be largely random, in which case rewarding above-average conversion or punishing below-average conversion would be problematic.
Conversion is not, however, completely random, since, as the blog post just linked reveals, teams with higher offensive ratings, and teams facing opponents with lower defensive ratings, tend to be marginally more accurate than the average team.
So, if better teams tend to be even slightly more accurate, maybe higher accuracy should be given some weighting in the estimation of team ratings.
Enter MoSHBODS - the Matter of Stats Hybrid Offence-Defence System.
Fundamentally, MoSHBODS is exactly the same as MoSSBODS. It too is an ELO-style rating system and has the same set of underlying equations:
Also, just like MoSSBODS:
For MoSHBODS, the year 2016 was used as the holdout year, so no optimisation was performed using data from that year.
MoSHBODS, like MoSSBODS, also splits the season into five sections, dividing the home-and-away portion into four unequal pieces, and placing the finals into a fifth portion of its own. The optimal k's for each of those pieces for MoSHBODS are as follows:
MoSHBODS has a flatter profile of k's than does MoSSBODS for all but the earliest portion of the home-and-away season. It has a higher k, however, for that early portion, which allows its ratings to more rapidly adjust to the revealed team abilities of the current season.
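For readers unfamiliar with the general form, an ELO-style update moves a rating toward the performance just observed, at a speed governed by k. The sketch below is only the generic skeleton - the actual MoSSBODS/MoSHBODS equations, with their separate offensive/defensive components and venue adjustments, aren't reproduced here, and the numbers are hypothetical:

```python
def elo_style_update(rating, expected, actual, k):
    """Generic ELO-style update: nudge the rating by k times the gap
    between observed and expected performance."""
    return rating + k * (actual - expected)

# Hypothetical: a team expected to produce 25 scoring shots produces 30,
# early in the home-and-away season where k is assumed largest (say 0.20)
new_rating = elo_style_update(rating=2.0, expected=25, actual=30, k=0.20)
```

A larger k for the early rounds, as MoSHBODS has, simply means each such surprise moves the rating further.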
Equations (4) and (8) are a little different for MoSHBODS and are the reason for its "hybrid" designation. Their role is to adjust a team's actual score to make it a mixture of:
The value of f in the equation determines the extent to which the adjusted score is dragged away from the actual score, with larger values producing larger differences. We can think of MoSSBODS as having an f of 1 since it puts no weight at all on a team's actual score and looks only at the number of scoring shots it registered. The optimal value of f for MoSHBODS has been determined to be 0.65, so it takes some account of a team's actual score, but places almost twice as much weight on a team's scoring shot production.
Let's work through an example to see how this works in practice.
Consider a team that kicked 15.4.94 to its opponent's 9.10.64. Assume that the all-team conversion rate in the previous season was 53%.
The team's adjusted score would therefore be:
0.65 x (53% x 19 x 6 + 47% x 19 x 1) + 0.35 x 94 = 0.65 x 69.35 + 0.35 x 94 ≈ 78 points
That 78 point figure is a mixture of the 69.4 points the team would have scored if they'd converted their 19 scoring shots at 53% rather than at 79%, and of the 94 points that they did score. Overall, their score is reduced by about 16 points because they converted at an exceptionally high rate. They still, however, receive about a 9 point credit for their above-average accuracy.
Their opponent's adjusted score would be:
0.65 x (53% x 19 x 6 + 47% x 19 x 1) + 0.35 x 64 = 0.65 x 69.35 + 0.35 x 64 ≈ 67 points
Their 67 point figure is a mixture of the 69.4 points the team would have scored if they'd converted at 53% rather than 47%, and the 64 points that they did score. Their score is increased by 3 points because they converted at a rate marginally lower than the expected rate of 53%. That is, though, about a 2 point penalty relative to what they'd have been credited with if the average conversion rate was used.
In this example then, the 30-point actual win is reduced to an 11-point win once the adjustments have been made. Note that MoSSBODS would have treated this game as a 19-all (scoring shot) draw.
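The adjustment can be expressed as a small function that reproduces the worked example above (the function name is mine, not MoSHBODS' internals):

```python
def adjusted_score(goals, behinds, all_team_conversion=0.53, f=0.65):
    """Blend of the score a team 'should' have kicked at the all-team
    conversion rate and the score it actually kicked. With f = 0.65,
    scoring-shot production carries almost twice the weight of the
    actual score."""
    shots = goals + behinds
    actual = 6 * goals + behinds
    expected = shots * (all_team_conversion * 6 + (1 - all_team_conversion) * 1)
    return f * expected + (1 - f) * actual

home = adjusted_score(15, 4)    # 15.4.94, converted at about 79%
away = adjusted_score(9, 10)    # 9.10.64, converted at about 47%
```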
It's a subjective call, but it feels to me as though MoSHBODS' assessment is more appropriate here.
Across the entirety of V/AFL history, the MoSHBODS approach changes the result (ie changes the winning team, switches a draw to a result, or switches a result to a draw) in only 9% of games. MoSSBODS changes the result almost twice as often.
MoSHBODS also estimates Venue Performance Values (VPVs) for every team at every ground, but does not use a separate Travel Penalty, instead allowing the Venue Performance values to absorb this component. This makes the VPVs very straightforward to interpret.
Another benefit of this approach is that it allows the effects of interstate travel to vary by team and by venue. MoSSBODS, by comparison, simply imposes a fixed 3 scoring shot penalty on any team playing outside its home state against a team playing in its home state. The apparent variability of the effects of travel on different teams to different venues is clearly reflected in the matrix below, which is MoSHBODS' VPVs for current teams and current venues as at the end of the 2016 season.
Numbers in this table can be interpreted as (a regularised estimate of - see below) the number of points the team would be expected to score above or below what would be implied by the difference between its own and its opponent's offensive and defensive ratings. We can think of these numbers as a logical extension of the home ground advantage notion, but one that recognises not all away grounds are the same for every team.
As we might expect, most teams enjoy positive VPVs at their home ground or grounds. These values are shaded grey in the table. Adelaide, for example, enjoys a +5.1 VPV at the Adelaide Oval, and Port Adelaide a +5.0 VPV at the same ground.
The only teams with negative VPVs at their home grounds are Essendon (-0.9 at the MCG and -0.4 at Docklands), and Richmond (-0.8 at the MCG).
We can see the differential effects of travel and interstate venues on teams by scanning down the columns for each team. Adelaide, for example, faces a -7.1 VPV at the Gabba and a -7.0 VPV at Kardinia, but actually enjoys a +1.5 VPV at Docklands.
The process for estimating VPVs is as follows.
Firstly, VPVs are assumed to be zero for any venue at which a team has played 3 times or fewer. Thereafter, the VPV is calculated as a fraction of the average over- or under-performance ("excess performance") of the team at that venue across the (at most) 100 most recent games.
A team's excess performance is defined as the difference between the actual adjusted margin of victory and the margin of victory we'd have expected based solely on the ratings of the teams involved (ie without adjusting for venue effects).
These excess performances are averaged and then damped (or, in the lingo, "regularised") by taking 45% of the result. This serves to prevent extreme VPV values for venues where teams have played relatively few games with very unexpected results, and reduces the impact of such extreme results even at venues where a team has played regularly.
Again, a small example might help.
Imagine a team that has played 4 games at a particular venue:
The excess performance values are thus -8.0, +31.5, -71.4 and +10.9, and their average is -9.25. We take 45% of this to obtain a VPV of -4.2 for this team at this venue.
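In code, the VPV calculation for this example might look like the following (the function name and structure are mine, but the numbers reproduce the example above):

```python
def venue_performance_value(excess_performances, damping=0.45, min_games=4):
    """Average a team's excess performances at a venue, using at most the
    100 most recent games, and damp (regularise) the result by taking 45%
    of it. Returns 0 until the team has played more than 3 games there."""
    if len(excess_performances) < min_games:
        return 0.0
    recent = excess_performances[-100:]
    return damping * sum(recent) / len(recent)

vpv = venue_performance_value([-8.0, 31.5, -71.4, 10.9])
```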
MoSSBODS uses a single logistic equation to convert expected victory margins, measured in terms of scoring shots, into victory probabilities. In this respect it probably suffers a little from changes in the typical number of scoring shots recorded in games from different eras, which might make a (say) +4 SS advantage worth more or less at different times.
MoSHBODS uses the insight from this earlier blog on the eras in V/AFL football to split history into six eras:
A separate logistic equation is then fitted to each era (2016 is used as a holdout for the final, current era).
The efficacy of this approach is borne out by the variability in the fitted exponents across the eras, which range from a low of 0.0425 for the 1978 to 1995 period, to a high of 0.0649 for the 1897 to 1923 period. This means, for example, that a predicted 12 point win would be mapped to a 68.5% probability in 1900, but a 62% probability in 1990.
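Assuming the standard logistic form, the two era exponents quoted above reproduce those probabilities:

```python
from math import exp

def victory_probability(expected_margin, era_exponent):
    """Logistic map from expected victory margin to victory probability,
    with a separately fitted exponent per era (logistic form assumed)."""
    return 1 / (1 + exp(-era_exponent * expected_margin))

p_1900 = victory_probability(12, 0.0649)   # steepest era, 1897 to 1923
p_1990 = victory_probability(12, 0.0425)   # flattest era, 1978 to 1995
```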
More broadly, as you can see from the diagram below, a given expected victory margin has been associated with a smaller victory probability as the average levels of scoring have increased over time.
It's curious to me, but all of my ELO rating models have consistently underestimated game margins from the home team's perspective. I feel as though there's something profound about this, but the source of this profundity continues to elude me. In any case, for MoSHBODS, the bias amounts to about 2 points per game.
Analysis shows that this bias stems from overestimating the expected score of away teams during the home-and-away season, so the final adjustment we make in converting MoSHBODS ratings into team score and margin predictions is to subtract a fixed 2 points from the designated away team's score for all home-and-away season games.
Even more curiously, MoSHBODS' total score predictions are superior without this bias adjustment, so this year we'll find that MoSHBODS' official predictions will not add up, in the sense that the predicted home score plus the predicted away score will not match the predicted total score.
Odd, I know, but it turns out that sometimes the best estimator of (A+B) isn't the best estimator of A plus the best estimator of B. Again, seems profound; can't find the nature of it.
So, is MoSHBODS better than MoSSBODS?
Slightly, but demonstrably.
Specifically:
As well, in the only season that is post-sample for both MoSHBODS and MoSSBODS, 2016, MoSHBODS outperformed MoSSBODS on all five of these measures, in many cases by a significant margin.
The comparative results were:
Past performance, of course, is no indication of future performance. But, I'm encouraged ...
Here are MoSHBODS' team ratings as at the end of the 2016 season.
These ratings have the teams ranked quite similarly to MoSSBODS rankings. The major differences are that:
No other ranking is more than a single place different for MoSSBODS and MoSHBODS.
If nothing else, MoSHBODS' predictions will be presented along with MoSSBODS' for season 2017. I've not yet decided whether to use it to inform the three Funds, though that is a consideration, even if only for the Head-to-Head Fund.
I'm genuinely looking forward to seeing how MoSHBODS performs relative to MoSSBODS and the other MoS Tipsters and Predictors.
More news as it comes to hand ...
My presentation covered the
For anyone curious to know what I presented, I've made a copy of it available here.
Slide 16 is an animated chart, which can't be reproduced in the PDF file, so I've included it below.
The FMI blog has already discussed many of the major similarities and differences in these various assessments. In this blog, all I'd seek to add is, firstly, a tabulation and summary of all those opinions.
From this table I think it's fair to conclude that:
As a final piece, we can quantify the level of agreement between the various Strength of Schedule assessments either in terms of the raw scores or the rankings they provide. In either case, the results are very similar, as you can see from the tables at right.
The highest levels of agreement are registered for the FMI and RoCo (+0.85) assessments, the lowest for MoS and FMI (+0.49), TWF and FMI (+0.54), MoS and RoCo (+0.57), and The Wooden Finger and FMI (+0.59).
All other correlations lie between +0.67 and +0.80, which reflects the generally high levels of agreement between the various assessments.
The differences that we see, such as they are, reflect the different choices that have been made in the methodologies that have been applied. My guess is that, as in so much analytic work, none of the assessments are as good as all of them - so I'd be, at least initially, guided in my final opinion by the "Summary" column in the first table.
So, let's get into it.
The 2017 AFL Draw, released, as is now custom, in late October, once again has all 18 teams playing 22 of a possible 34 games, each missing 6 of the home and 6 of the away clashes that an all-plays-all full schedule would entail. The bye rounds have been handled a little differently this year, with Round 9 having 8 rather than 9 games, Round 11 and Round 13 having 6 games, and Round 12 having 7 games, but in all other superficial respects 2017 looks a lot like 2016.
In determining the 108 games to be excluded the League has once again, in the interests of what it calls "on-field equity", applied a 'weighted rule', which is a mechanism for reducing the average disparity in ability between opponents across the 198 home-and-away games, using the ladder positions of 2016 after the Finals series as the measure of that ability.
So, this year, of the contests that would pit a team from last year's Top 6 against a team from the Bottom 6, only 42 of the 72 (or about 58%) possible pairings are included in the schedule. By contrast, 46 of the 60 (or about 77%) of the possible pairings between the Top 6 teams are included.
More broadly, as you can see from the table, teams within each third play more games against one another than they do against teams from the two other thirds.
Excluding games, however it's done, almost inevitably imbalances a draw in that the combined strength of the opponents faced by any one team across the entire home-and-away season will differ from that of every other team. At face value, the AFL's methodology for trimming the draw seems likely to exacerbate that imbalance - and deliberately so - especially for teams in the top and bottom thirds who will face quite different mixes of team abilities.
In reality, the actual effect of the AFL's schedule truncation on the variability of team schedule strength depends on the degree to which last year's final ladder positions reflect the true underlying abilities of the teams, the spread of ability within each "third" of the competition, and the relative magnitude of venue effects in enhancing or depressing these abilities. Those are the things, of course, that the MoSSBODS Team Rating System is designed to estimate.
This year we'll use MoSSBODS' opinions to answer the following questions about the schedule:
The first thing we need to estimate a team's schedule strength is a measure of their opponents' underlying abilities. For this purpose we'll use MoSSBODS 2017 Team Ratings, which are set by taking 70% of the final 2016 Ratings, the regression towards zero reflecting the average historical shrinking in the spread of team abilities from the end of one season to the start of the next. These Ratings appear in the table below.
This year sees a few teams ranked more than a couple of spots differently by MoSSBODS compared to their official final ladder position. Adelaide, for example, will start the season as the top-rated MoSSBODS team despite finishing 6th on the final ladder, while the Western Bulldogs find themselves reigning Premiers, but ranked only 4th on MoSSBODS.
In the context of the AFL's competition "thirds", however, no team would be placed in a different third were MoSSBODS to be used rather than the final ladder in defining the boundaries.
The average Combined Rating of teams from each of the thirds is as follows:
So, ignoring Venue Effects, which we'll come to in a moment, the difference between playing an average Top 6 team and an average Bottom 6 team is almost 9 Scoring Shots (SS), or a little over 5 goals. That's about 2 SS, or a bit over a goal, more than the difference was between the Top and Bottom thirds last season.
MoSSBODS also provides estimates of how much better or worse teams, on average, play at each venue. These estimates are known as Venue Performance values, and are a logical extension of the notion of a "home ground advantage" to account for the fact that not all away venues are the same for a given team.
The current Venue Performance values are summarised in the table below for all of the venues being used sometime during 2017. Note that teams need to have played a minimum number of games at a venue before their Venue Performance value is altered from zero (shown as dashes in the table below to improve readability).
Venue Performance values are, like Ratings, measured in Scoring Shots, and are added to a team's underlying MoSSBODS Rating when used in the Strength of Schedule calculation. So, for example, we can say that Geelong, on average, is a +0.31 SS better team than their underlying +4.08 SS Rating when playing at Docklands.
One final adjustment is made to the estimated strength of an opponent, this one to reflect the relative impact of any significant travel on the two teams. If a team needs to travel interstate (or overseas) to play a game then a -3 SS adjustment is made to its underlying rating. So, for example, while Adelaide has a +2.32 SS Venue Performance value at Docklands, once the Travel Penalty is taken into account they are actually assessed as a -0.68 SS poorer team when playing at this venue.
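Putting the last few paragraphs together, the per-game opponent strength calculation can be sketched as below. The -3 SS Travel Penalty and Adelaide's +2.32 SS Docklands value come from the text; the function names and the example fixture list are my own illustrations.

```python
TRAVEL_PENALTY = 3.0  # Scoring Shots deducted from a team playing out of its home State

def opponent_strength(rating, venue_perf, travels):
    """Effective strength of an opponent, in Scoring Shots, for one fixture."""
    strength = rating + venue_perf
    if travels:                    # opponent must travel interstate (or overseas)
        strength -= TRAVEL_PENALTY
    return strength

def schedule_strength(fixtures):
    """Strength of Schedule: the sum of effective opponent strengths across
    all games. Larger positive totals represent more difficult schedules."""
    return sum(opponent_strength(r, vp, t) for r, vp, t in fixtures)

# Adelaide at Docklands: the +2.32 SS venue value less the Travel Penalty
# leaves them a 0.68 SS poorer team than their underlying Rating
adelaide_adjustment = opponent_strength(0.0, 2.32, True)   # -0.68
```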
After performing this calculation for all 22 games for every team, we arrive at the Strength of Schedule calculations below, within which larger positive values represent more difficult schedules.
The Travel Penalties in the Strength of Schedule calculations work, as you'd expect, to produce net negative Strength of Schedule scores for each team's Home games taken as a whole, and net positive Strength of Schedule scores for the Away games. Taken together, the two Aggregate Nett Travel Penalty columns provide a measure of how kind or cruel the schedule is to a team in terms of overall travel.
On this measure, the Gold Coast, Adelaide, Port Adelaide and Melbourne fare best, their aggregates all coming in at a 9 SS reduction in the overall Strength of Schedule. Carlton, alone, fare worst, their aggregate a 6 SS increase in overall Strength of Schedule. For every other team, the aggregates lie between -3 and +3 SS, so I think it's fair to say that the draw balances this aspect of a national competition fairly well.
In total, Hawthorn is assessed as having the most difficult draw, GWS the second-most difficult, and Fremantle the third-most. Interestingly, GWS and Hawthorn were in the Top 3 last season as well. One contributor to the elevated schedule strengths for both Hawthorn and GWS is that they both play 8 of a possible 10 fixtures against other teams from their Top 6 - a burden they share with Sydney and Geelong.
Adelaide are assessed as having the easiest draw, Gold Coast the second-easiest (they had the easiest last year), followed by Port Adelaide and Richmond. Adelaide's riches include the nett travel benefit mentioned earlier, as well as the fact that they play only 7 of a possible 10 fixtures against teams within their third.
Comparing each team's ranking on Strength of Schedule with the ladder positions used for weighting the draw, three teams stand out: Fremantle and, to a lesser extent, Essendon, because of the relative difficulty of their draws given they come from the bottom third, and Adelaide because of the relative ease of its draw given they come from the top third.
The difference between the hardest and easiest schedules this year amounts to about 23.5 Scoring Shots across the season, which is a tick over 1 Scoring Shot or 3.7 points per game assuming a 53% Conversion rate. That's roughly 0.5 Scoring Shots per game smaller than the difference was assessed at last season. An average advantage of 3.7 points per game is equivalent to about 0.7 to 0.9 extra wins across a 22-game season, depending on the average ability of the teams faced.
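The arithmetic behind the per-game figures is worth making explicit: a goal is worth 6 points and a behind 1, so at a 53% conversion rate the average scoring shot is worth about 3.65 points (the 23.5 SS gap and 53% rate are from the text; this is just a quick check of the conversion, not part of the original model).

```python
# Value of an average scoring shot at a 53% conversion rate
conversion = 0.53
points_per_ss = conversion * 6 + (1 - conversion) * 1   # about 3.65 points

# Spread the season-long schedule gap across 22 games
season_gap_ss = 23.5
per_game_ss = season_gap_ss / 22          # a tick over 1 SS per game
per_game_points = per_game_ss * points_per_ss
```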
If we exclude the teams with the two easiest and hardest schedules, however, the difference shrinks to just 0.6 SS or a little over 2 points per game. That represents less than half a win across the season.
Even a 34-round, all-plays-all draw wouldn't produce identical Strength of Schedule estimates for every team. There are a couple of reasons for this:
As such, if we were to estimate what I'll call the Strength of Missing Schedule, what we'll get won't simply be the difference between what we've already calculated and some common, overall all-plays-all Strength of Schedule figure. Instead, we'll get an estimate of the extent to which the deliberate imbalance in the schedule has benefited or harmed each team relative to what its own best possible Strength of Schedule could be given the inherently distorting differences listed above.
So, let's do that, assuming that all of the unplayed games would have been played on a team's most commonly used home ground. That means Carlton, Collingwood, Hawthorn, Melbourne and Richmond are assumed to play all of their missing home games at the MCG; Essendon, the Kangaroos, St Kilda and the Western Bulldogs at Docklands; Fremantle and West Coast at Subiaco; Adelaide and Port Adelaide at Adelaide Oval; Gold Coast at Carrara; Sydney at the SCG; GWS at the Sydney Showground; Geelong at Kardinia Park; and the Brisbane Lions at the Gabba.
The table below summarises the missing games, denoting with H's those games missed that would have been home games for a team, and with A's those that would have been away games. Note that I've ordered the teams on the basis of their final 2016 ladder positions, the same ordering that was used for implementing the AFL's 'weighted rule'.
The Western Bulldogs, for example, fail to play three of the other Top 6 teams twice during the season, missing out on Geelong, Hawthorn and Adelaide at home. That makes them the only team in the Top 6 to have more than two of their home games against their peers excised from the schedule. Geelong, Hawthorn and Adelaide, by contrast, miss none of their home games against peers. Adelaide also miss three away games against their peers for which they would otherwise incur the Travel Penalty.
Overall though, Geelong suffers most from the schedule truncation, especially in relation to what would otherwise be its full suite of home fixtures. It misses out on home games against 4 of the Bottom 6 teams, including both Essendon and the Brisbane Lions.
GWS fares next-worst, though in its case more for the away games it doesn't get to play, which include 5 of the 9 lowest MoSSBODS-rated teams.
Least disadvantaged are the Brisbane Lions, mostly on account of the away games they miss, which include three against teams with Top 5 MoSSBODS Ratings. The Gold Coast are next-least disadvantaged, they too mostly for missing away games, which include four against teams with Top 8 MoSSBODS Ratings.
The differences in the Strengths of Missing Schedules are much greater than those in the Strength of Schedules. They span a range of almost 38 SS, which equates to a little over a goal a game and 1.2 to 1.5 wins per season.
What's very apparent in this table is the alignment between the teams' final ladder positions from 2016 and the extent to which they have been penalised by the truncation of the draw. The teams with the 5 easiest missing schedules all come from the top third of the ladder, while those with the four most difficult missing schedules come from the bottom third.
Which is of course, exactly as the AFL would have intended.
For fans, even casual ones, AFL Grand Finals are special, and each etches its own unique, defining legacy on the collective football memory.
At another, coarser level though, we can see similarities between different Grand Finals – consider, for example, the 2010 Pies v Saints draw, as well as the 2005 and 2006 Swans v Eagles nail-biters. And then, for contrast, recall the 2007 Geelong v Port Adelaide rout, along with last season’s comfortable 46-point Hawks victory and their similarly comfortable 63-point victory the year before that.
In some sense then, there are different types of Grand Final – but how many types are there and how should we define them?
There is, of course, no definitive answer to that question, but a statistical technique called cluster analysis allows us to come up with one view. To perform a cluster analysis of Grand Finals you need some way of mathematically calculating the similarity between any two of them in a way that captures meaningful differences.
One metric that does this is the lead, and how it alters across the course of a game. Our classification then will group Grand Finals on the basis of how similar or different was the pattern of lead changes and the magnitude of those leads.
Specifically, we’ll use the following pieces of information:
• Did the eventual winner lead at Quarter Time? At Half Time? At Three-Quarter Time?
• By how much did they lead at Quarter Time, Half Time, Three-Quarter Time and Full Time?
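One way to make the similarity metric concrete is to treat each Grand Final as a seven-element vector – three lead indicators plus four quarter-end margins for the eventual winner – and measure similarity as the distance between vectors. This is only a sketch: the margins below are illustrative, and the original analysis may have scaled the features or used a different distance.

```python
import math

def features(led_qt, led_ht, led_3qt, m_qt, m_ht, m_3qt, m_ft):
    """Seven features for one Grand Final, from the eventual winner's view:
    three did-they-lead indicators and four quarter-end margins."""
    return [float(led_qt), float(led_ht), float(led_3qt),
            float(m_qt), float(m_ht), float(m_3qt), float(m_ft)]

def distance(gf_a, gf_b):
    """Euclidean distance between two Grand Finals' feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(gf_a, gf_b)))

# Two stylised games: a close, led-throughout final and a blowout
close_gf = features(True, True, True, 12, 20, 22, 15)
blowout_gf = features(True, True, True, 19, 31, 50, 46)
gap = distance(close_gf, blowout_gf)
```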
Based on those seven pieces of information, the 117 Grand Finals (excluding the drawn ones because they don’t have an “eventual winner”) can be classified into six fairly distinct types, some characteristics of which I’ve summarised in the following table.
Note that, where this table uses pie charts to denote proportions, the darker the circle the closer is that proportion to 100% and the lighter the circle the closer is that proportion to 0%.
So, for example, the very first circle tells us that 100% of teams in Grand Finals classified as Coast-to-Coast But Mostly Close led at Quarter Time. Also note that the pie charts used for each era, which appear at the far right of the table, sum to 100% if you add down the column across all Grand Final types for a single era.
With that in mind, here's a little about each Grand Final type:
(Yeah, the names do go swiftly downhill after this one.)
The key characteristic of this Grand Final type is that the winning team tends to lead at every change, but not by much. This makes them, at least in terms of the closeness of the scores, relatively appealing Grand Finals to watch.
There have been 34 Grand Finals of this type in V/AFL history, the most recent in 2013 when Hawthorn led Fremantle at every change by between about 2 and 4 goals and went on to win by just 15 points.
Grand Finals of this type are often won by only narrow margins, 20 of the 34 having been decided by less than 3 goals, and 27 of them by less than 5 goals. The average victory margin is about 3 goals.
Across history, Grand Finals of this type have occurred about 30% of the time, though their frequency has fallen off markedly since the early 1970s as average scores have tended to increase: there have been only six since then (1974, 1976, 1977, 2005, 2006 and 2013).
In this Grand Final type the winning team tends to trail at every change, and always trails at Three-Quarter Time.
There have been only 9 instances of this type, the most recent in 2009 when St Kilda led Geelong at every change before going down to them by just 12 points at the final siren.
All but two of the Grand Finals of this type were won by less than 3 goals and the type includes a number of particularly memorable Grand Finals. The famous 1970 Grand Final in which Carlton kicked 5.4 to Collingwood's 1.1 in the final term to complete a stunning victory is an archetypical example.
Another, from slightly more recent times, is the 1984 Grand Final in which Essendon trailed Hawthorn by 21, 25 and 23 points at each of the three changes before scoring 9.6 to 2.1 in the final term to win by 4 goals.
Come-from-behind victories have been rare in every era of the V/AFL and the 2009 classic is the only one in the past 30 years. They have the lowest average victory margin of all the types at just over 9 points.
This Grand Final type is characterised by the fact that the eventual winner leads at the first and the final change, but almost always trails at Half Time.
There have been 15 Grand Finals of this type, five of them occurring since 1980 (1982, 1997, 2001, 2004 and 2011).
The most recent example was the 2011 Grand Final in which the Cats led the Pies by a single point at Quarter Time but lost the Second Quarter to trail by 5 points at the main break. They then led by 7 points at Three-Quarter Time before going on to prevail by just over 6 goals.
Final margins in these Grand Finals have tended to be small, with about half of them (7 of 15) decided by less than 3 goals and about three-quarters of them (11 of 15) decided by less than 5 goals. The average victory margin has been just over 3 goals.
In all Grand Finals of this type the winning team has trailed at Quarter Time but at neither of the changes thereafter.
There have been 19 GFs of this type, including seven since the mid-1980s (1987, 1991, 1994, 1996, 2002, 2008, and 2012).
In that 2008 game the Hawks trailed the Cats by 1 point at the first change then steadied to lead by 3 points and then 17 points at the second and third changes respectively. They won, eventually, by 26 points.
The 2012 edition saw the Swans trail the Hawks at Quarter Time by 19 points then kick 6.0 to 0.1 in the Second Quarter to lead at Half Time by 16 points. They then clung on to lead by a single point at Three-Quarter Time before going on to register a 10 point victory.
This type of Grand Final includes a number of large victories, over one half of them (10 of 19) decided by 5 goals or more, those last two examples notwithstanding. The average victory margin in this type of Grand Final is 32 points.
These are the Grand Finals that disappoint the emotionally uninvested fan most of all. The key characteristic is that the winning team leads at every change and tends to win going away.
History has produced 34 of these GF types, 16 of them since 1980, coinciding with the trend towards higher scoring (1980, 1983, 1985, 1986, 1988, 1989, 1990, 1993, 1995, 1999, 2000, 2003, 2007, 2010, 2014 and 2015).
Last year’s Grand Final, which saw the Hawks lead the Eagles by 19 points, 31 points, and then 50 points at each of the changes, before going on to win by 46 points, is a quintessential example of the genre.
Almost 80% of Grand Finals of this type (27 of 34) have been won by 6 goals or more, and about 55% (19 of 34) by 8 goals or more. The average victory margin is almost 52 points.
In Grand Finals of this type, the winning team trails at Quarter Time and Half Time but wrests the lead by Three-Quarter Time and doesn't surrender it.
There have been only 6 Grand Finals of this type in history, and none since 1998. In that year, Adelaide trailed North Melbourne by 8 points at Quarter Time and by 24 points at Half Time, rallied in the Third Quarter to lead by 2 points at the last change, and then ran away with the game in the final term to win by just under 6 goals.
In these games the winning teams tend to heavily dominate the second halves, to such an extent that the average victory margin in Grand Finals of this type is just over 6 goals, and only two of them have been won by less than 5 goals.
The two most common types of Grand Finals are the Coast-to-Coast But Mostly Close and Coast-to-Coast Blowouts, defining characteristics of which are that the winning team leads at the end of most if not every quarter.
In fact, if you look at all the quarter-end changes across all 117 Grand Finals, the team that ultimately won has led at the end of:
Looking just at the Grand Finals since 1990, the percentages for First (65%) and Second Quarters (73%) are lower, but a startling 96% of eventual winners have led at the final change.
The only team from that period to come back from a deficit at the end of the Third Quarter was Geelong in the 2009 Come-From-Behind Grand Final covered earlier.
By way of comparison, across the 206 games of 2016 so far, the ultimate winner has led at the end of the First Quarter 71% of the time, at the end of the Second Quarter 86% of the time, and at the end of the Third Quarter also 86% of the time.
So, as in regular home and away season games, teams leading at the changes in Grand Finals have very often gone on to win, but, particularly in recent Grand Finals, leading at the final change has proven especially predictive of the ultimate outcome.
Not every Grand Final fits one of the six descriptions from earlier perfectly, but each Grand Final is, in a mathematical sense, closer to one of these six types than to any of the others. It’s on that basis that the 117 Grand Finals have been categorised.
In the four charts that follow, for each of these 117 Grand Finals I’ve shown the lead (or deficit) for the eventual winner as at the end of a particular quarter.
The years are colour coded by Grand Final type and their labels have been offset, where necessary, to avoid overlap. Where that’s occurred a line has been drawn to the point where the label should actually be (ie at the correct margin).
These charts show the range of leads and deficits for each of the Grand Final types at the end of each quarter, and also reveal the categorisation of every Grand Final.
We can employ other techniques to fit a statistical model to previous Grand Final results to estimate how likely it is that we’ll see any of the six different types this year.
A reasonable hypothesis is that the likelihood of each type depends on the relative abilities of the teams taking part. It’s less likely, for example, that we’ll see a Coast-to-Coast Blowout if the two teams are especially well-matched.
To create a model then, I’ve chosen as inputs the estimated offensive and defensive strengths of the teams facing off in the Grand Final. As well, bearing in mind the variability in the relative frequency of each Grand Final type across the eras as shown in the earlier table, I’ve also included the era in which the Grand Final was played.
The final model (which, for the technically curious, is a Generalised Boosted Model) seems to fit history fairly well and estimates the probabilities for the 2016 Grand Final as follows (the frequencies for era from 1995 to 2015 are shown in brackets):
So, the model suggests that a Coast-to-Coast But Mostly Close game is more likely and a Coast-to-Coast Blowout less likely than a naïve scan of recent history would otherwise suggest.
That seems sensible when we consider the relatively similar demonstrated abilities of the Swans and the Dogs, especially in recent weeks, and the fact that both are better known for their defensive than for their attacking qualities, which makes high-scoring (and hence blowouts) less likely.
That said, the model still (narrowly) rates a Coast-to-Coast Blowout as the most probable scenario.
Let’s hope it’s wrong …
(UPDATE: Subsequent to finalising this piece for the Guardian, I came up with another chart that shows the score progressions at the end of each quarter for all the Grand Finals of each type. It appears below.)
The piece was huge fun to write and, to be honest, I'm not sure I'll ever top "Coast-to-Coast But Mostly Close".
Feedback, as always, welcomed.
As far as MoSSBODS is concerned, those probabilities are as per the table at right.
Some of the entries in this table might, at first, seem perplexing - why, for example, do the Dogs start as slight favourites over the Swans? To understand why this is the case we need to recall the Travel Penalty that MoSSBODS imposes on all teams playing a match out of their home State against an opponent in its home State.
That 3 Scoring Shot penalty is relevant for both of this week's games as well as for the possible Geelong v GWS, and Sydney v Western Bulldogs Grand Finals where the two Sydney-based teams would incur the Penalty.
Whether or not that imposition should apply in Finals is an interesting (and ultimately empirical) question, and one I'll be explicitly considering in the off-season as I develop a challenger for the MoSSBODS Rating System.
For now though, we're stuck with it. We can get a sense of how relatively important it is by breaking down the components of MoSSBODS' net ratings for each matchup, which I've done in the table at left.
The top section provides details for each of the teams' Combined Rating, as well as their Venue Performance values for the relevant grounds. The bottom section uses these values, plus the Travel Penalty where applicable, to come up with net ratings for the Preliminary Finals and the four possible Grand Finals.
Geelong is shown as a 3.5 Scoring Shot (SS) favourite against Sydney at the MCG, for example, based on the following calculation:
(Geelong Combined Rating + Geelong Venue Performance at the MCG) LESS (Sydney Combined Rating + Sydney Venue Performance at the MCG) PLUS Sydney Travel Penalty.
In numerical terms, that is: (6.4 + 0.7) - (6.9 - 0.3) +3 = +3.5 SS
Geelong, we can see, would still be favourites, but only by 0.5 SS, if not for the Travel Penalty.
The figures for the other matchups can be calculated in a similar fashion and reveal that the Dogs' putative favouritism over the Swans in the Grand Final comes solely from the Travel Penalty. Without it, the Dogs would start as 2.3 SS or about 8 point underdogs.
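That calculation generalises readily. A small sketch, using the Geelong v Sydney numbers quoted in the text (the function name and signature are mine, not MoSSBODS'):

```python
def net_rating(team_rating, team_vp, opp_rating, opp_vp, opp_travel_penalty=0.0):
    """Net rating in Scoring Shots from the first-named team's perspective:
    (own Rating + Venue Performance) less the opponent's, plus any Travel
    Penalty incurred by the opponent."""
    return (team_rating + team_vp) - (opp_rating + opp_vp) + opp_travel_penalty

# Geelong v Sydney at the MCG: (6.4 + 0.7) - (6.9 - 0.3) + 3 = +3.5 SS
with_travel = net_rating(6.4, 0.7, 6.9, -0.3, opp_travel_penalty=3.0)

# Without the Travel Penalty Geelong would be only +0.5 SS favourites
without_travel = net_rating(6.4, 0.7, 6.9, -0.3)
```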
So, if we apply the game probabilities just described, we obtain the Flag probabilities shown in the table at right.
Compared to the current TAB market, these rate the Cats' chances more highly and the Giants', Swans' and Dogs' less highly. In fact, at the TAB price of $3, the Cats represent significant value if their true Flag chances are 40%.
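The value claim is straightforward arithmetic: a bet has positive expected return when probability times decimal price exceeds 1. The $3 price and 40% assessment are from the text; the function is my illustration.

```python
def expected_return(prob, price):
    """Expected profit per unit staked at decimal odds `price`."""
    return prob * price - 1.0

# Cats at $3.00 with an assessed 40% Flag chance: a +20% edge
cats_edge = expected_return(0.40, 3.00)
```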
Lastly, we can also look at the probabilities for the four potential Grand Final pairings, which we do in the table at left.
It suggests that there's almost a 50% chance that the Cats will meet the Giants in October, making this pairing, at $2.39, the only one currently offering any value on the TAB.
For the numerical details informing this chart we can, instead, inspect the heat map, which appears below.
When we compare these GF assessments with the current TAB market it's clear that MoSSBODS has quite different opinions about the relative strengths of many of the remaining teams. Most saliently, Geelong ($3.25) and Adelaide ($11) represent significant value on that market, Adelaide especially so given that MoSSBODS rates them as almost 13% chances for the Flag. GWS at $3.25 also carry a slightly positive expected return, but the edge is less than our customary minimum threshold of 5%.
If we look at each of the nine remaining possible GF matchups, the simulation offers the data summarised in the table at right.
When first I reviewed this table, I admit to being a bit perplexed about the fact that, in light of the most recent MoSSBODS Team Ratings, the Dogs and the Hawks were both favoured to beat the Swans in a GF.
But then I remembered that MoSSBODS credits a 3 SS advantage to any team playing in its home state against a team from interstate, and I realised why that was the case. One of the things I'm keen to do in the off-season is finesse the Travel Penalty imposed on teams - in Finals and otherwise - but, for now, with some empirical justification, it is as it is. Consequently, the Cats, Dogs and Hawks enjoy an advantage in MoSSBODS' eyes should they face Adelaide, GWS or Sydney in the Granny: the Dogs, for example, are about 56% and the Hawks 54% favourites over the Swans at the MCG.
So, as MoSSBODS sees it, the only pairing currently offering a sufficiently attractive positive expected return on TAB markets for the GF Quinella is Adelaide v Western Bulldogs at $51, though the Geelong v GWS, and Adelaide v GWS matchups also represent positive, if small, expected returns.
This time, I blathered on about the difference between AFL Finals and home and away games.
Once more, I'm indebted to the Deputy Sports Editor, Russell Jackson.
Ta mate.
What effects do the heightened consequences of victory and defeat have on the way that Finals are played, and how do these show up in the scoring and the results?
In today’s piece we’ll be drawing comparisons between Finals and regular home and away season games across the entirety of VFL/AFL history.
Because relatively few Finals are played each season, to make the comparisons more meaningful in a statistical sense we’ll group seasons into eras – six of them, each including between 15 and 25 individual seasons. Using these groupings, as shown in the table, we end up with at least 75 Finals in every era, which should be enough to provide some reliability to our analyses.
Finals tend to pit more evenly-matched teams against one another, so it’s not entirely surprising to find that, across history, victory margins have tended to be larger in home and away games than in Finals from the same era.
In the modern era, which we’ve defined as the period since 2000, the average victory margin in Finals has been about 2.5 points lower than in home and away season games (33.4 v 36.0 points).
What’s also apparent from this chart is the overall reduction in the size of a typical victory in the 2000-2015 period compared to 1980-1999, both in Finals and in home and away season games. A typical victory margin in a home and away game in the 1980-1999 era was 37.5 points, and in a Final 36.8 points.
The reduction in the average victory margin has come with a similar reduction in the prevalence of “blowout” victories, defined here as wins by more than 10 goals.
In the 2000-2015 era only about 19% of all home and away games and 15% of Finals have been blowouts under this definition. That’s down from 19% and 21% respectively in the 1980-1999 era. Earlier eras, in which scoring generally was lower, saw very few victory margins of over 10 goals.
Smaller victory margins and generally fewer blowouts in Finals compared to home and away games have come with less total scoring in Finals, too.
Whilst this has been true in every era, the gap has been largest in the most recent era where Finals have, on average, produced about 2 goals fewer than a typical home and away season game (175 vs 187 points). In the previous two eras, by comparison, the gaps were only about 3 points in 1980-1999 (196.1 vs 199.4) and less than half a point in 1960-1979 (175.3 vs 175.6).
Some of that larger difference for the most recent era can be attributed to the smaller number of scoring shots that have been generated in Finals (48.7 v 50.9), but some of it also comes down to the fact that teams have been generally less accurate in converting those scoring shots into goals in Finals than they have been in home and away games (51.9% v 53.6%).
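As a check, the totals quoted above follow directly from the scoring shot counts and accuracy figures, since a goal is worth 6 points and a behind 1 (the function name is mine; the per-game averages are from the text):

```python
def total_points(scoring_shots, accuracy):
    """Expected total score given a scoring shot count and the proportion of
    shots converted into goals (the remainder scoring as behinds)."""
    return scoring_shots * (accuracy * 6 + (1 - accuracy) * 1)

finals_avg = total_points(48.7, 0.519)      # about 175 points
home_away_avg = total_points(50.9, 0.536)   # about 187 points
```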
This reduction in accuracy could plausibly be the result of the pressure of Finals football affecting teams’ accuracy for even relatively simple set-shots, but it might also stem from an increase in the average difficulty of shots created in Finals compared to those created in home and away games. Put simply, you’re probably less likely to have a shot from 25m out straight in front in a Preliminary Final in September than you are in a Round 1 game in March.
Previous analyses have shown that teams with strong offences facing teams with weak defences will tend to generate better quality scoring opportunities and hence will be more accurate. The fact that we’ll see more games with this sort of mismatch in offensive and defensive strengths during the home and away season than we will during the Finals might be what ramps up the average difficulty of shots in this latter part of the season and drags down accuracy.
One way of attempting to control for the overall greater variability in opposing teams’ abilities in the home and away season is to look only at home and away season games played between teams that subsequently made the Finals.
If we do that we find that there is only a fractional reduction in the size of the gap between accuracy in Finals and accuracy in the home and away season in the modern era. In games pitting finalist v finalist in the home and away seasons of 2000 to 2015, the average accuracy was 53.4%, which is only 0.1% lower than the accuracy for all home and away games during that period.
So, we’re still left with about a 1.6% difference in the accuracy of finalists when they meet in the Finals compared to when they met in the home and away season.
Some of that difference might be explained by differences in where or when the games were played – teams, for example, are much more accurate at Docklands than at the MCG, and slightly more accurate in day games than in afternoon and night games – but pressure remains a plausible explanation for at least some of this difference. It might also be the case that teams play a more defensive style in Finals so that even when the same two teams meet, the average quality of scoring opportunity reduces in Finals.
Before we move on let’s briefly review the team-by-team data on accuracy in home and away games compared to accuracy in Finals for the modern era, bearing in mind that a number of teams have played relatively few Finals during this period, which makes their overall accuracy subject to large sample variation. (The numbers appearing above each bar represent the number of games played by that team of that type.)
We see that every team except Melbourne, North Melbourne and Richmond has exhibited the general pattern of being more accurate in the home and away season than in the Finals, though for most teams the differences are quite small – less than the all-team 1.6% figure.
For five teams, however, the differences are larger:
• Western Bulldogs: 7.0% (47.8% vs 54.8%) – 13 finals
• West Coast: 4.3% (49.5% vs 53.8%) – 20 finals
• Geelong: 3.0% (50.9% vs 53.9%) – 27 finals
• Fremantle: 2.9% (50.4% vs 53.3%) – 15 finals
• St Kilda: 2.3% (52.2% vs 54.5%) – 17 finals
One notable feature of four of these five teams is that their home and away season accuracy is above the all-team average of 53.6%, and their Finals accuracy is below the all-team average of 51.9%. St Kilda is the only exception.
Since, as we’ve noted a few times now, Finals tend to pit teams of more equal ability against one another, it seems plausible that we might see more upsets in Finals than in home and away games (where we’ll define an “upset” as a win by the team assessed, pre-game, as being less likely to win according to a fairly simple team rating system).
That has, indeed, been the case in every era except the most recent one where favourites have lost about equal proportions of home and away games and of Finals (viz 30%).
That proportion has remained fairly constant for home and away games across the last few eras, but has fallen dramatically in Finals from a figure of over 40% in 1960-1979, to about 35% in 1980-1999 before reaching its current 30%.
What’s especially interesting about the modern era, however, is how much more likely it is that there will have been upset results in games involving the subsequent finalists when they met earlier in the season.
In excess of 40% of these games resulted in the less-highly rated team emerging victorious.
Clearly, the motivation of Finals and the style of football that is played in them are enough to materially enhance the prospects of the stronger team, however small its superiority might be.
What this implies, of course, is that home and away season results in games played between finalists might not necessarily be a good guide to Finals.
To explore this issue let’s look just at Qualifying and Elimination Finals and use the following simple rule:
Employing that approach for the 64 Finals across the 2000 to 2015 period would have yielded a paltry 57% record, in some seasons getting as many as three of the four results incorrect. Only once, in 2011, would it have resulted in a four from four performance.
If we look just at the modern era of football from 2000 to 2015, we find that, compared to a typical home and away game, a Final will typically:
Let's start - a little unusually for me - with a few charts. They'll provide some context for the more data-dense tables to follow and, hopefully, tip in a little interest of their own.
The first chart covers the period from 2011 to 2016 and plots the offensive and defensive MoSSBODS ratings of every team as at the end of the home and away season of that year.
Teams labelled in green were finalists, the ones with the slightly larger typeface being the eventual Premiers. The red dot marks the average offensive and defensive ratings of the collective finalists, and the teams labelled in coral were the ones to miss the finals.
Looking firstly at some of the macro-features of this chart we can see that, as we'd expect, finalists tend to cluster nearer the top right of the charts, non-finalists nearer the bottom left. As well, the average rating of finalists appears to be relatively constant across the seasons (though more on this a little later).
Now, a few observations on some micro-features. Firstly, the separation of seven of the eight finalists in 2011 is striking, the straggler being the Essendon team, which finished 2011 in eighth spot with a percentage of exactly 100. Also notable is the positioning of Hawthorn across the seasons - always with a strongly positive offensive rating.
West Coast's high rating in 2014 is also a feature. They finished 9th in that season, just a game out of the 8, ending the season with a couple of games where they racked up large numbers of scoring shots. As well, Port Adelaide look a little unlucky in 2015, they too finishing 9th and just a game out of the 8 after reeling off a succession of high scores at the back end of the year.
The two charts that follow reach back slightly further in time, each spanning a unique six-year period. Please click on them to access larger versions.
I'll leave a detailed review of these charts to the interested reader and just suggest a few aspects you might want to consider as you compare and contrast the plots for individual seasons:
Now, on the topic of the expected evenness of the finals series, another measure of this would be the range of ratings of the participants, which is one of the metrics I've included in the table below.
On that measure, and looking just at the Combined ratings, the 2003 finalists were the most evenly-matched since 2000. In that season Sydney, who finished 4th, ended the home and away season rated +1.2, while the Minor Premiers, Port Adelaide, were rated +4.9. The competition ladder at the end of the home and away season in that year had Collingwood in 2nd separated from Essendon in 8th by just 2 wins.
The range of Combined ratings in 2016 is considerably higher at 6.9 scoring shots (SS), though it is still the 3rd-lowest in the period.
The average quality of the finalists is quite high this year (see the leftmost block of data) and continues a trend that began in 2011 for the finalists to have average Combined ratings of around +4 or higher. Bear in mind, however, that Combined ratings always sum to zero, so while it might be true that this year's finalists are relatively strong compared to, say, the finalists of 2002, it might equally be the case that the current crop of non-finalists is relatively weak.
Lastly, the data on the right of the table provides the rating details of the Premiers and Runners Up in every season. In this data it's interesting to note that only one team (Geelong in 2009) has finished as Premiers having ended the home and away season ranked below 3rd by MoSSBODS amongst the finalists on Combined rating.
Runners Up have come from a much wider range of rankings, though no team ranked 8th on Combined rating amongst the finalists has gone on to finish as Runner Up.
******

A little context. Some of you will know that as well as building statistical models to predict the outcome of AFL contests as a part-time (sic) hobby, I do similar things during the week, applying those same skills to problems faced by my corporate clients. These clients might, for example, want to identify customers more likely to behave in a specific way - say, to respond to a marketing campaign and open a new product - classify customers as belonging to a particular group or "segment", or talk to customers most at risk of ending their relationship with the client's organisation.
There are parallels between the processes used for modelling AFL results and those used for modelling consumer behaviours, and between the uses to which those models are put. Both involve taking historical data about what we know, summarising the key relationships in that data in the form of a model, and then using that model to make predictions about future behaviours or outcomes.
In football we seek to find relationships between teams' historical on-field performances - who it was that the teams played, where and when, and how they fared - and to generalise those relationships in the form of a model that can be applied to the same or similar teams in future games. With consumers, we take information about who they are, what products and services they use, and how they have used or stopped using those products and services in the past, generalise those relationships via a model, then use that model to predict how the same or similar consumers will behave in the future.
For both domains that seems a perfectly reasonable approach to take to prediction. Human beings do much the same thing to make their own predictions, albeit intuitively and usually not as systematically or thoroughly. We observe behaviours and outcomes, extract what we think are relevant and causal features of what we observe, generalise that learning and then apply it - sometimes erroneously - to what we perceive to be similar situations in the future. This is how both prejudice and useful generalisations about people and the world are formed. Even intuition, some suggest, is pattern-recognition rendered subconscious.
Now the statistical models we build with data aren't perfect - as George E P Box noted, "... all models are wrong, but some are useful" - but then neither are these heuristic "models" that we craft for ourselves to guide our own actions and reactions.
Which brings me to my first observation. Simply put, errors made by humans and errors made by statistical models seem to be treated very differently. Conclusions reached by humans using whatever mental model or rule of thumb they employ are afforded much higher status and errors in them, accordingly, forgiven much more quickly and superficially than those reached by a statistical model, regardless of the objective relative empirical efficacy of the two approaches.
This is especially the case if the statistical model is of a type sometimes referred to as a "black box", which are models whose outputs can't be simply expressed in terms of some equation or rule involving the inputs. We humans seem particularly wary of the outputs of such models and impervious to evidence of their objectively superior performance. It's as if we can't appreciate that one cake can taste better than another without knowing all of the ingredients in both.
That's why, I'd suggest, I'll find resolute resistance by some client organisations to using the outputs of a model I've created to select high-probability customers to include in some program, campaign or intervention simply because it's not possible to reverse-engineer why the model identified the customers that it did. Resistance levels will be higher still if the organisation already has an alternative - often unmeasured and untested - rule-of-thumb, historical, always-done-it-that-way-since-someone-decided-it-was-best basis for otherwise selecting customers. There's comfort in the ability to say why a particular customer was included in a program (or excluded from it), which can override any concern about whether greater efficacy might be achieved by choosing a better set of customers using some mysterious, and therefore untrustworthy, statistical model.
It can be devastating to a model's perceived credibility - and a coup for the program 'coroner', who'll almost certainly have a personal and professional stake in the outcome - when a few of the apparently "odd" selections don't behave in the manner predicted by the model. If a customer flagged by a model as being a high defection risk turns out to have recently tattooed the company logo on their bicep, another example of boffin madness is quickly added to corporate folklore.
I find similar skepticism from a smaller audience about my football predictions. No-one thinks that a flesh-and-blood pundit will be capable of unerringly identifying the winners of sporting contests, but the same isn't true for those of us drawing on the expertise of a statistical model. In essence the attitude from the skeptical few seems to be that, "if you have all that data and all that sophisticated modelling, how can your predictions ever be incorrect, sometimes by quite a lot?" The subtext seems to be that if those predictions are capable of being spectacularly wrong on some occasions, then how can they be trusted at all? "Better to stick with my own opinions however I've arrived at them", seems to be the internal monologue, "... at least I can come up with some plausible post-rationalisations for any errors in them".
Yet human opinions formed without a data-based approach are often at least as spectacularly wrong as those from any statistical model, and they either aren't subjected to close post-hoc scrutiny at all, or are given liberal licence to be explained away via a series of "unforeseeable" in-game events.
Some skeptics are of the more extreme view that there are things in life that are inherently unpredictable using a statistical approach - in the sense of "incapable of being predicted" rather than just "difficult to predict". The outcome of football games is seen by some as being one such phenomenon.
Still others misunderstand or are generally discomfited by the notion of probabilistic determinism. I can't say with certainty that, say, Geelong will win by exactly 12 points, but I can make probabilistic statements about particular results and the relative likelihood of them, for example claiming that a win by 10 to 20 points is more likely than one by 50 to 60 points.
With this view often comes a distrust of the idea that such forecasts are best judged in aggregate, on the basis of a number of such probabilistic assessments of similar events, rather than on an assessment-by-assessment basis. It's impossible to determine how biased or well-calibrated a probabilistic forecaster is from a single forecast, or even from just a few. You need a reasonably sized sample of forecasts to make such an assessment. If I told you that I was able to predict coin tosses and then made a single correct call, I'm sure you wouldn't feel that I'd made my case. But if I did it 20 times in a row you'd be compelled to entertain the notion (or to insist on providing your own coin).
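The arithmetic behind that coin-tossing intuition is easy to check. A minimal sketch (assuming a fair coin and independent tosses):

```python
def p_all_correct(n_tosses: int, p_single: float = 0.5) -> float:
    """Probability of calling n_tosses fair coin tosses correctly by pure chance."""
    return p_single ** n_tosses

print(p_all_correct(1))   # 0.5 - a single correct call proves nothing
print(p_all_correct(20))  # about 9.5e-07 - twenty in a row is hard to credit to luck
```

One correct call is entirely consistent with luck; twenty in a row is roughly a one-in-a-million event under the chance hypothesis, which is why a reasonably sized sample of forecasts, not a single one, is needed to assess a forecaster.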
None of this is meant to deride the forecasting abilities of people who make their predictions without building a formal statistical model - in some domains humans can be better forecasters than even the most complex model - but, instead, is an observation that the two approaches to prediction are often not evaluated on the same, objective basis. And, they should be.
As modellers, how might we address this situation then? In broad terms I think we need to more effectively communicate with clients who aren't trained as statistical modellers and reduce the mystery associated with what we do.
With that in mind, here are some suggestions:
1. Better Explain the Modelling Process
In general, we need to better explain the process of model creation and performance measurement to non-technical consumers of our model outputs, and be willing to run well-designed experiments where we compare the performance of our models with those of the internal experts, with a genuine desire to objectively determine which set of predictions are better.
"Black-box" models are inherently harder to explain - or, at least, their outputs are - but I don't think the solution is to abandon them, especially given that they're often more predictively capable than their more transparent ("white box"?) alternatives. Certainly we should be cautious that our black-box models aren't overfitting history, but we should be doing that anyway.
Usually, if I'm going to build a "black-box" model I'll build a simpler one as well, which allows me to compare and contrast their performance, discuss the model-building process using the simpler model, and then have a discussion to gauge the client's comfort level with the harder-to-explain predictions of the "black-box" model.
(Though the paper is probably a bit too complex for the non-technical client, it's apposite to reference this wonderful Breiman piece on Prediction vs Description here.)
2. Respect and Complement (... and compliment too, if you like) Existing Experts
We also need to be sensitive to the feelings of the people whose job it has been to be the company's expert in the area. Sometimes, as I've said, human experts are better than statistical models; but often they're not, and there's something uniquely disquieting about discovering that your expertise, possibly gleaned over many years, can be trumped by a data scientist with a few lines of computer code and a modicum of data.
In some cases, however, the combined opinion of human and algorithm can be better than the opinion of either, taken alone (the example of Centaur Chess provides a fascinating case study of this). An expert in the field can also prevent a naive modeller from acting on a relationship that appears in the data but which is impractical or inappropriate to act on, or which is purely an artifact of organisational policies or practices. As a real life example of this, a customer defection model I built once found that customers attached to a particular branch were very high defection risks. Time for some urgent remedial action with the branch staff then? No - it turned out that these customers had been assigned to that branch because they had loans that weren't being paid.
3. Find Ways of Quantitatively Expressing Uncertainty
It's also important to recognise and communicate the inherent randomness faced by all decision-makers, whether they act with or without the help of a statistical model, acknowledging the fundamental difficulties we humans seem to have with the notion of a probabilistic outcome. Football - and consumer behaviour - has a truly random element, sometimes a large one, but that doesn't mean we're unable to say anything useful about it at all, just that we're only able to make probabilistic statements about it.
We might not be able to say with complete certainty that customer A will do this or team X will win by 10 points, but we might be able to claim that customer A is 80% likely to perform some behaviour, or that we're 90% confident the victory margin will be in the 2 to 25 point range.
Rather than constantly demanding fixed-point predictions - customer A will behave like this and customer B will not, or team X will win by 10 points - we'd do much better to ask for and understand how confident our forecaster is about his or her prediction, expressed, say, in the form of a "confidence interval".
We might then express our 10 point Home team win prediction as follows: "We think the most likely outcome is that the Home team will win by 10 points and we're 60% confident that the final result will lie between a Home team loss by 10 points and a Home team win by 30 points". Similarly, in the consumer example, we might say that we think customer A has a 15% chance of behaving in the way described, while customer B has a 10% chance. So, in fact, the most likely outcome is that neither customer behaves in the predicted way, but if the economics stacks up it makes more sense to talk to customer A than to customer B. Recognise though that there's an 8.5% chance that customer B will perform the behaviour and that customer A will not.
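The figures in that consumer example follow directly from an assumption that the two customers act independently. A quick sketch:

```python
p_a, p_b = 0.15, 0.10  # assessed chances that customers A and B perform the behaviour

# Assuming the two customers act independently:
p_neither = (1 - p_a) * (1 - p_b)   # 0.765 - the single most likely outcome
p_a_only  = p_a * (1 - p_b)         # 0.135
p_b_only  = (1 - p_a) * p_b         # 0.085 - the 8.5% mentioned above
p_both    = p_a * p_b               # 0.015

# The four outcomes are exhaustive, so the probabilities sum to 1
assert abs(p_neither + p_a_only + p_b_only + p_both - 1) < 1e-12
```

So the single most likely outcome is indeed that neither customer responds, but customer A remains the better use of a marketing dollar.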
Range forecasts, confidence intervals and other ways of expressing our uncertainty don't make us sound as sure of ourselves as point-forecasts do, but they convey the reality that the outcome is uncertain.
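If we're willing to assume - purely for illustration - that final margins are roughly normally distributed around the point forecast, then a 60% interval running from a 10-point loss to a 30-point win around a 10-point prediction implies a standard deviation of about 24 points. A sketch using only the standard library:

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Normal CDF via the error function (no external libraries needed)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Illustrative assumption: margin ~ Normal(mean = point forecast, sd ~ 24 points)
mu, sigma = 10, 24
p_in_interval = normal_cdf(30, mu, sigma) - normal_cdf(-10, mu, sigma)
print(round(p_in_interval, 2))  # ~0.6: "60% confident of between a 10-point loss and a 30-point win"
```

The normality assumption is a modelling choice, not something the blog asserts; any margin distribution with the right 20th and 80th percentiles would tell the same story.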
4. Shift the Focus to Outcome rather than Process
Ultimately, the aim of any predictive exercise should be to produce demonstrably better predictions. The efficacy of a model can, to some extent, be demonstrated during the modelling process by the use of a holdout sample, but even that can never be a complete substitute for the in-use assessment of a model.
Ideally, before the modelling process commences - and certainly, before its outputs are used - you should agree with the client on appropriate in-use performance metrics. Sometimes there'll be existing benchmarks for these metrics from similar previous activities, but a more compelling assessment of the model's efficacy can be constructed by supplementing the model's selections with selections based on the extant internal approach, using the organisation's expert opinions or current selection criteria. Over time this comparison might become unnecessary, but initially it can help to quantify and crystallise the benefits of a model-based approach, if there is one, over the status quo.
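One way to run such a comparison is a simple champion-challenger test on a holdout: select one group of customers with the model, another with the incumbent rule, and compare the agreed metric. The sketch below is entirely synthetic - the propensities, scores and selection rules are invented for illustration only:

```python
import random

random.seed(1)

# Hypothetical holdout: each customer has a true (unobservable) response
# propensity; the model's score is noisy but informative, while the legacy
# selection rule is unrelated to propensity.
customers = [{"propensity": random.random()} for _ in range(10_000)]
for c in customers:
    c["model_score"] = c["propensity"] + random.gauss(0, 0.2)
    c["legacy_flag"] = random.random() < 0.5

model_picks = sorted(customers, key=lambda c: -c["model_score"])[:2000]
legacy_picks = [c for c in customers if c["legacy_flag"]][:2000]

def observed_response_rate(picks):
    # Simulate the in-use outcome each picked customer would produce
    return sum(random.random() < c["propensity"] for c in picks) / len(picks)

model_rate = observed_response_rate(model_picks)
legacy_rate = observed_response_rate(legacy_picks)
print(model_rate > legacy_rate)  # the agreed in-use metric decides the question
```

In practice the "response" column comes from the live campaign rather than a simulation, but the comparison logic is the same: an agreed metric, measured on both selection methods.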
******
If we do all of this more often, maybe the double standard for evaluating human versus "machine" predictions will begin to fade. I doubt it'll ever disappear entirely though - I like humans better than statistical models too.
******

My presentation was mostly about the MoSSBODS Team Rating System, with a few visualisations thrown in to provide a little more colour and movement.
For anyone curious to know what I presented, I've made a copy of it available here.
******

One Sunday morning late in July I was contacted by the Deputy Sports Editor at the Guardian, Russell Jackson, asking if I could finish off my regular blog projecting the remainder of the home and away season and the Finals in time for that evening's deadline.
Given that the last game was to finish around 7:30pm and the usual script took 5 hours to run, it required a little finessing (and the kindness of Russell to come in on his day off and work until very late to edit my copy and make all the magic happen), but in the end, we got there.
I've been advised by Russell - as a general principle and not, I'm letting myself believe, purely in relation to this particular piece - to avoid reviewing the comments under the piece. So far, despite significant temptation, I've not done so ....
Anyway, thanks to Russell, MoS now has its first piece in the mainstream press with my byline. Feels weird - but kinda nice.
******
From MoSSBODS' point of view, the Crows are now, in fact, more likely to win the Flag than only the Eagles, Dogs and Roos.
More specifically, MoSSBODS now rates the Crows as about 10% chances for the Flag and has the Cats as about 32% favourites ahead of Sydney (25%), GWS (18%), and Hawthorn (11%). At current TAB prices that still makes the Crows value at $11, but also Geelong at $3.75 and GWS at $6.
In terms of the method of elimination from the Finals, Sydney, Geelong, Hawthorn and GWS are all most likely to go out in a Preliminary Final, though Sydney is also more likely to play in the Grand Final than to miss it.
Adelaide and West Coast are both most likely to bow out in a Semi-Final, and the Western Bulldogs and the Kangaroos to bow out (well bow-wow out in the Dogs' case, I guess) in their Elimination Finals.
The most likely Grand Final matchup, according to MoSSBODS, sees the Cats face the Swans, though this pairing occurred only fractionally more often than a repeat of this weekend's Swans v GWS game.
Only two other pairings occurred in at least 10% of simulation replicates - Geelong v GWS (15%) and Geelong v Hawthorn (10%).
All of the 26 possible pairings appeared in at least 1 of the 100,000 replicates, though the Cats v Roos, Hawks v Roos, Eagles v Roos, and Dogs v Roos pairings all showed up in fewer than 100 of them.
Given the probabilities shown here the only quinellas that represent value at current TAB prices are:
Amongst the pairings that have at least a 5% chance of occurring according to the simulations, the most closely-fought Grand Final would have the Swans facing the Hawks (where Sydney would be 51% favourites), and the least well-matched would see the Cats facing GWS (where the Cats would be 66% favourites).
******

All of the ranks are colour-coded, green if the ranking is 3 or more spots higher than a team's ladder position, red if it's 3 or more spots lower, and orange otherwise.
I'll leave closer inspection to the interested reader but just make a few broad observations:
Across all of the teams, the final rank correlations between ladder position and ranking on these metrics were:
In the race for the Minor Premiership, for example, the TAB has Adelaide priced at $7, which makes MoSSBODS' 18% assessment of the Crows' chances look extremely attractive. But, for the Crows to finish 1st, realistically we'd need:
MoSSBODS reckons that scenario would occur about two times in 11 were the weekend's games to be played many times over. The TAB Bookmaker, alternatively, is banking that it would occur less often than about one time in 15.
Similarly, in the hunt for Top 4 spots, MoSSBODS rates the Dogs as about 12% chances while the TAB Bookmaker has them priced at $26. A Dogs Top 4 finish requires:
As well, the Dogs' victory margin needs to be sufficiently large to drive its percentage above the Hawks'. If we assume that the Hawks were to lose, say 70-80, the Dogs would need to win by about 25 points to achieve this.
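Percentage arithmetic of that kind is easy to mechanise. Below is a sketch in which the season points-for and points-against totals, and the 90 points conceded by the Dogs, are entirely hypothetical - the real 2016 figures would be needed for the actual calculation:

```python
import math

def percentage(points_for: int, points_against: int) -> float:
    """AFL ladder 'percentage': 100 * points for / points against."""
    return 100 * points_for / points_against

def min_winning_margin(team_for, team_against, opp_score, target_pct):
    """Smallest whole-point winning margin that lifts a team's percentage
    strictly above target_pct, given the (hypothetical) score it concedes."""
    needed_for = target_pct / 100 * (team_against + opp_score)
    return max(math.floor(needed_for - team_for - opp_score) + 1, 1)

# Entirely hypothetical pre-final-round season totals
dogs_for, dogs_against = 1850, 1500
hawks_for, hawks_against = 2000, 1600

# Scenario: the Hawks lose their final game 70-80
hawks_pct = percentage(hawks_for + 70, hawks_against + 80)

# Required Dogs margin if they concede a hypothetical 90 points
margin = min_winning_margin(dogs_for, dogs_against, 90, hawks_pct)
print(margin)  # 20 under these assumed totals
```

With the teams' real scoring totals plugged in, the same function reproduces the "about 25 points" figure quoted above.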
MoSSBODS' higher probability for this scenario is due to a combination of:
Those individual probabilities, assuming independence, make the joint probability of the four requisite results 13.5% for MoSSBODS and 2.5% for the TAB. To arrive at a final probability for the Dogs' making the Top 4 requires one last assessment about the margins in the Hawks' and Dogs' games being sufficient to see the Dogs' percentage rise above the Hawks'.
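Under independence the joint probability is just the product of the component probabilities. In the sketch below the four component probabilities are purely illustrative - they are not the actual MoSSBODS or TAB assessments - but they are chosen so the products match the quoted figures:

```python
from math import prod

# Hypothetical component probabilities for the four requisite results
# (illustrative only; chosen so the products match 13.5% and 2.5%)
mossbods_probs = [0.75, 0.60, 0.60, 0.50]
tab_probs      = [0.50, 0.50, 0.40, 0.25]

# Assuming the four results are independent:
print(round(prod(mossbods_probs), 3))  # 0.135 -> 13.5%
print(round(prod(tab_probs), 3))       # 0.025 -> 2.5%
```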
In the end, MoSSBODS' assessment of that event sees the probability reduced to about 12%. Presumably, the TAB's estimate would, instead, be something like 2%, which makes that $26 price look very unattractive.
Turning away from considerations about only the top teams, let's have a quick look at this week's Dinosaur Chart.
We see here that most teams are now strong favourites for one or maybe two ladder finishes, though GWS and Hawthorn are notable for the broader range of plausible ladder finishes accessible by them.
That observation is even clearer in the team heat map, which appears below.
A Sydney-Adelaide 1-2 finish appeared in two-thirds of the simulation replicates this week, and an Adelaide-Sydney finish cropped up in about another 11%, the two separated in frequency by a Sydney-Geelong 1-2 finish, which occurred in almost 12% of replicates.
In all, Sydney appeared in six of the Top 10 pairings, Adelaide and Geelong in five, Hawthorn in three (and always in 2nd), and GWS in one.
Combined, these Top 10 pairings account for about 99.5% of all replicates.
The 10 most common Top 4s account for a smaller proportion of the replicates - about 80% - with the commonest of all, a Sydney/Adelaide/Geelong/Hawthorn finish, representing over one-third of them.
Swapping GWS for Hawthorn in 4th accounts for another 10%, and swapping the Western Bulldogs for Hawthorn another 9%.
GWS finishes as high as 3rd in one of the quartets, and Hawthorn does the same in five of them. The Dogs appear in only one quartet - the one mentioned in the previous paragraph - and this single ordering accounts for 9% of their 11.9% assessed chances.
The remaining uncertainty over most of the spots in the Top 8 means that a relatively large number of final orderings remain possible, the most likely of which, according to the simulation, is Sydney/Adelaide/Geelong/Hawthorn/Western Bulldogs/GWS/West Coast/Kangaroos. That specific ordering appeared in about 1 replicate in 7.
Another, almost equally-common ordering sees the Dogs and Giants switch 5th and 6th, this version appearing about 13% of the time.
Lastly, if we focus solely on positions 5th to 8th, we find that the Top 10 orderings account for just over 90% of replicates. The most common finish represents almost one-quarter of replicates and is Dogs/Giants/Eagles/Roos.
It's interesting to note the high level of variability about 5th spot on the ladder that is demonstrated by these projections, with six different teams appearing in 5th in at least one of the Top 10 orderings. Only Sydney and the Kangaroos from the almost-certain final Top 8 fail to appear in 5th in any of them.
The simulation results for the Finals also highlight MoSSBODS' different assessments of team abilities relative to the TAB.
Adelaide is now almost a 50% chance for the Flag, according to MoSSBODS, though their price on the TAB is currently $3.50, making them seemingly very attractively priced in this market.
The TAB, instead, has Sydney as Flag favourites, priced at $3.25, just a little shorter than the Crows. Geelong and Hawthorn are both on the third line of betting at the TAB and priced at $5.50, though MoSSBODS has the Cats as 17% chances and the Hawks as only 7% - and only slightly more likely to win the Flag than GWS.
Adelaide's most likely finish according to MoSSBODS is a Grand Final win, Sydney's, Geelong's and Hawthorn's a loss in a Preliminary Final, GWS' a loss in a Semi-Final, and the Western Bulldogs', West Coast's and the Roos' a loss in an Elimination Final, although a Dogs loss in a Semi-Final is about as likely.
The Crows, as shown in the table below, which is drawn from the simulations, are about equally likely to win the Flag should they finish in any of the Top 4 spots, though slightly less so if they finish in 3rd or 4th. They would see their prospects halved in the unlikely event that they finished in 5th, an outcome that appeared in only 3% of replicates.
Such an outcome would require that:
Even using current TAB head-to-head prices, that scenario has only about an 11% chance of occurring.
While the Crows' chances roughly halve if they miss the Top 4, every other team sees their Flag chances roughly quarter should they do the same.
Finally, let's review the results for Grand Final quinellas.
An all-avian Crows v Swans game is currently assessed as being most likely, that pairing showing up in almost 30% of replicates.
Next-most common is a Crows v Cats matchup, which appears in a bit over 1 replicate in 5.
All up, 28 different pairings appeared at least once in the replicates, only five of which showed up often enough to make their TAB price look like value:
Here too, MoSSBODS' high rating of the Crows and the Dogs relative to the TAB is very apparent.
Also, as far as MoSSBODS is concerned, a Hawks v Swans GF offers the nearest to even-money matchup, such a contest projected as having the Hawks as 51% favourites. In contrast, the most lopsided of the somewhat likely GFs would see Adelaide face West Coast, a pairing that would see the Crows start as almost 70% favourites.
******

Four teams have won one or more games more than would be expected:
Six teams have won one or more games fewer than would be expected:
The Win Production Function would have the teams ordered as follows (with their competition ladder position shown in brackets):
In today's blog we'll take the results of the latest simulation, run after Round 21, and explore that conditionality for all the teams in serious contention for a Finals spot. As you read the blog it might be helpful to have to hand a copy of the current competition ladder and of the draw for the remaining two rounds.
We'll explore the conditionality using heat maps like the one below for Adelaide, which depicts the profile of finishes for each of the other teams should Adelaide be assumed to finish 1st (the block in the upper left), 2nd (the next block), and so on, for each of the feasible Adelaide finishes.
The darker red a cell is, the more often that combination of ladder finishes for Adelaide and the relevant team appeared across the 100,000 replicates of the simulation.
So, for example, looking at that first block we can see that, if Adelaide finish 1st, Hawthorn is most likely to finish 3rd and Geelong or Sydney to grab 2nd. Conversely, looking at the second block we see that, if Adelaide finish 2nd, Sydney is most likely to take the Minor Premiership, though Hawthorn finish 1st almost as often. Geelong are most likely to finish 3rd conditional on Adelaide finishing 2nd.
Adelaide taking 3rd makes the Hawks good things for the Minor Premiership and Sydney most likely to take 2nd. This is because, with Adelaide in 3rd, Sydney can afford to lose as many games as the Crows and still finish above them, the Swans currently enjoying a superior percentage.
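Conditional frequencies of this kind can be tallied directly from the simulation output. A minimal sketch, with a toy handful of replicates standing in for the real 100,000 (each replicate is a final ladder ordering, best team first, and the team codes are illustrative):

```python
from collections import Counter

def conditional_finish_counts(replicates, focus_team, focus_position):
    """Tally how often each other team occupies each ladder position,
    restricted to replicates where focus_team finishes in focus_position."""
    counts, n = Counter(), 0
    for ladder in replicates:  # each ladder is an ordering, best team first
        if ladder.index(focus_team) + 1 != focus_position:
            continue
        n += 1
        for pos, team in enumerate(ladder, start=1):
            if team != focus_team:
                counts[(team, pos)] += 1
    return counts, n

# Toy replicates in place of the 100,000 simulation replicates
replicates = [
    ("ADE", "GEE", "HAW", "SYD"),
    ("ADE", "SYD", "HAW", "GEE"),
    ("SYD", "ADE", "GEE", "HAW"),
]
counts, n = conditional_finish_counts(replicates, "ADE", 1)
print(counts[("HAW", 3)] / n)  # 1.0: Hawthorn 3rd in every Adelaide-first replicate
```

Each block of the heat map is just one such table of conditional counts, shaded by frequency.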
Next, we look at Geelong.
If they finish second, which is about the best they can hope for, Adelaide is most likely to finish as Minor Premiers, Hawthorn to finish 3rd and Sydney 4th. If, instead, Geelong finishes 3rd, the Swans are most likely to finish as Minor Premiers, the Crows as Runners-Up and Hawthorn 4th.
GWS' best finish is probably 4th, a result that would most likely see the Cats finish 5th, the Dogs 6th, West Coast 7th, and the Roos 8th.
If, instead, GWS take 5th, Geelong almost certainly finish Top 4, and places 6th to 8th are as above.
A 6th-place finish for GWS allows, most often, the Dogs to grab 5th and Geelong, again to lock in a Top 4 spot.
Next, Hawthorn, a 1st-place finish for which would leave Sydney, Geelong and Adelaide to mostly fight for 2nd to 4th, GWS and the Dogs to argue over 5th and 6th, West Coast to snatch 7th and the Roos 8th.
The Hawks rarely finish 2nd in the simulation because of their significantly inferior percentage compared to the teams immediately below them, which means that a loss is most likely to slip them into 3rd or 4th.
A 3rd-place finish sees Adelaide and Sydney fighting for the Minor Premiership and 4th, and the Cats grabbing 2nd. If, instead, the Hawks wind up 4th then the most likely scenario is Sydney in 1st, Adelaide in 2nd, and Geelong in 3rd.
The Roos almost certainly finish 8th, as their heat map below indicates.
Melbourne have a slight chance (about 5%) of grabbing 8th spot. If they achieve this it will almost certainly be at the Kangaroos' expense.
Sydney, should they finish as Minor Premiers, will most likely see Adelaide in 2nd, Geelong in 3rd, and Hawthorn in 4th.
If, instead, they end up 2nd, a Hawks Minor Premiership is most likely, along with a 3rd-place finish for the Crows and a 4th-place finish for the Cats. A 3rd-place finish for the Swans sees these three other teams finishing in that same order, with Adelaide taking 2nd.
West Coast almost certainly finish in 7th spot, as indicated by their heat map. Whilst they are level with GWS and the Western Bulldogs on points, their percentage is 8 points less than GWS' and almost 14 points more than the Dogs'.
The Dogs mostly finish 5th or 6th, GWS assuming whichever of those two positions the Dogs leave vacant.
Hawthorn are now assessed by MoSSBODS as favourites for the Minor Premiership, Sydney as 2nd-favourites and Adelaide as 3rd-favourites. The TAB has the same ordering.
The competition's ultimate Top 4 is now all but determined, GWS (20%) the only team from outside the current Top 4 with any realistic hope of breaking into it, though the Dogs do also have an outside chance (5%).
Likewise, the Finalists are almost locked-in. The teams currently occupying the Top 7 places on the ladder are all assured of playing Finals football, while the final spot will be the Roos' 95% of the time and the Dees' the other 5%.
For the Spoon, it's now 85% the Dons' and 15% the Lions'.
One of the curious aspects of the summary table above is that Hawthorn, despite being the team with the highest value for expected number of wins (17.3), are ranked only third in terms of expected ladder finish, behind both Sydney and Adelaide.
That's because of the still-unusual profile of its projected ladder finishes, which sees it most likely to finish 1st, but more likely to finish 3rd or 4th than 2nd.
In fact, Hawthorn has a spread of six ladder positions which it has at least a 1-in-30 chance of occupying at the end of the home-and-away season. Only one other team, Geelong, can say that about as many as five ladder positions, and no other team about more than four.
Comparing the simulation results to the prices currently on offer at the TAB reveals that no team currently represents value in the Minor Premiership, Top 4 or Final 8 markets.
The latest simulation replicates for the Top 2 ladder positions show an interesting lack of symmetry, reflecting the relative unlikelihood of a 2nd-place finish for Hawthorn.
While a Hawthorn/Sydney 1-2 finish is about a 25% possibility, a Sydney/Hawthorn finish is only a 2% possibility. Similarly, though a Hawthorn/Adelaide finish is assessed as having about a 17% probability, an Adelaide/Hawthorn finish is rated only about one-fifth as likely.
Ignoring order then, a Sydney/Adelaide 1-2 finish is the most likely, that pairing occurring in one order or the other in about 30% of replicates.
We see something similar when we expand our view to consider the Top 4. In none of the 10 most-likely orderings does Hawthorn finish 2nd. It does, however, finish 1st in five of the orderings, 3rd in three, and 4th in the remaining two.
The most-common ordering, which appeared in about 1 replicate in 6, was Hawthorn/Sydney/Adelaide/Geelong and was one of only two orderings with an estimated probability of more than 10%.
The same four teams appear in all but one of the Top 10 orderings, the exception being the 9th ordering on the list in which GWS sneaks into 4th, ousting Geelong. That ordering has only a 3% probability of occurring however.
Next we look at the bottom half of the Top 8 where we find that two orderings account for over 60% of replicates. In the first and slightly more common we have the ordering Dogs/Giants/Eagles/Roos and in the second the Dogs and Giants swap places.
Some of the less-common orderings see the Hawks, Crows or Cats slip into 5th (or even 6th, in one case, for the Hawks), GWS fall as low as 7th, and the Dees sneak into the bottom of the 8. The Roos, though, never climb higher than 8th if they make it at all.
We can also look at the most likely orderings for the entirety of the Top 8, the most likely of which now carries about a 7.5% probability and sees the Western Bulldogs climbing into 5th from their current ladder position of 7th, pushing the Giants and Eagles down one place each. All of the other teams from the current Top 8 - those in positions 1st to 4th, and 8th - remain as they are on the current competition ladder.
Interestingly, in none of the 10 most-likely orderings do the Dogs wind up occupying the spot they currently hold on the ladder (viz 7th).
On then to considerations of September (and October) football where we find that the simulations now rate Adelaide as 45% chances for the Flag (up 6%), Sydney as 18% chances (up 2%), Geelong as 17% (down 1%), Hawthorn as 12% chances (up 1%), GWS as 4% chances (down 8%), and the Western Bulldogs as 3% chances (no change).
At $4.50, that makes Adelaide the only attractively-priced team on the TAB Flag market. Very clearly, MoSSBODS continues to rate the Crows much more highly than does the TAB.
Whilst Adelaide are seen as being most likely to take the Flag, Sydney, Hawthorn and Geelong are assessed as being most likely to bow out in Preliminary Finals, GWS and the Western Bulldogs in Semi-Finals, and West Coast and the Kangaroos (and St Kilda and Melbourne, should they make it) in Elimination Finals.
Also, Adelaide are estimated as 70% chances of making the Grand Final, Sydney as 46%, Geelong as 34%, Hawthorn as 28%, GWS as 12%, the Western Bulldogs as 7%, West Coast and the Kangaroos as 1% each.
We can see how each team is projected to fare conditioned on its home-and-away season finish in the chart below.
That same data can be represented in tabular form and including estimated probabilities for each team conditioned on its ladder finish. This view reveals that, for any given final ladder position, the simulations have Adelaide as the team with the highest probability of winning the Flag.
The Crows aside however, it's interesting to note that Geelong generally fares better than Sydney when it finishes in the Top 4, but worse when it misses the Top 4. The Cats' overall probability for the Flag is lower than the Swans' because the Cats are assessed as being more likely to miss the Top 4.
Finally, if we analyse the simulation to see who meets in the Grand Final we find that an Adelaide v Sydney Grand Final is considered most likely, that pairing arising in almost 28% of replicates. The two next-most common quinellas also involve the Crows.
Across the 100,000 simulations, 33 different pairings appeared at least once, though only 12 appeared in more than 1 replicate in 100.
The most unlikely pairing that appeared at least once saw the Western Bulldogs facing Melbourne in the Grand Final. That pairing appeared only twice.
Amongst the most-likely Grand Final pairings, a Sydney v Western Bulldogs game is assessed as being the most competitive with Sydney starting as just 52.5% favourites, and an Adelaide v GWS matchup as the least competitive, with Adelaide overwhelming 72% favourites in such a contest.
Given the probabilities shown here, the only pairings offering value at the TAB are:
It's clear that MoSSBODS is seeing a lot in the Crows' performances that isn't currently being reflected in prices or popular opinions. Just how perceptive or deluded that opinion proves to be will be a fascinating story for me across the remainder of the season.
And, for a change, we'll do it as a table.
A few features of that table stand out immediately:
In those simulations, looking firstly at the home-and-away projections, we find that the Hawks' loss to Melbourne in Round 20 very much opened the door to the pursuing pack.
While the simulations still have the Hawks as about 30% chances for the Minor Premiership, that percentage is less than one-half of what it was before the weekend. The Hawks' loss was mostly the Crows' and Swans' gain, although the Cats also saw their Minor Premiership chances increase from about 10% to 16%.
The Hawks' Top 4 chances were also dented, leaving them as, curiously, only 73% chances for such a finish, which is less than the chances for Adelaide, Geelong and Sydney, and only slightly above that for GWS. That's because, as we can see from the Dinosaur Chart, the Hawks' ladder profile is curiously shaped - they're most likely to finish 1st, but, if they don't, most likely to finish 3rd. They're also more likely to finish 6th than 2nd.
Other team profiles are generally much more standard, though Sydney's is a little atypical as well, with each of positions 1st through 4th about equally likely.
The heat-map of that data is as follows.
The battle for places in the Top 5 is rendered very starkly in this chart. Outside the Top 5, teams' fates are much clearer and concentrated in two or three most-likely finishes.
According to the simulations (and in line with what many sports journalists have been saying for a little while now), our 2016 Finalists have all but been determined. There is, however, possibly a little sparring to do over 8th place, with the Roos, Saints and Port all having non-trivial chances for such a finish, but with the Roos having by far the better of those prospects.
The Spoon's final resting cabinet remains less clear, though Essendon are now about 2/1 on favourites to be adding it to their collection. The Lions soak up most of the remaining potential though there are universes, however rare, where the Dockers claim last.
(By the way, given current TAB prices and the probability estimates for each team's Minor Premiership, Top 4 and Top 8 chances shown above, the only value bets are Adelaide and Hawthorn for the Minor Premiership.)
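(For what it's worth, the value test underlying that parenthetical is simple: a bet offers value when the estimated probability multiplied by the decimal price exceeds 1. The probabilities and prices below are made-up figures for illustration only, not the actual simulation outputs or TAB prices.)

```python
def is_value_bet(prob, decimal_price):
    """A bet has positive expected value when the estimated win
    probability times the decimal price exceeds 1."""
    return prob * decimal_price > 1

# Hypothetical figures, for illustration only
print(is_value_bet(0.30, 4.00))  # 0.30 * 4.00 = 1.20 > 1, so True
print(is_value_bet(0.20, 4.50))  # 0.20 * 4.50 = 0.90 < 1, so False
```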
With so much uncertainty remaining around the exact ordering of the top teams, it's to be expected that this will be apparent when we look at some of the most-likely orderings for teams.
For example, even the most-likely Top 2 turned up in less than 10% of simulation replicates; it had Adelaide finishing 1st and Geelong 2nd.
Eight other pairings appeared in at least half as many replicates, and two others (Hawthorn/Adelaide and Hawthorn/Sydney) appeared almost as often as that most-likely pairing.
Across the 10 most common pairings, Adelaide appears in six, Geelong in five, Hawthorn and Sydney in four, and GWS in one.
Expanding our view to encompass the Top 4 spots we find even more uncertainty. The most common Top 4, Adelaide/Geelong/Hawthorn/Sydney, occurs in less than 3% of simulation replicates, though it is more than 25% more likely than the next most common quartet of Hawthorn/Sydney/Adelaide/Geelong.
The same four teams appear in all 10 of the most common Top 4s, just in different orders, with Hawthorn finishing 1st in half of the 10, Sydney in three, and Adelaide in the remaining two. Geelong finishes no higher than 2nd, and GWS no higher than 3rd.
There's more certainty about positions 5th to 8th, though even here the most likely ordering of GWS/Western Bulldogs/West Coast/Kangaroos appears in less than 1 replicate in 7.
The West Coast and Kangaroos, usually in that order, share 7th and 8th between them in all but one of the quartets shown here. In the one where they don't, the Eagles sneak into 6th, pushing the Dogs into 7th and leaving the Roos in 8th.
Combined, the 10 orderings shown here account for about two-thirds of all simulation replicates.
It's still very difficult to talk meaningfully about a most-likely Top 8, but there are now some orderings that have probabilities above 1%. The 10 most common of those orderings appear below.
Whilst we can see some signs of MoSSBODS' high opinion of the Crows in the tables and charts we've just looked at, that becomes much clearer when we look next at MoSSBODS' assessment of teams' Finals chances.
For example, the heatmap below reveals that MoSSBODS has the Crows as clear Flag favourites, assessing them as almost 40% chances for that honour. It also shows that, should the Crows make the Grand Final, they will be, weighting across all potential opponents, about 62% favourites to win it (ie 38.8/(38.8 + 23.6)).
Geelong are assessed as being next most likely to win the Flag. MoSSBODS has them as about 18% chances, and has their weighted Grand Final probability, should they make it, as about 54% against all potential opponents.
In terms of most likely finishes, Adelaide's is to win the Grand Final, Geelong's, Sydney's, Hawthorn's and GWS's is to lose a Preliminary Final, the Western Bulldogs' is to lose a Semi Final, while West Coast's, the Kangaroos', St Kilda's, Port Adelaide's and Melbourne's is to lose an Elimination Final.
We can break down each team's ultimate Finals fate in terms of where they finished at the end of the matching simulated home-and-away season and display it as per the chart below.
As we'd expect based on history, it's rare for a team to win the Flag from outside the Top 4.
We can also analyse this Finals data on the basis of how likely it is that a team wins the Flag given that they finished the home-and-away season in a specific ladder position. The data for that, for those teams most likely to finish in the Top 8, appears in the table below.
We see that, as we'd expect, teams are most likely to win the Flag if they finish in the Top 2, and more likely to win if they finish in the Top 4 rather than in positions 5 through 8. Also, Adelaide is more likely than any other team to win the Flag from any given home-and-away ladder finish.
Adelaide's estimated Flag chances of 38.6% imply that the TAB's currently offered price of $4.50 - the same as is being offered for Sydney, Hawthorn and Geelong, as it happens - represents considerable value. The same cannot be said for any other team's current price for the Flag.
Lastly, let's take a look at the simulation's views on Grand Final pairings and victory probabilities for the teams in those pairings.
According to the simulations, an Adelaide v Sydney Grand Final is the most likely, that pairing occurring in almost 18% of all replicates. When those Grand Finals were, in turn, simulated, Adelaide won just over 64% of them.
The next most likely pairing was an Adelaide v Geelong matchup, which turned up a little less than 16% of the time and had the Crows winning about 54% of the time.
Altogether, 37 different pairings occurred in at least one replicate, all of which are shown in the table at right. Just 14 of them, however, appeared 1% or more of the time, and together those 14 account for almost 97% of all replicates.
Across those 14 pairings, MoSSBODS has it that a Sydney/Western Bulldogs matchup would provide the closest contest, though an Adelaide/Geelong, Hawthorn/Sydney, or Western Bulldogs/GWS game would also see only narrow favourites.
The least attractive pairing would be Adelaide/GWS, in which MoSSBODS would rate Adelaide as 68% chances to win.
Given the probability estimates here and given current TAB prices, the only pairings offering value are:
They should, based on that analysis, be sixth, the full ordering being (with current Competition Ladder positions shown in brackets after each team name):
Hawthorn aside then, no team differs in its ranking using the Win Production Function and its ranking on the Competition Ladder by more than two places.
Hawthorn, it also turns out, is no more highly ranked than 4th on any of the Scoring Shot metrics tracked on the Team Dashboard.
The current Top 3s for each of those metrics are:
And, finally, the Top 3 teams in each quarter are:
Those are the latest opinions of the MoS Simulations, which this week comprise 100,000 replicates of the remaining home-and-away season and the Finals series ahead.
In Dinosaur Chart form, the latest projections of the home-and-away season show an increasing narrowing of each team's range of final ladder finishes, no team now showing a seriously feasible range extending beyond six places.
The fates of Adelaide, Geelong, GWS and Sydney remain the most uncertain, while those of the Brisbane Lions, Essendon and Fremantle seem surest.
Viewed in heatmap form, the teams' range of final home-and-away ladder finishes show that the Hawks now have a firm hold on the Minor Premiership; Geelong, GWS, Adelaide and Sydney are locked in a battle for 2nd through 5th; while the Western Bulldogs are most likely to finish 6th, West Coast 7th, and the Kangaroos 8th.
Given current TAB prices, these results make Adelaide seem value for the Minor Premiership at $15 and for the Top 4 at $1.40, and make the Western Bulldogs look attractive for the Top 4 at $26.
Outside the Top 8 we have Port Adelaide and St Kilda most likely to tussle for 9th and 10th, Collingwood most likely to finish 11th, Melbourne 12th, Carlton 13th, Richmond 14th, Gold Coast 15th, Fremantle 16th, the Brisbane Lions 17th, and Essendon most likely to snag the Spoon.
After this week's results, the simulations now have the most likely 1-2 pairing at the end of the home-and-away season as Hawthorn and GWS, that ordering appearing in more than one-fifth of the replicates.
Next most common was a Hawthorn-Geelong pairing, which cropped up 18% of the time, the reverse ordering showing its face about another 4% of the time.
In total, Hawthorn appears in six of the 10 most-likely Top 2 pairings, Geelong and Adelaide in 5, GWS in 3, and Sydney in 1.
Turning next to the commonest Top 4s, we find that Hawthorn appears 1st in all 10, GWS 2nd in four, 3rd in two, and 4th in two more. Adelaide finishes 2nd in one, 3rd in three, and 4th in three more, while Sydney finishes 2nd in one, 3rd in another, and 4th in three more.
None of the ten most-common quartets, however, crops up in more than one replicate in 20.
Focussing next on ladder positions 5th to 8th, we find that the Western Bulldogs dominate, finishing 5th in one ordering, 7th in another, and 6th in the other eight.
The Roos are also well-represented here, finishing 7th in four of the orderings and 8th in the other six.
West Coast also appears in all 10 orderings, once in 6th, five times in 7th, and four times in 8th.
The only other teams to appear in the list are Sydney, who finish 5th twice and 6th once, Adelaide, who finish 5th three times, GWS, who finish 5th twice, and Geelong, who also finish 5th twice.
None of the quartets shown here appears in more than about 9% of the replicates.
Finally, in terms of the home-and-away season, let's look at the possible orderings for all eight of the Finalists.
To start with, it's important to note that none of the orderings here occurred in more than about 1.5% of replicates but, that said, the six most common all had a Hawks-GWS 1-2 finish.
As well, the Western Bulldogs finish 6th in all of the orderings, while West Coast and the Kangaroos share 7th and 8th between them.
Adelaide remain favourites for the Flag according to these latest simulations, though their probability fell from about 31% to 29% this week. The Hawks' probability also fell (from 22% to 18%) as did the Cats' (from 24% to 20%), whilst Sydney's rose from 7% to 12% and GWS' rose from 11% to 16%.
The Roos and Eagles remain the teams most likely to bow out in the Elimination Finals, the Dogs in a Semi-Final, and GWS, Geelong and Hawthorn in a Preliminary Final. Adelaide, meantime, are about as likely to lose in a Preliminary Final as they are to lose the Grand Final.
Three teams, should they make it, are assessed as being more likely to start the Grand Final as favourites (averaged across all of the opponents they might meet there): Adelaide, Geelong and Hawthorn.
The following chart depicts how likely it is that the potential Finalists might bow out at a particular stage and how those exits relate to their final home-and-away ladder finishes from the relevant replicates.
Adelaide's information is particularly interesting as we see them bowing out in a Semi-Final or a Preliminary Final far more often when they finish outside the Top 4 than when they finish inside it. This is true also of Geelong, GWS and Sydney, though less so of Hawthorn.
Finally, when we look at potential GF pairings we now find that:
Altogether, about 40 different GF pairings appeared in at least one of the simulation replicates.
Clearly then, Scoring Shot production and concession matters more than Scoring Shot Conversion, and later Quarters matter slightly more than earlier ones.
Lastly, let's look at how the number of games actually won by each team compares to what we'd expect they might have won given their scoring statistics interpreted through the Win Production Function:
Teams that have won more games than we'd expect:
Teams that have won fewer games than we'd expect:
At least that's the view from the latest MoSSBODS simulations, which use the methodology explained in detail in this post. Simply put, that methodology combines the Offensive and Defensive ratings of all 18 teams, along with an assessment of how well each team plays at every ground, to simulate the remainder of the home-and-away season and the finals. For this week's blog, we'll perform this simulation 10,000 times and summarise the results here.
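As a rough sketch of what that replicate-and-tally approach looks like in code (the team names, ratings and fixture below are invented for illustration, and the real MoSSBODS simulations also include venue adjustments, percentage-based tie-breaking and much else besides), consider:

```python
import random
from collections import Counter

random.seed(1)

# Invented (offensive, defensive) ratings and a tiny invented fixture
ratings = {"A": (5, 4), "B": (2, 1), "C": (-1, 0), "D": (-4, -3)}
fixture = [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D"), ("A", "D"), ("B", "C")]
MARGIN_SD = 36  # roughly the historical SD of margins about expectation

def simulate_ladder(n_replicates=10_000):
    """Estimate each team's probability of topping this mini-ladder."""
    top_spot = Counter()
    for _ in range(n_replicates):
        wins = Counter()
        for home, away in fixture:
            # Expected margin from the difference in combined ratings
            exp_margin = sum(ratings[home]) - sum(ratings[away])
            margin = random.gauss(exp_margin, MARGIN_SD)
            wins[home if margin > 0 else away] += 1
        # Rank by wins alone (ignoring percentage for simplicity)
        leader = max(wins, key=lambda t: wins[t])
        top_spot[leader] += 1
    return {t: top_spot[t] / n_replicates for t in ratings}

print(simulate_ladder())
```

Each replicate simulates every remaining game by drawing a margin around its expected value and tallies where each team finishes; the proportions across replicates become the probability estimates quoted throughout these posts.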
GWS and Geelong did a great deal for their own Top 4 hopes this weekend, increasing them to 83% in the case of the Cats and to 67% in the case of the Giants. These gains came mostly at the expense of Sydney and the Western Bulldogs, whose Top 4 chances have now fallen to 52% and 15%, respectively.
The complete profile of every team's possible ladder finishes at the end of the home-and-away season appears in the chart below, known affectionately here as the Dinosaur Chart, for obvious reasons.
As the season progresses, teams' menus of possible finishes tend to shrink, and we find now that each team has, at most, four ladder spots that span the range of its opportunities. For some teams, such as the Brisbane Lions, Essendon and Fremantle, the range is even narrower.
If you prefer your data numeric, here's the same chart as a heat-map, which shows even more clearly how most teams are now homing in on a handful of ladder finishes.
A Hawthorn-Geelong 1-2 finish at the end of the Home-and-Away season is now the most likely, that combination cropping up in almost one-quarter of the simulation replicates.
Next most likely is a Hawthorn-Adelaide finish, which has an estimated probability of 20%.
Combined, the 10 most-likely 1-2 finishes now account for over 80% of all replicates, with Hawthorn appearing in 6 of them, Adelaide and Geelong in 5, GWS in 3, and Sydney in 1.
Moving next to Top 4s, we find that Hawthorn appears in 1st place in all of them, and Geelong appears in 2nd place in half of them.
Adelaide comes 2nd in three of them, and GWS in the remaining two.
Sydney appears no higher than 3rd, and does so in only one of the Top 10 combinations, finishing 4th in three more.
No combination, however, appears in more than about 6% of replicates, so there are a lot of feasible possibilities not listed here.
The Kangaroos finish in 8th place in all but one of the Top 10 combinations for positions 5th through 8th, and snag 7th in one other.
West Coast finishes in 6th in three of the combinations, 7th on six more, and in 8th in another.
Sydney finishes in 5th in three of the entries, and in 6th in one more.
Note also that no combination shown here appears in more than 7.5% of replicates so, again, there are many feasible combinations not shown here.
Lastly, let's take a look at what last week's results have done to the Flag prospects of the teams in contention for that honour.
Adelaide remain favourites for the Flag, winning it about 31% of the time (and losing it another 26%), ahead of Geelong who win it 24% of the time (and lose it 18% of the time) and Hawthorn who win it 22% of the time (and lose it 19%).
Hawthorn, Geelong and GWS are also assessed as being most likely to go out in a Preliminary Final, Sydney and the Western Bulldogs in a Semi-Final, and West Coast and the Kangaroos in an Elimination Final.
A breakdown of these results in terms of where a team finished at the end of the home-and-away season in the relevant simulation replicate, appears in the chart below.
Looking more closely at the simulation replicates, we find that the most likely Grand Final sees Adelaide meet Geelong (18%), or Adelaide meet Hawthorn (17%).
Other common pairings are GWS v Adelaide (11%) and Hawthorn v Geelong (10%).
That's the view from the latest simulations based on MoSSBODS' Ratings using the methodology explained in this post from last week (and the post it links to.)
From a "who wins the Minor Premiership" or "who finishes Top 4" point of view, however, the news was all bad for the Swans and all good for the Hawks. In the Top 4 contest there was also better news for GWS and Geelong.
What's slightly depressing - though also a little encouraging - about all this is that the current estimates of teams' chances are broadly consistent with the TAB bookmaker's, the only exception being that the Dogs look like a little bit of value at $2 for a Top 4 finish.
Elsewhere, in races where the TAB is no longer fielding markets, a little more clarity was achieved in the Spoon race, where the Lions gain was mostly the Dons' welcome loss. The simulations now rate the Lions as about 2/1 on chances.
This week you'll see I've also added a new column in the table above, which shows the change in the number of Expected Wins for each of the teams after the most-recent results. The big winners there were the Hawks, Power and Saints, who all gained about 0.8 wins, while the big losers were the Swans, Roos and Dees, who all lost about the same amount.
Time then, surely for a Dinosaur Chart.
And, as we might hope, it's showing a pleasing concentration of scales for most teams as the season progresses (and yes, I know, we now understand that many dinosaurs had feathers rather than scales ...). Only a handful of teams now have genuine claims on more than two or three ladder finishes, Geelong, Sydney and the Western Bulldogs the most notable among them.
If you prefer your data numeric, here's the same chart as a heat-map, which shows even more clearly how most teams are now homing in on a single ladder finish.
A Hawthorn-Adelaide 1-2 finish at the end of the Home-and-Away season now seems most likely, that combination cropping up in about one-third of the simulation replicates.
Perhaps a little surprisingly, a Hawthorn-Geelong finish appears as the third-most likely option, though a Geelong-Hawthorn finish is not amongst the Top 10.
Similarly, a Hawthorn-Sydney finish rears its head in about one replicate in 12 while a Sydney-Hawthorn finish is not on the list at all.
Combined, the 10 most-likely 1-2 finishes now account for about 75% of all replicates, with Hawthorn appearing in 7 of them, Adelaide in 5, the Western Bulldogs in 3, Geelong and Sydney in 2, and GWS in 1.
Moving next to Top 4s, we find that Hawthorn appears in 1st place in all but two of them, Adelaide filling that role in the remainder and finishing in 2nd in five more of the Top 10 quartets.
Hawthorn also comes 2nd in one combination and 3rd in another, whilst Adelaide bags 3rd on three occasions.
The Western Bulldogs sneak into 2nd once, and finish 3rd and 4th on two occasions. Geelong snatches 2nd twice, 3rd twice and 4th four times, while Sydney finishes 2nd once, 3rd twice, and 4th four times.
No combination, however, appears in more than about one replicate in 40.
Positions 5th through 8th are not a great deal clearer, though GWS finishes 5th in three of the 10 most-common quartets, as does Sydney. Geelong and the Western Bulldogs also finish 5th twice each.
Sixth place is shared amongst those same four teams, GWS and the Western Bulldogs grabbing it in three quartets each, and Geelong and Sydney in two each.
West Coast finishes 7th in all of the 10 most-common quartets, as do the Kangaroos in 8th.
No combination, however, appears in more than about one replicate in 30.
Finally, in terms of the Home-and-Away season simulations, the most common Top 8s are those as shown in the table at left, though it would be unwise to place much faith in any of them as even the most frequent of them appears in less than 1 replicate in 200.
One thing that is notable about this list is that the Eagles and Roos finish 7th and 8th in every entry.
Lastly, let's take a look at what last week's results have done to the Flag prospects of the teams in contention for that honour.
Broadly, the results have not made a great deal of difference, except that:
In numerical, heat-map form, the data is as it appears at left.
Looking more closely at the simulation replicates, we find that the most likely Grand Final sees Hawthorn meet Adelaide (16%).
After that, the next-most common pairings are Adelaide v Geelong (15%), Adelaide v Sydney (11%), Adelaide v Western Bulldogs (9%), and Adelaide v GWS (7%).
This week, just one quick piece of commentary - what the Top 8 would look like if it were based on each team's expected number of wins given its scoring statistics and according to the MoS Win Production Function (with last week's ranking on the same basis shown in brackets):
The first thing to point out is that nothing can usefully be concluded based on a sample of 5, so I won't be saying anything specific about Hawthorn, but I think the more interesting question is to ask how you might determine how many close games a team "should" win.
Implicit in many of the discussions is the assumption that teams, regardless of their relative abilities, all become 50:50 propositions in close games. I think that position is logically untenable. If a better side was expected to record, say, 60% of the scoring opportunities at the time the game started then, field position notwithstanding, they must surely be expected to generate a better-than-50% share of any remaining opportunities in the back end of a close contest.
My initial assumption was that, with scores level, the two teams' chances with any fraction of the game remaining should be the same as they were at the start. That assumption, on reflection, is also wrong. The original assessment of the two teams' chances was based on the total set of scoring shots expected to be generated during the entire course of the game but, as time runs down and the number of likely remaining scoring shots diminishes, it stands to reason that the weaker team has a higher likelihood, by chance alone, of generating the majority of the remaining scoring shots and therefore winning.
Imagine, for example, that we placed 100 balls in a bowl, 55 green and 45 blue, from which we made 50 draws at random with replacement. The likelihood of drawing more blue than green balls - of blue "winning", if you like - in this case is about 19%. But, if we make only 5 draws instead, the probability climbs to 41%.
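Those figures are easy to verify with an exact binomial calculation, treating each draw-with-replacement as an independent 45% chance of blue. A quick sketch:

```python
from math import comb

def p_minority_wins(p_majority, n_draws):
    """Probability the less-likely colour is drawn strictly more often
    than the more-likely one across n_draws independent draws."""
    p = 1 - p_majority  # per-draw chance of the minority colour
    return sum(comb(n_draws, k) * p**k * (1 - p)**(n_draws - k)
               for k in range(n_draws // 2 + 1, n_draws + 1))

print(round(p_minority_wins(0.55, 50), 3))  # close to the 19% quoted above
print(round(p_minority_wins(0.55, 5), 3))   # prints 0.407, the ~41% quoted above
```

The fewer the draws, the better the minority colour's chances, which is exactly the small-sample effect at work in the dying minutes of a close game.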
So, the better team should win close games more than 50% of the time but less often than its pre-game probability would have suggested.
One way of coming up with a figure for the proportion they should win is to proceed as follows.
We've found in previous analyses (for example, this empirical one from 2009 and this theoretical one from 2014) that the final margin of a V/AFL game can be modelled as a Normal random variate with mean equal to the pre-game expected margin and standard deviation of about 36.
That assumption can be used to model the game margin distributions for teams of varying levels of pre-game superiority and to ask of the resulting distribution: of all games that finished with a margin of X points or less (as a win or a loss for the superior team), in what proportion does the superior team win?
The answer to that question is summarised in the chart below.
Each line in the chart relates to a team with some fixed level of pre-game superiority. The bottom line relates to a team that is 0 points better than its opponent (ie an equal favourite), the next to a team assessed as a 6-point favourite, and so on in 6-point increments. The top line relates to a 60-point pre-game favourite.
As we move from left to right across a line we increase the range of victory margins that we are considering. We can read the proportion of games that the superior team is expected to win for any given victory margin from the vertical axis.
So, for example, a team that was a 6-point pre-game favourite would be expected to win about 53% of games that finished with a margin of 18 points or less (for either side).
What we see then is that, as suggested, superior teams are expected to win more than one-half of games finishing with a margin within any nominated range, but also that, for teams that are only slightly superior, the probability doesn't stray all that far from 50%. For example, even a 24-point better side will win only about 57% of games decided by 18 points or fewer.
Vastly superior sides, however, win much larger proportions of close games. For example, a 48-point better team will win 70% of games decided by 24 points or fewer.
Armed with this chart it would be possible to go back for any team and use their pre-game bookmaker handicaps in a large enough sample of games that finished as narrow wins or losses, and assess the extent to which they have under- or over-performed in those games.
I'll leave that, in fine professorial style, as an exercise for the reader.
In this post from 2014 for example, which is probably the post most similar in intent to today's, I used Beta regression to model team conversion rates:
Both models explained about 2.5 - 3% of the variability in team conversion rates, but the general absence of statistically significant coefficients in the first model meant that only tentative conclusions could be drawn from it. And, whilst some teams had statistically significant coefficients in the second model, its ongoing usefulness was dependent on an assumption that these team-by-team effects would persist across a reasonable portion of the future. We know, however, that teams go through phases of above- and below-average conversion rates, so that assumption seems dubious.
Other analyses have revealed that stronger teams generally convert at higher rates when playing weaker teams, so it's curious that the first model in that 2014 post did not have statistically significant coefficients on the MARS Ratings variable.
Maybe MoSSBODS, which provides separate offensive and defensive ratings, might help.
For today's analysis we will again be employing a Beta regression (though this time with a logit link and not fitting phi as a function of the covariates), applying it to all games from the period from Round 1 of 2000 to Round 16 of 2016.
We'll use as regressors:
(Note that the attendance and time-of-day data has been sourced from the extraordinary www.afltables.com site.)
Now, in recent conversations I've been having on Twitter and elsewhere people have been positing that:
What's appealing about including MoSSBODS ratings as regressors is that they allow us to explicitly consider the first argument above. If that contention is true, we'd expect to see a positive and significant coefficient on a team's own Offensive rating and a negative and significant coefficient on a team's opponent's Defensive rating.
On the second argument, whilst I don't have direct weather data for every game and so cannot reflect the presence or absence of rain, I can proxy for the likelihood of dew in the regression by including the variables related to the time of day that the game started and the month in which it was played.
Looking at the remaining regressors, venue is included based on earlier analyses that suggested conversion rates varied significantly around the all-ground average for some venues, and attendance is included to test the hypothesis that teams may respond positively or negatively in their conversion behaviour in the presence of larger- or smaller-than-average crowds.
Details of the fitted model appear below.
The logit formulation makes coefficient interpretation slightly tricky. We need firstly to recognise that estimates are relative to a notional "reference game", which for the model as formulated is a game played at the MCG, starting before 4:30pm and played in April.
The intercept coefficient of the model tells us that such a game, played between two teams with MoSSBODS Offensive and Defensive ratings of 0 (ie 'average' teams) would be expected to produce Conversion rates of 53.1% for both teams. We calculate that as 1/(1+exp(-0.126)).
(Strictly, we should include some value for Attendance in this calculation, but the coefficient is so small that it makes no practical difference in our estimate whether we do or don't.)
Next, let's consider the four coefficients reflecting MoSSBODS ratings variables. We find, as hypothesised, that the coefficient for a team's own Offensive rating is positive and significant, and that for their opponent's Defensive rating is negative and significant.
Their size means that, for example, a team with a +1 Scoring Shot (SS) Offensive rating and a 0 SS Defensive rating playing a team with a 0 SS Defensive and Offensive rating would be expected to convert at 53.3%, which is just 0.2% points higher than the rate in the 'reference game'. This is calculated as 1/(1+exp(-(0.126+0.008))).
Strong Offensive teams will have ratings of +5 SS or even higher, in which case the estimated conversion rate would rise to just over 54%.
Similarly, a team facing an opponent with a +1 Scoring Shot (SS) Defensive rating and a 0 SS Offensive rating, itself having 0 SS Defensive and Offensive ratings, would be expected to convert at 52.8%, which is about 0.3% points lower than the rate for the 'reference game'.
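These back-calculations from the logit scale can be reproduced with a few lines of Python. The intercept and the own-Offence coefficient are the values quoted above; the opponent-Defence coefficient of -0.012 is implied by the relative coefficient sizes discussed in the text:

```python
import math

def inv_logit(x):
    """Convert a value on the logit scale back to a probability/rate."""
    return 1.0 / (1.0 + math.exp(-x))

intercept = 0.126           # 'reference game': MCG, pre-4:30pm, April, 'average' teams
own_offence_coef = 0.008    # per SS of a team's own Offensive rating
opp_defence_coef = -0.012   # per SS of the opponent's Defensive rating (implied)

print(round(inv_logit(intercept), 3))                      # 0.531 - reference game
print(round(inv_logit(intercept + own_offence_coef), 3))   # 0.533 - own Offence +1 SS
print(round(inv_logit(intercept + opp_defence_coef), 3))   # 0.528 - opponent Defence +1 SS
```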
The positive and statistically significant coefficient on a team's opponent's Offensive rating is a curious result. It suggests that teams convert at a higher rate themselves when facing an opposition with a stronger Offence, as compared to one with a weaker Offence. That opponent would, of course, be expected to convert at a higher-than-average rate itself, all other things being equal, so perhaps it's the case that teams themselves strive to create better scoring shot opportunities when faced with an Offensively more capable team, looking to convert less promising near-goal opportunities into better ones before taking a shot at goal.
In any case, the coefficient is only 0.004, about half the size of the coefficient on a team's own Offensive rating, and about one-third the size of that on the team's opponent's Defensive Rating, so the magnitude of the effect is relatively small.
To the venue-based variables then, where we see that three grounds have statistically significant coefficients. In absolute terms, Cazaly's Stadium's is the largest, and negative: we would expect a game played there between two 'average' teams, starting before 4:30pm in April, to result in conversion rates of around 46%.
Docklands has the largest positive coefficient and there we would expect a game played between the same two teams at the same time to yield conversion rates of around 56%.
The coefficients on the Time of Day variables very much support the hypothesis that games starting later tend to have lower conversion rates. For example, a game starting between 4:30pm and 7:30pm played between 'average' teams at the MCG would be expected to produce conversion rates of just over 52%. A later-starting game would be expected to produce a fractionally lower conversion rate.
Month, it transpires, is also strongly associated with variability in conversion rates, with games played in any of the months May to August expected to produce higher conversion rates than those played in April. A game between 'average' teams, at the MCG, starting before 4:30pm and taking place in any of those months would be expected to produce conversion rates of around 54%, which is almost 1% point higher than would be expected for the same game in April. The Month variable then does not seem to be proxying for poorer weather.
Relatively few games in the sample were played in March (150) so, for the most part, April games were the first few games of the season. As such, the higher rates of conversion in other months might simply reflect an overall improvement in the quality and conversion of scoring shot opportunities once teams have settled into the new season.
Lastly, it turns out that attendance levels have virtually no effect on team conversion rates.
It's important to interpret all of these results in the context of the model's pseudo R-squared, which is, again, around 2.5%. That means the vast majority of the variability in teams' conversion rates is unexplained by anything in the model (and, I would contend, potentially unexplainable pre-game). Any conversion rate forecasts from the model will therefore have very large error bounds. That's the nature of a measure as lumpy and variable as Conversion Rate, which can move by tens of percentage points in a single game on the basis of a few behinds becoming goals or vice versa.
That said, we have detected some fairly clear "signals" and can reasonably claim that conversion rates are:
Taken across a large enough sample of games, it's clear that these effects do become manifest, and that they are large enough, despite the vast sea of randomness they are diluted in, to produce detectable differences.
Next year I might see if they're large enough to improve MoSSBODS score projections because, ultimately, what matters most is whether the associations we find prove to be predictively useful.
Or, if you prefer your data in heat-map form ...
So, the Cats fall from almost 50% chances for the Minor Premiership to just 7.4% chances, as you can see from the table below that compares this week's simulation results with last week's. The Cats' drop, along with GWS' fall from 14% to about 1%, converted into Minor Premiership gains for Sydney (+30% points), Adelaide (+11% points) and Hawthorn (+10% points).
The Cats' Top 4 chances were also dramatically reduced this week, from about 90% to 58%, while GWS' chances fell even further from about 72% to 29%. Here too, the beneficiaries were Adelaide (+8% points), Sydney (+43% points) and Hawthorn (+22% points).
In the race for a spot in the Top 8, Port Adelaide's chances dropped from about 18% to 7%, while those of West Coast, the Kangaroos, Collingwood, and Melbourne all rose by about 2 to 3% points. For Collingwood and Melbourne that amounted to a rough doubling of their chances.
And, lastly, in the Spoon race, the Lions' loss and Essendon's relatively strong showing against St Kilda saw the Lions' chances rise by about 10% points and Essendon's fall by about the same amount, moving the Lions' into MoSSBODS outright favouritism for the cutlery.
Comparing MoSSBODS' latest assessments with current TAB markets suggests that there is some value in the Swans' Minor Premiership and Top 4 prices, and also in the Dogs' Top 4 price.
The two most-likely Top 2 finishes at the end of the Home-and-Away season now see Sydney and Adelaide filling those spots, in either order. Combined, the two possible orderings account for 30% of the simulation replicates.
The third- and fourth-most likely pairings involve Adelaide and Hawthorn, and these account for roughly another 16%.
Sydney and Geelong are the next most likely pairing, but only in that order (6%), and then follows the Sydney and Hawthorn pairing, in either order (11%).
The Western Bulldogs appear in the 8th- and 9th-most likely pairings, finishing 2nd in both.
When we shift our attention to the most likely Top 4s, Sydney and Adelaide dominate the picture, each appearing in one half of the 10 most likely quadrellas.
Hawthorn sneaks into 2nd in two of them, 3rd in three more, and 4th in three others, while the Dogs grab 3rd twice and 4th on four occasions.
Geelong fill in all of the remaining vacancies, snatching 3rd twice and 4th on three occasions.
Note that none of these orderings appeared in more than 1 replicate in 40, so the number of feasible quadrellas remains very large.
Next, looking at the single-chance Finals places, we find that the most likely orderings all involve GWS finishing somewhere between 5th and 7th.
The Kangaroos also appear in each of the Top 10 orderings, most often in 8th, as do West Coast, most often in 7th, but occasionally in 6th or 8th. Geelong also make four appearances, and Hawthorn just one.
Here too it's worth noting that even the most likely of the orderings appeared in only just over 1 replicate in 30.
Traditionally, I've not simulated the Finals until the week before they've started, but I've had a number of requests this year to extend the usual Home-and-Away projections right through to the end of the competition.
So, this week, I've done that.
The simulation of Finals works in much the same way as those in the Home-and-Away season. We use each team's current Offensive and Defensive Ratings and add a stochastic component to them for each game in each replicate.
To apply the appropriate Venue Performance values and Travel Penalties we need, however, to model the venuing of each Final, for which purpose I have assumed that:
(This season, based on the simulations, there's no need to worry about Finals hosted by Fremantle, the Brisbane Lions or Gold Coast but, to avoid confusion, should the need arise, Fremantle will be assumed to host at Subiaco, and the Lions and Suns to host at the Gabba.)
We also assume, of course, that the Grand Final is played at the MCG.
So then, to the results.
In this first chart we summarise the final finish for every team that made at least one Final in at least one replicate, breaking down each team's finish by the ladder position it achieved at the end of the relevant home-and-away portion of the replicate.
Adelaide, for example, win the Flag in about 40,000 of the 100,000 replicates, having finished in 1st or 2nd on the ladder in about 25,000 of those replicates.
Sydney wins the Flag only about 15% of the time, and is more likely to have bowed out of the Finals in a Preliminary Final.
Here's that same data presented as a heat-map.
We see then that the Roos and Eagles are most likely to bow out in the Elimination Finals, that GWS and the Western Bulldogs are most likely to bow (wow?) out in the Semi-Finals, and that Geelong, Hawthorn and Sydney are most likely to finish 2016 as losing Preliminary Finalists.
At current TAB prices that leaves Adelaide as the only team with a sufficiently large positive expectation on the Flag market to provide at least a 5% expected return.
To finish, I thought it might be interesting to see what the simulations reveal about the relationship between ladder finish and Finals prospects, regardless of which team finishes in which position.
It's clear, as you might expect, that the teams finishing in 1st or 2nd have the greatest chance of making the Grand Final and winning the Flag, and that opportunity reduces with every pair of Finalists.
In summary then, teams' current competition rankings are most associated with their Scoring Shot Concession and Production, and their 1st and 4th Quarter performances, and least associated with their Opponents' and Own Scoring Shot Conversion performances.
Next, a quick update on the comparison between teams' winning rates and the rate that their Scoring Shot statistics would imply (according to the MoS Win Production Function).
Teams that have won fewer games than expected:
Teams that have won more games than expected:
Ranking teams based on their expected winning percentage would give a top 8 as follows:
So, what am I referring to?
In the blog post in which MoSSBODS was introduced, I talk about the various components of the Rating System, which comprises:
Currently, in estimating a particular team's expected Scoring Shot (SS) production against a specified opponent at a specified Venue, I use this formula:
Expected SS = Overall Expected SS + (Own Offensive Rating - Opponent Defensive Rating) + Own Venue Performance value + Own Travel Penalty
For the home team, the Own Venue Performance value is typically positive and the Travel Penalty zero since they're usually playing in their home State. For the away team, the Own Venue Performance is mostly zero, sometimes negative and only occasionally positive, and the Travel Penalty either 0, if the game is in its home State, or -3 Scoring Shots otherwise.
In this formulation then, the Travel Penalty serves only to depress the expectation of away team SS production.
In the new formulation we instead assume that all four of the Venue Performance and Travel Penalty components have a role in determining both teams' Expected SS production.
Specifically, we now use:
Expected SS = Overall Expected SS + (Own Offensive Rating - Opponent Defensive Rating) + ½ × ((Own Venue Performance value + Own Travel Penalty) - (Opponent Venue Performance value + Opponent Travel Penalty))
This change tends to:
In other words, it means we tend to predict higher scores for both teams and in aggregate.
For example, in the 126 games already completed in the current season, the predicted home team scores are higher under the new formulation in 112 games, the predicted away team scores are also higher under the new formulation in 112 games, and the predicted total score is higher under the new formulation in 111 games.
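The two formulations can be compared directly with a small Python sketch. The rating and venue values below are purely illustrative (including the base of 26 scoring shots a side), and the sketch assumes the venue-plus-travel difference is shared 50:50 between the teams, so that margin forecasts are unaffected:

```python
def old_expected_ss(base, own_off, opp_def, own_venue, own_travel):
    # Original formulation: only a team's own venue/travel components count
    return base + (own_off - opp_def) + own_venue + own_travel

def new_expected_ss(base, own_off, opp_def, own_vt, opp_vt):
    # New formulation: half the *difference* in venue-plus-travel components,
    # which leaves the forecast margin unchanged relative to the old version
    return base + (own_off - opp_def) + 0.5 * (own_vt - opp_vt)

# Illustrative game: home team has a +1 SS venue performance, away team carries
# the standard -3 SS travel penalty; all Offensive/Defensive ratings are zero
base, home_vt, away_vt = 26.0, 1.0, -3.0
old_home = old_expected_ss(base, 0, 0, 1.0, 0.0)
old_away = old_expected_ss(base, 0, 0, 0.0, -3.0)
new_home = new_expected_ss(base, 0, 0, home_vt, away_vt)
new_away = new_expected_ss(base, 0, 0, away_vt, home_vt)

print(new_home - old_home, new_away - old_away)        # both rise by the same amount
print((new_home - new_away) == (old_home - old_away))  # margin forecast unchanged
```

With a travelling away team, both expectations rise, which matches the observed tendency for the new formulation to predict higher home, away and total scores.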
If we analyse MoSSBODS performance so far this season we find that it has, indeed, tended to underestimate home team, away team and total scores, so this adjustment seems to be, at least, in the right direction.
We can apply this new formulation to the entirety of V/AFL history and compare the mean absolute error (MAE) performance using this and the previous formulation.
Doing that yields the following:
Lowering MAEs by that much over such a huge span of history is fairly significant, and indicative that the change is a "good thing".
Looking only at games played since 2000 sees even larger performance improvements in home team and total score forecasting, and a smaller, though still positive, improvement in away team score forecasting.
For 2016 alone, the statistics are:
For me, that's extraordinarily compelling evidence for the change, which is why I'm implementing it immediately.
It might be worth exploring whether the now 50:50 allocation of the Venue Performance and Travel Penalty components to the home and away teams is optimal, but that's a task for the off-season, not least because this allocation, mathematically, has no effect at all on MoSSBODS' margin forecasts. With the 50:50 allocation, the increase or decrease in both the home and away team's score forecasts, relative to the old formulation, are always of an identical size.
Since MoSSBODS has done relatively well on margin forecasting in this and in previous seasons, that also seems to be a desirable outcome of the switch to the new formulation.
Regardless, with another round's uncertainty resolved, what we find is that most teams' menus of possible finishes have diminished, though a number of mid-table teams still have quite broad vistas, spanning 8 or 9 plausible ladder finishes in some cases.
Also, as we observed last week, the teams again seem to form a number of natural groupings:
So, relative to last week, we find that:
Comparing these probability assessments to the latest TAB markets reveals that the only wagers with a significantly positive edge are the Cats and the Crows for the Minor Premiership or for a Top 4 finish.
As ever on MoS of course, YMMV, especially if you've different estimates of team ability.
So, what do the latest simulations tell us about the most likely Top 2 teams at the end of the home-and-away season?
Geelong, perhaps unsurprisingly, figure prominently, appearing in 8 of the 10 most-common Top 2s, including the two commonest of all, where they finish 1st and which represent almost 30% of the simulation replicates.
The Cats aside, Adelaide and GWS each appear in four of the Top 10, and Hawthorn and Sydney in two each.
If we expand our view to Top 4s, we find that Geelong now completely dominates, appearing in 1st place in each of the Top 10. Adelaide also appears in all of the Top 10, five times in 2nd, three times in 3rd and twice in 4th, while GWS appears in nine of them, four times in 2nd, four times in 3rd, and once in 4th.
Sydney and the Western Bulldogs appear in three each, once in 3rd and twice in 4th, while Hawthorn pops up in five, once in 2nd, once in 3rd, and three times in 4th.
And, lastly if we look at the entirety of the Top 8, we find, most importantly, that no ordering has achieved an estimated probability higher than about 0.30%, but also that the eight most-common orderings all have the Cats in 1st place.
Eight of them, also, have the Giants in 2nd, and nine of the 10 have the Crows in 2nd or 3rd.
Perhaps most interestingly, all of them involve the same eight teams in a different permutation.
Anyway, that request for me to start simulating the season triggered a lively, but very civil debate on Twitter about how to approach the task. The main point of contention was whether or not any adjustment should be made to a team's assumed abilities during the course of a single replicate in the simulation, on the basis of the simulated result of a game.
If, say, Melbourne were to defeat Adelaide in one simulated version of Round 15 - which would represent a surprise result given the two teams' current ratings - should Melbourne's team rating be increased and Adelaide's decreased before simulating the results for Round 16?
My view is that they should not be altered because we have no new information in the simulation about Melbourne's or Adelaide's ability, just a single simulated result which, while unlikely, is by definition completely consistent with the teams' assumed pre-game ratings. If we model victory margins as stochastic variables then, I'd suggest, we shouldn't respond to the randomness that this produces as if it were somehow a sign of improved ability. Doing so implies that we believe team ratings respond to the randomness inherent in actual results.
The counterargument seems to be that we would adjust the teams' ratings were we to see this result in real life, so we should do the same when we see it in a simulation, to which my response would be that we make adjustments when we see a result in real life because we assume there is some deterministic component in that outcome.
Just how much we assume an actual result reflects random versus deterministic elements and how much, as a result, we adjust team ratings on the basis of that single result, is at the heart of any ratings system - adjust too much and we respond excessively to randomness, too little and we ignore the real changes in team abilities over time.
But, in the case of the simulations, we know in advance that any deviations from expectation we see are due solely to chance, so there's no logical basis, I'd argue, on which to proceed otherwise.
The participants in the conversation split almost 50:50 into those who do and those who don't adjust team ratings within a single replicate, but one of the modellers who does not make adjustments (@RankingSW on Twitter) provided a subtle alternative: use the same base team ratings across every round of the simulation, but treat them as random variables, drawing a different value for each simulation run, and maybe even for each round in each simulation run, centred on those base ratings. These adjustments to ratings are made without regard to previous simulated results and are used to reflect the fact that each team's base ratings are inherently measured with error (because, for example, of the mis-allocation of "surprises" in previous actual results into random versus deterministic components).
I've incorporated this idea in the 2016 simulations, as outlined in the following section.
The simulation of each game this year has proceeded in the following manner:
This process of simulation is more complicated than that which I've used in previous years and, as a consequence, is a little slower to run. So, for now, we'll need to make do with 25,000 simulations of the remainder of the season, which take a bit over an hour to run. I might ramp up the number of simulations, if time permits, in future weeks.
Each team's simulated results are subject to three sources of variability: one that comes from the stochastic components of their base Offensive and Defensive ratings, another that comes from the modelled on-the-day variability of their Scoring Shot production around its expected value, and a third that comes from the modelled on-the-day variability in Scoring Shot conversion around its expected value (which is 53% for all teams).
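A single game under this scheme might be simulated along the following lines. This is only a sketch: the 53% mean conversion rate is from the text, but the Normal distributions, the standard deviations, and the expected Scoring Shot counts are placeholder assumptions, and the per-replicate noise on base ratings is folded into the expected SS inputs for brevity:

```python
import random

def simulate_game(exp_ss_home, exp_ss_away, rng,
                  ss_sd=4.0, conv_mean=0.53, conv_sd=0.06):
    """Simulate one game: draw each team's scoring shot count and its
    on-the-day conversion rate, then tally goals (6 pts) and behinds (1 pt)."""
    scores = []
    for exp_ss in (exp_ss_home, exp_ss_away):
        # On-the-day scoring shot production around its expected value
        shots = max(0, round(rng.gauss(exp_ss, ss_sd)))
        # On-the-day conversion rate around its expected value, clipped to [0, 1]
        conv = min(1.0, max(0.0, rng.gauss(conv_mean, conv_sd)))
        goals = sum(rng.random() < conv for _ in range(shots))
        behinds = shots - goals
        scores.append(6 * goals + behinds)
    return tuple(scores)

rng = random.Random(2016)
print(simulate_game(26, 24, rng))  # (home score, away score) for one replicate
```

A full replicate would repeat this for every remaining fixture, then compute the resulting ladder.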
That variability manifests in terms of a range of possible final ladder positions, which are summarised below in what I last year dubbed the "Dinosaur Chart".
Not surprisingly, with so many games still to be played, most teams have a relatively wide range of simulated potential ladder finishes, in some cases spanning as many as 13 ladder spots. Only the Lions, Dons and, to a lesser extent, Suns, appear to have a significantly more restricted menu.
Looking at these results, instead, as an annotated heat map, reveals more details about the relative likelihood of specific ladder finishes for each team, and the sets of teams that appear most likely to contest particular ladder position groups.
The number shown in a cell represents the proportion (multiplied by 100) of all simulation runs for that particular team that resulted in the team finishing in the ladder position indicated. So, for example, Geelong finished 1st in 45.8% of all simulation runs, more often than any other team. Since every team appears in the same number of simulations, the number also reflects the proportion of all runs in which that team finished in that position. In other words, both the rows and the columns sum to 100.
From this heat map we can see that the teams form some natural groupings:
As always, a sanity check using the prices of a professional bookmaker is advisable, and for this purpose we'll use those of the TAB.
Broadly speaking, the simulation results accord with the bookmaker's thinking, though a handful of prices appear to offer some value if you have faith in the simulation inputs and process.
Specifically:
Adelaide appears to offer particular value in the Minor Premiers and Top 4 markets, which you might interpret as either an opportunity or an error in modelling, depending on your perspective.
Lastly, let's look at what the simulations reveal about the teams most likely to occupy ranges of ladder positions.
Firstly, consider 1st and 2nd at the end of the home-and-away season.
About 14% of simulation runs finished with the Cats in 1st and the Giants in 2nd, and another 6% with the reverse ordering. Slightly more common in aggregate was a Cats / Crows 1-2 finish, though neither possible ordering of that pair was as common as Cats in 1st and Giants in 2nd.
Geelong appears in all but two of the Top 10 pairings, Adelaide in four, GWS in four, Sydney in two, and Hawthorn in two.
Next, let's expand our view to consider positions 1st through 4th.
The first thing to note is that none of these quartets is particularly likely. Even the most common of them (Geelong / GWS / Adelaide / Sydney) appeared in only about 1 simulation run in 50.
Also noteworthy is the fact that Geelong is in 1st position in all 10 of the most common quartets, and that GWS also appears in all 10, half the time in 2nd, four times in 3rd, and once in 4th.
Adelaide also makes 10 appearances, Sydney 5, Hawthorn 4, and the Western Bulldogs only 1.
Finally, we'll focus on the bottom half of the Top 8.
Here we find a much more diverse range of possibilities, though again even the most likely of them (Dogs / Hawks / Eagles / Roos) appears in only about 1 simulation run in 50.
Amongst the 10 most common results we find that:
In future weeks we might also review other aspects of the simulated results, for example the most common Top 8. For now though, analyses of that sort are fairly futile because of the high levels of variability that remain in final ladder positions. As an example of this, amongst the 25,000 simulations run for this blog post, the most common Top 8s appeared only twice.
Instead, I'll just note the surprising number of teams whose winning percentage is substantially different from what would be expected based on their Scoring Shot data and the MoS Win Production Function.
Currently, we have:
Now the MoS Win Production Function was created almost five years ago, so it might be that there has been a permanent change in the way the competition rewards scoring behaviour. That's a possibility I intend to explore further during the off-season.
But, it might also be the case that some teams truly have done better (or been "luckier") and others done worse (or been less "lucky") at converting scoring behaviour into competition points and that we'll see some realignment between scoring behaviour and competition points, especially for the teams listed above, as the remainder of the home-and-away season plays out.
Note that, as per MoS Tradition, I order teams in the Dashboard on the basis of competition points scored as a proportion of the maximum possible, which means, for example, that this week I have Adelaide above West Coast since their 32 points have been earned from fewer games played, despite the Eagles' superior percentage.
The only observation I'll make this week is the continued, surprisingly low correlation between teams' ladder positions and their ranking on Own Scoring Shot and Opponent Scoring Shot Conversion. After this week, the rank correlations are:
What's driving success this season is not conversion, but Scoring Shot generation and concession, the relevant rank correlations being:
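For reference, these are simple Spearman rank correlations. A minimal sketch, assuming no tied ranks and using invented ladder positions and metric ranks purely for illustration:

```python
def spearman(xs, ys):
    """Spearman rank correlation for two lists of ranks, assuming no ties."""
    n = len(xs)
    d_sq = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Hypothetical example: ladder positions 1-8 versus the same teams' ranks on
# some Scoring Shot metric (the metric ranks here are made up)
ladder = [1, 2, 3, 4, 5, 6, 7, 8]
metric_rank = [2, 1, 4, 3, 5, 7, 6, 8]
print(round(spearman(ladder, metric_rank), 2))  # 0.93
```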
Though it wasn't possible to entirely rule out a deterministic component to this phenomenon, a significant proportion of it could be explained, I argued, as simple regression to the mean.
Thinking a little more about the situation, I realised that, were regression to the mean the major cause, we'd expect to see the phenomenon appear most strongly when a team followed a win with a loss, or a loss with a win, and probably disappear entirely when a team won or lost both of the two games we're analysing.
That's the hypothesis we investigate in the following chart, in which four trios of bars are now shown for each season, the leftmost trio providing the information for situations where teams lost the two games being analysed, the rightmost trio where they won the two games, and the two innermost trios the situations where they lost in the second week after winning in the first, or won in the second week after losing in the first.
The pattern we observe here is the pattern we hypothesised, with the phenomenon mostly disappearing for the win/win and lose/lose scenarios, and appearing strongly for the win/lose and lose/win scenarios.
In the previous blog we conditioned only on the outcome of the previous game, which meant that the patterns we saw there were a mixture of two of the four trios of bars seen here. When we conditioned on a previous loss, we'd have been seeing a mix of the first and third trios, and when we conditioned on a win, the second and fourth trios.
So, consider the situation where we're conditioning on a loss in the previous week. The relative weight of the first and third trios in that mixture will depend on how common it was across all teams for a loss to be followed by another loss or, instead, by a win. If losses were more commonly followed by another loss, the mixture would contain more of the first trio, where conversion rates are virtually identical, and the phenomenon of higher conversion rates conditioned on a previous loss would be weaker.
Conversely, if losses were more commonly followed by wins, the mixture would contain more of the third set of bars where conversion rates are more likely to increase, and the phenomenon of higher conversion rates conditioned on a previous loss would be stronger. It's important to note, however, that a mixture that contains any proportion of the third set of bars will exhibit some tendency for losses to be followed by higher conversion rates.
A similar line of argument can be put forward for the situation where we're conditioning on a previous win.
So, the last piece of the analysis is to look at the relative proportions of loss/loss versus loss/win in the last two games, and of win/win versus win/loss.
What we find is that, even in those years where non-alternating results are relatively more common (eg 2006), it's still the case that losses are followed by wins about 40% of the time, and wins followed by losses about 40% of the time. As a result, we still see compelling evidence of regression to the mean when we condition on a previous loss or a previous win.
Enough.
Previous analyses on the topic have revealed that:
All of the effects are small in magnitude, and it remains the case that the overwhelming majority of the game-to-game variability in a team's conversion rate appears to be largely a function of random factors (or at least of factors I've not so far explicitly considered).
So, I was a little surprised when I decided to investigate the relationship between teams' conversion rates in successive games, in one case after a win, and in the other after a loss.
In the chart below we have the results for the past 20 home-and-away seasons, including the partially completed 2016. For each season we have two sets of bars, the trio on the left showing the number of games for which a team's conversion rate fell, increased, or stayed the same in the game played immediately after a win, and the trio on the right showing the same data for the game played immediately after a loss. We exclude drawn games from the analysis because their inclusion serves only to clutter the chart without altering its story.
And, that story is incredibly clear. Teams tend to convert at higher rates after a loss, and lower rates after a win, and this has been the case for every one of the seasons shown.
Now maybe this is just a recent phenomenon, so let's get more ambitious and produce the chart for the entire history of the sport.
If you click on the chart you'll access a larger version of it, a careful review of which will reveal that in all but about 7 or 8 seasons the same phenomenon holds. Win a game, and then expect to convert at a lower rate next week; lose and do the opposite.
So, what's going on here?
Most likely, it's an interesting example of a statistical phenomenon known as regression to the mean, which is used to describe what happens when random variables are repeatedly drawn from the same bell-shaped distribution and, in one draw, an exceptionally large or small value is selected. As a matter of pure logic, the draw after that will, more often than not, be closer to the mean than further away from it.
If you picture in your mind's eye the Normal distribution and imagine drawing a point somewhere, say, well to the right of the distribution you can see that more of the probability density of the distribution lies nearer the mean (the peak of the "bell") than further away from it, out in the tails.
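For the programmatically-minded, the logic is easy to verify with a small simulation: draw repeatedly from a bell-shaped distribution and, whenever a draw comes out exceptionally high, check how often the very next draw lands closer to the mean. (The mean and standard deviation here are purely illustrative, not fitted values.)

```python
import random

random.seed(42)
mean, sd = 0.53, 0.07  # hypothetical conversion-rate distribution
draws = [random.gauss(mean, sd) for _ in range(100_000)]

# Whenever a draw is exceptionally high (more than 1 sd above the mean),
# check whether the next draw lands closer to the mean than it did.
closer = total = 0
for prev, nxt in zip(draws, draws[1:]):
    if prev > mean + sd:
        total += 1
        closer += abs(nxt - mean) < abs(prev - mean)

print(f"{closer / total:.0%} of follow-up draws were closer to the mean")
```

Conditioning instead on exceptionally low draws gives, by symmetry, the same result.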
How that applies to the current situation is as follows. If we know that a team has won a particular game then we also know that it's more likely to have converted at a relatively high rate, as evidenced by the chart below, which tracks the average conversion rates of winning and losing teams in each season from V/AFL history.
So, if a team's conversion rate can reasonably be modelled as a bell-shaped random variable, then it's likely that the rate of a team that won in the previous week will regress towards the mean in the following game. So, teams that won their previous game will tend to see a reduction in their conversion rate in the following week. We can argue similarly for teams that lose to conclude that such teams will be more likely to record a higher conversion rate in the following week. This might well be enough to explain the phenomenon we observe.
However, the structure of the V/AFL draw, which tends to have teams playing at home and then away in successive weeks, might tend to exacerbate the phenomenon given the fact that, as noted above, home teams tend to convert at higher rates than away teams, and, partly as a result, tend to win more often than they lose.
In an attempt to remove this component of the phenomenon I reanalysed the data, this time excluding as a permissible "previous game" any home team wins or away team losses.
Though seasons like 1999, 2011 and 2016 no longer faithfully follow the pattern we observed earlier, there's still a strong tendency in most years for the phenomenon to appear.
Pure regression to the mean seems to be a clear component of what we're seeing.
To investigate just how much it might be contributing, divorced from any considerations of the V/AFL draw, for the final piece of analysis I simulated the scores of 50 seasons each involving two teams of identical ability playing 200 games, using the theoretical team scoring methodology I developed in this and subsequent blogs in 2014.
For the current version of the simulations I assumed that the two teams had identical Scoring Shot and Conversion distributions (ie same means, same shape parameters etc). Given that assumption, any tendency to see lower conversion rates after wins and higher ones after losses can be attributed solely to regression to the mean.
We find evidence of pure regression to the mean for losing teams in 48 of the 50 seasons, and for winning teams in 49 of the 50 seasons.
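As a rough sketch of that simulation - not the actual theoretical team scoring methodology, whose distributions are specified more carefully in the 2014 blogs - we might pit two identical teams against each other using made-up Scoring Shot and Conversion distributions, and tally how Team A's conversion rate moves from game to game:

```python
import random

random.seed(1)

def sim_season(n_games=200):
    """One season between two identical teams; returns, for Team A,
    a list of (won, conversion_rate) per game."""
    games = []
    for _ in range(n_games):
        # Made-up distributions: ~25 scoring shots, ~53% conversion
        shots = [max(1, round(random.gauss(25, 4))) for _ in range(2)]
        convs = [min(1.0, max(0.0, random.gauss(0.53, 0.07))) for _ in range(2)]
        # Converted shots are goals worth 6 points; the rest are behinds worth 1
        scores = [6 * s * c + 1 * s * (1 - c) for s, c in zip(shots, convs)]
        games.append((scores[0] > scores[1], convs[0]))
    return games

lower_after_win = after_win = higher_after_loss = after_loss = 0
for _ in range(50):  # 50 simulated seasons
    season = sim_season()
    for (won, conv), (_, next_conv) in zip(season, season[1:]):
        if won:
            after_win += 1
            lower_after_win += next_conv < conv
        else:
            after_loss += 1
            higher_after_loss += next_conv > conv

print(f"Conversion fell after a win {lower_after_win / after_win:.0%} of the time")
print(f"Conversion rose after a loss {higher_after_loss / after_loss:.0%} of the time")
```

Because the two simulated teams are identical by construction, any tendency for both proportions to sit above 50% is pure regression to the mean.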
It seems clear that a substantial proportion of the observed phenomenon of winning teams tending to convert at lower rates in their next game, and losing teams converting at higher rates can be explained by nothing more than regression to the mean, ignoring any contribution that might come from the nature of the V/AFL draw or other factors.
We can't entirely rule out other, non-random explanations for some or all of the phenomenon - it might be, for example, that heavier emphasis is placed on goal-kicking accuracy after losses than after wins, and that this focus reveals itself in the phenomenon we've observed.
In the end, it comes down to a judgement call about the relative contributions of random and non-random factors, but the fact remains that a large proportion of what we've observed could be explained by random factors alone.
Not surprisingly then, the Hawks are one of four teams assessed by MoS' Win Production Function as having won at least one game more than their scoring statistics would justify. In full, that list is:
Six more teams have won at least one game fewer than their scoring statistics imply:
Had results gone according to the Win Production Function (and fractional wins were possible), the Top 8 would now be:
Let's have another look this week at the metrics on which each team is ranked most differently compared to their Competition Ladder position (minimum 5 places different):
Recently, one such follower was wondering about the revealed difficulty of Hawthorn's schedule this year (ie how good were the teams Hawthorn faced at the time they met them, not as assessed at the start of the season) and another was musing about the extent to which any draw imbalances tended to even out - or at least reduce - over a sequence of seasons. I thought it might be interesting to bring these two notions together and look at revealed schedule strengths across multiple seasons.
This analysis will allow us to investigate how "even" the schedule has been for different teams across those seasons, and which teams have endured the least and most challenging schedules during that period.
Firstly though, we need a working definition of draw imbalance.
A draw or schedule might be said to be imbalanced over some period if the average strength of the opponents faced during that period, at the venues where they were played, varies across teams.
Given that definition, imbalance in the AFL scheduling is a nearly inevitable practical consequence of the decision not to employ an all-plays-all home-and-away regular season. While it might be theoretically possible to, in some seasons, find a schedule in which teams play only 22 of a possible 34 games and yet still all face identically challenging sets of opponents, in practice this is almost certainly an impossibility.
And, in any case, it's overtly not the intent of the AFL to construct balanced schedules for teams within a season - the draw is biased towards including games played between teams of roughly equal demonstrated abilities from the previous season (eg 14th vs 15th), and biased against probable mismatches (eg 1st vs 18th). If you're going to discard over one-third of all possible matchups (108 of 306), the AFL would contend, better to eliminate games that are less likely to result in narrow victory margins.
If we can't expect to see parity within a single season, we might though expect to see something closer to it across multiple seasons, certainly for teams whose abilities have waxed and waned across the full range during the period. So, let's investigate that.
Any notion of balance requires a quantification of team abilities and for this purpose I'll be employing the MoSSBODS Team Rating System, which estimates teams' offensive and defensive abilities on the basis of the Scoring Shots they create and concede, and the quality of the attacks and defences against which they do so.
We'll look at the 2000 to 2015 Home-and-Away seasons and ask: what was the average Combined Offensive and Defensive Rating of all the opponents faced by a given team at the time those opponents were played?
But schedule strength is not just about which teams were played, but also about where they were played, because teams' performances are indisputably affected by venue.
Now MoSSBODS, as well as estimating teams' underlying abilities, also estimates the enhancement or inhibition of those abilities that occurs when they play at different venues. It makes this assessment not just for each team at its home ground or grounds, but for every team at every ground where it has played at least 30 games (before then, the assumed effect is zero). These measures are called Venue Performance Values in MoSSBODS.
Also, MoSSBODS adds 3 Scoring Shots to the Rating of a team when it hosts a team from another State. This is known as the Travel Penalty in MoSSBODS and is added to the Venue Performance Values for the competing teams to arrive at a Net Venue Effect.
We will incorporate these aspects of the scheduling into our assessment of schedule strength by calculating the average Net Venue Effect (from the opponent's viewpoint) for all games played between any pair of teams. Put another way, the only element of the MoSSBODS Rating System we'll be excluding from our assessment of the schedule strength of a particular team will be MoSSBODS' assessment of that team's own underlying strength when playing at a neutral venue.
So, we have as our measure:
Average Venue-Adjusted Opponent Strength = Average Opponent Combined Rating + Average Net Venue Effect
The units for this measure, as they are for all elements of MoSSBODS, are Scoring Shots (SS).
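As a minimal sketch of the calculation (the function name and the schedule figures are invented for illustration, not actual MoSSBODS output):

```python
def avg_venue_adjusted_opponent_strength(games):
    """games: a list of (opponent_combined_rating, net_venue_effect) pairs,
    both in Scoring Shots (SS), with the net effect taken from the
    opponent's viewpoint. Returns the average venue-adjusted strength."""
    n = len(games)
    avg_rating = sum(rating for rating, _ in games) / n
    avg_net_venue = sum(venue for _, venue in games) / n
    return avg_rating + avg_net_venue

# Illustrative (made-up) schedule of four games:
schedule = [(1.2, 0.5), (-0.8, 1.0), (2.9, 2.4), (-0.5, -0.7)]
print(round(avg_venue_adjusted_opponent_strength(schedule), 2))  # 1.5 SS
```

A positive result means the team's opponents, venue-adjusted, carried an average advantage; the all-team average is zero by construction.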
The table at right provides the summary details for every team for the 2000 to 2015 period.
It's sorted based on each team's Average Venue-Adjusted Opponent Strength, which reveals that the Western Bulldogs, based on this measure, have faced the most difficult revealed home-and-away schedule across the 16 seasons. On average, venue-adjusted, their opponents have enjoyed a 0.8 Scoring Shot (SS) advantage per game. This compares with the all-team average of 0 SS.
Now in some sense, that advantage is not entirely of the Dogs' opponents' doing, since the Average Net Venue Effect is an amalgam of:
The Dogs, like the four teams immediately below them on this list, suffer from the fact that they share a home ground with other Victorian teams. This means that, when facing those teams, they'll not enjoy the net benefit that a team such as Geelong, for example, will enjoy because they play at a venue where their opponents have less experience and will therefore probably have negative Venue Performance Values.
At the bottom of this list are the teams with the easiest schedules. It's interesting to note that, the Cats aside (who are known to be formidable at Kardinia Park but who also play a lot of games at Docklands and the MCG and play slightly better than expected at these venues too), the teams there are non-Victorian teams.
They're not there because of the 3 Scoring Shot Travel Penalty - they suffer that about as often as they enjoy it - but because they do relatively better, Travel Penalty aside, when playing away than do their opponents when travelling to these teams' home grounds. There seems to be a clear advantage in having a home ground that is exclusively, or almost exclusively, your own.
Another interesting aspect of this table is the correlation between the teams' average strengths and the unadjusted average strength of the opponents they face. As alluded to earlier, the AFL tries to skew the draw to ensure that strong teams meet other strong teams more often than they meet weak ones, and that weak teams tend to face other weak teams. This being the case, you'd expect a correlation between team and opposition strength, and that is indeed what we see if we correlate the data in the third and fourth columns of the table: the overall correlation is +0.42. That's not huge, but it's clearly non-zero.
That correlation disappears - indeed, reverses - once we incorporate Net Venue Effects, however, and this is something for which, I'd argue, it is more difficult to hold the current AFL schedulers accountable. This aspect of the imbalance was locked in, perhaps inadvertently, when the decision to move to shared home grounds for Victorian teams was taken.
We can dive a little deeper into some of the imbalances by looking at the statistics for every possible team pairing, which is what we do in the table that follows (click on it for a larger version).
Each cell in this chart contains three numbers: the average opponent strength and average Net Venue Effect across all games played between the teams named in the row and column, and, in brackets, the number of times they've played. A cell's colour is based on the sum of the average opponent strength and Net Venue Effect and is more red the larger is that sum (meaning the opponent was "tougher"), and more green the smaller is that sum (meaning the opponent was "easier").
According to MoSSBODS, the Dogs' challenging draw has been especially attributable to their encounters with Geelong, who've been Rated +2.92, on average, prior to their games, and who has enjoyed a +2.43 average Net Venue Effect advantage over the Dogs - a combination of the Cats' relative strength at Docklands and Kardinia, and the Dogs' lesser strength at both of those venues. The Dogs have also faced significant venue-adjusted opponents in Sydney, Collingwood and Adelaide.
In general, you can get a sense for how lucky or unlucky a team has been when facing a particular opponent by running your eye down the first set of numbers in any particular Opponent team column. For example, if we look at West Coast we can see that Port Adelaide have played them when, on average, their Combined Rating has been -0.56 SS, while Richmond has played them (albeit 3 times fewer) when, on average, their Combined Rating has been +1.29 SS. This is another thing that the AFL schedulers can't do much about, especially when teams perform much better or worse than their previous season's performances would have foreshadowed.
Similarly, you can get a sense of the relative abilities of a team at and away from home against different opponents by scanning across a row. If we review the Dogs' row, for example, we can see that they suffer a Net Venue Effect deficit against 15 of the 17 teams, which suggests that they don't travel well, don't perform at home especially well relative to most opponents, or a combination of both.
I'll finish the analysis of this table by pointing out the major imbalances for each team (excepting GWS and Gold Coast) in terms of pure matchup counts across the 16 seasons:
There are some quite substantial differences in that list, driven in many cases, of course, by the desire for "local derbies" or to perpetuate (or instigate) "traditional rivalries".
Another of my Twitter followers has suggested that he believes a natural "epoch" in the AFL lasts for about eight years so, for the final piece of analysis, I've replicated the previous table but using data only from the home-and-away seasons of 2008 through 2015.
The ordering isn't all that different excepting that, most notably:
No other team's schedule difficulty ranking changes by more than 3 places.
For me, the key conclusion from all this analysis is that, across the last 16 (or even 8) seasons, teams have endured schedules of varying average strength. Not all of this difference can be attributed to a team's generally above- or below-average ability across this period and the AFL's desire to match them with teams of similar talents.
Other factors that contribute to the variability in schedule strength are:
Some of the differences between teams' schedule strengths are clearly material. The gap between the teams at the top and bottom of the list, the Dogs and the Crows, for example, is almost 2 SS per game, or a little over a goal. That's enough to account for a sizeable difference in expected winning rates. But, as I said earlier, that's not entirely due to pure scheduling - some of it comes down to how well or poorly the Dogs and the Crows perform at and away from home.
Such imbalances will never go away, I'd suggest, but identifying and quantifying them is an important component of assessing any team's performance.
Of course all of this analysis is founded on the assumption that the MoSSBODS System does a reasonable job of estimating team strengths and venue effects. If anyone reading this has suggestions for other ways that schedule imbalance might be measured (or even defined), I'd be very keen to hear about them.
They remain, according to the MoS Win Production Function, almost two wins and five ladder positions better off than their Scoring Statistics would suggest.
The other teams whose Scoring Statistics imply win-loss records most different from reality are:
Teams with better than expected actual records
Teams with worse than expected actual records
I note also in passing that Geelong's drawing of their 4th Quarter with Carlton on Sunday has left GWS as the only team with a 100% record for any quarter: they've won all 10 of their 2nd Quarters this season.
Correlation Between Own Scoring Shot Production and the Competition Ladder: +0.87
Biggest Differences
Correlation Between Opponent Scoring Shot Production and the Competition Ladder: +0.86
Biggest Differences
Correlation Between Difference in Own and Opponent Scoring Shot Production and the Competition Ladder: +0.94
Biggest Difference
Correlation Between Own Scoring Shot Conversion and the Competition Ladder: +0.47
Biggest Differences
Correlation Between Opponent Scoring Shot Conversion and the Competition Ladder: +0.26
Biggest Differences
Correlation Between Difference in Own and Opponent Scoring Shot Conversion and the Competition Ladder: +0.58
Biggest Differences
Correlation Between Q1 Performances and the Competition Ladder: +0.87
Biggest Difference
Correlation Between Q2 Performances and the Competition Ladder: +0.80
Biggest Difference
Correlation Between Q3 Performances and the Competition Ladder: +0.65
Biggest Differences
Correlation Between Q4 Performances and the Competition Ladder: +0.74
Biggest Differences
For those of you who might be unfamiliar with the term, a "bogey team" is, loosely speaking, a team that seems to beat your team more often than it should, even when it's the less-fancied of the pair.
That somewhat intuitive description lends itself to a natural, quantifiable measure of what I'll call Bogeyness, which I'll define as the difference between the number of times one team has beaten another and the number of times we might reasonably have expected that to occur.
As an equation then:
Bogeyness Score = Actual Wins - Expected Wins
So, for example, if your team has played another team on four occasions and been equal-favourites on all four, then we'd expect both teams to have won, on average, two games. If your team lost all four then the Bogeyness Score for your team against this opponent would be -2, since your team had won 2 games fewer than expected.
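Expected wins here are simply the sum of the pre-game victory probabilities across all meetings, so the calculation is a one-liner (the function name is my own):

```python
def bogeyness(game_probs, actual_wins):
    """Bogeyness Score = Actual Wins - Expected Wins, where Expected Wins
    is the sum of the team's pre-game victory probabilities across all
    meetings with the opponent."""
    return actual_wins - sum(game_probs)

# The example from the text: four games as equal-favourites, all lost
print(bogeyness([0.5, 0.5, 0.5, 0.5], actual_wins=0))  # -2.0
```

Negative scores mark an opponent that has beaten your team more often than expected; positive scores mark one your team has over-performed against.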
Calculating Bogeyness Scores for all pairs of teams across the period 2000 to the end of Round 8 2016, using MoSSBODS to calculate each team's victory probability in every game, yields the chart below.
In this chart, the size of the dot is proportional to the number of games played between a pair of teams, and the colour of that dot denotes the Bogeyness Score. If we review the results for any given team (ie scan across a row), the dark green dots represent that team's "bogey teams" - the opponents that have won more often than they should have.
We see for Sydney then, for example, that Richmond is something of a "bogey team" but less so than Collingwood and only about as much as Adelaide and Geelong. Sydney themselves, however, are "bogey teams" for Brisbane, Carlton, Port Adelaide and West Coast during the period we're reviewing.
Hawthorn is a "bogey team" for Adelaide, Carlton, Collingwood, Fremantle and Melbourne, but has underperformed, relatively speaking, against Geelong, Port Adelaide and Richmond.
If you support a particular team you'll probably have your own views about which are your bogey teams and whether this chart aligns with your perceptions. I'd be keen to hear your feedback on this.
There are a few things we need to consider when interpreting the Bogeyness Scores shown here.
The first is that it's a mathematical and statistical fact that the absolute Bogeyness Score will tend to increase with the number of games played between two teams, in the same way that the expected number of excess heads over tails in a series of coin tosses increases with the number of tosses. Since most pairs of teams in this chart have played each other a similar number of times across the time period we're considering, that's not such an issue here, except for GWS and the Gold Coast.
The second is that the Bogeyness Scores shown here depend partly on the accuracy of MoSSBODS probability assessments in each game. If MoSSBODS is poorly calibrated then the Scores shown here are likely to be more variable than their "true" values.
Now MoSSBODS has been by no means perfectly calibrated across the 2000 to 2016 period but, as the chart at right reveals, it's not been especially poorly calibrated across any part of the home team probability space either.
In an ideal world the line on this chart would be a 45-degree line passing through the (50%, 50%) point, since such a line would imply that, when MoSSBODS assigned the home team a probability of X% it would win, on average, X% of the time.
We see a few imperfections, such as the fact that home teams assigned a 50% probability tend to win a little over 50% of the time, but the overall picture is of a reasonably well-calibrated probability forecaster.
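For anyone wanting to replicate a calibration check of this kind, a simple binned version might look like the following (the binning scheme is an assumption on my part; the chart at right may have been constructed differently):

```python
from collections import defaultdict

def calibration_curve(probs, outcomes, bin_width=0.1):
    """Group games into pre-game probability bins and return, for each
    non-empty bin, (mean predicted probability, observed win rate, n games)."""
    bins = defaultdict(list)
    n_bins = int(1 / bin_width)
    for p, won in zip(probs, outcomes):
        bins[min(int(p / bin_width), n_bins - 1)].append((p, won))
    curve = []
    for b in sorted(bins):
        games = bins[b]
        mean_p = sum(p for p, _ in games) / len(games)
        win_rate = sum(w for _, w in games) / len(games)
        curve.append((round(mean_p, 3), round(win_rate, 3), len(games)))
    return curve

# Toy data: a well-calibrated forecaster's bins sit near the 45-degree line
probs = [0.31, 0.34, 0.37, 0.71, 0.74, 0.77]
outcomes = [0, 1, 0, 1, 1, 0]
print(calibration_curve(probs, outcomes))
```

Plotting mean predicted probability against observed win rate for each bin yields exactly the sort of chart discussed above.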
So, I think it's reasonable to say that MoSSBODS, overall, is doing a sufficiently good job at estimating team victory probabilities.
That's not to say, of course, that it might not systematically be under- or over-estimating the chances of individual teams or in specific contests, but the difficulty in determining this comes from the challenge of differentiating pre-game miscalculation of a team's chances across a series of games and the genuine underperformance of that team in those games. Put another way, it might truly be the case that a team was, objectively, a 75% chance of winning all 10 contests against some other team, but that it won only 3 because it consistently underperformed in those games.
Lastly, in interpreting the Bogeyness Scores, we should recognise that deviations from expectations are to be expected, and that some of those deviations will be large due solely to chance. In my view, the notion that a team "plays above themselves" or "has a hoodoo" when playing some other team or at a particular venue, across a reasonably long period of history, is a highly suspect notion. It might be the case that, in the short-term, one team's style works particularly well against another, otherwise better-performed team, but it's unlikely that this will be the case over an extended period.
With that in mind, I'd be inclined to consider most of the larger (in absolute terms) Bogeyness Scores here to be partly a reflection of short-term advantages resulting from a beneficial mismatch in styles or a well-suited venue, but mostly a reflection of random variation.
They're still fun to look at though.
What if we take a longer view of history, go right back to 1897, and review the Bogeyness Scores across that expanse of time?
If we do that, we arrive at the chart below, prepared on the same basis as the earlier chart.
I noted earlier that larger absolute differences between actual and expected wins were more likely for pairs of teams that had played more often, and we see the empirical evidence for this fact in the chart above where darker red and darker green points tend only to be larger points, and where smaller points tend towards the yellows and oranges.
One way to control for this is to standardise the Bogeyness Scores by dividing them by the square root of the number of games played between the relevant teams, which is what I've done for this final chart.
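A sketch of that standardisation (function name and figures invented for illustration):

```python
import math

def standardised_bogeyness(actual_wins, expected_wins, games_played):
    """Bogeyness Score divided by the square root of games played,
    to control for the tendency of raw deviations to grow with the
    number of games (as excess heads over tails do with coin tosses)."""
    return (actual_wins - expected_wins) / math.sqrt(games_played)

# The same raw deviation of +3 wins, over very different numbers of games:
print(round(standardised_bogeyness(12, 9, 16), 2))    # 0.75
print(round(standardised_bogeyness(53, 50, 100), 2))  # 0.3
```

On this scale, a +3-win deviation over 16 games registers as far more notable than the same deviation over 100 games, which is the intended effect.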
Taking this view we see that Carlton has consistently over-performed relative to expectations when playing St Kilda, as did Fitzroy against South Melbourne, and South Melbourne against St Kilda.
All the same interpretational caveats apply to these charts as they did to the chart for 2000 to 2016.
MoSSBODS calibration, however, is even less of a concern for this long expanse of history than it was for the 2000 to 2016 timeframe, as evidenced by the chart at right, which represents a model about as empirically well-calibrated as you might wish for.
So, certainly, some teams tend to do better or worse than expected against other teams, even across relatively large expanses of time, but the extent to which that means one team is the other's "bogey team" or that it's merely the natural outcome of a random process, is a topic worthy of further analysis and discussion.
Meantime six of the seven teams filling the bottom spots on the ladder have themselves conceded, on average, at least 100 points a game, with Essendon, the lone exception, falling only five points short of joining the list.
MoS' Win Production Function now has it that, based on the teams' scoring statistics:
A competition ordered by expected number of wins would look like this:
Robert Nguyen was kind enough to link to MoS in this piece on the importance (or otherwise) of defence in AFL (the MoS link is the first one in the article). Thanks Robert.
It's been quite a couple of years for MoS.
Earlier this week I noticed some odd referrer URLs in the traffic to the site, URLs that were blocked from public access but that looked as if they might be coming from the servers of our national broadcaster here in Australia, the Australian Broadcasting Corporation (or the ABC), affectionately referred to as "Aunty".
Mid-week, the source of that initial traffic became clearer when this article by Jack Kerr was posted on the ABC's The Drum website.
It's a little odd being referred to via the phrase "the likes of Matter of Stats" and I can't really say I agree with the main thesis of the article, but mainstream coverage it is, and that's not to be sniffed at.
It's entirely possible that I've broken something with these changes, so I'd be grateful if you let me know of any glitches you encounter. I'm also happy to receive any feedback - good or bad - about the changes generally, and to hear about suggestions you might have for new, different (or less!) content.
MatterOfStats has now been visited by people from 146 different countries, the latest a visit from someone in Saint Barthélemy in late December. There are still though, according to Flag Counter, 95 countries that have yet to visit, amongst them Afghanistan; Andorra; The British Virgin, Christmas, Cook, Faroe, Falkland and Virgin Islands; Guernsey; The Isle of Man; Samoa; Sudan; Swaziland; Togo; Tonga; Tuvalu; Vanuatu; and The Vatican City (go figure). I'm hoping that, sometime during the next 12 months, a visitor from the 150th country will stop by MoS, but the time between new countries has been extending significantly. (For some interesting, somewhat related maths on this, Google the Coupon Collector's Problem.)
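For the curious, the expected number of draws in the Coupon Collector's Problem has a neat closed form - n times the nth harmonic number - which the snippet below computes exactly:

```python
from fractions import Fraction

def expected_draws_to_collect(n):
    """Expected number of uniform random draws needed to see all n
    distinct 'coupons' (here, countries) at least once:
    n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))

print(expected_draws_to_collect(2))          # 3: two equally likely coupons
print(float(expected_draws_to_collect(10)))  # roughly 29.3 draws
```

Under the (unrealistic) assumption that every country is equally likely to supply the next visitor, collecting all 241 countries implied by the post (146 visited plus 95 to go) would take roughly 1,460 country-draws, with most of the wait concentrated in the final few - consistent with the slowdown the site has seen.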
There's now also been at least one visitor from every state in the US (most of all from California), and from 8 of the 13 Canadian provinces and territories, the missing ones being Newfoundland and Labrador, Northwest Territories, Nunavut, Prince Edward Island, and Yukon Territory.
In a few weeks it'll be exactly 10 years since I wrote the first e-mail to a handful of friends, summarising the predictions made by now long-retired The Model. To think that MoS has since then touched, however lightly, on the lives of people in almost 150 different countries frankly amazes me. Thanks for being one of them.
Such observations, while fun to marvel at if only for the ingenuity of their conception and the breadth and volume of the raw data that fuels them, can delude the casual fan into a misunderstanding of the nature of a truly analytic approach. For an observation like this - a statistic or rule - to be useful in an analytic sense, it needs to be reliably related to some outcome measure of interest, such as who wins, by how much, how many points are likely to be scored, and so on. The more tortured and specific an observation, the less likely it is to serve this function.
So it was interesting to me in the latter stages of the 2015 AFL competition to see fans and pundits trotting out statistics about, for example, the comparative success of Minor Premiers, playing at home in Preliminary Finals, as if this information somehow trumped the fact that the revealed form of the year's actual Minor Premiers, Fremantle, suggested that they were a much weaker Minor Premier than just about any we'd seen in recent seasons. Knowing how a particular team has performed in a particular season with mostly the same players will always be more informative about their chances in a given contest than any number of statistics about the performance of other teams in other seasons whose main similarity extends only to where they happened to finish on the ladder.
To talk like a Bayesian for a minute (which, I'll be honest, feels a little fraudulent for me to do, trained as I was almost solely in the Frequentist tradition) it's about building an informed prior based on an appropriate weighting of relevant empirical evidence. Not every relationship derived from history deserves equal weight - some probably don't even deserve a non-zero one.
I've been struggling to come up with a useful way of thinking about these different types of empirical evidence presented as observations and, inspired by the useful distinction highlighted in this paper, landed on the terms that I've used in the title of this blog, namely Predictive Observations and Descriptive Observations.
A Descriptive Observation is one that draws a comparison between some current situation and similar situations from the past - for example, other teams that have finished in the same ladder position or played on the same ground. Similarity can be narrowly or broadly defined here, but is key to determining how informative the Observation is about the current situation. Every Observation is a Descriptive Observation, but they are Predictive Observations too only when the nature and level of the similarity is such that the Observation improves our ability to make predictions about the current situation. A Descriptive Observation can be interesting and apparently unlikely, but still have little or no predictive utility.
Once we've satisfied ourselves that some Observations have predictive value, we can order them based on the degree to which they improve our ability to make predictions. In statistical modelling terms we can think of Observations as variables, and draw a parallel with the notion of variable importance. That line of thinking alerts us to the fact that there is no single measure of importance, and that the notion can be conceptualised in a number of slightly different ways. All the approaches and measures, however, are grounded in the idea that importance relates to how much better we can predict with the variable (Observation) than without it.
Ultimately then, whether or not a Descriptive Observation is also a Predictive one is a purely empirical matter, but there are some heuristics we can employ to help us make a rough assessment of whether or not an Observation is predictive and, if so, how much:
You might have other heuristics that could be added to this list. If so, please let me know and I'll include them. Any general thoughts or feedback is also welcomed.
In the meantime, cast a critical eye over the Descriptive Observations that you encounter and see how many of them you think would qualify as Predictive ones too.
To estimate a probability for each of those outcomes we first need to update the team-versus-team probabilities, which I've done and summarised in the table at right.
(Those of you following these simulations closely might notice that Fremantle's and West Coast's probabilities have not improved this week, despite their victories. That's because I erroneously calculated last week's probabilities using teams' ChiPS instead of MARS Ratings. Because these two sets of Ratings are so highly correlated, this didn't make a huge difference to the simulated outcomes, but it did somewhat overstate the Dockers' and Eagles' chances, and understate the Hawks'. We'd still have found value for the same teams in the TAB markets for the Flag, Making the GF, and Losing in the Prelim, though we'd have found value in the GF Quinella market only for an Eagles v Hawks GF, and not for Freo v Eagles or Hawks v Tigers GFs.)
Using this new probability input matrix which uses, as it should, MARS Ratings, the simulations provide fresh Team Progression probabilities as shown at left. West Coast have now become the simulations' clear favourites for the Flag, a view that is shared by the TAB Bookmaker who now has the Eagles at $2.75. According to the simulations that price represents value, and is the only one that does so in the Flag market.
In the market for Making the GF, West Coast at $1.35, Sydney at $5, and Hawthorn at $2.10 all represent wagers with positive expectations, but only the Hawks provide the minimum 5% edge that I've been using as a threshold for worthiness. Fremantle at $1.90 and Sydney at $2.10 are the only teams offering value in the Losing in the Prelim market.
A West Coast v Hawthorn rematch is still estimated by the simulations as the most-likely GF pairing, though the prospects of a Fremantle v West Coast matchup have almost tripled to have it only slightly less likely. Both of these pairings are currently priced at $2.50 on the TAB and so do not represent value.
Of the remaining seven possible pairings, Hawthorn v Sydney, and Fremantle v Sydney matchups are assessed as being most likely by both the simulations and the TAB, though neither represent value. The five other possibilities are relatively remote and not priced attractively.
Over the 15 years from 2000 to 2014, teams that finished higher on the ladder have defeated teams that finished lower on the ladder:
Combined, that means teams higher on the ladder have won 76% of the time.
So, when you build a statistical model based on this data, it places a heavy emphasis on the designated Home team in each game (ie the team from higher on the ladder). That's reflected in the large intercept term in the binary logit model I've built using the Finals data since 2000, which is:
ln(Prob(Home Wins)/(1 - Prob(Home Wins))) = 9.77 + 0.02754 x Home Team MARS Rating - 0.03642 x Away Team MARS Rating
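Plugging team Ratings into that logit is straightforward; here's a minimal sketch using the coefficients quoted above (the example Ratings are hypothetical):

```python
import math

def home_win_prob(home_mars: float, away_mars: float) -> float:
    """Finals win probability for the designated Home team (the higher
    ladder finisher), using the binary logit fitted to Finals data since
    2000, with the coefficients as quoted in the post."""
    log_odds = 9.77 + 0.02754 * home_mars - 0.03642 * away_mars
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two identically rated teams (MARS 1000 apiece): the large intercept
# alone makes the designated Home team about a 71% favourite.
print(round(home_win_prob(1000, 1000), 3))  # → 0.709
```

Note how, for two identically rated teams, the intercept alone does all the work, which is exactly the model's bias towards higher ladder finishers described above.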
What's curious about this year is that the Minor Premiers, Fremantle, have a MARS Rating that ranks them only fifth amongst their peers. As a result, the model - despite its bias towards designated Home teams, a status that Fremantle will always enjoy - makes them slight underdogs against the Eagles, more substantial underdogs against the Hawks, only slight favourites against Richmond, and comfortable favourites only against the four other Finalists. These and all the other team-versus-team probabilities are shown in the matrix at right, which forms the sole input to the simulation process.
This input probability matrix implies the output probabilities shown in the table at left for each team's progression through the Finals series. It has West Coast and Hawthorn both with about 60% chances of making the Grand Final, and West Coast as slightly more likely to win it. Fremantle, largely because of the modelled benefit of being Minor Premiers, but also because they have the double-chance, are the third-most likely team to play in the Grand Final, and the third-most likely to win it.
West Coast, which at the time of writing is priced at $4.25 for the Flag on the TAB, is the only team representing value in that market. They also represent value at $2.15 in the market for just making the Grand Final, as do Fremantle at $2.50, but with a far less-attractive implied edge (only 3.5%).
Two teams represent value in the market for the losing Preliminary Finalists: Sydney at $2.05, and Richmond at $5.25.
Looking at Grand Final pairings, a West Coast v Hawthorn matchup is comfortably the most likely, carrying a probability of about 36%. The two next-most likely matchups, each with about a 15% probability, are Fremantle v West Coast, and Fremantle v Hawthorn.
At current TAB prices that makes only the following pairings attractive:
As always, the extent to which you'll find these simulations appealing will depend heavily on how consistent your own assessments are with the probabilities shown in the team-versus-team matrix above.
On that front, I feel obliged to point out that, had you followed the wagering suggestions from previous simulations running up to the Finals this season, you'd have recorded a 17.89 unit loss on the 36 wagers recommended. It's really not been a great year for MoS wagering ...
In the latest round of simulations, West Coast finished top 40% of the time, which makes their $2.90 price on the TAB look attractive, though to believe that you need to share, to some extent at least, ChiPS' views about the Dockers' and Eagles' prospects in the two remaining games, which are included in the simulation Input Matrix shown at right.
As you'll note, ChiPS continues to rate Port Adelaide a genuine chance of toppling the current competition leaders in their Round 23 matchup, though it does also rate the Dockers as near-certainties in their Round 22 clash with the Dees. The Eagles, by comparison, are assessed by ChiPS as slight favourites for their Round 22 game against the Crows, but as overwhelming favourites for their Saints matchup the following week.
Only five teams now have significant chances of finishing somewhere in the Top 4, Fremantle, West Coast and Hawthorn all now certainties or nearly so, leaving Sydney and Richmond to quibble over 4th. The simulations have the Tigers claiming that spot only about 1 time in 5, but that's enough to have their $6 price at the TAB carrying a positive expected return.
The Dogs and Roos (and even the Crows) aren't completely without hope of finishing Top 4, but it's a 40/1 shot for the Dogs, and a 100/1 shot for the Roos.
In the battle for a Finals spot, seven of the eight places are all but spoken for, the Crows and Cats now scrapping for 8th with the Crows 4/1 on favourites.
Carlton's win over Melbourne dramatically tilted the odds in the Spoon market in favour of the Lions, now assessed as almost 85% chances of finishing last.
The week's Dinosaur Chart and heat map reveal an accelerated reduction in the number of possible ladder finishes for each team, with only Adelaide, the Kangaroos, Port Adelaide and the Western Bulldogs retaining four viable options.
Two possible orderings now account for over 90% of the simulation's 1-2 finishes, about 57% seeing Fremantle as Minor Premier and West Coast as Runner Up, and another 34% seeing the same two teams filling these positions, but in the opposite order.
Hawthorn sneaks into second in most of the remaining replicates, finishing behind West Coast in 6.5% of replicates, and behind Fremantle in about 3%.
Sydney surprises by finishing second in about 3 replicates in 1,000, and tests the limits of impossibility by finishing first in about 1 replicate in 15,000 to 20,000.
The most common Top 4 from the simulation is Fremantle / West Coast / Hawthorn / Sydney and occurs in over 40% of replicates, while the next-most common, which flips the ordering of the top 2, occurs in about 1 replicate in 4.
Richmond sneaks into fourth in the two next-most common outcomes, these accounting for another 19% of replicates.
After that we see a number of unusual quartets, including one with the Dockers finishing fourth (4% of replicates), and another with the Dogs finishing fourth (1.3% of replicates).
Some meaningfully common Top 8s are now emerging from the simulation process, including one which cropped up in 8% of all replicates and, in total, four - accounting for almost one-quarter of all replicates - which involve the same eight teams in different orders.
It's not until we reach the fifth-most common Top 8 that we find one with a different team, this one including the Cats at the expense of the Crows and appearing in just under 4% of all replicates.
This week we'll explore the interdependencies in the Finals fates of the teams by focussing mainly on 8th and 9th positions, looking firstly at which teams occupy 8th when another occupies 9th.
From this table we read, for example in the first row, that in those replicates where Geelong finishes 9th (which represent 44% of all replicates), 60% of the time it does so with Adelaide finishing 8th, 23% of the time with the Kangaroos finishing 8th, and 16% of the time with the Western Bulldogs taking the last Finals position.
That 60% figure for Geelong's row and Adelaide's column, along with the 93% figure in Adelaide's row and Geelong's column, highlights that the main remaining battle for the final spot in the 8 is between the Cats and the Crows.
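Tables like this are just conditional frequencies taken across the simulation replicates. A sketch of the calculation, using made-up replicate outcomes rather than the actual simulation data:

```python
from collections import Counter

# Hypothetical replicates: (team finishing 9th, team finishing 8th).
replicates = [
    ("Geelong", "Adelaide"), ("Geelong", "Adelaide"), ("Geelong", "Kangaroos"),
    ("Adelaide", "Geelong"), ("Geelong", "Western Bulldogs"),
]

def eighth_given_ninth(replicates, ninth_team):
    """P(each team finishes 8th | ninth_team finishes 9th), as a dict."""
    conditional = [eighth for ninth, eighth in replicates if ninth == ninth_team]
    counts = Counter(conditional)
    return {team: n / len(conditional) for team, n in counts.items()}

print(eighth_given_ninth(replicates, "Geelong"))
# → {'Adelaide': 0.5, 'Kangaroos': 0.25, 'Western Bulldogs': 0.25}
```

Each row of the table in the post is one such conditional distribution, computed over the subset of replicates satisfying the row's condition.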
Expanding our viewpoint just a little, we can also review the 50 most-common orderings from the simulation for ladder positions 5th through 9th, where we find that three of the four most-common orderings, accounting for over one-quarter of all replicates, and five of the eight most-common orderings, accounting for over 40% of all replicates, see Geelong finishing in 9th position. In these, mostly it's Adelaide in 8th, though it is also the Roos in just over 4% of replicates.
In that sense then, no single replicate can purport to be a representation of a variant of season 2015 that we would ever expect to see in real life. That is, I think, an absolutely fair point, so it would be wrong to characterise the outputs in that way.
So then, how should we interpret them and in what, more limited, sense do they tell us something about:
Firstly, let's remind ourselves of what we're doing in the simulation process. In running the simulations we're implicitly assuming that
Historical data allows us to confirm the unbiased nature of the Bookmaker's opinions in 1. and to estimate the size and distribution of the deviations in 2. What that history has shown us is that the deviations are distributed as a Normal random variable with a standard deviation of about 36-38 points. Numerous posts have found that, on balance, this standard deviation is the same for all games with perhaps a few exceptions.
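Under those assumptions, a single game's final margin is just the Bookmaker's expected margin plus Normally distributed noise. A minimal sketch, using a 37-point standard deviation (the midpoint of the quoted 36-38 range):

```python
import random

MARGIN_SD = 37  # points; mid-range of the 36-38 quoted in the post

def simulate_margin(expected_margin: float, rng: random.Random) -> float:
    """One simulated final margin: Bookmaker expectation plus Normal noise."""
    return rng.gauss(expected_margin, MARGIN_SD)

rng = random.Random(42)
margins = [simulate_margin(12.5, rng) for _ in range(100_000)]
print(round(sum(margins) / len(margins), 1))       # close to 12.5
win_rate = sum(m > 0 for m in margins) / len(margins)
print(round(win_rate, 2))  # implied win probability for a 12.5-point favourite
```

The implied win probability for a 12.5-point favourite comes out at around 63%, which is just what treating the deviations as Normal with this standard deviation implies.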
If we agree that this is a reasonable basis on which to model the outcome of a single game then it's fair to say that, for any single replicate, the outcomes of all 170 games are entirely consistent with what the Bookmaker might have expected for each game this season given his pre-game beliefs about the relative strengths of the teams and his understanding of the inherent variability of a game in the AFL. It's true, as we've agreed, that, were the games played in the same sequence as they have been in 2015, he would almost certainly have revised his opinions about the relative strengths of the teams on the basis of earlier results but, taken individually, no single result should be excessively "surprising".
So, I think it's fair to say that each replicate represents a set of outcomes for the 170 games that is entirely consistent with the 2015 TAB Bookmaker's pre-game opinions for each of those games. The question is whether you believe adding the results for each team and treating it as a legitimate replicate of the entire season so far provides any useful insight into the two questions I posed at the start of this blog. That is, I think, mostly a matter of personal preference and I find myself developing cogent arguments for both the defence and the prosecution.
To digress for a moment, as I noted in my reply in the Comments section, there's no doubt that the choice of the standard deviation for the random element of the result of each game has a significant bearing on the variability in team results that we observe across replicates. To get a sense of just how large an effect this assumption has, I've re-run the simulations assuming that the standard deviation of the random component is only half as big (ie 18 points).
Here, firstly, is a comparison of team ladder finishes. We still see some results for some teams that are very different from the current ladder - for example, Essendon are in the Top 8 in 1% of replicates - but these are less common.
Note that the ordering of the teams in terms of average ladder finish is identical in both sets of simulations, these being driven by the relative average Bookmaker-expected margins for each team rather than by the variability of the results around those margins.
The Dinosaur Charts and heat maps of teams' ladder positions and the most common Top 2s and Top 4s tell a similar story - that of a reduction in the variability across replicates.
SIMULATIONS WITH STANDARD DEVIATION OF 36 POINTS
SIMULATIONS WITH STANDARD DEVIATION OF 18 POINTS
The most common Top 8s also occur more frequently with the smaller standard deviation, though even the most common of them still crops up only 0.1% of the time.
All of the 10 most common Top 8s see Hawthorn finishing 1st and Fremantle 2nd, with some two of Port Adelaide, Sydney and West Coast finishing 3rd and 4th. The Roos take 6th exclusively, either Adelaide or Richmond takes 7th, and any of Adelaide, Richmond or Collingwood finishes 8th.
It's always an interesting and useful exercise to think about how best to interpret the results of some simulation or analysis, no less so in the case of the simulations of the 2015 season. My sense from the feedback I've received - from Rob and from others, this having been one of the most popular blog posts on MatterOfStats for a while - is that the contention that the TAB Bookmaker would have expected, say, Port Adelaide to be higher on the ladder now, and the Western Bulldogs lower, is a reasonable one. In that sense then the simulation - or at least the mean outcome for each team - does reflect the 2015 season that the TAB Bookmaker would have expected.
Debate arises more, firstly, at the level of the individual replicate and whether or not each provides a realistic version of a season. In the sense of "likely to be observed in real life", I think many replicates would be deemed "unrealistic", but in the sense of an assemblage of 170 games, each individually plausible and matched to the TAB Bookmaker's pre-game opinions about one of the 170 actual games this season, every replicate is, by design, "realistic".
Finally, it's about the emergent property we call a season: if we assemble 170 realistic (in the latter sense) replicates, one for each game of the real season, is it legitimate to think about that assemblage as a replicate of the 2015 season?
That, I think, is a very interesting question and I very much thank Rob (and others) for making me ponder it. As ever, your comments are welcomed too.
Further, imagine that the approach I've been using to model those in-game random elements to simulate the results of the games in the remainder of the season (as described in this post) is an adequate and faithful way of encapsulating this randomness.
Granted those conceits it's possible to simulate the season so far, using the (negative of the) TAB Bookmaker's pre-game line market handicap for each game as his estimate of the likely outcome, to explore how the competition ladder might look instead right now after 20 rounds, had this Bookmaker been right and had the random elements played out in the different ways they might consistent with the way I've modelled them, across, say, 100,000 simulation replicates.
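Mechanically, a replicate of the season-so-far looks something like the following sketch, in which the fixture, team names and handicaps are all hypothetical stand-ins for the actual 170-game data:

```python
import random
from collections import Counter

# Hypothetical fixture: (home, away, home_handicap). A negative handicap
# means the home team gives start, ie is the line-market favourite, so
# the Bookmaker's expected home margin is the negative of the handicap.
fixture = [
    ("Hawthorn", "Fremantle", -6.5),
    ("Fremantle", "Sydney", -10.5),
    ("Sydney", "Hawthorn", 4.5),
] * 5  # repeated so each pairing meets several times
MARGIN_SD = 37  # points, per the empirical estimate discussed earlier

def simulate_ladder_leader(fixture, rng):
    """One replicate: simulate every game, return the team with most wins."""
    wins = Counter()
    for home, away, handicap in fixture:
        margin = rng.gauss(-handicap, MARGIN_SD)
        wins[home if margin > 0 else away] += 1
    return wins.most_common(1)[0][0]

rng = random.Random(1)
leaders = Counter(simulate_ladder_leader(fixture, rng) for _ in range(10_000))
for team, n in leaders.most_common():
    print(team, n / 10_000)
```

Even in this toy version, the strongest team on the handicaps leads the ladder in only a fraction of replicates, which is the point the full 100,000-replicate exercise makes at scale.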
If we do that we might then describe the outcomes of that exercise in some of the now-familiar outputs from the Finalist simulations - for example the table at right showing the probabilities that, at this point in the season, a team might be lying 1st, in the Top 4, in the Top 8, or last.
It reveals that, in over half of the replicates, Hawthorn would now lead the competition, and that in just 21% would Fremantle do so, as they do in the version of the "simulation" that we're experiencing. Also revealed is the fact that there's a version of reality - one where the results of games were completely consistent with the TAB Bookmaker's views and where randomness had chanced its way onto the stage exactly as we've empirically modelled it - in which the Lions currently lead the competition. Granted, it happened in only 3 of the 100,000 replicates, but it is conceivable, such is the nature and size of the random elements of football when played out across a 170-game canvas.
There are also versions of season 2015 - a bit over 1% of them - where the Dons currently sit in the Top 4 and even more versions (12%) where the Dons sit in the Top 8. And there are plausible if extreme versions of 2015 where the Swans, or even Fremantle, currently sit in 18th.
One of the fascinating things about this exercise, I find, is how it makes you think about the relative contributions of luck and ability to the fortunes of the teams, and how much we tend to downplay the former and focus mainly on the latter.
The other interesting aspect is to look at how far from their expected or most likely ladder position each team currently sits. In this regard, Port Adelaide might be seen as either the most unlucky or the most underperforming of the teams this season, the simulations suggesting their most-likely current position is 3rd despite the fact that they sit 12th on the ladder.
Conversely, the Western Bulldogs might be seen as the luckiest or most overachieving team, their actual ladder position of 4th contrasting with their most-likely ladder of 11th should they have performed in line with TAB Bookmaker expectations.
One commonsense check on the results is to order the teams based on the average TAB handicap that each has faced, and to compare this ordering with the average ordering in the simulations.
Those average handicaps appear in the table at right and show a very similar ordering of the teams. Port Adelaide, for example, sit 5th on this measure having, on average, given their opponents 11.4 points start in the line market, and the Dogs sit 12th having enjoyed, on average, almost a 4 point start.
We can also, of course, create the usual Dinosaur Charts, though the longer-term nature of the simulations we're performing here, looking at 20 rather than, say, 4 or 5 rounds, makes for charts that are more wave-like in nature. One thing that is immediately apparent from this chart is how broad the range of feasible current ladder positions is for every team.
Another perspective on this same observation is provided by the heat map below, the vast pink swathes of which reflect that same broad spectrum of possibilities for almost every team.
In addition to looking at the possibilities for individual teams, we can also consider the most likely Top 2s and Top 4s at this point in the season, details of which are provided below.
Across the 100,000 replicates, about 1 in 5 saw the competition with a Hawthorn / Fremantle 1-2 pairing and about another 1 in 10 saw it with the same pairing but in the opposite order.
Sydney lies second to the Hawks in another 8% of replicates, while Port Adelaide sits Runner Up to the Hawks in another 7%.
None of the 10 most-common pairings includes the Fremantle / West Coast duo that we actually have in the current ladder (though West Coast does sit second behind Hawthorn in about 7% of replicates).
About 1 in 40 replicates had Port Adelaide in first and Hawthorn in second, which is yet more evidence of how different Port Adelaide's actual fate is to that which the TAB Bookmaker's opinions have implied.
All 10 of the most-common Top 4s at this stage of the season from the simulations had the Hawks lying in first, and the eight most-common of those had Fremantle sitting in second behind them.
Randomness being what it is though, and having 20 rounds to play itself out, even the most-common quartet occurred in only just over 1 replicate in 100.
Rampant randomness is even more apparent when we look at the Top 8s churned out by the simulations, no single one of which appeared in more than 8 replicates. So, even if we'd known about the entire 20 rounds at the start of the season what the TAB Bookmaker has come to know about each round just before it's taken place, and even if the competition had panned out exactly as he'd imagined in each round, we'd have had, at best, about a 12,500/1 shot of seeing the Top 8 we most expected right now.
(For the record, that most-common - though it's not at all "common" and only barely "most" - ordering was Hawthorn / Fremantle / Kangaroos / Sydney / West Coast / Port Adelaide / Collingwood / Richmond).
You might, I suppose, consider all this simulation and discussion to be purely academic and esoteric and divorced from the reality of season 2015. But the results here reflect two realities: the beliefs that the TAB Bookmaker held just prior to each round, and the contribution that randomness makes to the outcome of every game of football, as near as I can empirically model it. And that, I think, makes these results, if not profound, then certainly worth reflecting upon.
If the same sets of inputs to a season can yield such a diversity of outputs we should acknowledge that the ladder we see today reflects a mixture of ability and luck - perhaps more of the latter than we sometimes admit - and that the plaudits we shower on the successful and the brickbats we toss at the apparent underachievers might sometimes be more than either deserve.
Fremantle is now rated a 60% chance by ChiPS for the Minor Premiership, an assessment only slightly below that of the TAB, which at $1.30 rates them as between about 70 and 75% chances, depending on the assumption you make about the overround embedded in that price.
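For a single price, the translation from price to probability depends on the total overround you assume for the market. A sketch of that arithmetic under the overround-equalising approach (the overround levels here are illustrative assumptions, not the TAB's actual figures):

```python
def implied_prob(price: float, total_overround: float) -> float:
    """Overround-equalising implied probability for one price, given an
    assumed total market overround (the sum of reciprocal prices)."""
    return (1.0 / price) / total_overround

# A $1.30 Minor Premiership price under a few assumed overround levels
for overround in (1.03, 1.06, 1.10):
    print(round(implied_prob(1.30, overround), 3))
# → 0.747, 0.726, 0.699
```

Hence the roughly 70 to 75% range quoted above: the raw reciprocal of $1.30 is about 77%, and plausible overround assumptions shave it back by varying amounts.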
That's just over a 16% point reduction in Freo's chances, the beneficiary of that reduction being, mostly, the Hawks, whose probability increased by just over 13%. That assessment makes the Hawks value at their current $4.75 price for the Minor Premiership. West Coast also saw its probability for the top spot increase, but by only about 3% points.
In the race for a Top 4 spot, only two teams enhanced their chances last weekend, the Hawks who moved to near-certainties, and the Dogs who moved from 15% to 33% chances. Richmond (down 12%) and Sydney (down 5%) were the weekend's big losers, though Richmond does appear to represent value at $8 for a Top 4 finish.
Adelaide made the largest move in the Final 8 market, their victory seeing their chances lift by over 22% points (much to my relief after my recent Adelaide Advertiser coverage). They currently look value at the TAB's price of $1.28. The Dogs and the Roos also made non-trivial improvements to their Top 8 chances, the Dogs now becoming near-certainties and the Roos almost 88% chances.
Geelong (down 21%), GWS (down 10%), and Collingwood (down 6%), saw the largest falls in their Top 8 chances, though the Giants at $11 seem slightly generously priced.
In the only other market tracked here, the Spoon market, there was a big shake-up, with the Lions' victory over Carlton seeing those two teams now assume roughly equal favouritism for an 18th-placed finish.
ChiPS' input matrix for the 100,000 simulation replicates appears below.
The usual Dinosaur Chart and Heat Map follow and depict the ever-diminishing range of ladder finishes that each team might reasonably entertain. Adelaide, Geelong, GWS, the Kangaroos, Richmond and the Dogs (and maybe Sydney and Collingwood) are the only teams with genuine prospects for four or more different positions, all other teams being limited now to just two or three.
In the latest round of simulations, the most common 1-2 finish saw Fremantle take out the Minor Premiership and Hawthorn finish as Runners Up. That result appeared in about 36% of replicates. Next most common was a Fremantle / West Coast finish, which appeared in just under a quarter of all replicates, slightly more often than a Hawthorn / Fremantle finish, which appeared in about 22% of all replicates.
The Dogs and Swans made very unexpected appearances in 2nd place in a handful of replicates, the most unlikely of which involved a Fremantle / Sydney 1-2 finish and which cropped up in only 2 of the 100,000 replicates.
The two most-common Top 4s both saw Fremantle finish as Minor Premiers and the Swans grab 4th, with 2nd and 3rd taken by the Hawks and the Eagles in one order or the other. These finishes each appeared in about 1 replicate in 8.
The third-most common Top 4, which occurred only slightly less often, saw the Dogs take 4th behind Freo, Hawthorn and West Coast, in that order.
A number of the less-likely Top 4s had Richmond taking 4th spot, though none of these combinations appeared in more than about 1 replicate in 20.
Turning, lastly, to Top 8s, we find that there is still no ordering with a probability exceeding 2%, and six different orderings with probabilities exceeding 1%. Fremantle finishes top in six of the 10 most-common orderings, Hawthorn 2nd in six, West Coast 3rd in six, Sydney 4th in seven, the Bulldogs 5th in five, Richmond 6th in seven, the Roos 7th in eight, and Adelaide 8th in eight.
Again this week we look, firstly, at inter-team dependencies by inspecting the simulation replicates to see how often Team A makes the Finals depending on whether Team B makes or misses the Finals.
One curiosity of this week's analysis is that the Crows' chances of making the Final 8 are actually enhanced by the Pies' making the Final 8 because, to do so, the Pies would need to defeat both Geelong and Richmond. In net terms, that turns out to be beneficial to the Crows, though it's so very unlikely that it really is mostly of academic interest.
In reality, Adelaide's chances are far more bound up with Geelong's, the Crows' prospects falling to 45% should the Cats sneak in.
Geelong's fate is about equally tied to Adelaide's, the Kangaroos', Richmond's and the Dogs', the Cats' probability climbing should any of them miss out. Realistically though, the Cats might only hope to take the Crows' or the Roos' spot in the Final 8.
GWS's hopes are most linked to the Roos', and the Roos are most at risk from Geelong and GWS amongst the Finals aspirants with non-trivial chances of making and missing the Finals.
Lastly, we'll look at the extent to which each of the 27 remaining games might affect the composition of the Final 8, using the methodology I first described in this post from a few weeks back and then refined by adding a Weighted Impact Index in this post from a few weeks later.
In the comments on last week's Finals simulations blog Nick raised a possible anomaly in the results, which got me to thinking about the sampling variability of the Raw and Weighted Impact Indexes. They're based on 10,000 simulations - my script not yet optimised sufficiently to allow me to run many more than this in a sensible timeframe - so for games where the home team is, say a 10% chance, the estimate of a team's chances of making the Final 8 contingent on the home team winning, is likely to be based on about 1,000 simulations, the standard error for which could be as high as 1.5%. As the home team's probability moves nearer 50% this standard error will decrease, but could still be as much as half that amount. In the context of assessing the impacts of single games on individual teams, that's a non-trivial standard error.
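Those standard errors follow from the usual binomial formula; a quick sketch of the arithmetic:

```python
import math

def binomial_se(p: float, n: int) -> float:
    """Standard error of a probability estimated from n simulation replicates."""
    return math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) with ~1,000 conditioning replicates: about 1.6% points
print(round(binomial_se(0.5, 1_000), 4))  # → 0.0158
# With ~5,000 conditioning replicates (home probability nearer 50%): about half
print(round(binomial_se(0.5, 5_000), 4))  # → 0.0071
```

The driver is the conditioning: conditioning on a 10% event leaves only about a tenth of the replicates, so the denominator n shrinks and the standard error grows accordingly.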
Accordingly, this week, rather than ranking each of the remaining 27 games, I've instead included a broader assessment of each game's impact, rating them as either Low, Medium, High or Very High. Whilst sampling error might bounce a single game's ranking around, it's less likely to change its impact rating on this 4-point scale.
So, here are the ratings:
(Note that I've removed any reference to the Raw Impact Index now. For reasons that I touched on in earlier blogs, it's an inferior measure, the more so now when I think about the effects of sampling error.)
Four games then have been assessed as having Very High impact. These are the games where flipping the result from a home win to a home loss has the greatest aggregate absolute impact on each of the Finals aspirants' chances, adjusting the raw absolute impacts to account for the relative likelihood of a home win versus a home loss.
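As a rough sketch of that calculation: the raw impact aggregates the absolute swings in each Finals aspirant's Top 8 probability from flipping the game's result, and a likelihood adjustment down-weights games whose results are near-foregone conclusions. Treat the 2 x p x (1 - p) weighting below as an illustrative assumption only - the actual Weighted Impact Index is defined in the earlier posts linked above - and the probabilities are made up:

```python
def weighted_impact(p_home_win, prob_top8_if_win, prob_top8_if_loss):
    """Illustrative game-impact measure: aggregate absolute swing in each
    Finals aspirant's Top 8 probability from flipping the game's result,
    down-weighted when one result is much more likely than the other.
    The 2*p*(1-p) weighting is an assumption for illustration only."""
    raw = sum(abs(prob_top8_if_win[t] - prob_top8_if_loss[t])
              for t in prob_top8_if_win)
    return raw * 2 * p_home_win * (1 - p_home_win)

# Hypothetical conditional Top 8 probabilities for two Finals aspirants
if_win = {"Adelaide": 0.85, "Geelong": 0.30}
if_loss = {"Adelaide": 0.45, "Geelong": 0.60}
print(round(weighted_impact(0.5, if_win, if_loss), 3))  # → 0.35
```

A 50:50 game with these swings scores the maximum weight; the same swings in a 90:10 game would score far less, which is the sense in which the adjustment accounts for the relative likelihood of a home win versus a home loss.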
This week we've only one game assessed as having Very High impact, the Saints v Cats game on Saturday. We've also two games assessed as having High impact (GWS v Sydney, and the Kangaroos v Fremantle) and two as having Medium impact.
Next week has two games rated as Very High impact and one as High impact, while the final round has one Very High and two High impact games. Every other game in Rounds 22 and 23, however, is rated Low impact so, on balance, I think it's fair to say that Round 21 is likely to have the greatest overall influence on the composition of the Final 8.
Thanks to Scott Walsh and the team for giving me the opportunity to talk about analytics to a broad audience (and who would ever have thought those Dinosaur Charts would walk the pages of a major metro newspaper?)
According to the latest simulations, which are based on the team-versus-team probabilities shown at right, Freo are now about 77% chances for the Minor Premiership, with the Hawks about 12% chances, and the Eagles 11% chances.
No doubt some would quibble about a few of the entries in that matrix, but they flow directly from the current ChiPS Team Ratings, and the assumptions the ChiPS System has been making all season about each team's Home Ground Advantage (HGA) at each venue and the benefits of playing an out-of-state team at home. And, it's fair to say, the predictors derived from the ChiPS System, C_Marg and C_Prob, have been having a stellar year in predicting results and assessing probabilities, as you can see from the latest Tipster Dashboard.
That said, I do think there's room to improve the HGA and Interstate adjustment factors next season by allowing them to adjust dynamically during the course of a season, rather than stay static throughout the entire season as they do now. A statistical modeller's job is never done ...
Anyway, those assessments render the latest TAB Dockers' and Eagles' Minor Premiership prices slightly unprofitable - suggesting we're not all that far away from the TAB bookmaker's opinions about these two teams - but also imply that the Hawks at $10 for the Minor Premiership might still represent a little value.
The simulations also have the Dogs and the Tigers finishing in the Top 4 often enough to make their $4 and $8 prices seem value in the Top 4 market, the Dogs now assessed as about 29% chances for the Top 4 and the Tigers 15% chances.
No team now offers significant value in the Top 8 market however, the Crows showing a small positive expectation, though one too small to make wagering advisable (if it ever is). The Miss the Top 8 market has now disappeared from the TAB, so no comments about value opportunities in that market are possible.
There's also still no market on the TAB for the Spoon though, should there have been one, the Lions would have firmed significantly after last weekend. The simulations now have them as about 81% chances for the Spoon, up from just under 60% at the end of the previous round. The Blues are now about 4/1 chances, and the Suns about 100/1 chances.
Time for the MoS Dinosaur charts, which this week show yet further consolidation in teams' possible futures, with only a handful still capable of finishing in any more than three ladder positions with non-trivial probabilities.
Adelaide, Geelong, GWS, the Kangaroos, Richmond, and the Western Bulldogs remain conspicuous by the range of plausible possible outcomes that await them, dependent on their own and their opponents' abilities over the remainder of the home-and-away season.
The latest heat map shows even more clearly the narrowing of most teams' most-likely finishes, and the relative uncertainty surrounding positions 5 through 10.
The latest simulations have a Dockers/Hawks 1-2 finish as a little over a 50:50 proposition, and a Dockers/Eagles finish as about a 3/1 shot. After that, a Hawks/Dockers finish arises about 11% of the time, and an Eagles/Hawks finish about 8% of the time.
It's not until we reach the 7th most-likely pairing that we find a Swan in the mix, that pairing having Freo in 1st and the Swans in 2nd, but appearing in less than 0.1% of the replicates.
Fairly obviously then it's almost certainly going to be some two of Fremantle, Hawthorn and West Coast finishing as Minor Premiers and Runners-Up.
Amongst the most common Top 4s emerging from the simulations, eight of the top 10 see Fremantle finishing first, and one of the remaining two has the Dockers finishing second.
Most common of all is a Freo/Hawks/Dogs/Swans finish, which arose in about 19% of all replicates - about 50% more often than a Freo/Eagles/Hawks/Swans finish, or a Freo/Hawks/Eagles/Dogs finish.
No quartet after those appeared in more than 7% of all simulation replicates.
Turning next to Top 8s, and again acknowledging that none of the sets shown here carry probabilities above 1%, we find that the single most-common Top 8 is Freo/Hawks/Eagles/Swans/Dogs/Tigers/Roos/Crows. That combination cropped up in about 0.9% of the simulation replicates.
Further down the list of most-common Finalists, the Cats make an appearance, they finishing 8th in the 7th-most common Top 8, nudging out the Crows, who the Cats play in Round 23.
It's instructive, I think, to understand that no combination of teams for the Finals yet carries a probability that makes it better than a 100/1 prospect.
As is now custom, we'll first look at how one Finals aspirant's fate is dependent on another's, by reviewing a subset of the simulation replicates to estimate how likely it is that Team A makes the Finals conditional on Team B making or missing the Finals.
We see this week that Adelaide's fate appears very much tied to Geelong's, the difference between the Crows' chances of making the 8 if the Cats do or don't now being 44% points. Similarly, Geelong's chances seem most tied to Adelaide's, their estimated probability of playing in September moving by 35% points depending on whether the Crows do or don't make the 8.
Collingwood's hopes of making the Finals, slim as they are, seem most tied to Richmond's, GWS' to Adelaide's, the Kangaroos' to Port Adelaide's, and Richmond's to Collingwood's. The Western Bulldogs' chances are largely impervious to the fates of the other Finals aspirants.
Lastly, we'll assess the importance of each of the 36 remaining games on the inclusion or exclusion of teams from the Final 8. As we have for the past few weeks, we'll provide both a raw and a weighted view of the importance of each remaining game, though we'll restrict our commentary to the latter view for reasons explained in that earlier blog post (which essentially amounted to a claim that they provided a more useful assessment).
Using that weighted view we find that the most important remaining contest is the Cats v Pies game in Round 22, which is slightly more important than the Saints v Cats game in Round 21, and the Cats v Crows game in Round 23. Other games in the Top half-dozen are this week's Dons v Crows matchup, the Round 22 Crows v Eagles matchup, and next week's Giants v Swans matchup.
If we review each of the remaining rounds in terms of how many of the Top 10 games it contains, we now find that:
I've completed that task tonight and it's one of those things that, like writing, is far better when done than while doing. As far as I can tell all of the links now work; please let me know if you find otherwise. (Probably the main reason you might venture into the archive with intent would be to access the 2008 R2 Newsletter in which the MARS System is introduced, but there are some other fun pieces in there if you care to poke around.)
It's been a while since I updated everyone about MoS' drive to span the globe so let me now reveal that, as of today, people from 134 countries have visited the site. Their flags all appear in the map below. A couple of the more recent countries checked off the list have been Namibia and Barbados.
Within the United States, MoS has now had visitors from 50 of the 51 States, Idaho being the lone standout. I'm not sure what I need to do to attract traffic from that State but I am open to reasonable, probably non-potato themed suggestions.
Further north, conquering the provinces and territories of Canada continues to be a significant challenge. Five remain stubbornly non-traffic providing: Newfoundland and Labrador, Northwest Territories, Nunavut, Prince Edward Island, and Yukon Territory. I think I'm going to need a dedicated MoS fan to go there on a road trip.
Better yet, if you're planning overseas holidays anytime in the near future, see if you can add one or more of the locations in the list below to your trip and then visit MoS while you're there. This list contains the names of the 107 countries from which, according to Flag Counter, MoS has yet to record traffic.
What I especially love about the piece is its recognition of the burgeoning AFL analytics community in Oz, within which I hope to be seen as a genuine participant in due course. As a Sydneyite, I know that an extended probationary period is required before earning that mantle.
Thanks to everyone from that community, all of whom continue to define a discipline that future generations will, I trust, recognise as the genesis for something much broader and more profound.
Recently I was thinking about the best way of modelling a batsman's final score in a completed Test innings when I came across a piece by Brendon Brewer from the University of New South Wales on just this topic. He surmises that, for most batsmen, each Test innings plays out over two phases: an initial phase, before the batsman is said to be "set" and during which he is more susceptible to dismissal, and a second and final phase during which his probability of dismissal slowly declines as each run is scored and asymptotes towards some fixed value.
To model this situation he derives a Hazard function, which quantifies the probability that a batsman is dismissed on a particular score conditional on the fact that he has attained that score. (Note that we're excluding not out scores in all our analyses or, put another way, we're considering only completed innings.)
The function has four parameters:
Brewer goes on to empirically estimate the parameters of this model for a number of Test batsmen. For my purposes here I'm going to use the values he calculated for Brian Lara, which I've replicated at right and then use to create a chart showing what they imply for the probability of dismissal at any given score.
The two-phase nature of what we're simulating is made clear by the chart, and the L value of 2.8 makes the transition from Phase 1 to Phase 2 somewhat abrupt (an L equal to 0 would make the transition a step-function, which is as abrupt as it can be).
A batsman with the parameters shown here would be expected to average 49.4 runs per completed innings, a little below Brian Lara's actual average of 52.9, though that average included uncompleted (ie not out) innings.
We're now going to take those parameters and the Score distribution they imply, and use them to model 20 careers, each of 150 completed innings in length, and assuming that there is no innings-to-innings correlation in a batsman's scoring. In other words we'll exclude the possibility of form slumps and peaks, which might cause the Score distribution to vary from one innings to the next. (The existence of "form" is a testable claim, and one I might return to in a later blog.)
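For anyone curious to replicate this kind of exercise, the simulation can be sketched directly from a two-phase hazard function of the kind Brewer describes: the effective average rises from an early-innings value towards a settled one, and the dismissal probability at each score follows from it. The parameter values below are illustrative stand-ins rather than Brewer's actual estimates for Lara (though L is set to the 2.8 quoted above), and runs are assumed to accrue one at a time, which is a simplification.

```python
import math
import random

def simulate_innings(mu1, mu2, L, rng):
    """Simulate one completed innings under a two-phase hazard model.
    The batsman is dismissed on score x with probability 1/(mu(x)+1),
    where mu(x) = mu2 + (mu1 - mu2) * exp(-x / L) is the effective
    average: mu1 early in the innings, asymptoting to mu2 once set."""
    score = 0
    while True:
        mu = mu2 + (mu1 - mu2) * math.exp(-score / L)
        if rng.random() < 1.0 / (mu + 1.0):
            return score
        score += 1  # simplification: runs accrue one at a time

def simulate_career(n_innings, mu1, mu2, L, rng):
    """Summarise a career of n_innings completed innings."""
    scores = [simulate_innings(mu1, mu2, L, rng) for _ in range(n_innings)]
    return {
        "average": sum(scores) / n_innings,
        "centuries": sum(s >= 100 for s in scores),
        "double_centuries": sum(s >= 200 for s in scores),
        "ducks": scores.count(0),
    }

rng = random.Random(42)
# Illustrative parameters only: vulnerable early (mu1 = 15), settling
# towards a higher effective average (mu2 = 52), transition scale L = 2.8.
careers = [simulate_career(150, 15.0, 52.0, 2.8, rng) for _ in range(20)]
for i, c in enumerate(careers, 1):
    print(f"Batsman #{i:2d}: avg {c['average']:5.1f}, "
          f"100s {c['centuries']:2d}, 200s {c['double_centuries']}, "
          f"ducks {c['ducks']}")
```

Even with identical parameters for every batsman, the career averages printed by a run of this sketch spread over a surprisingly wide range, which is the point of the exercise that follows.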
Here are the results:
Recognise that these are the careers for 20 batsmen of identical ability, and that any one of those careers is just as likely to have transpired as any other.
Batsman #2 becomes a legend of the game, with an average over 60 as well as 6 scores over 200 and 33 centuries. During his entire career, the longest he goes without a score of 50 or more is just 6 innings. He converts into centuries almost 50% of the time he reaches 50.
Contrast this with Batsman #12 who, truly, never realises his actual potential, finishing his lengthy career (if it were allowed to go that long) with an average of just over 38, no double hundreds, just 12 centuries, and 14 ducks. Much of his (imaginary) career was no doubt spent defending his inability to convert 50s into 100s since he achieved this only 27% of the time. He also endured a period in his career when he went 11 innings without a score over 50.
Batsman #19, however, had a more dramatic "form slump", going 15 innings without a score over 50. Still, his 21 centuries and 5 double centuries ensured his somewhat elevated place in cricket history with a career average of 46.3 runs per completed innings.
As you cast your eyes across the other careers you'll spot other archetypes - Batsmen #13 and #18, who are known for converting 50s into 100s, often big ones; Batsman #14, who rarely goes cheaply but only once has converted a century into a double; and Batsman #17, who's good for 50 but only converts about 40% of the time - though he has reeled off those 5 superlative double tons.
Now there's nothing contrived about the results I've presented here. They are the genuine results of a single simulation of 20 careers. The combined average of the 20 careers is 49.3 runs per completed innings, which is entirely consistent with the long-run expectation of 49.4 runs per completed innings.
What they show is the extent of the natural variability in scoring amongst players of equal proficiency, even across a relatively long (in cricketing terms) career. Shorter careers would exhibit even greater variability.
It would be interesting, in fact, to take this same model and overlay some simple selectorial rules that might serve to truncate careers. For example, we might assume that a player making X scores below Y in Z innings would be dropped for a fixed number of games or dropped permanently, thereby shortening the number of innings over which his entire career plays out. Other stochastic elements could also be added, such as the number of games missed due to a "form slump".
None of this analysis diminishes the legacy of the legends of the game. Players with higher career batting averages are statistically more likely to have greater batting prowess than players with lower averages. But the variability we've witnessed here suggests that there will be exceptions, in both directions - players whose averages substantially understate and players whose averages substantially overstate their true underlying ability.
I think sometimes we'd do well to recognise that the scores we're witnessing might not - or might not only - indicate a form slump or a "golden period", but instead a momentary run of landing on the right or wrong side of random fluctuation.
Under the Search By Category heading are links to each of the 17 different blog categories. Clicking on any of these links will retrieve a listing of all blog entries noted as belonging to that category. Under the Search By Tag heading is a single link to a page that contains a tag cloud. Clicking on any link on that page will bring up a list of blog entries associated with that tag.
I think I've done a better job categorising than tagging entries, but I'd appreciate any feedback on misclassifications, missed classifications, mistaggings or missed taggings.
Also, please let me know what you think about the site - good or bad.
Since last I posted about MoS visits from around the globe back in June, people from 18 more countries have found their way to the site. Sixteen of those countries I can readily identify and have listed below (the other two have escaped even a flag-by-flag comparison with the previous inventory and there's only so long I'm prepared to play the international version of Spot the Difference).
The new countries that I can identify are:
Flags of the 118 nations that have visited MoS
According to the Flag Counter website, 123 countries remain unaccounted for in MoS' traffic data.
Turning next to individual countries, MoS has also had visitors from five more US states since mid-June: Alaska, Kansas, New Mexico, Oklahoma and South Dakota. That leaves just five other for MoS to collect to complete the set: Idaho, Montana, North Dakota, Vermont and West Virginia.
Canada, in contrast, has resisted further MoS incursion, leaving seven provinces MoS-free: New Brunswick, Newfoundland and Labrador, Northwest Territories, Nunavut, Prince Edward Island, Saskatchewan and Yukon Territory.
Of the three AFL statisticians they spoke to about tomorrow's Grand Final, I'm alone in tipping Sydney. But, one Swan or Hawk does not a Winter make (or something).
Anyway, here it is:
One thing that I found interesting about the three approaches described in the piece was that none of us appears to include any player information in our modelling and all of us have some notion of Team Rating and recognise the importance of venue. Convergent validity perhaps?
(Particular thanks to Nick Evershed from the Guardian for the opportunity.)
I've also come across other sites valuable to the dedicated AFL soothsayer or historian:
If you're aware of additional sites that you think are worthy of inclusion on this list or you're the owner of any of the sites listed above and you'd like me to amend the listing (say to include your contact details), please let me know.
I reckon I've had days like that too.
On a more positive note, I like the optimism behind these two search strings, both of which also resulted in MoS visits:
MoS might have provided some relevant information to that first searcher, but I suspect the second left the site none the wiser. I do hope he or she found an answer somewhere else.
(For any overseas visitors who missed the allusion in the title of this blog, see this YouTube video and this webpage.)
That 100th country's visitor, by the way, was from Armenia and, though I'd like to think otherwise, I doubt that he or she stayed for long.
The map of the 100 countries from which the site has had visitors so far does a good job, I think, of depicting the global reach that MatterOfStats has apparently had.
Don't get me wrong, I recognise that a fair portion of that traffic has been accidental and fleeting rather than intended and enthralled - after all, almost one half of the countries represented have visited fewer than 5 times - but the fact that people from over 50 nations have spent a little more than a little time on MatterOfStats is still enough to make me pause for a moment.
There are though, still portions of the addressable globe (as marketers love to say) that I can legitimately hope to virtually traipse. For example, MatterOfStats is yet to impinge on 10 US States.
That means of course that it has already infiltrated 41 States, in so doing generating over 1,000 visits, in particular from California, New Jersey, Texas, Virginia, Washington, New York, Missouri, Massachusetts, Pennsylvania, Florida and Illinois, all States for which the number of visits has been 20 or more.
Given the volume of traffic from those first two States, California and New Jersey, I can only imagine that some ex-pat Australians have discovered the site and spent some time here. If that's you and you're reading this, hi and please feel free to e-mail me.
I've a more difficult job envisioning deliberative visits from Alabama, Iowa, Hawaii, Delaware or Arkansas, but I'm ready to be proved unimaginative if any of the visitors from those States is willing to set me straight.
Canada is another English-speaking nation where MatterOfStats' coverage has been significant but not complete.
Of the 13 Provinces and Territories, 6 are represented amongst the visitors to MatterOfStats, most prominently Ontario and British Columbia.
That means, of course, that 7 are missing, so if you've any friends or colleagues who live there or are visiting, please have them drop by MatterOfStats so I can tick them off the list. While ever Chi remains the MatterOfStats mascot it seems particularly remiss not to have touched on the lives of Labrador residents.
The table below provides the visit statistics for the 2014 calendar year to 21 May, and includes pages with at least 20 visits in that period (and excludes any of the more ephemeral weekly posts from the Wagers and Tips and the Team Dashboard journals, even if they've attracted that many visits).
Atop that table is one of the first posts from 2014, in which I described some of the things that have surprised me about the analyses of AFL I've conducted over the past 8 years. That blog entry has been visited over 400 times by an estimated 351 unique viewers, each spending, on average, over 3 minutes on the page. Less than 50% of people who have come to the page have 'bounced', that is, immediately exited the site, and only 45% have not gone on to visit some other page on MatterOfStats. When I wrote that post I'd have given you long odds that it would become popular.
The Pythagorean Expectation page has also attracted considerable traffic - about 26 pageviews a week since it went up. Andrew's posts on Game Statistics have also proven popular, attracting well over 100 pageviews each.
If you'd like to visit any of the pages you see listed in the table, clickable links for each are provided after the table.
matterofstats.com/mafl-stats-journal/2014/2/16/pythagorean-expectation-for-vflafl-and-the-nrl
matterofstats.com/mafl-stats-journal/2013/10/13/building-your-own-team-rating-system.html
matterofstats.com/mafl-stats-journal/2014/2/12/home-team-and-away-team-scores-across-vflafl-history
matterofstats.com/mafl-stats-journal/2014/1/19/explaining-variability-in-game-margins
matterofstats.com/mafl-stats-journal/2014/2/22/do-favourites-kick-straighter-than-underdogs
matterofstats.com/mafl-stats-journal/2013/6/16/game-statistics-and-the-dream-team.html
matterofstats.com/mafl-stats-journal/2014/1/17/set-of-games-ratings-all-teams-charts
matterofstats.com/mafl-stats-journal/2013/11/25/a-very-simple-team-ratings-system.html
matterofstats.com/mafl-stats-journal/2014/1/7/introducing-chips
matterofstats.com/mafl-stats-journal/2014/3/27/presentation-to-the-sydney-users-of-r-forum-surf-2014
matterofstats.com/mafl-stats-journal/2013/12/8/optimising-the-very-simple-rating-system-vsrs.html
matterofstats.com/mafl-stats-journal/2013/1/9/measuring-bookmaker-calibration-errors.html
matterofstats.com/mafl-stats-journal/2013/12/30/season-optimised-team-ratings
matterofstats.com/mafl-stats-journal/2014/2/23/modelling-miscalibration
matterofstats.com/mafl-stats-journal/2014/1/11/the-dynamics-of-chips-ratings-2000-to-2013
matterofstats.com/mafl-stats-journal/2014/4/6/vflafl-final-scores-weve-never-seen
matterofstats.com/mafl-stats-journal/2014/1/17/sogr-vsrs-analysis
matterofstats.com/mafl-stats-journal/2014/4/29/sources-of-surprisal-2006-to-2014-round-6
matterofstats.com/mafl-stats-journal/2013/5/4/on-the-randomness-of-final-afl-scores.html
matterofstats.com/mafl-stats-journal/2013/9/20/bookmaker-overround-a-general-framework.html
matterofstats.com/mafl-stats-journal/2013/11/24/estimating-home-ground-advantage-by-venue.html
matterofstats.com/mafl-stats-journal/2014/5/1/how-different-is-cmarg-really
matterofstats.com/mafl-stats-journal/2010/4/12/goalkicking-accuracy-across-the-seasons.html
matterofstats.com/mafl-stats-journal/2013/2/6/yet-another-look-at-bookmaker-overround.html
General Probability & Statistics
History and FAQ Pages
Yesterday, a little after 3pm Sydney time, I watched while MAFL's first and only mascot, Chi the Absurdly Determined, lying blissfully on my wife's lap in our vet's surgery, a study in zen-like non-attachment, went limp and then peacefully disappeared forever from our lives.
As some of you know, the last few months of Chi's life have been especially difficult, probably more so for us than for him, as his bingo card of ailments and pharmacopoeian list of treatments grew more numerous, the best of 21st century veterinarian medicine pitted against, amongst other things, his murmurous heart, his failing eyesight, his arthritic body and his demented brain (though, truth be told, he wasn't doing much with that organ anyway). Our vet likened the situation to propping up a tent from the inside using only your hands while it's collapsing at multiple, random places, all around you.
It was, I truly know - and as my amazing wife and our amazing vet knew - the time to let go, or as near to that time as anyone could reasonably estimate but, regardless, it's hard to quell an uneasy feeling of betrayal; he made it so difficult.
Even early on his last morning he was, in his own mind, purposeful in heading towards the kitchen, backdoor, water bowl - who knew, given that his chosen path involved a plainly unnecessary detour via the TV in the lounge room. And then, throughout the remainder of the day he was as keen of appetite as ever, finishing most of a breakfast of eggs and ham, devouring the few, small coveted pieces of banana that were offered him, and then contentedly melting into whatever lap was available. If he knew nothing else on his last day, it was that he was loved.
One small comfort is that we never got to the point where all that was left of Chi forever spoiled our memories of all that he was.
And now, he's no more, and I feel sharply aware that, like the as-beloved pets before him, he's destined to become a blurry memory. He does have one thing in his favour though: he's easily the most photographed pet we've ever welcomed into our home. Should anyone ever decide to do a retrospective on him they'll suffer from no shortage of contemporaneous material.
It's cliched, I know, but if you have a pet - something, someone you love - be mindful in their presence and be thankful for the brief time you'll share together on the planet.
See you mate - and thanks.
It's a quick and easy way to see what, if anything, is new on the site and includes all posts, no matter which journal they appear in.
(With thanks to Debs for doing all the work necessary to make this happen.)
The TAB Bookmaker, I've discovered to precisely no-one's surprise, does an inordinately good job of assessing teams' chances, as evidenced by the high level of what's known as "calibration" that his head-to-head forecasts exhibit. A well-calibrated forecaster is one for whom outcomes rated as 70% chances by him or her transpire about 70% of the time and, more generally, outcomes that he or she rates as X% chances occur about X% of the time. The TAB Bookmaker is exceptionally well-calibrated or, put another way and in the context of more than just head-to-head predictions, he's a very good estimator of the distributions of the random variables that describe different aspects of the outcomes in a football game.
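For readers curious how calibration is measured in practice, a minimal sketch: bin the forecast probabilities, then compare each bin's average forecast with the observed frequency of the forecast outcome. The forecasts and outcomes below are synthetic, generated to be well-calibrated by construction; real bookmaker-implied probabilities and results would take their place.

```python
import random
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecast probabilities into n_bins equal-width bins and
    compare each bin's mean forecast with the observed win frequency."""
    bins = defaultdict(list)
    for p, won in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, won))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        obs = sum(w for _, w in pairs) / len(pairs)
        table.append((mean_p, obs, len(pairs)))
    return table

# Synthetic example: outcomes drawn at exactly the stated rates, so the
# "forecaster" here is calibrated by construction.
rng = random.Random(1)
probs = [rng.random() for _ in range(5000)]
results = [rng.random() < p for p in probs]
for mean_p, obs, n in calibration_table(probs, results):
    print(f"forecast ~{mean_p:.2f}: observed {obs:.2f} over {n} games")
```

For a well-calibrated forecaster, each printed row shows an observed frequency close to the bin's mean forecast; systematic gaps between the two columns are what miscalibration looks like.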
But even he can't predict the outcome of every game or every aspect of a game, only estimate the relative likelihoods of potential outcomes, and if anyone has a financial interest in being able to foretell the future, it's a commercial bookmaker. Granted, it isn't especially profound of me to claim that the outcome of a sporting contest can't be known with certainty beforehand, even by someone with the data and motivation to do so if it were possible, but the extent to which the outcome appears to be determined "on the day" is particularly astonishing to me.
Consider the following statistics, based on the performance of the TAB Bookmaker, which I've observed and analysed over these seven years:
From a gambling point of view these observations are of no special import. Indeed, the Bookmaker's prices incorporate the observed variability since, if they did not, then wagering on, for example, favourites would be ridiculously lucrative. The TAB Bookmaker acknowledges the variability in outcomes and prices accordingly. This suggests that even he recognises that a huge proportion of the result of any single game is due to things that were unforeseeable before the contest began, events or talents that were only observed "on the day".
How much of this unpredictability is generally down to the superior skills of the team that eventually won and how much is, instead, due to essentially random factors - or, at least, factors outside the control of the two teams - is an interesting question to ponder.
As a theoretical exercise, consider the same game being replayed say a thousand times with each of the teams exhibiting identical skill levels in each game (whatever that means), so that the only cause of differences in the outcome, if any, are attributable to pure chance. Would all one thousand contests produce the same outcome? If not, how much variability would there be in the result across the thousand replicates?
The very fact that the TAB Bookmaker is able to allow for the variability in outcomes of real games in his pricing is, to me, telling in itself. Surely if the causes of deviation from the expected result were specific to "on the day" but not truly random factors they'd not exhibit sufficient order to permit the Bookmaker to make such allowances in his prices. It's only because the random factors are indeterminate for a particular game but determinate, on average, across a sufficiently large number of games, that he can perform such a feat. That's indicative of results being "drawn" from a "distribution", surely.
In essence it comes down to whether or not you believe there is inherent, irreducible randomness in a sporting outcome that's not a consequence of the teams' relative skills on the day and, if there is, how much it contributes to the final outcome of a game and, moreover, to a season.
Sports commentators and, I'd argue, most sports fans, seem to premise their discussion of sporting results assuming that, a few bad umpiring decisions or blatantly obvious "bounces of the ball" aside, the outcome was somehow the "right" one, that the "better team won on the day" - in short that, in the normal course, randomness does not play dice with the footballing universe.
I'm not so sure.
Why I think it matters is because it goes to how we should treat our champion teams. By convincing ourselves that their success was, in the main, their own doing, we feel justified in celebrating their achievements as deserved reward for superior skill - and, as a consequence, in treating the team they defeated as less worthy if nonetheless valiant. But what if that's equivalent to congratulating the roulette player who picks "red" and doubles her money while deriding the "sucker" who chose "black" and lost it all? Or, to pick an example that's not so obviously determined by chance, of celebrating the victor in a game of backgammon while deriding the loser.
One of the best ways, of course, to reduce the element of chance in your assessment of a team's skill level is to view its performance across a series of contests. Based on that, there's clear merit in a 6-month long home-and-away season to winnow the 18 teams down to 8 finalists, but to then use a 4-week 9-game series to anoint an ultimate champion seems a trifle odd.
Maybe the English Premier League know what they're doing ... but then we'd miss out on the drama of Grand Finals like the Swans-Hawks game of 2012.
Treating a not out innings as equivalent to a completed innings seems unfair. In effect such treatment is equivalent to assuming that, had the team's innings continued, the batsman would have scored no further runs. That's a possible outcome, but surely not the most likely one (Chris Martin aside).
Tradition has it that, instead, a player's average will be calculated by dividing the total number of runs the player has scored by the number of completed innings he or she has had. Runs scored in innings where a player remains not out are therefore considered something of a bonus, and a player considered to be playing cautiously near the end of his or her team's innings in an effort to remain not out is sometimes accused of "playing for his (or her) average".
With a little maths we can answer a simple question: after a not out innings, how many extra runs would a player have needed to score before being dismissed in order to have the same average at the end of the innings?
First, let's define some terms:
Using these terms, prior to the current innings the player's average would have been R0/I and, after the current innings, it became (R0+S)/I.
We want to know what the value of E is, the extra runs that he or she would have needed to score before being dismissed, so that (R0+S+E)/(I+1) = (R0+S)/I.
Solving this for E yields E = (R0 + S)/I, which is the player's average before the current innings + S/I. Relative to a player's average, S/I will generally be small, so we can say that:
The current method of incorporating a not out score in a player's average is equivalent to assuming that, before being dismissed, he or she would have scored, in addition to whatever runs had already been accumulated, as many further runs as his or her average prior to the innings.
So, for example, if a player with an average of 30 remains 27 not out at the end of the team's innings, as far as her average is concerned it's as if she batted on and scored 57 - the 27 she already had plus her average.
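The algebra is easy to verify numerically. Here's a quick check of the example above, assuming (purely for illustration) that the average of 30 was built over I = 100 prior completed innings:

```python
# Player averaging 30 over I = 100 completed innings (so R0 = 3000 runs),
# who finishes the current innings 27 not out (S = 27).
I, R0, S = 100, 3000, 27

traditional = (R0 + S) / I                # how the average is actually computed
E = (R0 + S) / I                          # extra runs implied by the formula
as_if_dismissed = (R0 + S + E) / (I + 1)  # average had she scored E more, then out

# The two treatments coincide, and E is her prior average (30) plus S/I (0.27)
assert abs(traditional - as_if_dismissed) < 1e-9
print(round(E, 2))  # 30.27
```

So, with I = 100, the "as if she scored 57" shorthand in the text is off only by the small S/I term of 0.27 runs.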
How reasonable is this assumption in practice?
One obvious criticism is that players tend to have a period of additional vulnerability early in their innings and that the likelihood of their being dismissed diminishes with the length of their innings. If that's the case then effectively crediting them with their average if they remain not out on 0 at the completion of their innings seems generous, and doing the same if they're 100 not out seems unfair.
Put another way, what this suggests is that the extra runs a player scores before being dismissed is a function of the runs he or she has already scored.
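The quantity involved is simple to compute from a list of completed-innings scores: for any threshold, take every innings in which the batsman reached that score, and average the runs added beyond it before dismissal. A minimal sketch, using made-up scores rather than Ponting's actual innings:

```python
def extra_runs_given(scores, threshold):
    """Among completed innings in which the batsman reached `threshold`,
    return the average number of runs scored beyond that point before
    dismissal, or None if the threshold was never reached."""
    reached = [s for s in scores if s >= threshold]
    if not reached:
        return None
    return sum(s - threshold for s in reached) / len(reached)

# Made-up completed-innings scores, purely for illustration
scores = [0, 4, 12, 30, 46, 5, 120, 77, 0, 33, 64, 18, 201, 9, 55]
for t in (0, 10, 20, 30, 50):
    print(f"once on {t}, good for another {extra_runs_given(scores, t):.1f}")
```

If the conditional averages this produces are roughly flat in the threshold, and roughly equal to the player's traditional average, then the standard not out treatment does little harm; where they fall below the average, as we'll see for ODIs, it flatters.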
We can test this hypothesis by calculating conditional averages for the actual scores of some players. Here's Ricky Ponting's (as at 14 Feb 2011).
Looking firstly at the numbers for Tests, we see that, in those innings where he is eventually dismissed, once Ricky has made 0 runs, on average he scores 46.3 runs more before he is dismissed. If we add the runs he scores in not out innings (and divide by the number of completed and not out innings), then this average increases to 47.7 runs.
Perhaps more interestingly, once he gets to 30 he's good for another 54 or so runs.
One thing you'll notice here is that there's no evidence of Ricky scoring a larger number of additional runs before being dismissed once his score reaches about 10. This suggests that the current treatment of not out innings might not have much of a distortionary effect on player averages - well, on Ricky's at least.
Actually, to make that argument we need more than that the number of additional runs scored be roughly constant, we need it to be roughly constant at around Ricky's average.
And, for Tests, it is, give or take a run or two.
For ODIs, however, the treatment of not outs is almost certainly distortionary. Even choosing the most generous example, where Ricky has already scored 10 and we include all of his not out innings, he's still only good for, on average, an extra 38 runs, not the 43 runs that he's effectively being credited with.
The problem, I think, stems from the inherently truncated nature of the limited overs format, which means that, if anything, a player is more likely to be dismissed as his innings progresses and balls remaining dwindle - a factor that can be seen to some extent in Ricky's statistics for scores of 40 and over.
At this point you might well be wondering if these conclusions apply only to Ricky Ponting. Well, here's the same analysis for Sachin Tendulkar.
For both Tests and ODIs, the overall pattern of Sachin's data is similar to that for Ricky. There's certainly no evidence for the hypothesis that he's likely to score more additional runs the more he's already scored in either form of the game, once he's reached about 20.
Again we need to compare these numbers to his career statistics.
Here too we find for Sachin in Tests that his conditional average scores are just a few runs shy of his traditional average, but for ODIs the difference is much greater.
It would be an interesting exercise to complete this analysis for some other players of different styles and averages - perhaps a Sehwag, a Gilchrist or a Smith - but that's a task for another weekend.
In the meantime, my tentative conclusion would be that the current treatment of not out innings does little to distort a player's test average but probably inflates his ODI average.
Indeed, consistency's a characteristic that sports commentators reserve for their warmest - and often longest - soliloquies, and players and teams, once they've reached an acceptable level of performance, announce as though scripted that they're now "striving for consistency". So, surely, consistency is always a good thing, isn't it?
Well no, not always. There's a sport - or a version of it - that rewards a modicum of inconsistency. It's four ball, better ball golf, which I'll abbreviate to 4BBB for the remainder of this blog. For those of you unfamiliar with this golfing variant all you need to know is that the contest is played between two teams each comprising two golfers. All four golfers play every hole and a team's score for a hole is the lower of the two scores of its team members. So, if the two golfers on Team A make a 4 and a 5, while those on Team B make a 3 and a 7, Team B wins the hole because their lower score is 3, which beats the lower score of 4 for Team A.
When the low score for the two teams is the same, the hole is said to be 'halved' and, effectively, nobody wins it. The team that wins the contest is the team that wins the greater number of holes of those played. For our purposes we'll assume the teams play 18 holes each and that all participants play off what's called "scratch", which means that their final score for a hole is the number of strokes they took to complete the hole, which is not adjusted in any way to account for the differing abilities of the participants.
(I know, by the way, that this form of golf is usually called "four ball, best ball" but the choice on each team is between two scores so it should really be "better" rather than "best". Pedantry, like mould, once it takes hold pervades everything.)
Okay. Now imagine the following scenario. You're scheduled to play a 4BBB against two equally-talented golfers. Both of them can be expected to double-bogey 5% of the time, bogey 20%, par 50%, birdie 20% and eagle on 5% of holes. You're of the same calibre as these two opponents and can be expected to shoot double-bogeys, bogeys and so on in the same proportions.
You have a choice of two partners for the match:
All players have the characteristic that their score on any given hole is unaffected by their own scores on other holes and by the scores of the other golfers on this same hole or on previous holes. In statistical terms, this means that each golfer's score on a hole is 'independent' of their own and others' scores on this and on any previous holes in just about every way you can think of.
Which partner - C or E - offers you the better chance of victory?
As you've probably already guessed by now, it's Partner E. With him you can expect to win 32.5% of holes, halve 36.3%, and lose 31.2%. Paired with your consistency and pitted against your opponents', it's his inconsistency that makes him valuable to you. He shoots over par more often than you or your opponents do (he'll average about 1.6 over par across the 18 holes), but your consistency often saves him when he does. Vitally, he also breaks par more often than anyone else and, since it's the low score that wins the hole, that makes him an asset often enough to be valuable in the current scenario.
Greater inconsistency, however, is not always beneficial. To pick an extreme example, if Partner E produced birdies 26% of the time - so he still breaks par more often than do you or your opponents - but double-bogeys the other 74% of the time, then the pair of you could expect to win only 29.1% of holes, halve 33.3%, but lose 37.6%. In aggregate, then, you'd lose about 8.5% more than you'd win and you'd end up drinking in the 19th far more often to commiserate than to celebrate.
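If you'd like to check figures like these, the hole-by-hole arithmetic is easy to automate. The sketch below enumerates better-ball outcomes for the extreme partner just described; scores are expressed relative to par, so -2 is an eagle and +2 a double-bogey:

```python
from itertools import product

# Score distributions relative to par (-2 = eagle ... +2 = double-bogey)
CONSISTENT = {-2: 0.05, -1: 0.20, 0: 0.50, 1: 0.20, 2: 0.05}
EXTREME_E = {-1: 0.26, 2: 0.74}  # birdie 26% of the time, double-bogey 74%

def team_min(dist_a, dist_b):
    """Better-ball score distribution for two independent players."""
    out = {}
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        out[min(a, b)] = out.get(min(a, b), 0.0) + pa * pb
    return out

def hole_result(team1, team2):
    """Probabilities that team1 wins, halves or loses a hole."""
    win = halve = lose = 0.0
    for (s1, p1), (s2, p2) in product(team1.items(), team2.items()):
        if s1 < s2:
            win += p1 * p2
        elif s1 == s2:
            halve += p1 * p2
        else:
            lose += p1 * p2
    return win, halve, lose

win, halve, lose = hole_result(team_min(CONSISTENT, EXTREME_E),
                               team_min(CONSISTENT, CONSISTENT))
# win, halve and lose come out at about 29.1%, 33.3% and 37.6%
```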
Taking the consistent you out of the picture for a moment, it's true that mutual, but again independent, inconsistency can be beneficial too. If, for example, two players like partner E paired up against the consistent duo, they'd win 33.6% of holes, halve 34.1%, and lose 32.3%. In total then, they'd win 1.3% more than they'd lose, the same nett result as a consistent you and a partner of type E would achieve.
The best partner of all, though, is one whose scores are negatively correlated with yours. So, for example, imagine a partner who, when you shoot bogey, tends to shoot par or better and, conversely, when he or she shoots bogey or worse, you tend to shoot par or better. In this case we're now breaking the previously stated assumption that the scores of each golfer are independent.
So let's return to the situation where you and your partner face that same consistent duo and let's assume that, overall, both you and your partner generate eagles, birdies and so on at the same rate as they do. Now, as an example of negatively correlated scoring, you and your partner's scores have the following characteristics: when you shoot eagle, your partner shoots bogey half the time and double-bogey the other half; when you shoot birdie, your partner shoots par 20% of the time, bogey 75% of the time and double-bogey the other 5%; when you shoot par, your partner shoots eagle 4% of the time, birdie 26%, par 62%, bogey 5%, and double-bogey 3%; when you shoot bogey, your partner shoots eagle 5% of the time, birdie 20% and par 75% of the time; and when you shoot double-bogey, your partner shoots eagle 40% of the time, and birdie the other 60%.
Those percentages mean that your score and your partner's are highly negatively correlated, and that makes you a formidable pair. Against the consistent duo you can now expect to win 34.7% of holes, halve 39.6% and lose only 25.8%. On balance, you'll win about 9% more holes than you'll lose. Remember: you and your partner both, on average, produce eagles, birdies and so on at exactly the same rate as both of your opponents. It's just the emergent property that is your negatively correlated scoring that makes you so devastating to encounter.
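These numbers, too, can be verified by enumeration. The conditional distribution just described appears below as a table of P(partner's score | your score); it has been constructed so that the partner's marginal scoring rates match everyone else's:

```python
from itertools import product

CONSISTENT = {-2: 0.05, -1: 0.20, 0: 0.50, 1: 0.20, 2: 0.05}

# P(partner's score | your score), scores relative to par, as in the text
PARTNER_GIVEN_YOU = {
    -2: {1: 0.50, 2: 0.50},
    -1: {0: 0.20, 1: 0.75, 2: 0.05},
     0: {-2: 0.04, -1: 0.26, 0: 0.62, 1: 0.05, 2: 0.03},
     1: {-2: 0.05, -1: 0.20, 0: 0.75},
     2: {-2: 0.40, -1: 0.60},
}

def correlated_team_min(your_dist, partner_given_you):
    """Better-ball distribution when the partner's score depends on yours."""
    out = {}
    for you, p_you in your_dist.items():
        for partner, p_cond in partner_given_you[you].items():
            m = min(you, partner)
            out[m] = out.get(m, 0.0) + p_you * p_cond
    return out

def independent_team_min(dist_a, dist_b):
    """Better-ball distribution for two independent players."""
    out = {}
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        out[min(a, b)] = out.get(min(a, b), 0.0) + pa * pb
    return out

def hole_result(team1, team2):
    """Probabilities that team1 wins, halves or loses a hole."""
    win = halve = 0.0
    for (s1, p1), (s2, p2) in product(team1.items(), team2.items()):
        if s1 < s2:
            win += p1 * p2
        elif s1 == s2:
            halve += p1 * p2
    return win, halve, 1 - win - halve

win, halve, lose = hole_result(
    correlated_team_min(CONSISTENT, PARTNER_GIVEN_YOU),
    independent_team_min(CONSISTENT, CONSISTENT))
```

This reproduces the quoted 34.7% / 39.6% / 25.8% split.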
In summary, inconsistency can be good in 4BBB, but only if it's moderate inconsistency. Negative correlation is even better (as indeed it is, and for similar reasons, if you're looking for a valuable asset to add to your portfolio).
A little calculation shows that those concerns were legitimate. The table below shows, for solutions of different sizes - that is, for solutions involving different numbers of coin denominations - how many possible combinations must be considered and how long it would take to consider them using the integer programming routine that I have, which can consider about 1,000 potential solutions per minute.
The curious reader might be interested to know that each combination calculation requires the use of factorials and is of the form 94!/((94-s+1)!(s-1)!) - that is, the number of ways of choosing s-1 denominations from 94 - where s is the number of coins permitted in the solution. The 94 comes from the fact that there are 94 possible coin denominations to be considered, starting with the 6c and finishing with the 99c. (Note that every potential solution must include a 5-cent piece in order to be capable of producing a solution that delivers a total of 5-cents, and that I've assumed that no optimal solution would include a denomination lower than 5-cents.)
Looking at the final column of the table you can see why I was able to solve the 4-coin problem as it required just a couple of hours of computation, but baulked at attempting the 5-coin problem, which would have needed a couple of days. After that point, things quickly get out of hand.
For example, we'd need a year and a half to topple the 7-coin problem, a generation to solve the 8-coin problem, and a few geological epochs - the exact number depending on which epoch you choose - to address the 15-coiner. The 48-coin solution is the one that would require most time and could be comfortably knocked over in a bit over 3 exa-years - or 3 x 10^18 years if you prefer, which is about 225.6 million times the current best estimate of the age of the universe and, I think, could fairly be labelled 'a while'.
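For anyone who'd like to reproduce those time frames, here's a minimal sketch; the 1,000-solutions-per-minute rate is taken from the text:

```python
import math

RATE = 1000                        # candidate solutions assessed per minute
MINUTES_PER_YEAR = 60 * 24 * 365.25

def candidates(s):
    """Number of s-coin candidate sets: the 5-cent coin is mandatory, so we
    choose the remaining s-1 denominations from the 94 running 6c to 99c."""
    return math.comb(94, s - 1)

def years_needed(s):
    """Years of computation for the s-coin problem at the given rate."""
    return candidates(s) / RATE / MINUTES_PER_YEAR

# candidates(4) is 134,044 - a couple of hours at 1,000 per minute - while
# years_needed(48), the worst case, comes out at a few exa-years
```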
After cracking the 48-coiner it'd all be downhill again, the 49-coin solution taking the same time as the 47-coin solution, the 50-coiner taking the same time as the 46-coiner, and so on.
Facing these sorts of time frames we need, as economists and mathematicians love to say, 'a simplifying assumption'.
In my case what I've decided is to consider as candidate solutions only those involving coins with denominations that are multiples of 5-cents. None of the solutions in the 2-coin, 3-coin or 4-coin problems has involved denominations outside that definition, so I'm taking that as (a somewhat weak) justification of my simplifying assumption.
What this assumption does is turn the 94 in the formula above into an 18, and that makes the identification of solutions feasible during your and my lifetime, which is surely as good an example of the ends justifying the means as any that has ever been posited in the past.
Wielding my freshly-minted assumption as a weapon, I've bludgeoned solutions for the 5-coin through to the 19-coin problems, and the number of solutions for the 5-coin and 6-coin problems are small enough to list here. For the 5-coin problem the solutions are (5,15,25,30,65), (5,15,35,45,50) and (5,20,30,45,55), each of which requires an average of 1.84 coins per transaction, and for the 6-coin problem the solutions are (5,10,25,40,45,50), (5,15,20,25,40,70), (5,15,20,40,45,55), (5,15,20,45,55,80), (5,15,25,30,60,65), (5,15,25,30,65,70) and (5,15,25,35,45,50), each of which requires an average of 1.68 coins per transaction.
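Under the multiples-of-5 assumption the search space is small enough to brute-force directly. The sketch below does so with a simple change-making dynamic program rather than integer programming; it reproduces the 5-coin optimum of 1.84 coins per transaction:

```python
from itertools import combinations

AMOUNTS = range(5, 100, 5)       # the 19 transaction totals, 5c to 95c
CANDIDATES = range(10, 100, 5)   # the 18 denominations besides the mandatory 5c

def min_coins(denoms, amount):
    """Fewest coins (unlimited supply of each denomination) summing to amount."""
    best = [0] + [float("inf")] * amount
    for total in range(1, amount + 1):
        for d in denoms:
            if d <= total and best[total - d] + 1 < best[total]:
                best[total] = best[total - d] + 1
    return best[amount]

def average_coins(denoms):
    return sum(min_coins(denoms, a) for a in AMOUNTS) / len(AMOUNTS)

def best_sets(size):
    """All size-coin sets (each including the 5c) minimising the average."""
    results = {(5,) + extra: average_coins((5,) + extra)
               for extra in combinations(CANDIDATES, size - 1)}
    best = min(results.values())
    return best, sorted(s for s, avg in results.items() if avg == best)
```

As a bonus, `min_coins` also lets you check the claim below that a good 6-coin set never needs more than 2 coins for any transaction.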
The number of solutions for the problems involving more coins rises sharply making these solution lists impractical to provide here. Instead, for completeness, here's a table showing the number of solutions for each sized problem and the average number of coins that this solution requires.
To finish with, here's a graph of how the optimum coins per transaction declines as you allow more and more denominations in the solution.
This graph suggests that there's not much to be gained in going beyond 6 coin denominations, at which point you need, on average, only 1.68 coins per transaction. What's more, if you play with the 6-coin solutions a bit, you'll find that with any of them you'll never need more than 2 coins to complete any transaction from 5-cents to 95-cents, which is something you can't say of any of the 5-coin solutions.
For me then, the optimal optimal solution is (5,15,20,25,40,70). With it, you'll never need more than two coins to produce any total between 5-cents and 95-cents and you should be able to identify the two coins you need quite quickly.
And that, ladies and gentlemen, is the last you'll read here about the coin problem. (Promise.)
There were, it turned out, nine such sets each of which was optimal and required only 2.11 coins, on average, to sum to any amount from 5-cents to 95-cents.
Well since we've solved the problem for 4-coin sets, what about solving it for 2-coin and 3-coin sets?
Only three 2-coin solutions are optimal - (5,20), (5,25) and (5,30) - and each of them requires 3.68 coins on average to produce sums from 5-cents to 95-cents. The combinations of (5,15) and (5,35) are next most efficient, each requiring an average of 4 coins per transaction.
Most efficient amongst what I'd consider to be the odd-looking solutions is (5,12), which requires 7.05 coins per transaction, an average that is bloated by horror outcomes for higher amounts such as the 12 coins required to produce a 95-cent total (5 x 12-cents + 7 x 5-cents).
Moving onto 3-coin solutions we find that there are five that are optimal - (5,15,40), (5,20,30), (5,20,45), (5,25,35) and (5,25,40) - each requiring an average of 2.53 coins per transaction. Glistening amongst the six next-most efficient solutions is (5,20,25), which has the twin virtues of being near-optimal (it requires just 2.63 coins per transaction) and of being patently practical.
Thinking some more about 3-coin solutions, if we were forced to retire one of our current coin denominations, it's the 10-cent that should go as this would leave (5,20,50), a solution that requires an average of only 2.74 coins per transaction. This is only marginally less than optimal in the 3-coin world and is actually not all that much worse than the 2.32 coin average that our current (5,10,20,50) set offers.
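For the 2-coin and 3-coin problems the full space of denominations (the mandatory 5c plus anything from 6c to 99c) is small enough to enumerate exhaustively. A sketch, again using a change-making dynamic program:

```python
from itertools import combinations

AMOUNTS = range(5, 100, 5)   # the 19 transaction totals, 5c to 95c

def min_coins(denoms, amount):
    """Fewest coins (unlimited supply of each denomination) summing to amount."""
    best = [0] + [float("inf")] * amount
    for total in range(1, amount + 1):
        for d in denoms:
            if d <= total and best[total - d] + 1 < best[total]:
                best[total] = best[total - d] + 1
    return best[amount]

def average_coins(denoms):
    return sum(min_coins(denoms, a) for a in AMOUNTS) / len(AMOUNTS)

# every candidate set keeps the mandatory 5c; the rest come from 6c..99c
two_coin = {(5, d): average_coins((5, d)) for d in range(6, 100)}
best2 = min(two_coin.values())
optimal2 = sorted(s for s, avg in two_coin.items() if avg == best2)

three_coin = {(5,) + pair: average_coins((5,) + pair)
              for pair in combinations(range(6, 100), 2)}
best3 = min(three_coin.values())
optimal3 = sorted(s for s, avg in three_coin.items() if avg == best3)
```

Here `optimal2` comes out as [(5, 20), (5, 25), (5, 30)] at 3.68 coins per transaction, and `average_coins((5, 12))` confirms the 7.05 figure for the odd-looking set.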
Instead of thinking about which of our current coin denominations we might retire, what if, instead, we thought about how we might grow an efficient set of denominations, starting with an optimal set of two coins, then adding a third coin to produce another optimal set, and then finally adding a fourth to again produce an optimal set. Can it be done?
Yes, it can, as is illustrated below.
To produce an all-the-way optimal path from two coins to four we could start with either the (5,20) or the (5,30) sets, though not with the (5,25) set since, although it would allow us to move to an optimal 3-coin solution with the addition of a 35-cent or a 40-cent coin, we would be unable to create an optimal 4-coin solution from there.
For our third coin we could either add the 20-cent coin if we'd started with (5,30), or add the 30-cent or 45-cent coin if we'd started with (5,20).
Lastly, if we'd started with (5,20,45) we could add a 30-cent or a 35-cent coin, or if instead we'd started (5,20,30) we could add a 45-cent or a 65-cent coin to produce an optimal 4-coin solution.
The only way to reach the six other equally optimal 4-coin solutions would have been to start with the sub-optimal (5,15) set, which as we noted earlier falls only a little short of the optimal solutions in requiring on average 4 coins per transaction to the optimal solutions' 3.68 coins per transaction.
I am, naturally, curious about the optimal 5-coin solution (and the 6-coin, and so on) but I don't think that I can find this solution in a practically feasible amount of time using the integer programming optimisation routine that I'm currently using. Perhaps more at a future date, though probably not.
Imagine that the first two rounds of the season produced the following results:
Geelong and St Kilda have each won in both rounds and Geelong's percentage is superior to St Kilda's on both occasions (hence the ticks and crosses). So, who will be placed higher on the ladder at the end of the 2nd round?
Commonsense tells us it must be Geelong, but let's do the maths anyway.
How about that - St Kilda will be placed above Geelong on the competition ladder by virtue of a superior overall percentage despite having a poorer percentage in both of the games that make up the total.
This curious result is an example of what's known as Simpson's paradox, a phenomenon that can arise when a weighted average is formed from two or more sets of data and the weights used in combining the data differ significantly for one part compared to the remainder.
In the example I've just provided, St Kilda's overall percentage ends up higher because its weaker 115% in Round 1 is weighted by only about 0.4 and its much stronger 160% in Round 2 is weighted by about 0.6, these weights being the proportions of the total points that St Kilda conceded (165) that were, respectively, conceded in Round 1 (65) and Round 2 (100). Geelong, in contrast, in Round 1 conceded 78% of the total points it conceded across the two games, and conceded only 22% of the total in Round 2. Consequently its poorer Round 1 percentage of 130% carries over three-and-a-half times the weight of its superior Round 2 percentage of 169%. This results in an overall percentage for Geelong of about 0.78 x 130% + 0.22 x 169% or 138.8, which is just under St Kilda's 142.4.
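The whole example fits in a few lines of code. St Kilda's scores below follow from the figures in the text (75 points to 65 in Round 1, 160 to 100 in Round 2); Geelong's are hypothetical, chosen only to match the quoted round-by-round percentages and weights, so its overall percentage lands near, though not exactly on, the 138.8 quoted:

```python
def percentage(points_for, points_against):
    """An AFL 'percentage': points scored as a percentage of points conceded."""
    return 100 * points_for / points_against

# (points for, points against) by round
st_kilda = [(75, 65), (160, 100)]
geelong = [(130, 100), (47, 28)]   # hypothetical scores for illustration

def overall(results):
    return percentage(sum(f for f, _ in results), sum(a for _, a in results))

# Geelong's percentage is higher in both rounds, yet St Kilda's overall
# percentage (142.4) beats Geelong's: Simpson's paradox
```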
When Simpson's paradox leads to counterintuitive ladder positions it's hard to get too fussed about it, but real-world examples such as those on the Wikipedia page linked to above demonstrate that Simmo can lurk within analyses of far greater import.
(It'd be remiss of me to close without noting - especially for the benefit of followers of the other Aussie ball sports - that Simpson's paradox is unable to affect the competition ladders for sports that use a For and Against differential rather than a ratio because differentials are additive across games. Clearly, maths is not a strong point for the AFL. Why else would you insist on crediting 4 points for a win and 2 points for a draw oblivious, it seems, to the common divisor shared by the numbers 2 and 4?)
He tells you that the models he is offering each use different pieces of data about a particular game and that neither of them uses data about which is the home team. He adds - uninformatively, you think - that the two models produce statistically independent predictions of the winning team. You ask how accurate the models are that he's selling and he frowns momentarily and then sighs before revealing that one of the models tips at 60% and the other at 64%. They're not that good, he acknowledges, sensing your disappointment, but he needs money to feed his Lotto habit. "Lotto wheels?", you ask. He nods, eyes downcast. Clearly he hasn't learned much about probability, you realise.
As a regular reader of this blog you already have a model for tipping winners, sophisticated though it is, which involves looking up which team is the home team - real or notional - and then tipping that team. This approach, you know, allows you to tip at about a 65% success rate.
What use to you then is a model - actually two, since he's offering them as a job lot - that can't out-predict your existing model? You tip at 65% and the best model he's offering tips only at 64%.
If you believe him, should you walk away? Or, phrased in more statistical terms, are you better off with a single model that tips at 65% or with three models that make independent predictions and that tip at 65%, 64% and 60% respectively?
By now your olfactory system is probably detecting a rodent and you've guessed that you're better off with the three models, unintuitive though that might seem.
Indeed, were you to use the three models and make your tip on the basis of a simple plurality of their opinions you could expect to lift your predictive accuracy to 68.9%, an increase of almost 4 percentage points. I think that's remarkable.
The pivotal requirement for the improvement is that the three predictions be statistically independent; if that's the case then, given the levels of predictive accuracy I've provided, the combined opinion of the three of them is better than the individual opinion of any one of them.
In fact, you also should have accepted the offer from your Lotto-addicted confrere had the models he'd been offering each only been able to tip at 58%, though in that case their combination with your own model would have yielded an overall lift in predictive accuracy of only 0.3%. Very roughly speaking, for every 1% increase in the sum of the predictive accuracies of the two models you're being offered, you can expect about a 0.45% increase in the predictive accuracy of the model you can form by combining them with your own home-team based model.
That's not to say that you should accept any two models you're offered that generate independent predictions. If the sum of the predictive accuracies of the two models you're offered is less than 116%, you're better off sticking to your home-team model.
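The arithmetic behind these figures is a straightforward enumeration over who tips correctly. Here's a minimal sketch:

```python
from itertools import product

def majority_accuracy(*accuracies):
    """Chance that a simple majority of independent tipsters is correct."""
    n = len(accuracies)
    total = 0.0
    for outcome in product([True, False], repeat=n):  # who tipped correctly
        if 2 * sum(outcome) > n:                      # the majority got it right
            p = 1.0
            for correct, acc in zip(outcome, accuracies):
                p *= acc if correct else 1 - acc
            total += p
    return total

# majority_accuracy(0.65, 0.64, 0.60) comes out at about 0.691, and the
# 116% threshold can be checked by nudging the two offered accuracies
```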
The statistical result that I've described here has obvious implications for building Fund algorithms and, to some extent, has already been exploited by some of the existing Funds. The floating-window based models of HELP, LAMP and HAMP are also loosely inspired by this result, though the predictions of different floating-window models are unlikely to be statistically independent. A floating-window model that is based on the most recent 10 rounds of results, for example, shares much of the data that it uses with the floating-window model that is based on the most recent 15 rounds of results. This statistical dependence significantly reduces the predictive lift that can be achieved by combining such models.
Nonetheless, it's an interesting result I think and more generally highlights the statistical validity of the popular notion that "many heads are better than one", though, as we now know, this is only true if the owners of those heads are truly independent thinkers and if they're each individually reasonably astute.
In situations like this one where a subjective probability assessment is required people make their probability assessments using any information they have that they believe is relevant, weighting each piece of that knowledge according to the relative importance they place on it. So the difference between your and my estimates for our hypothetical Melbourne game could stem from differences in the information we each hold about the game, from differences in the relative weights we apply to each piece of information, or from both of these things.
If I know, for example, that Melbourne will have a key player missing this weekend and you don't know this - a situation known as an "information asymmetry" in the literature - then my 20% and your 40% rating might be perfectly logical, albeit that your assessment is based on less knowledge than mine. Alternatively, we might both know about the injured player but you feel that it has a much smaller effect on Melbourne's chances than I do.
So we can certainly explain why our probability assessments might logically be different from one another but this doesn't definitively address the topic of whose assessment is better.
In fact, in any but the most extreme cases of information asymmetry or the patently inappropriate weighting of information, there's no way to determine whose probability is closer to the truth before the game is played.
So, let's say we wait for the outcome of the game and Melbourne are thumped by 12 goals. I might then feel, with some justification, that my probability assessment was better than yours. But we can only learn so much about our relative probability assessment talents by witnessing the outcome of a single game much as you can't claim to be precognitive after correctly calling the toss of a single coin.
To more accurately assess someone's ability to make probability assessments we need to observe the outcomes of a sufficiently large series of events for each of which that person had provided a probability estimate beforehand. One aspect of the probability estimates that we could then measure is how "calibrated" they are.
A person's probability estimates are said to be well-calibrated if, on average and over the longer term, events to which they assign an x% probability occur about x% of the time. A variety of mathematical formulae have been proposed to measure this notion.
For this blog I've used as the measure of calibration the average squared difference between the punter's probability estimates and the outcome, where the outcome is either a 1 (for a win for the team whose probability has been estimated) or a 0 (for a loss for that same team). So, for example, if the punter attached probabilities of 0.6 to each of 10 winning teams, the approximate calibration for those 10 games would be (10 x (1-0.6)^2)/10 = 0.16.
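The measure takes only a couple of lines to implement:

```python
def calibration_score(estimates, outcomes):
    """Average squared difference between probability estimates and the 0/1
    outcomes - lower is better; a clueless 50/50 tipster scores 0.25."""
    pairs = list(zip(estimates, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)
```

So `calibration_score([0.6] * 10, [1] * 10)` reproduces the 0.16 example above.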
I chose this measure of calibration in preference to others because, empirically, it can be used to create models that explain more of the variability in punting returns. But, I'm getting ahead of myself - another figure of speech whose meaning evaporates under the scantest scrutiny.
The table below shows how calibration would be estimated for four different punters.
To put the calibration score in context, note that the closer a punter's score is to zero, the better calibrated are his or her probability assessments, and that a punter with absolutely no idea, but who knows this and therefore assigns a probability of 0.5 to both teams' chances in every game, will have a calibration score of 0.25 (see Punter #2 above). Over the period 2006 to 2009, the TAB Sportsbet bookmaker's probability assessments have a calibration score of about 0.20, so the numerically tiny journey from a calibration score of 0.25 to one of 0.20 traverses the landscape from the township of Wise Ignorance to the city of Wily Knowledge.
Does Calibration Matter?
It's generally desirable to be labelled with a characteristic that is prefixed with the word stem "well-", and "well-calibrated" is undoubtedly one such characteristic. But, is it of any practical significance?
In your standard pick-the-winners tipping competition, calibration is nice, but accuracy is king. Whether you think the team you tip is a 50.1% or a 99.9% chance doesn't matter. If you tip a team and they win you score one; if they lose, you score zero. No benefit accrues from certainty or from doubt.
Calibration is, however, extremely important for wagering success: the more calibrated a gambler's probability assessments, the better will be his or her return because the better will be his or her ability to identify market mispricings. To confirm this I ran hundreds of thousands of simulations in which I varied the level of calibration of the bookmaker and of the punter to see what effect it had on the punter's ROI if the punter followed a level-staking strategy, betting 1 unit on those games for which he or she felt there was a positive expectation to wagering.
(For those of you with a technical bent I started by generating the true probabilities for each of 1,000 games by drawing from a random Normal distribution with a mean of 0.55 and a standard deviation of 0.2, which produces a distribution of home-team and away-team probabilities similar to that implied by the bookie's prices over the period 2006 to 2009.
Bookie probabilities for each game were then generated by assuming that bookie probabilities are drawn from a random Normal with mean equal to the true probability and a standard deviation equal to some value - which fixed for the 1,000 games of a single replicate but which varies from replicate to replicate - chosen to be in the range 0 to 0.1. So, for example, a bookie with a precision of 5% for a given replicate will be within about 10% of the true probability for a game 95% of the time. This approach produces simulations with a range of calibration scores for the bookie from 0.187 to 0.24, which is roughly what we've empirically observed plus and minus about 0.02.
I reset any bookie probabilities that wound up above 0.9 to be 0.9, and any that were below 0.1 to be 0.1. Bookie prices were then determined as the inverse of the probability divided by one plus the vig, which was 6% for all games in all replicates.
The punter's probabilities are determined similarly to the bookie's except that the standard deviation of the Normal distribution is chosen randomly from the range 0 to 0.2. This produced simulated calibration scores for the punter in the range 0.188 to 0.268.
The punter only bets on games for which he or she believes there is a positive expectation.)
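For the technically minded, here's a minimal sketch of a simulation along those lines. It is not the exact code I used - the clamping details and the win/loss settlement are simplifications - but it captures the structure: a true probability per game, noisy bookie and punter estimates, and level staking on perceived positive-expectation bets:

```python
import random

def simulate_roi(bookie_sd, punter_sd, n_games=1000, vig=0.06, seed=1):
    """ROI from level-staking 1 unit on each game the punter rates a
    positive-expectation bet, for given bookie and punter 'precisions'."""
    rng = random.Random(seed)
    staked = returned = 0.0
    for _ in range(n_games):
        true_p = min(max(rng.normalvariate(0.55, 0.2), 0.01), 0.99)
        bookie_p = min(max(rng.normalvariate(true_p, bookie_sd), 0.1), 0.9)
        punter_p = min(max(rng.normalvariate(true_p, punter_sd), 0.0), 1.0)
        price = 1 / (bookie_p * (1 + vig))  # bookie price including the vig
        if punter_p * price > 1:            # punter sees positive expectation
            staked += 1
            if rng.random() < true_p:       # the home team duly wins
                returned += price
    return (returned - staked) / staked if staked else 0.0
```

As you'd hope, a sharp punter facing a sloppy bookie shows a profit, and a sloppy punter facing a sharp bookie shows a loss.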
Here's a table showing the results.
So, reviewing firstly items from the top row we can say that a punter whose probability estimates are calibrated at 0.20 (ie as well-calibrated as the bookies have been over recent seasons) can expect an ROI of negative 22% if he or she faces a bookie whose probability estimates are calibrated at 0.19. Against a bookie whose estimates are instead calibrated at 0.20, the punter can expect to lose about 7%, or a smidge over the vig. A profit of 9% can be expected if the bookie is calibrated at 0.21.
The table on the right shows just how often the punter can expect to finish in the black - for the row we've been looking at about 2% of the time when facing a bookie calibrated at 0.19, and 89% of the time when facing a bookie calibrated at 0.21.
You can see in these tables how numerically small changes in bookie and punter calibration produce quite substantial changes in expected ROI outcomes.
Scanning the entirety of these tables makes for sobering reading. Against a typical bookie, who'll be calibrated at 0.2, even a well-calibrated punter will rarely make a profit. The situation improves if the punter can find a bookie calibrated at only 0.21, but even then the punter must themselves be calibrated at 0.22 or better before he or she can reasonably expect to make regular profits. Only when the bookie is truly awful does profit become relatively easy to extract, and awful bookies last about as long as a pyromaniac in a fireworks factory.
None of which, I'm guessing, qualifies as news to most punters.
One positive result in the table is that a profit can still sometimes be turned even if the punter is very slightly less well-calibrated than the bookie. I'm not yet sure why this is the case but suspect it has something to do with the fact that the bookie's vig saves the well-calibrated punter from wagering into harmful mispricings more often than it prevents the punter from capitalising on favourable mispricings.
Looking down the columns in the left-hand table provides the data that underscores the importance of calibration. Better calibrated punters (ie those with smaller calibration scores) fare better than punters with poorer calibration - albeit that, in most cases, this simply means that they lose money at a slower rate.
Becoming better calibrated takes time, but there's another way to boost average profitability for most levels of calibration. It's called Kelly betting.
Kelly Betting
The notion of Kelly betting has been around for a while. It's a formulaic way of determining your bet size given the prices on offer and your own probability assessments, and it ensures that you bet larger amounts the greater the disparity between your estimate of a team's chances and the chances implied by the price on offer.
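For a head-to-head bet at decimal odds, the standard Kelly formula is simple enough to show here:

```python
def kelly_fraction(prob, price):
    """Kelly stake as a fraction of the bank for a team you rate a `prob`
    chance at decimal odds `price`; zero when there's no positive expectation."""
    edge = prob * price - 1              # expected profit per unit staked
    return max(edge / (price - 1), 0.0)  # never bet into a negative edge
```

At even money (price 2.0), rating the team a 60% chance suggests staking 20% of the bank, and, as described, the stake grows with the disparity between your probability and the price-implied one.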
When used in the simulations I ran earlier it produced the results shown in the following table:
If you compare these results with those shown earlier using level-stake wagering you find that Kelly betting is almost always superior, the exception being for those punters with poor calibration scores, that is, generally worse than about 0.24. Kelly betting, it seems, better capitalises on the available opportunities for those punters who are at least moderately well-calibrated.
This year, three of the Fund algorithms will use Kelly betting - New Heritage, Prudence, and Hope - because I'm more confident that they're not poorly-calibrated. I'm less confident about the calibration of the three new Fund algorithms, so they'll all be level-staking this season.
The TAB offers a 50:50 proposition bet on every AFL game that the match will end with an even or an odd number of points being scored. I can find no reason to favour one of those outcomes over the other, so even-money odds seem like a reasonable proposition.
How strange it is then that 6 of the last 8 seasons have finished with a preponderance of games producing an even total. Surely this must be compelling evidence of some fundamental change in the sport that's tilting the balance in favour of even-totalled results. Actually, that's probably not the case.
One way to assess the significance of such a run is to realise that we'd have been equally stunned if the preponderance had been of odd-totalled games, and then to ask ourselves the following question: if even-totalled and odd-totalled games were equally likely, over 112 seasons how likely is it that we could find a span of 8 seasons within which there was a preponderance of one type of total over the other in 6 of those seasons?
The answer - which I found by simulating 100,000 sets of 112 seasons - is 99.8%. In other words, it's overwhelmingly likely that a series of 112 seasons should contain somewhere within it at least one such sequence of 6 from 8.
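That estimate can be reproduced, approximately, with a short simulation. The sketch below assumes each season's preponderance is an independent 50:50 coin flip - ignoring, for simplicity, the possibility of a season with equal numbers of odd- and even-totalled games:

```python
import random

def prob_span_exists(n_seasons=112, window=8, threshold=6, n_sims=10_000, seed=1):
    """Estimate the probability that, somewhere in `n_seasons` of 50:50
    coin flips, some `window`-season span has at least `threshold`
    seasons favouring the same side."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        flips = [rng.random() < 0.5 for _ in range(n_seasons)]
        for start in range(n_seasons - window + 1):
            evens = sum(flips[start:start + window])
            if evens >= threshold or window - evens >= threshold:
                hits += 1
                break          # one qualifying span is enough
    return hits / n_sims

print(round(prob_span_exists(), 3))   # should land near the 0.998 found above
```

The same function, called as prob_span_exists(112, 22, 16) and prob_span_exists(112, 59, 37), gives estimates in the vicinity of the 62% and 31% figures for the longer sequences discussed next.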
Below is a chart showing the percentage of games finishing with an even total for each of the 112 seasons of the competition. The time period we've just been exploring is that shown in the rightmost red box.
If we go back a little further we can find a period from 1979 to 2000 in which 16 of the 22 seasons finished with more odd-totalled than even-totalled games. This is the period marked by the middle red box. Surely 16 from 22 is quite rare?
Well, no, it isn't. It's rarer than 6 from 8 but, proceeding much as we did earlier, we find that there's about a 62% probability of such a run occurring at least once in a span of 112 seasons. So it's still comfortably more likely than not that we'd find such a sequence even if the true probability of an even-totalled game were exactly 50%.
Okay, we've dismissed the significance of 6 from 8 and 16 from 22, but what about the period from 1916 to 1974 (the leftmost red box) during which 37 of the 59 seasons had predominantly odd-totalled games? Granted, it's a little more impressive than either of the shorter sequences, but there's still a 31% chance of finding such a sequence in a 112 season series.
Overall then, despite the appearance of these clusters, we cannot reject the hypothesis that the probability of an even-totalled game is, and always has been, 50%.
Further evidence for this is the fact that the all-time proportion of even-totalled games is 49.6%, a mere 55 games short of parity. Also, the proportion of seasons in which the deviation from 50% is statistically significant at the 1% level is 0.9%, and the proportion of seasons in which the deviation from 50% is statistically significant at the 5% level is 4.5%.
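Those per-season significance figures come from a two-sided test of a season's even-totalled proportion against 50%. An exact version of such a test needs only the standard library; the 200-game season and 115 even totals below are my own illustrative numbers, not actual counts from any season:

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: the probability, under
    Binomial(n, p), of an outcome at least as likely-or-less-likely
    than observing `k` successes."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    return sum(x for x in pmf if x <= pmf[k] + 1e-12)

# A hypothetical 200-game season with 115 even totals: the deviation
# from 50% is significant at the 5% level but not at the 1% level.
print(binom_two_sided_p(115, 200) < 0.05,
      binom_two_sided_p(115, 200) < 0.01)   # → True False
```

Run against each season's actual game counts, a test like this is what produces the 0.9% and 4.5% proportions quoted above - almost exactly what you'd expect from chance alone at those significance levels.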
Finding meaningful and apparently significant patterns in what we observe is a skill that's served us well as a species. It's a good thing to recognise the pattern in the fact that 40 of the 42 people who've eaten that 6-day-old yak carcass are no longer part of the tribe.
The challenge is to be aware that this skill can sometimes lead us to marvel at - in some cases even take credit for - patterns that are just statistical variations. If you look out for them you'll see them crop up regularly in the news.