Sunday, 26 January 2014

A poor psephologist blames his tools...


With the South Australian Labor Party having launched their campaign for the March election I suppose it must be time to get back to blogging. However, this early in the run-up there is really not much in the news to talk about, so I get to indulge my own little psephological projects. And that, of course, means it is time for a...

Statistics Party!!!



For the federal House of Representatives election last year I based my rather mediocre predictions on three tools – the Variable-Dependent Transparency Arrays, which mapped past voting trends; the Pendulum, which summarised the margin of each seat; and the Seat Run-Downs for each state, which summarised the general historical lean of each seat. Over the next few weeks, unless something more interesting or time-sensitive turns up, I will be analysing each tool's accuracy and usefulness; this will then inform my use of these tools (or lack thereof) in the state election. This week we look at the VDTA:

Tool Summary:


Numbers were crunched, maps were coloured and fun was had by all. The VDTA uses very subjective calculations to broadly summarise the voting trends of recent years by superimposing semi-transparent layers of election results, so that recent outcomes eventually blot out older ones. The transparency of each layer depends on a variable – in this case, how accurately that election predicted the next one.
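For anyone who wants the gist of that layering in code, here is a minimal sketch. None of these names come from the original methodology post; layer_opacity and vdta_score are illustrative stand-ins, and treating a layer's opacity as the fraction of seats it correctly "predicted" in the following election (with an assumed default for the most recent layer) is my loose reading of the variable, not the exact equation.

    def layer_opacity(results_then, results_next):
        # Assumed opacity rule: the share of seats whose winner was unchanged
        # from one election to the next, i.e. how well the older election
        # "predicted" the newer one.
        common = set(results_then) & set(results_next)
        if not common:
            return 0.0
        unchanged = sum(results_then[s] == results_next[s] for s in common)
        return unchanged / len(common)

    def vdta_score(history, last_alpha=0.5):
        # history: oldest-to-newest list of {seat: +1 (Coalition) / -1 (Labor)}.
        # last_alpha is a placeholder opacity for the most recent layer, since
        # its own predictive accuracy is not yet known.
        scores = {}
        for i, election in enumerate(history):
            alpha = layer_opacity(election, history[i + 1]) if i + 1 < len(history) else last_alpha
            for seat, value in election.items():
                # Composite the newer, semi-transparent layer over the older ones.
                scores[seat] = alpha * value + (1 - alpha) * scores.get(seat, 0.0)
        return scores

A positive score reads as a blue (Coalition) seat, a negative score as red (Labor), and a score near zero as too washed out to call.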

Results Analysis:


These are the results from the map we used:

Data source.

and these are the same results divided into distinct predictions based on their hexadecimal colour code (those with a higher red value are called red, those with a higher blue value are called blue, and the white divisions remain white):

Blue are Coalition, red Labor and white excluded.
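The colour call itself is nothing fancy: compare the red and blue channels of each division's blended hex code. A quick sketch (classify_hex is a hypothetical helper, not something from the original workbook):

    def classify_hex(colour):
        # colour: "#RRGGBB" string sampled from the blended map.
        red = int(colour[1:3], 16)
        blue = int(colour[5:7], 16)
        if red > blue:
            return "Labor"        # higher red value
        if blue > red:
            return "Coalition"    # higher blue value
        return "Excluded"         # white / evenly balanced divisions

    print(classify_hex("#aa66cc"))  # -> Coalition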

This map correctly predicted 116 seats, got 32 wrong and called 2 tossups.

Green are correct, red incorrect and black excluded.

This is roughly 78% accurate for all called (i.e. non-tossup) seats. Both tossups had insufficient data to calculate a value for the VDTA. A state-by-state (and territory-by-territory) breakdown of accuracy ratings is as follows:

ACT: 100% (2/2)
NSW: 79% (38/48)
NT: 100% (2/2)
QLD: 69% (20/29)
SA: 82% (9/11)
TAS: 80% (4/5)
VIC: 86% (32/37)
WA: 71% (10/14)
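For the record, those percentages are just correct calls divided by called (non-tossup) seats; the snippet below reproduces them from the breakdown above.

    vdta_results = {"ACT": (2, 2), "NSW": (38, 48), "NT": (2, 2), "QLD": (20, 29),
                    "SA": (9, 11), "TAS": (4, 5), "VIC": (32, 37), "WA": (10, 14)}

    for state, (correct, called) in sorted(vdta_results.items()):
        print(f"{state}: {correct / called:.0%} ({correct}/{called})")

    # Overall figure for called seats: 116 correct out of 148 (150 minus 2 tossups).
    print(f"Overall: {116 / 148:.0%}")   # -> 78%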

Superficially, we might expect an accuracy in the high 70s to low 80s (per cent) from applying the same VDTA equation to the SA state election. Ignoring for a moment the likely differences between the two elections, let's remember that this is the first data point on the accuracy of this methodology. Let's use this figure as a ballpark, but not rely on it too heavily until we have a few more elections under our collective belt.

The obvious question, then, is whether or not we are using the optimal equation.


I can confirm that we are almost certainly not. I have no doubt that a small tweak to the denominator in the equation would improve the accuracy a little. And then, as I outlined in the methodology, redefining the number of elections factored into the C value could possibly improve the long-term predictive power of the method, at the expense of accumulating more short-term outliers. Then, of course, we could change the dependent variable (the number of seats changing hands) so that the transparency instead depends on margins or swings, on a seat-by-seat or national basis.
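To make those knobs a bit more concrete, here is one way they might be exposed for experimentation. These are purely illustrative: the function names are mine, the swing-based opacity is an assumed alternative, and "composite only the last n elections" is just one reading of redefining the C value.

    def layer_opacity_tweaked(changed, contested, extra_denominator=0):
        # 1 minus the fraction of seats changing hands, with a small constant
        # added to the denominator to damp the opacity (the tweak discussed above).
        return 1 - changed / (contested + extra_denominator)

    def layer_opacity_by_swing(swing, max_swing=10.0):
        # Alternative dependent variable: two-party-preferred swing (in points)
        # instead of the number of seats changing hands.
        return max(0.0, 1 - abs(swing) / max_swing)

    def vdta_recent(history, n_elections=4):
        # Composite only the most recent n_elections layers,
        # using vdta_score from the earlier sketch.
        return vdta_score(history[-n_elections:])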

All of these could be fruitful avenues of investigation once we have more results to work from, but it would be premature to tinker now. I am sure we could engineer some startlingly accurate correlations between the VDTA and the actual results, but I sincerely doubt these would form a good predictive tool rather than an ad hoc and confected match-up with the previous outcome.

However, the VDTA was proposed as an alternative to simply averaging the history of the seats, and when we compare the two, simple averaging is more accurate. This implies, at this early stage, that recent electoral data is not necessarily more relevant than older data. Further consideration is required, but here are the stats:

Green are correct, red incorrect and black excluded. Data.

ACT: 100% (2/2)
NSW: 94% (45/48)
NT: 100% (2/2)
QLD: 77% (23/30)
SA: 72% (8/11)
TAS: 40% (2/5)
VIC: 92% (34/37)
WA: 80% (12/15)

NATIONAL: 85% (128/150)
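For comparison, the averaging baseline is about as simple as a prediction method gets: call each seat for whichever side has won it more often over the same span of elections, and exclude perfectly split histories. A sketch, reusing the same per-seat result dictionaries assumed earlier:

    def average_prediction(history):
        # history: list of {seat: +1 (Coalition) / -1 (Labor)} result dictionaries.
        totals = {}
        for election in history:
            for seat, value in election.items():
                totals[seat] = totals.get(seat, 0) + value
        return {seat: "Coalition" if total > 0 else "Labor" if total < 0 else "Excluded"
                for seat, total in totals.items()}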

The only states where averaging performed worse than the VDTA were SA and Tasmania, which happen to be the next elections we will cover. At this point it seems that the VDTA introduces unnecessary noise nationally, but it may yet prove the more accurate tool for the upcoming predictions. I think it will pay to use both and see which works best in these two states, and across other elections too.

Finally, there is the extreme case of a VDTA with 0% transparency, which was the other simplistic map the VDTA was intended to supersede. In practice this would just mean using the 2010 results as a blueprint for the 2013 predictions, possibly with intensity factored in to represent length of incumbency, as proposed here.

The simplest way of testing this is to look at what percentage of seats changed hands on the results pendulum; the accuracy is then 100% minus this value.

"Prediction" column reflects to my 2013 overall prediction, not the prediction of one specific method.

22 seats changed hands, which is roughly 15% of the seats. That gives the method of simply reusing the previous election's results as the predictions for the next an accuracy of about 85%, the same as the seat averages and better than the VDTA. This technique also does better with independents and minor parties, who may hold their seats for consecutive terms but rarely show up on the VDTA or seat averages.
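The arithmetic for that baseline fits in a couple of lines:

    seats_total, seats_changed = 150, 22
    print(f"{1 - seats_changed / seats_total:.1%}")   # -> 85.3%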

Conclusion:


While more data is required, initial results suggest the VDTA is not an effective summary of past voting trends for the purposes of extrapolation into the future.