Infographinomicon: Polls, polls, polls...

They’re not very useful things, but if we didn’t have polls what would we fill our newspapers with?

The problems with polls are legion, and have been discussed by many of the bloggers listed to the right (and many more of the 146 million Google hits for ‘the problem with polls’). Then there are the dreaded polls of polls gaining popularity in the United States, which are generally just an average of other polls. While it may seem superficially reasonable to assume that an average will iron out errors between the polls (e.g. neutralise the biases of two opposing polling houses), not all errors can be so easily dealt with, and some are further entrenched or even exacerbated. For example, in phone polls it is generally easier to fill the over-60s quota than the under-30s quota. The only options are to keep phoning until you get enough under-30s to answer (in which case they may not be representative of the generally inaccessible younger generation) or else to mathematically scale up the under-30s results and scale down the over-60s results so that each group matches its proportion of the voting population. This is called weighting (or scaling), and it can exaggerate statistical anomalies: any fluke in a small group of respondents gets multiplied along with everything else.
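To put some rough numbers on the scaling problem, here is a minimal sketch. Every figure in it is invented for illustration (a poll of 1,000 people in which under-30s are 25% of voters but only 10% of respondents), and the effective sample size uses Kish's approximation, which is my choice of yardstick rather than anything a particular pollster is known to use.

```python
def raw_and_weighted(samples, population_share):
    """Return (raw, weighted) support estimates.

    samples: {group: (respondents, supporters)}
    population_share: {group: fraction of the voting population}
    """
    total_n = sum(n for n, _ in samples.values())
    raw = sum(yes for _, yes in samples.values()) / total_n
    # Each group's observed support rate counts in proportion to its
    # share of the population, not its share of the sample.
    weighted = sum(population_share[g] * yes / n
                   for g, (n, yes) in samples.items())
    return raw, weighted


def effective_sample_size(samples, population_share):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total_n = sum(n for n, _ in samples.values())
    sum_w = 0.0
    sum_w2 = 0.0
    for group, (n, _) in samples.items():
        w = population_share[group] / (n / total_n)  # weight per respondent
        sum_w += w * n
        sum_w2 += w * w * n
    return sum_w ** 2 / sum_w2


# Invented figures, purely for illustration.
samples = {"under_30": (100, 60), "30_plus": (900, 360)}
population_share = {"under_30": 0.25, "30_plus": 0.75}

raw, weighted = raw_and_weighted(samples, population_share)
ess = effective_sample_size(samples, population_share)
print(f"raw {raw:.1%}, weighted {weighted:.1%}, effective sample size {ess:.0f}")
```

With these made-up inputs, each of the 100 young respondents counts two and a half times, so whatever quirks that small group has are amplified, and the poll behaves as though it had fewer respondents than it actually phoned.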

Then there is the constant reporting of 1 and 2 percentage point gains and losses, even though the margin of error on most polls is roughly 3 percentage points. And then we have all the informal polling, and push polling, and selective use of data. To illustrate, here is a near-weekly “poll” published for most of this year. It is the Q&A audience demographic, excluding those weeks (e.g. the “religion Q&A”) where the audience demographic was measured on another scale (e.g. religious belief):

Interestingly, despite the roller-coaster ride of Gillard’s failing popularity, Rudd’s resurgence and the subsequent Coalition momentum (all of which are known to have affected voter intentions), the polls seem to have flat-lined. The Coalition flutters overhead between the 40% and 50% marks, with the ALP roughly 10 percentage points below. Coincidentally, 10% is roughly where the Greens have been sitting all year. It is almost as though the ABC picks its studio audience to give a roughly consistent 50-50 split between conservative (Lib/Nat) and progressive (ALP/Greens) views.

Oh, wait… that is exactly what it does.

But my main gripe is simply the way polling has to be framed in order to be realistically achievable in terms of time and resources. The two main types of poll question are the preferred PM or approval/disapproval rating (which is irrelevant because voters do not directly elect the PM) and the one commonly phrased “if a vote were held today, who would you vote for?”

Now let’s ignore the point that the election is not being held today, and accept that these polls are a snapshot of the popular vote. The real issue is that our government is not elected by the popular vote. The country is divided into 150 seats, and you need to win just over 50% of the seats to form government. To win each seat you need just over 50% of the two-party preferred (TPP) vote in that seat. In other words, just over half the vote in just over half the seats is enough, so it is possible to form a majority government with just over 25% of the vote. That is the TPP vote, mind you, which means you can win with an even smaller primary vote if enough people vote for the “others”.

This means, conversely, that you can lose an election with almost three-quarters of the TPP vote. And, in an extreme hypothetical situation, you could go from winning with 25.1% to losing with 74.9% in one term, giving you a loss off the back of an almost 50% swing in your favour.
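For anyone who wants to check the arithmetic, here is a back-of-envelope sketch. It assumes every seat has the same number of voters (they don't, quite) and treats “just over 50%” as exactly 50%, so the results are limiting values rather than achievable ones. The function names are mine.

```python
def min_winning_share(total_seats=150):
    """Smallest national TPP share that can still deliver a majority.

    Assumes equal-sized seats: scrape home with 50% in a bare majority
    of seats and get nothing at all everywhere else.
    """
    seats_needed = total_seats // 2 + 1  # 76 of 150
    return seats_needed * 0.5 / total_seats


def max_losing_share(total_seats=150):
    """Largest national TPP share that can still lose outright.

    Win 100% in every seat you hold, and fall just short of 50% in the
    bare majority of seats your opponent holds.
    """
    seats_lost = total_seats // 2 + 1         # opponent's 76 seats
    seats_won = total_seats - seats_lost      # your 74 seats
    return (seats_won * 1.0 + seats_lost * 0.5) / total_seats


low = min_winning_share()
high = max_losing_share()
print(f"win with just over {low:.1%}, lose with just under {high:.1%}")
print(f"largest losing swing in your favour: just under {high - low:.1%}")
```

With 150 seats the bounds come out slightly less extreme than the round 25.1% and 74.9% used above, because a majority is 76 seats rather than exactly half, but the point stands.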

Of course, in reality the swing lies mostly with the marginal voters who can win or lose a seat for a party, so the polled swing can be an accurate indicator. But if a 1% swing were predicted in favour of party A, the media would turn to their electoral pendulum and work out who would win the election assuming a uniform 1% swing. In other words, every seat the other side holds by a margin of 1% or less crosses the floor, and the media counts up who has a majority.

Swings are rarely uniform, however. In the WA state election this year, I got a pretty close estimate of seats changing hands by assuming a swing of around ⅔ of that reported in the polls. And then there was Albany, the most marginal seat on our pendulum for the ALP, which not only resisted the general dash to the Coalition, but actually improved its margin for the (now) opposition Labor party.

Now, that ⅔ figure was plucked out of thin air, but my reasoning was simple: most of the campaigning (and in particular the “sand-bagging”) would be focused on the marginal seats, which would tighten up the figures there, while in the safe seats that no one cares about, the swing would run away a little more and become exaggerated. This time around, we’ll be a little (emphasis on little) more scientific in our use of polling.
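The pendulum exercise described over the last few paragraphs is simple enough to sketch. The seat names and margins below are hypothetical, and the `scale` parameter is just my two-thirds fudge factor made explicit. It is not a recognised method.

```python
def project_seats(pendulum, swing_to_a, scale=1.0):
    """Apply one swing to every seat on a pendulum.

    pendulum: {seat: party A's margin in percentage points}, where a
              positive margin means A holds the seat and a negative one
              means the other side does.
    swing_to_a: polled swing towards A in points (negative = against A).
    scale: fraction of the polled swing actually applied (1.0 = uniform).

    Returns (seats held by A afterwards, sorted list of seats that flip).
    """
    applied = swing_to_a * scale
    new_margins = {seat: margin + applied for seat, margin in pendulum.items()}
    flipped = sorted(seat for seat, margin in pendulum.items()
                     if (margin > 0) != (new_margins[seat] > 0))
    seats_a = sum(1 for margin in new_margins.values() if margin > 0)
    return seats_a, flipped


# Hypothetical six-seat pendulum (margins for party A, in points).
pendulum = {"A1": 0.5, "A2": 2.5, "A3": 4.0,
            "B1": -0.8, "B2": -2.5, "B3": -6.0}

uniform = project_seats(pendulum, swing_to_a=-3.0)
damped = project_seats(pendulum, swing_to_a=-3.0, scale=2 / 3)
print("uniform swing:", uniform)
print("two-thirds swing:", damped)
```

In this toy example a uniform 3-point swing against party A costs it two seats, while applying only two-thirds of the swing costs it one. On a real pendulum, that sort of difference can decide who forms government.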

According to ABCNews24, Newspoll today released a new poll through News Limited, so you know this news is new. This poll is apparently (I’m going on second- and third-hand sources) predicting a 6% swing to the Coalition next weekend, as well as indicating a 5% decrease in the ALP primary vote in three marginal Victorian seats and 7% in five NSW coastal seats.

Note that the “coastal” demographic (which I have never really paid much attention to) is more volatile than the “marginal” chaps and chapettes. Again, the seats where the swings really matter are not quite as vulnerable as the nation as a whole.

But don’t take my word for this phenomenon. (No, seriously, don’t. You’ll see why later.) Here is a graph based on the previous election’s data. It compares how marginal each seat is (vertical axis) with its swing towards the ALP (horizontal axis) from its 2007 position*.

I don’t think I have produced a more ambiguous graph yet. The scatter demonstrates an overall shift to the Coalition (I don’t know whether to call that a shift to the left or a shift to the right…), but beyond that, not much. There are big swings in seats with both high and low margins. Perhaps things become clearer if I ignore pro- and anti-incumbent swings and just look at the absolute swing across the board?

Nope. Not really.

To be fair, a line of best fit would probably run roughly bottom left to top right, but the correlation is very low, with many distant outliers.

I had hoped to deduce a nice little line of best fit and use that to estimate the size of swings in various seats based on their margins to give a rough prediction of how many seats might fall during the election based on the latest poll. Unfortunately my hunch that marginal seats would be less influenced by swings is not borne out strongly in the data, so there goes that idea.
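For the record, this is roughly the calculation I had in mind: an ordinary least-squares line predicting a seat's swing from its margin, plus the correlation coefficient to say how much faith to put in it. The data points below are made up to stand in for the scatter. They are not the 2010 figures.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r), where r is Pearson's correlation
    coefficient between xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Made-up (margin, absolute swing) pairs in percentage points.
margins = [0.5, 2.0, 4.0, 7.5, 12.0, 18.0]
swings = [1.2, 4.8, 0.9, 5.5, 2.1, 6.0]

slope, intercept, r = fit_line(margins, swings)
print(f"swing = {slope:.2f} * margin + {intercept:.2f}  (r = {r:.2f})")
```

A value of r close to zero means the line tells you next to nothing about any individual seat, which is more or less what the scatter suggested by eye.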

Instead, over the next few days, I will be looking at how strongly influenced by swings each seat has been over the last few elections to see if there is any logic in assigning seats a “swing index” instead. This index would represent whether the seat generally felt the trends more or less powerfully than the national average, or even if they tend to vote against the trend.
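I have not settled on a definition yet, but one possible version of the index is simply the seat's swing divided by the national swing, averaged over several elections. The sketch below assumes that definition. The function name and all the numbers are hypothetical.

```python
def swing_index(seat_swings, national_swings):
    """Average ratio of a seat's swing to the national swing.

    Both lists hold one signed swing per election, in the same order and
    the same direction (e.g. points towards the ALP). Elections with a
    zero national swing are skipped, since the ratio is undefined there.

    Roughly: 1.0 means the seat moves with the nation, below 1.0 means it
    feels the trend less, and a negative value means it tends to move
    against the trend.
    """
    ratios = [seat / nation
              for seat, nation in zip(seat_swings, national_swings)
              if nation != 0]
    if not ratios:
        raise ValueError("no elections with a non-zero national swing")
    return sum(ratios) / len(ratios)


# Hypothetical swings over three elections (points towards one party).
national = [5.0, -2.0, 4.0]
sleepy_seat = [2.5, -1.0, 2.0]      # feels half the national swing
contrary_seat = [-5.0, 2.0, -4.0]   # moves against the trend every time

print(swing_index(sleepy_seat, national))
print(swing_index(contrary_seat, national))
```

A plain average is easily dragged around by one odd election, especially when the national swing is small, so a median or a fitted slope might turn out to be the better choice once the real data is in.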

My gut feeling is that each seat will have a reasonably consistent number of swinging voters, and thus have a reasonably stable susceptibility to the factors driving the national swing. But then again, we’ve all just seen how reliable my gut feeling can be on these things.

*N.B. Several seats have been omitted. Durack (WA), McMahon (NSW) and Wright (Qld) did not have a real swing, since they were created in 2010 (replacing Prospect (NSW), Lowe (NSW) and Kalgoorlie (WA)) and had no incumbent to swing to or from. However, I have still included the seats the new divisions were carved from, and the seats the old divisions amalgamated into, despite the obvious changes to their constituency makeup. Call me lazy – I know my mother does.

Denison (Tas) and Lyne (NSW) elected independents in 2010 while Kennedy (Qld) and New England (NSW) re-elected theirs. Since we are just looking at the 2PP swing, these can cause all kinds of confusion and misunderstanding. That does not mean these seats’ data is irrelevant, just that other factors may be in play and I need a different kind of graph. Likewise Melbourne (Vic) was omitted because it elected a Greens candidate in 2010.

Finally, any seat with a non-ALP-vs-Coalition margin in 2007 or 2010 was also omitted: Batman (Vic) and Grayndler (NSW) (second place Greens, 2010), Melbourne (Vic) (again) (second place Greens, 2007) and O’Connor (WA) (Nationals win over Liberals, 2010, after the WA Nats formed a breakaway from Warren Truss’s leadership). These were omitted because I calculated the 2010 2PP swings from the 2010 and 2007 2PP margins and didn’t want to extract the necessary major parties’ support by calculating back-flows from eventual 2PP stats. If you guys want it done, do it yourself.
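For the seats that did make the cut, the swing calculation from two margins looks something like the sketch below. It assumes the usual pendulum convention that a margin is the winner's 2PP vote minus 50, and it only makes sense for ALP-vs-Coalition contests, which is exactly why the seats above were dropped. The function names and example figures are mine.

```python
def alp_2pp(margin, holder):
    """Convert a pendulum margin (winner's 2PP minus 50) into the ALP's 2PP."""
    if holder == "ALP":
        return 50.0 + margin
    if holder == "Coalition":
        return 50.0 - margin
    # Anything else is not an ALP-vs-Coalition contest.
    raise ValueError(f"no ALP-vs-Coalition 2PP for holder {holder!r}")


def swing_to_alp(margin_2007, holder_2007, margin_2010, holder_2010):
    """2PP swing towards the ALP between the two elections, in points."""
    return alp_2pp(margin_2010, holder_2010) - alp_2pp(margin_2007, holder_2007)


# Hypothetical seats, not real results.
print(swing_to_alp(4.0, "ALP", 1.5, "ALP"))        # held, but on a smaller margin
print(swing_to_alp(1.0, "ALP", 2.0, "Coalition"))  # lost to the Coalition
```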


Saturday 31 August 2013
