They’re not very useful things, but if we didn’t have polls
what would we use to fill our newspapers with?
The problems with polls are legion, and have been discussed
by many of the bloggers listed to the right (and many more of the 146 million Google
hits for ‘the problem with polls’). Then there are the dreaded polls of
polls gaining popularity in the United States, which are generally just an
average of other polls. While it may seem superficially reasonable to assume
that an average will iron out errors between the polls (e.g. neutralise the
biases of two opposing polling houses), unfortunately not all errors can be so
easily dealt with, and some are further entrenched or even exacerbated. For
example, in phone polls it is generally easier to fill the over-60s quota than
the under-30s. The only options are to keep phoning until you get enough under-30s
to answer (in which case they may not be representative of the generally inaccessible
younger generation) or else to mathematically exaggerate the under-30s results
and minimise the over-60s to represent their proportion of the voting population.
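This is called scaling (pollsters usually say weighting) and can exaggerate statistical
anomalies, since each of the few under-30s you did reach now counts several times over.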
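To see how scaling can blow up an anomaly, here is a minimal Python sketch. Every number in it is invented for illustration (the sample sizes, the support levels and the population shares), and the function names are hypothetical rather than any polling house's actual method.

```python
# Hypothetical raw sample: respondents reached in each age group and the
# share of them backing party A. All figures are invented.
sample = {
    "under_30": {"respondents": 50, "support_a": 0.60},
    "30_to_60": {"respondents": 550, "support_a": 0.45},
    "over_60": {"respondents": 400, "support_a": 0.40},
}

# Assumed share of each age group in the voting population (also invented).
population_share = {"under_30": 0.20, "30_to_60": 0.50, "over_60": 0.30}


def unweighted_support(sample):
    """Support for party A if every respondent counts equally."""
    total = sum(g["respondents"] for g in sample.values())
    backing = sum(g["respondents"] * g["support_a"] for g in sample.values())
    return backing / total


def weighted_support(sample, population_share):
    """Support for party A with each group scaled to its population share."""
    return sum(population_share[name] * sample[name]["support_a"] for name in sample)


def group_weight(sample, population_share, group):
    """How much one respondent in `group` counts relative to an average respondent."""
    total = sum(g["respondents"] for g in sample.values())
    sample_share = sample[group]["respondents"] / total
    return population_share[group] / sample_share


print(f"unweighted support: {unweighted_support(sample):.4f}")
print(f"weighted support:   {weighted_support(sample, population_share):.4f}")
print(f"under-30 weight:    {group_weight(sample, population_share, 'under_30'):.2f}")
```

In this made-up sample the 50 under-30s each count four times as much as an average respondent, and the headline figure moves from 43.75% to 46.5% on the strength of what those 50 people said.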
Then there is the constant reporting of 1 and 2 percentage
point gains and losses, even though the margin of error on most polls is
roughly 3 pp. And then we have all the informal polling, and push polling, and
selective use of data. To illustrate, here is a near-weekly “poll” published
for most of this year. It is the Q&A audience demographic, excluding those
weeks (e.g. the “religion Q&A”) where the audience demographic was measured
on another scale (e.g. religious belief):
Interestingly, despite the roller-coaster ride of Gillard’s
failing popularity, Rudd’s resurgence and the subsequent Coalition momentum –
all of which are known to have affected voter intentions – the polls seem to
have flat-lined. The Coalition flutters overhead between the 40 and 50% marks,
with the ALP roughly 10 percentage points below. Coincidentally, 10% is roughly
where the Greens have been sitting all year. It is almost as though the ABC
picks its studio audience to give a roughly consistent 50-50 split between
conservative (Lib/Nat) and progressive (ALP/Greens) views.
But my main gripe is simply the way polling has to be framed
in order to be realistically achievable in terms of time and resources. The two
main polls are preferred PM/approval-disapproval ratings type questions – which
are irrelevant because voters do not directly elect the PM – and the one
commonly phrased “if a vote were held today, who would you vote for?”
Now let’s ignore the point that the election is not being
held today, and accept that these polls are a snapshot of the popular vote. The
real issue is that our government is not elected by the popular vote. The
country is divided into 150 seats, and you need to win just over 50% of the
seats to form government. To win each seat you need just over 50% of the two-party
preferred vote in that seat. In other words, it is possible to form a majority
government with just over 25% of the vote. And that is the TPP vote, which means
you can win with an even smaller support base if enough people vote for the “others”.
This means, conversely, that you can lose an election with
almost three-quarters of the TPP vote. And, in an extreme hypothetical
situation you could go from winning with 25.1% to losing with 74.9% in one
term, giving you a loss off the back of an almost 50-point swing in your favour.
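The arithmetic behind those extremes can be sketched in a few lines of Python. The sketch assumes every seat has the same number of voters and that each seat is a straight two-party contest; the function names are hypothetical.

```python
def min_winning_share(total_seats=150, seat_threshold=0.5):
    """Smallest national TPP share that can still win a bare majority of seats.

    Assumes equal-sized seats, the winner taking each of its seats by the
    barest margin and getting no votes at all in the seats it loses.
    """
    seats_needed = total_seats // 2 + 1  # 76 of 150
    return seats_needed / total_seats * seat_threshold


def max_losing_share(total_seats=150, seat_threshold=0.5):
    """Largest national TPP share that can still fall short of a majority.

    This is the mirror image: whatever the barest possible winner does not
    get, the loser does.
    """
    return 1 - min_winning_share(total_seats, seat_threshold)


print(f"minimum winning share: {min_winning_share():.4f}")
print(f"maximum losing share:  {max_losing_share():.4f}")
```

With 150 equal seats the floor comes out nearer 25.3% than 25.1%, and the ceiling for a loser nearer 74.7%, but the point stands: the national vote share and the seat count can be a very long way apart.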
Of course, in reality the swing lies mostly with the marginal
voters who can win or lose a seat for a party, so it can be an accurate indicator. But if a 1% swing were predicted in
favour of party A, the media would turn to their electoral pendulum and work
out who would win the election assuming a uniform 1% swing. In other words, all
seats with a margin of 1% or less change hands, and the media counts up who
has a majority.
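The pendulum exercise amounts to something like the following Python sketch. The seat names and TPP figures are invented, “A” and “B” stand in for the two major parties, and the function name is hypothetical.

```python
def apply_uniform_swing(pendulum, swing_to_a):
    """Count seats per party after adding the same swing to every seat.

    `pendulum` maps seat name -> party A's two-party-preferred share in
    percent. A seat goes to A only if its share after the swing is above 50;
    an exact 50-50 tie is left with B in this sketch.
    """
    result = {"A": 0, "B": 0}
    for tpp_a in pendulum.values():
        winner = "A" if tpp_a + swing_to_a > 50 else "B"
        result[winner] += 1
    return result


# Invented five-seat pendulum: party A's TPP share at the last election.
pendulum = {
    "Seat 1": 49.5,  # held by B on a 0.5% margin
    "Seat 2": 49.2,  # held by B on a 0.8% margin
    "Seat 3": 48.0,  # held by B on a 2.0% margin
    "Seat 4": 50.4,  # held by A on a 0.4% margin
    "Seat 5": 55.0,  # held by A on a 5.0% margin
}

print(apply_uniform_swing(pendulum, 0.0))
print(apply_uniform_swing(pendulum, 1.0))
```

In this toy pendulum a uniform 1-point swing to A flips the two seats B holds by less than 1% and nothing else, which is exactly the assumption doing all the work.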
Now that ⅔ of the predicted swing was plucked out of mid-air,
but my reasoning was simple – most of the campaigning (and in particular the “sand-bagging”)
would be focused on the marginal seats, which would tighten up the figures there,
while in the safe seats that no one cares about the polls would run away a
little more and become exaggerated. This time around, we’ll be a little
(emphasis on little) more scientific in our use of polling.
According to ABCNews24, Newspoll today released a new poll
through News Limited, so you know this news is new. This poll is apparently (I’m
going on second- and third-hand sources) predicting a 6% swing to the Coalition
next weekend, as well as indicating a 5% decrease in the ALP primary vote in
three marginal Victorian seats and 7% in five NSW coastal seats.
Note that the “coastal” demographic (which I have never
really paid much attention to) is more volatile than the “marginal” chaps and
chapettes. Again, the seats where the swings really matter are not quite as
vulnerable as the nation as a whole.
But don’t take my word for this phenomenon. (No, seriously,
don’t. You’ll see why later.) Here is a graph based on the previous election’s
data. It compares how marginal each seat is (vertical axis) with its swing
towards the ALP (horizontal axis) from its 2007 position*.
I don’t think I have produced a more
ambiguous graph yet. The scatter demonstrates an overall shift to the Coalition
(I don’t know whether to call that a shift to the left or a shift to the right…),
but beyond that, not much. There are big swings in seats with high and low
margins. Perhaps things become clearer if I ignore pro- and anti- incumbent
swings and just look at an absolute swing across the board?
Nope. Not really.
To be fair, a line of best fit would
probably run roughly bottom left to top right, but the correlation is very low
with many distant outliers.
I had hoped to deduce a nice little
line of best fit and use that to estimate the size of swings in various seats
based on their margins to give a rough prediction of how many seats might fall
during the election based on the latest poll. Unfortunately my hunch that
marginal seats would be less influenced by swings is not borne out strongly in
the data, so there goes that idea.
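For anyone who wants to put a number on how weak such a relationship is, a least-squares line and a correlation coefficient take only a few lines of Python. This is a sketch, not the calculation behind the graphs above: the data points are invented placeholders, the function name is hypothetical, and it treats the margin as the predictor and the absolute swing as the thing being predicted.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Invented placeholder data: each seat's margin and absolute swing, in points.
margins = [0.5, 1.2, 3.0, 4.5, 6.0, 8.5, 11.0, 14.0]
abs_swings = [2.1, 6.3, 0.8, 4.9, 1.5, 7.2, 3.3, 5.0]

slope, intercept, r = fit_line(margins, abs_swings)
print(f"swing = {slope:.3f} * margin + {intercept:.3f}")
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

The square of r is the share of the variation in swing that the line accounts for, so a value near zero means the margin tells you next to nothing about the swing.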
Instead, over the next few days, I will
be looking at how strongly influenced by swings each seat has been over the
last few elections to see if there is any logic in assigning seats a “swing
index” instead. This index would represent whether the seat generally felt the
trends more or less powerfully than the national average, or even if they tend
to vote against the trend.
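One possible way to define such an index, purely as a sketch: average the ratio of the seat's swing to the national swing over past elections. The definition, the function name and the numbers in the example are all assumptions for illustration, not a settled method.

```python
def swing_index(seat_swings, national_swings):
    """Average ratio of a seat's swing to the national swing.

    Both lists hold one signed swing per election, in percentage points,
    with the same sign convention (e.g. positive = towards the ALP).

    Above 1: the seat tends to feel the national trend more strongly.
    Between 0 and 1: it follows the trend, but more weakly.
    Below 0: it tends to move against the trend.
    """
    ratios = [
        seat / national
        for seat, national in zip(seat_swings, national_swings)
        if national != 0  # skip elections with no national swing to compare to
    ]
    if not ratios:
        raise ValueError("need at least one election with a non-zero national swing")
    return sum(ratios) / len(ratios)


# Invented example: three elections' worth of swings for one seat.
national = [2.0, -1.0, 4.0]
seat = [3.0, -2.0, 2.0]
print(f"swing index: {swing_index(seat, national):.2f}")
```

A plain average of ratios is fragile when the national swing is small, since a tiny denominator produces a huge ratio, so a more careful version might weight each election by the size of its national swing.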
My gut feeling is that each seat will
have a reasonably consistent number of swinging voters, and thus have a reasonably
stable susceptibility to the factors driving the national swing. But then
again, we’ve all just seen how reliable my gut feeling can be on these things.
*N.B. Several seats have been omitted. Durack (WA), McMahon (NSW) and Wright (Qld) did not have a real swing, since they were created in 2010 (replacing Prospect (NSW), Lowe (NSW) and Kalgoorlie (WA)) and had no incumbent to swing to or from. However, I have still included the seats the new divisions were carved from, and the seats the old divisions amalgamated into, despite the obvious changes to their constituency makeup. Call me lazy – I know my mother does.
Denison (Tas) and Lyne (NSW) elected independents in 2010 while Kennedy (Qld) and New England (NSW) re-elected theirs. Since we are just looking at the 2PP swing, these can cause all kinds of confusion and misunderstanding. That does not mean these seats’ data is irrelevant, just that other factors may be in play and I need a different kind of graph. Likewise Melbourne (Vic) was omitted because it elected a Greens candidate in 2010.
Finally, any seat with a non-ALP-vs-Coalition margin in 2007 or 2010 was also omitted: Batman (Vic) and Grayndler (NSW) (second place Greens, 2010), Melbourne (Vic) (again) (second place Greens, 2007) and O’Connor (WA) (Nationals win over Liberals, 2010, after the WA Nats formed a breakaway from Warren Truss’s leadership). These were omitted because I calculated the 2010 2PP swings from the 2010 and 2007 2PP margins and didn’t want to extract the necessary major parties’ support by calculating back-flows from eventual 2PP stats. If you guys want it done, do it yourself.