Friday, 19 April 2013

More Donkeying Around

The last time I did a two-part post it was to provide details of the maths relating to my Variable-Dependent Transparency Arrays, and in between the two parts news of the papal election broke, so I had to wait an extra week to get my more topical post out.

This week I'm out in the field, so I couldn't provide a more topical post if I tried. And again, the need for a two-parter was caused by the desire not to overload you with maths that, I assure you, makes perfect sense in my head. Most of the maths this time is pretty basic, relying mostly on ratios, averages and differences, but it is always ten times harder to explain in words than it is to actually do.

The Story So Far:

So, we have been looking for a method to determine the prevalence of Donkey Voting, that statistically irritating practice of voting 1, 2, 3, 4 from top to bottom regardless of candidate order. The problem, of course, has been isolating these votes from those of people who voted 1, 2, 3, 4 from top to bottom as the result of reasoned decision-making.

Looking purely at the proportion of votes received by first-place candidates is obviously going to overlook some major factors, such as the fact that major parties regularly receive more votes than minor ones.

It turns out that comparing a party's proportion of the vote in a given division against its national average is highly unreliable. Among parties that led the ballot, the difference averages 3.05 with a standard deviation (σ) of 12.04, compared to an average of 4.06 and a σ of 11.86 for all candidates. In both cases σ is roughly three to four times the mean, and using the 68-95-99.7 rule we can expect a spread greater than ± 30. Given the mean is only around 5% of that range, we may as well call it 0 ± 30.
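The spread argument can be checked with a few lines of Python. This sketch simply plugs the quoted means and σ values into the 68-95-99.7 rule; it assumes the figures are in percentage points and that the deviations are roughly normal, which the rule requires.

```python
# Summary figures quoted above for the gap between a party's divisional vote
# share and its national average (assumed to be in percentage points).
STATS = {
    "led the ballot": (3.05, 12.04),
    "all candidates": (4.06, 11.86),
}


def empirical_rule_intervals(mean, sd):
    """Return the 1-, 2- and 3-sigma intervals of the 68-95-99.7 rule.

    The rule strictly applies only to normally distributed data, so treat
    these as rough ranges.
    """
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


for label, (mean, sd) in STATS.items():
    print(f"{label}: sigma is {sd / mean:.1f} times the mean")
    for k, (low, high) in empirical_rule_intervals(mean, sd).items():
        print(f"  within {k} sigma: {low:+.2f} to {high:+.2f}")
```

For the ballot leaders the 3-sigma range runs from about -33 to +39, which is why the small mean is swamped by the spread.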

Some of the main factors affecting this calculation are that candidates perform much better when there are fewer opposing candidates, and collect a larger share of the vote when no other party is competing for their support base (i.e. they are the only left-wing or right-wing party). It is also important to realise that, with the exception of the ALP and Greens, no party contested every seat, so each of the others had its national primary vote dragged down by at least a few seats where it received 0% of the vote.

Early results using only the 2010 federal election data suggest more reliable results can be obtained by taking each party's overall average swing and comparing it with the average swing it received in seats where it led the ballot. Further testing against other elections is needed, but on this measure first-place candidates seem to regularly receive a notably greater swing than second-place or last-place candidates.
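The swing comparison amounts to two averages per party and a subtraction. Here is a minimal sketch; the input format (one party, swing and led-the-ballot flag per candidate) and the example swings are hypothetical, not the 2010 data.

```python
from collections import defaultdict
from statistics import mean


def ballot_position_swing_lift(results):
    """Compare each party's average swing where it led the ballot with its
    overall average swing.

    results: iterable of (party, swing_pp, led_ballot) tuples, one per
    candidate, where swing_pp is the swing in percentage points.
    Returns {party: lift_pp} for every party that led at least one ballot.
    """
    all_swings = defaultdict(list)
    led_swings = defaultdict(list)
    for party, swing_pp, led_ballot in results:
        all_swings[party].append(swing_pp)
        if led_ballot:
            led_swings[party].append(swing_pp)
    return {party: mean(led_swings[party]) - mean(all_swings[party])
            for party in led_swings}


# Hypothetical swings for illustration only (not real 2010 results).
example = [
    ("Party X", 3.0, True),
    ("Party X", 1.0, False),
    ("Party X", 2.0, False),
    ("Party Y", -1.0, False),
]
print(ballot_position_swing_lift(example))  # {'Party X': 1.0}
```

Party Y never led a ballot, so it has no lift to report; a positive lift for a party is the pattern described above for first-place candidates.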

Because the swing compares each candidate's result against their performance (or, if a party has changed candidate, their predecessor's performance) in the previous election, it is not affected by the first two distorting factors mentioned for the previous method, assuming that the seat maintains a similar number of candidates belonging to similar parties. The greater the difference from election to election, the less reliable this method is likely to be. This method is not affected by parties who do not contest certain seats, either.

The Good, The Bad and The Intricate:

So far there has been one method that just plain does not work, and one that might. Here is a third, more complex method.

If a donkey voter ranks the candidates from top to bottom, and the first candidate is eliminated, their vote will pass to the second candidate. If we can determine the normal flow of preferences from one candidate to the others, we can see how many more votes than normal flow on to candidate two.

For example, our great and glorious leader, Antony Green, proposed as a rough guide for predicting preference flows in the WA state elections that 70% of Greens votes flow to Labor on a two-party preferred basis, and 75% of Nationals votes flow to the Liberals.

If we find the average proportion of, say, Greens votes that flow to Labor and to the Liberals, then we could predict how the preferences would flow in a three-candidate race between them if the Greens candidate dropped out first. Let's use Antony's 70% figure for this example.

In most cases 70% of Greens preferences flow to the ALP, so in a three-candidate race the remaining 30% go to the Liberals. In a case where the Greens candidate led the ballot and the Liberal was placed second, however, let's say 35% of the Greens vote passed to the Liberals. This is 5 percentage points more than expected, and we could propose that 5% of Greens supporters in that seat donkey voted as an explanation for this deviation from the average. (Note that only Greens voters can be donkey voters in this example, since our definition of a donkey vote requires them to place a 1 next to the top (i.e. Greens) candidate.)

If 20% of voters voted for the Greens, and 5% of these are donkey voters, then 1% of voters in that seat are donkey voters.
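The worked example above can be written out as a small function. This is only a sketch of the simple "excess flow" reasoning used here (observed flow minus usual flow, then scaled by the top candidate's primary vote); the function name is hypothetical.

```python
def simple_donkey_estimate(usual_flow_to_second, observed_flow_to_second,
                           top_candidate_primary):
    """Worked version of the simple 'excess flow' reasoning.

    All arguments are fractions (0-1). Returns (donkey share of the top
    candidate's voters, donkey share of all voters in the seat).
    """
    excess = observed_flow_to_second - usual_flow_to_second
    return excess, excess * top_candidate_primary


# Greens lead the ballot, Liberals second. 70% of Greens preferences usually
# go to the ALP, so in a three-way race the usual flow to the Liberals is 30%.
usual_to_liberal = 1 - 0.70
excess, seat_rate = simple_donkey_estimate(usual_to_liberal, 0.35, 0.20)
print(f"{excess:.0%} of Greens voters, {seat_rate:.0%} of all voters")
# 5% of Greens voters, 1% of all voters
```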

This method is very rough and that 5% variation could be the result of atypical political ideas in the voting public of that seat or the result of some division-specific factor. However, on a large scale this may yield an average figure for donkey voters as a proportion of the voting community.

Of course, determining the average flow of preferences is the hard part. If I had access to the ballots themselves - or for example if the AEC produced for each division a table listing the number of people who voted 123, the number voting 132 and so forth - then this would be quite simple. Instead we will have to try and calculate this another way.
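To show why full ballot counts would make this simple, here is a sketch that computes preference flows from a hypothetical table of counts per complete ordering. The AEC does not publish such a table, and the candidate codes and counts below are invented for illustration.

```python
from collections import Counter


def preference_flow(ballot_counts, eliminated, remaining):
    """Share of an eliminated candidate's first preferences that pass to each
    remaining candidate, given full ballot counts.

    ballot_counts maps a complete preference ordering (a tuple of candidate
    names, first preference first) to the number of ballots marked that way.
    Assumes the eliminated candidate received at least one first preference.
    """
    remaining_set = set(remaining)
    flows = Counter()
    for ordering, count in ballot_counts.items():
        if ordering[0] != eliminated:
            continue
        # The vote passes to the highest-ranked candidate still in the count.
        next_choice = next(c for c in ordering[1:] if c in remaining_set)
        flows[next_choice] += count
    total = sum(flows.values())
    return {c: flows[c] / total for c in remaining}


# Hypothetical counts for a three-candidate seat, ballot order GRN, ALP, LIB,
# so ("GRN", "ALP", "LIB") is the top-to-bottom ordering.
counts = {
    ("GRN", "ALP", "LIB"): 720,
    ("GRN", "LIB", "ALP"): 280,
    ("ALP", "GRN", "LIB"): 4000,
    ("LIB", "ALP", "GRN"): 5000,
}
print(preference_flow(counts, "GRN", ["ALP", "LIB"]))
# {'ALP': 0.72, 'LIB': 0.28}
```

Even with such a table, the 720 top-to-bottom ballots would still mix donkey votes with genuine Greens-Labor-Liberal votes; it only removes the need to estimate the flows.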

One simple approach is to focus entirely on the first redistribution of votes in each division, and take an average of the portion of votes directed to each other party. In 2010, across all the seats in which Australia First was eliminated in round 1 and the Liberals also contested the seat, an average of 13.94% of Australia First's vote passed to the Liberals.

The Carers Alliance and the Communists were each eliminated first in only one division, so the figures in red are not really an average but a single result.

Note that these figures rarely add up to 100%, because parties contest different seats. To explain this, imagine Parties A, B and C contest a seat and that Party A is the first to drop out. Party A's votes are then distributed 70% to the like-minded Party B and 30% to Party C.

This is the only seat that A and B both contested which A lost in the first round; however, there is another seat, contested by A, C and D, in which A also came last. There, A's votes are split 50-50 between Parties C and D. Thus A's votes are split with 70% going to B, 50% to D and an average of 40% to C, adding up to 160%. It is easy to correct this so that the rows do sum to 100%, simply by dividing each figure by the original total and then multiplying by 100, which gives 43.75% to B, 25% to C and 31.25% to D.
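The averaging and rescaling for the hypothetical Parties A to D can be sketched as follows. Flows are in percent of A's votes, and storing one dict of flows per seat is an illustrative choice rather than the format of the original spreadsheet.

```python
from statistics import mean

# The hypothetical example above, with flows in percent of Party A's votes.
seats_lost_by_a = [
    {"B": 70, "C": 30},  # seat contested by A, B and C
    {"C": 50, "D": 50},  # seat contested by A, C and D
]

# Average each party's share over the seats where it was present.
parties = sorted({p for seat in seats_lost_by_a for p in seat})
average = {p: mean(seat[p] for seat in seats_lost_by_a if p in seat)
           for p in parties}
total = sum(average.values())  # 160: the row does not sum to 100

# Rescale so the row sums to 100%.
normalised = {p: 100 * v / total for p, v in average.items()}
print(normalised)  # {'B': 43.75, 'C': 25.0, 'D': 31.25}
```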

However, this overlooks a key point:
The proportion of a vote that passes on to another party depends on which other parties are contesting the seat. Thus 30% of A's votes flow to C if B also contests the seat, and 50% if it doesn't. In other words, we need to compare apples with apples again, and only compare seats contested by an identical set of parties.
As we saw last week, there are very few cases where divisions are contested by candidates from exactly the same set of parties. The most common case is the divisions contested by the ALP, Liberals and Greens only - and even then this amounts to only six divisions. Averaging vote redistributions only over seats where the contesting parties are directly comparable (which also means they must contain no independents) is not going to yield reliable results from a single election, given the small sample sizes.
Instead, if I want to look at the preference flows from Party A to Parties B, C and D, I take the average flows across all seats that A lost in the first round and in which B, C and D were among the contesting parties. I would therefore also include the flows in seats contested by {A, B, C, D, E}, {A, B, C, D, F} and {A, B, C, D, G, H}. This is a less refined approach, but will hopefully iron out any outliers far better.
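The subset rule can be sketched as a filter followed by an average. The seat records and their field names are hypothetical, and rescaling the averaged flows to sum to 100 is an assumption about how the adjustment is applied; flows to parties outside the target set (including independents) are simply ignored.

```python
from statistics import mean


def predicted_flow(seats, eliminated, targets):
    """Predicted share (percent) of an eliminated party's votes flowing to each
    party in `targets`.

    seats: list of dicts with keys
      'first_out'   - party eliminated in the first round
      'contestants' - set of parties contesting the seat
      'flows'       - {party: percent of the eliminated party's votes received}
    Only seats lost first by `eliminated` in which every target party stood are
    used. Flows to other parties are ignored and the averages are rescaled to
    sum to 100 (an assumption about how the adjustment is done).
    """
    targets = list(targets)
    comparable = [s for s in seats
                  if s["first_out"] == eliminated
                  and set(targets) <= s["contestants"]]
    if not comparable:
        return None
    average = {t: mean(s["flows"][t] for s in comparable) for t in targets}
    total = sum(average.values())
    return {t: 100 * v / total for t, v in average.items()}


# Hypothetical seats, for illustration only.
seats = [
    {"first_out": "A", "contestants": {"A", "B", "C", "D", "E"},
     "flows": {"B": 40, "C": 20, "D": 20, "E": 20}},
    {"first_out": "A", "contestants": {"A", "B", "C", "D"},
     "flows": {"B": 50, "C": 30, "D": 20}},
    {"first_out": "A", "contestants": {"A", "B", "C"},  # no D: excluded
     "flows": {"B": 60, "C": 40}},
]
print(predicted_flow(seats, "A", ["B", "C", "D"]))
```

The third seat is dropped because D did not stand there, while the first seat still counts even though E also took a share of A's votes.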

This will all be much clearer once we get down to actual cases. There are 20 seats where the first party eliminated was at the top of the ballot (Aston, Bonner, Bradfield, Dawson, Dobell, Dunkley, Fowler, Gippsland, Gorton, Kingston, Lingiari, Lyons, Mackellar, McPherson, Menzies, Mitchell, Moore, Rankin, Scullin and Werriwa). These are the ones we are interested in, since we want to know how much of their vote passed on to second place, compared to what we would normally expect.

Results:

To begin with Bradfield (NSW), the three candidates were (in order):

1. GEMMELL, Susie (Greens)
2. GALLARD, Sarah (ALP)
3. FLETCHER, Paul (Liberal)
With 14,231 votes, the Greens candidate had the fewest of the three and was eliminated. 10,977 of these votes flowed on to the ALP, with the remaining 3,254 passing to the Liberals: a 77.13% flow to the ALP and a 22.87% flow to the Liberals.

In all the seats where the Greens were eliminated first and both the ALP and Liberals were present to receive a share of the votes (Canberra, Barton, Bradfield, Mackellar, Werriwa and Braddon), the average Labor flow is 75.19% and the average Liberal flow is 24.81% (adjusted so that the two sum to 100% of the votes).

The ALP flow in Bradfield is 1.94 pp higher than the average. This lift could be artificially inflated a little if any of those comparison seats also contained another party siphoning off some of the Greens' flow-on vote. (As it happens, they do not.)

This is equivalent to 1.94% of the 14,231 redistributed votes (which must contain all of the donkey votes), or about 276 donkey votes.
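The Bradfield arithmetic can be reproduced in a few lines. This sketch rounds the observed flow to two decimal places to match the figures quoted here; using the unrounded flow would nudge the estimate up by about one vote. The function name is hypothetical.

```python
def donkey_votes_from_lift(eliminated_votes, votes_to_second, predicted_pct):
    """Lift method applied to one seat.

    Flows are rounded to two decimal places, matching the figures quoted
    in the text. Returns (observed flow %, lift in pp, donkey votes).
    """
    observed_pct = round(100 * votes_to_second / eliminated_votes, 2)
    lift_pp = observed_pct - predicted_pct
    return observed_pct, lift_pp, lift_pp / 100 * eliminated_votes


# Bradfield: 14,231 Greens votes, 10,977 of them to the ALP (second on the
# ballot); comparable seats predict a 75.19% ALP flow.
observed, lift, donkeys = donkey_votes_from_lift(14_231, 10_977, 75.19)
print(f"{observed:.2f}% observed, lift {lift:.2f} pp, ~{donkeys:.0f} donkey votes")
# 77.13% observed, lift 1.94 pp, ~276 donkey votes
```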

Below, the same process is applied to all seats where the first-place candidate was eliminated first, except Moore. Moore is the only seat where One Nation was eliminated first and a CDP candidate was also standing, so the only comparable seat for determining the predicted flow in Moore is Moore itself. The actual and predicted flows would therefore be identical, giving a 'Lift' of 0 pp for every remaining candidate.

Independents have to be ignored when calculating predicted flows for an obvious reason: there can be no comparable seats, since each independent contests only the division in question. For this reason, Independents are not factored into the Redistribution Percent.

The seat of Lingiari has only one other comparable seat, due to the presence of the Country Liberals, a Northern Territory-specific party. If we treat the Country Liberals and Liberal Party as equivalent, the results are as follows:

 And yes, technically NT is a territory, not a state. Live with it.

Only one of the 19 divisions reviewed here saw the first-place party perform worse than the predicted flow would suggest: Gippsland (VIC), by 1.21 percentage points.

The average 'Lift' for first-place candidates is 11.06 pp (11.98 pp if Lingiari uses broader Liberal Party data), and the average number of calculated donkey votes is 395. The average electoral division has around 82,500 voters. At around 400 donkey votes per seat, that is a donkey vote rate of 0.48%.
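The conversion from an average donkey vote count to a rate is a single division; this sketch just uses the two averages quoted above.

```python
avg_donkey_votes = 395  # average calculated donkey votes per seat
avg_voters = 82_500     # approximate voters per division

rate = avg_donkey_votes / avg_voters
print(f"{rate:.2%}")  # 0.48%
```

Rounding up to 400 donkey votes per seat gives about 0.485%, which still displays as 0.48%.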

Final Thoughts:

When I began working on this week's method of calculating donkey votes, I was tempted to calculate a basic table of preference flows, then move on to second-round eliminations. Because we would know how many votes the second eliminated party got from the first, and we would know their probable preference flows, we could determine the preference flows of the second eliminated party, and then the third, and so on to refine our averages.

I decided against this for many reasons. Firstly, time. Secondly, I can see errors accumulating, so that a tiny inaccuracy in the average of first-round eliminations would add to any inaccuracy in the second round, and so on. Thirdly, if you really want a reliable measure of preference flows (and, for that matter, exactly how many people voted 1, 2, 3, 4) you need the original ballots.

We have seen, over the last few weeks, estimates of the donkey vote rate ranging from around 2.34% to less than half a percent. However, across a population of many thousands of voters, donkey voting can potentially make a difference, especially in a preferential voting system like ours, where a single vote can determine who gets eliminated in each round and the flow of preferences that this entails. Even at a rate of 0.48%, there were almost 60,000 donkey voters at the last federal election. That is big enough to form a separate electoral district - Lingiari had fewer than 46,500 registered voters in 2010.
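The national figure follows from scaling the per-seat average up to the whole House. This sketch assumes 150 divisions, the size of the House of Representatives at the 2010 election, a figure not stated in the post itself.

```python
DIVISIONS_2010 = 150         # House of Representatives seats in 2010 (assumed here)
avg_donkey_votes = 395       # per-seat average from the lift method
avg_voters = 82_500          # approximate voters per division
LINGIARI_ENROLMENT = 46_500  # upper bound on Lingiari's 2010 enrolment, as quoted

national_donkeys = avg_donkey_votes * DIVISIONS_2010
print(national_donkeys)                        # 59250
print(national_donkeys > LINGIARI_ENROLMENT)   # True
print(f"{national_donkeys / (avg_voters * DIVISIONS_2010):.2%}")  # 0.48%
```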

Compulsory Voting: Don't be a Donkey.