Wednesday 13 February 2013

Variable-Dependent Transparency Arrays

Prelude:


I have a going-away party tomorrow night, which is good news for me because there's a party and bad news because a friend is going away. It also means you get this week's post early, which is good news for you because you only had to wait six days since the last post, and bad news because you'll have to wait eight days for the next one.

So, if you missed last week's post, I created a colourful map with a stupid name. I did some other stuff too, but I'm going to spend this post discussing that map. Those of you interested in the process behind creating both that map and those below might want to read that first. Normal people might just want to look at all of the pretty colours. You know who you are...

Also, the nation-wide maps are pretty bulky. You can click on them to open if you want to identify specific seats, but this could chew up your bandwidth and/or data quota pretty horrendously over the next few months. Sorry about that, but it is necessary in order to represent all 150 seats geographically. Some seats are difficult to see. The best views are afforded by right-clicking on the image and choosing to open it in a new tab or window. This is still not great for tiny seats, but there are only three solutions that I can think of.

The first is to make the maps bigger. This is inconvenient for me, harder to view on screen for you and more demanding for your bandwidth. The second was proposed to me – that I use vector graphics. This would be fantastic if I were competent enough to make them and this blog were designed to host them. The third, which I am using, is to provide supplementary posts (below) with all the data provided in a concise, readable form that abandons geographic distribution for maps and provides cold hard numbers for statistics.

And now, to business:

Trends versus Averages:


So the point of my previous map was to show the past voting history of a seat. Perhaps the simplest way I could have done that would be to average the results of past elections thus:



However, as always, there are several problems with this. Firstly, some seats have much longer histories than others. Durack (WA) and Wentworth (NSW) both display pure blue, since both have been won by Coalition parties (or their predecessors) at every election. The difference is that Wentworth was proclaimed in 1900 and has supported the Coalition at every election since federation, while Durack was proclaimed in 2008 and has been contested at only one election. Wentworth's trend is pretty stable and allows me to declare with confidence that, short of retirement or health concerns, Wentworth will be retained by Shadow Minister for Communications and Broadband Malcolm Turnbull. Predicting for Durack, on the other hand, carries all the usual dangers of extrapolation from minimal data points (e.g. illegal polygamous relationships).

To illustrate another problem with this approach, consider the fictional seat of Green, an outer-metropolitan seat that was first contested at the 1984 election. It is named after Antony Green and its main industry is pebble counting. Below are four representations of Green from parallel universes:



In the top left universe, Green was consistently an ALP seat until the turn of the millennium, then voted Liberal ever after. Since our predictions at this point are purely concerned with trends, this is probably a moderately Liberal seat short of the Coalition imploding.

In the top right, however, Green voted for the Liberals until the 1996 election, then switched to become a typical Labor seat.

The bottom left version of Green is more volatile, possibly influenced by the constantly changing policies on both sides that impact on the high-risk, high-reward pebble counting industry. It has voted for the Liberal party four times and Labor six times, but neither has held the seat for more than two consecutive elections. This seat is marginal and considered a Tossup.

The bottom right version of Green is slightly less volatile. It is also a bellwether seat, and so presumably contains a demographic that roughly approximates the nation's varying seats in equal proportions. This is also a Tossup but will probably follow the general trend in polling.

The two tossups, the safe Liberal seat and the safe ALP seat at first glance look identical, but how many of you can see the subtle difference?

Don't worry if you can't because, of course, there isn't one. The point I am making in my traditionally long-winded way is that this approach only considers averages, not trends. It gives the opinions of 1910 equal footing with those in 2010, even though there are far more voters from 2010 than 1910 expected to vote this year. (If this turns out not to be the case you can expect a very interesting blog post in September and/or Edwardian-era zombies.)
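For those who prefer numbers to colours, here is a minimal Python sketch of the two top-row universes. The party labels and the `alp_share` helper are my own illustrative names, and I am assuming that "until the 1996 election" means 1996 itself went to Labor, which is what makes the two histories average out the same.

```python
# Federal election years since the fictional seat of Green was first contested.
YEARS = [1984, 1987, 1990, 1993, 1996, 1998, 2001, 2004, 2007, 2010]

def alp_share(winners):
    """Fraction of elections won by the ALP, ignoring the order they came in."""
    return sum(1 for w in winners if w == "ALP") / len(winners)

# Top left: ALP until the turn of the millennium, Liberal ever after.
top_left = ["ALP"] * 6 + ["LIB"] * 4
# Top right (assumed split): Liberal up to 1993, Labor from 1996 onwards.
top_right = ["LIB"] * 4 + ["ALP"] * 6

print(alp_share(top_left), alp_share(top_right))  # 0.6 0.6
print(top_left[-1], top_right[-1])                # LIB ALP
```

Both seats average out at 60% ALP, yet they are held by opposite parties today, and the average alone cannot tell you which is which.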

One possible way of mapping trends, as opposed to pure averages, would be to display each seat as it currently stands, but with different intensities of colour for the length that a party has held a seat; strong red or blue could represent seats that have consistently voted ALP or Coalition respectively since 1901, while paler seats have shorter runs, with seats that changed hands in 2010 almost white.



This map has two major drawbacks. Firstly, and perhaps most obviously, only seats dating back to the early 1900s and that have voted consistently since then show up in any real intensity. A great many consistently Labor seats appear marginal because they were Coalition during Howard's 1996 landslide and many reliably Coalition seats appear marginal because of Rudd's 2007 landslide. Both years saw unusually large surges for one side or the other in the public vote and the 2010 election may have since rendered many of these seats safe by most conventional measures. Instead, these seats are lost to a faint haze of red or blue at best.

Secondly, consider a seat that has voted consistently for one party since 1901, except once in 2001. Compare that to a seat that has voted consistently for the same party since 1990 (voting for another party in the preceding decades). Based on trends both are safe for their current party, but the first is probably the safer of the two. Despite this, the latter appears the more intensely coloured because its history is uninterrupted for longer. In more extreme cases a 2010 outlier could make a very strong seat for one incumbent look like a very marginal seat for another. For example, the seat of Lyne looks marginal because Independent MP Rob Oakeshott has only held it since a 2008 by-election, and thus gets the absolute minimum colouring of one federal election (2010). Prior to Oakeshott the seat was consistently Coalition since its proclamation in 1949, and thus should be called a safe blue in the event of Oakeshott declining to run (or possibly even if he does run, since his siding with the ALP in the hung parliament may have lost him considerable right-wing support – although his recent, outspoken, high-profile opposition to mining in his seat may have won back many of his supporters).

Clearly to examine trends geographically we need a map that includes all of the data since 1901 (unlike the second map here) and yet mathematically favours recent trends over old data (unlike map 1).

This is where the map with a silly name comes in...

Variable-Dependent Transparency Arrays:


This map, as I have noted more than once, displays all incumbents as semi-transparent layers. The opacity of each layer is proportional to the number of seats that did not change hands at the following election, as a percentage of all seats (not counting those introduced in the following election). Or

O = (c/t*100%) / 8.69

where O is opacity, c is the number of seats that do not change hands at the next election (not including new seats) and t is the total number of seats at the next election (not including new seats).

If each layer had around 10% opacity, the top layer would contribute 10% of the colour, the second layer 9% and the third 8.1%. This accounts for 27.1% of the colour in the top three layers, giving data from 2001, 2004 and 2007 over a quarter of the total influence. (2010 data cannot be used here until we know the c-value of the 2013 election.) In this way new trends replace old ones without introducing arbitrary cut-off dates into the data.
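The arithmetic above can be sketched in a few lines of Python. This assumes the layers blend by ordinary "over" alpha compositing, which is what the 10%, 9%, 8.1% sequence implies; the function name `layer_contributions` is mine, not anything from my image software.

```python
def layer_contributions(opacities):
    """Share of the final colour contributed by each layer.

    `opacities` lists the layers from the top down, each as a fraction (0..1).
    Each layer only colours whatever the layers above it have let through.
    """
    shares = []
    visible = 1.0  # fraction of the final colour not yet claimed by higher layers
    for opacity in opacities:
        shares.append(visible * opacity)
        visible *= 1.0 - opacity
    return shares

shares = layer_contributions([0.10, 0.10, 0.10])
print([round(s, 4) for s in shares])  # [0.1, 0.09, 0.081]
print(round(sum(shares), 4))          # 0.271
```

Because the function takes a list, it works equally well when each layer has its own opacity, as on the real map.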

Eventually, of course, layers far back in the array will contribute no visible influence on the map. My own personal experimentation suggests any influence less than 1.5% over an area of 1250 pixels will not be picked up by human eyesight (or at least by my eyesight, which is roughly the same thing). This figure jumps to around 5% with a 1-pixel border of black between the (feebly) contrasting areas. The varied scales used by the AEC maps which I have adapted make determining an average display size for a seat difficult, but 1250 pixels is roughly the area of the Division of Fadden on these maps at full display size. Fadden is very close to the median of district sizes and despite the differing scales of the insets appears visually to be about the median here too (though I have not confirmed this through measurement - even my patience has limits).

At 10% opacity the top seven layers each contribute over 5% of the total colour scheme, so this map can be said to have seven layers of depth. My image software, however, can deal with accuracies down to 0.1% opacity prior to being messed up by .jpg compression.

It turns out that 10% opacity gives close to the maximum possible depth for such an array. Anything lower than 10% and the lower colours lack the potency to assert their influence. At 8% opacity we are reduced to a six-layer deep image, and obviously below 5% even the top layer fails to contribute sufficient colour to make a distinct impression on its own.

Going the other way, higher opacity soon begins to block out the lower layers.

At around 12 to 13% the eighth layer contributes just over 4.9% of the colour. Allowing for the primitive nature of my experimentation this may possibly result in an eight-layer deep image, but after being saved as a .jpg these will be virtually indistinguishable from an array with 10% average opacity.
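For anyone who wants to check the depth figures, here is a rough Python sketch. It assumes every layer has the same opacity and uses the 5% visibility threshold from my bordered-area eyeball test; the `depth` function is an illustrative name of my own.

```python
def depth(opacity, threshold=0.05):
    """Number of layers (from the top) that each contribute at least
    `threshold` of the final colour, when every layer has the same opacity.

    Layer k contributes opacity * (1 - opacity) ** (k - 1).
    """
    layers = 0
    share = opacity  # contribution of the top layer
    while share >= threshold:
        layers += 1
        share *= 1.0 - opacity  # the next layer down shows through a little less
    return layers

for pct in (4, 8, 10, 12, 13):
    print(pct, depth(pct / 100))
# 4 0
# 8 6
# 10 7
# 12 7
# 13 7
```

At 12% and 13% the eighth layer lands at roughly 4.9%, just under the threshold, so a strict cut-off leaves those arrays at seven layers.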

This is where the 8.69 comes from in the equation. The average percentage of seats changing hands in the top seven layers is around 13.1. This means c/t*100% will give an average value of 86.9% (~ two layers of depth). By dividing this by 8.69 the average opacity for the top seven layers is 10% and we achieve near-maximum penetration.
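As a quick sanity check on that scaling, here is a minimal Python sketch. It takes c/t*100% as the quantity described above (averaging 86.9% over the top seven layers) and divides it by 8.69; the function name and the example figures of 869 seats out of 1000 are made up purely to hit that average.

```python
def layer_opacity(c, t, divisor=8.69):
    """Opacity of one layer in percent: (c/t*100%) / divisor.

    c and t are the seat counts used in the opacity equation
    (new seats excluded from both).
    """
    if t <= 0:
        raise ValueError("t must be a positive number of seats")
    return (c / t * 100.0) / divisor

# Hypothetical election where c/t*100% comes out at exactly 86.9%.
print(round(layer_opacity(869, 1000), 2))  # 10.0
```

An election with a higher c/t gets a slightly more opaque layer and one with a lower c/t a slightly fainter one, but the seven-layer average stays at 10%.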

Larger seats seem to be more susceptible to influence by lower layers. The largest seats on this map had influences just visible from layer 14 – twice the depth predicted for Fadden. Nothing was done to correct this apparent susceptibility of larger rural seats since it is a result of perception and the human eye. The raw values displayed by the map are mathematically accurate, which trumps our lying little eyeballs.


My Methods: the Least of a Thousand Evils?


The invisibility of data from before 1990 (layer seven) in medium-small seats and 1974 (layer 14) in larger seats should not be a cause for concern. If the influence of these elections is invisible and the equation used is reliable, it follows that this data has less than 5% influence on the predicted outcome. This is insignificant compared to the error inherent in using past election results to predict future ones in marginal seats. Modifying the equations to ensure these early elections have a visible impact would clearly over-represent their influence.

I added the caveat that this method was sound so long as the equation was reliable. Perhaps, for example,

O = (c/t*100%) / 1.47

yields more accurate predictions, suggesting that only the previous three elections have any real relevance to future predictions. (Over the last three elections c/t*100% averages 88.2, and 60% average opacity allows a three-layer deep display for 1250 pixels; 88.2/60 = 1.47.)

Perhaps seat stability needs to be measured over multiple elections, so that c would be averaged over the next two, three or more elections. The problem with this, of course, is that in order to obtain a c value incorporating the following three elections' results, our most recent layer would be 2001 and we would be basing our predictions on trends from the middle of the Howard era. Rudd's ALP landslide victory in 2007 would have been predicted from trends ending in 1996, during Howard's landslide victory for the Coalition.

Alternatively, stability could be measured not based on seats won or lost the following term, but on the margin by which each seat is held. Narrow margins would contribute little colour to a seat, while safe margins of 10%+ would contribute significantly more. This, however, is a very long project, requiring me to dig up the pendula for each election and apply them individually seat by seat.

That is not to say I won't do it, merely that I won't be doing it right now. It also assumes I can obtain the data. Wikipedia has data up to (but not including) 1925 and the AEC gives the necessary figures to calculate the margins after 2001, so I'm only missing about three quarters of Australia's voting history. If anyone knows where I can find the relevant pendula feel free to comment below.

And on that note I will sign off. I did have the aim of discussing the 2010-2013 pendulum next post, but with the announcement of the Pope's resignation – the first Papal abdication in 600 years – I feel a desire to try my hand at the very different arena of conclave voting analysis. While I did previously state my focus would be on Australian and American politics I have been known to dabble in other nations' electoral processes and even the UNSC vote last year. The vote for the papacy, however, will be completely new territory for me.

But then, who knows what I will actually end up discussing?
