Primary Preliminaries

After a while in mothballs, it's probably fair to say that an update is overdue. This year will see elections in both Australian territories (NT August 27, ACT October 15), a probable Federal election (which must be held before January 15 next year), a UK referendum on EU membership (June 23) and a full year of United States Presidential hijinks (culminating in the November 8 election).

With Super Tuesday just around the corner (March 1) it is probably worth dusting off the old Infographinomicon and checking the engine will still turn over.

Ah... just as I left it...

Here we go...

What are Primaries and (How) Do They Work?

Most people understand primaries as the elections to elect who will run in the elections; the semi-finals of the political process, if you will. But, of course, it's never that simple – especially in America where every state always does its own thing.

Firstly, not all primaries are primaries. Some primaries are caucuses and others are actually primaries. A primary, strictly speaking, is a vote as most people would imagine it – a day-long process where people turn up to cast a secret ballot. A caucus is a meeting (or usually a series of simultaneous state-wide meetings) where voting is conducted publicly by a show of hands, standing on specific sides of a room or other such student-representative-council-esque systems. These are, for obvious reasons, quicker and cheaper but result in vocal or popular groups easily swaying other voters. Alaska, Colorado, Hawaii, Iowa, Kansas, Maine, Minnesota, Nevada, North Dakota and Wyoming all use a caucus system.

Each state has a number of delegates to be won, each of which is effectively a vote for a candidate. These can be allocated on a proportional system, a winner-takes-all system, or some strange hybrid.

Let's take the Republican primaries as an example. Iowa has 30 delegates in the Republican primaries, out of 2,472 nationwide (and there are almost twice as many Democratic delegates at 4,763). These are awarded proportionally, so when Ted Cruz won with 28% of the vote he won 8 delegates, followed by Donald Trump and Marco Rubio on 7 apiece, Ben Carson with 3 and five other candidates each getting one.

New Hampshire is also proportional, but only allocates delegates to candidates who get over 10% of the vote. Thus, although Donald Trump gained only 35.3% of the vote, he got 11 of the 23 delegates.
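A rough sketch of how a threshold can turn 35.3% of the vote into 11 of 23 delegates. Only Trump's 35.3% and the 23 delegates come from this post. The rival shares are illustrative placeholders. The rounding rule (nearest whole delegate) and the idea that unallocated delegates go to the leader are assumptions made for the sketch, not a statement of the party's actual rules.

```python
import math

def allocate_with_threshold(shares, delegates, threshold=10.0):
    """Proportional allocation among candidates polling over `threshold` percent.

    Assumed rules (hypothetical, for illustration only):
      * each qualifier gets delegates * share / 100, rounded to the nearest whole number
      * any delegates left unallocated go to the leading candidate
    Over-allocation from rounding is not handled in this sketch.
    """
    qualifiers = {c: s for c, s in shares.items() if s > threshold}
    if not qualifiers:
        return {}
    # floor(x + 0.5) rounds halves up, avoiding Python's round-half-to-even
    result = {c: math.floor(delegates * s / 100 + 0.5) for c, s in qualifiers.items()}
    leftover = delegates - sum(result.values())
    if leftover > 0:
        leader = max(qualifiers, key=qualifiers.get)
        result[leader] += leftover
    return result

# Trump's share is from the text; every rival share is a made-up placeholder.
shares = {
    "Trump": 35.3,
    "Rival A": 15.8,
    "Rival B": 11.7,
    "Rival C": 11.0,
    "Rival D": 10.6,
    "Rival E": 7.4,   # under the 10% threshold, so no delegates
}
result = allocate_with_threshold(shares, delegates=23)
print(result)
```

Under these assumptions the votes cast for candidates below the threshold are effectively redistributed, which is how a candidate on roughly a third of the vote can end up with nearly half the delegates.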

South Carolina is a winner-takes-all system, so Trump gained all 50 delegates by taking 32.5% of the vote – 10 percentage points above his closest rival Marco Rubio.

Several states are proportional, but become winner-takes-all if a candidate polls over 50%. In Arkansas, anyone who gets over 15% gains a delegate, with a candidate polling over 50% taking the rest or, failing this, the remainder being divided proportionately. Mississippi, Oklahoma and Tennessee are among the states with even more complicated rules involving various thresholds and divisions, and Colorado, Missouri, North Dakota, Wyoming and several US territories like Guam and the Virgin Islands do their own thing. The list goes on.
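The Arkansas-style hybrid described above can be sketched in a few lines. This follows only the three rules as stated in this post. The delegate count, the candidates and their vote shares are all hypothetical, and splitting the remainder among qualifying candidates by the largest-remainder method is an assumption, since the post does not say how the proportional division is rounded.

```python
def hybrid_allocate(shares, delegates, threshold=15.0, majority=50.0):
    """Sketch of a threshold / majority hybrid (rules as paraphrased in the post).

    1. Every candidate over `threshold` percent gets one delegate.
    2. If a candidate is over `majority` percent, they take all the rest.
    3. Otherwise the rest is split proportionally among the qualifiers
       (largest-remainder rounding is an assumption of this sketch).
    """
    qualifiers = {c: s for c, s in shares.items() if s > threshold}
    result = {c: 1 for c in qualifiers}
    remaining = delegates - len(qualifiers)
    if not qualifiers or remaining <= 0:
        return result

    leader = max(qualifiers, key=qualifiers.get)
    if qualifiers[leader] > majority:
        result[leader] += remaining
        return result

    total = sum(qualifiers.values())
    quotas = {c: remaining * s / total for c, s in qualifiers.items()}
    for c, q in quotas.items():
        result[c] += int(q)                      # whole part of each quota
    leftover = remaining - sum(int(q) for q in quotas.values())
    by_fraction = sorted(quotas, key=lambda c: quotas[c] - int(quotas[c]), reverse=True)
    for c in by_fraction[:leftover]:             # hand out what is left, largest fraction first
        result[c] += 1
    return result

# Hypothetical 10-delegate state, with and without a majority winner.
with_majority = hybrid_allocate({"A": 55.0, "B": 30.0, "C": 15.0}, delegates=10)
no_majority = hybrid_allocate({"A": 40.0, "B": 35.0, "C": 20.0, "D": 5.0}, delegates=10)
print(with_majority)
print(no_majority)
```

Note how a few points either side of 50% change the leader's haul dramatically, which is part of why raw vote percentages can be a poor guide to delegates won.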

Some states divide their state-wide (at-large), district and convention delegates using different systems. Some votes are open to the public (open), others are restricted to party members (closed) with variations including semi-open, semi-closed and mixed systems in some states.

Then there are the Republican unpledged delegates and Democrat superdelegates who get to vote however they like without primaries. And then there are all the delegates who are won by a candidate who then drops out, with various state rules on how these work.

All in all, the system is a mess. For simplicity, we'll just be looking at voting results as a percentage, rather than calculating the delegates won. This could be misleading, as 49% may mean walking away with a few hundred delegates (e.g. the California Democratic primary) or none. But let's at least test our predictive systems before we get ahead of ourselves.

Cherry-picking Season

The Infographinomicon is all about shortcuts. There is no perfect way to predict voting at an election (with a few exceptions). Instead, the aim is to find the most accurate predictive tool possible using the simplest methods. This means cherry-picking the data to use – a task which can skew the data horribly if done wrong (and might be impossible to do right). However, since the objective is to remain as accurate as possible, it is never advisable to deliberately ignore or factor in data to achieve a desired result.

So, it is with a clear conscience that we can choose to ignore the complexities of whether a state holds an open primary or semi-closed caucus for our predictions. Factoring these in would over-complicate models beyond any practical use and would mean that no result, whether historical or from across a state border, could help inform our predictions.

A more problematic issue is the shortage of data, with three contests per party completed so far. For this reason we won't make any predictions until after Super Tuesday. This ignores the issues of voter shift over time or the elimination of certain candidates, which we'll have to ignore and compensate for respectively. However, the focus of the media has been on demographics – how will Trump fare against Rubio in states with larger Latino populations, and why Sanders is more popular with young voters. So we will pick demographics of race, age and sex, look at how each candidate has polled so far, and apply that to future contests.

The 2012 voter turnout data can be found here. While election turnout is not the same as primary participation, and 2012 was a very different race with very different candidates, this data is easy to use and well corroborated, so we'll use it as a very broad tool for profiling voter demographics for each state. After Super Tuesday we'll plot each major candidate's result against various demographics, find a line of best fit, and make a series of predictions for each candidate in each remaining state.
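For the curious, the "line of best fit" step might look something like the sketch below: an ordinary least-squares fit of a candidate's vote share against one demographic measure, then a prediction for a state yet to vote. Every number here is invented purely to show the mechanics, the function names are hypothetical, and with only a handful of contests any such line should be treated with suspicion.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (x, y) points: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted vote share, clamped to the 0-100% range."""
    return max(0.0, min(100.0, slope * x + intercept))

# Invented data: share of a state's voters in some demographic group (%),
# and a hypothetical candidate's vote share (%) in the states that have voted.
demographic_share = [10.0, 20.0, 30.0, 40.0]
vote_share = [25.0, 30.0, 35.0, 40.0]

slope, intercept = fit_line(demographic_share, vote_share)
forecast = predict(slope, intercept, 50.0)  # a remaining state with 50% in that group
print(slope, intercept, forecast)
```

Repeating this for each candidate and each demographic (race, age, sex) gives a set of rough forecasts per state, which can then be compared against the actual results as they come in.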

Sound good? Good.

