With Super Tuesday just around the
corner (March 1), it is probably worth dusting off the old
Infographinomicon and checking that the engine still turns over.
Ah... just as I left it.
Here we go...
What are Primaries and (How) Do They Work?
Most people understand primaries as the
elections that decide who will run in the election proper: the semi-finals of
the political process, if you will. But, of course, it's never that
simple, especially in America, where every state does its
own thing.
Firstly, not all primaries are
primaries. Some are caucuses and others are actual
primaries. A primary, strictly speaking, is a vote as most people
would imagine it: a day-long process where people turn up to cast
a secret ballot. A caucus is a meeting (or, usually, a series of
simultaneous statewide meetings) where voting is conducted publicly
by a show of hands, by standing on specific sides of a room, or by other
such student-representative-council-esque systems. These are,
for obvious reasons, quicker and cheaper, but they let vocal or
popular groups easily sway other voters. Alaska, Colorado, Hawaii,
Iowa, Kansas, Maine, Minnesota, Nevada, North Dakota and Wyoming all
use a caucus system.
Each state has a number of delegates
to be won, and each delegate is effectively a vote for a candidate at the
party's national convention. These can be allocated on a proportional system,
a winner-takes-all system, or some strange hybrid.
Let's take the Republican primaries as an example. Iowa has 30 of
the 2,472 Republican delegates nationwide (the Democrats have
almost twice as many, at 4,763). These are
awarded proportionally, so when Ted Cruz won with 28% of the vote he
took 8 delegates, followed by Donald Trump and Marco Rubio on 7
apiece, Ben Carson with 3 and five other candidates with one each.
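To make the proportional arithmetic concrete, here is a minimal Python sketch. It assumes a largest-remainder (Hamilton) method, which is one common way of turning vote shares into whole delegates; Iowa's actual rounding rules may differ, and the function name, candidate names and shares are hypothetical. With 30 delegates, a 28% share works out to a quota of 8.4, which rounds down to 8, the same number Cruz took.

```python
from math import floor


def allocate_proportional(shares, seats):
    """Split `seats` delegates in proportion to `shares` using largest remainders.

    Hypothetical helper. `shares` maps candidate -> vote share (any scale;
    it is normalised here).
    """
    total = sum(shares.values())
    quotas = {c: seats * s / total for c, s in shares.items()}
    # Every candidate first receives the whole-number part of their quota.
    alloc = {c: floor(q) for c, q in quotas.items()}
    leftover = seats - sum(alloc.values())
    # Any delegates still unassigned go to the largest fractional remainders.
    ranked = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in ranked[:leftover]:
        alloc[c] += 1
    return alloc


# Hypothetical shares (%) for a 30-delegate state like Iowa.
print(allocate_proportional({"A": 28, "B": 24, "C": 23, "D": 25}, 30))
# {'A': 8, 'B': 7, 'C': 7, 'D': 8}
```

Largest remainder is used here rather than plain rounding because it always hands out exactly the number of delegates available.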
New Hampshire is also proportional, but
only allocates delegates to candidates who get over 10% of the vote. Thus
Donald Trump's 35.3% of the vote earned him 11 of the 23 delegates, nearly half.
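A threshold is a small tweak on proportional allocation: drop every candidate at or below the cut-off, then share the delegates among the rest. This sketch assumes the delegates are split by largest remainder among the qualifying candidates; New Hampshire's real rules may treat leftover delegates differently, and the function names and shares are hypothetical.

```python
from math import floor


def largest_remainder(shares, seats):
    """Hypothetical helper: split `seats` in proportion to `shares`."""
    total = sum(shares.values())
    quotas = {c: seats * s / total for c, s in shares.items()}
    alloc = {c: floor(q) for c, q in quotas.items()}
    leftover = seats - sum(alloc.values())
    ranked = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in ranked[:leftover]:
        alloc[c] += 1
    return alloc


def allocate_with_threshold(shares, seats, threshold):
    """Proportional allocation among candidates polling over `threshold` (%)."""
    qualified = {c: s for c, s in shares.items() if s > threshold}
    alloc = largest_remainder(qualified, seats)
    # Candidates under the threshold receive no delegates.
    return {c: alloc.get(c, 0) for c in shares}


# Hypothetical shares (%) for a 23-delegate state with a 10% threshold.
print(allocate_with_threshold({"A": 40, "B": 30, "C": 20, "D": 8, "E": 2}, 23, 10))
# {'A': 10, 'B': 8, 'C': 5, 'D': 0, 'E': 0}
```

Because the sub-threshold votes are thrown away, the leader's share of delegates (10 of 23, about 43%) ends up above their 40% share of the vote, the same kind of effect that gave Trump nearly half of New Hampshire's delegates.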
South Carolina is a winner-takes-all
system, so Trump gained all 50 delegates by taking 32.5% of the vote,
10 percentage points above his closest rival, Marco Rubio.
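Winner-takes-all is the simplest rule of all. This sketch treats the state as a single statewide pool, which is a simplification, and uses the South Carolina figures from the text, with Rubio's 22.5% inferred from the 10-point gap. The function name is hypothetical.

```python
def winner_takes_all(shares, seats):
    """Hypothetical helper: every delegate goes to the plurality winner.

    On an exact tie, max() keeps the first candidate listed.
    """
    winner = max(shares, key=shares.get)
    return {c: (seats if c == winner else 0) for c in shares}


# South Carolina figures from the text (Rubio inferred from the 10-point gap).
print(winner_takes_all({"Trump": 32.5, "Rubio": 22.5}, 50))
# {'Trump': 50, 'Rubio': 0}
```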
Several states are proportional, but
become winner-takes-all if a candidate polls over 50%. In Arkansas,
anyone who gets over 15% gains a delegate, with a candidate polling
over 50% taking the rest or, failing this, the remainder being
divided proportionately. Mississippi, Oklahoma and Tennessee are among the states with even
more complicated thresholds and divisions, while Colorado,
Missouri, North Dakota, Wyoming and several US territories such as Guam
and the Virgin Islands do their own thing. The list goes on.
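The Arkansas-style hybrid combines a threshold, a majority trigger and proportional division. The sketch below is a literal reading of the description above (each candidate over 15% gets one delegate, a candidate over 50% takes the rest, otherwise the rest is split proportionally among the qualifiers); the function names, the largest-remainder split and the example shares are assumptions, not the state party's official formula.

```python
from math import floor


def largest_remainder(shares, seats):
    """Hypothetical helper: split `seats` in proportion to `shares`."""
    total = sum(shares.values())
    quotas = {c: seats * s / total for c, s in shares.items()}
    alloc = {c: floor(q) for c, q in quotas.items()}
    leftover = seats - sum(alloc.values())
    ranked = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in ranked[:leftover]:
        alloc[c] += 1
    return alloc


def allocate_hybrid(shares, seats, threshold=15.0, majority=50.0):
    """Assumed reading of the Arkansas rule as described in the text."""
    qualified = {c: s for c, s in shares.items() if s > threshold}
    alloc = {c: 0 for c in shares}
    # Each candidate over the threshold first gains one delegate.
    for c in qualified:
        alloc[c] += 1
    remainder = seats - len(qualified)
    leader = max(shares, key=shares.get)
    if shares[leader] > majority:
        # A majority winner takes all the remaining delegates.
        alloc[leader] += remainder
    else:
        # Otherwise the remainder is divided proportionally among qualifiers
        # (if nobody qualifies, nothing is allocated).
        for c, n in largest_remainder(qualified, remainder).items():
            alloc[c] += n
    return alloc


# Hypothetical shares (%) for a 40-delegate state.
print(allocate_hybrid({"A": 55, "B": 30, "C": 10, "D": 5}, 40))
# {'A': 39, 'B': 1, 'C': 0, 'D': 0}
print(allocate_hybrid({"A": 45, "B": 35, "C": 20}, 40))
# {'A': 18, 'B': 14, 'C': 8}
```

The two examples show how sharply the outcome flips once a candidate crosses 50%: a 55% winner takes almost everything, while a 45% leader gets a delegate count roughly in line with their vote.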
Some states divide their statewide (at-large), district and convention
delegates using different systems. Some votes are open to the public
(open), others are restricted to party members (closed), with
variations including semi-open, semi-closed and mixed systems in some states.
Then there are the Republican unpledged
delegates and the Democratic superdelegates, who can vote however they
like regardless of the primaries. And
then there are all the delegates won by candidates who later
drop out, with various state rules on what happens to them.
All in all, the system is a mess. For
simplicity, we'll just be looking at voting results as a percentage,
rather than calculating the delegates won. This could be misleading,
as 49% may mean walking away with a few hundred delegates (e.g. the
California Democratic primary) or none. But let's at least test our
predictive systems before we get ahead of ourselves.
Cherry-picking Season
The Infographinomicon is all about
shortcuts. There is no perfect way to predict voting at an election
(with a few exceptions).
Instead, the aim is to find the most accurate
predictive tool possible using the simplest methods. This means
cherry-picking the data to use, a task that can skew the results
horribly if done wrong (and might be impossible to do right).
However, since the objective is to remain as accurate as possible, it
is never advisable to deliberately ignore or factor in data just to
achieve a desired result.
So it is with a clear conscience that
we can ignore, for our predictions, the complexities of whether a state holds an
open primary or a semi-closed caucus. Factoring these in would over-complicate the models beyond any practical use and would mean that no result, whether historical or from across a state border, could help inform our prediction.
A more problematic issue is the
shortage of data, with only three contests per party completed so far. For
this reason we won't make any predictions until after Super Tuesday.
That still leaves the issues of voter shift over time and the elimination
of certain candidates, which we'll have to ignore and compensate for, respectively. The media's focus, however, has been on
demographics: how Trump will fare against Rubio in states with
larger Latino populations, and why Sanders is more popular with young
voters. So we will pick the demographics of race, age and sex, look at
how each candidate has polled so far, and apply that to future contests.
The 2012 voter turnout data can be
found here. While general-election turnout is not the same as primary participation,
and 2012 was a very different race with very different candidates,
this data is easy to use and well collated, so we'll use it as a
very broad tool for profiling each state's voter demographics.
After Super Tuesday we'll plot each major candidate's results against
various demographics, find a line of best fit,
and make a series of predictions for each candidate in each remaining state.
Sound good? Good.
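The line-of-best-fit step is ordinary least squares on one demographic at a time. Here is a minimal sketch in plain Python; the helper names, the chosen variable (share of young voters in a state) and all the numbers are invented purely for illustration and are not real results.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read a predicted vote share off the fitted line."""
    return slope * x + intercept


# Invented data: % of young voters in three states vs. a candidate's vote %.
young = [10, 20, 30]
vote = [24, 36, 45]
slope, intercept = fit_line(young, vote)
# Predicted vote % in a hypothetical remaining state with 25% young voters.
print(round(predict(slope, intercept, 25), 2))
# 40.25
```

With only a handful of contests per candidate, each fitted line rests on very few points, so the predictions should be read as rough indications rather than forecasts.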