Thursday, 3 March 2016

The Sexy Statistics

Now that Super Tuesday has passed and the results have been collated, let's begin with an analysis of the sex/gender demographic. The source data outlined in the previous post provides several number sets we could work with, but we'll pick just two: female and male registered voters, and the female and male voters who actually turned out in 2012. The former seems the more relevant, so we'll save it for later. However, the actual 2012 voters are not as irrelevant as might first be thought: they are likely to include the most politically active (who are more likely to attend primaries and caucuses) and those who have not become jaded in the decades since they registered.

This post will be very cisnormative because of the nature of the source data.

2012 Voter Gender

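So far, 14 states have held their Democratic primaries or caucuses. In every state listed below, women made up more than 50% of voters at the last presidential election (2012). The columns are state, male voters and female voters (in thousands), and the female share of the total.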

ALABAMA 1009 1145 53.16%
ARKANSAS 531 593 52.76%
GEORGIA 1899 2269 54.44%
IOWA 733 816 52.68%
MASSACHUSETTS 1576 1807 53.41%
NEVADA 511 536 51.19%
NEW HAMPSHIRE 322 366 53.20%
OKLAHOMA 656 775 54.16%
SOUTH CAROLINA 940 1247 57.02%
TENNESSEE 1172 1434 55.03%
TEXAS 3925 4719 54.59%
VERMONT 146 162 52.60%
VIRGINIA 1709 2069 54.76%
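The percentage column is simply the female count divided by the male plus female total. A short Python sketch that recomputes it from the table above (the dictionary and function names are my own, not from the source data):

```python
# 2012 voters (thousands) in states that have held Democratic contests,
# copied from the table above: state -> (male, female).
VOTERS_2012_DEM = {
    "ALABAMA": (1009, 1145),
    "ARKANSAS": (531, 593),
    "GEORGIA": (1899, 2269),
    "IOWA": (733, 816),
    "MASSACHUSETTS": (1576, 1807),
    "NEVADA": (511, 536),
    "NEW HAMPSHIRE": (322, 366),
    "OKLAHOMA": (656, 775),
    "SOUTH CAROLINA": (940, 1247),
    "TENNESSEE": (1172, 1434),
    "TEXAS": (3925, 4719),
    "VERMONT": (146, 162),
    "VIRGINIA": (1709, 2069),
}


def female_share(male: int, female: int) -> float:
    """Fraction of the male + female total that is female."""
    return female / (male + female)


for state, (male, female) in VOTERS_2012_DEM.items():
    print(f"{state:<15} {female_share(male, female):.2%}")

# Every state in the table has a female majority among 2012 voters.
assert all(female_share(m, f) > 0.5 for m, f in VOTERS_2012_DEM.values())
```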

When we plot the Democratic primary results so far against these numbers, we see that as the female share of 2012 voters increases, so does support for Hillary Clinton. This might seem unsurprising to many, but I was not expecting an increase this dramatic:

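Line of best fit (Clinton): y = 5.64x - 2.46
Line of best fit (Sanders): y = -5.64x + 3.46
(Here and in the equations below, x is the female share of voters and y is the candidate's share of the vote, both as fractions.)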

Of course I'm not foolish enough to believe that women are more likely to vote for Clinton because she's a woman. This slope may appear more dramatic because of a small sample size, or because the range from 51.19% to 57.02% is so narrow (only 5.83 percentage points) that a small vertical change can result in a large angular shift. Alternatively, it would not be surprising if Clinton had better cut-through and engagement with women. And, while I don't think there are many women who would vote for Clinton because she's a woman, I think there probably are a lot of men who will NOT vote for Clinton based on her sex.
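The narrow-range point can be made concrete by evaluating Clinton's line at the two ends of the observed range (Nevada at 51.19% and South Carolina at 57.02%). A short Python sketch using only the equation above (the function name is my own):

```python
def clinton_vs_2012_voters(x: float) -> float:
    """Clinton's line of best fit; x and the result are fractions, not percentages."""
    return 5.64 * x - 2.46


low, high = 0.5119, 0.5702  # Nevada and South Carolina, the extremes of the table
y_low = clinton_vs_2012_voters(low)
y_high = clinton_vs_2012_voters(high)

# The predicted swing is just slope * width of the x range.
print(f"Clinton share: {y_low:.1%} at {low:.2%} female, {y_high:.1%} at {high:.2%} female")
print(f"Swing across a {high - low:.2%} range: {y_high - y_low:.1%}")
```

On the fitted line, the 5.83-point spread in female share corresponds to a swing of roughly 33 points in Clinton's share (from about 43% to about 76%), which is why the slope looks so steep.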

(N.B. each candidate's result is recorded as a percentage of the votes won by the candidates shown: two in the Democratic graphs, three in the Republican ones. Votes for minor candidates etc. are eliminated, primarily so that in the Republican graphs we can ignore candidates who have already dropped out of the race or are likely to soon.)
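This normalisation also explains why the Clinton and Sanders lines mirror each other. Once minor candidates are dropped, Sanders' share is exactly 1 minus Clinton's, and an ordinary least-squares fit of 1 - y has the negated slope and 1 minus the intercept of the fit of y. A minimal Python sketch, assuming the lines were fitted by ordinary least squares; the vote counts below are invented for illustration and are not the real primary results:

```python
def head_to_head(votes: dict, shown: list) -> dict:
    """Each shown candidate's share of the votes won by the shown candidates only."""
    total = sum(votes[name] for name in shown)
    return {name: votes[name] / total for name in shown}


def fit_line(xs: list, ys: list) -> tuple:
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Hypothetical data: (female share, raw vote counts including a minor candidate).
STATES = [
    (0.512, {"Clinton": 5200, "Sanders": 4700, "Other": 100}),
    (0.527, {"Clinton": 4900, "Sanders": 5000, "Other": 100}),
    (0.534, {"Clinton": 6000, "Sanders": 3900, "Other": 100}),
    (0.548, {"Clinton": 6600, "Sanders": 3300, "Other": 100}),
    (0.570, {"Clinton": 7300, "Sanders": 2600, "Other": 100}),
]

xs = [share for share, _ in STATES]
clinton = [head_to_head(v, ["Clinton", "Sanders"])["Clinton"] for _, v in STATES]
sanders = [head_to_head(v, ["Clinton", "Sanders"])["Sanders"] for _, v in STATES]

m_c, b_c = fit_line(xs, clinton)
m_s, b_s = fit_line(xs, sanders)
print(f"Clinton: m = {m_c:.2f}, b = {b_c:.2f}")
print(f"Sanders: m = {m_s:.2f}, b = {b_s:.2f}")
# Mirrored: m_s == -m_c and b_s == 1 - b_c (up to floating-point error).
```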

For the Republican race, the female share of voters at the last presidential election shows little correlation with the candidates' success. Voters in the states that have already held a Republican contest were again consistently majority female (columns: state, male and female voters in thousands, female share):

ALABAMA 1009 1145 53.16%
ALASKA 140 149 51.56%
ARKANSAS 531 593 52.76%
GEORGIA 1899 2269 54.44%
IOWA 733 816 52.68%
MASSACHUSETTS 1576 1807 53.41%
MINNESOTA 1374 1485 51.94%
NEVADA 511 536 51.19%
NEW HAMPSHIRE 322 366 53.20%
OKLAHOMA 656 775 54.16%
SOUTH CAROLINA 940 1247 57.02%
TENNESSEE 1172 1434 55.03%
TEXAS 3925 4719 54.59%
VERMONT 146 162 52.60%
VIRGINIA 1709 2069 54.76%

And the graph comparing Republican 2016 results with this gender distribution is disappointingly (that is to say, uninterestingly) flat:
Line of best fit (Trump): y = -0.25x + 0.56
Line of best fit (Cruz): y = 0.03x + 0.28
Line of best fit (Rubio): y = 0.22x + 0.16

There's not much to be said about this except that the Trump juggernaut appears completely indifferent to sex or gender. This may be, however, because the dynamic of the Trump candidacy is so removed from the dynamic of the 2012 presidential election that there's really no correlation to be found. As will be seen, a look at registered voters tells a different story.
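Because the Republican figures are shares of the three-candidate vote, the three Republican lines above should add up to 1 at every x; they do, since the slopes sum to 0 and the intercepts to 1. A quick Python check (names are my own), evaluated at the lowest and highest female shares in the Republican table, also shows how flat Trump's line is:

```python
# Republican lines of best fit vs female share of 2012 voters: name -> (m, b).
GOP_VS_2012_VOTERS = {
    "Trump": (-0.25, 0.56),
    "Cruz": (0.03, 0.28),
    "Rubio": (0.22, 0.16),
}


def predict(lines: dict, x: float) -> dict:
    """Evaluate every candidate's line at female share x (a fraction)."""
    return {name: m * x + b for name, (m, b) in lines.items()}


for x in (0.5119, 0.5702):
    shares = predict(GOP_VS_2012_VOTERS, x)
    formatted = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{x:.2%} female: {formatted} (total {sum(shares.values()):.3f})")
```

Across the whole range, Trump's fitted share moves by only about 1.5 points (from about 43.2% to 41.7%), and he leads at both ends.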

Registered Voters

When it comes to registered voters in states that have already held Democratic contests, we again see the female population being more politically engaged (columns: state, male and female registered voters in thousands, female share):

NEW HAMPSHIRE 353 398 53.00%
SOUTH CAROLINA 1096 1382 55.77%

Applying the same methodology and plotting the Democratic primary results against the female share of registered voters, we again see far greater support for Clinton in states with more female registered voters.

Line of best fit (Clinton): y = 6.39x - 2.85
Line of best fit (Sanders): y = -6.39x + 3.85

The conclusions here are the same as those already stated for the Democratic contest. The Republican contest, however, looks very different this time (columns: state, male and female registered voters in thousands, female share):

ALABAMA 1201 1354 52.99%
ALASKA 181 180 49.86%
ARKANSAS 637 739 53.71%
GEORGIA 2178 2589 54.31%
IOWA 838 906 51.95%
MASSACHUSETTS 1750 2009 53.45%
MINNESOTA 1496 1589 51.51%
NEVADA 574 602 51.19%
NEW HAMPSHIRE 353 398 53.00%
OKLAHOMA 835 970 53.74%
SOUTH CAROLINA 1096 1382 55.77%
TENNESSEE 1432 1778 55.39%
TEXAS 4977 5772 53.70%
VERMONT 168 189 52.94%
VIRGINIA 1931 2279 54.13%
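As with the 2012 figures, the female share can be recomputed from the counts to confirm which states break the pattern. A short Python sketch using the registration table above (variable names are my own):

```python
# Registered voters (thousands) from the table above: state -> (male, female).
REGISTERED = {
    "ALABAMA": (1201, 1354),
    "ALASKA": (181, 180),
    "ARKANSAS": (637, 739),
    "GEORGIA": (2178, 2589),
    "IOWA": (838, 906),
    "MASSACHUSETTS": (1750, 2009),
    "MINNESOTA": (1496, 1589),
    "NEVADA": (574, 602),
    "NEW HAMPSHIRE": (353, 398),
    "OKLAHOMA": (835, 970),
    "SOUTH CAROLINA": (1096, 1382),
    "TENNESSEE": (1432, 1778),
    "TEXAS": (4977, 5772),
    "VERMONT": (168, 189),
    "VIRGINIA": (1931, 2279),
}

# Female share of registrations per state, then the states below 50%.
shares = {state: f / (m + f) for state, (m, f) in REGISTERED.items()}
male_majority = [state for state, share in shares.items() if share < 0.5]
lowest = min(shares, key=shares.get)
highest = max(shares, key=shares.get)
print("Male-majority registration:", male_majority)
print(f"Range: {lowest} {shares[lowest]:.2%} to {highest} {shares[highest]:.2%}")
```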

Interestingly, for the first time we have found a category where male participation exceeds female participation: voter registration in Alaska. The really interesting data, however, is in the resulting graph:

Line of best fit (Trump): y = 0.80x + 0.01
Line of best fit (Cruz): y = -1.22x + 0.94
Line of best fit (Rubio): y = 0.42x + 0.05

The candidate to take a beating from an increase in registered female voters is Ted Cruz, the ultra-right-wing senator from Texas. The m value (slope) of his linear equation dropped from neutral (0.03) to severely negative (-1.22) between the two Republican graphs. This is a loss of 1.25 in the m value, which equates to 1.25 percentage points of the vote lost per one-point increase in the female share of registered voters.

To put that another way, in a field of 100,000 voters, if 1,000 male voters deregistered (hypothetically) and 1,000 female voters replaced them, Cruz would lose 1,250 votes. This is, of course, absurd: even if all of those male voters supported him and none of the women did, he could lose at most 1,000 votes. Some of the error probably arises from the assumptions and approximations inherent in this calculation, the small data set, etc., but the calculation also ignores the reality that states with higher female registration may have different attitudes to various issues, which may account for the other 250 lost votes.

The key point here is that as female voter registration increases, Cruz takes a bigger and bigger hit. Most of that support goes to Trump, whose m value jumped by 1.05 (a gain of 1,050 votes in the above scenario of 1,000 deregistered men and 1,000 newly registered women).
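The vote arithmetic in the last two paragraphs follows the post's own framing: take each candidate's change in slope between the 2012-voter graph and the registration graph, and multiply it by the size of the hypothetical 1,000-voter swap. A short Python sketch of that calculation (names are my own), which also shows where Cruz's lost votes go on these lines:

```python
# Slope (m) of each Republican line: (vs 2012 voters, vs registration).
SLOPES = {
    "Trump": (-0.25, 0.80),
    "Cruz": (0.03, -1.22),
    "Rubio": (0.22, 0.42),
}

TOTAL_VOTERS = 100_000
SWAPPED = 1_000                 # men deregistered, replaced by the same number of women
shift = SWAPPED / TOTAL_VOTERS  # 0.01, i.e. one point of female share

# Change in vote share = change in slope * shift; times total voters gives votes.
changes = {
    name: round((m_reg - m_2012) * shift * TOTAL_VOTERS)
    for name, (m_2012, m_reg) in SLOPES.items()
}
print(changes)  # {'Trump': 1050, 'Cruz': -1250, 'Rubio': 200}
```

The three changes sum to zero, so on these fitted lines Cruz's 1,250 lost votes split 1,050 to Trump and 200 to Rubio.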

Sexy Predictions

If we extrapolate from these lines of best fit, we can obtain some (VERY) rough estimates for the Democrat and Republican results in the remaining states:

Democrat linear equation based on 2012 voter turnout

Republican linear equation based on 2012 voter turnout

Democrat linear equation based on voter registration

Republican linear equation based on voter registration
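Extrapolating from a line of best fit means plugging a state's female share into each candidate's equation and picking the largest value. Another way to see what the registration-based lines imply is to solve for the female share at which two candidates tie, x = (b2 - b1) / (m1 - m2). A Python sketch under that framing; the constant and function names, and the 53% example state, are hypothetical:

```python
# Registration-based lines of best fit: (m, b), with x and y as fractions.
CLINTON = (6.39, -2.85)
SANDERS = (-6.39, 3.85)
TRUMP = (0.80, 0.01)
CRUZ = (-1.22, 0.94)
RUBIO = (0.42, 0.05)


def tie_point(line_a: tuple, line_b: tuple) -> float:
    """Female share x at which two lines y = m*x + b predict the same share."""
    (m_a, b_a), (m_b, b_b) = line_a, line_b
    return (b_b - b_a) / (m_a - m_b)


def predicted_winner(lines: dict, x: float) -> str:
    """Candidate whose line gives the largest share at female share x."""
    return max(lines, key=lambda name: lines[name][0] * x + lines[name][1])


print(f"Clinton overtakes Sanders above {tie_point(CLINTON, SANDERS):.2%} female")
print(f"Trump overtakes Cruz above {tie_point(TRUMP, CRUZ):.2%} female")
print(f"Trump overtakes Rubio above {tie_point(TRUMP, RUBIO):.2%} female")

# A hypothetical remaining state where 53% of registered voters are female.
gop = {"Trump": TRUMP, "Cruz": CRUZ, "Rubio": RUBIO}
dem = {"Clinton": CLINTON, "Sanders": SANDERS}
print(predicted_winner(dem, 0.53), predicted_winner(gop, 0.53))
```

On these lines, Trump leads both rivals in any state where more than about 46% of registered voters are female, which covers every state in the registration table, while Clinton needs a female share above about 52.4%. The lines are not capped to the 0 to 1 range, so extrapolating far beyond the observed female shares can produce impossible values.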

This seems to point to sure-fire wins for Trump and Clinton. As a minor point, however, the Republican predictions are only valid as long as both Cruz and Rubio remain in the race. If one of them drops out (on the numbers most likely Rubio, though tactically the Republicans would prefer it to be Cruz), the combined anti-Trump vote may exceed 50%, particularly given how ingrained anti-Trump sentiment must now be in those camps.

On a more major point, this data is worthless. The deviations from the linear equations in all graphs are large, and the correlation is poor to non-existent. If we apply this same "predictive" model to the states that have already voted, it seems little better than guesswork:

The linear equation for Democrats based on voter registration would have called 5 of the 13 contests incorrectly (only 8/13 right), with errors of up to 39% of the vote. This is exceptionally poor for a method fitted directly to the results it is trying to predict.

Basing predictions on 2012 voters gets one more contest right (Iowa), but this is pure luck.

Similarly, the Republican equations, which give Trump every remaining contest, also give him every contest so far, with very static results:

These are accurate only 9 times out of 15 (60% of the time): wrong, again, in more than a third of cases, with errors of up to 21% of the vote.
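For reference, the hit and miss rates quoted above work out as follows; the 2012-voter line's 9/13 follows from its getting Iowa right in addition to the registration line's 8/13. A short Python sketch (names are my own):

```python
# Correct calls out of contests already held, as reported in this post.
TALLIES = {
    "Democratic, registration line": (8, 13),
    "Democratic, 2012-voter line": (9, 13),
    "Republican lines": (9, 15),
}


def miss_rate(correct: int, total: int) -> float:
    """Fraction of contests the model called wrongly."""
    return (total - correct) / total


for label, (correct, total) in TALLIES.items():
    hit = correct / total
    print(f"{label:<30} {correct}/{total}: {hit:.1%} right, {miss_rate(correct, total):.1%} wrong")
```

Both the registration-based Democratic line (5 of 13 wrong, about 38%) and the Republican lines (6 of 15 wrong, 40%) miss more than a third of contests.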

In Summary

TL;DR: Sex or gender demographics are a very poor means of predicting voting outcomes in primaries. There is little correlation, if any, between the sex/gender of constituents and voting patterns. This is unsurprising, given that the male:female ratio is always close to 50:50 (though it seems slightly but consistently skewed towards female participation), while voting patterns vary wildly from state to state.
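The post does not report a correlation coefficient, but "little correlation" is usually quantified with Pearson's r, where values near 0 mean a line of best fit explains little of the variation (for a one-variable least-squares line, R² = r²). A minimal Python sketch; the sample points are invented to mimic a narrow band of female shares against widely varying results, and are not taken from the real primaries:

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical: female shares in a narrow band, results scattered widely.
female = [0.512, 0.519, 0.527, 0.532, 0.541, 0.548, 0.557, 0.570]
result = [0.62, 0.31, 0.55, 0.28, 0.47, 0.66, 0.35, 0.51]

r = pearson_r(female, result)
print(f"r = {r:.2f}, R^2 = {r * r:.2f}")
```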

It remains to be seen whether more variable demographics like age and race may have a stronger correlation with voting patterns.

N.B. Oklahoma's Democratic results became available during this post's writing. Statistics have not been recalculated to accommodate these numbers.
