Watching the Results Change
The New York Times’ Jacob Harris on Election Projections
When you are in the business of reporting election results, nothing is quite so unbearable as a blowout election. We prefer the nailbiters, when it’s close enough that you don’t call it immediately when the polls close but not so close that it’s impossible to call at all. The general elections of 2013 were terrible in that regard, with massive margins for the victors in the NYC mayor and New Jersey governor races. Luckily for results jockeys like me, Virginia’s gubernatorial race turned out to be closer than expected. Democrat Terry McAuliffe had held a consistent but weakening lead in the polls leading up to the election, but for most of the night his Republican opponent, Kenneth T. Cuccinelli II, was ahead in the tally only to see it slip away in the end.
This was a bit surprising to some of our readers. Early in the night, when only 10% of precincts were reporting, it seemed like Cuccinelli held an impressively solid lead. The confusion stemmed from how we report results. Almost every news organization uses a simple metric to calculate “percent reporting”: divide the number of precincts reporting by the total precincts in the state. The problem with calculating the percent of precincts reporting is that it is not a good measure of the percent of the population reporting. If all precincts were equally sized and distributed across a state, it would serve as a decent approximation, but the number of voters in a precinct can vary dramatically from location to location. Scott County in Virginia’s rural west recorded 5,127 votes across 18 precincts for an average of 285 voters per precinct. Densely urban Arlington in eastern Virginia recorded 67,190 votes across 53 precincts for an average of 1,268 voters per precinct. Rural precincts report earlier not just because they have fewer votes to record; rural precincts also usually close on time, while crowded precincts will stay open past poll closing time to accommodate anybody who was waiting in line at that point.
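To make the gap between the two measures concrete, here is a small Python sketch using only the Scott County and Arlington figures quoted above. Treating those two counties as if they were the whole state, and assuming Scott has fully reported while Arlington has not reported at all, is a hypothetical setup for illustration; the function names are mine, not part of any real results system.

```python
# Final figures quoted in the article: (votes recorded, number of precincts).
counties = {
    "Scott": (5127, 18),
    "Arlington": (67190, 53),
}

def average_precinct_size(votes, precincts):
    """Average number of voters per precinct in one county."""
    return votes / precincts

def reporting_metrics(counties, reported):
    """Return (share of precincts reporting, share of votes counted).

    `reported` lists the counties assumed to be completely in; every
    other county is assumed to have reported nothing yet.
    """
    total_precincts = sum(p for _, p in counties.values())
    total_votes = sum(v for v, _ in counties.values())
    precincts_in = sum(counties[name][1] for name in reported)
    votes_in = sum(counties[name][0] for name in reported)
    return precincts_in / total_precincts, votes_in / total_votes

# Hypothetical moment early in the night: only the rural county is in.
pct_precincts, pct_votes = reporting_metrics(counties, reported=["Scott"])
print(f"{pct_precincts:.1%} of precincts, {pct_votes:.1%} of votes")
```

In this toy two-county state, about a quarter of precincts would be "reporting" while only about 7% of the eventual votes had been counted.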
Most state elections follow this pattern. Rural precincts that generally favor Republicans will report earlier in the night, while more populated counties that may favor Democrats will trickle in later. It can thus confuse readers to see a candidate who was losing in the polls start with an early lead in the vote count. This confusion is only made worse by the sports metaphors we often use in our coverage. Once we start reporting the results, a given candidate doesn’t “come out of the gate strong” like a horse or “come back from behind” in a vote deficit to win like a football team; the polls are closed and there is no legal way for a campaign to influence the final outcomes. The dramatic narratives we think we see in tabulation are often just an artificial result of how votes are counted, although there are compelling stories in how campaigns move the polls and bring out the votes to get to that final result.
These delays in tabulation are why early numbers often paint a deceptive picture. This is why calling desks value exit polls so highly. They are by no means perfect, but they attempt to provide a better picture of the whole state’s voting preferences when the early results are still heavily skewed rural. I also have found a crude projection can help when results are being reported from almost every county:
- For each county reporting, scale up projected vote counts for each candidate by dividing their tallies by the percent of precincts reporting in that county.
- Create a statewide projected vote total for each candidate by summing up their county projections.
- Compute new vote percentages for each candidate based on their totals here.
This approach is relatively crude compared to more sophisticated projections that might include likely voters or exit poll data, but it’s also easy to compute. We’ve already seen that the number of voters in a precinct might vary dramatically from county to county, but it’s safer to assume there’s less deviation within a county. So, this projection uses the votes recorded in the precincts that have reported to scale up projected votes in heavily populated areas even when only a few precincts have reported. No projection is perfect, but this one showed McAuliffe taking the lead a full half hour before that became apparent in the statewide vote counts.
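The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the code we ran on election night: the input structure, the function name `project_statewide`, the candidate labels, and every number in the example are hypothetical.

```python
from collections import defaultdict

def project_statewide(counties):
    """Crude statewide projection from partial county results.

    `counties` is a list of dicts (a made-up structure) with keys
    'precincts_reporting', 'precincts_total' and 'votes', where
    'votes' maps each candidate to the tally so far.
    Returns a dict mapping each candidate to a projected vote share.
    """
    projected = defaultdict(float)
    for county in counties:
        reporting = county["precincts_reporting"]
        if reporting == 0:
            continue  # nothing to scale up from in this county yet
        fraction = reporting / county["precincts_total"]
        # Step 1: scale each tally up by the share of precincts reporting.
        for candidate, votes in county["votes"].items():
            # Step 2: sum the county projections into statewide totals.
            projected[candidate] += votes / fraction
    total = sum(projected.values())
    if total == 0:
        return {}
    # Step 3: recompute percentages from the projected totals.
    return {candidate: votes / total for candidate, votes in projected.items()}

# Hypothetical state: a rural county fully in, an urban county barely in.
example = [
    {"precincts_reporting": 18, "precincts_total": 18,
     "votes": {"Candidate A": 1000, "Candidate B": 4000}},
    {"precincts_reporting": 5, "precincts_total": 50,
     "votes": {"Candidate A": 3000, "Candidate B": 1500}},
]
shares = project_statewide(example)
for candidate, share in shares.items():
    print(f"{candidate}: {share:.1%}")
```

In the raw tallies for this invented example, Candidate B leads 5,500 to 4,000, but the projection puts Candidate A ahead because the urban county has only a tenth of its precincts in. Counties with no precincts reporting are simply skipped, which is one reason the projection is only useful once almost every county has reported something.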
Showing What Changes
Our narratives about vote counts might be illusions, but the actual change of vote totals during the course of a night is interesting in itself. This year, we were able to diagram it with a beautiful chart showing how the vote totals changed minute-by-minute for each candidate. The chart is worth a look on its own, but it’s also an example of how advance preparation can make future data journalism possible.
People looking at election results want to know the current state of each race. My election loader thus overwrites old vote counts with newer values when counts are updated. But news also happens in elections when things change: when a candidate is declared the winner or picks up delegates or is now the leader in the vote counts. I wanted to supplement the current votes with a log of when the votes changed in a specific race.
One way to do this would be to snapshot the entire DB on a regular basis. This would be comprehensive but also overkill. Vote counts update in fits and starts, and most races don’t change from one load to the next. My loader already has a powerful mechanism for detecting when races change, so it was easy for me to piggyback on that to record changes into a new race_diffs table (with an associated result_diffs table recording the vote counts for each diff in the other table). Each of these tables records the votes for a race at that instant, associated with the load that detected those changes. Playing back the vote changes for a single race is as simple as running this SQL query:
SELECT
  r.load_id AS load_id,
  r.changed_at AS changed_at,
  r.total_votes AS total_votes,
  r.precincts_reporting_pct AS report_pct,
  dem.vote_count AS dem_votes,
  rep.vote_count AS rep_votes
FROM race_diffs r
LEFT OUTER JOIN result_diffs dem
  ON dem.race_kind = r.race_kind
  AND dem.race_id = r.race_id
  AND dem.load_id = r.load_id
  AND dem.party_id = 'Dem'
LEFT OUTER JOIN result_diffs rep
  ON rep.race_kind = r.race_kind
  AND rep.race_id = r.race_id
  AND rep.load_id = r.load_id
  AND rep.party_id = 'GOP'
WHERE
  r.nyt_race_id = '#{nyt_race_id}' AND
  r.race_kind = 'state'
Our internal election_results API uses this SQL to return a simple JSON file that lists all the times the vote counts changed for the Democratic and Republican candidates in any race (we could generalize this further to support key independent candidates when it matters). I built this code for the 2012 general elections, but other technical problems from that night meant it got only limited use in our liveblogging.
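For readers who want to try the idea, here is a self-contained Python sketch of diff logging using an in-memory SQLite database. It is an illustration under simplifying assumptions rather than our loader: the two tables are cut down to a few columns, the functions `record_load` and `playback` are hypothetical names, and the loads and vote counts are invented.

```python
import sqlite3

# Simplified stand-ins for the two tables described in the article.
SCHEMA = """
CREATE TABLE race_diffs (load_id INTEGER, race_id TEXT,
                         changed_at TEXT, total_votes INTEGER);
CREATE TABLE result_diffs (load_id INTEGER, race_id TEXT,
                           party_id TEXT, vote_count INTEGER);
"""

def latest_counts(conn, race_id):
    """Return the most recently logged {party: votes} for a race, or None."""
    row = conn.execute(
        "SELECT MAX(load_id) FROM race_diffs WHERE race_id = ?", (race_id,)
    ).fetchone()
    if row[0] is None:
        return None
    rows = conn.execute(
        "SELECT party_id, vote_count FROM result_diffs "
        "WHERE race_id = ? AND load_id = ?", (race_id, row[0]))
    return dict(rows)

def record_load(conn, load_id, changed_at, race_id, counts):
    """Log a diff only if the counts differ from the last logged ones."""
    if counts == latest_counts(conn, race_id):
        return False  # nothing changed, so nothing is written
    conn.execute("INSERT INTO race_diffs VALUES (?, ?, ?, ?)",
                 (load_id, race_id, changed_at, sum(counts.values())))
    conn.executemany(
        "INSERT INTO result_diffs VALUES (?, ?, ?, ?)",
        [(load_id, race_id, party, votes) for party, votes in counts.items()])
    return True

def playback(conn, race_id):
    """Replay the logged changes for one race, oldest load first."""
    return conn.execute("""
        SELECT r.load_id, r.changed_at, dem.vote_count, rep.vote_count
        FROM race_diffs r
        LEFT OUTER JOIN result_diffs dem
          ON dem.race_id = r.race_id AND dem.load_id = r.load_id
          AND dem.party_id = 'Dem'
        LEFT OUTER JOIN result_diffs rep
          ON rep.race_id = r.race_id AND rep.load_id = r.load_id
          AND rep.party_id = 'GOP'
        WHERE r.race_id = ?
        ORDER BY r.load_id
    """, (race_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented loads: the second one repeats the first, so it logs no diff.
loads = [
    (1, "20:00", {"Dem": 100, "GOP": 150}),
    (2, "20:05", {"Dem": 100, "GOP": 150}),
    (3, "20:10", {"Dem": 400, "GOP": 300}),
]
for load_id, changed_at, counts in loads:
    record_load(conn, load_id, changed_at, "va-governor", counts)

history = playback(conn, "va-governor")
for row in history:
    print(row)
```

Three loads produce only two logged rows here, which is the point of logging diffs instead of snapshotting everything. The sketch passes the race id as a bound parameter and adds an explicit ORDER BY so the playback comes back in load order; both are choices made for this example.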
Fast forward a year to a relatively quiet election night filled mostly with boring blowouts. But that gubernatorial race in Virginia was narrowing, and it gave us something to do in the tedium of all our programs working like they’re supposed to and not crashing horribly. So, I dusted off the old API request, pointed it at the race_diffs for the governor’s race, and 15 minutes later Matt Ericson had a chart showing how the vote counts changed during the night. Doing this from scratch would’ve been more than our addled brains could handle, but a little advance planning makes new lines of reporting possible.
Credits
- Jacob Harris
Jacob Harris is a Senior Software Architect who works with a kickass team of fellow newsroom developers at the New York Times.