How wrong could the polls be? 📊 June 28, 2020
How likely is a 6 or 7 percentage-point swing to Trump?
Welcome! I’m G. Elliott Morris, a data journalist and political analyst who mostly covers polls, elections, and political science. Happy Sunday! This is my weekly email where I write about news and politics using data and share links to what I’ve been reading and writing.
Thoughts? Drop me a line (or just respond to this email). If you like what you’re reading, tap the ❤️ below the title; it helps the post rank higher in Substack’s curation algorithm. If you want more content, I publish subscriber-only posts 1-2x a week.
I have a short but relevant newsletter for you today. The conventional wisdom about the 2020 presidential election has shifted dramatically over the past week. Betting markets (to the extent that they are the conventional wisdom) have moved from offering Trump a 50-50 shot at re-election to a 40-60 one. I still think that’s far too high. For now, I’d like to ask the data: how wrong do the polls need to be for Trump to win the election? How likely is that error?
Election-forecasting models rely on our ability to put the current political environment in the context of the past. By definition, the underlying statistical models only work if the data we’re using to measure those environments are similarly predictive over time. If the connection between the economy and election outcomes breaks down, for example, the models will fail.
The hope is that polls this time will fit into that historical context. Forecasting models work by simulating a reasonable amount of error (“reasonable” based on the historical distribution of error in the polls) around the average of public polling. If something happens to the polls or the electoral environment that makes them less predictive of people’s preferences (say, a pandemic that causes a 2-3 percentage point dip for the incumbent president, or polls that overestimate the share of Republicans or Democrats in the electorate), then using that historical distribution of error might be misguided.
But we can still rely on that historical distribution to ask how likely a large polling error is. Right now, with Joe Biden up about 9 points in a simple average of the polls, Trump would need a roughly 6-7 point swing in his favor to win the election: Biden’s 9-point lead, minus the 2-3 point bias toward Republicans in the Electoral College. That swing is a product of two types of error in how well polls now predict election-day results: (a) error from potential movement in the national political environment and (b) error from the potential failure of polls to anticipate the demographic and political composition of voters.
So, how likely is a 6-7 point error in today’s polling average? We can derive that likelihood from the historical distribution of error in polling averages calculated four months out from election day in each cycle between 1980 and 2016. That distribution takes the shape of a normal distribution centered at 0, with a standard deviation equal to the root-mean-square error of the past polling averages. That root-mean-square error looks like this:
The lines above show the estimated root-mean-square error in the polling averages for three sets of election cycles: those since 1980, those since 2000, and those since 2008. One thing you can immediately notice is that polling averages in the most recent cycles are about 1.5-2x as accurate as the full record, which includes the misses in the 80s and 90s.
From here it’s a simple task of statistics and calculating the area under a curve. At 130 days before the election…
The chance that a candidate who is 9 percentage points ahead in national polls would end up with a vote margin below 3 points is 15%, conditional on the distribution of polling error since 1980
The chance that a candidate who is 9 percentage points ahead in national polls would end up with a vote margin below 3 points is 10%, conditional on the distribution of polling error since 2000
The chance that a candidate who is 9 percentage points ahead in national polls would end up with a vote margin below 3 points is 3%, conditional on the distribution of polling error since 2008
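For readers who want to check the arithmetic, here is a minimal sketch of the area-under-the-curve step. It treats the final margin as normally distributed around the current lead, as described above. The RMSE values are my rounded assumptions (about 6 points since 1980 and about 3 points since 2008), not the exact fitted figures, and the function name is just illustrative.

```python
from statistics import NormalDist


def chance_margin_below(lead, threshold, rmse):
    """Probability the final margin lands below `threshold`, treating the
    final margin as Normal(mean=lead, sd=rmse)."""
    return NormalDist(mu=lead, sigma=rmse).cdf(threshold)


LEAD = 9.0       # lead in the national polling average, in points
THRESHOLD = 3.0  # margin below which the trailing candidate could win

# Assumed, rounded RMSE values in points (not the exact fitted figures).
ASSUMED_RMSE = {"since 1980": 6.0, "since 2008": 3.0}

for era, rmse in ASSUMED_RMSE.items():
    p = chance_margin_below(LEAD, THRESHOLD, rmse)
    print(f"{era}: {p:.1%}")
```

With these rounded inputs the sketch gives roughly 16% and 2%, in the neighborhood of the 15% and 3% quoted above; the gap comes from rounding in the assumed RMSE values.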
Today, according to the full range of polling errors in the dataset, we give the polling average a margin of error of 11 percentage points. One other way to look at the current election is: what would the root-mean-squared polling error have to be to give Trump a reasonable (>25%?) chance of winning the election? According to this math, about 9 percentage points, or 1.5x the amount of error in polling averages since 1980 and 3x the expected polling error since 2008.
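The "how much error would it take" question can be sketched by running the same normal model in reverse. This is an illustration under the same assumptions (a 9-point lead, a 3-point threshold, and error centered at 0); the function name is hypothetical.

```python
from statistics import NormalDist


def required_rmse(lead, threshold, target_prob):
    """Standard deviation of polling error at which a candidate leading by
    `lead` has probability `target_prob` of finishing below `threshold`.
    Assumes threshold < lead and target_prob < 0.5."""
    z = NormalDist().inv_cdf(target_prob)  # negative when target_prob < 0.5
    return (threshold - lead) / z


needed = required_rmse(lead=9.0, threshold=3.0, target_prob=0.25)
print(f"RMSE needed for a 25% chance: {needed:.1f} points")
```

The result is a bit under 9 points, which lines up with the "about 9 percentage points" figure: roughly 1.5x an RMSE of 6 and 3x an RMSE of 3.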
This is all to say that it’s pretty unlikely Trump will win. He would have to pull off an upset an order of magnitude better than his 2016 victory to be re-elected, according to the historical reliability of political polls. I don’t think such a swing is likely (given the margin of error described above), but I’m willing to hear data-driven arguments about why polls will misfire again this year.
Posts for subscribers
What I'm Reading and Working On
No reading to report this week. But PS, in case you missed it: I announced this week that I’m writing a book about polls and democracy for W. W. Norton. You can read more about it here.
Thanks for reading!
If you want more content, I publish subscriber-only posts on Substack 1-3 times each week. Sign up today for $5/month (or $50/year) by clicking on the following button. Even if you don't want the extra posts, the funds go toward supporting the time spent writing this free, weekly letter. Your support makes this all possible!