Objectivity and judgment in election handicapping 📊 August 23, 2020
The real strength of quantitative analysis does not lie in impartiality, but in formalizing theories about behavior
Last week’s Brady Bunch-style Democratic National Convention took six months to plan and still often had some hiccups and technical glitches. This week’s virtual RNC was planned in a month. What glitches will plague the GOP infomercial? And this is my weekly newsletter.
As always, I invite you to drop me a line (or just respond to this email). Please hit the ❤️ below the title if you like what you’re reading; it’s our little trick to sway Substack’s curation algorithm. If you want more content, I publish subscriber-only posts 1-2x a week.
I could have written about the small bounce in favorability ratings and voter enthusiasm that Joe Biden looks to have received from the Democratic National Convention last week, but others have already covered the subject well enough and I have some meta thoughts about election forecasting that I’d prefer to think through with you today. Thus the extremely boring title of this week’s newsletter. It is not a sexy subject, but it is an important one.
I hope you are all enjoying the political party infomercials and doing your best to avoid the summer heat and dangerous pandemic.
There are many traps that a person can fall into when they try to forecast an event or phenomenon. Aside from the obvious forms of bias, such as political and ideological, there is an array of psychological pressures and heuristics that can obscure the future from our view. These are unavoidable traps for the vast majority of us, especially when it comes to making judgments about outcomes that are uncertain.
By and large, I think consumers of news have gotten smarter about these inhibitors over the last decade or so. That is especially true when it comes to political punditry and made-for-TV election forecasting. One of the more prominent developments in journalism over the last 20 years has of course been the rise of poll aggregators and election forecasters, who do appear to be more adept at handicapping than the pundit class.
At the same time, I think we have tended to learn the wrong lessons from the successes (and failures) of election forecasters. Let’s take Nate Silver, perhaps one of the best-known statisticians in the world, as our operative example here. That is not to treat Nate as the idol or bulls-eye election handicapper and empirical journalist, but as an archetype for members of both fields.
Nate is an adept forecaster. He is skilled at combining many sources of information using an array of statistical techniques and extracting signals out of noise, as the title of his famous book proclaims. Of course, Nate is no prophet—because none of us can be—but he has done better than a replacement-level handicapper at most of the things his site FiveThirtyEight deals in (sports, elections, the Oscars, etc).
Let’s focus on his election predictions. People like to characterize Nate’s successes as finding the “secret sauce” to predicting election outcomes. He has developed an algorithm to find the optimal combination of election predictors, a layman might conclude, and has done so using nearly superhuman rationality and objectivity in pursuit of understanding The Truth about (American) elections.
But this is not Nate’s contribution to forecasting; neither, adjacently, is it the success of (good) election handicappers, data journalism or quantitative analysis more generally. The true strength of empirical journalism is in applying sound judgment (what some people might call subjectivity but which I would defer to my colleague Andrew Gelman in calling context-dependence and perspectivity) to the many facets of statistical learning (such as in model design or overall analytical formulation) and in formulating our hypotheses about why things happen. Nate’s election models make predictions based on the state of the economy, for example. That’s because he believes in the theory that voters punish presidents when the economy is bad and reward them when it is good. And building a model on top of that theory can tell you how big that punishment or reward is in terms of votes.
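To make that last point concrete, here is a minimal sketch in Python of what "building a model on top of a theory" can look like. It is not Nate's model or anyone else's: the growth and vote-share numbers are invented and fall on a straight line to keep the arithmetic transparent, and the `fit_line` helper is a name I made up for illustration. The judgment call is the choice to regress vote share on the economy at all; the fitted slope is what then puts a size on the punishment or reward.

```python
# Illustrative only: every number below is invented, not real election data.
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical election-year economic growth (%) and the
# incumbent party's share of the two-party vote (%).
growth = [-1.0, 0.5, 1.5, 2.0, 3.0, 4.0]
vote_share = [46.0, 49.0, 51.0, 52.0, 54.0, 56.0]

intercept, slope = fit_line(growth, vote_share)

# The slope is the theory's "size of the reward": vote-share points
# gained per extra point of growth, in this made-up dataset.
print(f"Vote-share points per point of growth: {slope:.1f}")
print(f"Predicted share at 1% growth: {intercept + slope * 1.0:.1f}")
```

With real data the points would not sit on a line, and the interesting questions (which economic indicator, which elections, how much to trust a handful of observations) are exactly the judgments discussed here.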
The idea that Nate has helped popularize is that when we rely on a close empirical study of both data and domain knowledge, we can out-perform pundits and snap decision-makers at anticipating events and analyzing explanations. That requires both deploying objective analysis and making judgments about the world around us.
And this is where people go wrong in both (a) indicting pundits and (b) praising Nate Silver and his acolytes. The problem with traditional media analysts is not that they ignore the real data that predict an outcome, the true explanations for why something did or did not happen ahead of time that Nate in his infinite wisdom has provided for us; it’s that their necessary judgments are often flawed or swayed by unconscious biases and heuristics. And it’s really, really hard to control for some of the self-rationalizing doom loops that we find ourselves in when justifying those bad judgments. Nate’s writing-off of Trump’s ability to win the 2016 Republican primary because of his political background and [*gestures in the air*] everything about him is a good example of said doom loops.
And while quantitative analysts fall prey to these judgments, too, one of the ways we adjust for their effects is by testing theories about, and explanations for, target phenomena. Polls don’t just tell us who is up or down, for example; they can also tell us why. Approval ratings tell us whether a president is unpopular, and we infer from that unpopularity that people might be unwilling to vote for them.
If we know about the why of a pattern or outcome, we can be more reassured that we’re analysing the what and the how correctly. Empiricism allows us to ground our analysis both in consensus and impartiality (again, Andrew’s language) and in context and perspectivity in a way that people making quick predictions about things on TV typically don’t or can’t.
There are several scientific explanations for why pundits’ tendencies to make snap judgments or rushed decisions about political outcomes expose them to errors.
Malcolm Gladwell explains two of them in his 2005 book Blink. One of those explanations is “analysis paralysis,” or more commonly “overthinking,” when a pundit or handicapper is drowning in information and can’t find a common thread to pull on. This barrier to decision-making is especially hard to overcome when information about an outcome is conflicting, causing us to jump on indicators that prey on our biases or to simply throw our hands up, forcing an embrace of false uncertainty. Modeling (both in empirical analysis and in applying judgments) provides a way to formalize the impacts that all those data have on the target phenomena and find something useful in the chaos.
Another big problem with applying snap judgment to politics is that there are far too many factors for us to be experts in all of them. And the power of the Blink comes in internalizing enough domain knowledge about a subject that your gut tells you something useful. Concerning politics, a cable TV pundit who is an expert in, say, political psychology might not be well prepared to opine about international negotiations on climate change or how political systems might respond to pandemics, but a temptation to cover the news might force them to make a bad snap judgment anyway.
Finally, there is the intellectual contribution of Philip Tetlock to this debate. Tetlock echoes this concern about domain knowledge and says that pundits and “political experts” are bad at forecasting because they typically only study “one big thing” instead of a diverse set of variables that can better explain a pattern or predict an outcome. Media analysts also face other psychological pressures, Tetlock says, in that they are rewarded handsomely for overconfident predictions that actually come true. “What do you know that nobody else does,” a viewer might wonder, “and what can I learn from that that few other people will?”
Let’s wrap up by talking about election forecasting again. One of the things I have noticed on social media recently is a temptation for people to give more of their attention to forecasters that appear more objective or facially correct, ones that apply no judgments or (statistical) assumptions in their work. Given the discussion above, I hope you can see why that’s a wrong default tendency to internalize.
Especially in election forecasting, there are so few factors for which we have enough data to formalize a model that, if we limited ourselves to them, we would likely only ever produce incomplete models. Take two examples:
First, take my assertion that polarization is a key variable in election forecasting that should be included in many stages of prediction. I hold this belief not only because I have an n=18 dataset showing a pretty strong correlation between polarization and the probability of landslide elections and large movements in the polls, but because external sources of data have led me to believe that polarization has fundamentally altered voter psychology in America. I don’t base the inclusion of a polarization variable entirely on the particular datasets that go into the model; we also have good reason otherwise to think that this is a good factor to control for. As far as what data the model sees, that’s a subjective assertion.
Second, take Nate Silver’s insistence that the width of New York Times headlines is an effective proxy for uncertainty because more full-width headlines -> more news -> more events to change people’s minds. Nate certainly has some amount of objective evidence that makes him think this is true (the correlation between the number of headlines and how badly his forecast performs in back-testing, I presume), but I doubt he has enough that it justifies including the variable in his model on its own. Instead, there’s also a subjective rationalization going on here: Nate thinks that some elections are simply less fixed than others, and NYT headlines are a good proxy for that because when there’s more news, there are more factors to change people’s minds. He likely does not have actual evidence about the causal effects of newspaper headlines on opinions, but it is a reasonable (statistical) assumption to make.
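For readers who want to see how an assumption like that can enter a forecast, here is a rough sketch in Python. To be clear, this is my own toy illustration and not how Nate's model works: the `news_index` multiplier, the normal distribution for the final margin, and the specific numbers are all assumptions chosen for the example. The point is only that a judgment ("busier news cycles mean less settled races") becomes a formal, inspectable part of the model once it is written down as a wider spread around the polling margin.

```python
from statistics import NormalDist


def win_probability(margin, base_sd, news_index=1.0):
    """Chance the current leader finishes ahead, assuming the final
    margin is normally distributed around today's margin.

    news_index is a hypothetical multiplier on uncertainty: 1.0 stands
    for a typical volume of news, values above 1.0 for a busier cycle.
    """
    sd = base_sd * news_index
    return 1.0 - NormalDist(mu=margin, sigma=sd).cdf(0.0)


# Same 5-point lead, judged under a quiet and a busy news environment.
quiet = win_probability(margin=5.0, base_sd=5.0, news_index=0.8)
busy = win_probability(margin=5.0, base_sd=5.0, news_index=1.5)

print(f"Quiet news cycle: {quiet:.0%} chance the leader holds on")
print(f"Busy news cycle:  {busy:.0%} chance the leader holds on")
```

The lead is identical in both cases; only the assumed volatility differs, and the leader's chances fall as the news index rises.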
There is power in admitting that we cannot quantify everything. A good model (and a good modeler) will know that and incorporate other judgments about reality into their formulations of the world around them.
Posts for subscribers
Plus, I wrote this free-to-read post yesterday about what we can learn when election forecasts—and forecasters—disagree.
What I'm Reading and Working On
Please direct your internet browser to two pieces I wrote on postal voting last week. First, “How Donald Trump polarised postal voting,” and second “The postmaster always rings twice: More mail-in voting doubles the chances of recounts in close states.”
As mentioned in the newsletter, I’ve been reading Malcolm Gladwell’s Blink this week. It has been doubly helpful, as it both inspired some of the thoughts in this letter and is a great template for non-fiction storytelling.
Thanks for reading!
If you want more content, I publish subscriber-only posts on Substack 1-3 times each week. Sign up today for $5/month (or $50/year) by clicking on the following button. Even if you don't want the extra posts, the funds go toward supporting the time spent writing this free, weekly letter. Your support makes this all possible!
Josh sends along pictures of Franklin and Leo this week. They remind me of how my cats used to like each other—until we got a cat tower that they fought over unceasingly.
For next week’s contest, send in a photo of your pet(s) to firstname.lastname@example.org.