The headline from a Monmouth University poll out Monday is this: Half of likely voters support Democratic presidential nominee Hillary Clinton in a four-way race:
- Hillary Clinton 50%
- Donald Trump 38%
- Gary Johnson 5%
- Jill Stein 3%
The strong result follows an NBC News/Wall Street Journal poll released a day earlier showing Clinton with an 11-point lead over Republican nominee Donald Trump in a national four-way race among likely voters. A Washington Post/ABC News poll also released Sunday suggests a much tighter race, however, with Clinton leading Trump by just 47% to 43% among likely voters.
The Monmouth poll has a margin of error of 3.6 percentage points after talking to 805 registered voters. The margin of error is 3.3 points in the NBC News/Wall Street Journal result, which interviewed 905 people. At the same time, the margin of error in the Washington Post/ABC News survey is 4 points, which happens to be exactly the margin separating the two major-party candidates.
As we’ll see, it’s better for a candidate to be ahead even within the margin of error. And it’s possible this poll is way off and support in reality is outside that margin.
So what does this mean for Trump and Clinton? Answering that requires a clear sense of how polls work, and looking closer shows what we can and cannot trust.
It depends on whom you ask
In 1936, a magazine called The Literary Digest ran one of the biggest opinion polls of all time. It asked 2.4 million people whether they planned to vote for the incumbent Democrat, Franklin D. Roosevelt, or his Republican challenger, Alfred Landon.
It trumpeted this prediction:
- Landon: 57%
- Roosevelt: 43%
The poll must have had one of the smallest margins of error in polling. But it was dead wrong.
Error margins apply only to the population a pollster is sampling.
This is what actually happened in the election:
- Roosevelt: 62%
- Landon: 38%
The Literary Digest fell prey to selection bias. That massive sample was made up of its subscribers and members of groups and organisations that tended to skew wealthier than the average American.
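To see how small the Digest's margin of error must have been, and how little that helped, the standard formula for a simple random sample can be applied to its 2.4 million responses. This is only a sketch under assumptions the article does not state: a 95% confidence level (z = 1.96) and the worst-case proportion of 50%. The function name `margin_of_error` is ours.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error (as a proportion) for a simple random sample.

    n: sample size; p: assumed proportion (0.5 is the worst case);
    z: z-score for the confidence level (1.96 is about 95%, an assumption here).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Nominal sampling error for 2.4 million responses, in percentage points
nominal = 100 * margin_of_error(2_400_000)

# Actual miss: the Digest predicted 57% for Landon, who received 38%
actual_miss = 57 - 38

print(f"Nominal margin of error: {nominal:.2f} points")
print(f"Actual miss on Landon:   {actual_miss} points")
```

The nominal figure comes out well under a tenth of a point, while the real miss was 19 points. The formula measures only random sampling error. It says nothing about whether the right people were asked.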
Today’s pollsters are savvier, but there are still many ways that bias seeps in. For instance, a poll that calls only landlines may leave out a whole demographic of younger, mobile phone-only households. Some polls are opt-in, where users of a specific website answer questions. That’s less reliable than a random sampling.
“Far more important than dialling down the margin of error is making sure that whatever you’re aiming at is unbiased and that you do have a representative sample,” says Andrew Bray, an assistant professor of statistics at Reed College.
Some polls have well-known biases. Rasmussen, for instance, is known to skew Republican.
Lee Miringoff, the director of the Marist Institute for Public Opinion — which produces polls for NBC News, The Wall Street Journal, and McClatchy — says polls are as much art as science.
“Scientifically, we should get the same result,” he says.
Modern polls are not immune to these issues. Some potential voters are harder to reach, and some polls skew more educated. And polls with a high percentage of potential voters who are undecided can lead to more uncertainty.
So how much can we trust today’s results?
Margin of error
Pollsters and journalists tend to highlight the headline numbers in a poll. In July, before the Democratic convention, a Rasmussen survey showed Trump leading Clinton, 43-42.
Rasmussen didn’t help matters by describing Trump as “statistically ahead.”
It’s actually not that simple.
First, you have to consider the margin of error. Rasmussen pollsters interviewed 1,000 people to represent the views of 320 million Americans. Naturally, the poll results might not perfectly match what the whole population thinks.
That Rasmussen poll has a 3-point margin of error. Here’s what that actually means.
Let’s take that Trump number: 43% is something called a point estimate. This is basically the polling firm’s best educated guess of what the number would be if it had asked the whole population. But it’s not guaranteed to be right.
The margin of error accounts for this:
- Because the margin of error is 3 points, the pollsters are confident that support for Trump in the total population is between 40% and 46% — or 43% plus or minus 3 percentage points.
- Support for Clinton is between 39% and 45%.
The point estimate (the dots in the chart above) is like fishing with a spear; you’re stabbing for the right answer. The margin of error is like fishing with a net; somewhere in your catch is the true figure.
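For readers who want to check the arithmetic, here is a rough sketch of where a 3-point margin comes from. It assumes a simple random sample of 1,000 and the conventional 95% confidence level (z = 1.96). Neither the formula nor the function name `margin_of_error` comes from Rasmussen, whose own method may differ.

```python
import math

def margin_of_error(n, p, z=1.96):
    """Approximate margin of error for a share p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # sample size reported for the Rasmussen poll
intervals = {}
for name, share in [("Trump", 0.43), ("Clinton", 0.42)]:
    moe = margin_of_error(n, share)
    intervals[name] = (share - moe, share + moe)
    low, high = intervals[name]
    print(f"{name}: {share:.0%} +/- {100 * moe:.1f} points -> {low:.1%} to {high:.1%}")
```

Both margins work out to roughly 3 points, and the two ranges overlap almost entirely, which is why a 1-point lead says so little on its own.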
But this is not the whole story, either.
Feeling confident
Before the 2016 Michigan primary, it looked as if Clinton had it made. FiveThirtyEight aggregated several polls and predicted that she had a 99% chance of winning the primary. Many polls had Clinton ahead of challenger Bernie Sanders by double digits.
The polls were wrong.
Sanders eked out a narrow victory. Of the many reasons pollsters might have been off, this may be one of them: There’s more to polling than the margin of error.
“The margin of error is a guidepost but not a foolproof” one, Miringoff says.
Here’s what the margin of error really means.
Pollsters typically ask roughly 1,000 people a question like: Whom do you plan to vote for? Their goal is to be 95% sure that the real level of support in the whole population is captured in the sample’s range, from the low end of the margin of error to the high end.
That range is called a “confidence interval.”
Let’s say a pollster like Miringoff were to run that same poll 100 times. Each time, he would randomly select a different group of 1,000 people. Miringoff would expect the true proportion (the candidate’s actual support) to fall within the margin of error in 95 out of the 100 polls. That’s why he’d say that he’s 95% confident in the results.
Those five outliers are one reason elections don’t always turn out the way pollsters predict.
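The 100-poll thought experiment is easy to simulate. The sketch below invents a "true" level of support (43%, a hypothetical value chosen to echo the Rasmussen number), draws 100 random samples of 1,000 simulated voters, and counts how often the true value lands inside each poll's margin of error. Every constant here is an assumption for illustration, not data from any real poll.

```python
import math
import random

random.seed(2016)  # fixed seed so the run is repeatable

TRUE_SUPPORT = 0.43   # hypothetical "real" level of support
SAMPLE_SIZE = 1000
NUM_POLLS = 100
Z = 1.96              # z-score for a 95% confidence level

hits = 0
for _ in range(NUM_POLLS):
    # Each simulated respondent backs the candidate with probability TRUE_SUPPORT
    supporters = sum(random.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE))
    estimate = supporters / SAMPLE_SIZE
    moe = Z * math.sqrt(estimate * (1 - estimate) / SAMPLE_SIZE)
    # Count the poll as a hit if its interval contains the true value
    if estimate - moe <= TRUE_SUPPORT <= estimate + moe:
        hits += 1

print(f"{hits} of {NUM_POLLS} simulated polls captured the true value")
```

The count should land near 95, though it will vary with the seed. Note that this simulates only random sampling error. Real polls also face the selection problems described earlier.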
Remember that Rasmussen poll in July showing Trump with 43% support? That 43% is thought to be the most likely reflection of reality. But the pollster is still only 95% confident that Trump’s true amount of support is found between 40% and 46%.
The further you get from that point estimate, the less likely it is that you are seeing the true number. So it’s more likely to be 42% than 41% — and 40% is even less likely.
The chance that the true figure lies outside the 95% confidence interval is, as you might expect, quite small. The further outside it is, the more minuscule the likelihood. But it’s still possible for a poll to be way off.
“If you really want to be 100% confident in your estimate, you’re either going to have to ask every American or be satisfied with a huge margin of error,” Bray, the Reed College statistics professor, says.
The whole point of polling is to extrapolate what a large group believes by asking a randomly selected subset of that group.
In the era of modern polling, most pollsters agree that being 95% confident in the margin of error is “good enough.”
“It’s a reasonably high number,” Bray says. “That means we’re going to be wrong one in 20 times, but for most people that’s acceptable.”
Many polls, such as those from the Pew Research Center, bury the margin of error in the fine print. Far fewer highlight the confidence interval. But anytime you see a poll, remember: There’s about a 5% chance that the true figure lies outside the margin of error around the headline number.
Keeping it 1,000
Look closely, and you’ll notice that most polls question roughly 1,000 people. That holds true whether pollsters are trying to approximate voter opinion in Rhode Island (about 1 million residents) or the entire US (nearly 320 million residents).
Why 1,000?
It’s a big enough number to be reasonably confident in the result — within the margin of error 19 out of 20 times. There’s a lot of variety in a group of 1,000 people, so it captures many of the elements in the larger group.
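Why does the same sample size work for Rhode Island and the whole country? One way to see it is the textbook "finite population correction," which adjusts the margin of error for the size of the population being sampled. The sketch below assumes simple random sampling, a 95% confidence level, and the round population figures quoted above. It is not how any particular pollster computes its margin.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error with the finite population correction applied."""
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

rhode_island = margin_of_error(1000, 1_000_000)
united_states = margin_of_error(1000, 320_000_000)

print(f"Rhode Island:  {100 * rhode_island:.2f} points")
print(f"United States: {100 * united_states:.2f} points")
```

Both come out at about 3.1 points. Once the population is much larger than the sample, its exact size barely matters.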
Asking more than 1,000 people yields diminishing returns in accuracy.
For instance, sampling 2,000 people is not twice as precise as sampling 1,000. It might bring the margin of error from roughly 3 points to about 2.2 points.
In modern polling, most statisticians see sampling 1,000 people as a good compromise between a manageable sample size and acceptable confidence.
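The diminishing returns follow from the square root in the standard formula: the margin of error shrinks with the square root of the sample size, so halving it takes four times as many interviews. A minimal sketch, assuming simple random sampling, a 95% confidence level, and a worst-case 50% share (the sample sizes listed are arbitrary):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error (as a proportion) for a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

sizes = [250, 500, 1000, 2000, 4000, 8000]
margins = {n: 100 * margin_of_error(n) for n in sizes}  # in percentage points

for n in sizes:
    print(f"n = {n:>5}: +/- {margins[n]:.1f} points")
```

Going from 1,000 to 2,000 respondents trims the margin from about 3.1 to about 2.2 points. Cutting it in half takes 4,000.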
What and when
Results differ among pollsters for many reasons.
There are simple explanations, like when the polls were conducted. It can take days or weeks to conduct and analyse a poll. A lot of news can happen between the dates on which the questions were asked and the date of the results’ release.
This is especially a problem with polls close to Election Day. They’re generally a snapshot in the week before the election. If something happens in the final days of campaigning, those final polls may not be as predictive.
It also matters how a pollster phrases and orders questions, and whether it’s a phone interview, in-person interview, or online survey. Even the interviewer’s tone of voice can matter.
Then, pollsters have to decide how to analyse and weight the data, and those methodologies can vary.
But it’s not just pollsters analysing data, and that’s where we get another big problem.
Drilling down
When Miringoff releases his Marist polls into the wild, they are quickly consumed by journalists, commentators, and a public looking for trends that create headlines. This drives him crazy.
“It’s too often to throw up your arms,” Miringoff says.
Here’s the problem. Let’s say his team interviews 1,000 people to represent the general population. In that 1,000 there are subgroups: men, women, minorities, immigrants, young people, old people.
It’s tempting to pull out those subgroups and draw conclusions about, say, support for a candidate among Latinos or women.
But each of those subgroups is, in effect, its own sample, and those samples can be very small. That means the margin of error for each subset can be huge.
Take this poll from Pew: In the sample, there were only 146 black respondents. The margin of error for that subgroup is more than 9 points!
You can’t learn much by looking at a group with a 9-point error margin.
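The same formula shows how quickly the margin balloons for a small subgroup. This sketch assumes simple random sampling at a 95% confidence level. Under those assumptions, 146 respondents give a margin of roughly 8 points, a bit below the figure quoted above. Pollsters often inflate the margin to account for weighting (a "design effect"), and that may explain the gap, but this is our guess. The design effect of 1.3 and the full sample of 1,000 used below are purely illustrative and do not come from Pew.

```python
import math

def margin_of_error(n, p=0.5, z=1.96, design_effect=1.0):
    """Approximate margin of error, optionally inflated by a design effect."""
    return z * math.sqrt(design_effect * p * (1 - p) / n)

full_sample = 100 * margin_of_error(1000)   # hypothetical full sample
subgroup = 100 * margin_of_error(146)       # subgroup size from the article
# Hypothetical design effect, for illustration only
subgroup_weighted = 100 * margin_of_error(146, design_effect=1.3)

print(f"Full sample (n=1000):       +/- {full_sample:.1f} points")
print(f"Subgroup (n=146):           +/- {subgroup:.1f} points")
print(f"Subgroup, design effect 1.3: +/- {subgroup_weighted:.1f} points")
```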
Why aggregating is good
If you combine results from multiple polls taken at the same time, you can think of it as one huge poll. That drives down the overall margin of error and can make you more confident in the predictive power of the polls.
In the real world, different polls are conducted in different ways, so you can’t think of an aggregated poll as truly one big sample. But this is also a virtue because it reduces the effect of pollster biases and errors. FiveThirtyEight, The New York Times, and RealClearPolitics all run averages with different weightings and methodologies.
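In the idealised case, where several polls really could be treated as one big random sample, the pooled margin of error can be sketched as follows. The three polls are hypothetical, and real aggregators such as FiveThirtyEight use their own weightings, which this does not attempt to reproduce.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error (as a proportion) for a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical polls: (share for one candidate, sample size)
polls = [(0.48, 1000), (0.50, 900), (0.47, 800)]

total_n = sum(n for _, n in polls)
# Weight each poll's share by its sample size
pooled_share = sum(share * n for share, n in polls) / total_n
pooled_moe = margin_of_error(total_n, pooled_share)

for share, n in polls:
    print(f"Poll of {n}: {share:.0%} +/- {100 * margin_of_error(n, share):.1f} points")
print(f"Pooled ({total_n}): {pooled_share:.1%} +/- {100 * pooled_moe:.1f} points")
```

The pooled margin is smaller than that of any single poll, but only to the extent that the polls are comparable and unbiased.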
Ahead or tied?
OK, so now that you know a lot more about polls, what should you think when a race is tight? The answer is not straightforward.
Let’s say that a poll comes out showing Clinton with 51% support and Trump with 49%. The margin of error is plus or minus 3 points. Are the two candidates statistically tied, or is Clinton slightly ahead?
In purely statistical terms, most would consider this example a “statistical dead heat.” Either candidate could be ahead.
“It’s pretty significant editorially,” Miringoff says. “It’s not significant statistically.”
Still, that doesn’t mean Clinton’s lead in this hypothetical example is completely insignificant. “If I was running for office, I’d rather have 51 than 49,” Miringoff says.
Remember point estimates? In this scenario, 51% is still the pollster’s best guess at Clinton’s true level of support. That’s higher than Trump’s. If a series of polls shows Clinton with a slight edge — even within the margin of error — then it can suggest an advantage. A series of polls is more convincing than any single poll.
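Both halves of that argument can be put into rough numbers. The margin of error on the gap between two candidates is about twice the margin on either share, which is why a 2-point lead counts as a dead heat. Yet the leader is still more likely than not to be truly ahead. The sketch below assumes a simple random sample of 1,000 (our choice, since it gives roughly a 3-point margin), a 95% confidence level, and a normal approximation. The "chance of being ahead" figure is a loose illustration that treats sampling error as the only source of uncertainty.

```python
import math
from statistics import NormalDist

n = 1000                      # assumed sample size, not stated in the example
clinton, trump = 0.51, 0.49
lead = clinton - trump

# Standard error of the difference between two shares from the same sample
se_lead = math.sqrt((clinton + trump - lead ** 2) / n)
moe_lead = 1.96 * se_lead

# Rough chance the leader is truly ahead, under a normal approximation
prob_ahead = NormalDist().cdf(lead / se_lead)

print(f"Lead: {100 * lead:.0f} points, +/- {100 * moe_lead:.1f} points")
print(f"Rough chance the leader is truly ahead: {prob_ahead:.0%}")
```

The margin on the lead is about 6 points, far larger than the 2-point gap, yet the rough chance that the leader is ahead sits well above a coin flip. That is the sense in which 51 beats 49 even in a "statistical dead heat."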
Feedback loop
Finally, as much as we want to believe that polls are a scientific reflection of reality, polls can also affect reality.
Here’s one example: Polls that show candidates falling behind can galvanize their supporters to get out to vote.
The media may also focus on polling trends, leading to changes in public opinion about which candidates are viable or worth supporting.
Polls don’t happen in a vacuum.