Sunday, August 17, 2008

What everyone should know about polls

If a candidate's lead in a poll is within the margin of error, that does not make it meaningful to say the candidates are "in a statistical dead heat". As Kevin Drum once explained:

...what we're really interested in is the probability that the difference is greater than zero — in other words, that one candidate is genuinely ahead of the other. But this probability isn't a cutoff, it's a continuum: the bigger the lead, the more likely that someone is ahead and that the result isn't just a polling fluke. So instead of lazily reporting any result within the MOE as a "tie," which is statistically wrong anyway, it would be more informative to just go ahead and tell us how probable it is that a candidate is really ahead. As a service to humanity, here's a table that tells you:

[Table not reproduced here: probability that a candidate is really ahead, by size of lead and margin of error.]

So, for example, if a poll's margin of error is 5% and Candidate A has a lead of 3%, then the probability that Candidate A really is in the lead is 73%. Calling the race a "statistical tie" or "statistical dead heat" suggests that either candidate is equally likely to be in the lead, which is wrong.
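
For anyone who wants to check numbers like this, here is a rough sketch of one way such a probability could be computed. It is not necessarily how Drum's table was built. It assumes a two-candidate race where the two shares add up to roughly 100%, a margin of error that is the 95% half-width on a single candidate's share, and a normal approximation. The function name `prob_really_ahead` and the constant `Z_95` are made up for this illustration.

```python
import math

# Assumption: the reported margin of error is the half-width of a two-sided
# 95% confidence interval on ONE candidate's share.
Z_95 = 1.959964


def prob_really_ahead(lead_pts, moe_pts):
    """Approximate probability that the poll leader is truly ahead.

    lead_pts: observed lead in percentage points (leader minus trailer).
    moe_pts:  reported margin of error in percentage points.

    Assumes a two-way race (shares sum to about 100%), so the lead moves
    two points for every one-point move in a single share.
    """
    if moe_pts <= 0:
        raise ValueError("margin of error must be positive")
    sigma_share = moe_pts / Z_95      # std. dev. of one candidate's share
    sigma_lead = 2.0 * sigma_share    # std. dev. of the lead (difference)
    z = lead_pts / sigma_lead
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for lead in range(0, 6):
        p = prob_really_ahead(lead, 5)
        print(f"lead {lead} pts, MOE 5 pts: {p:.0%}")
```

Under these assumptions a 3-point lead with a 5-point margin of error comes out at roughly 72%, within a point of the 73% quoted above. The small gap presumably comes from rounding or from slightly different assumptions behind the table. A lead of zero gives exactly 50%, which is the only case where "dead heat" is literally accurate.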

The moral of the story: when you see Obama ahead in a poll, but the lead is still within the poll's margin of error, don't worry. It's still likely that Obama really is ahead.

2 comments:

Alex said...

Ahh! Okay, of course I'll get disagreeable here. I disagree, apparently, with what Kevin Drum says. I *don't* think it's interesting to ask the probability that a candidate is in the lead. Take a hypothetical scenario where we had 100% confidence via our polling methods (we polled everyone) that a candidate was in the lead by 10 votes. This would not be interesting, because people's intended votes change over time, voters come and go, etc. In fact, I would still easily call this a dead heat, for this reason.
The margin of error does not, as far as I know, capture the likelihood of correctness of a given person-poll. All it reflects is the discrepancy between what we would get from sampling some small fraction of people and what we would get from asking everyone, assuming we sample with uniform probability.

So this means that the values in your table aren't inherently that interesting to me. They tell me the likelihood that a candidate is at least one vote ahead of the other candidate. What I'd really need for a good picture of the situation is a 3d table, where I see the likelihood that the candidate is at least one vote ahead, at least two votes ahead, etc. Of course, this could be computed, and I can have some reasonable intuition for what it would look like from your table. But it wouldn't be that much better intuition than what I get from the margin of error.

David Morris said...

Hah! You're intense.

I think Drum's main concern was clearing up the wrong idea that most people have (myself included before I read the post), which is that a "statistical dead heat" of 52%-48% with a 5% MOE should be considered the same as a 50%-50% tie.

Of course, if someone busted out with a 3D representation of the latest Pew survey, that would be awesome. And certainly one would think they could be doing something along these lines with Flash in web articles already.