
September 30, 2007

Our Statistician is Marge Innovera

by Nicholas Beaudrot of Electoral Math

TNR's Noam Scheiber writes, "the margin of error for the likely-voter portion of the poll, where Obama leads Clinton by 4 points, is plus or minus 7--i.e., Obama's lead is statistically meaningless." Not true! A 7% margin of error means we can be 95% confident that a candidate's true level of support is within 7 points of the reported value. In practice, if the pollster's screen of likely caucus-goers is accurate, it means that among likely caucus-goers there's a 67% chance that Obama is ahead. Good odds, but not a mortal lock.

Kevin Drum produced this table that can help you read the polls.
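For readers who want to check this kind of claim themselves, here is a back-of-the-envelope sketch in Python. It treats each candidate's reported share as an independent normal variable with standard error MOE/1.96; that independence assumption is a simplification (the two shares in one poll are negatively correlated), so different treatments, such as the one behind Kevin Drum's table, will give somewhat different probabilities.

```python
from math import erf, sqrt

def prob_leader_ahead(lead, moe, z95=1.96):
    """Approximate P(the leader is truly ahead).

    Treats each candidate's share as an independent normal with
    standard error moe / z95 -- a simplification, since the two
    shares in a single poll are negatively correlated.
    """
    se_each = moe / z95          # SE of a single candidate's share
    se_lead = se_each * sqrt(2)  # SE of the difference, if independent
    z = lead / se_lead
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

print(round(prob_leader_ahead(4, 7), 2))  # a 4-point lead with a 7-point MOE
```

Under this particular simplification a 4-point lead with a 7-point MOE comes out closer to a four-in-five chance of truly leading; the exact figure depends on how you model the correlation between the two shares.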



67% chance is good odds?

So what would 50% chance that Obama is ahead be?

Posted by: Meh | Sep 30, 2007 7:22:01 AM

Caucus polls are worthless. As nice as it would be to see Obama win Iowa (he's not Hillary, and he's more likely to win NH after Iowa than Edwards is), I am very skeptical of this poll. Its likely-voter screen seems to allow through a large number of people who have never voted in a caucus before. I doubt we will see that kind of increase in turnout.

Posted by: soullite | Sep 30, 2007 8:33:01 AM

That's assuming their model correctly identifies "likely caucus-goers"; if the model is wrong, the margin of error is meaningless. That's why Pollster.com's campaign for polling transparency is important. Still, the statistical point is worth remembering.

Posted by: AJ | Sep 30, 2007 10:18:05 AM

I'd love to finish reading this post, but Pikop Andropoff is here with my ride!

Posted by: Daniel Munz | Sep 30, 2007 10:32:00 AM

Ohmigod. The MOE wars are back.

MOE has nothing to do with the design or content of your poll. The crappiest push poll and the most carefully designed neutral poll with rotated responses will have the same MOE if they have a truly random sample of the intended population with an identical sample size. It is not a quality measure, it's math.

I took a Stat for Non-Majors class twenty-five years ago, and the first thing they had us do was sit down and work through the proof for MOE, because the concept is so inherently implausible. There is no way that you can take 1,000 randomly selected people and end up with a result that is within +/- 3.5% of what the total population would give, 95% of the time, when answering the same questions in the same time frame. Except that you can.
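The proof in question reduces to the binomial standard error. A minimal sketch of the worst-case (p = 0.5) formula that published polls normally quote:

```python
from math import sqrt

def moe_95(n, p=0.5, z95=1.96):
    """95% margin of error for a proportion from a simple random
    sample of size n; p = 0.5 is the worst case, which is what
    published polls normally report."""
    return z95 * sqrt(p * (1 - p) / n)

print(round(100 * moe_95(1000), 1))  # n = 1000: about 3.1 points
print(round(100 * moe_95(200), 1))   # n = 200: about 6.9 points
```

Note that n = 200 reproduces roughly the plus-or-minus 7 quoted for the likely-voter subsample, while n = 1000 gives about 3.1 points, close to the 3.5 figure remembered above.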

Every time I see the words 'statistical dead heat' where the actual poll shows a result just inside the MOE, my hands start twitching (start putting things with sharp edges or which go boom where Bruce can't reach them). Because there is a probability distribution within the MOE. If I have a one-point lead with a 3.5% MOE, I am pretty nervous. If I have a three-point lead? Not so much.
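The "distribution within the MOE" point is easy to check with a quick Monte Carlo simulation. This sketch assumes the lead's own standard error is sqrt(2) times a single share's (i.e., independent candidate shares, a simplification), then counts how often the true lead is positive:

```python
import random

def sim_prob_ahead(lead, moe, trials=200_000, seed=1):
    """Draw the true lead from the sampling distribution implied by
    the MOE and count how often the leader is genuinely ahead.
    Assumes independent candidate shares -- a simplification."""
    se_lead = (moe / 1.96) * 2 ** 0.5
    rng = random.Random(seed)
    wins = sum(rng.gauss(lead, se_lead) > 0 for _ in range(trials))
    return wins / trials

print(round(sim_prob_ahead(1, 3.5), 2))  # one-point lead, 3.5% MOE
print(round(sim_prob_ahead(3, 3.5), 2))  # three-point lead, 3.5% MOE
```

Under this assumption the one-point lead is real only about two times in three, while the three-point lead is real nearly nine times in ten, which is exactly the distinction the "statistical dead heat" shorthand erases.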

Prof Pollkatz has a numerically based lesson on this. It's titled Rudimentary Statistics; you just need to scroll past the links list on his main page.
(Don't forget to check out 'Bush Approval' and 'Flush Bush' on your way in or out.)

Posted by: Bruce Webb | Sep 30, 2007 10:57:34 AM

An MOE of greater than 5 percent, at least this is what I've heard, makes the poll meaningless.

Posted by: akaison | Sep 30, 2007 11:12:26 AM

Please show your work. Note that the poll is of about 200 likely voters, of whom 24% picked/leaned Clinton and 28% picked/leaned Obama. You're trying to compare about 48 and 56 from a sample of 104 votes, subject to several selection filters and rounding error in reporting.

Posted by: rilkefan | Sep 30, 2007 1:14:26 PM

One question that has always bothered me is how do professional pollsters know, or make sure, that their actual sample is a reasonable proxy for a true random sample?

If they stratify the sample, making sure that, say, the proportion of whites, women, Democrats, suburbanites, professionals, unemployed, or any other type category, which is predictive of political sentiment, is within some bound, then they are, in effect, reducing the variance in the actual sample from the variance expected in a true random sample.

For example, say, the pollster checks the race of respondents, and in a particular sample, there are only 5% African-American respondents, instead of the expected 12%, and the pollster adjusts by deliberately oversampling for African-Americans to add African-Americans to the actual sample to bring the proportion up nearer to the expected 12%. His actual sample may no longer be a true random sample (if it ever was) but, since race is a good predictor of voting sentiment, he may actually be substantially reducing his sample error on responses to questions, like "which Party's candidate are you going to vote for?"

I don't see how, as a practical matter, you can ever get a true random sample, or know, exactly, how your actual sample relates to a true random sample. But, some of the ways, you might check and adjust are likely to improve the accuracy of your polling, in the sense of reducing error bounds. Is that not so?
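What Bruce Wilder describes is usually implemented as post-stratification weighting rather than literal re-dialing: each respondent is weighted by (population share of their group) / (sample share of their group). A minimal sketch, with all group shares and support numbers purely hypothetical:

```python
# Hypothetical demographic targets vs. what the sample actually produced.
population = {"black": 0.12, "other": 0.88}
sample     = {"black": 0.05, "other": 0.95}

# Post-stratification weight for each group.
weights = {g: population[g] / sample[g] for g in population}

# Hypothetical candidate support within each group.
support = {"black": 0.80, "other": 0.40}

raw      = sum(sample[g] * support[g] for g in sample)
weighted = sum(sample[g] * weights[g] * support[g] for g in sample)

print(round(raw, 3))       # unweighted estimate
print(round(weighted, 3))  # weighted estimate
```

The weighting removes the bias from underrepresenting a group, but it also inflates the variance (the "design effect"), so a weighted poll's effective margin of error is somewhat larger than the simple random-sample formula suggests, which bears directly on the error-bound question above.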

Posted by: Bruce Wilder | Sep 30, 2007 6:25:24 PM

Good post, Nicholas. Correct treatment of the issue, including both the MOE point and that the pollster's screen is crucial.


"(Obama is) more likely to win NH after Iowa than Edwards is"

If you delve into the internals of NH polling, you'll find that Edwards, Obama, and Clinton all have the kind of favorability numbers to leverage an Iowa success into winning New Hampshire.

If Edwards manages to win Iowa, I think we'll find him quite likely to win New Hampshire as well.

The second week of January is going to be fun.

Posted by: Petey | Sep 30, 2007 6:56:09 PM

I don't see how, as a practical matter, you can ever get a true random sample, or know, exactly, how your actual sample relates to a true random sample. But, some of the ways, you might check and adjust are likely to improve the accuracy of your polling, in the sense of reducing error bounds. Is that not so?

This is a pretty huge part of how professional pollsters spend their days. A pretty important part of the field of statistics, also, is figuring out how to get acceptably high levels of accuracy; you're never going to get a perfect microcosm of the population (or, you could, theoretically, but that would be hugely improbable), but you can (for example) be 95% certain that the true population value falls within a certain margin centered on your own findings. 95% is usually considered acceptable (depending on the size of the margin of error) for polls, sociological studies, etc.; for medical studies the acceptable percentage will be much higher.

There's also a lot of mathemagical (not a typo) stuff happening in terms of correcting for overrepresentation, underrepresentation, low response rates, and all that good stuff, but that is not covered in intro statistics. heh.

(I did really poorly, incidentally, in intro statistics--poorly enough that I have to take a make-up final in like two weeks--so if someone wants to a) correct anything I said or b) point me to a handy way to learn statistics fast, please feel free).

Posted by: Isabel | Sep 30, 2007 9:56:53 PM
