Sorting Out the Numbers of Political Polls


As the presidential election race intensifies, so do predictions from political polls. Polls employ a range of methodologies to forecast the favored candidate, including random sampling, weighting for political party, and aggregating data from other polls. Scott Keeter from the Pew Research Center offers tips on deciphering the stats, and discusses how big data and social media might be used to gauge the sentiments of voters in the future.

Segment Guests

Scott Keeter

Scott Keeter is Senior Survey Advisor at the Pew Research Center in Washington, D.C.

Segment Transcript

IRA FLATOW: This is Science Friday from PRI, Public Radio International. I’m Ira Flatow. The election season is mercifully drawing to a close. We can all breathe a sigh of relief. This means that the pundits and the pollsters are ramping up their presidential predictions.

538.com is forecasting an 86% to 13% split for Secretary Clinton. The New York Times gives Hillary a 92% chance of winning. Fox News is putting Donald Trump up by one point. What does any of this tell us? Can we pull out any useful information from these stats? And how can we make polling more predictive in the future?

My next guest is here to walk us through all of this. Scott Keeter is a Senior Survey Advisor at the Pew Research Center in Washington, DC. Welcome to Science Friday.

SCOTT KEETER: Thanks for having me.

IRA FLATOW: Now, Pew does a lot of polling of people. How does the data for these presidential polls differ from any of the other public opinion surveys?

SCOTT KEETER: It’s really very similar. You start with a sample of the general public, and then you hone down the sample until you find people who are registered to vote and hopefully likely to vote. And that’s where you make your election estimates. But you’re talking to the same people that you would talk to for any other kind of survey, if you were asking them about their health status or economic condition or opinion on issues.

IRA FLATOW: Mm-hmm. There are lots of poll aggregators out there, basically polls of polls. Do these tend to be more accurate than any individual poll?

SCOTT KEETER: Well, they’re only as good as the input. And so to the extent that polling in general is accurate, then the aggregators are going to be accurate. The real advantage of the aggregators is that they work on the assumption that any individual poll can have various kinds of random errors that cause it not to be perfectly accurate. And by combining polls, you sort of offset those random errors.

Just in those basic terms, if you’re stacking 10 1,000-person polls on top of each other, you end up with 10,000 people in your sample. And by the laws of probability, you should have a more precise estimate of whatever it is that you’re trying to measure. And of course, that is going to work as long as what you’re putting in there is not garbage.

The fact is that polling is still pretty accurate, especially in presidential elections in this country. And aggregations of polling tend to be very, very accurate. So I do trust them. I look at them myself pretty much every day, like everybody else right now.
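The arithmetic behind stacking ten 1,000-person polls can be sketched in a few lines. This is a minimal illustration, not anything described in the segment: it assumes each poll is a simple random sample, that the true split is near 50/50, and that the polls measure the same thing without systematic bias. The function name `margin_of_error` and the 95% multiplier of 1.96 are choices made for this sketch.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n.

    Assumes simple random sampling; real polls with weighting have
    larger effective margins than this formula suggests.
    """
    return z * math.sqrt(p * (1 - p) / n)

single_poll = margin_of_error(1_000)      # one 1,000-person poll
pooled = margin_of_error(10 * 1_000)      # ten such polls stacked together

print(f"one poll:  +/- {single_poll * 100:.1f} points")
print(f"ten polls: +/- {pooled * 100:.1f} points")
```

Under these assumptions the pooled margin shrinks by a factor of the square root of ten, roughly from three points to one. Pooling only reduces random error; if every poll shares the same bias, the combined estimate keeps it.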

IRA FLATOW: How do you explain the occasional outlier? For example, there was an LA Times poll that put Donald Trump way ahead that broke from what other polls were saying. It was an outlier. How does that happen?

SCOTT KEETER: We don’t know whether in the end the LA Times poll is going to turn out to be more accurate than all of the others or is actually going to be an outlier. But we know why it’s coming in the way it is. And it’s largely a result of the weighting decision that they make.

All polls weight the data to try to make them match up with what’s known to be true about the population. But they make one particular adjustment that no other pollster that I’m aware of does, which is to try to weight the data to correspond with the actual margin of the 2012 presidential election. And for a variety of reasons, that seems to make the poll more Republican in its sample compositions than all of the other polls that don’t include that adjustment.

I don’t know if that’s the right thing. I don’t think it’s the right thing. But that is a very respectable poll. And it has very good people working for it, and so we’ll just have to see whether that turns out to be the aberration or prescient.
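The general idea of weighting a sample to match known population figures can be sketched as follows. This is a generic illustration only: the respondents, the two groups, and the population shares are invented for the example, and it is not the LA Times adjustment or any pollster's actual procedure.

```python
from collections import Counter

# Hypothetical respondents: (group, supports_candidate_a). Invented data.
sample = (
    [("young", True)] * 10 + [("young", False)] * 10
    + [("older", True)] * 30 + [("older", False)] * 50
)

# Assumed true population shares for each group (also invented).
population_share = {"young": 0.40, "older": 0.60}

n = len(sample)
counts = Counter(group for group, _ in sample)

# Weight = population share / sample share, so underrepresented groups count more.
weights = {g: population_share[g] / (counts[g] / n) for g in counts}

unweighted = sum(1 for _, supports in sample if supports) / n
weighted = (
    sum(weights[g] for g, supports in sample if supports)
    / sum(weights[g] for g, _ in sample)
)

print(f"unweighted support: {unweighted:.1%}")
print(f"weighted support:   {weighted:.1%}")
```

Here the underrepresented group is counted more heavily, which moves the estimate. The choice of which variables to weight on is the kind of decision that can make one poll differ from the rest.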

IRA FLATOW: How do we know that people being polled are telling us the truth about what they’re doing or telling us how they’re really voting one way or the other? And I’m thinking also of something called the “hidden Trump effect” or the “shy Trump effect.”

SCOTT KEETER: It’s long been a worry in the polling profession that people are hiding their opinions. But the fact that the track record in American presidential elections, in particular, has been so accurate, and that where the errors have occurred, they have tended not to be consistently in a conservative or a liberal direction over the years, gives us a lot of assurance that people are not shading the truth, or at least not systematically doing so.

Of course, this was a big controversy in the 1980s and 1990s when something similar seemed to be happening with African-American candidates, where the polls would overestimate how well they were going to do. I was polling in Virginia in 1989 when Douglas Wilder ran for governor there, and the polls consistently overestimated how well he was going to do. He won, but only by a fraction of a point.

But then Barack Obama came along and pollsters were really worried about it, but the polling in 2008 and in 2012 was remarkably accurate. So I don’t put a lot of stock in the idea that people who support Donald Trump are afraid to admit it.

IRA FLATOW: Do you think that kind of voter is not fearful of saying, I’m a supporter?

SCOTT KEETER: Well, anecdotally I certainly see that. It’s possible that there are some people, given how much controversy there is about his candidacy and the fact that many people in his party, at least at the elite level, have rejected him, who you could imagine not owning up to it. But we just don’t have much evidence that that’s happened in other cases.

So at this point– and I guess we could also point to the primaries. The Republican primary polling was not bad. Primary polling is difficult, period, just because of the nature of the electorate. But there were no serious biases of underestimating Trump’s support. The underestimation of Trump’s support in the primaries came from pretty much the pundits.

IRA FLATOW: So the polling was accurate then?

SCOTT KEETER: Yes. Yes, there’s no sign at all– he led from the very beginning. It was very clear that he had a solid plurality within the Republican electorate. And a lot of us, including myself, in the beginning just didn’t think that in the end, as people got to know more about him, that he would stay atop. And he did.

So this is a situation in which I’m not going to put myself in the class with Nate Silver. But Nate Silver didn’t believe it. I didn’t believe it. And a lot of people didn’t believe it. But eventually, you know, the polling was proven right.

IRA FLATOW: Our number is (844) 724-8255, talking about polling. Is there a time when history shows that we have reached the spot where the polls are not going to change, a certain number of days or weeks out before the election?

SCOTT KEETER: I’m a political scientist by training. And all of us in the profession who study elections tend to think that the fundamentals that drive American politics tend to lock people into their choices quite early. It’s not to say that there aren’t some people who can be persuaded towards the end. And this election, more than others perhaps, has a lot of that because both of the major party candidates are not very well-liked.

We see unusually large percentages of people picking third-party candidates, for example. But generally speaking, when you get within two, three weeks of an election, you don’t see very much movement, particularly in the absence of major events. We’ve just had the third of the three debates. It’s hard to imagine what could happen at this point that would radically change the thinking of the public.

IRA FLATOW: Early voting has started in many of the states. I was watching some long lines already in North Carolina. And one of the analysts was saying that by election day, 75% may have voted already. Does that mean we would know the answer to perhaps some of these states by exit polls before the balloting begins that day?

SCOTT KEETER: The pre-election polls are picking up the fact that there’s a lot of people voting early. But it won’t be 75%. It will be less than 50%, somewhere between 25% and 40% perhaps. This is a trend. It’s growing.

The people who are trying to analyze this right now are doing it by looking at some big data sources, such as the voter files, which are national–

IRA FLATOW: Did we lose him?

SCOTT KEETER: I’m still here.

IRA FLATOW: OK, go ahead. I just–

SCOTT KEETER: The folks who are looking at early voting are looking at the national voter files, these publicly available records of whether people are registered to vote and whether they’ve voted. And they’re telling us that in some states the Democrats are doing better than they did in 2012. So we may get a clearer picture of how the election is going on the basis of that kind of information, but it’s still pretty sketchy.

IRA FLATOW: There’s a lot of data out there on people– Twitter and Facebook, social media. Can you use that data to get a better, clearer idea who the voters are and what they’re thinking, than just by plain polling?

SCOTT KEETER: There’s a lot of brain power being put into answering that question. But I think the answer is not clear yet. We certainly know that the big data, Twitter, Facebook, other sources, so-called organic data that doesn’t come just from questioning people directly, gives us a way to look at a lot of phenomena that polling doesn’t do well.

For example, we can study networks among people much better with something like Twitter or Facebook than we can with polling because we may have the entire corpus of the data to work with. And so that tells us a lot about how networks work. We also can measure things that people don’t report very accurately, which might even include things like their media consumption.

People remember seeing things or they don’t remember seeing things that they did. But big data in the form of what apps people are looking at or what things they’re viewing in their Facebook feeds may give us better records of that.

IRA FLATOW: Scott, thank you for taking time to be with us. I learned a lot today. And we’ll keep an eye on the polls and on what you guys are doing over at Pew. Scott Keeter is a Senior Survey Advisor at the Pew Research Center in Washington, DC.

Copyright © 2016 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of ScienceFriday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/

Meet the Producer

About Alexa Lim

@AlexaLim22

Alexa Lim was a senior producer for Science Friday. Her favorite stories involve space, sound, and strange animal discoveries.
