15 min - 1 hr
data analysis, sample bias, case study, polling
Conducting a survey or a poll seems straightforward enough at first pass. Say, for example, you want to know the percentage of people in a population who like peanut butter and jelly sandwiches. You could just ask each member of the population a very simple question, like “Do you like peanut butter and jelly sandwiches? Answer yes or no,” and then record their answers and calculate the percentage that is PB&J-loving. Or, if the population is huge (the entire state of California, for example, or all members of the international league of sandwich eaters), you could ask a manageable number (a sample) of individuals within the population the same question and assume that the percentage of people in your sample who like PB&J is roughly the same as the proportion in the entire population.
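The sample-instead-of-everyone idea can be sketched in a few lines of Python. Everything here is invented for illustration: a made-up population of 500,000 sandwich eaters in which 62% like PB&J, polled with a random sample of 1,000.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: True means "likes PB&J" (62% rate, invented).
population = [random.random() < 0.62 for _ in range(500_000)]

# Instead of asking all 500,000 members, poll a random sample of 1,000.
sample = random.sample(population, 1_000)
estimate = sum(sample) / len(sample)

true_rate = sum(population) / len(population)
print(f"true rate: {true_rate:.3f}, sample estimate: {estimate:.3f}")
```

The sample estimate lands close to the true population rate, which is the whole point: a well-drawn random sample stands in for the population. The rest of this article is about what happens when the sample is not well drawn.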
Science Friday’s Tongue-Curling Twitter Poll
Inspired by our Science Club challenge to #TakeASample, we asked Science Friday’s Twitter audience of roughly 640,000 (our “population”) whether or not they could curl their tongue, having heard that about two in five people can’t. Just a small fraction of our followers responded, giving us a “sample” of 1,619 respondents. Here are the results:
Let’s #TakeASample right now!
How many of you can curl your tongue?
— Science Friday (@scifri) April 21, 2016
Before getting carried away and rashly touting to the world that 81% of Science Friday’s Twitter followers (640,000 humans x 0.81 = 518,400 humans!) can curl their tongues, we thought it prudent to ask a few data scientists who specialize in polling and social media data whether there was anything we overlooked. Is there a hidden influence or a flaw in our sample design that could bias our results? As it turns out, like any kind of survey or poll, polling in the Twittersphere is just not that simple, and most experts agreed that our Twitter sample was probably marred by bias and therefore unrepresentative of our total Twitter follower population.
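For context, the textbook “margin of error” for a sample of 1,619 is quite small. The calculation below assumes a simple random sample, which a self-selected Twitter poll most definitely is not, so the real worry here is bias, not sampling error:

```python
import math

# Naive 95% margin of error, assuming a simple random sample
# (our self-selected Twitter poll does NOT meet that assumption).
p = 0.81   # observed share who say they can curl their tongue
n = 1619   # number of poll respondents
moe = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"margin of error: +/- {moe:.1%}")
```

A margin of error of about two percentage points sounds reassuring, but it only describes random sampling noise. None of the biases listed below show up in this number, which is exactly why it can be misleading.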
In no particular order, here’s a handful of reasons why our Twitter poll doesn’t quite pass muster:
People didn’t know what, exactly, we were talking about.
Curl your tongue? How? Side-to-side like a taco or front-to-back like a wave? We didn’t offer a picture or description of what we meant by “curl your tongue,” so it’s possible that some people didn’t respond because they didn’t know what we meant, or they responded incorrectly because they misunderstood what counted as tongue-curling and what didn’t. Some of the replies to our tweet suggest that our question could have been clearer:
@scifri photo example please?
— El Boxeador (@returntovendor) April 21, 2016
@scifri Does this count? pic.twitter.com/j3jkmDbfKf
— Brendan A. Niemira (@Niemira) April 21, 2016
@scifri please specify to which axis your query is in reference
— Hal (@HalSherman) April 21, 2016
People didn’t respond because they were embarrassed or didn’t care.
Twitter followers who can’t curl their tongue may be less likely to respond to a poll question about tongue curling than people who can, simply because they’re a bit embarrassed. Similarly, people who can’t curl their tongue might be less likely to respond because they just don’t care as much about tongue curling. Both scenarios would sway our results considerably in the pro-curler direction.
People chose the answer that made them look good.
People sometimes answer polls dishonestly, especially if the results are public. If for some reason the Twitter followers sampled in this poll thought they would appear cooler or more interesting if they could curl their tongue, they might have lied and claimed that they could.
People just clicked the first answer.
When folks are pressed for time or don’t feel like reading all the choices in a survey, they are more likely to select the first answer provided. With only two possible answers, this effect should be small, but we can’t rule it out, because both answers were presented in the same order to every poll respondent.
People couldn’t recall or didn’t feel like finding out the real answer.
It’s possible that respondents who answered this question in a public place couldn’t recall whether or not they’re able to curl their tongue or didn’t want to try it in public or seek a mirror to verify. Survey questions that draw upon memories of past events or that require respondents to perform a physical action (like getting up to test the batteries in a smoke alarm, for example) are less likely to be answered honestly, if at all.
Just by asking the question, we changed people’s answer.
Tongue curling is no longer believed to be a fully genetically determined trait, as it was thought to be at the start of the 20th century, largely because people can teach themselves how to do it with practice. It’s possible that just by asking this question in a Twitter poll, we inspired our followers to give it the old college try, and “non-curlers” converted into “curlers.” In political polling, this can happen when poll creators describe a ballot measure in detail before asking respondents whether they will support a measure they may know little about. In this way, pollsters may change respondent opinions just by educating them about ballot measures at the start of the questioning process.
People chose the answer that they thought most people would choose.
Though this is more commonly a source of bias with survey questions that are perceived as having a “right” or a “wrong” answer, it’s possible that our respondents chose the answer they thought most people would choose in order to fit in.
People answered multiple times to change the outcome of our poll.
It’s common enough for one person to have multiple Twitter accounts, so it’s possible that someone wanted to try to sway the results of our survey by responding multiple times. Larger sample sizes help to dilute this effect, but without extraordinary effort, we cannot rule out this source of bias.
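A quick simulation illustrates the dilution point. The numbers are invented: a 50-50 “true” split, one bad actor casting 20 fake “yes” votes, and two poll sizes.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def poll(n_honest, n_stuffed, true_rate=0.5):
    """Simulate a poll where one bad actor adds n_stuffed fake 'yes' votes."""
    honest_yes = sum(random.random() < true_rate for _ in range(n_honest))
    return (honest_yes + n_stuffed) / (n_honest + n_stuffed)

# The same 20 fake votes distort a small poll far more than a large one.
small = poll(100, 20)
big = poll(10_000, 20)
print(f"yes-share with 100 honest votes:    {small:.2f}")
print(f"yes-share with 10,000 honest votes: {big:.2f}")
```

With 100 honest votes, 20 fakes drag the result far from the true 50%; with 10,000 honest votes, the same 20 fakes barely move it. Dilution helps, but it doesn’t eliminate the problem.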
When, how, and whom we polled may be fundamentally biased.
People who respond to Twitter polls conducted by Science Friday after 4 p.m. on a Thursday might not represent a typical population of people or even a typical sampling of Twitter users. As a final experiment, we conducted a new Twitter poll at roughly the same time and day of the week as our tongue-curling poll, but this time we looked for an indication that @scifri followers are a biased sampling of the general Twitter audience. We asked our followers whether or not they follow pop icon Katy Perry (@katyperry, 88.2M followers) and/or astrophysicist and science communicator Neil deGrasse Tyson (@neiltyson, 5.14M followers). Here’s what we found:
Let’s #TakeASample of all of you!
Do you follow @katyperry?
— Science Friday (@scifri) May 5, 2016
Do you follow @neiltyson? #TakeASample
— Science Friday (@scifri) May 5, 2016
If poll respondents from @scifri’s Twitter following were an unbiased sample of Twitter users, we would expect them to follow Katy Perry and Neil deGrasse Tyson at a ratio of about 17 to 1, because Katy Perry has about 17 times as many followers as Neil deGrasse Tyson. In other words, if our polled population were an unbiased sample, we would expect more of our followers to follow Katy Perry, since more Twitter users on the whole follow her than follow Neil deGrasse Tyson. Instead, we found the opposite. What do these poll results suggest about our tongue-curling sample, or about Science Friday’s Twitter audience in general? Tweet your ideas to @scifri with the hashtag #TakeASample!
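For anyone who wants to check our arithmetic, the 17-to-1 expectation comes straight from the two follower counts quoted above:

```python
# Follower counts at the time of the poll (from the article above).
katy_perry = 88_200_000   # @katyperry
tyson = 5_140_000         # @neiltyson

ratio = katy_perry / tyson
print(f"expected ratio under no bias: about {ratio:.0f} to 1")
```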
Special thanks to Mark Dredze, assistant research professor of computer science at Johns Hopkins University, and Cliff Lampe, assistant professor in the School of Information at the University of Michigan, for assistance with this article.
Meet the Writer
About Ariel Zych (@arieloquent)
Ariel Zych is Science Friday’s director of audience. She is a former teacher and scientist who spends her free time making food, watching arthropods, and being outside.