Our Audience Feedback Survey Was Overrun By Bots. Here Are 5 Lessons We Learned.

Humanoid robots working with laptops and headsets. Credit: Shutterstock

SciFri Findings is a series that explores how we understand the impact of science journalism, media and programming on our audiences. Sign up for our newsletter to get the latest reports!

Getting feedback from those we serve has always been integral to Science Friday’s goal of making science more accessible. Audience research is no different: I am always curious to know what value our work provides our audiences. Where can we make things better? How can we have deeper engagement and impact? In June 2023 we launched an audience survey across multiple platforms (radio, social media, newsletters, donors, etc.). The survey was informed by in-depth interviews on radio programming conducted in Fall 2022. I waited excitedly as the survey went out into the world, hoping 200-300 people would care enough to complete it.

Imagine the joy and the surprise as the numbers started trickling and then monsooning in: 1, then 100, 500, 2500, all the way up to over 6800! Our Director of Audience, Ariel Zych, and I started playing with the data. We wanted to use ChatGPT to theme and summarize some of the qualitative data. We copied some open responses into ChatGPT and started noticing something odd here and there in the data we were pasting over.

There, in our own raw data, were a damning number of clearly AI-generated responses, shamelessly self-disclosing in answers to open preference questions with lines like “As an AI language model, I do not have personal preferences…” Scrolling further through the data, it became clear…AI bots had struck our study, HARD! We looked at each other and couldn’t help but laugh at the irony. Here we share some lessons learned (and some we had forgotten) after we wiped away our tears and started cleanup.

Tips for Your Next Online Survey

Use survey software that has a CAPTCHA: CAPTCHAs (“Completely Automated Public Turing test to tell Computers and Humans Apart”) are challenges we have all likely seen. They differentiate human respondents from bots. Many online survey software companies provide CAPTCHA options, but only for paid subscriptions.
No CAPTCHA? Trap ‘em: You may not have the budget for licensing survey tools with a CAPTCHA feature. Trap questions are an alternative to a CAPTCHA that can provide some coverage against bots. They are used to identify respondents who are not paying attention to survey questions (e.g. someone choosing “Strongly Agree” for all questions). A trap question can take many forms, including a question asking respondents to identify an object in an embedded picture or a prompt to type specific words into a text box. Once the data is collected, you can filter out any respondents with incorrect answers. Trap questions protect not only against bots but also against bad actors, such as trolls with an agenda or people who don’t actually know the product or program but want to receive the cash incentive. By including a small number of trap questions, you can make sure your target audiences are the ones providing you with good data, and eliminate the rest.
Trap question used as part of Science Friday radio programming survey. Credit: Nahima Ahmed/Science Friday

We incorporated a trap question during the design phase of our audience survey. Participants were asked to identify Science Friday’s host, Ira Flatow. Answer choices included only other male science journalists and communicators, so that every option looked plausible and fewer bots and bad actors would slip into the data. We used this type of trap question because we wanted to survey existing audiences who should know the host, not new audiences. This one step eliminated almost 20% (N=1357) of our sample!
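The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the Science Friday survey: the field name `host_answer`, the decoy labels, and the sample records are all assumptions made up for the example.

```python
# Minimal sketch of filtering on a trap question.
# Assumption: each response is a dict, and "host_answer" is a hypothetical
# field holding the option the respondent picked for the host question.
CORRECT_HOST = "Ira Flatow"


def split_by_trap(responses, trap_field="host_answer", correct=CORRECT_HOST):
    """Return (kept, removed): responses that passed or failed the trap question."""
    kept, removed = [], []
    for row in responses:
        # Treat missing answers as failures; ignore case and stray whitespace.
        answer = (row.get(trap_field) or "").strip().casefold()
        if answer == correct.casefold():
            kept.append(row)
        else:
            removed.append(row)
    return kept, removed


# Made-up sample records, for illustration only.
sample = [
    {"id": 1, "host_answer": "Ira Flatow"},
    {"id": 2, "host_answer": "Decoy Host A"},
    {"id": 3, "host_answer": " ira flatow "},
    {"id": 4, "host_answer": None},
]

kept, removed = split_by_trap(sample)
print(f"kept {len(kept)} responses, removed {len(removed)}")
```

Keeping the removed rows in their own list, rather than discarding them, makes it possible to report how much of the sample the trap question eliminated.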

More money, more bots, more problems: Cash is king in the survey world. Participants are often rewarded with cash or gift cards for each completed survey. Even the chance to win a lottery incentive has been shown to increase response rates for online research. We chose to offer a $50 e-gift card lottery incentive to balance the length of the survey and motivate more audience members to complete it. Money is great, but more money gives bot creators, bad actors, and trolls a stronger incentive to participate for cash alone. We quickly realized $50 was a lot to offer for a ~12-13 minute survey. It made me think: How can I value participants’ time while still making sure I get the information I need? Next time, we will consider lowering the value of our cash incentive. Perhaps it could have been limited to $25 instead? If this didn’t yield enough participants, maybe a second recruitment wave would be in order? In the future, particularly for audience surveys, we might consider offering other things of value, such as merchandise or free event tickets, instead. Non-cash offers might reduce the number of people interested only in being paid for survey completion. They can also provide value by giving participants tangible materials and/or deeper engagement with your organization.
Segment audiences: Whenever feasible, use different UTM or referral links for different recruitment pathways for your surveys. We used different links for each platform (e.g. Twitter, newsletters, donors) to understand where traffic was coming from, look for differences in preferences between audiences, and capture the possible universe size for our sample. More than half of our respondents came from Facebook, a disproportionately higher share than we usually see for surveys. Generally, our radio audiences are the largest source of referrals, so seeing so many come from Facebook was a red flag. Additionally, segmenting audiences can reveal strange patterns in the data. For example, if you have previously surveyed audiences, you may already have demographic data to check against new data. If you know your organization primarily serves older adults and see that your survey consists only of young participants, the data may be compromised. Consider whether the cause could be the topic of the survey, the recruitment, or a potential bot.
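One way to spot a referral red flag like the Facebook spike is to compare each source's share of responses against what you would normally expect. The sketch below is illustrative only: the `utm_source` field, the baseline shares, the 20-point tolerance, and the sample counts are all assumptions, not figures from our survey.

```python
from collections import Counter


def source_shares(responses, source_field="utm_source"):
    """Return each referral source's share of all responses (0.0 to 1.0)."""
    counts = Counter(row.get(source_field) or "unknown" for row in responses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}


def flag_unusual_sources(shares, expected, tolerance=0.20):
    """Return sources whose observed share exceeds the expected share by more than `tolerance`."""
    return sorted(
        source
        for source, share in shares.items()
        if share - expected.get(source, 0.0) > tolerance
    )


# Made-up data: 10 responses tagged with a hypothetical utm_source value.
sample = (
    [{"utm_source": "facebook"}] * 6
    + [{"utm_source": "radio"}] * 3
    + [{"utm_source": "newsletter"}] * 1
)
# Made-up baseline, standing in for shares seen in earlier surveys.
expected = {"facebook": 0.10, "radio": 0.60, "newsletter": 0.30}

shares = source_shares(sample)
flagged = flag_unusual_sources(shares, expected)
print("sources to investigate:", flagged)
```

A flagged source is a prompt to look closer, not proof of bots; a popular post or a different audience could produce the same spike.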

Cleaning Up The Data

After a few laughs and tears, I had the task of figuring out exactly how to clean up the jumbled mess of data we had. Starting from a filtered dataset (thanks, trap question!), I cleaned the data using the following checks:

Impossible timestamps: Responses submitted within the same second of each other were removed. Many of the most suspicious responses were submitted with nearly the same timestamp late at night (12-3 am) or early in the morning (4-7 am), which are unlikely times for our US-based audiences to complete surveys.
Obvious AI language: The survey included a number of open-ended questions. Any responses with obviously AI-generated language (“As an AI language model, I do not have personal preferences…”) were removed.
Non-human-sounding responses: Some of our open-ended questions asked why participants preferred certain broadcast formats. We eliminated any responses that didn’t sound authentic to an audience voice. For example, “Live call can increase the audience’s sense of participation and loyalty…” It is doubtful that an audience member would be discussing loyalty.
Human-sounding, but identical, open responses: There were some responses that repeated often. This includes phrases like “It can create memorable moments for both the host and the audience” and “Maintained the authenticity of the program”. It was highly unlikely that multiple individual respondents used the exact same phrasing.
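Three of the checks above (same-second timestamps, obvious AI language, and identical open responses) are mechanical enough to sketch in code; judging whether a response "sounds human" still needs a person. The example below is a rough illustration rather than the procedure we actually ran: the field names `submitted_at` and `open_response`, the ISO timestamp format, and the sample records are assumptions.

```python
from collections import Counter
from datetime import datetime

# Phrase quoted in the article; extend the tuple with other telltale strings.
AI_PHRASES = ("as an ai language model",)


def normalize(text):
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join((text or "").casefold().split())


def to_second(stamp):
    """Parse an ISO 8601 timestamp and drop any sub-second precision."""
    return datetime.fromisoformat(stamp).replace(microsecond=0)


def flag_responses(responses):
    """Return {response id: [reasons]} for responses that trip at least one check."""
    per_second = Counter(to_second(r["submitted_at"]) for r in responses)
    per_text = Counter(normalize(r["open_response"]) for r in responses)
    flags = {}
    for r in responses:
        text = normalize(r["open_response"])
        reasons = []
        if per_second[to_second(r["submitted_at"])] > 1:
            reasons.append("same-second timestamp")
        if any(phrase in text for phrase in AI_PHRASES):
            reasons.append("AI self-disclosure")
        if text and per_text[text] > 1:
            reasons.append("identical open response")
        if reasons:
            flags[r["id"]] = reasons
    return flags


# Made-up records; the first three reuse phrases quoted in the article.
sample = [
    {"id": 1, "submitted_at": "2023-06-15T02:14:07",
     "open_response": "As an AI language model, I do not have personal preferences."},
    {"id": 2, "submitted_at": "2023-06-15T02:14:07",
     "open_response": "It can create memorable moments for both the host and the audience"},
    {"id": 3, "submitted_at": "2023-06-15T14:30:00",
     "open_response": "It can create memorable moments for both the host and the audience"},
    {"id": 4, "submitted_at": "2023-06-16T09:05:12",
     "open_response": "I like hearing listener calls on my commute."},
]

flags = flag_responses(sample)
for response_id, reasons in flags.items():
    print(response_id, reasons)
```

The sketch flags responses instead of deleting them so a person can review borderline cases; for instance, very short answers such as "n/a" would legitimately repeat and may deserve a minimum-length exception.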

Designing audience-centered content is an inherently inclusive process. Audience surveys are an opportunity to listen to the needs and concerns of our audiences, and they are just one tool we use to gather audience feedback at Science Friday. When all the cleaning was said and done, we were still left with 1200+ survey participants in our sample! This was significantly higher than the 200-300 we initially anticipated. As online research continues to grow, so does the potential for AI bots. I am grateful to have discovered new ways to improve my practice, even if it cost me hours of work and some new gray hairs.

Your voices have shaped our show. From left to right, a young audience member asks a question at SciFri Live in San Francisco, Ira stands on stage in Salt Lake City, another young listener asks a question at SciFri Live in San Antonio. Credit: Alexander Lim/Benjamin Altenes/Cindy Kelleher/Science Friday

Meet the Writer

About Nahima Ahmed

@EmaculateGirl

Nahima Ahmed was Science Friday’s Manager of Impact Strategy. She is a researcher who loves to cook curry and discuss identity, and who helped the team understand how stories can shape audiences’ access to and interest in science.
