Joel Anderson’s Post

Joel Anderson

Chief Data Science Officer @ Dig Insights

I hear a lot about bots and bad respondents in market research. I'm a big proponent of taking respondent quality seriously because it can impact certain types of studies systematically (e.g., segmentations).

We made an automated bad-respondent detector that looks for internal consistency in survey responses. In the last study we used it on, it detected only 1 completely random respondent (e.g., a bot answering completely randomly). But it still detected 364 bad responses (20% of the data) using a more sophisticated method. We detect based on a combination of low internal consistency in responses and low variation. (All of this is in addition to traditional detection methods like speeding.) For example, someone answering 2s and 3s for everything with no discernible pattern would get caught. This is respondent behaviour that would slip past traditional algorithms for bad data quality.

I think either our field partners are catching all the random bots, OR the bots are "smart enough" to answer like lazy humans who use subsets of scales and anchoring. The 20% flagged as bad also likely includes many real humans who aren't paying much attention. This figure may be conservative, but you have to tune detectors to strike the right balance between false positives (throwing away real respondents) and false negatives (not throwing away a bot). But it's interesting that >99% of what we're catching is behaviour that would slip past less sophisticated methods. Either humans have evolved from straightlining to minimal (but random) variation, or bots have evolved from fully random variation to minimal (but random) variation.

(Our tool uses a bunch of past datasets, both real and synthetically created, extracts features that are markers of internal consistency, and uses a neural network on top of all that to make a final prediction of good/bad respondent. Internal consistency is a relative measure and is derived automatically study to study based on available data. We trained the model to differentiate between high-quality real data and randomly created data matching known and expected patterns of bad responses.)

Dig Insights, Upsiide, by Dig Insights
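For anyone curious what this kind of check can look like in practice, here is a minimal sketch of the general idea in Python. It is not the actual Dig tool: the per-respondent features (response variation, scale usage, and a correlation-with-the-item-profile proxy for internal consistency), the synthetic "bad" data, and the small MLP classifier are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the production detector):
# per-respondent features that flag low variation and low internal consistency,
# plus a small classifier trained to separate realistic data from synthetic
# low-variation "bad" data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def respondent_features(X):
    """X: (n_respondents, n_items) matrix of 1-5 scale answers."""
    item_means = X.mean(axis=0)                    # study-level item profile
    feats = []
    for row in X:
        sd = row.std()                             # low-variation marker
        n_points = len(np.unique(row))             # how much of the scale is used
        # Internal-consistency proxy: does this person's profile track the
        # item profile produced by the rest of the sample?
        if sd == 0:
            consistency = 0.0                      # straightliner: no signal
        else:
            consistency = np.corrcoef(row, item_means)[0, 1]
        feats.append([sd, n_points, consistency])
    return np.array(feats)

# Synthetic training data: "good" respondents follow the item structure with
# noise; "bad" respondents answer in a narrow, patternless band (2s and 3s).
n_items = 30
profile = rng.integers(1, 6, size=n_items).astype(float)
good = np.clip(np.round(profile + rng.normal(0, 0.8, size=(500, n_items))), 1, 5)
bad = rng.integers(2, 4, size=(500, n_items)).astype(float)

X = np.vstack([good, bad])
y = np.array([0] * 500 + [1] * 500)                # 1 = bad respondent

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(respondent_features(X), y)

# Score a new study's respondents the same way.
print(clf.predict(respondent_features(good[:10])))
```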

Karine Pepin

✨Nobody loves surveys as much as I do ✨ Data Fairy ✨No buzzwords allowed🏆 Quirk's Award & Insight250 Winner

7mo

Oh fun! Is this a sort of Mahalanobis distance method? Bots are not as common as people think; Kantar's estimate is that they account for 13% of the fraud. The majority of the fraud comes from offshore centers, aka click farms (humans). In my experience, they're usually not outliers; they blend in. They make the data look 'flat'. They just add noise. Those types of internal consistency checks are super helpful because they are subtle: they don't annoy the participants, and it's much harder to cheat the system. In my experience, though, they work well when the questionnaire is designed in a way that promotes a diversity of answers. If all the statements are worded positively and nearly synonymous, even an honest person may select 4 on everything. Are you finding that your model works better with certain datasets?
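For reference, a Mahalanobis-distance check of the sort Karine asks about could be operationalized roughly as below (a sketch on invented data; the placeholder matrix and the 99.9% cutoff are assumptions). It scores distance from the multivariate centroid, so whether blend-in, "flat" respondents get caught depends heavily on the dataset, which is the gap consistency-based checks try to cover.

```python
# Illustrative sketch: flag respondents far from the multivariate centroid.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
X = rng.integers(1, 6, size=(300, 20)).astype(float)    # placeholder 1-5 scale data

mu = X.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", X - mu, cov_inv, X - mu)  # squared Mahalanobis distances

cutoff = chi2.ppf(0.999, df=X.shape[1])                 # conservative tail threshold
flags = d2 > cutoff
print(f"{flags.sum()} of {len(X)} respondents flagged as multivariate outliers")
```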

Amy Knowles

Senior Vice President at Research Strategy Group Inc.

7mo

Professional respondents are the biggest issue, and very little is being done. In fact, most panels seem to encourage them!

Saul Dobney

Founder of Cxoice Insight Systems and dobney.com market research

7mo

We've been quality scoring respondents for a long time, as we check every response (complete and screen-out), and we know that even good respondents in top-quality survey designs give poor answers from time to time (e.g., phone-based B2B studies where we know what an answer 'should be' from purchase records).

However, modern panel-based surveys suffer from click-farms, not bots. And click-farms are very difficult to screen out, as they give 'reasonable' answers to avoid being spotted and they learn the screen-outs. They also complete a survey via multiple devices simultaneously, which allows them to slow down to avoid speeder traps. You'll see them hit the survey in blocks of simultaneous attempts (which is why you need to quality score both completes and screen-outs).

If you want quality surveys, the core place to look is panel recruitment and verification. Survey design and quality checks then come second, because if you know you are reaching out to genuine people, you don't want to lose their goodwill. Put simply, the first step is buy better sample...
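A rough sketch of the "blocks of simultaneous attempts" check Saul describes, on made-up data: the field names, the one-minute window, and the burst threshold are assumptions, not his actual rule. The key point is that it scores all attempts, completes and screen-outs alike.

```python
# Illustrative sketch: flag bursts of near-simultaneous survey starts from one source.
import pandas as pd

attempts = pd.DataFrame({
    "respondent_id": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "panel_source":  ["P1", "P1", "P1", "P1", "P2", "P1"],
    "started_at": pd.to_datetime([
        "2024-05-01 10:00:02", "2024-05-01 10:00:05", "2024-05-01 10:00:09",
        "2024-05-01 10:00:12", "2024-05-01 11:30:00", "2024-05-01 14:10:00",
    ]),
    "status": ["complete", "screenout", "complete", "complete", "complete", "screenout"],
})

window = attempts["started_at"].dt.floor("min").rename("window")     # 1-minute buckets
burst_size = attempts.groupby(["panel_source", window])["respondent_id"].transform("count")
attempts["suspect_burst"] = burst_size >= 4                          # arbitrary threshold

print(attempts[["respondent_id", "panel_source", "status", "suspect_burst"]])
```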

Nishant Kaushal

Scaling a new-age insights engine | Founder & CEO at ADNA Research | Top Voice for MR & Consumer Behaviour

7mo

Thanks for sharing the details of your approach. However, I don't think that labelling the 20% you encountered as "bad" is apportioning the blame appropriately. A few questions to reflect on:

- How good was the questionnaire in terms of engaging the respondents?
- How much were these end respondents paid for completing the survey, and what exactly was the questionnaire length? Calculate the $/hr earn-out rate (see the arithmetic below) and then ask again.
- Were they really "bad", or were they the more "normal" ones who saw the real worth of their time?

The choices they made when taking that survey are just symptoms of a deeper malaise that many in our industry want to keep locked up in the closet.
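The earn-out-rate point is easy to make concrete; the numbers below are purely hypothetical, not from any actual study.

```python
# Back-of-the-envelope earn-out rate with hypothetical numbers.
incentive_usd = 0.75                     # assumed per-complete payout to the respondent
survey_minutes = 25                      # assumed questionnaire length (LOI)
hourly_rate = incentive_usd / (survey_minutes / 60)
print(f"Effective earn-out rate: ${hourly_rate:.2f}/hr")   # -> $1.80/hr in this example
```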

Max Bloom

Head of Insights & Analytics @ KQED

7mo

While I think they oversell the value of RDD, Pew's study on the phenomenon in opt-in panels was pretty great. https://www.pewresearch.org/methods/2020/02/18/assessing-the-risks-to-online-polls-from-bogus-respondents/

LP Porter

Insights | Strategy | Innovation

7mo

Omg. So much detail. Thank you. But I couldn’t get through it.


