How to Get Value from Your Verbatims with Autocoding

Market researchers all know that different types of questions give different types of information.

Scale and choice-based questions are great at quantifying how people feel when they rate stimuli, and can be compared normatively, but open-ended questions help us understand why stimuli are rated this way. In an ideal world, to fully understand our consumer, we would ask both open and closed questions.

But there is a problem: Open-ended questions generate responses in the form of free text — unstructured data — and this type of data demands arduous, expensive processing, which often yields inconsistent or inaccurate results.

The Value of Open-Ended Questions

One of the main reasons to include open-ended questions is to reduce the bias that is inevitably introduced when configuring or designing a questionnaire — and this bias is most prominent in lists of response options for structured questions. Open-ended questions open up the research to the full diversity of views held by the wide range of consumers taking part.

Open-ended questions make research more accurate and enable researchers to pick up on critical findings they may not have expected.

For example, qualitative insights can warn researchers of unintended consequences; a respondent might say of an ad,

“I really liked the music, but thought the joke was offensive.”

Finding this out at the testing stage could help us avoid serious consequences, such as negative press coverage, yet this feedback would have been missed without an open-ended question.

Open questions encourage more honest, thoughtful answers from consumers, and they make for a better survey experience. Respondents feel that what they think is important, and know that the questionnaire is about more than just validating an existing opinion; the researchers — and the brand — really want to explore, discover, and learn.

For researchers who conduct multiple tests of varying instances of content, such as ad testing or concept testing, open-ended questions are versatile — you don’t have to change the rest of the question when, for example, testing ads with different key messages.

Instead, simply substitute the key message but keep the question metric the same. This is convenient and, more importantly, enables comparisons across different ads or concepts, which is invaluable for conducting meta-analysis.

Overall, research generated with open-ended questions can go deeper into the topic than would be possible with structured questions alone, creating opportunities to find true, innovation-sparking insights. 

The Cost of Coding Verbatims

The value in open-ended questioning is clear, but the costs of processing and deriving insights from the responses are high.

In order to draw conclusions quantitatively, these answers must all be coded. Historically, coding of open-ended verbatim survey responses has been done manually and is a long, slow and methodical process.

Coding involves a researcher or insights manager reading through all comments and developing a code frame — a list of appropriate codes (topics) for each. Part of the challenge is in deciding which answers to group together, and which merit a separate code. This is a judgement call and different researchers make different choices, making it hard to compare between projects.

The margin for error here is large — misreading the data, human biases, or simple mistakes — and it’s easy for the researcher to find themselves with misrepresented data, and potentially an incorrect insight.   

“Autocoding allows us to listen to customers’ unbiased thoughts. It lets us get to the truth of what we’re testing without the hours of manual coding that were previously required.”

Kevin Evans, Senior Manager, PepsiCo Global Insights

Building a Useful Auto-Coding Model

Over the last 18 months, Zappi has explored and developed autocoding functionality that uses supervised deep learning neural networks to classify verbatims into a set list of topics at the click of a button.

Neural networks are computing systems loosely modeled on how the human brain works. Although the theory of neural networks has been known since 1943, relatively recent improvements in computing capabilities have enabled their application within market research.

Zappi’s “supervised” neural networks use training data — a large dataset of verbatim responses that have been manually coded against topics. Because the volume of training data is high, individual errors in the human coding carry little weight when the model makes new classifications; this minimises the potential for manually misclassified verbatims to change how the model classifies unseen responses.

The neural network was “taught” with the same type of information that is important for humans to know if they are coding verbatim manually, such as the category or content of the stimuli. 
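To make the supervised approach concrete, here is a minimal sketch of training a text classifier on manually coded verbatims. It is an illustration only, not Zappi’s production model: it uses scikit-learn’s small feed-forward network rather than a deep architecture, and the file name, column names, and example responses are hypothetical.

```python
# Minimal sketch of supervised verbatim classification (illustrative only).
# Assumes a hypothetical CSV of manually coded responses with columns
# "verbatim" (the free-text answer) and "topic" (the human-assigned code).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

coded = pd.read_csv("coded_verbatims.csv")  # hypothetical training data
train, test = train_test_split(coded, test_size=0.2, random_state=0)

# Bag-of-words features feeding a small feed-forward neural network.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    MLPClassifier(hidden_layer_sizes=(128,), max_iter=300),
)
model.fit(train["verbatim"], train["topic"])

# Held-out accuracy gives a rough check on how well the code frame generalises.
print("accuracy:", model.score(test["verbatim"], test["topic"]))

# Classify unseen responses "at the click of a button".
print(model.predict(["I loved the music but thought the joke was offensive"]))
```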

Automatically classifying text is a difficult problem because of the intricacies of language and the differences in how people speak depending on their context.

For example, “what do you like?”, “what do you dislike?”, and “what would you improve?” elicit mentions of different types of topics.

There is a trade-off between the breadth of an autocoding model’s application and its ability to create accurate, meaningful codes. To simplify this problem, Zappi has focused on coding a specific set of the most useful questions within a survey context, producing models that give high-quality, relevant verbatim classifications across categories.

This is where expertise comes in.

As an industry, we are entering a second phase of market research automation, moving from automating simple rules and processes to automating expertise. Only by drawing on knowledge of how respondents answer certain market research questions can we choose the best candidate questions for autocoding, the ones that will produce accurate and valuable results for our stakeholders, and develop meaningful code frames for them.

The combination of expertly curated code frames and cutting-edge technology illustrates how market research automation will change the role of the researcher rather than eliminate the need for research professionals, as is sometimes feared.

The Value of Auto-Coding Versus Manual Coding

Coding verbatims is not a novel idea; it has historically been a core method of interpreting open-end data. Machine-based coding, however, leverages deep learning neural networks to provide coding that is quicker, less prone to human error, and consistent across tests.

For instance, it’s possible to consistently measure the proportion of respondents who liked an ad or concept because they found it easy to understand. This lends itself to a new form of “quantified qualitative” research. It is qualitative because it uses unprompted responses, and quantitative because it can be aggregated consistently and quickly across tests. This gives more power to open-ended responses as it enables normative and meta-analysis across studies.

For example, a food brand can identify common themes of suggested improvement across all of their menu ideas, such as the desire to use natural ingredients, reduce meat content, or use sustainable packaging. Tracking this over time, with the ability to focus on specific demographics, will inform the next idea, and allow you to target the audiences with which the idea resonates the most.
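As a rough sketch of what this “quantified qualitative” aggregation can look like in practice, the snippet below computes the share of respondents mentioning a given theme per test and wave, and then the same cut for one demographic group. The table, column names, and theme are hypothetical, not Zappi’s schema.

```python
# Illustrative aggregation of autocoded verbatims (hypothetical schema).
# Assumes one row per (respondent, topic) mention with columns:
# test_id, wave, respondent_id, age_group, topic.
import pandas as pd

coded = pd.read_csv("autocoded_responses.csv")  # hypothetical autocoding output

# Total respondents per test and wave.
base = coded.groupby(["test_id", "wave"])["respondent_id"].nunique()

# Respondents whose open-end was coded with the theme of interest.
mentions = (
    coded[coded["topic"] == "natural ingredients"]
    .groupby(["test_id", "wave"])["respondent_id"]
    .nunique()
)

# Comparable proportions, ready for norms, tracking, or meta-analysis.
share = (mentions / base).fillna(0).rename("share_mentioning_theme")
print(share)

# The same measure restricted to a specific demographic, e.g. 18-34s only.
young = coded[coded["age_group"] == "18-34"]
young_base = young.groupby(["test_id", "wave"])["respondent_id"].nunique()
young_mentions = (
    young[young["topic"] == "natural ingredients"]
    .groupby(["test_id", "wave"])["respondent_id"]
    .nunique()
)
print((young_mentions / young_base).fillna(0))
```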

Auto-coding prevents the human error that blights manual coding, and offers consistent, comparable results.

There is an old engineering adage that you can’t have faster, cheaper, and better all at once.

But auto-coding proves that you can. 
