Leadership Series

AI-Powered, Qualitative Research-Driven
Customer Experience Insights


Balancing AI Speed with Strategic CX Value



  • AI-generated insights risk missing the ‘real meaning’ behind customer stories and their needs, leading to misaligned opportunities.

  • Without human-in-the-loop oversight in AI analysis, organisations risk seeing only surface-level issues, missing the deeper drivers of innovation.

  • AI-generated insights oversimplify complexity; organisations risk acting on AI themes without understanding root causes or strategic context.

  • AI models lack the native capacity to triangulate VoC, operational and financial data with qualitative insights; as a result, CX insights can be perceived as unreliable.

  • Organisations risk receiving high-level AI outputs that lack rigour and commercial relevance.

Using AI-Synthesis for Qualitative CX Insights



AI-tech promises instant customer insight — CX intelligence for faster decision-making, improved customer experiences and unlocking innovation opportunities. We tested these claims and found results that contradict the hype.

Read on to see the failure modes that quietly erode insight quality, the real business risks and practical controls that let leaders capture the upside without losing insight rigour.

We put AI to the test, running an experiment using qualitative research outputs. The AI insights were generated very quickly, but their quality was questionable.

For the most part, the insights lacked depth and contextual nuance, missed innovation opportunities (for example, latent needs), and did not provide the strategic framing required to drive robust decisions.

As a customer strategist, I’ve spent the past two years experimenting with AI-generated CX insights, utilising both specialist tech platforms and general-purpose LLMs (large language models).

Interested in investigating further, I joined 30 fellow researchers for a two-day AI for Researchers workshop led by Llewyn Paine, Ph.D.

The workshop program facilitated an experiment in AI-assisted qualitative research synthesis using user interview transcripts.

Our goal: to test a representative LLM for inaccuracies and hallucinations in thematic synthesis and theme output.

Experiment Protocol:


• Custom-built tool with LLM access via an API (200,000 tokens)
• Model parameters were recorded and fixed (model name/version, temperature, top_p, max_tokens)
• 30 researchers undertook the experiment simultaneously, using the same prompt, transcripts and custom-built LLM tool
• Interview transcripts were de-identified
• Prompt design followed best practice
• Runs began with no prior conversation history and with long-term custom instructions removed
• Researchers compared each run’s themes and supporting quotes, scoring for:
  • Accuracy – Did the quotes truly belong to the theme?
  • Coverage – Were all relevant themes captured?
  • Originality – Did new insights appear, or just reshuffled wording?
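The fixed-parameter setup above can be sketched as a request payload. The model name, parameter values and message shape below are illustrative assumptions, not the workshop's actual configuration.

```python
# Sketch of recording and fixing LLM parameters so every researcher's run
# is comparable. Model name and values are hypothetical placeholders.
FIXED_PARAMS = {
    "model": "example-llm-v1",  # hypothetical model identifier
    "temperature": 0.0,         # as deterministic as the model allows
    "top_p": 1.0,
    "max_tokens": 4096,
}

def build_request(prompt: str, transcript: str) -> dict:
    """Combine the shared prompt and a de-identified transcript with the
    fixed parameters; no prior conversation history is included."""
    return {
        **FIXED_PARAMS,
        "messages": [{"role": "user", "content": f"{prompt}\n\n{transcript}"}],
    }

req = build_request("Identify the key themes.", "[de-identified transcript]")
```

Because the payload is built from one shared constant, any drift in parameters between researchers' runs is ruled out by construction.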


Findings: AI-Generated Synthesis


1. Inconsistency

Discrepancies were found in the quantity of AI-generated insights across the cohort of 30 researchers; some researchers received five or six themes, while others received more.

2. Rigour

AI outputs frequently lacked aggregated supporting evidence. The model commonly used a single comment to support an insight; this can overstate prevalence and hide nuance. Multiple supporting verbatim comments from different participants are needed to demonstrate consistent evidence.

3. Inaccuracy

The research cohort manually validated the AI-generated insights against the verbatim comments. They identified inaccuracies such as unsupported themes (plausible, but lacking documented evidence in the transcripts) and misattributed quotes: the LLM supplied a verbatim as supporting evidence, but the verbatim supported a different theme, was not contextually relevant, or came from the same respondent, misrepresenting prevalence.
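Part of this manual validation lends itself to a simple automated pre-check: confirming that each AI-supplied quote actually appears verbatim in the source transcripts. A sketch under that assumption (quotes failing the check still need human review, since exact-match misses paraphrased quotes):

```python
# Flag AI-supplied supporting quotes that do not appear verbatim in the
# source transcripts; these are candidate hallucinations for human review.
def unverifiable_quotes(quotes: list, transcripts: list) -> list:
    corpus = "\n".join(transcripts)
    return [q for q in quotes if q not in corpus]

transcripts = ["P1: I never know what I'll be charged each month."]
quotes = ["I never know what I'll be charged", "The app crashes daily"]
print(unverifiable_quotes(quotes, transcripts))  # → ['The app crashes daily']
```

Note this only catches fabricated wording; a real quote attached to the wrong theme, as observed in the experiment, still requires human judgement.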

4. Risk (position-bias)

Supporting verbatims in the AI-generated insights were weighted toward the first ~25% of the transcripts, consistent with position bias: the model favouring earlier content.
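A rough way to surface this pattern is to measure where each supporting quote falls within its transcript. The sketch below uses character offsets as a proxy for position; the sample text and threshold are illustrative assumptions.

```python
# Estimate where each supporting quote falls within a transcript
# (0.0 = start, 1.0 = end) to check for weighting toward early content.
def quote_positions(quotes: list, transcript: str) -> list:
    return [transcript.index(q) / len(transcript) for q in quotes if q in transcript]

transcript = ("Early remark about pricing. " * 5) + ("Later remark about support. " * 15)
positions = quote_positions(
    ["Early remark about pricing.", "Later remark about support."], transcript
)
# Share of supporting quotes drawn from the first quarter of the transcript
early_share = sum(p < 0.25 for p in positions) / len(positions)
```

If `early_share` is consistently high across runs, quote selection is skewed toward the opening of the material, as the experiment observed.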

A special thank you to Dr Llewyn Paine, Principal of Innovation Strategy at Llewyn Paine Consulting, for peer reviewing this paper and for ongoing advice and support.

See Rosenfeld Media for Llewyn’s AI for UX Researchers: 2-day virtual workshop.


Next Steps
Why This Matters For Leaders Now


At present, using AI-generated insights requires a proceed-with-caution approach.

Speed and rigour are the holy grail of insight work — but only if we treat models as junior analysts that require clear instructions, chunked workflows and close supervision.

Yes, AI can dramatically accelerate insight generation, but without human oversight, what do we risk losing?

To capture efficiency gains while reducing risk, AI must be combined with human oversight, rigorous prompt design, chunked processing and transparent governance, so leaders can improve customer experiences and unlock innovation and growth opportunities.
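Chunked processing, one of the controls above, can be sketched simply: split long transcripts into pieces that fit a context budget and synthesise each piece separately before merging. The characters-per-token heuristic below is a crude assumption, not a real tokenizer.

```python
# Split a long transcript into paragraph-aligned chunks that fit a token
# budget, using a rough chars-per-token estimate (an assumption; a real
# pipeline would use the model's own tokenizer).
def chunk_transcript(text: str, max_tokens: int = 2000, chars_per_token: int = 4) -> list:
    budget = max_tokens * chars_per_token
    chunks, current, size = [], [], 0
    for para in text.split("\n\n"):
        if size + len(para) > budget and current:
            chunks.append("\n\n".join(current))  # close the full chunk
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Ten 100-character paragraphs with a small budget yield five 2-paragraph chunks
chunks = chunk_transcript("\n\n".join(["x" * 100] * 10), max_tokens=60)
```

Splitting on paragraph boundaries keeps each participant's remarks intact within a chunk; it also mitigates the position bias noted earlier, since every chunk's content is "early" from the model's perspective.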

Discover How Our Qualitative Research-Driven
Customer Journey Intelligence 8-Week Sprint
Can Support your Strategic Goals.