LIVE EXPERIMENT - EDITION 1 of 4

We're building a synthetic user
in public.
This is the build log.

Almost no one is building a synthetic user publicly and telling you what they're finding. So that's what we're doing, across four editions, start to finish.

Tania Clarke
PMM · Great Question
June 2026 · ~12 min read

Four editions · One live experiment

We're building in public,
start to finish.

YOU ARE HERE
01

Terminology, workflows & hypothesis

The vocabulary, three distinct build approaches from our team, an expert conversation, and the priors I'm taking into the experiment.

02

The build: audit, framework & step-by-step

The data audit, the rigor framework I'm applying, the step-by-step build, and what I'm learning along the way.

03

The head-to-head

Same study, two panels: a synthetic one and a real one we recruit through Great Question. The results, side by side.

04

The recap & decision tree

When synthetic users earn a spot in your workflow. The full guide, and a Claude skill you can run yourself.

For the past few months, I've been reading every take on synthetic users I can find. NN/G's "If, When, and How" piece. The ACM Interactions article on the people-pleasing problem. Park et al.'s 85% accuracy paper. The same LinkedIn debate playing out on repeat. A pile of accuracy studies that sometimes contradict each other.

Most of it is people arguing whether synthetic users should exist.

What we're seeing from where we sit

Great Question has a front-row view of this debate. Every day, our customers run extensive research with their own users and with panels they recruit and schedule inside our product. And every day, we watch them build surprising AI workflows on top of our MCP, including their own synthetic users.

We hold a strong opinion that nothing replaces watching a person fumble through a prototype, or the magic that happens during a conversation.

But those same customers are asking us how to use AI to move faster. A lot of them are sitting on years of interview transcripts, survey data, support tickets, and product usage data. They want to know if any of that can act as a stand-in for an interview when the real one isn't possible.

Our position going in: synthetic personas, synthetic users, and synthetic panels will become a core layer of research and product workflows within the next year. The synthetic layer makes research cheaper to iterate on and faster to validate, and, at its best, surfaces the gaps where new research is still required.

The best ones build on real customer data. The ones you generate from a prompt alone, or pull from a synthetic-user tool that doesn't know your audience, are a statistical impression of a demographic an LLM read about online. They answer like it. For this series, I'm not exploring prompt-only synthetic users spun up in ChatGPT or another LLM with no customer data behind them; I don't believe these have a place in any product workflow.

So we decided to experiment with synthetic users ourselves and share what we find with you, live!

First, the vocabulary

Synthetic user. Persona. Panel.
They're not the same thing.

They get used interchangeably across LinkedIn, academic papers, and product docs, but describe completely different things, with completely different methods behind them. Getting crisp on which is which is the first step.

Term 01

Synthetic Persona

An archetype or representation of a group of users grounded in real research evidence. Describes a type of user, not a specific person.

Example: "The senior UX researcher at a B2B SaaS company with 500-5,000 employees."
Term 02

Synthetic User

A specific user with a name, attributes, voice, and behaviour. You either sample one from a persona, or clone a real user as a digital twin.

Example: "Sarah, 38, lead UXR at a 2,000-person SaaS, frustrated by tool sprawl."
Term 03

Synthetic Panel

A group of synthetic users running through the same study together. The synthetic equivalent of a recruited panel.

Example: "10 synthetic users completing the same survey."

Three distinct approaches

Three ways to build, and they're very different.

Between them, Mark and Jack on our product team named three distinct workflows. I'll need to pick one. Or maybe all three.

1

Digital twin.

Take a real user you know well, strip out the personal stuff, store them as a synthetic user doc, and instruct the agent to play that role. Narrow but powerful.

Jack's framing: "Take a real person and complete a study as them." Works well any time you need to replay a specific person's perspective on a product direction.
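A minimal sketch of the digital-twin step, assuming a user record is a plain dictionary. The field names (`name`, `email`, `pain_points`, and so on) and the prompt wording are hypothetical, not Great Question's format; the point is only the order of operations: strip the identifying fields, keep the behavioural substance, then wrap it in a role-play instruction.

```python
# Hypothetical schema: which fields count as "personal stuff" is an assumption.
PERSONAL_FIELDS = {"name", "email", "phone", "company"}


def to_synthetic_user_doc(record: dict) -> dict:
    """Return a copy of a real user's record with identifying fields removed."""
    return {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}


def role_play_prompt(doc: dict, study: str) -> str:
    """Build an instruction telling the agent to complete a study as this user."""
    profile = "\n".join(f"- {k}: {v}" for k, v in sorted(doc.items()))
    return (
        "You are playing the customer described below. Answer only as they "
        "would, and say so when the profile doesn't cover a question.\n"
        f"Profile:\n{profile}\n\nStudy:\n{study}"
    )


record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "role": "Lead UX researcher",
    "team_size": 6,
    "pain_points": "tool sprawl; slow recruiting",
}
doc = to_synthetic_user_doc(record)
print(role_play_prompt(doc, "How do you recruit for a usability test?"))
```

Keeping the stored doc separate from the prompt means the same redacted profile can be replayed against any study.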
2

Persona-generated.

Aggregate 8-10+ real users into a synthetic persona document, creating a defensible archetype (a power user vs a casual user, say, or any other segment). The aggregation itself protects privacy.

Jack's example: "Generate me a power user and it will make up Bob, who's 53, does this, thinks that." Best when the panel needs to be shareable across a team.
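The aggregation step could look roughly like this. It assumes each real user has already been coded into a set of tagged traits (the tags here are invented), refuses to build a persona from fewer than eight users, and keeps only traits shared by several users, so no single person's quirks leak into the archetype. The 8-user floor echoes the 8-10 figure above; the "shared by at least three users" rule is an assumption.

```python
from collections import Counter

MIN_USERS = 8   # from the "8-10+ real users" guideline
MIN_SHARED = 3  # assumption: a trait must appear for at least 3 users to be kept


def build_persona(segment: str, users: list[set[str]]) -> dict:
    """Aggregate coded traits from several real users into one persona doc."""
    if len(users) < MIN_USERS:
        raise ValueError(f"need at least {MIN_USERS} users, got {len(users)}")
    counts = Counter(trait for traits in users for trait in traits)
    shared = sorted(t for t, n in counts.items() if n >= MIN_SHARED)
    return {"segment": segment, "based_on": len(users), "traits": shared}


power_users = [
    {"tool sprawl", "bulk exports"},
    {"tool sprawl", "api access"},
    {"bulk exports", "api access"},
    {"tool sprawl", "bulk exports"},
    {"sso"},  # mentioned by one user only, so it stays out of the persona
    {"tool sprawl"},
    {"api access", "bulk exports"},
    {"api access"},
]
print(build_persona("power user", power_users))
```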
3

Live synthetic retrieval via skill.

The persona doesn't exist until you query it. It polls our MCP live, contextualises whatever it pulls against the input you feed it (a PRD, a design file, a prototype), and comes back with customer evidence.

The interesting one: no persistent persona. Works in flow with what you're already working on, instead of treating research as a separate exercise. One customer is already running this in production.
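A rough sketch of the "no persistent persona" idea. The real skill polls the Great Question MCP; here a plain function parameter (`fetch_evidence`, hypothetical) stands in for that call, since this isn't documenting the MCP's actual tools. The persona is assembled on demand: take the artifact you're working on, pull evidence, keep the snippets that overlap most with the artifact, and return them with their sources. Word-overlap scoring is a deliberately naive placeholder for whatever relevance ranking a production version would use.

```python
import re
from typing import Callable

Snippet = dict  # hypothetical shape: {"source": str, "text": str}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def live_persona(artifact: str,
                 fetch_evidence: Callable[[], list[Snippet]],
                 top_k: int = 3) -> list[Snippet]:
    """Assemble customer evidence for an artifact at query time; nothing is stored."""
    terms = words(artifact)
    scored = []
    for snippet in fetch_evidence():  # stand-in for a live MCP query
        overlap = len(terms & words(snippet["text"]))
        if overlap:
            scored.append((overlap, snippet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:top_k]]


def fake_fetch() -> list[Snippet]:
    return [
        {"source": "interview-12", "text": "Scheduling panels across time zones is painful"},
        {"source": "ticket-881", "text": "Export to CSV breaks on large studies"},
        {"source": "survey-q3", "text": "We want faster recruiting for panels"},
    ]


prd = "PRD: smarter scheduling for research panels across time zones"
for s in live_persona(prd, fake_fetch):
    print(s["source"])  # interview-12, then survey-q3
```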

Calling in the expert

Five insights from someone who's
read every study.

A week in, I called Caitlin Sullivan, a UX researcher who has spent the last few years digging into AI workflows. She writes one of the sharpest newsletters on AI in research and runs her own synthetic-user experiments on pricing and messaging. She's hesitant to be called an expert because no one is really defining this space yet, which is exactly why I called her.

Caitlin Sullivan

UX Researcher · AI workflow specialist
Author, AI Customer Research newsletter
Running her own synthetic-user experiments on pricing and messaging

1

Nobody has a settled methodology. Not even the academics.

"If you dig deeply into the methodology, nobody is doing anything the same way. It's all just kind of winging it and putting together their own completely unique methodology for how to go about replicating human behaviours."
Something that works in one study can't be compared with the next. Different measurement, different results, every time.
2

The "85% accuracy" headline is misleading.

The famous Stanford/Google paper (Park et al.) simulated 1,052 real people. Everyone quotes 85%. Here's what's underneath:

81% · Humans matching their own answers two weeks later
68% · Raw synthetic-user accuracy
85% · Normalised accuracy: 68 / 81, scored against that wobbly human baseline
"They were wrong one out of three times. That doesn't sound particularly confidence-inspiring." (Caitlin)
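Spelled out, the normalisation divides the agents' raw accuracy by how well people reproduce their own answers. With the rounded figures above the ratio comes to about 0.84, so the published 85% presumably rests on unrounded inputs. Either way, the raw figure is the one behind Caitlin's "one out of three":

```latex
\text{normalised accuracy}
  = \frac{\text{raw synthetic-user accuracy}}{\text{human self-consistency}}
  \approx \frac{0.68}{0.81} \approx 0.84,
\qquad
\text{error rate} \approx 1 - 0.68 = 0.32 \approx \tfrac{1}{3}
```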
3

Building from an existing pile of data is harder to measure.

Academic studies design forward: collect interviews, then a "holdout" set, then compare. Working backwards from a pile of existing transcripts breaks the comparison. Different questions from different interviews. Nothing is apples-to-apples.

Her advice: run small experiments, predict what real users will do, then track whether predictions came true. Longitudinal validation, not lab measurement.
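Caitlin's longitudinal approach needs little more than a log. A minimal sketch, assuming each prediction is written down before the real study runs and resolved once results come in; the class, field names, and example predictions are illustrative, not from a real study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    question: str
    predicted: str
    actual: Optional[str] = None  # filled in once real users have answered

    @property
    def resolved(self) -> bool:
        return self.actual is not None


def hit_rate(log: list[Prediction]) -> Optional[float]:
    """Share of resolved predictions the synthetic panel got right."""
    resolved = [p for p in log if p.resolved]
    if not resolved:
        return None
    return sum(p.predicted == p.actual for p in resolved) / len(resolved)


log = [
    Prediction("Prefer layout A or B?", "A", actual="A"),
    Prediction("Will admins enable SSO first?", "yes", actual="no"),
    Prediction("Preferred plan for 50 seats?", "team", actual="team"),
    Prediction("Open to annual billing?", "yes"),  # not resolved yet, so not scored
]
print(hit_rate(log))  # 2 of 3 resolved predictions correct
```

Unresolved predictions are excluded rather than counted as misses, so the rate only reflects questions real users have actually answered.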
4

The lowest-risk use case is the right place to start.

"Using it to test a research study, it's one of the lowest risk use cases. What's the worst that could happen? It's just highlighting weaknesses or things you didn't think about."
Same answer my own team gave me. Stress-test a study or artifact before humans see it.
5

Directional yes. Magnitude no. And logical beats emotional.

Synthetic users can roughly tell you whether users prefer A or B. They're unreliable on how much. Ask whether you can raise prices: useful directional read. Ask whether to raise by $5 or $10: it falls apart.

Better at logical decisions (budget fit, seat licensing, workflow constraints) than emotional ones. The more emotional the call, the less accurate the prediction.
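One way to keep those two questions apart when comparing a synthetic panel with a real one is to score direction and magnitude separately. A sketch under the assumption that both panels answer on the same numeric scale (say, a 1-5 preference for A over B, centred at 3); the numbers in the example are made up to show the shape of the check, not results.

```python
def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def compare(synthetic: list[float], real: list[float], midpoint: float = 3.0) -> dict:
    """Compare panel means: does the direction match, and how far off is the size?"""
    syn_lean = sum(synthetic) / len(synthetic) - midpoint
    real_lean = sum(real) / len(real) - midpoint
    return {
        "same_direction": sign(syn_lean) == sign(real_lean),
        "magnitude_gap": abs(syn_lean - real_lean),
    }


# Illustrative only: both panels lean towards A, but the synthetic one overstates it.
print(compare([5, 5, 4, 5], [4, 3, 4, 3]))
```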

Two of my priors reshaped. Two others confirmed.

An early insight into where synthetic users fit

While Mark and Jack come at building a synthetic user from different angles, they both land on the same recommendation: early directional feedback before you commit human time or budget.

Take any artifact (a survey, a PRD, a design, a concept) and run it past a synthetic panel first. What comes out:

  • Directional answers in minutes, and a clear view of where the synthetic panel and real users would actually diverge.
  • The gaps where the panel can't answer with confidence. These become your next research brief.
If you set up the synthetic user skill to require evidence behind every claim, and the persona can't make a claim because the evidence isn't there, that's actually a really good thing. AI is flagging a research opportunity for you.
Jack · Product Manager, Great Question MCP
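Jack's rule translates into a simple sort. A sketch, assuming the skill returns each claim with the IDs of the interviews behind it (the structure and example claims are hypothetical): claims with no evidence go into the next research brief instead of being dropped, and, borrowing the 8-interview floor from my priors, thinly backed claims are kept but labelled low-confidence. Counting distinct interviews rather than quotes stops one talkative participant from inflating a claim.

```python
MIN_INTERVIEWS = 8  # floor taken from the "at least 8-10 interviews" prior


def triage_claims(claims: list[dict]) -> dict[str, list[str]]:
    """Sort claims into supported, low-confidence, and research-brief buckets."""
    buckets = {"supported": [], "low_confidence": [], "research_brief": []}
    for claim in claims:
        distinct = len(set(claim.get("evidence", [])))
        if distinct == 0:
            buckets["research_brief"].append(claim["text"])  # an opportunity, not a failure
        elif distinct < MIN_INTERVIEWS:
            buckets["low_confidence"].append(claim["text"])
        else:
            buckets["supported"].append(claim["text"])
    return buckets


claims = [
    {"text": "Admins want SSO before rollout",
     "evidence": [f"interview-{i:02d}" for i in range(1, 10)]},
    {"text": "Exports break on large studies",
     "evidence": ["interview-03", "interview-03", "interview-07"]},
    {"text": "Users would pay more for AI summaries", "evidence": []},
]
for bucket, texts in triage_claims(claims).items():
    print(bucket, texts)
```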

Mark independently describes almost the same workflow, from the study-design angle: every study should start with a synthetic round before you go to humans. At worst, it stress-tests the questions and surfaces issues with the study design. At best, the AI panel comes back unanimous on a low-stakes question and you answer it without spending a dollar on recruitment. Most of the time it sits somewhere in the middle: cheaper iteration, and a much sharper study that you put in front of real people.

His framing on the economics: you've spent a little on tokens, but nothing on recruitment fees, incentives, or any of that overhead. It's a good way to raise the bar of quality across every study you run.
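Mark's best case can be written down as a rule. A minimal sketch, assuming a synthetic panel's answers to one closed question arrive as a list of strings and that someone has labelled the question low-stakes up front. Only a unanimous answer on a low-stakes question is treated as settled; everything else goes to real participants. Both the stakes label and the strict unanimity bar are illustrative choices, not a Great Question feature.

```python
from collections import Counter


def settle_or_recruit(answers: list[str], low_stakes: bool) -> str:
    """Decide whether a synthetic round settles a question or it goes to real users."""
    if not answers:
        return "recruit real users"
    top_answer, votes = Counter(answers).most_common(1)[0]
    if low_stakes and votes == len(answers):
        return f"settled: {top_answer}"
    return "recruit real users"


print(settle_or_recruit(["weekly"] * 10, low_stakes=True))               # settled: weekly
print(settle_or_recruit(["weekly"] * 9 + ["monthly"], low_stakes=True))  # recruit real users
print(settle_or_recruit(["weekly"] * 10, low_stakes=False))              # recruit real users
```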

My priors going in

Four beliefs I'm taking into the experiment.

These aren't technically hypotheses. They're documented thoughts. The point of the next three editions is to find out which of these survive contact with actual data.

1. Can synthetic users be trusted?
Prior
Trustworthy enough for early, directional signal. Not for high-stakes product decisions.
Risk
False confidence and wasted time.
Updated after Caitlin: her framing of the magnitude problem makes me more confident that synthetic users are useful for directional questions and unreliable for any "by how much" question.
2. What's the best mix of underlying data?
Prior
Interview transcripts + survey data + behavioural data. Transcripts do the heavy lifting on language and pain points.
Volume
At least 8-10 interviews to back any meaningful claim. Below that, flag claims as low-confidence.
Updated after Caitlin: the richer the data, the better, but format matters too. A satisfaction score and the reason behind that score are not the same prediction problem.
3. Where and when should you use them?
Prior
Synthetic personas earn their keep early in the product-building process, as a testing mechanism, not a replacement for real conversations.
Risk
Low. Both Mark and Jack are already seeing this in practice.
4. Static or dynamic? Does consistency matter?
Prior
Genuinely unsure. It can't be ideal that everyone gets a slightly different answer if they're relying on live retrieval.
Risk
Live retrieval drifts across teammates. Static docs go stale and stop matching reality. Neither is clearly better yet.
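One way to make that drift visible instead of arguing about it: fingerprint the evidence set each retrieval used, so two teammates can tell whether they got different answers because the model varied or because the evidence did. A sketch assuming evidence items have stable source IDs (the IDs are invented); a static persona doc could carry the same fingerprint alongside its build date.

```python
import hashlib


def evidence_fingerprint(source_ids: list[str]) -> str:
    """Order-independent short hash of the evidence a retrieval was built on."""
    joined = "\n".join(sorted(set(source_ids)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


alice = evidence_fingerprint(["interview-12", "survey-q3", "ticket-881"])
bob = evidence_fingerprint(["ticket-881", "interview-12", "survey-q3"])
carol = evidence_fingerprint(["interview-12", "survey-q3"])
print(alice == bob, alice == carol)  # True False
```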

If you're already building something like this, I want to hear what you've tried and where you've hit walls.

See you in edition 2!

- Tania, PMM @ Great Question

The build continues.
Follow along.

Edition 2 lands soon: the data audit I'm running, the rigor framework, the step-by-step build, and what I'm learning as I go.

01
Now · Terminology & hypothesis
Vocabulary, three workflows, Caitlin's insights, priors going in.
02
Soon · The build
Data audit, rigor framework, step-by-step, what I learn.
03
Coming · Head-to-head
Same study. Synthetic panel vs real recruited panel.
04
Coming · Recap & guide
Decision tree, full guide, and a Claude skill you can run yourself.