Almost no one is building a synthetic user publicly and telling you what they're finding. So that's what we're doing, across four editions, start to finish.
Four editions · One live experiment
The vocabulary, three distinct build approaches from our team, an expert conversation, and the priors I'm taking into the experiment.
The data audit, the rigor framework I'm applying, the step-by-step build, and what I'm learning along the way.
Same study, two panels: a synthetic one and a real one we recruit through Great Question. The results, side by side.
When synthetic users earn a spot in your workflow. The full guide, and a Claude skill you can run yourself.
For the past few months, I've been reading every take on synthetic users I can find. NN/G's "If, When, and How" piece. The ACM Interactions article on the people-pleasing problem. Park et al.'s 85% accuracy paper. The same LinkedIn debate playing out on repeat. A pile of accuracy studies that sometimes contradict each other.
Most of it is people arguing whether synthetic users should exist.
Great Question has a front-row view of this debate. Every day, our customers run extensive research with their own users and panels they recruit and schedule inside our product. And every day, we watch them build AI workflows on top of our MCP that surprise us, including their own synthetic users.
We hold a strong opinion that nothing replaces watching a person fumble through a prototype, or the magic that happens during a conversation.
But those same customers are asking us how to use AI to move faster. A lot of them are sitting on years of interview transcripts, survey data, support tickets, and product usage data. They want to know if any of that can act as a stand-in for an interview when the real one isn't possible.
Our position going in: synthetic personas, synthetic users, and synthetic panels are becoming a core layer of research and product workflows over the next year or less. The synthetic layer makes research cheaper to iterate on and faster to validate, and at its best, surfaces the gaps where new research is still required.
The best ones build on real customer data. The ones you generate from a prompt alone, or pull from a synthetic-user tool that doesn't know your audience, are a statistical impression of a demographic an LLM read about online. They answer like it. In this series, I am not exploring synthetic users generated from a prompt alone in ChatGPT or another general-purpose LLM, as I believe these have no place in any product workflow.
So we decided to experiment with synthetic users and share what we find with you as it unfolds!
First, the vocabulary
Synthetic persona, synthetic user, and synthetic panel get used interchangeably across LinkedIn, academic papers, and product docs, but they describe completely different things, with completely different methods behind them. Getting crisp on which is which is the first step.
Synthetic persona: An archetype or representation of a group of users grounded in real research evidence. Describes a type of user, not a specific person.
Synthetic user: A specific user with a name, attributes, voice, and behaviour. You either sample one from a persona, or clone a real user as a digital twin.
Synthetic panel: A group of synthetic users running through the same study together. The synthetic equivalent of a recruited panel.
Three distinct approaches
Mark and Jack on our product team named three distinct workflows between them. I'll need to pick one. Or maybe all three.
Take a real user you know well, strip out the personal stuff, store them as a synthetic user doc, and instruct the agent to play that role. Narrow but powerful.
Aggregate 8-10+ real users into a synthetic persona document, creating a defensible archetype of a power user vs a casual user, or another segmentation. The aggregation itself protects privacy.
The persona doesn't exist until you query it. It polls our MCP live, contextualises whatever it pulls based on the input you feed it (a PRD, a design file, a prototype) and comes back with customer evidence.
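To make the second approach more concrete, here is a minimal Python sketch of aggregating real users into a persona document. It is not Mark's or Jack's implementation. The `UserRecord` fields, the `build_persona` function, and the hard minimum of 8 users (taken from the low end of "8-10+") are all assumptions for illustration, and the records are assumed to be stripped of personal details already.

```python
from collections import Counter
from dataclasses import dataclass

MIN_USERS = 8  # assumption: low end of the "8-10+" range mentioned above


@dataclass
class UserRecord:
    """One real user's research data, already stripped of personal details."""
    segment: str              # hypothetical label, e.g. "power" or "casual"
    goals: list[str]
    pain_points: list[str]


def build_persona(segment: str, users: list[UserRecord], top_n: int = 3) -> dict:
    """Aggregate one segment's users into a persona document.

    Only counts across users are kept, so no single person is reproduced.
    """
    members = [u for u in users if u.segment == segment]
    if len(members) < MIN_USERS:
        raise ValueError(
            f"Only {len(members)} users in segment '{segment}'; need {MIN_USERS}+"
        )
    # Count each goal or pain point once per user, so one talkative user
    # cannot dominate the archetype.
    goals = Counter(g for u in members for g in set(u.goals))
    pains = Counter(p for u in members for p in set(u.pain_points))
    return {
        "segment": segment,
        "sample_size": len(members),
        "top_goals": goals.most_common(top_n),
        "top_pain_points": pains.most_common(top_n),
    }
```

Keeping `sample_size` in the document lets anyone reading the persona later see how many real users stand behind it.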
Calling in the expert
A week in, I called Caitlin Sullivan, a UX researcher who has been digging into AI workflows for the last few years, writes one of the sharpest newsletters on AI in research, and runs her own synthetic-user experiments on pricing and messaging. She's hesitant to be called an expert because no one is really defining this space yet. Which is exactly why I called her.

UX Researcher · AI workflow specialist · Author, AI Customer Research newsletter · Running her own synthetic-user experiments on pricing and messaging.
"If you dig deeply into the methodology, nobody is doing anything the same way. It's all just kind of winging it and putting together their own completely unique methodology for how to go about replicating human behaviours."
The famous Stanford/Google paper replicated 1,052 humans. Everyone quotes 85%. Here's what's underneath:
Academic studies design forward: collect interviews, then a "holdout" set, then compare. Working backwards from a pile of existing transcripts breaks the comparison. Different questions from different interviews. Nothing is apples-to-apples.
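The apples-to-apples problem can be sketched in a few lines of Python. This is an illustration only, not the method from any of the papers mentioned. The function name `holdout_agreement` and the question-ID-to-answer dictionaries are assumptions. The point it shows: agreement can only be scored on questions that both the real person and the synthetic user actually answered.

```python
def holdout_agreement(
    real: dict[str, str], synthetic: dict[str, str]
) -> tuple[float, int]:
    """Return (share of matching answers, number of comparable questions).

    Both dicts map a question ID to the answer given. Only questions that
    appear in both are compared; everything else is not apples-to-apples.
    """
    shared = sorted(real.keys() & synthetic.keys())
    if not shared:
        raise ValueError("No shared questions: nothing to compare")
    matches = sum(real[q] == synthetic[q] for q in shared)
    return matches / len(shared), len(shared)


# Hypothetical answers: only q1 and q2 were asked of both sides.
real_answers = {"q1": "yes", "q2": "no", "q3": "yes"}
synthetic_answers = {"q1": "yes", "q2": "yes", "q4": "no"}
print(holdout_agreement(real_answers, synthetic_answers))  # (0.5, 2)
```

Reporting the number of comparable questions next to the score matters: a high agreement rate over two shared questions says very little.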
"Using it to test a research study, it's one of the lowest risk use cases. What's the worst that could happen? It's just highlighting weaknesses or things you didn't think about."
Synthetic users can give you a rough read on whether users prefer A or B. They're unreliable on by how much. Ask whether you can raise prices: useful directional read. Ask whether to raise by $5 or $10: falls apart.
They're better at logical decisions (budget fit, seat licensing, workflow constraints) than emotional ones. The more emotional the call, the less accurate the prediction.
While Mark and Jack come at building a synthetic user from different angles, they both land on the same recommendation: early directional feedback before you commit human time or budget.
Take any artifact (a survey, a PRD, a design, a concept) and run it past a synthetic panel first. What comes out:
If you set up the synthetic user skill to require evidence behind every claim, and the persona can't make a claim because the evidence isn't there, that's actually a really good thing. AI is flagging a research opportunity for you.
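The evidence requirement described above could be enforced with a check like the following. It is a minimal sketch, not the actual synthetic user skill: the `Claim` structure, the `split_by_evidence` function, and the idea of referencing evidence by transcript or ticket ID are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    # Hypothetical references, e.g. transcript or support-ticket IDs.
    evidence: list[str] = field(default_factory=list)


def split_by_evidence(claims: list[Claim]) -> tuple[list[Claim], list[str]]:
    """Separate supported claims from gaps worth researching with real users."""
    supported = [c for c in claims if c.evidence]
    research_gaps = [c.text for c in claims if not c.evidence]
    return supported, research_gaps
```

Anything that lands in `research_gaps` is a question the existing data cannot answer, which is the research opportunity being flagged.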
Mark independently describes almost the same workflow, from the study-design angle: every study should start with a synthetic round before you go to humans. At worst, it stress-tests the questions and surfaces issues with the study design. At best, the AI panel comes back unanimous on a low-stakes question and you answer it without spending a dollar on recruitment. Most of the time it sits somewhere in the middle: cheaper iteration, and a much sharper study that you put in front of real people.
His framing on the economics: you've spent a little on tokens, but nothing on recruitment fees, incentives, or any of that overhead. It's a good way to raise the bar of quality across every study you run.
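Mark's "synthetic round first" workflow can be read as a simple triage step. The sketch below is one possible interpretation under stated assumptions: the `triage` function, the strict unanimity rule, and the `low_stakes` flag are illustrative choices, since the description above is qualitative.

```python
from collections import Counter


def triage(votes: list[str], low_stakes: bool) -> str:
    """Decide what to do after a synthetic round.

    votes: one answer per synthetic user in the panel, e.g. ["A", "A", "B"].
    """
    if not votes:
        raise ValueError("Panel returned no votes")
    answer, count = Counter(votes).most_common(1)[0]
    # Assumption: only a unanimous panel on a low-stakes question
    # is treated as an answer without human participants.
    if count == len(votes) and low_stakes:
        return f"answered: {answer}"
    return "refine the study, then run it with real people"
```

A split panel, or any high-stakes question, still goes to humans. The synthetic round has only sharpened the study first.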
My priors going in
These aren't technically hypotheses. They're documented thoughts. The point of the next three editions is to find out which of these survive contact with actual data.

If you're already building something like this, I want to hear what you've tried and where you've hit walls.
See you in edition 2!
- Tania, PMM @ Great Question
Edition 2 lands soon: the data audit I'm running, the rigor framework, the step-by-step build, and what I'm learning as I go.