Synthetic Data Is Rising, But What Happens to Real Fieldwork?

It promises speed, control, and scalability. But it also raises a more uncomfortable question about what we are actually measuring.

Synthetic data is no longer a side topic in market research. It is becoming a real option, sometimes even the default starting point for certain types of studies.

Over the last year, more tools have started to generate respondents instead of recruiting them. Profiles are simulated, behaviours are modelled, and datasets are produced without any direct interaction with real people.
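To make that concrete, here is a deliberately minimal sketch, in Python, of what "generating a respondent" can mean in practice. Every distribution, rule, and field name below is an assumption invented for illustration; real tools are far more sophisticated, but the principle is the same: the answer comes from a model, not from a person.

```python
import random

# Entirely hypothetical sketch of a "generated respondent": a profile
# is sampled from assumed distributions, and the answer is derived from
# that profile by a hand-written rule. No real person is involved; the
# "behaviour" is whatever the rule encodes.

AGE_BANDS = ["18-24", "25-34", "35-54", "55+"]
AGE_WEIGHTS = [0.15, 0.25, 0.35, 0.25]  # assumed population shares

def synthetic_respondent():
    age = random.choices(AGE_BANDS, weights=AGE_WEIGHTS)[0]
    urban = random.random() < 0.7  # assumed urban share
    # "Behaviour" as a rule: younger, urban profiles rate higher.
    base = {"18-24": 4, "25-34": 4, "35-54": 3, "55+": 2}[age]
    rating = min(5, base + (1 if urban else 0))  # 1-5 intent scale
    return {"age_band": age, "urban": urban, "intent_to_buy": rating}

panel = [synthetic_respondent() for _ in range(1000)]
```

Notice what is absent by construction: the hesitations, misreadings, and contradictions a real respondent might produce.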

On paper, the logic is hard to ignore. Fieldwork is often slow, expensive, and unpredictable. Synthetic data removes most of these constraints in one move.

But removing constraints also changes the nature of the process.

Fieldwork has always been where research meets reality in its most imperfect form. Recruitment does not always match expectations, respondents interpret questions in their own way, and answers are often messier than the framework initially assumes.

That friction is not incidental. It is part of the signal.

When people struggle to answer, hesitate, or contradict themselves, they are not breaking the methodology. They are revealing something that the structure alone could not anticipate.

Synthetic data, by design, removes that layer. It produces answers that fit the patterns it was built on, remain internally consistent, and can be processed without resistance.

The result is easier to handle, but also less exposed to the unexpected.

Controlled data is not necessarily grounded data

It is easy to associate control with quality, especially when dealing with large datasets. If everything is coherent and statistically stable, the output appears reliable.

But reliability in appearance is not the same as grounding in reality.

Real respondents introduce noise, but that noise often reflects context, misunderstanding, or lived experience. These elements are difficult to standardise, yet they are precisely what gives research its depth.

When they disappear, the data becomes more abstract. It reflects a model of behaviour rather than behaviour itself.

Keeping a link between models and reality

Synthetic data can be useful when it is used for what it is: a model, not a substitute. It can help explore scenarios, test assumptions, or extend existing datasets.
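One lightweight way to keep that distinction workable is to routinely compare a synthetic extension against the real sample it claims to extend. The sketch below assumes simple record dicts and uses invented placeholder samples; it illustrates the habit of checking, not any particular tool.

```python
from collections import Counter

def share(records, key):
    """Share of each value of `key` within a list of record dicts."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

# Placeholder samples; in practice these would be the fieldwork data
# and the synthetic records meant to extend it.
real_sample = [{"age_band": "18-24"}, {"age_band": "25-34"},
               {"age_band": "35-54"}, {"age_band": "35-54"}]
synthetic_sample = [{"age_band": "25-34"}, {"age_band": "35-54"},
                    {"age_band": "55+"}, {"age_band": "35-54"}]

synth_shares = share(synthetic_sample, "age_band")
for band, real_p in share(real_sample, "age_band").items():
    synth_p = synth_shares.get(band, 0.0)
    print(f"{band}: real {real_p:.2f}, synthetic {synth_p:.2f}, "
          f"gap {abs(real_p - synth_p):.2f}")
```

A large gap on a key variable is an early sign that the model is drifting away from what was actually observed.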

The problem begins when that distinction fades.

Right now, players across the industry are integrating these tools in very different ways, often without making explicit which parts of the process are still grounded in real observation and which are generated.

Over time, this lack of clarity makes it harder to evaluate the quality of what is being produced.

What becomes necessary, then, is not to slow down adoption, but to make the underlying approach more explicit. If synthetic data is part of the workflow, its role has to be framed, its limits understood, and its relationship to fieldwork clearly defined.
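In practical terms, one simple way to keep that relationship defined is to make provenance a mandatory field on every record, so no analysis step can consume the data without seeing how much of it was generated. The structure below is a hypothetical convention, not an industry standard.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Response:
    respondent_id: str
    answers: dict
    source: str                      # "fieldwork" or "synthetic"
    generator: Optional[str] = None  # tool/model that produced it, if synthetic

def synthetic_share(responses):
    """Fraction of the dataset that was generated rather than observed."""
    if not responses:
        return 0.0
    return sum(r.source == "synthetic" for r in responses) / len(responses)

dataset = [
    Response("r-001", {"intent_to_buy": 4}, source="fieldwork"),
    Response("r-002", {"intent_to_buy": 5}, source="synthetic",
             generator="demo-model"),
]
print(f"{synthetic_share(dataset):.0%} of this dataset is synthetic")
```

Reporting that share alongside any result keeps the boundary visible instead of letting it fade.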

Without that, research risks drifting away from the reality it is supposed to describe, while still maintaining the appearance of rigour.