Business Reporter

Building high-quality synthetic data


Karl Weaver at Yonder Consulting argues that a synthetic future needs an organic core 

 

We live in an increasingly data-dependent world. Data informs financial decisions, business strategies and marketing plans. And with our increasing reliance on AI, the demand for quality data will only grow.

 

But data is not a limitless resource, nor is it always available in the ideal format or timeframe. Finding out whether an advertisement worked with a particular social group, or whether the latest McDonald’s burger is popular with the over-65s, takes analysis.

 

The promise of synthetic data is that it will generate these insights quickly and cheaply, potentially improving business decisions. But is that really the case?

 

Synthetic data is set to be a $2.3bn market by 2030, but technically it’s not a new concept. We’ve been using forms of synthetic data in different ways for years; what has changed recently is how much more accessible it has become. It’s now generated in one of three ways: taking real-world data and reassembling it to replicate similar statistical properties, using machine learning to replicate relationships between variables, or through deep-learning neural networks.
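
To make the first of those three routes concrete, here is a minimal sketch, assuming a tiny two-column primary dataset (age and weekly spend, both invented for illustration). It summarises the real data by its means, standard deviations and correlation, then draws new records with similar statistical properties. The function names, the sample values and the choice of a bivariate normal model are all assumptions made for this example, not a description of any particular vendor’s method.

```python
import math
import random
import statistics

# Hypothetical primary data: respondent age and weekly spend (illustrative values only).
AGES = [23, 31, 38, 44, 52, 59, 63, 70]
SPEND = [12.0, 15.5, 14.0, 19.0, 22.5, 21.0, 26.0, 27.5]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_summary(xs, ys):
    """Summarise the real data by the statistics the synthetic data should replicate."""
    return {
        "mean_x": statistics.fmean(xs),
        "sd_x": statistics.stdev(xs),
        "mean_y": statistics.fmean(ys),
        "sd_y": statistics.stdev(ys),
        "rho": pearson(xs, ys),
    }


def generate_synthetic(params, n, rng):
    """Draw n synthetic (x, y) pairs from a bivariate normal with the fitted statistics."""
    rho = params["rho"]
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = params["mean_x"] + params["sd_x"] * z1
        # Mixing z1 into y reproduces the correlation observed in the real data.
        y = params["mean_y"] + params["sd_y"] * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    params = fit_summary(AGES, SPEND)
    synthetic = generate_synthetic(params, 1000, random.Random(42))
    print("real correlation:     ", round(params["rho"], 3))
    print("synthetic correlation:", round(pearson(*zip(*synthetic)), 3))
```

Note that the synthetic records can only be as good as the eight primary records they are fitted to, and that a simple summary like this smooths over anything the mean, spread and correlation do not capture.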

 

There are too many potential uses for synthetic data to list here. In general, we consider there to be two broad classes of valid use. The first is initial hypothesis development, prior to testing with specific data. The second is experimentation using synthetically generated data that has its foundation in quality primary data.

 

 

A synthetic future 

One place where it may be effective is in providing insights on hard-to-access groups. If we want to better understand a particularly niche demographic, for example rural 40-to-50-year-olds who don’t drive, then by using high-quality primary data on this hard-to-reach audience as the basis for synthetic data, we have the opportunity for almost limitless further analysis, iteration and experimentation.

 

However, using synthetic data to understand audiences for which there is no primary data will not yield any meaningful insight, especially if the source data on which it is based has not been rigorously examined. Without that scrutiny, synthetic data can be as unreliable as pure speculation.

 

It’s also hailed as a great means of maintaining data privacy and compliance. Increasing regulation and consumer opinion are bringing these conversations to the forefront, which means that data-reliant industries are having to adapt. Synthetic data is free from personally identifiable information (PII), so it is suitable for many different uses without the usual GDPR and security concerns. This could be a boon for businesses wishing to do more with data while maintaining strict adherence to data privacy regulation.

 

 

A hollow dataset?

But while proponents of synthetic data may want to sing its praises without caveat, it is not a silver bullet for all the problems of data acquisition. Accuracy is a major concern for most clients when it comes to using synthetic data for their insights, and it is a concern that I share. Businesses rely on quality market research data and insights to make informed decisions, and perceived ‘shortcuts’ can make people skittish.

 

Accuracy is not an issue when synthetic data is high-quality and properly deployed and analysed. But sourcing data from people themselves gives the narratives a depth and richness, and synthetic data may struggle to capture those nuances of human insight. There is also the question of bias: synthetic data has the potential to correct biases, but where it is generated from human insight it could inherit existing biases or be affected by the explicit or implicit bias of the analyst or the models used.

 

Additionally, synthetic data will tend to prioritise trends and underlying patterns while dismissing outlying data points. In terms of pure statistics, outliers are often worth disregarding, but they can also be where the golden nuggets lie that explain why an advert is working or why a product isn’t selling.
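
One way to keep those outlying points in view is to flag them in the primary data before any synthetic generation takes place, so that an analyst can look at them rather than lose them. The sketch below uses the common 1.5 × IQR rule; that rule, the function name and the sample scores are assumptions chosen for illustration, not a recommended standard.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    The points are flagged for human review, not removed.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical advert-recall scores; one respondent sits far from the rest.
    scores = [10, 12, 11, 13, 12, 11, 14, 12, 13, 48]
    print(flag_outliers(scores))
```

The flagged respondents are the ones worth a follow-up question, since they may explain what the overall trend cannot.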

 

 

No insight without asking questions

The growing demand for data and the growing sophistication of modelling techniques have not only precipitated the synthetic data boom but also continue to fuel it. While not all data is created equal, synthetic data does have its place in the world of insights generation and will therefore inform an ever-increasing share of business decisions.

 

We’ve come a long way in the past few years but we’re still in the early stages of development. An experimental approach is required to continue to assess where synthetic data works and where it doesn’t, and crucially where it can be improved. The insights landscape of the future will look markedly different to today’s, and synthetic data has a role to play, but the fundamental importance of understanding consumers’ opinions through quality research data will continue.

 


 

Karl Weaver is Managing Partner at Yonder Consulting

 

Main image courtesy of iStockPhoto.com and Just_Super



© 2025, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543