Synthetic Customers Earn Their Stripes | Bain & Company
At a Glance
Companies are using synthetic customers to accelerate product development, test marketing, and train frontline teams.
Organizations that build synthetic customers should rely on their first-party data rather than on vendors’ third-party data.
Improving model accuracy allows teams to test more variables, eliminate weak ideas earlier, and focus human research where it matters most.
Large language models still lack true empathy, leaving a vital role for human judgment.
Synthetic customers, AI-generated representations of real customers, have reached an inflection point: They now go beyond qualitative exploration to deliver structured, repeatable, and accurate quantitative insights. These proxies can take the form of one-to-one digital twins of customers or segment-based personas, derived from a mix of internal company data (such as transactional, behavioral, demographic, and voice-of-the-customer research data) and external sources (product reviews and scraped social media content).
Demand for continuous, always-on insights about product or service performance has outgrown the limits of traditional research methods. Concerns around speed, cost, and risk reduction have spurred adoption of digital proxies that emulate human behavior, preferences, and decision making. For example, US Bank has used synthetic audiences to understand how high-net-worth households and other customer segments think about financial topics, test messaging, and refine creative campaigns before launch. Retailer Target tests products and promotions on synthetic audiences to simulate how various consumers would respond to them before live testing on websites.
Market leaders that can iterate quickly, test more ideas, and kill weak concepts early consistently outperform those tied to slow, episodic, siloed insight cycles.
Where traditional research falls short
Traditional research remains valuable in many situations but is increasingly constrained. Conjoint and discrete choice models are limited by the number of price points, features, or interaction effects that can feasibly be tested. Teams finish studies wishing they had tested more, or wanting to extrapolate beyond what was tested, which slows learning and introduces uncertainty.
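To make the constraint concrete, here is a minimal Python sketch of how quickly a conjoint design grows. The attribute names and level counts are hypothetical, chosen only for illustration; they do not come from any study described here. The parameter counts assume a standard dummy-coded or effects-coded model in which an attribute with k levels contributes k - 1 parameters.

```python
from itertools import combinations
from math import prod

# Hypothetical attributes and level counts (illustrative assumptions only).
levels = {"price": 5, "brand": 4, "storage": 3, "color": 3, "warranty": 2}


def full_factorial_size(levels):
    """Number of distinct product profiles if every combination is allowed."""
    return prod(levels.values())


def main_effect_params(levels):
    """Main-effect parameters: each attribute with k levels contributes k - 1."""
    return sum(k - 1 for k in levels.values())


def two_way_interaction_params(levels):
    """Two-way interaction parameters: (a - 1) * (b - 1) per attribute pair."""
    return sum((a - 1) * (b - 1) for a, b in combinations(levels.values(), 2))


print("Possible profiles:", full_factorial_size(levels))
print("Main-effect parameters:", main_effect_params(levels))
print("Two-way interaction parameters:", two_way_interaction_params(levels))

# Adding one more hypothetical attribute with four levels.
extended = {**levels, "camera": 4}
print("Possible profiles with one more attribute:", full_factorial_size(extended))
```

Under these assumed numbers, 12 main-effect parameters sit alongside 55 possible two-way interactions, and adding a single four-level attribute quadruples the number of possible profiles from 360 to 1,440. Each respondent can evaluate only a small number of choice tasks, which is why teams must decide in advance which price points, features, and interactions a study can afford to cover.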
Human-based survey research has encountered other problems in recent years. The volume of fraud has increased, and participant engagement has become more variable, which forces researchers to recruit larger samples or deploy costly quality control measures just to get usable data. Bot contamination of surveys has forced constant upgrades to those controls. Moreover, the classic issue of people saying one thing but doing another persists. And in business-to-business (B2B) markets, there may be too few key customers, such as CFOs in a single industry, to reliably sample.
How synthetic customers perform
It’s not surprising, then, that many product, strategy, and marketing teams are using off-the-shelf AI tools to gather qualitative insights around new features, pricing, and messaging. However, these tools often lack grounding in proprietary customer data, statistical validation, or clear governance. Fortunately, recent generations of large language models (LLMs) demonstrate stronger reasoning, more stable trade-offs, and better alignment with human decision patterns in structured tasks.
Our work with a leading consumer technology company illustrates the step change in performance and accuracy that synthetic customers can deliver when grounded in a company's own first-party data. The team backtested synthetic output against a prior large-scale quantitative conjoint study, using the original research as ground truth. We built digital twins from historical respondent-level data and ran the same tasks used in the original study, excluding the study itself from the training inputs. The digital twins replicated about 90% of key outcomes from the original research, including the following (see Figures 1 and 2):
identification of the most influential features that drive customer choices;
preference share for most of the products tested;
correct portfolio-level decisions about which products to launch or retain; and
preliminary price sensitivity curves that showed promise.
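A backtest of this kind can be scored in several ways. The minimal Python sketch below shows one possible approach, assuming the comparison is made on feature importance. It is not the method used in the study described above, the function names are hypothetical, and every number in it is made up for illustration.

```python
def ranks(values):
    """Rank values from 1 (largest) to n (smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(a, b):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def mean_abs_error(a, b):
    """Average absolute gap between two equal-length lists of numbers."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def top_k_match(human, synthetic, k):
    """True if both sources agree on the set of the k most important features."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return top(human) == top(synthetic)


# Made-up importance scores for illustration; not data from any real study.
human = {"price": 0.35, "battery": 0.25, "camera": 0.20,
         "storage": 0.12, "color": 0.08}
synthetic = {"price": 0.32, "battery": 0.27, "camera": 0.17,
             "storage": 0.15, "color": 0.09}

features = list(human)
human_scores = [human[f] for f in features]
synthetic_scores = [synthetic[f] for f in features]

print("Rank correlation:", spearman(human_scores, synthetic_scores))
print("Mean absolute error:", round(mean_abs_error(human_scores, synthetic_scores), 3))
print("Same top three features:", top_k_match(human, synthetic, 3))
```

Rank correlation and top-k agreement speak to whether the synthetic panel identifies the same drivers of choice, while mean absolute error shows how far the magnitudes drift. The same functions could be applied to preference share by product, again as an assumption about how such a comparison might be set up.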
Figure 1
Synthetic customers of a consumer technology company match the preferences of human customers on most product features
Notes: Average feature importance based on conjoint results; LLM used is Gemini 3.0; n=1,500
Source: Bain & Company
Figure 2
Synthetic customers also mirror human customers in their brand preferences
Notes: LLM used is Gemini 3.0; n=...