By Christopher Berry in newsletter — Nov 18, 2025

Into the Market’s Mind

This week: Digital Twins of Customers, scaling collaborative effort, Centaur, Causal AI Scientist, Causal Data Science Meeting, The Hinton Lectures

Predicting Behaviors with Large Language Model (LLM)-Powered Digital Twins of Customers

Using review data!

“From a managerial perspective, this consumer digital twin methodology, with LLMs’ generative capability, domain adaptability, and fine-tuning potential, enables firms to gain consumer insights, forecast behavioral responses, and optimize marketing mix. This approach mitigates the high cost of real-world marketing actions by enabling pre-deployment testing of campaigns, products, and offers through virtual consumer agents. Consequently, firms can enhance personalization, improve consumer experience, and ultimately increase marketing efficiency and return on investment. At the same time, our proposed approach offers a privacy-compliant solution for consumer analytics in an era of diminishing third-party data access and rising regulatory constraints (e.g., General Data Protection Regulation and California Consumer Privacy Act). As cookie-based tracking becomes increasingly restricted, conventional consumer analytics face growing challenges in acquiring necessary data (e.g., page views, clicks, session duration, and ad interactions). Consumer digital twins, built through first-party data in consumer relationship management (CRM) systems and publicly available user-generated content (UGC), allow firms to personalize, forecast, and refine marketing decisions without invasive data collection. The consumer digital twin framework provides a scalable and adaptive solution for understanding preferences and predicting behaviors at an individual level, informing various marketing decisions while fully complying with contemporary data regulation standards.”

“The raw data is structured at the individual review level. To construct digital twins that reflect individual consumer’s behavior and preferences, we first aggregate reviews at the reviewer level, resulting in a pool of 54.5 million unique consumer IDs. Review counts follow a right-skewed distribution, with a median of 136 per consumer (SD = 158).”
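To make that aggregation step concrete, here is a minimal sketch in Python. It is not the authors' code: the field names (`reviewer_id`, `product`, `rating`, `text`) and the toy rows are assumptions made up for illustration. It only shows the general idea of turning review-level rows into one record per reviewer and then looking at how review counts are distributed.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_reviewer(reviews):
    """Group review-level records into one list of reviews per reviewer ID."""
    by_reviewer = defaultdict(list)
    for review in reviews:
        by_reviewer[review["reviewer_id"]].append(review)
    return dict(by_reviewer)


# Toy review-level data; field names and values are hypothetical.
reviews = [
    {"reviewer_id": "u1", "product": "kettle", "rating": 5, "text": "Boils fast."},
    {"reviewer_id": "u2", "product": "kettle", "rating": 2, "text": "Leaks."},
    {"reviewer_id": "u1", "product": "toaster", "rating": 4, "text": "Even browning."},
    {"reviewer_id": "u3", "product": "blender", "rating": 3, "text": "Loud but works."},
    {"reviewer_id": "u1", "product": "blender", "rating": 1, "text": "Broke in a week."},
]

# One entry per reviewer: the raw material for that reviewer's digital twin.
profiles = aggregate_by_reviewer(reviews)

# Review counts per reviewer, plus the median of that distribution.
review_counts = {rid: len(items) for rid, items in profiles.items()}
median_count = median(review_counts.values())

print(review_counts)
print(median_count)
```

In this toy set one reviewer accounts for most of the reviews, which is the same kind of right skew the paper describes at a much larger scale.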

“First, digital twins can significantly enhance the development and execution of marketing decisions. Because they simulate the behavior of real consumers, they can serve as synthetic testbeds for various forms of consumer research. For example, during new product development, marketers can expose digital twins to product descriptions, branding concepts, or price points to assess likely reactions and refine offerings before investing in full-scale launches. Similarly, email campaigns, ad copy, or promotional content generated via GenAI can be iteratively tested on these digital twins to evaluate emotional tone, relevance, or likelihood of purchase. This fusion of synthetic consumers and synthetic content enables a new form of marketing automation, where entire campaign pipelines can be optimized in silico before reaching real consumers. The feedback loop between generative content and simulated consumer response enables consumer targeting and personalization that traditional A/B testing or focus groups cannot match. Operationally, our findings on the trade-off between computational cost and performance offer guidelines for deployment. For instance, marketers with limited resources may opt for lower-epoch fine-tuning or focus on high-value consumers with rich behavioral history. Moreover, the robustness of the digital twin performance across varying category breadths suggests that these models are applicable to both niche and broad-market consumers.”

Li, B., Wei, Q. O., & Wang, X. S. (2025). Predicting Behaviors with Large Language Model (LLM)-Powered Digital Twins of Customers.

https://www.msi.org/working-paper/predicting-behaviors-with-large-language-model-llm-powered-digital-twins-of-consumers/

Completion ≠ Collaboration: Scaling Collaborative Effort with Agents

Whoever optimizes for this balance across capabilities may realize a significant competitive advantage

“Our results suggest that current agents are not merely underperforming—they are fundamentally misaligned with the dynamics of real collaboration, suggesting opportunities to rethink agent design.”

“A sweet spot exists in the effort distribution. When we consider performance ratings in Figure 5 (right), we find a nuanced relationship between effort balance and task success. For each model, there appears to be an optimal range of agent-to-user effort ratios where performance peaks. When either the user contributes disproportionately more effort (low agent-to-user ratio) or the agent dominates the interaction (high agent-to-user ratio), joint performance tends to degrade. Notably, this sweet spot is model-dependent: claude-4.0-sonnet achieves strong performance across a broader range of effort ratios, while gpt-4o and llama-3.1-70b show more pronounced performance degradation outside their optimal ranges. This finding underscores the importance of calibrating collaboration patterns to match the underlying model’s capabilities.”

Shen, S. Z., Chen, V., Gu, K., Ross, A., Ma, Z., Ross, J., ... & Sontag, D. (2025). Completion ≠ Collaboration: Scaling Collaborative Effort with Agents. arXiv preprint arXiv:2510.25744.

https://arxiv.org/abs/2510.25744

Centaur: a foundation model of human cognition

A benchmark-driven approach

“Centaur can simulate human behavior in (almost) real-time. For example, running an open-loop simulation of a typical two-step task experiment takes around 30 minutes, while it takes around 20 minutes for the average human participant. We believe that inference time could be further optimized to fully close this gap.”

“When the idea of a unified model of cognition was first proposed, researchers expressed concern that established areas of cognitive science might react negatively to such a model. In particular, they feared that the new approach might be seen as unfamiliar or incompatible with existing theories, just like an “intruder with improper pheromones” [69]. This could lead to an “attack of the killer bees”, where researchers in traditional fields would fiercely critique or reject the new model to defend their established approaches. To mitigate these concerns, the concept of a cognitive decathlon was proposed: a rigorous evaluation framework in which competing models of cognition are tested across ten experiments and judged based on their cumulative performance in them. In the current work, we applied Centaur to the equivalent of sixteen such cognitive decathlons, where it was tested against numerous established models and consistently won every competition. This outcome suggests that the data-driven discovery of domain-general models of cognition is a promising research direction. The next step for future research should be to translate this domain-general computational model into a unified theory of human cognition as envisioned by Newell [2].”

Binz, M., Akata, E., Bethge, M., Brändle, F., Callaway, F., Coda-Forno, J., ... & Schulz, E. (2024). Centaur: a foundation model of human cognition. arXiv preprint arXiv:2410.20268.

https://arxiv.org/abs/2410.20268

https://huggingface.co/datasets/marcelbinz/Psych-101

Causal AI Scientist: Facilitating Causal Data Science with Large Language Models

CSV in, causes out

“In this work, we introduce Causal AI Scientist (CAIS), an end-to-end tool that maps natural language queries and datasets to formal causal inference tasks by automatically selecting appropriate methods and interpreting results. When evaluated across diverse causal inference tasks using CauSciBench, CAIS consistently outperforms baseline prompting strategies in method selection and achieves competitive performance in causal effect estimation, particularly on structured datasets such as QRData and synthetic examples. These results highlight the value of CAIS’s decision-tree-based approach, which decomposes complex reasoning into interpretable steps. This not only improves estimation accuracy but also enhances robustness and transparency—qualities critical for researchers and practitioners in social science, healthcare, and related fields.”

Verma, V., Acharya, S., Simko, S., Bhardwaj, D., Haghighat, A., Janzing, D., ... & Yang, Y. (2025). Causal AI Scientist: Facilitating Causal Data Science with Large Language Models. In NeurIPS 2025 AI for Science Workshop.

https://huggingface.co/spaces/CausalNLP/causal-agent

https://openreview.net/forum?id=EDWTHMVOCj

Causal Data Science Meeting

Thank you Paul Hünermund, Jermain Kaminski, Carla Schmitt, and Beyers Louw for organizing the event again this year

https://www.causalscience.org/

The Hinton Lectures

Evans did the work to simplify the ideas for the public and it paid off. There are conclusions on the slide below.

Hinton explained that aligning AI to think of humanity as a baby, and of itself as our mother, is among the better outcomes.

https://owainevans.github.io/hinton.html

Reader Feedback

“It’s as though there are no consequences for weak privacy policies.”

Footnotes

I’ve been demoing a platform that enables anybody to build a panel of Digital Twins of Consumers (DTOC) and learn from them.

The feedback has been fascinating.

Many of the problems in marketing, in product management, and on the balance sheet are related to Product-Market-Fit (PMF). Revenue collapsing? Check PMF. Messy roadmap? Check PMF. Losing market share? Check PMF.

PMF is upstream from all the intermediate artifacts an organization routinely produces. The annual strategy. The quarterly OKRs. The weekly sprint. The YouTube Activation for Christmas.

One key feature of PMF is that it’s amorphous. What is it? Is it a quality attribute of the organization that emerges as a result of the Market Strategy and the Product Strategy? Is it deliberately designed? If so, where is it expressed? Who can update it? How is it updated?

I’ve identified a segment defined by a shared goal: to define and align on PMF. What gets defined can be used to inspire, empower, and iterate. For this group, technology that reduces the uncertainty around PMF is a welcome solution.

Work continues on understanding… the PMF of PMF, the core data and the interface. That interface has been particularly vexing. More to come.
