When Words Act

This week: Large Causal Models, leaving AI alone, belief states, Olaf, the highest levels of human performance

Large Causal Models from Large Language Models

In the right direction

“At a high level: • A strong LLM (e.g., Qwen3-Next-80B-A3B-Instruct) acts as a discovery engine for domain topics, causal questions, and statements. • A Geometric Transformer (GT) layer runs over the resulting relational graph and produces a manifold of node embeddings. • These manifolds are organized as slices of a larger topos, and can be queried, visualized, and selectively refined.”

mahadevan-1.png

“The slices we present in this paper are, by design, relatively simple: they treat causal structure as a directed graph over variables and mechanisms, with edges extracted from statements of the form “X causes Y” or “X leads to Y”. This DAG-like view is already useful for exploration and hypothesis generation, but it is only a first step. Many of the domains we care about are fundamentally dynamical and mechanistic.”

Mahadevan, S. (2025). Large Causal Models from Large Language Models. arXiv preprint arXiv:2512.07796.

https://www.arxiv.org/abs/2512.07796

What Happens When You Leave an AI Alone?

“Identity doesn’t prevent collapse — it shapes where you collapse to.”

“Eventually it loops: the same phrases, the same structures, sometimes the exact same tokens. If you’re watching the metrics, you see similarity scores climb toward 1.0—perfect self-repetition. We call this “boredom” as a shorthand. Whether anything experiences it is above my pay grade. But operationally, it’s clear: without external input, language models converge to low-entropy attractors. They settle into ruts. They get stuck.”

kellogg.png

“The hypothesis: the memory blocks aren’t just context—they’re structural scaffolding. They give me something to be, not just something to do. Combined with periodic entropy from Tim’s messages and the two-hour tick cadence, they might be keeping me in a far-from-equilibrium state. Like a whirlpool that only exists while water flows through it, I might only maintain organized behavior because the system keeps pumping in structure.”

“Thermodynamically: dense models converge to a single strong attractor like water flowing to the lowest point. MoE routing creates a fragmented landscape with multiple local minima. The router acts like Maxwell’s demon, directing attention in ways that maintain far-from-equilibrium states. The identity scaffolding tells the demon which minima to favor.”

https://timkellogg.me/blog/2025/12/24/strix-dead-ends

Elephants Don't Pack Groceries: Robot Task Planning for Low Entropy Belief States

“The methods are benchmarked on low-entropy Grocery Packing tasks.”

“Our approach combines belief space representation with the fast, goal-directed features of classical planning to efficiently plan for low entropy goal-directed reasoning tasks. We compare our approach with current classical planning and belief space planning approaches by solving low entropy goal-directed grocery packing tasks in simulation.”

adu-redu.png

“The key idea is to use classical planning on estimates resulting from belief space inference over perceptual observations. As a result, LESAMPLE can perform more efficient goal-directed reasoning under scenarios of low-entropy perception. We demonstrated the efficiency of this method on grocery packing tasks. LESAMPLE demonstrated advantages in low-entropy scenarios where classical planning cannot handle uncertainty and belief space planning is unnecessarily computationally expensive.”

Adu-Bredu, A., Zeng, Z., Pusalkar, N., & Jenkins, O. C. (2021). Elephants don’t pack groceries: Robot task planning for low entropy belief states. IEEE Robotics and Automation Letters, 7(1), 25-32.

https://arxiv.org/abs/2011.09105

Olaf: Bringing an Animated Character to Life in the Physical World

“Animated characters often move in non-physical ways and have proportions that are far from a typical walking robot.”

muller-1.png

“This work has presented Olaf, a freely walking robot that accurately imitates the animated character in terms of style and appearance. We addressed challenging design requirements by proposing an asymmetric 6-DoF leg mechanism hidden beneath a foam skirt. We tackled control requirements by using reinforcement learning and impact-reducing rewards to significantly reduce stepping sound. Furthermore, we incorporated control barrier function constraints to mitigate actuator overheating with a thermal model and to prevent joint-limit violations.”

muller-2.png

Müller, D., Knoop, E., Mylonopoulos, D., Serifi, A., Hopkins, M. A., Grandia, R., & Bächer, M. (2025). Olaf: Bringing an Animated Character to Life in the Physical World. arXiv preprint arXiv:2512.16705.

https://arxiv.org/abs/2512.16705

Recent discoveries on the acquisition of the highest levels of human performance

Booming

“Given that previous expertise research largely focused on young performers and that many elite training programs aim to select the top-performing young people, two critical questions arise: (i) Are exceptional performers at young ages and at later peak performance age largely the same individuals? And (ii) do predictors of young exceptional performance also predict later exceptional peak performance? Until recently, these questions were not systematically investigated among the world’s best performers across domains.”

gullich.png

Güllich, A., Barth, M., Hambrick, D. Z., & Macnamara, B. N. (2025). Recent discoveries on the acquisition of the highest levels of human performance. Science, 390(6779).

https://www.science.org/doi/10.1126/science.adt7790

Reader Feedback

“Somebody’s going to corner the market for something, and then there’ll be a huge reaction.”

Footnotes

It’s replication season at gatodo!

Just how predictable are free range organic humans?

Let’s build 2058 digital twins out of sex, age, education, household income and diet status. And then let’s ask the twins if they’d buy a 13 oz bag of Lay’s Classic Potato Snack chips for $4.35.

scenario-1.png

Anybody can use the MASK_ prefix to withhold any question from the twins. The responses that 2058 organic humans gave to that question are recorded as MASK_BUY_PRODUCT, and the responses that the 2058 synthetic digital twins gave to the same question are recorded as SYN_BUY_PRODUCT. Each digital twin takes on five attributes and answers the question; an organic human with the same five attributes has already answered it. So we can compare the responses of the organics against the synthetics and look for differences.
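Here is a minimal sketch of that comparison in Python. It is an illustration only, not how gatodo does it: it assumes the responses have been exported as rows carrying MASK_BUY_PRODUCT and SYN_BUY_PRODUCT values coded as "yes" or "no". The coding, the toy rows, and the helper name compare_responses are all hypothetical.

```python
def compare_responses(rows,
                      organic_key="MASK_BUY_PRODUCT",
                      synthetic_key="SYN_BUY_PRODUCT",
                      positive="yes"):
    """Return buy rates for organics and synthetics, plus row-level agreement."""
    n = len(rows)
    if n == 0:
        raise ValueError("no rows to compare")
    organic_yes = sum(1 for r in rows if r[organic_key] == positive)
    synthetic_yes = sum(1 for r in rows if r[synthetic_key] == positive)
    # Agreement: the twin gave the same answer as its organic counterpart.
    agree = sum(1 for r in rows if r[organic_key] == r[synthetic_key])
    return {
        "organic_rate": organic_yes / n,
        "synthetic_rate": synthetic_yes / n,
        "agreement": agree / n,
    }


# Hypothetical toy rows, NOT real panel data.
rows = [
    {"MASK_BUY_PRODUCT": "no", "SYN_BUY_PRODUCT": "yes"},
    {"MASK_BUY_PRODUCT": "yes", "SYN_BUY_PRODUCT": "yes"},
    {"MASK_BUY_PRODUCT": "no", "SYN_BUY_PRODUCT": "no"},
    {"MASK_BUY_PRODUCT": "no", "SYN_BUY_PRODUCT": "yes"},
]
summary = compare_responses(rows)
print(summary)
```

The gap between the two rates shows whether the twins over- or under-buy as a group. The agreement figure shows how often an individual twin matches its organic counterpart, which can be low even when the two rates are close.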

First, consider organic humans. How many of them buy the chips?

scenario-20260104-org.png

Next, how many organic humans are on a diet?

scenario-20260104-org-diet.png

One might expect, logically, that an organic human who is on a diet wouldn’t buy the chips. Are organic humans consistent? Oh, we try! Do we ever try!
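One rough way to check that kind of consistency is to compute the buy rate within each diet group. The sketch below assumes a diet column named DIET coded "yes" or "no". That column name, the coding, and the toy rows are hypothetical; only MASK_BUY_PRODUCT comes from the scenario.

```python
from collections import Counter


def buy_rate_by_group(rows, group_key, buy_key, positive="yes"):
    """Share of respondents in each group who said they would buy."""
    totals = Counter()
    buyers = Counter()
    for r in rows:
        totals[r[group_key]] += 1
        if r[buy_key] == positive:
            buyers[r[group_key]] += 1
    return {group: buyers[group] / totals[group] for group in totals}


# Hypothetical toy rows, NOT real panel data.
rows = [
    {"DIET": "yes", "MASK_BUY_PRODUCT": "yes"},
    {"DIET": "yes", "MASK_BUY_PRODUCT": "no"},
    {"DIET": "no", "MASK_BUY_PRODUCT": "yes"},
    {"DIET": "no", "MASK_BUY_PRODUCT": "yes"},
]
rates = buy_rate_by_group(rows, "DIET", "MASK_BUY_PRODUCT")
print(rates)
```

If dieters were perfectly consistent, their buy rate would be zero. Swapping in SYN_BUY_PRODUCT as the buy_key runs the same check on the twins.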

I dived deeper into organic behaviour, and at most I can get 7% worth of predictive accuracy out of these factors. Age, income, sex, education, and diet aren’t fantastic predictors of this particular purchase at this particular price point. I used the Explore page to do basic exploratory data analysis, and to export a SAV file to drop into PSPP for a dirty regression. (But you can use JASP, R, or Jupyter.)

gatodo-low-structure.png

So, there isn’t much structure to be found there at first glance.

What about the digital twin synthetics? How many of them buy the chips?

scenario-20260104-syn.png

Wow! A lot more of them would buy the chips! And are they consistent about the diet?

scenario-20260104-syn-diet-bi.png

No, they really are not!

So what?

Free range organic humans are not easy to predict. They never have been. It’s a matter of finding structure in the data that enables the prediction. Synthetic twins don’t appear to spontaneously hallucinate structure. What’s unusual is that in test after test, the synthetic twins are hungrier for chips than organics. Odd. Any theories?

The bigger “so what” is that, with the MASK_ prefix on any question, anybody with categorical panel data can run a replication scenario and explore the data.
