What The 0.1% Knows

This week: Agentic world modeling, efficient online memory, who wins on Polymarket, DeepSeek-V4, Hermes tools

Agentic world modeling: Foundations, capabilities, laws, and beyond

A useful literature review, organized around a taxonomy the authors apply to the work they survey

“Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities. We introduce a “levels × laws” taxonomy organized along two axes. The first defines three capability levels: L1 Predictor, which learns one-step local transition operators; L2 Simulator, which composes them into multi-step, action-conditioned rollouts that respect domain laws; and L3 Evolver, which autonomously revises its own model when predictions fail against new evidence.”
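To make the three levels concrete for myself, here is a toy sketch. It is not from the paper: the class name, the scalar state, the single "gain" parameter, and the non-negativity "law" are all assumptions I made up for illustration.

```python
from typing import List


class ToyWorldModel:
    """Hypothetical 1-D world model: next_state = state + gain * action."""

    def __init__(self, gain: float = 1.0, learning_rate: float = 0.5,
                 min_state: float = 0.0) -> None:
        self.gain = gain                    # the model's belief about the dynamics
        self.learning_rate = learning_rate  # how strongly evidence revises that belief
        self.min_state = min_state          # assumed domain law: state never drops below this

    def predict(self, state: float, action: float) -> float:
        # L1 Predictor: a one-step local transition.
        return state + self.gain * action

    def simulate(self, state: float, actions: List[float]) -> List[float]:
        # L2 Simulator: compose one-step predictions into an
        # action-conditioned rollout that respects the domain law.
        trajectory = [state]
        for action in actions:
            state = max(self.min_state, self.predict(state, action))
            trajectory.append(state)
        return trajectory

    def evolve(self, state: float, action: float, observed_next: float) -> float:
        # L3 Evolver: when a prediction disagrees with new evidence,
        # revise the model itself (here, nudge the gain toward the data).
        error = observed_next - self.predict(state, action)
        if action != 0:
            self.gain += self.learning_rate * error / action
        return error


model = ToyWorldModel(gain=1.0)
rollout = model.simulate(0.0, [1.0, 1.0, -5.0])
first_error = model.evolve(0.0, 1.0, observed_next=2.0)
```

The rollout clamps at zero on the last step, and the observed transition pulls the gain halfway from 1.0 toward 2.0. Real systems at each level are far richer, but the division of labor is the same.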

Chu-1.png

“Social simulation remains premature: LLMs degrade sharply beyond second-order belief reasoning (Wu et al., 2023b), agents suffer from role drift and goal forgetting (Park et al., 2023; Zhou et al., 2024c), and formal commitment tracking (Telang et al., 2021) remains unintegrated into any LLM architecture.”

Chu-2.png

Chu, M., Zhang, X. B., Lin, K. Q., Kong, L., Zhang, J., Tu, T., ... & Jia, J. (2026). Agentic world modeling: Foundations, capabilities, laws, and beyond. arXiv preprint arXiv:2604.22748.

https://arxiv.org/abs/2604.22748

https://github.com/matrix-agent/awesome-agentic-world-modeling

δ-mem: Efficient Online Memory for Large Language Models

It’s better

“Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone’s attention computation during generation. With only an 8 × 8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline.”
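The abstract does not spell out the update, so what follows is the textbook delta rule for an associative memory, not the authors' implementation. The function names and the step size `beta` are my own, and the sketch leaves out how the readout becomes a low-rank correction to attention.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def new_state(dim: int) -> Matrix:
    # Fixed-size memory: dim x dim, no matter how much history it absorbs.
    return [[0.0] * dim for _ in range(dim)]


def read(state: Matrix, key: Vector) -> Vector:
    # Readout is a matrix-vector product: state @ key.
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def delta_update(state: Matrix, key: Vector, value: Vector, beta: float = 1.0) -> None:
    # Delta rule: state += beta * (value - state @ key) outer key.
    # Only the prediction error is written, so re-writing a key
    # overwrites the old association instead of piling on top of it.
    predicted = read(state, key)
    error = [v - p for v, p in zip(value, predicted)]
    for i, e in enumerate(error):
        for j, k in enumerate(key):
            state[i][j] += beta * e * k


dim = 8  # matches the 8 x 8 state size quoted above
state = new_state(dim)
key_a = [1.0] + [0.0] * (dim - 1)         # hypothetical unit-norm keys
key_b = [0.0, 1.0] + [0.0] * (dim - 2)
value_a = [float(i) for i in range(dim)]
value_b = [float(-i) for i in range(dim)]

delta_update(state, key_a, value_a)
delta_update(state, key_b, value_b)
delta_update(state, key_a, value_b)       # overwrite what key_a points to
```

With unit-norm keys and `beta = 1.0`, a read returns exactly the last value written for that key, and the state stays 8 × 8 throughout.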

Lei-1.png

“These results suggest that compact online states can serve as a scalable and efficient interface for test-time memory in frozen Transformer backbones.”

Lei, J., Zhang, D., Li, J., Wang, W., Fan, K., Liu, X., ... & Poria, S. (2026). δ-mem: Efficient Online Memory for Large Language Models. arXiv preprint arXiv:2605.12357.

https://arxiv.org/abs/2605.12357

Who Wins and Who Loses In Prediction Markets? Evidence from Polymarket

The 0.1%

“We address this question using the complete transaction history of Polymarket from November 11, 2022 through March 29, 2026, covering over 2.4 million users and $67 billion in trading volume. Our main findings are fivefold. First, profits are concentrated: the top 0.1% of most profitable users capture 51.2% of all gains and the top 1% capture 76.5%. Meanwhile, 69.0% of users lose money.”
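For anyone who wants to run the same concentration arithmetic on their own data, here is a minimal sketch. I am assuming that "gains" means the sum of positive PnL and that the top fraction is taken over all users ranked by PnL; the paper may define either differently. The numbers are made up.

```python
import math
from typing import List


def top_share_of_gains(pnl_by_user: List[float], top_fraction: float) -> float:
    """Share of total positive PnL captured by the top fraction of users."""
    ranked = sorted(pnl_by_user, reverse=True)
    n_top = max(1, math.ceil(top_fraction * len(ranked)))
    total_gains = sum(p for p in ranked if p > 0)
    if total_gains == 0:
        return 0.0
    top_gains = sum(p for p in ranked[:n_top] if p > 0)
    return top_gains / total_gains


def loser_fraction(pnl_by_user: List[float]) -> float:
    """Fraction of users with strictly negative PnL."""
    return sum(1 for p in pnl_by_user if p < 0) / len(pnl_by_user)


# Hypothetical PnL for ten users, not data from the paper.
pnl = [100.0, 50.0, 30.0, 20.0, -10.0, -20.0, -30.0, -40.0, -50.0, -50.0]
share_top_10pct = top_share_of_gains(pnl, 0.10)
losers = loser_fraction(pnl)
```

In this toy sample one user out of ten takes half of all gains and six out of ten lose money, which is the same shape as the headline numbers at a much smaller scale.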

“Second, Polymarket prices are well-calibrated on aggregate. A contract priced at p resolves in its favor approximately p percent of the time but this aggregate is concentrated in high-volume markets and high-attention categories: low-volume markets show deviations from perfect calibration, and the Tech, Culture, and Weather categories deviate even at the 1-day horizon.”
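Calibration is easy to check in a few lines. This sketch buckets contracts by price and compares the mean price in each bucket to how often those contracts resolved in their favor. The equal-width buckets and the sample observations are my assumptions, not the paper's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def calibration_table(
    observations: Iterable[Tuple[float, int]], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean_price, resolution_rate, count) per non-empty price bucket.

    Each observation is (price in [0, 1], outcome in {0, 1}).
    """
    bins: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for price, outcome in observations:
        index = min(int(price * n_bins), n_bins - 1)  # price 1.0 goes in the last bucket
        bins[index].append((price, outcome))

    table = []
    for index in sorted(bins):
        rows = bins[index]
        mean_price = sum(p for p, _ in rows) / len(rows)
        resolution_rate = sum(o for _, o in rows) / len(rows)
        table.append((mean_price, resolution_rate, len(rows)))
    return table


# Hypothetical observations: perfectly calibrated at 0.25 and 0.75.
sample = [(0.25, 1), (0.25, 0), (0.25, 0), (0.25, 0),
          (0.75, 1), (0.75, 1), (0.75, 1), (0.75, 0)]
table = calibration_table(sample, n_bins=2)
```

A well-calibrated market has mean price and resolution rate close to each other in every bucket. The paper's point is that this holds on aggregate but weakens in low-volume markets, which you would see by building one table per volume tier.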

“Third, the excess hit rate, defined as the average of realized outcomes minus trade prices, declines sharply through the PnL distribution, from roughly 20 percentage points in the most profitable percentiles to between −15 and −18 percentage points across the loss tail. This finding suggests that profitability may be associated with some skill in identifying mispriced contracts. However, we urge caution in interpreting this finding as representing skill (or information) since we lack the tools typically used to assess performance in financial markets: a benchmark model of expected returns and a time series of the length typically used to assess performance, for example the performance of mutual fund managers. Moreover, we find that month-to-month performance is only weakly persistent across traders and show that there is a large amount of turnover in which traders continue to participate in Polymarket.”
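The excess hit rate as quoted is simple enough to write down. This sketch assumes each trade buys a contract that pays 1 if it resolves in the buyer's favor, and it weights every trade equally; the paper may weight by volume or handle sells differently.

```python
from typing import List, Tuple


def excess_hit_rate(trades: List[Tuple[float, int]]) -> float:
    """Average of (realized outcome - trade price) over a user's trades.

    Each trade is (price paid in [0, 1], outcome in {0, 1}).
    """
    if not trades:
        raise ValueError("need at least one trade")
    return sum(outcome - price for price, outcome in trades) / len(trades)


# Hypothetical trader: buys four contracts at 0.50, three of them pay out.
winner = excess_hit_rate([(0.5, 1), (0.5, 1), (0.5, 0), (0.5, 1)])
# Hypothetical trader: buys four contracts at 0.50, one of them pays out.
loser = excess_hit_rate([(0.5, 0), (0.5, 0), (0.5, 1), (0.5, 0)])
```

Multiply by 100 to read the result in percentage points. A positive value means the trader bought contracts that resolved in their favor more often than the prices implied.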

“Fourth, the strongest predictor of performance is liquidity provision: in the cross-section, a one-standard-deviation increase in a user’s maker volume share is associated with a 9.3 percentage-point lower probability of loss, evaluated at the sample mean. This maker–taker asymmetry mirrors the findings of Barber, Lee, Liu, and Odean (2009, 2014) in Taiwanese equity markets and suggests that investor sophistication is an important predictor of the cross sectional distribution of winners and losers on Polymarket.”

“Fifth, we examine the trading behavior of the 100 most successful users in our sample (representing 27.7 percent of aggregate winners’ profits on Polymarket) to understand whether the gains are generally consistent with informed trading and, in particular, with insider trading. We find that the most successful users traded frequently in sports markets, often for different teams (81% of the gains), and markets related to the 2024 US election (representing 15% of the gains). Within this group, gains are typically concentrated in a small number of markets—most top earners derive the bulk of their profit from one or a handful of markets rather than from a broad portfolio—and a meaningful fraction earn the bulk of their profit not from directional bets at all but from liquidity provision: posting many limit orders and capturing the spread on each at high volume. Given the nature of these trades, it seems unlikely that the most successful traders were benefiting from insider information, as opposed to forecasting skill—whether through quantitative modeling or deep event-specific expertise—or skilled liquidity provision.”

Akey-1.png

“Our analysis so far has not examined the role of insider information specifically. While we do not think it is possible to conclusively prove or disprove the extent of such behavior, we are interested in seeing whether the trading patterns among the most profitable users could plausibly be coming from trading in markets where insider information would be valuable.”

Akey, P., Grégoire, V., Harvie, N., & Martineau, C. (2026). Who wins and who loses in prediction markets? Evidence from Polymarket (March 18, 2026).

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6443103

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Reduce the KV cache and feel the savings

“We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.”

deep-seek-v4-1.png

“Core architectures of CSA. It compresses the number of KV entries to 1/𝑚 times, and then applies DeepSeek Sparse Attention for further acceleration. Additionally, a small set of sliding window KV entries is combined with the selected compressed KV entries to enhance local fine-grained dependencies.”
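A back-of-envelope count shows where the savings come from. This is only bookkeeping based on the caption above: compress to 1/m, select some of the compressed entries, add a sliding window. The function name and every number in the example (m, the selection budget, the window size) are hypothetical, not values from the report.

```python
import math
from typing import Dict


def kv_entries_per_query(
    context_tokens: int, m: int, top_k: int, window: int
) -> Dict[str, int]:
    """Count KV entries kept and attended under a CSA-style scheme.

    m      : compression ratio (entries shrink to 1/m), assumed value
    top_k  : compressed entries selected by sparse attention, assumed value
    window : uncompressed sliding-window entries for local detail, assumed value
    """
    compressed = math.ceil(context_tokens / m)  # after 1/m compression
    selected = min(top_k, compressed)           # sparse selection over compressed entries
    local = min(window, context_tokens)         # recent tokens kept at full resolution
    return {
        "dense_entries": context_tokens,        # what plain full attention would touch
        "compressed_entries": compressed,
        "attended_entries": selected + local,
    }


# Hypothetical settings for a one-million-token context.
counts = kv_entries_per_query(context_tokens=1_000_000, m=4, top_k=1024, window=128)
```

Under these made-up settings each query attends to about a thousand entries instead of a million. The real ratios depend on the values in the report, which I have not reproduced here.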

deep-seek-v4-2.png

https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

https://github.com/deepseek-ai/DeepGEMM

https://huggingface.co/collections/deepseek-ai/deepseek-v4

Hermes Tools

It’s expanding

“Hermes ships with a broad built-in tool registry covering web search, browser automation, terminal execution, file editing, memory, delegation, RL training, messaging delivery, Home Assistant, and more.”

https://hermes-agent.nousresearch.com/docs/user-guide/features/tools

Reader Feedback

“Toronto Tech Week was good messy.”

Footnotes

I’m at an AI alignment bootcamp this week.

Learning is a lot like a sunburn. Like burning, most of the learning happens long after the initial exposure.

I’m deep in the internals of the models and finding more gaps in how we quantify and explain phenomena. And there’s plenty of nightmare fuel too. It’s delicious.

I’ve already changed my mind about the ambiguity of specifications and the misery it causes builders, managers, and executives alike. I’ll change my mind about a lot of things this week.

I’ll share more as my mind, and skin, reddens.
