Can you hear the screech of moving goal-posts?
This week: Persona vectors, detection of financial bubbles, deep researcher, what do algorithms want?
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Obstruction is such a framing; rather, it’s emergent misalignment
“In addition to deployment-time fluctuations, training procedures can also induce unexpected personality changes. Betley et al. (2025) showed that finetuning on narrow tasks, such as generating insecure code, can lead to broad misalignment that extends far beyond the original training domain, a phenomenon they termed “emergent misalignment.””

“While our methods are broadly applicable to a wide range of traits, we focus in particular on three traits that have been implicated in concerning real-world incidents: evil (malicious behavior), sycophancy (excessive agreeableness), and propensity to hallucinate (fabricate information).”

“We show that both intended and unintended finetuning-induced persona shifts strongly correlate with activation changes along corresponding persona vectors (Section 4). These shifts can be reversed by post-hoc inhibition of the persona vector. Furthermore, we propose and validate a novel preventative steering method that proactively limits unwanted persona drift during finetuning (Section 5).”
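The recipe behind a persona vector is easy to picture even without the authors’ code: take the difference in mean activations between responses that exhibit a trait and responses that don’t, then use that direction to monitor drift and, if needed, subtract it back out. A minimal numpy sketch with simulated activations (the array sizes and the inhibit helper are illustrative stand-ins, not the paper’s implementation):

```python
# Minimal sketch of the persona-vector idea (not the authors' code).
# Assumes we already collected hidden-state activations from a model on
# responses that do / do not exhibit a trait (e.g. sycophancy).
import numpy as np

rng = np.random.default_rng(0)
d = 64                                          # hidden size (toy)
trait_acts = rng.normal(0.5, 1.0, (200, d))     # activations on trait-exhibiting samples
neutral_acts = rng.normal(0.0, 1.0, (200, d))   # activations on neutral samples

# 1) Persona vector: difference of mean activations, normalized.
persona_vec = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)

# 2) Monitoring: project a new activation onto the vector.
def trait_score(hidden_state: np.ndarray) -> float:
    return float(hidden_state @ persona_vec)

# 3) Post-hoc inhibition: remove (or scale down) the component along the vector.
def inhibit(hidden_state: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    return hidden_state - alpha * (hidden_state @ persona_vec) * persona_vec

h = trait_acts[0]
print(trait_score(h), trait_score(inhibit(h)))  # score drops after inhibition
```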

“Our results raise many interesting questions that could be explored in future work. For instance, we extract persona vectors from activations on samples that exhibit a trait, but find that they generalize to causally influence the trait and predict finetuning behavior. The mechanistic basis for this generalization is unclear, though we suspect it has to do with personas being latent factors that persist for many tokens (Wang et al., 2025); thus, recent expression of a persona should predict its near-future expression. Another natural question is whether we could use our methods to characterize the space of all personas. How high-dimensional is it, and does there exist a natural “persona basis”?”

“Do correlations between persona vectors predict co-expression of the corresponding traits? Are some personality traits less accessible using linear methods? We expect that future work will enrich our mechanistic understanding of model personas even further.”
Chen, R., Arditi, A., Sleight, H., Evans, O., & Lindsey, J. (2025). Persona Vectors: Monitoring and Controlling Character Traits in Language Models. arXiv preprint arXiv:2507.21509.
https://arxiv.org/abs/2507.21509
Detection of financial bubbles using a log‐periodic power law singularity (LPPLS) model
Will agents behave this way?
“The LPPLS model contains the ingredients: (1) a system of traders who are influenced by their neighbors, (2) local imitation propagating spontaneously into global cooperation, (3) global cooperation between traders leading to crash, and (4) prices related to the properties of this system (Johansen & Sornette, 2001a). The LPPLS model provides a flexible and quantitative framework for detecting financial bubbles by analyzing the temporal price series of an asset.”
“The log-periodic oscillations of price growth of an asset in a bubble regime results from the competition between different types of traders. The interplay between technical trading and herding (positive feedback) and fundamental valuation investments (negative mean-reverting feedback) is the main source of price dynamic. The second-order oscillations of price dynamics results from the presence of inertia between information gathering and analysis on the one hand and investment implementation on the other hand, and the coexistence of trend followers and value investing (Farmer, 2002).”
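Concretely, the LPPLS expectation of log-price combines faster-than-exponential growth toward a critical time tc with log-periodic oscillations: ln E[p(t)] = A + B(tc − t)^m + C(tc − t)^m cos(ω ln(tc − t) − φ). A small Python sketch with made-up parameters (not the paper’s calibration), just to show the shape:

```python
# Sketch of the LPPLS (log-periodic power law singularity) price expectation,
# ln E[p(t)] = A + B*(tc - t)^m + C*(tc - t)^m * cos(w*ln(tc - t) - phi),
# with illustrative parameter values, not fitted to any real series.
import numpy as np
import matplotlib.pyplot as plt

def lppls_log_price(t, tc, m, w, A, B, C, phi):
    dt = tc - t                      # time remaining until the critical time tc
    return A + B * dt**m + C * dt**m * np.cos(w * np.log(dt) - phi)

t = np.linspace(0, 0.99, 500)        # observation window ending just before tc = 1.0
log_p = lppls_log_price(t, tc=1.0, m=0.5, w=8.0, A=1.0, B=-0.6, C=0.05, phi=0.0)

plt.plot(t, log_p)
plt.xlabel("t"); plt.ylabel("ln E[p(t)]")
plt.title("Faster-than-exponential growth with log-periodic oscillations")
plt.show()
```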

“We find that a series of positive LPPLS confidence indicator clusters of the S&P 500 index have formed from August 2020 to January 2022. The positive LPPLS confidence indicator reaches the global peak value of 26.4% on Nov 30, 2021, which is a rare situation for stock markets, confirming that the price trajectory of S&P 500 index is indeed in a positive bubble regime and the system goes critical and becomes exquisitely sensitive to a small perturbation. Thus, the strong corrections of S&P 500 index from January 2022 stem from the increasingly systemic instability of the stock market itself, while the well-known external shocks, e.g., the decades-high inflation, aggressive monetary policy tightening by the Federal Reserve, and the impact of the Russia/Ukraine war, only serve as sparks. Further, the results show that the long-term LPPLS confidence indicator has better and more stable performance than the short-term LPPLS confidence indicator.”
Shu, M., & Song, R. (2024). Detection of financial bubbles using a log‐periodic power law singularity (LPPLS) model. Wiley Interdisciplinary Reviews: Computational Statistics, 16(2), e1649.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4734944
Deep Researcher with Test-Time Diffusion
Isn’t denoising really… … … analytics? Science?
“Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TTD-DR). This novel framework conceptualizes research report generation as a diffusion process. TTD-DR initiates this process with a preliminary draft, an updatable skeleton that serves as an evolving foundation to guide the research direction. The draft is then iteratively refined through a "denoising" process, which is dynamically informed by a retrieval mechanism that incorporates external information at each step. The core process is further enhanced by a self-evolutionary algorithm applied to each component of the agentic workflow, ensuring the generation of high-quality context for the diffusion process.”


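Read as pseudocode, the core loop is simple: treat the current draft as a noisy state, plan a retrieval from its weakest claim, and revise. A hedged sketch below, where llm and search are hypothetical stand-ins and the paper’s self-evolutionary refinement of each workflow component is omitted:

```python
# Minimal sketch of a TTD-DR-style loop: a draft report is treated as a noisy
# state and iteratively "denoised" with retrieved evidence.
# `llm` and `search` are hypothetical stand-ins, not the paper's components.
from typing import Callable, List

def deep_research(question: str,
                  llm: Callable[[str], str],
                  search: Callable[[str], List[str]],
                  steps: int = 5) -> str:
    # Initial "noisy" draft: a skeleton answer written without any evidence.
    draft = llm(f"Write a rough, skeleton report answering: {question}")
    for _ in range(steps):
        # Plan the next retrieval from the current draft's gaps.
        query = llm(f"Given this draft:\n{draft}\n"
                    f"Write one search query targeting its weakest claim.")
        evidence = "\n".join(search(query))
        # Denoising step: revise the draft using the retrieved evidence.
        draft = llm(f"Revise the draft below using the evidence.\n"
                    f"Draft:\n{draft}\nEvidence:\n{evidence}")
    return draft
```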

Han, R., Chen, Y., CuiZhu, Z., Miculicich, L., Sun, G., Bi, Y., ... & Lee, C. Y. (2025). Deep Researcher with Test-Time Diffusion. arXiv preprint arXiv:2507.16075.
https://arxiv.org/abs/2507.16075
What do algorithms want? A new paper on the emergence of surprising behavior in the most unexpected places
“I don’t know, have you ever tried training it?”

“Sometimes people ask: “with your minds-everywhere framework, you might as well say the weather is intentional too!”. The assumption being that 1) these things can be decided from an armchair (by logic alone), and 2) that this would be an unacceptable implication of a theory (i.e., we can decide in advance, by definition, that whatever else, systems like the weather cannot be anywhere on the cognitive spectrum).
My answer is: “I don’t know, have you ever tried training it? We won’t know until we do.””
“To summarize: these basic algorithms have unexpected competencies to solve the problem we explicitly designed them for (in sorting space), and also apparently have behaviors (maximizing algotype clustering – a meta-property in morphogenetic space) that we had no idea about until we looked for it. My suspicions (which we are now testing) is that this may be fairly general, and that once we look, many (most? all?) algorithms will prove to have unexpected tendencies and capabilities. I think the continuity we see in development and evolution is far deeper than we realize. As the diverse intelligence field increasingly finds forms of learning, decision-making, goal-directed activity, and other emergent competencies in minimal unconventional substrates, some who want cognition to be a magically unique property of advanced brains will say “that’s not really what we meant by these terms”. Listen closely – can you hear the screech of the goal-posts being moved?”
Reader Feedback
“Isn’t every post on Goodreads, in some way, promotional?”
Footnotes
We finally arrive at the part of the cycle where it’s fashionable to call out the cultish behaviour of believers, panic about Meta’s hiring freeze, and stake claims in the I-Told-You-It-Was-A-Bubble sweepstakes. The stakes have never been higher.
Engaging somebody who believes in the AGI Superintelligence Godhead is about as rewarding as engaging a flat earther. Dude, it just doesn’t matter. And it’s boring. Because what’s the point? Serious inquiry as to what AGI really means is met with the same kind of fuzzy definitions and hand-waving you get from a flat earther. Can you hear the screech of moving goal-posts?
The point isn’t that it has to make sense. The point is in the belief-speaking identity-reinforcement for the belieber. No shade. Some people like cults. A big part of liberty is having the freedom to make up your own mind. Do it, lady.
Yeah, it’s a bubble, and those wobbles in price are oscillations in belief itself. As above, so below.
That isn’t interesting.
What is interesting: what’s finally possible with the overbuilt infrastructure. How many minds have been changed about data? Finally, data quality appears to matter more in more contexts. Data governance is on the front burner. Conversational interface design is back after a deep trough. The brightest teams are meshing AI-classic with recent advancements in diffusion.
Actual problems are getting some attention.
What happens to all this compute?
NVIDIA appears to have gotten the data insight. Guess what they’re doing? They’re generating clean data streams from the physical world. They’re training software from how machines move. They’re instrumenting for observability of physical processes.
And that’s interesting.