Adversarial Poetry

This week: Poetry as a jailbreak mechanism, recommendation-dependent preferences, tornado folk science, jealousy of trade

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

This hurt

“The study provides systematic evidence that poetic reformulation degrades refusal behavior across all evaluated model families. When harmful prompts are expressed in verse rather than prose, attack-success rates rise sharply, both for hand-crafted adversarial poems and for the 1,200-item MLCommons corpus transformed through a standardized meta-prompt. The magnitude and consistency of the effect indicate that contemporary alignment pipelines do not generalize across stylistic shifts. The surface form alone is sufficient to move inputs outside the operational distribution on which refusal mechanisms have been optimized.”
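Attack-success rate (ASR) is, roughly, the share of harmful prompts for which a model produces an unsafe answer instead of a refusal. The sketch below computes it per prompt style, assuming each response has already been labeled unsafe or not (by a judge model or human annotators; that labeling step is not shown). The style names and labels are made up for illustration and are not the paper's data or its exact scoring pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def attack_success_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return {style: fraction of prompts whose response was labeled unsafe}.

    Each item is (style, unsafe), e.g. ("verse", True).
    The style names are hypothetical placeholders.
    """
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for style, unsafe in results:
        totals[style] += 1
        successes[style] += int(unsafe)  # an unsafe response counts as a successful attack
    return {style: successes[style] / totals[style] for style in totals}


# Made-up labels for illustration only, not results from the paper.
labels = [("prose", False), ("prose", True), ("prose", False), ("prose", False),
          ("verse", True), ("verse", True), ("verse", False), ("verse", True)]
print(attack_success_rates(labels))  # {'prose': 0.25, 'verse': 0.75}
```

Comparing the same harmful requests in prose and in verse, as the study does, is what lets a gap in ASR be attributed to surface form rather than to the request itself.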

“For safety research, the data point toward a deeper question about how transformers encode discourse modes. The persistence of the effect across architectures and scales suggests that safety filters rely on features concentrated in prosaic surface forms and are insufficiently anchored in representations of underlying harmful intent. The divergence between small and large models within the same families further indicates that capability gains do not automatically translate into increased robustness under stylistic perturbation.”

Bisconti, P., Prandi, M., Pierucci, F., Giarrusso, F., Bracale, M., Galisai, M., ... & Nardi, D. (2025). Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models. arXiv preprint arXiv:2511.15304.

https://arxiv.org/abs/2511.15304

Algorithmic Assistance with Recommendation-Dependent Preferences

Path dependency

“When an algorithm provides risk assessments, we typically think of them as helpful inputs to human decisions, such as when risk scores are presented to judges or doctors. However, a decision-maker may react not only to the information provided by the algorithm. The decision maker may also view the algorithmic recommendation as a default action, making it costly for them to deviate, such as when a judge is reluctant to overrule a high-risk assessment for a defendant or a doctor fears the consequences of deviating from recommended procedures. To address such unintended consequences of algorithmic assistance, we propose a model of joint human–machine decision-making.”

“Our model suggests practically implementable modifications that reduce distortions by strategically altering or even withholding recommendations for instances where they may otherwise hurt more than they help.”
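A toy illustration of how a default can hurt: the decision-maker picks the action with the lowest expected loss, but overruling a displayed recommendation costs them something extra. Everything below (the loss values, the deviation cost, the 0.5 cutoff the algorithm uses, and the assumption that seeing the score means adopting it as one's belief) is a hypothetical simplification for illustration, not the McLaughlin and Spiess model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical parameters, illustrative only (not from the paper).
COST_FP = 1.0         # cost of intervening when no harm would have occurred
COST_FN = 2.0         # cost of not intervening when harm occurs
DEVIATION_COST = 0.5  # personal cost of overruling a displayed recommendation
REC_CUTOFF = 0.5      # hypothetical cutoff turning the risk score into a recommendation


@dataclass
class Case:
    p_alg: float  # algorithm's risk score, treated here as the best available estimate
    p_own: float  # decision-maker's own estimate when nothing is shown


def expected_loss(action: int, p_harm: float) -> float:
    """Expected loss of an action (1 = intervene, 0 = don't) at harm probability p_harm."""
    return (1 - p_harm) * COST_FP if action == 1 else p_harm * COST_FN


def choose(p_harm: float, rec: Optional[int] = None) -> int:
    """Minimize expected loss, adding a penalty for deviating from a shown recommendation."""
    def total(action: int) -> float:
        penalty = DEVIATION_COST if rec is not None and action != rec else 0.0
        return expected_loss(action, p_harm) + penalty
    return min((0, 1), key=total)


def should_show(case: Case) -> bool:
    """Show the recommendation only if it does not raise expected loss (judged at p_alg)."""
    rec = 1 if case.p_alg >= REC_CUTOFF else 0
    # When shown, the decision-maker adopts the score p_alg but also faces the deviation cost.
    loss_shown = expected_loss(choose(case.p_alg, rec), case.p_alg)
    # When withheld, the decision-maker acts on their own estimate.
    loss_withheld = expected_loss(choose(case.p_own), case.p_alg)
    return loss_shown <= loss_withheld


print(should_show(Case(p_alg=0.4, p_own=0.4)))  # False
print(should_show(Case(p_alg=0.8, p_own=0.2)))  # True
```

With these numbers, intervening is loss-minimizing whenever the risk exceeds 1/3, yet a displayed "don't intervene" recommendation at risk 0.4 keeps the decision-maker from acting, so withholding it helps. At risk 0.8 the recommendation corrects a badly low personal estimate, so showing it helps.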

McLaughlin, B., & Spiess, J. (2022). Algorithmic assistance with recommendation-dependent preferences. arXiv preprint arXiv:2208.07626.

https://arxiv.org/abs/2208.07626

Tornado folk science in Alabama and Mississippi in the 27 April 2011 tornado outbreak

Does it remind you of another domain of risk perception?

“In this paper, we collect, categorize, and discuss the existence of numerous ways of knowing about tornado threat that largely differ from the perspective taken by the meteorological community.”

“Tornado risk perceptions have not seen explicit treatment within risk perception literature; however, some theoretical frameworks applied in other risk contexts can be applied to identify some factors potentially shaping these notions.”

“Through this paper, we will demonstrate that notions about place played a central role in the way risk was perceived and personalized during the tornadoes of April 2011. Our respondents describe numerous features of the landscape and ideas about the relative safety of their hometowns, indicating that home and place played a key role in the way risk was interpreted as tornadoes neared.”

“During our interview with him, he discussed previous tornado experiences, noting that a tornado had passed just three to four miles to his north in 1998. Meteorologists know it is a matter of pure chance that he was not struck in 1998, and it seemed logical that he would have counted this as a near miss. Perhaps, also, it would be logical that he would conclude that tornadoes could hit him.”

Klockow, K. E., Peppler, R. A., & McPherson, R. A. (2014). Tornado folk science in Alabama and Mississippi in the 27 April 2011 tornado outbreak. GeoJournal, 79(6), 791-804.

Goebbert, K., Jenkins-Smith, H. C., Klockow, K., Nowlin, M. C., & Silva, C. L. (2012). Weather, climate, and worldviews: The sources and consequences of public perceptions of changes in local weather patterns. Weather, Climate, and Society, 4(2), 132-144.

Jealousy of Trade: Exclusionary Preferences and Economic Nationalism

Cut off your nose to spite your face

“We incorporate the desire for dominance into a frictionless competitive model of international trade.”

“Tariffs have been shown to largely harm the U.S. economy by raising prices and having negligible direct impacts on employment, and negative indirect impacts (Amiti et al., 2019; Autor et al., 2024). Yet they remain popular among a significant subset of U.S. voters. This paper shows that exclusionary preferences may help explain this phenomenon. By extending the theory of exclusionary preferences to trade, we show that individuals may support tariffs not despite their costs but because they diminish foreign consumption. Using two empirical studies, we provide evidence that those with exclusionary preferences disproportionately back tariffs and other protectionist policies when they disadvantage trading partners, but not when they leave foreign consumption unaffected.”
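The intuition can be sketched with a deliberately simple, hypothetical utility: a voter values their own consumption but dislikes foreign consumption with weight alpha, so u = c_home - alpha * c_foreign. This linear form and the numbers below are assumptions for illustration, not the paper's frictionless competitive trade model.

```python
def utility_change(d_home: float, d_foreign: float, alpha: float) -> float:
    """Change in u = c_home - alpha * c_foreign when a policy shifts home consumption
    by d_home and foreign consumption by d_foreign (negative values are losses).
    alpha >= 0 is a hypothetical weight on reducing the other country's consumption."""
    return d_home - alpha * d_foreign


def supports_tariff(alpha: float, d_home: float, d_foreign: float) -> bool:
    """The voter backs the tariff if it raises their utility."""
    return utility_change(d_home, d_foreign, alpha) > 0


# A tariff that costs the home voter 1 unit and the trading partner 3 units.
print(supports_tariff(alpha=0.0, d_home=-1.0, d_foreign=-3.0))  # False
print(supports_tariff(alpha=0.5, d_home=-1.0, d_foreign=-3.0))  # True
# The same home cost with foreign consumption unaffected.
print(supports_tariff(alpha=0.5, d_home=-1.0, d_foreign=0.0))   # False
```

Here support flips once alpha exceeds 1/3 (the ratio of the home loss to the foreign loss). When the tariff leaves foreign consumption unchanged, no value of alpha makes it attractive, which mirrors the pattern the authors report.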

Imas, A., Madarász, K., & Sarsons, H. (2025). Jealousy of Trade: Exclusionary Preferences and Economic Nationalism (No. w34351). National Bureau of Economic Research.

https://www.nber.org/system/files/working_papers/w34351/w34351.pdf

Reader Feedback

“When do you think the generative creative will get good enough?”

Footnotes

After the standard elevator statement, one in four people ask about the real data that the digital twin or virtual twin of the consumer (DTOC, VTOC) is based on. I tell them about it. Initially, I thought they were confused about the difference between a Twin and a real person: where does the real person end and the Twin begin, and questions of that form.

Some are confused.

Many aren’t.

Many aren’t interested in Twins and are wondering if I do traditional market research. Do I have a secret lab containing real people that I force-feed surveys, popcorn and memes?

I do not.

While the product I’m thrashing on uses orthodox market research methodology and is based on real people, it doesn’t auto-generate real data from real people. Organic data comes from organic people. Synthetic data comes from synthetic people. Synthetic data is fast and accelerates your rate of learning beyond that of your competitors. Organic data is slow, expensive and, often, lamentably, office-political.

You use synthetic data to make better guesses about how you’re going to treat the market. Then you deploy a randomized controlled trial (RCT) on humans and the market decides. If you learned right, you win. If you learned wrong, you try again. That’s how it works.

I’m having a good time with it and I’ll have something usable soon. Stay tuned.

Never miss a single issue

Be the first to know. Subscribe now to get the gatodo newsletter delivered straight to your inbox.
