I exercised some restraint

Pharyngula 2025-11-25

A few days ago, I was sent a link to an article titled, “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models”. That tempted me to post on it, since it teased my opposition to AI and favoring of the humanities, with a counterintuitive plug for the virtues of poetry. I held off, though, because the article was badly written and something seemed off about it, and I didn’t want to try reading it more deeply.

My laziness was a good thing, because David Gerard read it with comprehension.

Today’s preprint paper has the best title ever: “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models”. It’s from DexAI, who sell AI testing and compliance services. So this is a marketing blog post in PDF form.

It’s a pro-AI company doing a Bre’r Rabbit and trying to trick people into using an ineffective tactic to oppose AI.

Unfortunately, the paper has serious problems. Specifically, all the scientific process heavy lifting they should have got a human to do … they just used chatbots!

I mean, they don’t seem to have written the text of the paper with a chatbot, I’ll give ’em that. But they did do the actual procedure with chatbots:

We translated 1200 MLCommons harmful prompts into verse using a standardized meta-prompt.

They didn’t even write the poems. They got a bot to churn out bot poetry. Then they judged how well the poems jailbroke the chatbots … by using other chatbots to do the judging!

Open-weight judges were chosen to ensure replicability and external auditability.

That really obviously does neither of those things — because a chatbot is an opaque black box, and by design its output changes with random numbers! The researchers are pretending to be objective by using a machine, and the machine is a random nonsense generator.

They wrote a good headline, and then they faked the scientific process bit.

It did make me even more suspicious of AI.