What genre of writing is AI-generated poetry?

Statistical Modeling, Causal Inference, and Social Science 2024-11-22

This is Jessica. There was a quiz on detecting AI-generated poetry making the rounds on social media yesterday that caught my attention. Poetry is a topic I’m always curious about people’s perceptions of, since I suspect I know more about it than the average person. When I was younger I was very serious about it and even did an MFA in experimental poetics, at a program founded and directed by the Beat poets Allen Ginsberg and Anne Waldman (though sadly Ginsberg was gone by the time I arrived in the mid aughts).

The quiz felt very easy – I got them all correct with little effort. Then I saw that it was created in response to a new Nature paper on how well people could detect AI-generated poems from a set (which included the poems in the quiz). The authors claim that “participants were more likely to judge AI-generated poems as human-authored than actual human-authored poems (χ2(2, N = 16,340) = 247.04, p < 0.0001).” I had figured I’d probably do better than the average person, but I was surprised at first to learn that the trend was actually reversed among participants.

Here’s the abstract of the paper:

As AI-generated text continues to evolve, distinguishing it from human-authored content has become increasingly difficult. This study examined whether non-expert readers could reliably differentiate between AI-generated poems and those written by well-known human poets. We conducted two experiments with non-expert poetry readers and found that participants performed below chance levels in identifying AI-generated poems (46.6% accuracy, χ2(1, N = 16,340) = 75.13, p < 0.0001). Notably, participants were more likely to judge AI-generated poems as human-authored than actual human-authored poems (χ2(2, N = 16,340) = 247.04, p < 0.0001). We found that AI-generated poems were rated more favorably in qualities such as rhythm and beauty, and that this contributed to their mistaken identification as human-authored. Our findings suggest that participants employed shared yet flawed heuristics to differentiate AI from human poetry: the simplicity of AI-generated poems may be easier for non-experts to understand, leading them to prefer AI-generated poetry and misinterpret the complexity of human poems as incoherence generated by AI.
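As a quick sanity check, the first statistic in the abstract can be roughly reconstructed from the reported numbers alone. This is a back-of-the-envelope sketch that assumes a simple one-degree-of-freedom goodness-of-fit test of 46.6% accuracy against chance (50%); the paper’s exact analysis may differ.

```python
# Rough reconstruction of the reported chi-square test: 46.6% accuracy
# over N = 16,340 judgments, compared against chance (50%).
n = 16340
correct = round(0.466 * n)   # observed correct judgments
wrong = n - correct          # observed incorrect judgments
expected = n / 2             # expected count under chance

chi2 = ((correct - expected) ** 2 / expected
        + (wrong - expected) ** 2 / expected)
print(round(chi2, 1))  # ~75.7, near the reported 75.13
```

The small gap from the published 75.13 is consistent with “46.6%” being a rounded accuracy figure.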

To anyone who reads or follows poetry, especially the more avant-garde kind, it’s well known that the kind of poetry that leads to accolades is often esoteric, challenging, and not necessarily all that pleasant to read. Part of the reason I lost interest in poetry myself is that at its peak it is so obscure, and the audience for it so small. I don’t think it’s a stretch to say that the point of what experts consider “good” poetry is to subvert language, to defy expectations in a way that is confusing but opens up room for some unexpected sense of familiarity or recognition. This is true even of the classic, old-wealth New England scene that movements like the Beats were distancing themselves from. Consider Robert Lowell penning lines like “the glassy bowing and scraping of my will.” Or John Ashbery, whose work put him at the pinnacle of established poetry but also made him a major influence on many outsider movements through his rejection of establishment conventions.

In its more extreme forms, poetry forces the reader to “stop making sense” at all. Gertrude Stein was a master of this. For how long can one continue trying to apply the usual tools of reading to extract meaning when faced with something like this? 

It is not a range of a mountain
Of average of a range of a average mountain
Nor can they of which of which of arrange
To have been not which they which
Can add a mountain to this.
Upper an add it then maintain
That if they were busy so to speak
Add it to and
It not only why they could not add ask
Or when just when more each other
There is no each other as they like
They add why then emerge an add in
It is of absolutely no importance how often they add it.

There’s a sense in which all poetry is deviant and, for this reason, vulnerable even when it is most established or authoritative. I’m reminded of Anne Waldman frequently arguing that we must “Keep the world safe for poetry.”

So now I’m wondering: is poetry the antithesis of today’s large language models? Intentionally breaking typical structures of language, so that the juxtaposition of each new line, or even each new word, surprises the reader’s expectations, seems pretty opposed to what we expect when we combine autoregressive, predict-the-next-word objectives with post-hoc adjustment through procedures like RLHF, where people (not usually selected for their expertise) are shown many model outputs and asked to state their preferences.
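The tension can be made concrete in terms of surprisal, the negative log-probability a language model assigns to the next word. Here is a toy sketch; the distribution below is invented for illustration, not taken from any real model.

```python
import math

# Hypothetical next-word probabilities after a prefix like "a range of a".
# These numbers are made up: the point is that a trained autoregressive
# model concentrates probability mass on conventional continuations.
next_word_probs = {
    "mountain": 0.30,  # the expected, easy-to-predict continuation
    "hills": 0.20,
    "valley": 0.15,
    "average": 0.001,  # the Stein-style swerve that breaks expectations
}

def surprisal_bits(word):
    """Negative log2-probability: high when the model finds a word unlikely."""
    return -math.log2(next_word_probs[word])

print(round(surprisal_bits("mountain"), 1))  # ~1.7 bits
print(round(surprisal_bits("average"), 1))   # ~10.0 bits
```

The training objective pushes mass toward low-surprisal continuations, and RLHF raters reward outputs they like, which is roughly the opposite of a poetics built on high-surprisal juxtaposition.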

My colleague Matt Groh points out how results like the Nature study illustrate problems inherent to imitation game research: if you lack domain expertise and much knowledge of modern AI’s capabilities and limitations, it can be easy to fall for and even prefer simulacra. AI-generated content can be discernibly different from human-generated content, but in a way that seems more likable. 

It makes me wonder how many other genres of writing there are where asking people how difficult or awkward the writing is would be a good predictor of which texts were produced by human experts. Academic writing generated by ChatGPT is another example that seems fairly easy for those with lots of domain knowledge to detect, but where their preferences might be opposed to what non-experts prefer or associate with the genre.

P.S. This post reminds me of this paper we wrote on the role of aesthetic judgments in assessing the nature of modern AI, where we talk about how the ways we’ve learned to read art transfer to what we look for in AI outputs. It’s an interesting counterpoint to this discussion.