hydrogen jukeboxes: on the crammed poetics of "creative writing" LLMs – @nostalgebraist on Tumblr
This is a follow-up to my earlier brief rant about the new, unreleased OpenAI model that's supposedly "good at creative writing."
It also follows up on @justisdevan's great post about this model, and Coagulopath's comment on that post, both of which I recommend (and which will help you make sense of this post).
As a final point of introduction: this post is sort of a "wrapper around" this list of shared stylistic "tics" (each with many examples) which I noticed in samples from two unrelated LLMs, both purported to be good at creative writing.
Everything below exists to explain why I found making the list to be an interesting exercise.
Background: R1
Earlier this year, a language model called "DeepSeek-R1" was released.
This model attracted a lot of attention and discourse for multiple reasons (e.g.).
Although it wasn't R1's selling point, multiple people including me noticed that it seemed surprisingly good at writing fiction, with a flashy, at least superficially "literary" default style.
However, if you read more than one instance of R1-written fiction, it quickly becomes apparent that there's something... missing.
It knows a few good tricks. The first time you see them, they seem pretty impressive coming from an LLM. But it just... keeps doing them, over and over – relentlessly, compulsively, to the point of exhaustion.
This is already familiar to anyone who's played around with R1 fiction – see the post and comment I linked at the top for some prior discussion.
Here's a selection from Coagulopath's 7-point description of R1's style in that comment, which should give you the basic gist (emphasis mine):
1) a clean, readable style
2) the occasional good idea [...]
3) an overwhelmingly reliance on cliche. Everything is a shadow, an echo, a whisper, a void, a heartbeat, a pulse, a river, a flower—you see it spinning its Rolodex of 20-30 generic images and selecting one at random.
[...]
5) an eyeball-flatteningly fast pace—it moves WAY too fast. Every line of dialog advances the plot. Every description is functional. Nothing is allowed to exist, or to breathe. It's just rush-rush-rush to the finish, like the LLM has a bus to catch. Ironically, this makes the stories incredibly boring. Nothing on the page has any weight or heft.
[...]
7) repetitive writing. Once you've seen about ten R1 samples you can recognize its style on sight. The way it italicises the last word of a sentence. Its endless "not thing x, but thing y" parallelisms [...]. The way how, if you don't like a story, it's almost pointless reprompting it: you just get the same stuff again, smeared around your plate a bit.
Background: the new OpenAI model
Earlier this week, Sam Altman posted a single story written by, as he put it:
a new model that is good at creative writing (not sure yet how/when it will get released)
Opinions on the sample were... mixed, at best.
I thought it wasn't very good; so did Mills; so did a large fraction of the twitter peanut gallery. Jeanette Winterson (!) liked it, though.
Having already used R1, I felt that this story was not only "not very good" on an absolute scale, but not indicative of an advance over prior art.
To substantiate this gut feeling, I sent R1 the same prompt that Altman had used. Its story wasn't very good either, but was less bad than the OpenAI one in my opinion (though mostly by being less annoying, rather than because of any positive virtue it possessed).
And then – because people who follow AI news tend to be skeptical of negative human aesthetic reactions to AI, while being very impressed with LLMs – I had some fun asking various LLMs whether they thought the R1 story was better or worse than the OpenAI story. (Mostly, they agreed with me. BTW I've put the same story up in a more readable format here.)
But, as I was doing this, something else started to nag at me.
Apart from the question of whether R1's story was better or worse, I couldn't help but notice that the two stories felt very, very similar.
I couldn't shake the sense that the OpenAI story was written in "R1's style" – a narrow, repetitive, immediately recognizable style that doesn't quite resemble that of any human author I've ever read.
I'm not saying that OpenAI "stole" anything from DeepSeek, here. In fact, I doubt that's the case.
I don't know why this happened, but if I had to guess, I would guess it's convergent evolution: maybe this is just what happens if you optimize for human judgments of "literary quality" in some fairly generic, obvious, "naive" manner. (Just like how R1 developed some of the same quirky "reasoning"-related behaviors as OpenAI's earlier model o1, such as saying "wait" in the middle of an inner monologue and then pivoting to some new idea.)
A mechanical boot, a human eye: the "R1 style" at its...