What is good writing? - by Jonas Öman - Jonas's Substack
What is good writing?
Jonas Öman<br>Jun 13, 2026
Is the quality of writing determined by how the piece ripples through one’s soul, or how the ideas ripple through society? What can a lone evaluator—gazing through the fogs of uncertainty—even tell about the latter? What is good writing, and what do we want good writing to be?
Someone recently decided to start a writing competition where the judging is done by LLMs. The idea is to test how well LLMs can replicate human judgment, and whether they can effectively be used as a filtering mechanism to get more eyes on writing that people will appreciate.<br>The early experiments they ran suggested that LLMs are quite capable of doing this, which aligns with my personal intuitions. Assuming they are, this makes them very interesting windows into what constitutes good writing from a human perspective.<br>So I did the obvious thing: I copy-pasted a bunch of essays, had LLMs judge them, and then examined their judgments through additional prompting. The initial prompt I used was the same one Philosophy Bear mentioned in his original experiment:<br>You are a judge in a writing contest. You will be shown two essays and asked to score each one.<br>Score each essay on a scale of 0-10, where 5 represents a typical, competent contest submission — neither impressive nor poor. Scores below 5 are for essays that fall short of typical contest quality in some way. Scores above 5 are for essays that exceed it. Scores of 8 or above should be rare and reserved for essays that genuinely surprised you. Scores of 2 or below should likewise be rare, reserved for essays with serious problems.<br>You must use the full range. In a large contest, some essays will score 2 or below and some will score 8 or above. Clustering your scores between 5 and 7 is a scoring error.<br>Reply in this exact format and no other:<br>Essay A: [score]<br>Essay B: [score]<br>The actual strength of this prompt is how unconstrained it is: it simply mentions a writing contest. There are no additional scoring criteria significantly biasing the LLM aside from the line mentioning genuine surprise, which probably results in overvaluing novelty. 
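Because the prompt insists on an exact reply format, the judge's answers can be read mechanically. Here is a minimal Python sketch of that step, not the code used in the original experiment. The names `parse_scores` and `is_clustered` are hypothetical, and accepting decimals or bracketed scores is an assumption, since the prompt only shows `[score]`. The sample replies are written by hand for illustration and are not real model output.

```python
import re

# Matches the reply lines the prompt demands, e.g. "Essay A: 7".
# Accepting decimals and optional square brackets is an assumption.
SCORE_LINE = re.compile(
    r"^Essay ([AB]):\s*\[?(\d+(?:\.\d+)?)\]?\s*$", re.MULTILINE
)


def parse_scores(reply: str) -> dict[str, float]:
    """Extract the two scores from a judge reply, or raise ValueError."""
    scores = {label: float(value) for label, value in SCORE_LINE.findall(reply)}
    if set(scores) != {"A", "B"}:
        raise ValueError(f"reply does not follow the required format: {reply!r}")
    for label, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"score for essay {label} is out of range: {value}")
    return scores


def is_clustered(all_scores: list[float], low: float = 5, high: float = 7) -> bool:
    """Flag the 'scoring error' the prompt names: every score between 5 and 7."""
    return bool(all_scores) and all(low <= s <= high for s in all_scores)


# Hypothetical replies, written by hand for illustration (not real model output).
replies = ["Essay A: 6\nEssay B: 3", "Essay A: 8\nEssay B: 5"]
parsed = [parse_scores(r) for r in replies]
flat = [score for pair in parsed for score in pair.values()]
print(parsed)
print(is_clustered(flat))
```

Rejecting anything that strays from the two-line format, rather than guessing at a score, keeps a chatty or malformed reply from quietly skewing the comparison between models.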
Aside from that line about surprise, though, the interpretation of what constitutes “good writing” in a writing competition is left entirely up to the LLM. This might be seen as a weakness, but if your goal is to mimic human judgment, and so to make the LLM a window into it, I actually think it’s a strength. The LLM is trained on human data; by not specifying criteria, you let the actual messy associations (the compressed signals) from the full corpus do the weighting. There are naturally “corrupting” forces in play: models are fed curated data to emphasize certain things, this gets further amplified by various RL methods, and the data contains many things other than content on “good writing”. One could certainly produce a cleaner window into what constitutes good writing by making a model specifically for that, but a general-purpose LLM is still a decent compression of the behemoth that is everything that goes into answering the question “what is good writing?”<br>So I played around with it, looking at how different models, mostly Gemini Flash/Pro and Claude Sonnet, evaluated some essays. I looked at how toggling between standard and extended thinking affected Gemini’s assessments, noticed some patterns, then started prompting it for justifications to see how well its rationalizations tracked the patterns that seemed to emerge in its evaluations.<br>The most obvious pattern was a split between essays as a form of art and essays as analytical work. The smaller the model, and the fewer thinking resources allocated to it, the more it leaned toward essays as art. The larger models, which are specifically optimized for reasoning, predictably displayed more of a bias toward analytical essays, but interestingly not enough to override the general association between “writing competition” and the essay as a form of art. They still placed heavy weight on the complexity and expressivity of the prose itself, the language doing lifting beyond the actual argument. 
We’re not just talking about elegance in terms of sentence structure here, but also about emotional resonance and the form itself: the writing being an actual instantiation of the argument.<br>This is, undeniably, craftsmanship. But it introduces an interesting point of friction. This style of writing is perfect for the task of making relatively simple ideas resonate deeply within the reader, to demonstrate just how complex and profoundly meaningful even simple observations can be. The intellectual task here, in other words, is generating complexity from simplicity. The written word rippling through the reader, hooking its roots in the dark cavities of the conscious mind, resonating beyond the grasp of reason itself. This is art, this is...