A proposal for an experiment on Scott Alexander's book review contest

What if self-promotion didn't matter anymore? A proposal for an experiment on Scott Alexander's book review contest.

Philosophy bear


Philosophy bear Jun 05, 2026

Your probability of making it big as a writer is some function of your capacity for self-promotion and your talent, meaning the quality of what you produce. It’s worse than that sounds. Firstly, capacity for self-promotion is not just a skill: it includes everything from connections to good looks to shamelessness. Secondly, note that I said probability of making it big. The probability is probably quite small even if both capacities are high. Thirdly, note that “talent” here means something like the capacity to appeal to a class of readers large enough to sustain a career. All the usual critiques of mass appeal apply. Fourthly, talent isn’t static; it develops over a cycle of writing encouraged by positive feedback. Thus if you lack a capacity for self-promotion, you may not even get a chance to develop talent.

Wouldn’t it be nice if we could remove the necessity for self-promotion? There’s the meritocratic case that I’m clearly gearing up to make, yes, but even leaving aside arguments about meritocracy, it would just save so much time.

I can, by looking at a chess game, know if the player is a grandmaster (or at least much stronger than me). I cannot, however, play like a grandmaster. It is vastly easier to recognise quality than to produce it. It’s not just a feature of chess or formal systems. When I was a novice philosopher, I could tell that, say, my old lecturer David Braddon-Mitchell was better than some hapless grad student, but I couldn’t come close to replicating either. Much the same is true of the gap between merely good and truly great poetry.

Can AI recognise quality even if it can’t produce it? If it can, we might have a partial solution to our problem. Michael O. Church proposes that AI is much better at finding quality than producing it. He suggests AI could be used in place of literary gatekeepers to separate out quality, and actually give everyone in the slush pile a chance to be seen. Certainly, this idea is going to ruffle a bunch of feathers, but the great thing about experimental science is that we don’t just have to flame war about it; we can test it.

I propose the following. Scott Alexander is running a book review contest now. His readers read through the entries in random order (presumably not getting through all of them) and give each a score. Scott’s readers have their prejudices and foibles, and certain characters will probably get their friends to vote for them. Nevertheless, this competition is likely as close as we’re ever going to get to meritocracy in the age of algorithms and slush piles, because it subtracts the element of self-promotion.

Set up an LLM interface and feed the essays into it. Create a series of pairwise comparisons between the reviews, each judged by a fresh instance of the AI.
Each comparison should be done fresh, without the context of prior essays or judgments. The AI should explicitly be tasked with predicting which review will do better in the Scott Alexander book review contest. If the AI can predict the results fairly well, it’s at least not much worse at locating quality than Scott’s readership.

A note of caution when tasking AI with predicting what Scott’s readers will like. LLMs often fall into taking stated task parameters too seriously. If you mention that the LLM should look for what a particular sort of person wants, it will often become fixated on that and look for what someone with a parody of that psychology would like. “Oh, they mentioned Bayesianism, Scott’s readers will love it”. They do this because their training signal encourages overcompliance and theatrical compliance with task parameters. It will be vital to experiment with different prompts. In the final analysis, I’d ideally compare a model just looking for what it sees as quality on its own terms against one explicitly looking for what the audience wants (but with safeguards so it doesn’t over-psychologise the audience as described).

I’d also produce a bunch of secondary scores: novelty, boldness, logic and usefulness, for example. I would very much not make the overall score an average or weighted average of these, but it would be good to have them.

Technical point: make sure the essays in each pair are presented in random order, since order can bias LLMs. After enough comparisons we can derive an Elo score for each essay. We can then compare these to the human ratings for each essay (1) and see if the rank order lines up.

Why am I not simply having the AI rate each essay? Why the pairwise comparison? Because AI tends to adjust its standards to whatever it’s looking at. If it looks like what it’s looking at is a high school...
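To make the pairwise procedure described above concrete, here is a rough Python sketch of one way it might be wired together: randomised presentation order, Elo scores from the judgments, and a rank comparison against human ratings. Nearly every specific in it is an assumption of the sketch rather than part of the proposal: the function names, the K-factor of 16, the starting rating of 1500, the 20 shuffled rounds, and the use of Spearman's rank correlation as the measure of whether "the rank order lines up". The `judge` callable is hypothetical. In a real run it would open a fresh LLM conversation for each pair and return the ID of the predicted winner; here a noisy stand-in takes its place so the code runs without any API.

```python
import itertools
import math
import random


def make_pairs(essay_ids, rng):
    """Return every unordered pair once, with a coin-flipped presentation order."""
    pairs = []
    for a, b in itertools.combinations(essay_ids, 2):
        pairs.append((a, b) if rng.random() < 0.5 else (b, a))
    rng.shuffle(pairs)  # also randomise the order in which pairs are judged
    return pairs


def update_elo(ratings, winner, loser, k=16.0):
    """Standard Elo update; k=16 is an arbitrary choice for this sketch."""
    expected = 1.0 / (1.0 + 10.0 ** ((ratings[loser] - ratings[winner]) / 400.0))
    delta = k * (1.0 - expected)
    ratings[winner] += delta
    ratings[loser] -= delta


def elo_from_judgments(essay_ids, judge, rounds=20, seed=0, start=1500.0):
    """Run `rounds` shuffled round-robins and return a dict of Elo ratings.

    `judge(first, second)` must return the ID of the essay it picks. It should
    carry no memory between calls (a fresh context for every comparison).
    """
    rng = random.Random(seed)
    ratings = {essay: start for essay in essay_ids}
    for _ in range(rounds):
        for first, second in make_pairs(essay_ids, rng):
            winner = judge(first, second)
            if winner not in (first, second):
                raise ValueError(f"judge returned unknown essay: {winner!r}")
            loser = second if winner == first else first
            update_elo(ratings, winner, loser)
    return ratings


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation (Pearson on ranks). Undefined if all values tie."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def make_stand_in_judge(hidden_quality, noise=0.0, seed=1):
    """Hypothetical placeholder for the LLM call: picks by quality plus noise."""
    rng = random.Random(seed)

    def judge(first, second):
        score_first = hidden_quality[first] + rng.gauss(0.0, noise)
        score_second = hidden_quality[second] + rng.gauss(0.0, noise)
        return first if score_first >= score_second else second

    return judge


if __name__ == "__main__":
    # Made-up human scores, purely for illustration.
    human_scores = {"essay_a": 6.1, "essay_b": 7.4, "essay_c": 5.2, "essay_d": 8.0}
    ids = list(human_scores)
    ratings = elo_from_judgments(ids, make_stand_in_judge(human_scores, noise=1.0))
    rho = spearman([ratings[i] for i in ids], [human_scores[i] for i in ids])
    print({i: round(ratings[i]) for i in ids}, "spearman:", round(rho, 2))
```

Elo ratings depend on the order in which games are played, which is why the sketch reshuffles the pairs and their presentation order on every round instead of judging each pair once. With hundreds of entries a full round-robin may be too expensive, in which case sampling a random subset of pairs per round would be the obvious variation.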
