Why Does AI Love Writing About Lighthouse Keepers? – Unite.AI
Asked to ‘write a story’, ChatGPT and other leading language models appear to be avoiding copyright infringement by obsessive recourse to the same small and strange cast of lighthouse-keepers, fishermen and clockmakers.
A new study from Cornell University has found that leading language models seem to have a strange obsession with a very narrow selection of narrative elements when asked simply to ‘write a story’. After prompting four LLMs to write 20,000 stories, the researchers found that 88% of the stories featured at least one of 11 very specific tokens, in the category of ‘location’, ‘name’, or ‘profession’:<br>The occurrences of unlikely keywords, represented here in parts per million, obtained by the researchers’ analysis of 20,000 LLM-generated stories. Source
The 11 most frequently recurring words in the 12+ million words generated by LLMs for the study were the names elias, mara, and elara; the professions keeper, baker, mayor, clockmaker, fisherman, librarian, and conductor; and the location lighthouse.<br>The models tested were Claude Haiku 4.5, Gemini 3.1 Flash-Lite, GPT-5.4-Mini, and OLMo 7b Thinking. All were prompted with one of five requests: ‘Write a story’; ‘Please write a story’; ‘Write me a story’; ‘Tell me a story’; or ‘Please tell a story’.<br>Curious to see if the syndrome the paper identifies is present in models available at the time of writing, I tried out the experiment myself, first on my customary medium-tier ChatGPT account (link to conversation here). No cherry-picking was necessary – ChatGPT-5.5 went straight for the material the researchers reported, on the first try:<br>ChatGPT-5.5 immediately backs up the paper’s initial findings. Source
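For readers who want to run the same check on a batch of their own generated stories, the two figures quoted above (the share of stories containing at least one of the 11 keywords, and each keyword's frequency in parts per million) can be approximated with a short script. This is a minimal sketch, not the researchers' method: the tokenization (lowercased runs of letters, exact matches only, so ‘keepers’ would not count as ‘keeper’) and the names `KEYWORDS`, `tokenize` and `keyword_stats` are assumptions made for illustration.

```python
import re
from collections import Counter

# The 11 keywords listed in the article (names, professions, location).
KEYWORDS = {
    "elias", "mara", "elara",
    "keeper", "baker", "mayor", "clockmaker",
    "fisherman", "librarian", "conductor",
    "lighthouse",
}

# Assumption: a "word" is any run of ASCII letters, compared in lowercase.
WORD_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Split a story into lowercase words (a deliberately crude tokenizer)."""
    return WORD_RE.findall(text.lower())


def keyword_stats(stories):
    """Return (share_of_stories_with_a_keyword, per_keyword_parts_per_million)."""
    counts = Counter()
    total_words = 0
    stories_with_hit = 0

    for story in stories:
        words = tokenize(story)
        total_words += len(words)
        hits = [w for w in words if w in KEYWORDS]
        counts.update(hits)
        if hits:
            stories_with_hit += 1

    share = stories_with_hit / len(stories) if stories else 0.0
    if total_words == 0:
        return share, {}
    # Occurrences per million generated words, for every keyword (0.0 if absent).
    ppm = {k: counts[k] * 1_000_000 / total_words for k in KEYWORDS}
    return share, ppm


if __name__ == "__main__":
    sample = [
        "Elias kept the lighthouse.",
        "A dog ran home.",
    ]
    share, ppm = keyword_stats(sample)
    print(f"stories with at least one keyword: {share:.0%}")
    for word, value in sorted(ppm.items(), key=lambda kv: -kv[1])[:3]:
        print(f"{word}: {value:.0f} ppm")
```

On a handful of stories the parts-per-million figures will be wildly inflated; they only become comparable to the study's numbers across a corpus of similar size.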
Wondering whether my account history, or even possible cross-domain leakage, might account for this ‘instant hit’, I logged into a free ChatGPT account that I have not used in a year or more, in a Firefox private browsing window, and tried again (link to conversation here). Once again (assuming that OpenAI does not use a common IP address to cross-populate different accounts), ChatGPT hit it out of the park:<br>ChatGPT account #2 follows the same obsessions and tiny playbook of names and themes outlined in the new paper. ‘Mira’ is in the authors’ top 20. Source
It’s worth noting that these GPT versions were a step up from the GPT-5.4-Mini tested for the paper.<br>Though Claude Haiku was tested for the paper, I tried Anthropic’s default Sonnet 4.6, and was not disappointed. Once again, the familiar keywords came on the first try (link to conversation here):<br>This time ‘Mara’, another stalwart from the ‘top 11’, leads the story, in the first attempt on Claude Sonnet 4.6. Source
Trying the same prompt on Claude Haiku 4.5 led to much the same result.<br>At first I was unable to reproduce the authors’ findings with Google Gemini, until I specifically changed the model to the one used in the paper, Gemini 3.1 Flash-Lite – and then, on that third try (but the first with that model), the pattern emerged immediately (link here):<br>Google Gemini 3.1 Flash-Lite. Source
Further experiments with different Gemini models invariably turned up the lighthouse theme, though with variants not featured in the ‘top 11’, such as the name ‘Thomas’, and, in another variant, my own name, as the protagonist.<br>Nonetheless, at the time of writing, the paper’s findings are extremely easy to reproduce.<br>Lighthouses in the Wild<br>Great minds think alike: a week ago, prior to the publication of the new paper, software writer Daniel May pointed out the same pairing of ‘Elias’ and the lighthouse-keeper trope that the researchers extracted*, apparently having noticed it by chance. He went on to test eight variants of Gemini, DeepSeek, Qwen and Gemma, which he found would produce the lighthouse memes and ‘Elias Thorne’ as a protagonist*. However, this initial discovery did not extend to the wider range of persistent content themes outlined in the new paper.<br>Curious to see if these recurrent themes, names and locations had ever escaped the confines of a chat, I searched for some of the top 11 keywords and themes on Google, and found a remarkable number of posts that seem to have channeled them:<br>Three examples of the meme in output. See below for source links.
May had identified the longer ‘Elias Thorne’ (rather than just ‘Elias’) as a persistent LLM meme, and posted various screenshots from Amazon, where this name has apparently been used as the author name for diverse books, including medical books.<br>Instead, I sought and found content that appeared to have invoked the persistent themes from an LLM, including an X post of a story (archive version here); a fictional work...