Zork-bench: An LLM reasoning eval based on text adventure games



Low Impact Fruit


Zork: a tale as old as time, or at least as old as computers.

John Aiken, Apr 23, 2026


Growing up in the 90s, I would go to the library and find books on computers. Most of these books were already out of date, containing printed Apple BASIC programs that you could try to type in and get to work. My favorite one was an F-14 Tomcat simulator. I never got it to work. At the time, my eight-year-old brain didn’t grasp that Applesoft BASIC was really just a different language from QBasic, which I had on the 386 machine my father built and let me use. QBasic shipped with Gorillas, a Scorched Earth-style artillery game where the tanks are gorillas that throw bananas at each other. I have a plan to hand-code a Scorched Earth clone in Rust. If you haven’t heard from me about this in a couple of months, feel free to send me a DM on LinkedIn reminding me that you feel left out of the Scorched Earth Rust community that I have yet to build.

If you hit the sun with your banana it will have a frowny face.

When reading all these old books and magazines from the public library, you would hear about something called Zork. Zork was one of the earliest text adventure games, created in the late 1970s at MIT by Tim Anderson, Marc Blank, Bruce Daniels, and Dave Lebling for the PDP-10 mainframe computer. It is played almost entirely through text and imagination. It is a wonderful game where you traverse a labyrinthine underground in search of treasure. Many of the puzzles are not obvious, but the game has been part of computer culture since computers became a thing. There were many sequels (some with full-motion video and graphics!), and the game helped popularize an entire genre called “text adventures” that influenced a lot of modern games and genres, such as multi-user dungeons and roguelike games.

I attended a Recurse Center batch recently. Recurse Center is a programming retreat where technically minded people congregate to build cool stuff. I think I summed it up best on LinkedIn, where I said that it is basically Hackers, the movie, but in real life:

You too can find this post on LinkedIn by clicking this hyperlink.

But at Recurse I met Mike Cugini, who came in wanting to build a Rust implementation (see, everyone likes Rust) of a Z-machine emulator. The Z-machine is the original game engine built by Infocom, the video game company formed out of the creation of Zork, to build more text adventure games. This sparked a big interest in Zork across the Recurse batch, and immediately we had a cohort of people like Fiona Chow and Kevan Hollbach who were interested in playing Zork regularly, understanding what Zork was as a game, and building tools around Zork. You can see Fiona below debugging an Apple //e that is part of the Recurse Center’s historical computer library (they also have a NeXT computer, omg).

Fiona Chow debugging an Apple //e while trying to download a copy of Zork from the internet by streaming it over an audio cable, so we could write the game to a floppy disk, because we didn’t have a copy of Zork. Thank god Kevan Hollbach knew how to do this.

The first thing this led to was the creation of zulip-zork, a Zulip bot (Zulip is a group chat program like Slack or Discord) that allows anyone in a Zulip channel to play Zork. I actually built this on the airplane traveling back from New York City to the Netherlands, where I currently live. Missing my Zork friends at Recurse, I thought to myself, what is a Zork project I could do? Initially I thought, well, let’s get LLMs to play Zork. But I decided that was too difficult at the moment, so then I thought, what would be better than continuing to play Zork online? Thank god the plane had free Wi-Fi, so I was able to hack together a zulip-zork bot built on top of Bot-Builder and docker-zork.

A screenshot of me playing zulip-zork on the Recurse Center’s Zulip server.

Essentially, dfrotz, a Z-machine emulator, runs in the Docker container serving up Zork (it also supports other Infocom games). This in turn is connected to the bot, which connects to a Zulip channel, and you play by typing /game, as you can see me doing above. This all runs on a VPS that I pay about $8/month for and that hosts random projects like this for me. It all worked quite well, so C Stravidis and I spent many, many hours playing on Zulip together. There is a plan to replace the dfrotz emulator in the Docker container with Mike’s Rust implementation.

But this question of LLMs playing Zork still sat in the back of my mind, irritating me. Then one night, late at the Recurse Hub, deep in conversation with Walter Min, Walter looked at me and just said, “zork-bench.” And I was inspired. That night I got home at maybe 3am, woke up early, and got to work. By that evening I had a preliminary MVP for zork-bench...
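As an aside, the zulip-zork relay described above (a chat message starting with /game gets forwarded to the interpreter, and the reply is posted back) could be sketched roughly like this. This is not the actual bot code: the function names, the exact command syntax after /game, and the fake interpreter are all assumptions made for illustration, and the real bot talks to dfrotz and Zulip rather than to a stand-in function.

```python
from typing import Callable, Optional

# The post only says you play by typing /game; everything else about
# the command syntax here is an assumption.
GAME_PREFIX = "/game"


def extract_game_command(message: str) -> Optional[str]:
    """Return the text after the /game prefix, or None for ordinary chat."""
    stripped = message.strip()
    if stripped == GAME_PREFIX:
        return ""  # bare prefix: forward an empty command
    if stripped.startswith(GAME_PREFIX + " "):
        return stripped[len(GAME_PREFIX):].strip()
    return None  # e.g. "hello" or "/gamer" is not a game command


def handle_message(
    message: str, send_to_game: Callable[[str], str]
) -> Optional[str]:
    """Forward a game command to the interpreter and return its reply.

    `send_to_game` is a hypothetical hook standing in for whatever
    writes to the interpreter and reads its response.
    """
    command = extract_game_command(message)
    if command is None:
        return None  # the bot stays silent on normal conversation
    return send_to_game(command)


def fake_interpreter(command: str) -> str:
    """Stand-in for the real interpreter, so the sketch runs anywhere."""
    return f"(interpreter would respond to: {command!r})"


if __name__ == "__main__":
    for line in ["good morning", "/game open mailbox"]:
        print(line, "->", handle_message(line, fake_interpreter))
```

Keeping the interpreter behind a plain callable is the design choice that would make swapping dfrotz for a different Z-machine implementation a one-line change in a setup like this.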
