What OpenAI's Newest Codex Found In The World's Oldest Codex: Hunting in the Bible
When I was a kid, my friend's dad found a new largest known prime number. He had added one small step to a continuous record humanity has kept alive since Euclid. I thought it was awesome.<br>Chris Harrison and Christoph Römhild's beautiful visualization renders 63,779 cross-references as colored arcs above the bar of biblical chapters. There are about half a million known valid verse-to-verse cross-references in the Bible. Over two thousand years, scholars have presumably found most of what's there.<br>Could we actually find new ones with Codex 5.5?
How the pipeline works
The Bible has 31,102 verses, which means 483,651,651 possible verse pairs.<br>It would be unreasonably expensive to run an LLM naively over all of them. At the price of today's cheapest models ($1 per million tokens), every single token in a prompt shared across that half-billion pairs costs about $500; a 500-token prompt to validate each cross-reference would cost about $250k.<br>My approach was first to build a machine-readable corpus and "zip" together the different linguistic forms of scripture, so that each verse is available in the following forms: translations (NET, NKJV, NASB95, CSB) and original-language witnesses (OSHB Hebrew and Septuagint Greek for the Old Testament; SBLGNT Greek for the New).<br>I also assembled a normalized known-reference baseline of 555,423 verse pairs from OpenBible and TSK, so already-known links could be filtered out.<br>Two retrieval branches ran over this corpus. The first was a BM25 and TF-IDF pass that found a few hundred pairs. The second ran directly against the Septuagint and SBLGNT, looking for rare shared terms and phrases; it produced 2.17 million candidate pairs and retained the top 100k above a threshold. Known pairs were suppressed from both branches; neighbors of existing references were demoted.<br>The survivors went through the LLM (DeepSeek v4 Pro), which fetched the Greek and scored each pair from −10 to +10, with a confidence, supporting terms, caveats, and a reject code. 318 passed. A thinking-mode curation pass sorted those into 13 high-confidence findings (plus larger buckets of "promising," "mainstream-or-known," and "rejects"). Then I read and checked every survivor by hand against the Greek and against the apparatuses.<br>A novelty gate at the end removes many candidates the LLM happily accepts as "real connections". There are stock idioms ("the LORD is good"). There are block-quotation duplicates of existing apparatus entries.
There are cases where a much closer parallel exists elsewhere (Hebrews 4:12 is the obvious match for "two-edged sword," not some obscure verse in Proverbs). Almost everything that survives the LLM stage gets killed in the gate.<br>The nine in this essay are not the nine highest model scores but the ones that surfaced after all these stages.
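The arithmetic and the suppress-known-pairs step described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the pipeline itself: summed IDF over shared terms stands in for the BM25/TF-IDF and rare-term scoring (whose exact form the essay does not give), and the names `pair_count`, `llm_cost_usd`, `normalize`, `rare_term_candidates`, and the toy verse IDs are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb, log

VERSES = 31_102
USD_PER_MILLION_TOKENS = 1.00  # the essay's figure for the cheapest models


def pair_count(n: int) -> int:
    """Unordered verse pairs: n choose 2."""
    return comb(n, 2)


def llm_cost_usd(pairs: int, tokens_per_pair: int) -> float:
    """Cost of sending `tokens_per_pair` tokens for every pair."""
    return pairs * tokens_per_pair * USD_PER_MILLION_TOKENS / 1_000_000


def normalize(a: str, b: str) -> tuple:
    """Order-independent key, so (a, b) and (b, a) hit the same baseline entry."""
    return (a, b) if a <= b else (b, a)


def rare_term_candidates(verses: dict, known: set) -> list:
    """Rank not-yet-known pairs by the summed IDF of the terms they share.

    Assumption: summed IDF is a simple stand-in for the real scoring.
    """
    n = len(verses)
    doc_freq = Counter(t for terms in verses.values() for t in set(terms))
    idf = {t: log(n / c) for t, c in doc_freq.items()}
    scored = []
    for a, b in combinations(sorted(verses), 2):
        if normalize(a, b) in known:
            continue  # suppress already-known cross-references
        shared = set(verses[a]) & set(verses[b])
        score = sum(idf[t] for t in shared)
        if score > 0:
            scored.append((score, a, b))
    return sorted(scored, reverse=True)


# Toy data: hypothetical verse IDs and tokens, not real scripture text.
toy_verses = {
    "A": ["kingdoms", "world", "all"],
    "B": ["kingdoms", "world", "showed"],
    "C": ["all", "lord", "good"],
    "D": ["lord", "good", "all"],
}
toy_known = {normalize("D", "C")}  # pretend C-D is already in the baseline

total_pairs = pair_count(VERSES)
ranked = rare_term_candidates(toy_verses, toy_known)
```

Here `total_pairs` comes out to 483,651,651, and the exact costs are roughly $484 for one token per pair and roughly $242k for 500 tokens per pair, which the essay rounds to $500 and $250k. In the toy data the C-D pair shares the most terms but never appears in `ranked` because it is in the known baseline, while A-B rises to the top on two rarer shared terms. The all-pairs loop is only viable at toy scale; a run over the full corpus would presumably need an inverted index or similar.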
What makes a cross-reference good
[Figure: three overlapping criteria. Textual bridge (in Greek / LXX); structural parallel (scene or argument); theological force (changes the reading). A strong finding is one where all three hold.]
A strong cross-reference generally has:<br>some textual intersection (we are using the Septuagint here, because the New Testament writers often quoted Greek scripture)<br>a conceptual parallel, meaning the same kind of scene or argument is being replayed<br>and a theological impact, meaning the parallel actually changes how you read the later passage.
[Figure: pipeline funnel. 483,651,651 possible verse pairs (31,102 verses, all-vs-all) → Stage 1, retrieval (BM25 + LXX/Greek, two branches): 1,000 LXX candidates reviewed → Stage 2, DeepSeek scores −10 to +10 (JSON mode, with caveats): 318 accepted → Stage 3, thinking-mode curation (sorted into buckets): 13 high-confidence → Stage 4, defended by hand (Greek + apparatuses): 9 discussed here.]
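The Stage 2 output (a score from −10 to +10 with confidence, supporting terms, caveats, and a reject code, returned in JSON mode) could be checked with something like the sketch below before a pair is counted as accepted. The field names, the 0-to-1 confidence range, and the `min_score` cutoff are assumptions made for illustration; the essay does not state the actual schema or acceptance threshold.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Verdict:
    score: int                      # −10 (spurious) to +10 (strong link)
    confidence: float               # assumed to lie in [0, 1]
    supporting_terms: list = field(default_factory=list)
    caveats: list = field(default_factory=list)
    reject_code: Optional[str] = None


def parse_verdict(raw: str) -> Verdict:
    """Parse one JSON reply and refuse values outside the stated ranges."""
    data = json.loads(raw)
    score = int(data["score"])
    if not -10 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Verdict(
        score=score,
        confidence=confidence,
        supporting_terms=list(data.get("supporting_terms", [])),
        caveats=list(data.get("caveats", [])),
        reject_code=data.get("reject_code"),
    )


def is_accepted(v: Verdict, min_score: int = 7) -> bool:
    """Hypothetical acceptance rule: no reject code and a high enough score."""
    return v.reject_code is None and v.score >= min_score
```

Validating the range up front means a malformed reply fails loudly instead of quietly skewing the accepted count.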
A few are below.<br>I. Isaiah 37:16 ↔ Luke 4:5<br>O LORD of hosts, the God of Israel, who is enthroned above the cherubim, You are the God, You alone, of all the kingdoms of the earth. You have made heaven and earth.<br>Isaiah 37:16 (NASB95)<br>And he led Him up and showed Him all the kingdoms of the world in a moment of time.<br>Luke 4:5 (NASB95)
[Figure: Isaiah 37:16 (Hezekiah's prayer) and Luke 4:5 (wilderness temptation) share the same rare collocation, βασιλείας τῆς οἰκουμένης, "kingdom(s) of the world." In Isaiah, Yahweh, the true ruler, is confessed as already holding them; in Luke, Satan, a false broker, offers them "if you bow." True claim → counterfeit offer.]
The year is 701 BC. Sennacherib commands the dominant army of the age and has just rolled through forty-six fortified Judean cities (his own annals brag the number) on his way to Jerusalem. The Assyrian empire is the largest the world has yet seen. By every visible measure, the lord of the inhabited world is the man at the city gate, not the God in the temple. Hezekiah is in a corner.<br>Sennacherib makes the point explicit. His envoy, the Rabshakeh, stands beneath the wall and shouts a propaganda speech in Hebrew so the defenders can hear: Yahweh is just another local deity. This civilization had its gods. That one had its gods. None of them...