Parse, Don’t Guess. Three days before going open source, I… | by Eitamos Ring | Jun, 2026 | Medium
Parse, Don’t Guess
Eitamos Ring
4 min read · Just now
Three days before going open source, I deleted my parser’s smartest feature.

333 lines. 6 functions. 12 passing tests. All green, all clever, all gone.

The feature worked. That was the problem.

My PostgreSQL parser (valk-postgres-parser) extracts structure from SQL text: tables, columns, joins, filters.

One of its analysis functions did something more ambitious. Give it a query like this:

    SELECT * FROM orders o
    JOIN customers c ON o.customer_id = c.id

and it would tell you the foreign key relationship: orders.customer_id is a child pointing at parent customers.id.

No schema. No catalog access. Just the query text. It felt like magic, and users love magic.

Here is how the magic worked:

    func isForeignKeyColumn(column, targetTable string) bool {
        // ...
        if strings.HasSuffix(column, "_id") {
            prefix := strings.TrimSuffix(column, "_id")
            if strings.HasPrefix(targetTable, prefix) {
                return true
            }
            // Also check if table contains the prefix (handles prefixed
            // tables like fk_customers). This allows customer_id to match
            // fk_customers, e2e_customers, etc.
            // ...

Naming conventions. customer_id next to a table called customers? Must be a foreign key. Ship it.

The comment that aged badly

The real problem was further down, in the fallback. What happens when neither column matches a naming pattern? The function did this:

    // If we can't determine from FK naming conventions, return a default
    // relationship based on table order (left table = parent by convention
    // in SQL JOINs). This is still a heuristic but is consistent with how
    // JOINs are typically written.
    return &JoinRelationship{
        ChildTable:  rightTable,
        ParentTable: leftTable,
        // ...
    }

Read that again.

When the function had no idea, it did not say “I have no idea.” It returned an answer anyway, based on which table appeared first in the query.

Every JOIN got an answer. That was the bug. Not a crash, not a parse error.
A function that is sometimes right, sometimes wrong, and gives the caller no way to tell which.

Wrong is worse than empty

Think about what downstream code does with a foreign key relationship. In our case it generated test data: parent rows first, then children referencing them. Flip parent and child and you get foreign key violations, or worse, data that inserts fine but means the wrong thing.

An empty result fails loudly. The caller sees nothing came back and handles it. A wrong result fails quietly, three layers up, a week later, in a system that trusted the library.

And the heuristic had plenty of ways to be wrong:

- customer_id happily matched tables named fk_customers and e2e_customers, because of a prefix check added for one test environment
- Self-referencing tables (employees.manager_id) confused the direction logic
- Junction tables, where both joined columns are primary keys, got an arbitrary winner
- And the table-order fallback was a coin flip dressed up as a convention

Each bug was fixable. Another pattern, another special case, another test. That is exactly how the function grew to be the smartest 333 lines in the codebase. The line count was not the cost. The confidence was.

The replacement: facts or nothing

The fix was not a better heuristic. It was a contract change:

    schema := map[string][]analysis.ColumnSchema{
        "customers": {{Name: "id", PGType: "bigint", IsPrimaryKey: true}},
        "orders":    {{Name: "id", IsPrimaryKey: true}, {Name: "customer_id"}},
    }

    joins, _ := analysis.ExtractJoinRelationshipsWithSchema(query, schema)
    // orders.customer_id -> customers.id, because the schema says id is a PK.

You pass schema metadata, you get relationships derived from actual primary keys. You do not pass schema, you get nothing. If the metadata cannot settle a case (both sides are primary keys), the answer is nil, not a guess.

Deleting the no-schema path meant deleting tested, working, green code.
The 12 tests I removed were not failing. They were carefully asserting that the guesses came out the way the guesses come out. Tests that lock in behavior nobody should rely on are not coverage. They are a fence around a landmine.

The migration cost turned out to be near zero: every production caller already had schema metadata sitting right there. They had just never been asked for it.

The part I did not expect

Last week a stranger opened an issue on the repo.

He was analyzing hundreds of queries and hit a case where a WHERE clause column was not qualified by a table name, so the parser left the table field empty.

He did not ask the parser to guess. He asked for ExtractWhereConditionsWithSchema, by name, with the same schema-map shape the JOIN function uses. The design had taught him what to ask for.

That is the moment you find out an API decision was right: when users start requesting extensions to the constraint...
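
The request in that issue can be read as the same "facts or nothing" contract applied to WHERE clauses. Here is a minimal Go sketch of the idea, under stated assumptions: resolveColumnTable and the local ColumnSchema type are hypothetical names for illustration, not the library's API, and a real ExtractWhereConditionsWithSchema may work quite differently. An unqualified column resolves to a table only when exactly one table in the query declares it. In every other case the caller gets nothing.

```go
package main

import "fmt"

// ColumnSchema is a local stand-in that mirrors the schema-map shape
// shown in the article. It is an assumption, not the library's type.
type ColumnSchema struct {
	Name         string
	PGType       string
	IsPrimaryKey bool
}

// resolveColumnTable is a hypothetical helper. It returns the single table
// among tablesInQuery whose schema declares column. ok is false when no
// table, or more than one table, declares it: nothing instead of a guess.
func resolveColumnTable(column string, tablesInQuery []string, schema map[string][]ColumnSchema) (table string, ok bool) {
	matches := 0
	for _, t := range tablesInQuery {
		for _, c := range schema[t] {
			if c.Name == column {
				table = t
				matches++
				break // count each table at most once
			}
		}
	}
	if matches != 1 {
		return "", false
	}
	return table, true
}

func main() {
	schema := map[string][]ColumnSchema{
		"customers": {{Name: "id", PGType: "bigint", IsPrimaryKey: true}},
		"orders":    {{Name: "id", IsPrimaryKey: true}, {Name: "customer_id"}},
	}
	tables := []string{"orders", "customers"}

	for _, col := range []string{"customer_id", "id", "total"} {
		table, ok := resolveColumnTable(col, tables, schema)
		fmt.Printf("%s -> %q %v\n", col, table, ok)
	}
	// customer_id -> "orders" true   (declared by exactly one table)
	// id -> "" false                 (declared by both tables: ambiguous)
	// total -> "" false              (declared by no table: unknown)
}
```

The design choice worth noting is the return shape: the boolean forces the caller to handle the unresolved case explicitly, so an ambiguous column cannot silently turn into a wrong table name.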