The Considerate Data Modeler

olooney1 pts0 comments

The Considerate Data Modeler

The Considerate Data Modeler

by Oran Looney

June 4, 2026

Databases

Database modeling is closer to library science than computer science. No one is<br>impressed by a librarian who gets creative and files cookbooks under &ldquo;G&rdquo; for<br>&ldquo;Gourmand.&rdquo; The best catalog system is where everything is in an obvious place,<br>where everything conforms to expectations.

As Alec Baldwin&rsquo;s database consultant explained to a team of application<br>developers he was brought in for the day to train in Glengarry Glen<br>Ross, it comes down to three simple letters, LCD:

Lowest

Common

Denominator

Always target the lowest common denominator. ANSI SQL isn&rsquo;t the LCD; you want<br>to be targeting the subset of the subset of ANSI SQL which is truly portable.

Why? Because your data is eventually going to end up in another RDBMS, pumped<br>over an ETL pipeline to a data warehouse of some kind by a tool like dbt or<br>ADF. Any unique, advanced feature of a database that you use will cause<br>headaches and make you enemies.

I know this firsthand because at one time or another I&rsquo;ve played each role.<br>I&rsquo;ve been the application developer using PostGIS spatial types in my<br>application and explaining to the data team about spherical coordinate systems,<br>polygons, and R-trees and why we absolutely needed them and couldn&rsquo;t just use:

latitude NUMERIC(8,2),<br>longitude NUMERIC(8,2)

even if that&rsquo;s their coding standard for their data warehouse. I&rsquo;ve been the<br>data scientist parachuted into a hostile operational database and given three<br>days to write a query to flatten it down into usable training data and wincing<br>over every JSON extraction. And I&rsquo;ve been the guy advocating building a single<br>data warehouse from three different databases built by different people at<br>different companies before they were acquired and merged, dealing with the<br>impedance mismatch that resulted. (If I never hear the phrase &ldquo;conformed<br>dimension&rdquo; again in my life it will be too soon.) So I can tell you with great<br>confidence and breadth of experience:

Nobody likes a smart ass. Nobody ever says, &ldquo;wow, you leveraged the shit out<br>of that unique, proprietary feature from your vendor of choice! You must really<br>be on the cutting edge! It was a great business decision to lock us in like<br>that, and we&rsquo;re really enjoying learning the ins-and-outs of every quirk of<br>your very special database! And the ETL team, oh, they&rsquo;re just having great<br>fun writing custom mapping logic! They were getting so bored using the<br>default copy activity all the time!&rdquo; No, they just grit their teeth, smile, and<br>ask how it can be mapped back to third normal form.

Case Study: Enum

MySQL and Postgres have an enum type; Oracle, Databricks Delta, and Snowflake<br>do not.

The poor DBA will probably grit their teeth and model your enum as a CHECK<br>constraint on a string, which is wildly inefficient.

Think also of the poor analysts and data scientists who have to use your<br>database. SQL is their mother tongue and they think nothing of LEFT JOINing<br>to a few reference tables to pick up human-readable columns… at the end of<br>their query. In the meat of the logic, all the CTAS statements and<br>subqueries, they want to use fast, exact joins on primary keys.

Here is some homework for you:

1) I want to deprecate a code in an enum, but it is used in historical data<br>and still meaningful there. If it was a ref table we would add a soft delete<br>flag; how do I do the same thing with enum?

2) I need to associate additional information to each code in my enum. For<br>example, I have a State enum, but I also need to store the two-letter state<br>abbreviations, and distinguish between states and territories. How would I do<br>that with an enum?

3) Write a query that returns one row for every state in the State enum and<br>counts the number of orders. To be clear, it should return a row even for<br>states where the count is zero, like ('Wisconsin', 0). Compare your answer<br>to this query:

SELECT state.name, COUNT(orders.id) as n_orders<br>FROM state<br>LEFT JOIN orders ON state.id = orders.state_id<br>GROUP BY state.name

Which one an analyst would prefer? Note also exactly what happened: in order<br>to make it work, it flattened the enum back to the relational view on the fly.

Keep it Boring

If the answer to literally every question about how to do some slightly<br>advanced thing is &ldquo;use a lookup table,&rdquo; why not start with that in the first<br>place?

You might argue, well, we could start there, but then migrate when these super<br>advanced cases actually come up in practice. YAGNI! KISS!

The problem there is that the refactoring isn&rsquo;t transparent at all. Relational<br>databases have an &ldquo;interface,&rdquo; just like an API - that&rsquo;s why we can swap tables<br>for views, for example. But the surface area of that interface is quite large:<br>every table name, column name, and type.

Going from an enum to a lookup table is a...

data rsquo enum state database ldquo

Related Articles