What ORMs have taught me: just learn SQL (2014)

downbad_1 pts0 comments

What ORMs have taught me: just learn SQL

Main

Blog<br>()

What ORMs have taught me: just learn SQL

I’ve come to the conclusion that, for me, ORMs are more detriment than<br>benefit. In short, they can be used to nicely augment working with SQL<br>in a program, but they should not replace it.

Some background: For the past 30 months I’ve been working with code<br>that has to interface with Postgres and to some extent, SQLite. Most<br>of that has been with SQLAlchemy (which I quite like) and Hibernate<br>(which I don’t). I’ve worked with existing code and data models, as<br>well as designing my own. Most of the data is event-based storage<br>(“timelines”) with a heavy emphasis on creating reports.

Much has been written about the Object/Relational Impedance<br>Mismatch. It’s hard to appreciate it until you live it. Neward, in his<br>well known essay, lays out many cogent reasons why ORMs turn into<br>quagmires. In my experience, I’ve had to deal directly with a fair<br>number of them: entity identity issues, dual-schema problem, data<br>retrieval mechanism concern, and the partial-object problem. I want to<br>talk briefly about my experiences with these issues and add one of my<br>own.

Partial objects, attribute creep, and foreign keys

Perhaps the most subversive issue I’ve had with ORMs is “attribute<br>creep” or “wide tables”, that is, tables that just keep accruing<br>attributes. As much as I’d like to avoid it, sometimes it becomes<br>necessary (although things like Postgres’ hstore can help). For<br>example, a client may be providing you with lots of data that they<br>want attached to reports based on various business logic. Furthermore,<br>you don’t have much insight into this data; you’re just schlepping it<br>around.

This in and of itself isn’t a terrible thing in a database. It becomes<br>a real pain point with an ORM. Specifically, the problem starts to<br>show up in any query that uses the entity directly to create the<br>query. You may have a Hibernate query like so early on in the project.

query(Foo.class).add(Restriction.eq("x", value))

This may be fine when Foo has five attributes, but becomes a data fire<br>hose when it has a hundred. This is the equivalent of using SELECT<br>*, which is usually saying more than what is intended. ORMs, however,<br>encourage this use and often make writing precise projections as<br>tedious as they are in SQL. (I have optimized such queries by adding<br>the appropriate projection and reduced the run time from minutes to<br>seconds; all the time was spent translating the database row into a<br>Java object.)

Which leads to another bad experience: the pernicious use of foreign<br>keys. In the ORMs I’ve used, links between classes are represented in<br>the data model as foreign keys which, if not configured carefully,<br>result in a large number of joins when retrieving the object. (A<br>recent count of one such table in my work resulted in over 600<br>attributes and 14 joins to access a single object, using the preferred<br>query methodology.)

Attribute creep and excessive use of foreign keys shows me is that in<br>order to use ORMs effectively, you still need to know SQL. My<br>contention with ORMs is that, if you need to know SQL, just use SQL<br>since it prevents the need to know how non-SQL gets translated to SQL.

Data retrieval

Knowing how to write SQL becomes even more important when you attempt<br>to actually write queries using an ORM. This is especially important<br>when efficiency is a concern.

From what I’ve seen, unless you have a really simple data model (that<br>is, you never do joins), you will be bending over backwards to figure<br>out how to get an ORM to generate SQL that runs efficiently. Most of<br>the time, it’s more obfuscated than actual SQL.

And if you elect to keep the query simple, you end up doing a lot of<br>work in the code that could be done in the database faster. Window<br>functions are relatively advanced SQL that is painful to write with<br>ORMs. Not writing them into the query likely means you will be<br>transferring a lot of extra data from the database to your<br>application.

In these cases, I’ve elected to write queries using a templating<br>system and describe the tables using the ORM. I get the convenience of<br>an application level description of the table with direct use of<br>SQL. It’s a lot less trouble than anything else I’ve used so far.

Dual schema dangers

This one seems to be one of those unavoidable redundancies. If you<br>try to get rid of it, you only make more problems or add excessive<br>complexity.

The problem is that you end up having a data definition in two places:<br>the database and your application. If you keep the definition<br>entirely in the application, you end up having to write the SQL Data<br>Definition Language (DDL) with the ORM code, which is the same<br>complication as writing advanced queries in the ORM. If you keep it<br>in the database, you will probably want a representation in the<br>application for convenience and to prevent too much “string typing”.

I much prefer to keep the data definition in the database and read it<br>into the application. It doesn’t...

data orms database query application object

Related Articles