20050501

On the limitation of ORM systems

Lately, I've been wondering why so many ORM systems tend me to throw up my hands in disgust and hope I'd just written straight SQL.

There's the Object-Relational impedance mismatch, of course. That alone is a big killer. The fact that relational systems support declarative rather than object-oriented data is a big problem. Another is that relational systems discourage encapsulation; you should know, when designing a database, exactly what you're going to store, because it dictates relationships. In OOP, refactoring objects to move state data around is no big deal. After all, what's one more pointer dereference? Especially when it's as likely that the refactor causes one less pointer dereference.

In relational systems, where the data is is one of the most important decisions. It dictates index structure, how many joins you need to make, how easily you can access the data from different parts of the system, etc.

Whatever ORM system you want to use, it must truly address this concern. Lazy-loading is not an answer. Neither is select n+1. You need something that can anticipate object queries (maybe from hints as to what objects are related when you pull an object graph) and collapse large joins into a normalized object graph.

But that's not the only problem. Another is the "garbage generation" problem.

This is a problem, mind you, that is mostly specific to garbage-collected languages where objects are very heavyweight (i.e., Java, C#; Python objects are heavyweight, but so are primitives, so I don't count Python in there). Even if you collapse large joins, each query will generate a lot of temporary objects. This is fine if you're actually going to do something with them. But what if you're only trying to display them to the user?

For such situations, a simple query with queries on column names may work better; especially since you can use limit/offset to tabulate the data. However, it's extremely annoying to lose the ability to query a typed object; working with string column names is annoying, it makes it hard to refactor, you don't get code completion, etc. Admittedly, all those things are crutches and aren't especially necessary to programming, but it'd be nice if you could have them.

But if you try to generate objects from a large query, you generate a lot of garbage. All that for a simple table display! A table display that is, in most business applications, very likely to be much more frequent than table updates...

The ideal interface, to me, would be to get an iterator that returns a single table row object graph at the time. Unfortunately, very few ORMs offer that.

The last problem is that many ORMs really, really want to provide your database schema. They'll work with custom schemas, but it's always a big problem. It's, to be fair, a really hard problem.

That said, I think Hibernate really handles many of those things well. So does iBATIS SQL Maps, although it does different things well. I'd like to have a combination of both, that is, the power of Hibernate with the ability to write custom queries. Or I'd settle for an iBATIS extension that makes it easier to handle multiple database dialects, and a way to have it iterate over result sets rather than populate lists. Or maybe iBATIS already does this; if anyone knows how, please drop me a line.

UPDATE: iBATIS has queryWithRowHandler(), which probably works the way I'd like it to. It's probably better than an iterator, because people forget to close result sets. Should have checked before starting to write this. All that's really missing is the custom database support. And, according to the todo file, they have no support for BLOBs yet (I know a few applications where this can be a problem). Still, overall, it looks like an interesting solution when you have to recycle an existing schema.

No comments: