20050503

Why Free Software matters

No, I haven't turned into Richard M. Stallman.

I'm not in the habit of linking to trade press articles; most are trite. This one, however, summarizes my whole take on proprietary vs. free. Proprietary licenses give rise to abuses of power more often than not. That is why, as a general principle and if possible, they should be avoided. I don't care how much of the economy rests on them; a lot of the economy used to rest on strip mines, and those are still seen as harmful and as needing very strong regulation.

Now, in the bitkeeper case, it's a bit more complex, because one could argue that there was no good substitute at the time. And I can understand Linus' position of wanting to use the best tool. However, what I find unfortunate is how Linus and Larry reassured everyone that exactly what happened would not happen. Whose fault it is, doesn't matter. The fact remains: a proprietary program gives all the power to the author, and none to the customer. If you're going to use a proprietary program, make damn sure you freeze the license at least for the version you're using, so that conditions don't get changed retroactively. Better yet, if you have a free software alternative that fits the bill with minimum customization, just bite the bullet or hire someone to do it for you. It may cost more up front, but in the long run, it may actually be a safer proposition.

I know it's cliché, but would you buy a car with the hood welded shut?

(Interestingly enough, while there is a move for software to become more transparent, cars are actually becoming more opaque, but that's a whole 'nother can of worms...)

This said, take the time to read RMS's writings, instead of just putting him in the "communist" box. What he says is very similar to this. Some claim his arguments are "moral" rather than "practical" (and some discuss the morality of what he says). I contend that there is no difference; what is morality for, if it's not meant to help figure out praticalities?

20050501

On the limitation of ORM systems

Lately, I've been wondering why so many ORM systems tend me to throw up my hands in disgust and hope I'd just written straight SQL.

There's the Object-Relational impedance mismatch, of course. That alone is a big killer. The fact that relational systems support declarative rather than object-oriented data is a big problem. Another is that relational systems discourage encapsulation; you should know, when designing a database, exactly what you're going to store, because it dictates relationships. In OOP, refactoring objects to move state data around is no big deal. After all, what's one more pointer dereference? Especially when it's as likely that the refactor causes one less pointer dereference.

In relational systems, where the data is is one of the most important decisions. It dictates index structure, how many joins you need to make, how easily you can access the data from different parts of the system, etc.

Whatever ORM system you want to use, it must truly address this concern. Lazy-loading is not an answer. Neither is select n+1. You need something that can anticipate object queries (maybe from hints as to what objects are related when you pull an object graph) and collapse large joins into a normalized object graph.

But that's not the only problem. Another is the "garbage generation" problem.

This is a problem, mind you, that is mostly specific to garbage-collected languages where objects are very heavyweight (i.e., Java, C#; Python objects are heavyweight, but so are primitives, so I don't count Python in there). Even if you collapse large joins, each query will generate a lot of temporary objects. This is fine if you're actually going to do something with them. But what if you're only trying to display them to the user?

For such situations, a simple query with queries on column names may work better; especially since you can use limit/offset to tabulate the data. However, it's extremely annoying to lose the ability to query a typed object; working with string column names is annoying, it makes it hard to refactor, you don't get code completion, etc. Admittedly, all those things are crutches and aren't especially necessary to programming, but it'd be nice if you could have them.

But if you try to generate objects from a large query, you generate a lot of garbage. All that for a simple table display! A table display that is, in most business applications, very likely to be much more frequent than table updates...

The ideal interface, to me, would be to get an iterator that returns a single table row object graph at the time. Unfortunately, very few ORMs offer that.

The last problem is that many ORMs really, really want to provide your database schema. They'll work with custom schemas, but it's always a big problem. It's, to be fair, a really hard problem.

That said, I think Hibernate really handles many of those things well. So does iBATIS SQL Maps, although it does different things well. I'd like to have a combination of both, that is, the power of Hibernate with the ability to write custom queries. Or I'd settle for an iBATIS extension that makes it easier to handle multiple database dialects, and a way to have it iterate over result sets rather than populate lists. Or maybe iBATIS already does this; if anyone knows how, please drop me a line.

UPDATE: iBATIS has queryWithRowHandler(), which probably works the way I'd like it to. It's probably better than an iterator, because people forget to close result sets. Should have checked before starting to write this. All that's really missing is the custom database support. And, according to the todo file, they have no support for BLOBs yet (I know a few applications where this can be a problem). Still, overall, it looks like an interesting solution when you have to recycle an existing schema.