20050329

Long time no post

Time I wasted some time airing my thoughts to a non-existent audience.

First, promises I made. Fanfic not advancing at all, I'm afraid. Been busy with taxes and lots of other stuff. Xenosaga II is advancing. I'm in the sidequest part, and I dislike doing those on a second playthrough (I'll just play a second time with a lot fewer annoying things to do), so it can be a bit tedious. Though a lot of them are no worse than those in FF X-2, and probably no worse than the stuff in FF VII (I suspect that although I did those ten times, I'd balk at doing it again at my age). Beard is still off, for practical reasons and because I got used to it. And there's no way I can screw up shaving activities in this state.

Reminds me of the best way to keep things clean: own as few things as you can manage. Works OK so far.

Other things... got my federal tax return back, so working on taxes instead of fanfic paid off, I guess. In an élan of non-geekiness, I bought a $30 D-Link router instead of hacking one together from leftover computer parts. The amount of power eaten by all those moving parts in regular computers made me balk. The D-Link's not really hackable, but it gets major good points for having a plain HTML interface. I don't know how solid the device is, but it's not being used for anything critical (my laptop isn't connected to it all the time), so even if it dies, it's no big deal.

Been following the Hacker's Diet's exercise program once more, reached rung 20 without too much trouble yesterday. I feel overall more alert. I recommend this program; its main attraction, to me, is that I cannot make excuses to skip an 11-minute routine, and since it's supposed to be daily, I can't think stuff like "I'll do it tomorrow..."


In my current conceited state of mind, I decided to commit to the electronic medium my thoughts on the rules every programmer should know. OK, so they're the rules I know, and they may not apply to every programmer; but I've been toying with the idea of writing some stuff on this someday, and, well, now's as good a time as any.

So, here it is: BGE's Killer Programming Rules... Of Justice.

If it ain't tested, it probably doesn't work
This can be seen as a corollary of the second law of thermodynamics. Code, left alone, "rots". Of course, this sounds silly; the code doesn't change, so how can it rot? Well, everything around it is changing: new OS, new libraries, new runtimes; even the code around it may make that code malfunction. There are only two ways to ensure the code remains OK: active maintenance, which is not always practical (codebases tend to become really big!), or periodic testing. And "testing" does not mean sending it to a client and hoping it works. That's exactly when it won't (and that part's a corollary of Murphy's law).
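To make this concrete, here's the kind of cheap, repeatable check I have in mind -- a minimal JUnit-style sketch (PriceCalculator and its behaviour are entirely made up for illustration):

class PriceCalculator {
    static double total(double base, double taxRate) {
        return base * (1.0 + taxRate);
    }
}

// A test like this can be re-run after every OS/library/runtime change,
// which is exactly the "periodic testing" I mean.
public class PriceCalculatorTest extends junit.framework.TestCase {
    public void testTotalIncludesTax() {
        // 9.00 plus 15% tax should come to 10.35
        assertEquals(10.35, PriceCalculator.total(9.00, 0.15), 0.001);
    }
}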
Just say no to protected data members
Protected data is t3h 3v1l. There is no telling what derived classes will do to it. Unless you want to be condemned to leave your base classes with the same implementation until the end of time (and that's not really feasible, because of the first principle, above), you should never use protected data. That's right, never. I'm usually not that drastic, but I've never seen protected data being used in a sane manner. And this is coming from the guy who thinks Multiple Inheritance can be very useful at times. If you think of the implications, protected data makes your base class less reusable and unable to evolve, which kind of goes against the grain of object-oriented programming principles.
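To illustrate, here's a minimal sketch of the alternative (class names made up): keep the data private and give derived classes a narrow protected interface, so the base class remains free to change its representation later.

abstract class Account {
    private long balanceCents; // private: derived classes can't scribble on it

    // narrow protected interface instead of a protected field
    protected long balanceCents() { return balanceCents; }
    protected void add(long cents) {
        if (balanceCents + cents < 0) throw new IllegalStateException("overdraft");
        balanceCents += cents;
    }
}

class SavingsAccount extends Account {
    void payMonthlyInterest() { add(balanceCents() / 200); } // 0.5%, say
}

If Account later stores its balance differently, SavingsAccount keeps working; with a protected field, every derived class would be frozen to the old representation.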
Overriding concerns should not be abstracted away
Overriding concerns are things like transactional semantics and resource clean-up. Somebody, somewhere, is going to have to make a decision on the transaction boundaries or on the clean-up time. It very likely should be code that has a wide enough view of the problem to make an intelligent decision, usually some top-level or near-top-level method. I've ranted about this before.
Keep resource ownership sane. Don't transfer it implicitly.
This means several things, namely:
  • DON'T allocate a resource in one method and clean it up in another method at the same level;
  • DON'T have objects allocate resources and expect the caller to clean them up if something bad happens (note that if your object has a "close" function of some sort, it's really the object cleaning up; what I mean here is: don't expect the caller to fetch said resource through a method and clean it up by hand);
  • DON'T program as if exceptions cannot occur in any block, and DON'T try to catch every exception to force cleanup in catch handlers. This is extremely brittle. See Herb Sutter's Exceptional C++ for more details;
  • DON'T transfer resource ownership if you can help it;
  • DO try to give any limited resource a finite scope in a single method, if possible;
  • DO wrap the resource in an object with a close() function (or a destructor in C++) if the lifetime of the resource cannot be determined by the code allocating it (there's a sketch of this right after the list).
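Here's a minimal sketch of those last two DOs together (file name and classes made up): the wrapper owns the stream, and the caller gives the wrapper a finite scope in a single method.

import java.io.FileInputStream;
import java.io.IOException;

class ConfigSource {
    private final FileInputStream in;
    ConfigSource(String path) throws IOException { in = new FileInputStream(path); }
    int readByte() throws IOException { return in.read(); }
    void close() throws IOException { in.close(); }
}

class Loader {
    static void load() throws IOException {
        ConfigSource cfg = new ConfigSource("app.cfg"); // allocated here...
        try {
            while (cfg.readByte() != -1) { /* consume the data */ }
        } finally {
            cfg.close(); // ...and cleaned up at the same level, even on exceptions
        }
    }
}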
Respect the computer and the OS; they're more often right than wrong.
Surprising to many who know me, I apply this piece of advice to Windows-family OSes as well. In my experience, 98% of the time when I thought the compiler/OS/computer was being stupid, I found out that it was a coding error (not always mine, but always within the programming group). There are some exceptions to this: kernel panics and such should not happen through a programming error, period. But I'm talking of more subtle cases, where you wonder why the heck the code stopped working, why the OS is returning an error there, etc. Don't just throw your hands up in the air at the stupidity of the OS, even if, yes, it is stupid sometimes. You'll probably find with time that even if it's not entirely your fault, it's at least partly your fault, because what you're doing is a bad idea, and that's why the OS is being difficult. Of course, OS programmers are human, and so are computer designers; but they have several hundred thousand programmers who bang on their code every day, not counting users doing all sorts of nasty things to their nice piece of software, so their work is tested very widely. OSes are pretty mature these days, and except for, say, exploits and such, if your code acts weird, it's pretty much certain that it's your fault.
Try to listen to what the machine is trying to tell you.
This is related to the previous point. If you have to do something really cumbersome, or scary, or brittle to get things to work, it's likely that somebody is trying to tell you something. Namely, that your semantics are muddled, that you're using the wrong approach, or that you're trying to do something that's not really allowed. Virus writers want to do the latter, but industrial programmers don't; it always causes huge problems in the long run, when it stops working with compiler XYZ and OS Gamma. Another nice way to test whether it's a boneheaded idea is to try to explain it to somebody. If you feel silly explaining it, or you can't explain it clearly, it's probably because you're a bit confused about what you're supposed to do or how you can achieve it. Try to take a different tack.
Sometimes, it pays to trust your intuition.
I've been known to really dazzle co-workers by looking at some code, pointing at a line saying it's not a good idea to do that, and, of course, it ends up being that line that causes the problem they're trying to fix. Now, if you can't really prove that this is the problem, intuition is worthless. But it's easy enough to throw test data at said method/class and check.
The brilliant lone programmer is a myth.
Well, I can't claim that I'm a perfect authority on the subject. But I've known programmers who started out as loners. Sure, they are more brilliant than some more social programmers. But they always reached their full potential only after becoming more social. Programming is about ideas; if you don't exchange them, you can miss something really obvious, or paint yourself into a nice little corner, or constrain your mind to a nice little box of your own making. It is unfortunate that the myth of the lone, incredibly brilliant scientist is so pervasive in our culture; I guess it's because everyone likes heroes. But, as Isaac Newton allegedly said, "if I have seen further, it is by standing on the shoulders of giants." (Aside: it's interesting that Newton, of all people, said that, as there are rumours that he was not really the most cooperative scientist, nor one who shared his results very often.) Now, this doesn't mean that I think one should ignore more introverted programmers; rather, one should try to make them feel comfortable in the team, so they start sharing all those insights. Being an introvert myself, I know it's not easy to bring one out, but it's not completely impossible either.
Keep commented-out code out of your source files.
Yeah, yeah, I know, maybe you'll need it someday. Just like protected member variables, right? :-) Seriously, you should use source control. If you use source control, there's no reason to keep this cruft around. If you're really worried that you may need it, apply a source control label to the tree before removing it. The reason? Dead code breaks the flow of the code around it, makes plain-text searches find false positives (I know good IDEs make plain-text searches less frequent, but they still happen sometimes), and by the time you need it again, it probably won't be any good anymore; it'll have suffered bit rot. If you really must keep some code commented out in the source file, at least be polite and move it to the end of the file with a comment giving a hint where it came from. This way, people looking at it will figure out immediately that it's not something that was left commented out by mistake. But I really think it should be ditched rather than commented out; the latter is a lousy compromise at best. Just use source control. Comments are for explanations, not for executable statements.
Avoid doing things that disgust you, especially if the rest of the code already does.
Your objective, when touching a piece of code, should be to improve it, not worsen it. Adding a feature can be seen as an improvement, but it's not necessarily an improvement for the code quality; it's just a feature. Going through the code and resisting the temptation to copy-paste a segment is an improvement. Nobody will pay for such things, and you're always short of time; I know, I've been there. But when I stopped making excuses for myself, I realized that in many cases, I could find a solution that, if it didn't improve the code much, at least didn't make it worse, implemented the feature properly, took less time to debug because it was easier to understand, and (that's the part that surprised me quite a bit) didn't take more time, overall, to do than the quick-and-dirty solution would have taken. In fact, every time I was forced to take the quick-and-dirty solution, when I redid it properly later, I was always really annoyed somebody had forced me, because the proper way hadn't been any longer and wasn't really riskier. Code will worsen on its own; you should always strive to lessen its entropy, not add to it.
Avoid doing things "just in case."
This is the infamous YAGNI (You Ain't Gonna Need It) principle from Extreme Programming. By all means, think of a design that could accommodate the "just in case" (just don't take that too seriously or waste too much time on it). But don't bother implementing it unless you have an immediate use for it. You'll just be adding to the complexity, with no benefit, and you'll have added code that will eventually rot from disuse and lack of testing.
Having classes doesn't make it object-oriented.
I've seen many cases where people created a bunch of classes, each implementing a bunch of interfaces... and, in the end, produced a spaghetti-like mess. OOP does not mean you should forget structured programming; it's still there, within your classes. Replacing your globals with singletons doesn't mean your project has no globals. And having your class implement an interface doesn't make it swappable, especially if all the code ends up using the exact class because of missing stuff in the interface. I could go on, but I think you get the point.
Be really, really careful when designing and changing persistent data formats.
Those are always a bitch to upgrade. Your internal code can change somewhat more easily, so internal code organisation is only important for your next maintenance release; if it's not ideal, there's the possibility of fixing it later. But broken persistent formats can be a real pain to fix. Even more so if you're signing the data or encrypting it in some way; you may not be able to re-sign or re-encrypt it after you've converted it. Bummer. Changing formats should be done carefully as well, because there's always trouble with conversions or with people who want to use old versions with new data. And no, using XML doesn't shield you from any of those problems. You can have problems with XML-based formats, too; you only avoid (some) parsing problems, not semantic problems.
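One cheap habit that helps, sketched below (names and format made up): write an explicit version tag at the front of the format, so old readers fail cleanly on new data and new readers know when to convert.

import java.io.*;

class SaveFile {
    static final int VERSION = 2;

    static void save(DataOutputStream out, String payload) throws IOException {
        out.writeInt(VERSION); // version first, always
        out.writeUTF(payload);
    }

    static String load(DataInputStream in) throws IOException {
        int v = in.readInt();
        if (v > VERSION) throw new IOException("data written by a newer version: " + v);
        // a real loader would convert v == 1 payloads here instead of guessing
        return in.readUTF();
    }
}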
Don't go ape with design patterns.
I sometimes think that software design teachers encourage students to look for ways to apply patterns when they teach their courses. This is exactly the wrong way to teach patterns. Patterns should be a solution to an existing problem: you have a piece of code that needs to do such-and-such a thing, so such-and-such a pattern will help. You should never, ever be given a pattern and asked to apply it in your program. The reason I think this is what's happening is that I've seen a trend in more recently written code: it's filled with pattern mush, often in places where it doesn't make sense. Extra factories where a simple function would do; composites where the composite's properties are not being used; and so on. This is extremely annoying because it can make the code hard to follow, especially if the person applying the pattern didn't really understand it. Also, I find that a few patterns in the GoF book aren't all that useful; Flyweight is rarely applied correctly, and I always found Visitor to be a bit cumbersome (yes, I know what it's supposed to solve, but I've usually found different ways to solve that particular problem, and they tend to be easier to read for maintenance programmers).
Make objects minimal-state.
That is, objects should maintain the bare minimum of the state they need. Never add a member variable just to avoid passing a parameter between two member functions; it may look like a good idea at first, but it's really not--you're adding extra state to the object for no good reason. Would you add a global to your module just to avoid parameters in structured programming? Well, adding a member for that reason is like adding a semi-global, and it's not a good idea.
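A tiny sketch of what I mean (names made up): the title travels as a parameter, not as a member that exists only to shuttle data between two methods.

class ReportPrinter {
    // Tempting but wrong: private String currentTitle; set by print(),
    // read by printHeader() -- a semi-global in disguise.

    void print(String title, String body) {
        printHeader(title); // pass it along instead
        System.out.println(body);
    }

    private void printHeader(String title) {
        System.out.println("== " + title + " ==");
    }
}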
Avoid the construct-and-call-setters anti-pattern.
I've seen this very often, and it makes me feel a bit sick every time. You have an object with an empty constructor, which must have its setters called before you call any method. IoC is bringing back this way of doing things (so is Struts, to some degree), and it really distresses me. What if you forget to call a setter? What if you call them in the wrong order somehow? What if somebody calls a setter after calling a method, thus violating some invariants? By turning an atomic operation (object construction) into a non-atomic one, you're asking for trouble. Strive to keep your objects in a sane state, always. It should not be possible to put an object in an insane state from the public interface, especially not by forgetting to call a method... Note that you can always prevent such problems by adding manual checks, but that's brittle; it's better to make it impossible to put the object in an incorrect state in the first place.
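For contrast, a minimal sketch (names made up) of construction as an atomic operation:

class Connection {
    private final String host;
    private final int port;

    Connection(String host, int port) {
        if (host == null || port <= 0)
            throw new IllegalArgumentException("bad endpoint");
        this.host = host;
        this.port = port;
        // fully usable from here on: no setters to forget, no ordering to get wrong
    }

    String endpoint() { return host + ":" + port; }
}

Compare with an empty Connection() followed by setHost() and setPort(): every caller becomes responsible for re-establishing the invariants the constructor could have guaranteed once.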

Well, that ran a bit longer than I thought, and it contains some stuff that's a bit lower-level than what I initially wanted to write. But there it is; I hope it was somewhat useful to you, at least. It's by no means complete, but it's a start. I may add to this list from time to time as things come to my attention. I may also strike out some items or modify them, as I've been known to revise my ideas on some things.

20050316

It's gone, Gone, GONE!!!

Had an argument with my beard trimmer yesterday around 00:30, which resulted in me shaving the whole thing off, including the mustache. Moral: don't shave at 00:30.

Hadn't seen my face whiskers-free for ten years. It looks weird.

Not sure whether I'll let the beard grow back yet. I look less intimidating without it, which may have some benefits overall. But my face looks wider and my chin sort of disappears into my neck visually, which I don't find that aesthetically pleasing. But maybe I'm just not used to it. I'll probably give myself a few weeks, and then decide. Probably depending on how many of my friends laugh at me.

I have to get a picture up of The Beardless One, just for everyone's amusement. Stay tuned...

20050308

A-25 Redux

The mayor of the Anjou borough has a half-page in every "Ville d'Anjou" leaflet we get every month.

On the whole A-25 issue, he says that it will improve the pollution situation by taking pressure off A-40, and that the complaint about the lack of public transit is silly, since he asked Transport Québec for reserved bus lanes on the bridge, otherwise it was a no-go.

He also mentions that the bridge won't really make urban sprawl worse, because people are already sprawled all the way to Saint-Jérôme, and that's way farther than the sphere of influence of the new bridge. As he points out, people don't sprawl because a bridge becomes available--they sprawl because they can't find affordable and calm spaces within the city boundaries. Anjou could provide that, especially if it were less isolated from a transportation point of view. The A-25 bridge won't help that, particularly, but as I mentioned in a previous post, I'm worried that no bridge will also mean no metro, no commuter train and no A-720 extension.

Needless to say, I agree with much of what he says. I don't know about the pollution situation, though; A-13 was supposed to help, but didn't in the end, because it only allowed people to settle way farther out. A-25 won't be as bad since it won't reach all the way to the north shore, but it could still have unexpected effects. I think a longer A-720 would be a better choice if one wanted to make the "less pollution thanks to the highway" argument.

Still good points. Too bad he's preaching to the choir; he should write for the Plateau or Outremont regional newspaper. Still, I have some hope--the president of the chamber of commerce of eastern Montreal wrote an editorial in a major newspaper about the whole transportation infrastructure situation. Hopefully it's been heard.

Keeping my fingers crossed... At the very least, if that bridge has a cycling lane, I could cycle to Laval, and that would be, as they say, Way Cool.

20050302

Added to developers' whiteboard

String b = "some string";
StringBuffer sb = new StringBuffer();
// Bad: "a" + b + "c" compiles to a hidden temporary StringBuffer plus a
// temporary String, whose contents then get copied into sb.
sb.append("a" + b + "c");
// Good: each append writes directly into sb; no temporaries.
sb.append("a").append(b).append("c");

'Nuff said.

Note to future language designers: make sure "+" is an efficient operation, maybe by having it return some sort of temporary "not concatenated yet" list of strings and resolving the catenation at the very end. This would be possible to pull off in C++ given the powerful type system and pass-by-value semantics (and it would literally rock if move constructors made it into the language!). It's not possible in Java, so we have to tell people to be careful. And for some reason, there are some people who aren't.
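For illustration, a rough sketch of the idea in Java (names made up; Java can't overload "+", so this is only the flavor of the thing): the parts pile up in a list, and a single right-sized buffer is allocated when the result is finally needed.

import java.util.ArrayList;

final class LazyConcat {
    private final ArrayList parts = new ArrayList();
    private int length = 0;

    LazyConcat add(String s) { // stands in for "+": just remember the piece
        parts.add(s);
        length += s.length();
        return this;
    }

    public String toString() { // resolve the catenation at the very end
        StringBuffer sb = new StringBuffer(length); // one allocation, right size
        for (int i = 0; i < parts.size(); i++)
            sb.append((String) parts.get(i));
        return sb.toString();
    }
}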

Reminds me of the equivalent problem in Python:

b = 'some string'
# Incremental appending: each += copies the whole accumulated string,
# so a long run of appends can go quadratic.
sb = 'a'
sb += b
sb += 'c'

# Better: collect the pieces in a list, then join once at the end.
sb = ['a']
sb.append(b)
sb.append('c')
sb = ''.join(sb)

But notice that it's more a problem of incremental appending than one of usual concatenation. At least 'a' + b + 'c' will not produce a temporary "string buffer" object on top of the extra temporary strings. Besides, Python has no pretensions of being the new systems programming language...

Some would argue that neither does Java, but I disagree; witness the number of client apps being written in that language because that's what graduates are taught, and because people can't handle a language where the GC does not come by default (make no mistake, the Boehm GC is available for C++ and works very well).

20050226

Consumption frenzy

Got the books! Yay!

Quickly read Money 201 and managed to be uplifted and depressed at the same time. Looks like I've been doing many things right, but a few wrong. Unfortunately, to do those things right, I'll require more money. Or slack off on pre-paying parts of the mortgage. But I hate paying interest, regardless of what finance books say. I should write my own book, maybe.

Read Exceptional C++ Style rather quickly. I feel relatively good about it--it's a good book, and it looks like I'm not too rusty. But, to my dismay, I'm also not that interested in all those dark corners anymore (something I've alluded to in previous posts). Still, I wish the "smaller language screaming to get out" that Bjarne Stroustrup mentioned when talking about C++ would come to light. The closest I've found so far is Python, and its library is growing a bit messy.

Speaking of libraries, I've had the opportunity to look in the Java libraries for a few things. I was trying to get some substring appends to be efficient. Unfortunately, StringBuffer does not have an append(String, int, int) operation--only an append(char[], int, int) operation. So I had to call substring() (which is bad, but better from a garbage generation point of view than toCharArray() or whatever it's called). Man, I wish for the nth time I could get access to the internal array. Actually, StringBuffer could, if the String's array were package-private, and in my view, this would be a very sensible design.
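For the record, here's roughly what the two workarounds look like (a small sketch; names and offsets made up):

class SubstringAppend {
    static void demo() {
        String s = "hello, world";
        StringBuffer sb = new StringBuffer();
        int beg = 7, end = 12;
        // What I did: substring() still allocates a temporary String.
        sb.append(s.substring(beg, end));
        // Alternative: copy through a char[] -- two copies, but the scratch
        // buffer could be pooled and reused, generating no per-call garbage.
        char[] scratch = new char[end - beg];
        s.getChars(beg, end, scratch, 0);
        sb.append(scratch, 0, end - beg);
    }
}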

Anyhow, I look around for a solution, and it looks like Writer has a substring write operation. So, I think, maybe I can change my code to work through a writer instead. But, being curious, I wonder if they just go through the whole array char-by-char, or maybe chunk stuff in a pooled buffer.

The answer is: none of the above. They call substring().

Also annoying: they don't let you create a StringWriter on an existing StringBuffer, yet they give you access to the underlying StringBuffer anyhow. This is incredibly non-symmetrical and quite stupid.

Coming from the C++ world, I find myself constantly annoyed by the sheer lack of rigor in the design of the core parts of the Java libraries. Newer parts (such as the Collections system) show more care, but some very fundamental classes (such as java.lang and java.io classes) show sloppiness. So, you get new classes that have better concepts, such as java.nio and Collections. But what can they do about String? It's such a fundamental type, and yet they give no way to easily extend it. You can't even access the internal array. Of course, that's done for safety reasons (so nobody can modify the string in place, since the string is supposed to be immutable), but you can still access it anyhow with reflection and a custom class loader. Worse, as far as I can tell from the StringBuffer code, it's not as efficient as it could be, because it goes through the public interface of the string object.

This may sound like a nit, but efficient string manipulation is extremely important. You want to have a language that lets you do as much as you can with as few temporary buffers as possible. Especially when object allocation and garbage collection are as slow as in many JVMs. I've had many sites run out of memory because they did extra copies of incoming requests and outgoing responses. Granted, they shouldn't do that, but given the API provided, it's the most natural way. I mean, I keep seeing (and writing!) code that does such things as "string("+s+")" even though it's inefficient. If it's inefficient, why is it the most natural way of doing things? At least, in C++, the compiler has a chance to collapse the temporaries! The specifications of the Java language prevent any sort of optimization for this construct. Bad.


Rant aside, I got one other thing--Xenosaga II. So far, I like it, although I couldn't quite believe it when they asked me to switch discs (I was a mere 8 hours into the game...). I hope the second disc is a bit longer. A lot of online reviews have complained about how tedious it is and so on, but if they had a bit longer memory, they'd recall that Xenogears was pretty much the same. This new game feels more like the original Xenogears, with its long dungeons and somewhat higher level of difficulty. There are very few boss fights I finished on a single try; I usually get killed at least once. This is a refreshing change from most modern RPGs, like Final Fantasy X-2 (which never felt very difficult--you have so much flexibility with jobs, and changeover is so fast, that it's hard to get stuck in an attrition battle with your enemy).

A couple of things are somewhat suboptimal with the game, though. First, I don't know why they messed with KOS-MOS' voice acting. It was excellent in the first Xenosaga: perfectly neutral and emotionless, except on a few (intentional) occasions. The new acting varies, going between somewhat neutral and somewhat whiny. It just doesn't work; KOS-MOS is supposed to kick ass, not whine.

I'm also annoyed by the poor treatment they gave Yuki Kajiura's soundtrack. It's an awesome soundtrack, but in the game, the tracks are often cut off before they finish (unforgivable in a game that uses voice acting! The player does not control the rate of delivery, so efforts should be made to time the script to the music; Xenosaga I pulled this off much better), covered with too many sound effects (diminishing their impact), and sometimes used in strange contexts. How dare they make Kajiura's work sound so bland!

Load time in combat isn't that wonderful either. However, it's possible to level up relatively quickly, which takes a lot of the tedium out of it. I prefer having fewer, more difficult fights to fighting weak enemies 100 times to level up (like I've often done in FF VII). I might as well put a rubber band on the "O" button if I'm going to do that. I prefer games that treat me like a thinking being rather than an automaton who just presses "O".

On the plus side, the fact that there are almost no segments without BGM helps me enjoy the game quite a bit. The non-movie soundtrack is nothing really earth-shattering, but some tracks are very, very solid. Unlike many reviewers, I don't think it was a mistake to move from a symphonic soundtrack to a synthetic one. Symphonic soundtracks are popular in SF themes, mostly due to Star Wars and Star Trek. But synthetic soundtracks can work well, too--witness early Babylon 5. It's a matter of balance. Strong melodies should accent strong points, and trance-like tracks should be used for BGM in more repetitive parts.

And you gotta love the new character models. Too bad the hands are done with a thumb, an index finger, and a block containing the three remaining fingers, all the time. That trick is used in Final Fantasy X and X-2 when there are too many characters in the scene, but Xenosaga II uses it all the time. It's a bit sloppy. But the nice face models and expressions make up for it.

I'll post a full review when I've finished the game. Which, at the rate I'm playing, will probably be next weekend or so.

20050217

Not so humbling after all, to humanity's great sorrow

OK, so I was really tired yesterday. Turns out I had good reasons to abstract file lookup: I needed to do easy unit tests. Granted, I could have mocked a ServletContext, but I think it's cleaner this way. So, much to everyone's chagrin, I'm not really humbled by my experience.

I have, however, discovered the benefit of a good night's sleep, yet again. Something I won't get next Tuesday because I have to go to the dentist. How this is interesting to you I have no idea, but you never know!

In other news, the guys at work are making fun of my work habits and love of mechanical keyboards1. They are mean to me. But then again, I complain all the time, so I suppose I deserve it.


1 The text in the picture is in French, and reads "Directly from the Espace Logient/Benoit Goudreault Emond/Concerto in Mechanical Keyboard/And Bouts of Anger". This is a loose translation, but it's probably accurate enough. "Espace Logient" is a pun on a show room in Montreal known as "l'Espace Go" (which is, by the way, quite a nice show room).

20050216

Humbling Experience

You know how it's become customary for all software developers to whine about everyone else's code. Too complex. Too convoluted. Ad nauseam.

Well, read your own code.

Today, I was trying to retrofit some local file-reading capabilities in a system that was mostly meant to read stuff off of a network request. Since the network request bits are semi-auto-configured, I wanted the same capabilities for the local file-reading stuff. So, made an interface. This removed some bindings between the system and the application framework. Made it cleaner, more standalone, etc etc.

Then, in the bus, it hit me: bad idea. The system is completely dependent on the application framework anyway, because database tables/file names/etc. are all done according to an implicit convention that cannot be found outside said framework. So all this wonderful isolation just made things more complex for no reason whatsoever. At worst, I should ask callers to supply the appropriate framework object and use it directly; if I need abstraction later, I can always put it there--later. Given that it's code for more junior programmers, why make it more complex than it needs to be? It's already a bit complex, with a singleton spawning a query engine, which spawns a stateful query and a result.

In my defence, I got very little sleep yesterday :-)

In other news, I just ordered a bunch of books from Amazon.ca. If you want some software-engineering-related or C++-related books, they have killer rebates right now (50% off selected books). Some of the books on sale are not that great (if I see one more "Enterprise Java with xyz" book, I'm going to hurl), but some others are the Herb Sutter classics, and some rather new books on working with legacy code and configuration management. I got myself the Sutter classic I didn't have, and "Working Effectively with Legacy Code", which is something I really need to read RFN. Especially since today, I was writing, effectively, legacy code, and it was my fault.

Also ordered a personal finance book, because it's tax season right now, and my new financial adviser seems determined to make me feel inferior. But looking back with a cool head, my gut feeling is that, besides a little bit of neglect (mostly money sitting in an ING account instead of invested in, say, a dividend fund), I've done pretty well. Probably the adviser's tactic was merely to try to sell me a credit line, something I'm not really open to.

Finally, I realized, to my disappointment, that cool template tricks don't really do it for me anymore. I had the chance to get the Template Metaprogramming book 50% off, and I passed. It looks cool, but I have very dim hopes that I'll get to use it, because:

  1. It tortures compilers, including G++, and
  2. I don't think I'll ever be allowed to use this stuff except in a few toy or hobby projects, because it'll be very hard for any company I ever work for to find people able to understand this stuff.

In many companies, there's a "language guru" and a number of journey(wo)men. I've seen few places where there are many gurus (though Silanis was one of them at one time), and even then, they're hampered by management's fear that nobody will be able to figure out the code. A very sane fear in some ways, but less so when you realize that code written by non-gurus tends to be as obscure as guru code, except not for reasons of technical proficiency!

I also realized today that what I'd really like would be to rewrite my current company's codebase all in Python. But then, I'm sure nobody would want it, even though Python is easy, because we hired Java programmers, of course. Everyone's so damn specialized.

Well, that was today's rant. I need sleep. Especially since re-reading my previous lines makes me realize that the experience is having less and less of a humbling effect as I get riled up...

20050208

Whither Moore's law's application to everyday computing?

An interesting article: Where have my cycles gone?

This article asks the question I've asked myself for the longest time.

Nowadays, it's not as bad, because I run Linux on my home computer. I have, therefore, a good idea where my cycles have gone. The computer is pretty snappy at this time (well, it is an AMD Athlon XP 2000+, but there's a mere 256 MB of RAM), despite how much resolution I drive it with, how much anti-aliasing I've added, and how many background services I run.

But whenever I use a Windows XP computer, a Java application, or even some Linux desktops (those with GNOME or KDE come to mind... I use XFce which avoids much of the madness), I really wonder: given Moore's law, how come many tasks appear to be slower than they ever have been?

The author cites some understandable reasons. Here's my take on reasons I do not understand.

  1. Incorrect algorithms: somewhere, programmers have gotten really sloppy. Trying to sort linked lists with a classical quicksort (hint: don't!--see the sketch after this list), running through data structures many times in an effort to work on the data where a single pass would do, doing all sorts of really stupid things to help performance (such as adding a cache to an O(n²) algorithm that could really be done in O(n log n)), and so on and so forth. I'm always surprised (in a bad way) at how many things are done with such sloppy algorithms. If programmers would just think for a few seconds, they would avoid such problems. Complexity problems like this are usually trivial on small data sets, but what if your small data set is the set of pixels in the GUI blit routines that get used all the time, hmmm?
  2. Wrongheaded ideas: some OSes and applications have patently bad ideas at their core. Like frequently searching through data that is not indexed. Like putting files all over the disk and expecting the filesystem to have an efficient lookup algorithm tailored to your application. Like opening multiple database transactions where everything should be done in one (this is bad for performance, but also for data integrity). The list goes on. I see this in commercial software all the time. I can think of no fundamental reason why system applications would be in a better state, given that they are marketed nearly the same way as user applications (which, IMHO, is really wrongheaded!). User applications, of course, have all those problems. Yuck.
  3. Speed/memory tradeoffs: for some reason, everybody has had it drilled into them that you should always save cycles in preference to memory. So people read the whole file into memory and cursor through it with pointer operations. So they put stuff in sparse hash tables where a sorted array would do and give comparable performance. So they introduce lots of caches to gain a small 5% increase in performance. It doesn't work, and here's why. Today's systems are usually starved for IO time or for memory; the CPU is rarely running at 100% while you work (take a look at a system monitor or at the Windows Task Manager during the day; you may be very surprised at what your computer is doing). As people try to run several programs at once and keep them resident, the problem worsens. Once physical memory is exhausted, the system becomes starved for IO time as the OS needs to swap stuff. If you get into this situation, it will always be much slower than the "slower" algorithm that uses almost no memory. There's also an interesting effect I've seen in some cases: due to CPU cache effects and the importance of keeping the working set small, heavier use of memory can be detrimental to performance in many situations, even if the OS isn't starved for RAM.
  4. Memory/resource leaks: they're everywhere. Garbage collected languages are great, but they shouldn't be taught as the first language. People who learned on garbage collected languages tend to think that the garbage collector takes care of all resource allocation. Hate to break it to you, kids: it only takes care of memory allocation, and on top of that, if you keep references to an object too long (like in caches...that thing again!), it never gets collected. Garbage collection is no excuse to be sloppy about object ownership. See previous point on why resource leaks eat up time.
  5. Freakin' objects everywhere and the lack of stack allocation: this is mostly a problem for languages like Java, which couples an asinine lack of stack allocation for simple objects with a really slow allocator and garbage collector. Mix in a really high per-object allocation overhead (all objects get a condition variable and a vtable, whether they need them or not!) and you've got a recipe for high temporary memory usage. That would be OK if the Java VM contracted its memory use once in a while. But NOOO! I sometimes think those who wrote the JVM disregarded 30 years of computing, both in VM design and in garbage collection algorithms. I can think of no other reason that would explain how they could deliver such a ridiculous JVM 1.0. And I still don't understand how many of my Python scripts have more predictable performance than many of my Java programs, despite the JIT and the fact that the Python programming model does not lend itself to much optimization.
  6. Buzzword mania: why is it that we need EJB? Do you do transactions across multiple databases? Do you really need to distribute your objects (remember the first law of distributed computing: don't distribute your objects)? Replace EJB and related questions with the buzzword of the month. Many products are built with technologies that don't really fit the problem, increase complexity and memory use, and decrease performance, for no visible gain in capabilities. This is really dumb. See my earlier article on why software projects fail.
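On point 1, the library itself shows the right instinct (a small demo; the list contents are made up): Collections.sort dumps the list into an array, sorts that, and copies it back, precisely because index-based sorting directly on a linked list would degrade to O(n²) on all those get(i) calls.

import java.util.Collections;
import java.util.LinkedList;

class SortDemo {
    public static void main(String[] args) {
        LinkedList names = new LinkedList();
        names.add("carol");
        names.add("alice");
        names.add("bob");
        Collections.sort(names); // array dump + merge sort + copy back: O(n log n)
        System.out.println(names); // [alice, bob, carol]
    }
}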

Is that it? Probably not. But I think it covers a lot of things. If you're studying in CS or Comp. Eng., I really recommend that you do the following:

  1. Study algorithms. Know the main ones. Know which data structures have what complexity guarantees. This will help you choose the right structure and algorithm for the right job.
  2. Pay attention to the more low-level classes. Assembly looks like pre-history, but the principles of how machines work will always remain useful. C and C++ feel like nailing your toes to the desk in an awkward position, but you'll learn to be careful about resource ownership, and that's a valuable skill regardless of the language.
  3. Remain skeptical of those who claim your designs aren't "elegant" enough. There's always a sweet spot; but in any case, a design with less code will almost always be more elegant from a maintainability, understandability, and performance point of view. In my experience, university "elegant" means "complicated". It's cool-looking, full of design patterns and objects and inheritance. But when 60% of what you typed is syntactic and semantic sugar, you'll end up with a mess sooner or later. And that will make it harder to figure out whether you picked the right algorithm. Don't misunderstand: elegant design does exist. But you'll have to develop your own sense of it. The understanding of elegant design in academic circles varies widely. Be especially wary of teachers who don't code, or instructors who never had to maintain any of their projects. Design sense mostly comes from learning what not to do by having done it and being stuck maintaining it.
  4. Remain humble when you're about to do some task. You may be smart enough to implement the equivalent of a database by hand; but why take the chance? And even if you're smart enough, keep in mind you don't really have the time anyhow. Solved problems may be fun to solve again, but good commercial-grade code is always developed with time pressure. Time you spend on your fun problem will be taken away from time you should spend making the overall system design maintainable.
  5. Try to understand what you're doing. Try to understand the libraries you're using, at least in a general way. Otherwise, it's going to be very hard to pick a given routine (or even a given method overload!) over another.

Well, that's my advice, for what it's worth. I just realized that a lot of it applies to people who already program professionally, and I can think of a few people who don't follow it. I know I try to follow it carefully, and it has served me well so far. I've been programming commercially for nearly 5 years, and as a hobbyist since I was 14, so I like to think that I've learned a few things at this point. I'm sure there are other things programmers should be careful about, but the ones I've noted in this post are supposedly 'obvious' and people still don't do them.

Hence this rant.

Hopefully, if people apply those, Moore's law's application to everyday computing, in the form of faster, more capable computers with the ability to do more for their users, will become reality.

20050129

Bela Lugosi Night

For Christmas, I got a pair of DVDs from the "classic horror movies collection". Given by none other than my very cool parents, who know how to humour my strange tastes.

Today, I was feeling sort of lousy, so after the grocery run (well, I went to the mall, but nothing really inspired me, except a miniature plunger for my bathroom sink... how exciting), I curled up on the sofa and watched a couple of movies, both starring Bela Lugosi.

For those who don't know Bela Lugosi, he's an actor who played in many '40s horror movies. But his main claim to fame in modern cinematography is probably his partial participation in one of the worst movies ever made, Plan 9 From Outer Space by director Ed Wood. Lugosi was slated to play the head vampire, but he died early in the making of the movie. Ed Wood, ever the optimist, replaced him with a much taller man who had a different face, and had the replacement cover his face with a cape for the whole movie. If you haven't seen Plan 9, you should; it's quite bad, but so bad it's sort of good. Ed Wood by Tim Burton is worth seeing as well; it's a not-quite-documentary on the making of Plan 9.

Anyhow, for those reasons, Bela Lugosi is known to me. But the only movie I've seen him in was Plan 9, which doesn't count.

Hence: The Devil Bat. Nutty doctor bitter about some rich man making a fortune on his discoveries (the doctor cashed out instead of getting a stake in the company) invents a way to make bats gigantic using electric current (???). He conditions the bats to hate the smell of a particular aftershave he's developing. He then distributes the aftershave freely to members of the rich man's family and sics the bat on them.

It's a harmless movie. The special effects are especially laughable, but it was made in 1940. Though they could have been more careful about some things, like the bat-flying-out-the-window shot where the window is obviously NOT the window we saw open (the bricks don't have the same texture, etc.). There's some priceless Lugosi acting (meaning, quite exaggerated), like replying to "goodnights" from his intended victims with "goodbye, Mr. insert name here". A rather silly subplot of the hero's assistant trying to woo the rich man's daughter's French maid. And the rather unnatural switch of the female lead's affections from her murdered intended to the hero (man--it's like it's the most natural thing in the world!)

That said, it's too bad such movies don't really have a soundtrack, because some parts run a bit long and dry.

Then, I watched The Human Monster. Unfortunately, the sound quality was somewhat poor, as it's a really old movie. Lugosi exaggerates his expressions even more than in the other movie, and his face just screams "villain!" all over the place. I found that movie a bit less enjoyable, although it's definitely more serious and more care was put into it. I guess it's just too serious--old horror movies should be cheesy. But as far as it goes, it's not that bad a movie, with a pretty predictable plot, but some relatively well-scripted moments. Keep in mind I'm no movie expert. Unfortunately for me, who wanted to watch a Lugosi flick, he gets less face time here. I must admit, though, that I found the scene where he puts the female lead in a straitjacket and drowns a man in front of her quite tense and effective.

Watching old movies like this is an interesting experience. You realize that modern movies have more technical means, but their plots stink--they're as predictable as those old things, and often play on the same themes (there are exceptions, but there were exceptions then as well). You realize that while attitudes towards women have changed on the surface, they are basically the same in modern movies; they were simply quite a bit more overt at the time, and they weren't as bad, compared to modern movies, as many people make them out to be. The female lead in The Human Monster is a damsel in distress, but she's also incredibly stubborn about wanting to find out who killed her father. Surprising after seeing The Devil Bat, where the female lead is a human carpet in many ways. As now, scripts vary widely.

Well, I'll probably watch other movies from those discs in the near future and let people know what I think about them.

20050127

Dumb questions for Java heads

At work, we came across a really annoying problem upgrading some machines from Tomcat 4.1 to Tomcat 5.0.

We have a servlet mapped on the servlet-mapping "/". It's supposed to handle a URL known as /en/blah. Now, due to some rather odd customer requirements, we're also required to have a physical file /en/blah/index.jsp, which, even more oddly, contains a redirect to /en/blah/natter (instead of /en/blah).
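For context, the relevant deployment descriptor bits look roughly like this (the servlet name is made up):

<servlet-mapping>
    <servlet-name>main</servlet-name>
    <url-pattern>/</url-pattern>
</servlet-mapping>

<welcome-file-list>
    <welcome-file>index.jsp</welcome-file>
</welcome-file-list>

With the physical /en/blah/index.jsp on disk, the question is whether a request for /en/blah/ goes to the "/" servlet or to the welcome file.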

When the user browses to /en/blah/, Tomcat 4.1 calls the servlet. So does Resin 2.1.x. But Tomcat 5.0 opens up the welcome file /en/blah/index.jsp.

OK, I know it's not such a good idea to have two potential targets for a given URL that yield different results. But I'm curious: is either behaviour mandated by the Servlet specs, or is it simply undefined?

My reading of the 2.4 specs says that the "/" servlet is the default servlet, and welcome files have higher priority. However, this may not have been the case in earlier specs. But looking at the changes in the back of the 2.4 specs, they don't seem to mention anything like that. Or maybe I'm missing something.

Just curious...