Monday, August 28, 2006

Old (Bad) habits die hard

Recently I was painfully reminded that habits, once acquired, are hardly ever shed.

I usually consider myself someone who tries to write software only after thinking it through. By that I do not mean "over-engineering", "over-abstracting" and "over-prepare-for-anything-that-might-ever-come'ing". However, I also believe that starting to hack away blindly is not a good thing either. And I try to write "nice" code, even though it might be a little more work, as long as the result is easier to read or just plain more stable (which is often the same thing).

Sometimes, however, especially under a tight schedule, I (and probably every developer out there) tend by force of habit to do things that make me feel deeply embarrassed upon later review. That is exactly what happened a couple of weeks ago...

I had to write a component that transfers data from a legacy system, stored in plain text files, into a relational database accessed through an object-relational mapping layer. Each entry in the SQL database is generated from a single text file. The information used as the primary key is encoded in the file names.

For some reason I can no longer understand today, I decided to keep track of the files I had already processed by appending their names to a new plain text protocol file. At the time I probably thought this was quicker to implement than coming up with a new kind of protocol value object to store in the database for each entry. In retrospect, however, I doubt that this approach was really faster to implement. (Even worse, this code is deployed into a J2EE application server, where you are not even really allowed to do file I/O if you follow the EJB spec.)

Anyway, this translation component includes a polling mechanism that regularly inspects a directory for new, unprocessed files. While nothing looked wrong during my tests, after some time in production the processing (or rather, the polling) became continuously slower as the list of files already processed grew longer and longer, so the time needed to find out whether a file was new or had already been processed gradually increased. To make matters worse, the number of file names available is limited by the design of the legacy program, so they start to roll over after a while. This made the poller believe that a file had already been processed and could be skipped, while in fact there was new data in it. To make this work again, I would have had to include the file modification timestamp in the list of already processed items and compare it on every poll cycle as well.
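To illustrate the two problems described above, here is a minimal sketch in Java. It is not the code from the project: the class name ProcessedFileTracker, the nested FileKey and the example file name are all made up for illustration, and the state is kept in memory only. It shows the idea of keying each processed item by file name plus modification timestamp, so that a rolled-over name with new data is recognised as new, and of using a hash-based set so that the lookup does not get slower as the history grows.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch: remembers which (file name, modification time) pairs were handled. */
class ProcessedFileTracker {

    /** Key combining name and timestamp, so a reused name with new data counts as new. */
    static final class FileKey {
        final String name;
        final long lastModified;

        FileKey(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FileKey)) {
                return false;
            }
            FileKey other = (FileKey) o;
            return name.equals(other.name) && lastModified == other.lastModified;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, lastModified);
        }
    }

    // Hash-based lookup instead of scanning an ever-growing list of names.
    private final Set<FileKey> processed = new HashSet<>();

    /** Returns true if this exact name/timestamp combination has not been recorded yet. */
    boolean isNew(String name, long lastModified) {
        return !processed.contains(new FileKey(name, lastModified));
    }

    /** Records the name/timestamp combination as processed. */
    void markProcessed(String name, long lastModified) {
        processed.add(new FileKey(name, lastModified));
    }
}
```

A poller would call isNew(file.getName(), file.lastModified()) for each file it finds and markProcessed(...) once the entry has been written. In a real deployment the set would of course have to be persisted somewhere, which is exactly where the plain text file went wrong.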

So in the end I had to refactor the whole thing and roll out an update of our application that stores the necessary meta-information in the database server to get acceptable throughput again. To avoid the file modification date check, we additionally implemented an archive mechanism for the legacy files, which keeps the number of items in a single folder limited and the names unique.
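An archive step along these lines could look roughly like the following sketch. Again, this is an illustration under assumptions rather than the code we shipped: the class name LegacyFileArchiver, the single archive folder and the naming scheme (modification time in milliseconds as a prefix) are invented here, and it uses the java.nio.file API from later Java versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: moves processed legacy files out of the polled folder. */
class LegacyFileArchiver {

    private final Path archiveRoot;

    LegacyFileArchiver(Path archiveRoot) {
        this.archiveRoot = archiveRoot;
    }

    /**
     * Moves the file into the archive folder under a name prefixed with its
     * modification time, so that a name reused by the legacy system does not
     * collide with an earlier archived file. Returns the new location.
     */
    Path archive(Path file) throws IOException {
        long modified = Files.getLastModifiedTime(file).toMillis();
        Files.createDirectories(archiveRoot);
        Path target = archiveRoot.resolve(modified + "_" + file.getFileName());
        // Fails with FileAlreadyExistsException if the very same file was archived before.
        return Files.move(file, target);
    }
}
```

With the processed files moved away, the polled folder only ever contains unprocessed files, so the poller does not need to compare anything against a history at all.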

All of this shows, however, that people tend to fall back into old habits once they get under pressure. It can be really hard to notice this in time and make yourself do it "right" from the start.

Here is my private list of things I notice people (including myself, of course) doing time and again despite knowing better:

  • Use files instead of other, more suitable forms of storage
  • Hard code strings ("I will come back to I18N-ize this later...")
  • Include debugging output via System.out.println instead of through the logging mechanism
  • Add checks for nulls in places where you should not get any null in the first place, instead of trying to find out the real cause
  • Change the code, but not the documentation - it is always fun to hunt for a bug or unexpected behaviour that exists only because a method does not do what you would expect from its comment
  • Change the behaviour, but not the names (of methods, fields, variables...). This should never happen, given the excellent renaming support modern IDEs offer.

Feel free to add anything that comes to mind to this list :)
