Thursday, December 13, 2007

OS X / iPod Touch DHCP timeout problem

It has been some time since I last posted, but I spent the time well, especially on a vacation in Malta. Very nice place, even in late November. Apart from one rainy day we had great weather and temperatures around 20°C. Very comfortable. Another nice aspect was the display of prices in both Maltese Lira (Lm) and Euros, because as of January 1st, 2008 Malta will change its currency and introduce the Euro as well. Very convenient for travelers :)

Back to business: Of course I did not want to be cut off from the net completely, so apart from my girlfriend's notebook I took my iPod Touch with me. The hotel advertised free WiFi access in the lobby. The first day I went down there and connected to the network without any problems using the iPod. They provided a freely accessible, unencrypted access point.

Two days later I could not get a working link. The iPod detected the wireless network and was able to join it, however I never got an IP address from the DHCP server but ended up with one of those IPv4LL (better known as APIPA) addresses that usually don't do you any good...

Retrying and renewing the DHCP lease did not help, so I decided to take the hard way and ask someone at the front desk. I was surprised when the guy I talked to showed some basic knowledge of DHCP and asked me to wait while he checked the router. A moment later he came back and told me he could not do anything, because the technical hotline for the staff was not available on Sundays. He gave me a complimentary hour of Internet access on one of the for-pay computers in the business lounge and asked me to check again on Monday.

So I did - and still had the same problem. I went back to the front desk, again asking if they had any problems with the router leasing IP addresses. This time the person at the desk was less knowledgeable and told me "We do not have IP addresses. Just set everything to automatic and it will work." Very helpful. I tried again anyway and explained to her in layman's terms that setting my device to automatic meant it would ask their WiFi device what to do, and that this did not work. She recommended moving over to the business area and trying again, because "the wireless network is installed over there". Imagine the front desk being approximately 15m away from the business area. Just three steps down, no walls etc., and my iPod showing a perfectly fine signal strength for the "Lobby Wireless" network. Anyway, because she had asked nicely I went over there - and guess what - this time I got an address when I hit the renew lease button...

While she was happy that she had been right and had helped another one of the tech-illiterate guests, I was happy that I could finally check my mails, but also a little confused. Trying again later with the XP notebook, it dawned on me that this was simply a very slow DHCP service, and apparently the iPod is not too keen on waiting longer than approximately 5 seconds (rough guess) before falling back to Zeroconf.

I went looking around the net for someone with similar problems. Turns out this does not seem to be an iPhone/iPod issue but a general Mac OS X thing. I have not tried it out (I do not have a Mac), but maybe a change to /etc/dhclient.conf could work, specifying a longer timeout value - assuming that OS X uses dhclient...
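If OS X really does use an ISC-style dhclient, the relevant settings would look something like this - an untested sketch, and both the file location and whether the iPod honors it at all are assumptions on my part:

```
# /etc/dhclient.conf
timeout 60;   # wait up to 60 seconds for a DHCP lease before giving up
retry 10;     # wait 10 seconds before trying again after a failed attempt
```

With the default timeout being much shorter, raising it might give a sluggish DHCP server like the hotel's enough time to answer before the fallback to a link-local address kicks in.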

Monday, November 26, 2007

New GMail rather slow

I noticed the new GMail (the one where an "Older Version" link appears in the top right part of the window) seems to be way slower than that "older version". Whenever I use the mouse to scroll through my inbox I can see the browser slowly redraw the screen from top to bottom.

Is this normal on a 3GHz HT machine? I surely hope this is still pre-release code :)

Tuesday, November 20, 2007

iTunes depending on Internet Explorer cache?

Today I noticed that although I could see the latest episode (#150) of The JavaPosse in Firefox, iTunes did not notice it, even when I used the "Update Podcast" menu item. It still insisted that episode 149 was the most recent one.

I reloaded the Feedburner feed with F5 and Ctrl-F5 to no avail. I suspected our company proxy of somehow misbehaving and launched Internet Explorer to check what I would get there. Usually I do not use IE, so I could be quite sure it would request the feed through the proxy and not serve it from its own cache.

To my surprise I immediately got the right, current feed displayed, including Episode #150.

Once I had seen this in Internet Explorer I tried "Update Podcast" in iTunes again and this time it started to download the episode it had not seen before.

I have not bothered to look through the iTunes documentation, so maybe I am writing about something completely normal here, however for me it is definitely counter-intuitive and took me 10 minutes to figure out by trial and error...

Monday, November 05, 2007

Upgrading Windows XP Home to Professional

Some time ago a friend of mine upgraded his little office network with a Windows 2000 server. Until then he had been using Windows XP Home edition - mainly to just access SMB shares on the server. There were however some pain points with that setup. Profiles were not stored on the server, so the backup to tape drives on the server could not include them. Moreover we had to manually take care of having identical user accounts on all machines to share data among them and grant access to each other's printers.

So we decided to upgrade to Windows XP Pro, set up Active Directory and use roaming profiles. However because there were several products installed on the different workstations that take a long time to configure properly, we did not want to start from scratch.

Unfortunately with XP you cannot just change the product key and all of a sudden Home becomes Professional. For one thing, the keys just do not match - there are even different keys for the retail XP Pro version and the corporate volume license installs...

So this is what we did to get to XP Pro without having to install everything anew. (Note that all programs and settings remain intact. You will only lose all Windows updates released since the state of the installation medium. You might also have to install some drivers again after the process, depending on your hardware.)

Prepare the machines for upgrading

The basic idea is to do a "repair install" from the freshly bought XP CDs. That will overwrite all system files from the CD, so along the way you get the missing parts that Home does not have. However this puts you back into the stone age, because the latest XP CDs have a copyright date of 2004 and contain no fixes apart from SP2. So when the first machine failed to boot into anything but safe mode - because the 2007 nVidia graphics drivers did not match the 2004 system - we uninstalled the drivers from the other machines upfront.

Perform "repair install" of XP Pro

Boot from the XP Pro CD. You will be asked if you want to use the repair console to fix problems with existing installations. I find this somewhat irritating, because we do want a repair install, just not with the repair console. After accepting the license with F8 the setup program will find the Windows installation on your hard disk. This time choose to repair that installation. At first I was somewhat shocked when I immediately saw "Deleting file:" messages flying through the status bar without further confirmation. However from here on you just have to wait. It will take about the same time as a fresh Windows install.

During the graphical phase of the installer you will possibly be asked to confirm or deny drivers that have not been digitally signed by Microsoft. These drivers are part of the Windows installation being upgraded/repaired. You should indeed follow the advice given in the confirmation box and NOT use them. In our case ignoring it led to a machine blue-screening on boot because of the graphics driver! You are better served reinstalling the drivers once the upgrade is complete and all recent Windows updates have been applied again.

Optional: Prepare Offline Windows Update

As the Windows setup left me with some time to wait, I decided to prepare an offline update CD to get the new installations up to date quickly. If you have a real broadband connection this is not really necessary. As we only had a 1 MBit DSL line at hand, downloading more than 80 updates plus several drivers on 5 individual machines did not seem like a good idea. So I downloaded "c't Offline Update" - a nifty little tool developed by the German c't magazine. The page is in German, however there are some links to English resources, too. I used the current version 4.1.

Basically the tool fetches the list of all available updates for the products you select (Windows and Office in several versions and languages) from Microsoft and downloads them automatically. They are then packaged into a CD or DVD image you can burn and use to update any machine without a network connection.

Install Windows Updates

As the CD install replaces lots of system files with outdated versions, it is imperative to re-install all Windows updates that were lost. You can do so either by using the offline medium created earlier (see above) or by simply visiting the Windows Update site via Internet Explorer. I noticed a problem after I had used the CD on all machines: For some reason about 5MB of updates had not been installed offline. They showed up as new updates on windowsupdate.microsoft.com, however I could not install them. Windows Update always seemed to start but almost immediately failed with (I believe, I did not write it down) error code 0x800000002.

To get Windows Update working again I needed to re-register some DLLs. Fortunately I found a script that does this automatically. I duplicate it here, because I have often seen forum links go dead after some time. However I would like to make clear that the credit for this is not mine. The messages in the script are in German. They basically just say: "We are going to re-register some DLLs to repair Windows Update. We do not take any responsibility in case your computer does not survive etc."

@echo off
cls

echo.
echo.
echo  Windows Update Wiederherstellung
echo  ================================
echo.
echo  Diese Hilfe dient zur Wiederherstellung aller notwendigen
echo  Microsoft Windows Update .dll Datei-Registrierungen
echo.
echo  Die Nutzung dieser Hilfe geschieht unter Ausschluss jeglicher
echo  Gewaerleistung und/oder Garantie fuer die Funktion auf allen Computern!
echo.
echo  Das Nachregistrieren der Dateien, beneotigt ein wenig ihrer Geduld!
echo.
echo.
echo                                                       -Team MSHelper.de-
echo.
echo.
echo.
echo.

pause
@echo on

regsvr32 cryptdlg.dll /s
regsvr32 dssenh.dll /s
regsvr32 gpkcsp.dll /s
regsvr32 initpki.dll /s
regsvr32 jscript.dll /s
regsvr32 mssip32.dll /s
regsvr32 msxml.dll /s
regsvr32 msxml2.dll /s
regsvr32 msxml3.dll /s
regsvr32 qmgr.dll /s
regsvr32 qmgrprxy.dll /s
regsvr32 rsaenh.dll /s
regsvr32 sccbase.dll /s
regsvr32 slbcsp.dll /s
regsvr32 softpub.dll /s
regsvr32 vbscript.dll /s
regsvr32 wintrust.dll /s
regsvr32 wuapi.dll /s
regsvr32 wuaueng.dll /s
regsvr32 wuaueng1.dll /s
regsvr32 wuauserv.dll /s
regsvr32 wucltui.dll /s
regsvr32 wups.dll /s
regsvr32 wups2.dll /s
regsvr32 wuweb.dll /s


@echo off

echo.
echo.
echo  Alle notwendigen .dll Dateien wurden erfolgreich nachregistriert!
echo  Bitte testen Sie erneut den Zugriff auf die Windows-Update Seite.
echo.
echo.
echo                                                       -Team MSHelper.de-
pause
exit

Reinstall drivers

The last thing I had to do was install some drivers. For one, the ForceWare drivers I had removed before the repair installation had to be restored, and on another machine I re-installed the AVM FritzCard drivers. However scanner drivers and even some special serial port setups worked flawlessly without any manual intervention.

Monday, October 29, 2007

Leverage Eclipse perspectives for different screen layouts

At work I use a three monitor setup for Java development. The machine is based on Windows XP and Eclipse 3.3. The individual views are spread out across the middle and right screens, the left one is mostly used for browser and email windows. Because I usually use a rather large font for code editing the center screen is entirely dedicated to the code editor.  The right screen contains outline, call hierarchy, source history, a console view, the package explorer and the task/warnings lists. All in all each and every pixel of this 2560x1024 area is used effectively for Java development.

However in the evenings I sometimes like to complete some unfinished task at home. Thanks to a VPN connection I have full access to the CVS server and everything else I need. However I do not want to have to synchronize and download all changes to the laptop before I start. Usually I leave Eclipse open on my office PC all the time, so to save time I just connect to it via RDP. The problem with that however is the limited resolution of the laptop screen - it is only 1400x1050. I used to have VNC running and scroll left and right to see all of the office screens, but it is just too slow and there are issues with the notebook keyboard.

The main disadvantage of this setup was Eclipse - because the RDP resolution is smaller than the office desktop, I could only see a small part of the views on the right screen. The first few times I reset the Java and Debug perspectives to their default configuration, which docks all views inside the main window. While this worked, it was a real pain to rearrange the views again when I came back to work the next morning.

It took some time until I remembered the "Save perspective" feature. I have now set up my "normal" Java and Debug perspectives - stretched out across two screens - and a "single window" version for each of them. The only thing that needs to be configured when I switch from desktop to laptop mode is found under Window - Preferences - Run/Debug - Perspectives: When working from home I define the "simple" perspectives as the defaults, at work I choose the multi-head versions.

Monday, October 15, 2007

How to charge the iPod Touch's battery on Linux

I have never been much into iPods before. Although people were all excited about the click wheel and the great user experience, I tried some of them and never shared in the whole fuss.

However when I first heard about the iPod Touch I was really fascinated. I watched the iPod keynote from September 2007 and immediately liked the iPod Touch. Usually I am not a person who buys products immediately after their release, but I figured they had had some time to get the worst bugs out of the iPhone; and as the iPod is undoubtedly nothing more than the "phone without the phone", I decided to give it a go and ordered it from the Apple Online Store on September 9th. Delivery was scheduled for the week starting Oct 1st and it promptly arrived on Thursday. As I had it delivered to the office address I suddenly noticed a lot of people coming to my desk...

I connected it to the office desktop (XP) where I had been using iTunes for quite some time already to manage the music I usually listen to while coding. Syncing was a breeze and I was really happy.

I did not have time to fully charge it at work, and of course I played around with the WiFi (which is a little heavier on the battery than just listening to music). So I decided to fully charge it over night at home. However that turned out a lot more difficult than I had imagined. Up to that point I had always considered charging a mobile device via the USB port something nobody could get wrong. However when I plugged it in after I had fired up my Ubuntu desktop, it just very briefly showed the "charging" icon on the iPod's screen and immediately went back to "working on battery". The Gnome desktop presented me with a camera import wizard... I tried to disconnect and reconnect, and even disabled all udev rules because I suspected one of the programs started upon the discovery of the new device to somehow confuse it - all to no avail.

When I booted into the Windows partition I still keep for emergencies, it started to charge immediately... Now this is some kind of platform tie-in! I knew before that Apple's support for anything but OS X or Windows is virtually non-existent, but preventing people from charging the battery, just because of the "wrong" operating system?

It took me four more days, pounding Google with all sorts of keywords I could imagine - "ipod charge linux", "ubuntu ipod battery" and so on. Only when I got the idea to search for some combination of terms containing "iPhone" did I get to this: iPhone Now Charges in Linux.

That site contains the source for a kernel module that sends a short sequence of bytes to the iPod or iPhone, telling it to start charging the battery. I find it very annoying that stuff like this is necessary just to recharge a device... Thanks to Matt Coyler I do not have to boot Windows just to charge my MP3 player. Great job!

Friday, October 05, 2007

Can't start server: Bind on TCP/IP port: No such file or directory

(Also see the follow-up post about some progress)

Today I was (again) facing a log file from a machine that had for some reason not been able to start a temporary MySQL daemon during the night to prepare a streaming MySQL slave installation. The necessary second daemon had created its new ibdata files, but just after that it aborted the startup process with the following message:

Can't start server: Bind on TCP/IP port: No such file or directory
071001 23:09:55 [ERROR] Do you already have another mysqld server running on port: 3310 ?
071001 23:09:55 [ERROR] Aborting
071001 23:09:55 [Note] mysql\bin\mysqld.exe: Shutdown complete

As you can see, the port is different from the default MySQL port, so I can be sure there was no conflict with the primary instance. Even more curiously, the same process had been working flawlessly on that and other machines for some time. I remember having seen this message once before, but back then I did not have the time to look into it any further. We just restarted the streaming slave setup process and it went right through.

This time however restarting the process didn't work. It just aborted with the same message again. I especially wondered about the error message: "Bind on TCP/IP port: No such file or directory". What the hell is that supposed to mean? A colleague and I even had a look at the MySQL source code, but the "No such file or directory" message is nowhere to be found in conjunction with the bind error message. I looked on the web but could not find any explanation of where it comes from.

However because the process would fail repeatedly I had a look at port 3310:

netstat -an | findstr "3310"
TCP    10.123.234.12:3310    10.123.239.11:1433       ESTABLISHED

You can tell this is a Windows machine - findstr is a poor man's grep for Windows. What is more interesting is that there is a connection on 3310. It is not due to someone having already bound that port for some sort of service; instead it is in use as the local endpoint of a connection to an SQL Server instance (port 1433)!

It turned out to be the same problem we had some time ago with JBoss not being able to bind port 1099 on Windows 2003. We had not explicitly reserved port 3310 for private use by a user-controlled application, so Windows was free to assign it as a local endpoint to any process requesting a socket connection. This is due to the range of so-called ephemeral ports - the ports that make up the "local" end every time an application connects to a remote service. The default bounds for this range are from port 1024 up to 4999. As 3310 is right in the middle of this range, apparently chances are high enough to get bitten by this more than once in a lifetime. We have now added port 3310 (and 3306 for that matter, which was also missing) to the list of ports excluded from dynamic assignment. Probably 3306 has never been a problem, because the primary MySQL instance is configured as an autostart service. I guess it starts early enough during boot that the chances of anything else having claimed the port are very low. However there was most probably some luck involved, too...
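On Windows the exclusion is done via the ReservedPorts registry value; roughly, what we set looks like this (the value name and location come from Microsoft's documentation, the port ranges are of course specific to our setup, and as far as I know a reboot is needed before the reservation takes effect):

```
; HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
; ReservedPorts (REG_MULTI_SZ) - one "from-to" range per line:
3306-3306
3310-3310
```

Any port listed there is skipped when Windows hands out ephemeral ports, so a service can bind it reliably no matter how late it starts.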

This might also be a problem if you need a lot of outgoing connections on a machine. To configure the range for Windows 2003 Server, have a look at Microsoft's documentation on TCP/IP stack implementation details, specifically the section on the "TCP TIME-WAIT delay" in the Core Protocol Stack Components and the TDI Interface chapter (for Windows 2000 see this page). They are both linked from Knowledge Base article #908472.

We will monitor the situation and see if we get any more troubles with this.

I still however do not get the "No such file or directory" part of the message...

To round things up here's another link on the topic of ephemeral ports and their meaning for network security: www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf.

Tuesday, September 25, 2007

The value of a CVS commit database

Due to some discrepancies between the Eclipse 3.2.2 compiler and Sun's javac we needed to upgrade our development environments to Eclipse 3.3. Otherwise we could not tell for sure that something that looked ok in Eclipse would compile in the daily build process.

Even though I had used 3.3 privately for some time now, there is always some tension when switching a whole bunch of developers in a really large project.

At first everything seemed fine, apart from some minor issues that could be easily worked around.

However I ran into a nasty little bug regarding the CVS integration when I had to switch a workspace that had been checked out on HEAD to another branch. That branch had been created to keep the Helpers and Utilities refactoring I wrote about before separate from HEAD until it is complete.

Within Eclipse you can just select "Switch to another branch or version" from the context menu on any project (or file for that matter) and select the branch you would like. I had done this with 3.2.2 several times without any problems. So, suspecting nothing, I switched my local working copy to the branch and started checking in modified files after some more refactoring.

However shortly afterwards a colleague complained that there were compile errors on HEAD. It turns out Eclipse 3.3.0 has a well hidden bug in that feature: the version switch involves some requests to the CVS server. This works fine for files that have no changes in the working copy, however for files with outgoing changes the server response is not handled correctly and those remain on HEAD. Because I had already made some changes before the switch, part of my changes went to the branch and the rest to HEAD, leaving both in an incomplete state. For details on the bug see Eclipse Bug #192392.

The files I had checked in spanned several projects and were of course committed in little chunks with different comments. At that point I was very glad that I had my ViewVC commit database to query for anything I had done over the last few hours that had gone to HEAD. Without it, it would probably have taken me hours just to find out which files I had checked in on the wrong branch. While it was still some tedious work to actually restore everything to the state I wanted, just identifying the affected files was done with a rather simple SQL query in no time.

I can only advise anyone working on a project with more than just a few files to set up a database that stores all commits by type (addition, change, deletion), file, branch, date and author. Not only was it my life insurance in this case; combined with a full-text index on the commit comment field it is also a very good basis for change logs - using simple SQL they can be generated very flexibly and within the blink of an eye.

The version of ViewVC we use is rather old and contains some custom changes that probably would not be required with a more recent release. So I recommend taking a look at the current version the project offers.

Thursday, September 20, 2007

Helpless Helpers and Useless Utilities

With any code base of a reasonable size there are lots of issues you would normally take care of immediately when you come across them, however often there is just no time for it. In the end you will have to live with the knowledge that you had to leave some ugly hacks in it just to meet the deadline.

Because we have recently finished development of the next major release of our software product, there is some time now to do code cleanup and get some more automated tests on the way. Because one of the bugs that almost prevented us from keeping our schedule was a particularly nasty - but well hidden - one, there has (again) been some discussion about coding guidelines and quality.

People always seem to agree that you need to talk to each other, think in larger terms than just your specific problem of the moment, and strive for code readability and re-usability. For starters, I personally would sometimes already be happy with just a little more of the first one...

The bug I mentioned above was in a method to determine whether a given date is a holiday or a business day. Because this is not a trivial problem to solve generically for several countries, we rely on a database table that is pre-populated with the holidays for the next few years in a given country. To tell whether or not a date is a holiday is just a mere database lookup. Nothing can really go wrong here, can it? The method in question was implemented like this:

public static boolean isHoliday(Date aDate) {
   if (getRecordFromHolidayTableFor(aDate) == null) {
      return true;
   }
   return false;
}

Of course there was a corresponding method for business days, too:

public static boolean isWorkday(Date aDate) {
   Calendar cal = Calendar.getInstance();
   cal.setTime(aDate);
   int day = cal.get(Calendar.DAY_OF_WEEK);
   if (day == Calendar.SATURDAY || day == Calendar.SUNDAY) {
      return false;
   }
   return isHoliday(aDate);
}

Just in case you do not know: Saturday is usually considered a business day in Germany, for our customer it definitely is. So this part of the latter method alone is at the very least somewhat dubious.

By now you will probably have stopped looking at the code, trying to find out what you are missing that would make this work correctly. You are right to do so, because it is of course just plain wrong. The isHoliday method returns exactly the opposite of what you would expect. And the isWorkday code - apart from the wrong condition on DAY_OF_WEEK - even leverages this and in fact works (almost) correctly.

One might ask how this could go unnoticed till the last moment of development and I can tell you, the answer will not be pleasant: It didn't!

There were about 20 calls to the holiday method. 19 of them always inverted the result! The author of the 20th call had relied on this seemingly simple method to just work, and only found out about the bug when he accidentally connected to a database with an empty holidays table and still got true out of it.

Both of the above methods were part of a class called "DateUtil". Apparently someone had had the idea of providing common date functions with this class, to spare everyone from writing their own methods to manipulate and use dates.

While this, of course, is generally a good idea, it would have been even better to provide some thorough testing code for it, because even in the seemingly simplest of methods there is a sufficiently high chance of making a mistake while implementing it. One could point a finger at the author of that class, because he introduced the bug in the first place. But I would rather have 19 more fingers to point at each and every one who noticed the problem and just worked around it in his or her private piece of code, telling no one and letting everyone find out for him- or herself again. I will not even try to estimate how many hours were wasted because of this...
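A few unit-style assertions would have surfaced the inversion immediately. Here is a sketch of the corrected logic - note that the real code does a database lookup, which I have replaced with an in-memory set purely for illustration, and the class and helper names are made up:

```java
import java.util.Calendar;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

public class FixedDateUtil {

    // Stand-in for the pre-populated holiday table in the database.
    private static final Set<String> HOLIDAYS = new HashSet<String>();
    static {
        HOLIDAYS.add(key(date(2007, Calendar.DECEMBER, 25))); // Christmas Day
    }

    // Convenience constructor for a plain year/month/day date.
    static Date date(int year, int month, int day) {
        Calendar c = Calendar.getInstance();
        c.clear();
        c.set(year, month, day);
        return c.getTime();
    }

    // Lookup key, mimicking a primary key on the holiday table.
    private static String key(Date d) {
        Calendar c = Calendar.getInstance();
        c.setTime(d);
        return c.get(Calendar.YEAR) + "-" + c.get(Calendar.MONTH)
                + "-" + c.get(Calendar.DAY_OF_MONTH);
    }

    // Fixed: a date IS a holiday when the lookup finds a record.
    public static boolean isHoliday(Date aDate) {
        return HOLIDAYS.contains(key(aDate));
    }

    // Fixed: only Sunday is a non-business day (Saturday counts as a
    // business day for our customer), and the holiday check is negated.
    public static boolean isWorkday(Date aDate) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(aDate);
        if (cal.get(Calendar.DAY_OF_WEEK) == Calendar.SUNDAY) {
            return false;
        }
        return !isHoliday(aDate);
    }
}
```

Assertions like "December 25th is a holiday and not a workday" and "a regular Saturday is a workday" document the intended semantics in a way the original class never did - and would have caught both bugs on the very first run.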

But it did not end there. Alerted by this, I decided to have a look at the project source tree. In the course of development I had already seen classes like StringUtil, CalendarUtil and others. So I just ran a find workspace/ -regex ".*(Util|Utility|Utilities|Helper)\.java" to look for classes matching that pattern. I ended up with over 160 helpers and utilities. Judging by their names alone, many of them obviously duplicated functionality: several PersistenceUtilitys, StringHelpers and the like. Looking through them, they often contained very similar methods, re-implementing the same task over and over again.

Because practically none of these were covered by (J)Unit tests but had only been tested empirically in the precise context of their calls, you could not just go ahead and replace calls to a similar-sounding method with one in a more central place - or vice versa - making almost all of them virtually useless from a more general perspective.

So the whole idea of creating utility and helper classes had in fact been reduced to absurdity. Right now we are in the process of looking through all of them and consolidating to a handful of really helpful helpers and useful utilities - each of them accompanied by a set of unit tests to prevent another "a holiday-is-a-business-day" problem in the future. We will see what other gems bubble up and see the light of day in the process...

Tuesday, September 11, 2007

Hard disks are too big

I remember when I got my first hard drive. It was part of the first PC machine I ever had, a 386DX40. Before that I owned an Atari 800XL and an Amiga 500. While there were hard drives for the Amiga I couldn't afford one at the time. The 386 came with a 52MB Quantum. Back then I had to decide whether I wanted to have Windows 3.0 on the disk or Wing Commander 2.

While I'm not really too eager to go back to those times I recently noticed (again) that the vast amounts of disk space we have today are not always a good thing in a manner of speaking.

Over the years I owned several computers, with each new one came a new, bigger hard drive. As I usually sold the old machine - and the smaller hard drive with it - I just copied (almost) everything into some sort of "old_disk" folder on the new machine. When the next upgrade came, I had (of course) not reviewed everything in that folder and so, just to be on the safe side, just put it into the next "layer of archive" directory.

Currently I am using a Pentium IV machine with three hard drives. An older IDE drive with 80GB, one SATA 250GB and a third one with 400GB, also SATA. When I first installed Ubuntu in the summer of 2006 (Dapper Drake) I decided to give it the whole 80GB drive. To make space I just moved all the Windows data on that disk to the 250GB one. (Guess what... "80gb_disk" the folder was called).

Now, a little more than a year later, Ubuntu has replaced Windows as my primary and everyday operating system. I still keep Windows for the occasional game of Far Cry, but apart from that I do not really need it anymore. The last piece of hardware I cannot get to work with Linux is the CanoScan LiDE 70 scanner, but that will be replaced soon. Because the 80GB was filling up steadily I decided to do it properly this time and dedicate the whole 400GB disk to the /home partition.

Of course I needed to get the data off there, but as it was more than the already crowded 250GB disk could take, I finally had no choice but to wade through the data and see what I could throw away.

It was just unbelievable. I found things I had completely forgotten about; letters, pictures, stuff from school and university I never thought I had kept in the first place. It was like going through those boxes that somehow seem to magically appear in the attic and the basement every time you move to a new place. Some of the things were really great to find - like meeting someone you haven't seen for some time.

However most of the space was just taken up by old Windows installations, "temporary" download folders and rotting user profiles. Do you know that feeling, when doing the occasional re-install of Windows, that you had better keep the old installation somewhere, in case you need that one special settings file or have to take a look at the old registry again? Or that you let an ISO image lie around to burn to a CD or DVD later and then never do it anyway? I found at least 4 Outlook PST files - apart from the one I actually used, of course.

It took me more than 6 hours to go through it all and this time really delete stuff. Most of it was complete trash and absolutely useless. I still kept all of my documents, music files and other "handmade" stuff. In total I removed around 290GB of old cruft...

Imagine yourself back in the 386 era: would you have believed anyone who told you that one day you would have the equivalent of roughly six thousand(!) 52MB disks, filled up with nothing but useless junk? :-)

Sunday, September 02, 2007

ERROR 1033 (HY000) on InnoDB configuration error

One of the key features MySQL often uses to advertise their database is the modular architecture that allows different storage engines below the same SQL layer. In practice the application and/or database designer can choose from a variety of low-level data storage implementations that each offer different characteristics and may be chosen on a per-table basis. (Even though I personally believe most designs will use a single engine for all tables of a particular schema.)

The idea behind this is that for example people who do not need transactions should not have to worry about them at all – maybe there is a performance impact involved which they cannot afford to take. Moreover some specialized types of index or column might not be available on all engines. Basically the concept is very interesting and can be really useful for developers.

However there is a weakness that in my opinion needs serious work: The interface between the common SQL layer and the storage engines seems rather limited with respect to how storage engines can inform the layer above about status and error conditions.

For example there is no (elegant) way to find out the details of a constraint violation when using the InnoDB storage engine. While you will get an error message saying that some statement failed due to a violation of referential integrity constraints, you have to use the generic “show engine innodb status” command to get the details. However this will not only tell you about the error you care about at that particular moment, but also give you lots of information on other things going on inside InnoDB. It is necessary nevertheless, because you do not have any other means of finding out about those - e. g. when you are investigating a performance problem.

From what I learned from a consultant some time ago, this is due to the limited interface specification through which MySQL itself (the upper layer) and the storage engines talk to each other. Because this protocol has to be somewhat generic, messages from the bottom to the upper layers have to be wrapped into a special kind of result set which you then have to parse and understand on your own. Moreover, if memory serves me right, there is a limitation on how much data can be transferred at a time (it could be a limitation of the client as well). Because of this you will not even always get a full InnoDB status output: it will be truncated if it grows bigger than 64k.

While this is not particularly nice it is a limitation I believe is acceptable, especially in the case of InnoDB, because the innodb_monitor feature allows you to get the full output into a log file.

What I consider much worse however, is that error messages from the underlying storage engine are often mapped to more generic MySQL messages in an unpredictable way.

Time and again I have run into problems that present you with an error message that has nothing to do with the actual problem. For example in a replication scenario you might get error message 1236, claiming that there is something wrong with replication position counters, when it turns out that this message can also mean a full disk on the master. If you know enough about the implementation details you might see how this message comes to pass, but if you are troubleshooting a production system this is not what you want to do. Moreover I tend to forget these peculiarities if they occur seldom enough. Just recently I found a machine spitting out strange errors on each and every query I issued (InnoDB):

mysql> select count(*) from child;
ERROR 1033 (HY000): Incorrect information in file: './test/child.frm'
mysql> 

Now, just having this message, what would you expect is wrong here? File corruption? Broken SCSI controller? Faulty memory?

When you use Google to search for “ERROR 1033 (HY000)” you will get all sorts of results, most of them suggesting to try myisamchk (not very useful for InnoDB) or the REPAIR statements. Often you will find someone claiming that restoring from the latest backup might be the only option.

While all of this may certainly help solve the problems this error message was originally intended to report, in my case it all just led in the wrong direction.

Turns out that something was wrong with the my.cnf configuration file. This was on a machine set up using the “Streaming Slave Deployment” mechanism I described in an earlier article. For some reason the script that usually adapts the config file automatically after downloading the data files had not been started, so a default my.cnf was still in place.

Unfortunately the InnoDB data file sizes did not match those downloaded from the master server. This is what my.cnf contained:

innodb_data_file_path=ibdata1:512M;ibdata2:100M:autoextend

This is a listing of the data directory:

-rw-rw---- 1 mysql mysql 555745280 2007-08-31 20:26 ibdata1
-rw-rw---- 1 mysql mysql 104857600 2007-08-31 20:26 ibdata2
-rw-rw---- 1 mysql mysql   5242880 2007-08-31 20:26 ib_logfile0
-rw-rw---- 1 mysql mysql   5242880 2007-08-31 20:24 ib_logfile1
drwxr-xr-x 2 mysql mysql      4096 2007-08-31 20:27 mysql
drwx------ 2 mysql mysql      4096 2007-05-12 12:24 test 

It may not be obvious, but 555745280 bytes for ibdata1 is not 512MB, but 530MB. Nevertheless the MySQL server started even with this wrong configuration. However every statement would fail with the message above.
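The mismatch is easy to verify once you remember that the "M" suffix in innodb_data_file_path counts binary megabytes (1024×1024 bytes), not decimal millions. A throwaway snippet (names are mine) does the arithmetic:

```java
public class IbdataSize {
    public static void main(String[] args) {
        long bytes = 555745280L;          // size of ibdata1 from the directory listing
        long mib = bytes / (1024 * 1024); // the "M" suffix means units of 1024*1024 bytes
        System.out.println(bytes + " bytes = " + mib + "M"); // prints "555745280 bytes = 530M"
    }
}
```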

Shutting down the server and correcting the line above to

innodb_data_file_path=ibdata1:530M;ibdata2:100M:autoextend

restored everything to a working state:

mysql> select count(*) from child;
+----------+
| count(*) |
+----------+
|        9 |
+----------+
1 row in set (0.03 sec) 

While I really like MySQL and find it generally rather easy to configure and get very high performance from, this is definitely a major weakness I would like to see improved in future versions. For the time being I will try to post about anything that strikes me as odd enough for someone else to be interested in, too :)

Friday, August 31, 2007

Comfortable XPath accessor

Three weeks ago I blogged about a "Groovy way" to create XML documents in Java. This article now is about a convenient way to access XML without requiring more than a single class (downloadable here) on top of the JRE's own XML library functions.

In fact I wrote this class before I even looked for an easy way to create XML myself, because at the time I just had to parse some XML and extract the values in an easy to use fashion.

I believe it is easiest to show an example of how to use the class. Consider the following very simple Java class:

import java.math.BigDecimal;

/**
 * Simple value object for a contact.
 */
public class Contact {
    public String firstname;
    public String lastname;
    public boolean withAccount;
    public Integer numberOfCalls;
    public BigDecimal amountDue;
}

Usually the fields would not be public of course, but for the sake of the example, just imagine the getters and setters being there ;-)

Now consider getting an XML string back from an external system that you cannot change:

<representative>
   <address>
         <street>Somewhere</street>
         <city>Over The Rainbow</city>
         <details>null</details>
   </address>
   <personal>
       <name>Someone</name>
       <firstname>Special</firstname>
       <age>55</age>
       <acct>true</acct>
   </personal>
   <supportHistory>
       <phone>
          <total>14</total>
          <x11>5</x11>
          <x12>9</x12>
       </phone>
       <incident>
          <id>1</id>
          <cost>1.50</cost> 
       </incident>
       <incident>
          <id>2</id>
          <cost>2.50</cost>
       </incident>
       <incident>
          <id>3</id>
          <cost>3.50</cost>
       </incident>
       <incident>
          <id>4</id>
          <cost>4.50</cost>
       </incident>
   </supportHistory>
   <current> 
       <due>44.12</due>
       <total>100.88</total>
   </current> 
</representative>

What's the easiest way to get this into the “Contact” class above, without using persistence frameworks, mapping tools and the like? What about this:

XPathAccessor acc = new XPathAccessor(someXML);
Contact contact = new Contact();

contact.firstname = acc.xp("representative", "personal", "firstname");
contact.lastname = acc.xp("representative", "personal", "name");
contact.withAccount = acc.xpBool("representative", "personal", "acct");
contact.numberOfCalls = acc.xpInt("representative", "supportHistory", "phone", "total");
contact.amountDue = acc.xpBD("representative", "current", "due");

I find this rather straightforward. Of course the Strings should be declared as constants somewhere to prevent typos.

The example is really simple, because we just extract some plain values from the XML. However, you have the full power of XPath at your disposal. This means you could do something like this:

BigDecimal getTotalIncidentCost(String someXML) throws SAXException,
        IOException, ParserConfigurationException, XPathExpressionException {
    XPathAccessor acc = new XPathAccessor(someXML);
    Node historyNode = acc.getNode("representative", "supportHistory");
    BigDecimal tResult = acc.xpBD(historyNode, "sum(incident/cost)");
    return tResult;
}

The XPathAccessor instance could (and should) be reused, of course, depending on how often you need to access the document. As it caches XPath expressions that have already been used, it saves some cycles to re-use it for all accesses to a particular document.

Getting the “historyNode” first is of course not really necessary in this simple case, however sometimes it can come in handy to keep the expressions readable when you need to access deeper parts of the XML.
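As for what is under the hood: the actual class is in the download, but a stripped-down accessor in the same spirit (minus the expression caching and the typed variants; class and method names here are my own) can be built directly on the JRE's javax.xml.xpath package:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Stripped-down illustration; the real XPathAccessor also caches compiled
// expressions and offers typed variants like xpInt(), xpBool() and xpBD().
public class SimpleXPathAccessor {
    private final Document doc;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    public SimpleXPathAccessor(String xml) throws Exception {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Joins the element names into a location path like "a/b/c" and
    // evaluates it against the document, returning the string value.
    public String xp(String... elements) throws Exception {
        return (String) xpath.evaluate(
                String.join("/", elements), doc, XPathConstants.STRING);
    }
}
```

With the XML from above, this sketch's xp("representative", "personal", "name") would likewise return "Someone".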

As with the XElement class, feel free to use this one, too. I would be happy to get feedback and/or improvements.

Thursday, August 30, 2007

How the Vulcan greeting came about

Ever given a thought to how the Vulcan greeting (you know, the V-shaped hand gesture) came about in Star Trek? Turns out there is a really fun and interesting story behind it, not just some script writer conceiving it out of thin air. Have a look at this very funny video in which Leonard Nimoy, Mr. Spock, explains how it entered the show.

Wednesday, August 22, 2007

Windows Date Created Timestamp strangeness

I have been using Windows since version 3.0 and thought I had seen most of its subtleties. However today I found a new "gem" I had not encountered before.

We use two Perl scripts to do some FTP transfers regularly, scheduled by the Windows "Scheduled Tasks" to run several times a day. The scripts both use a common function to append to their respective daily log file. In case a file is older than 5 days - judging by its age in days based on the "Date Created" timestamp - it is deleted and a new one created under the same name. This is intended to keep the files from growing indefinitely.
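In essence the shared log function's rotation check worked like this (a sketch in Java for illustration - the real scripts were Perl, and the class and method names here are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.TimeUnit;

public class LogRotation {
    // Deletes the log when its *creation* timestamp is older than maxDays,
    // so the caller can start a fresh file under the same name.
    public static boolean rotateIfStale(Path log, long maxDays) throws IOException {
        if (!Files.exists(log)) {
            return false;
        }
        BasicFileAttributes attrs = Files.readAttributes(log, BasicFileAttributes.class);
        long ageDays = TimeUnit.MILLISECONDS.toDays(
                System.currentTimeMillis() - attrs.creationTime().toMillis());
        if (ageDays > maxDays) {
            Files.delete(log);
            return true;
        }
        return false;
    }
}
```

Note that the whole logic hinges on the creation timestamp being reset when the file is recreated - which, as it turned out, is exactly what Windows does not always do.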

While one of the jobs worked just fine, appending to its log files and rotating after 5 days, the other seemed to overwrite its log file on each run. Strangely enough - as said before - they both used the same log function, just with different file names.

After some poking around in the scripts' source we decided to make a lower-level test to see whether this had anything to do with either Perl or the "Scheduled Tasks". We created a file from the command line by just redirecting an echo command. We then deleted the file and recreated it by the same means. This is what we got (German Windows XP, but we observed equivalent behavior on Windows Server 2003 and Windows 2000 Professional):

C:\quantum>time /t
22:46

C:\quantum>echo "quantum" >> mechanics.txt

C:\quantum>dir /T:C mechanics.txt
 Volume in Laufwerk C: hat keine Bezeichnung.
 Volumeseriennummer: 0CA5-E6F2

 Verzeichnis von C:\quantum

22.08.2007  22:46                 8 mechanics.txt
               1 Datei(en)              8 Bytes
               0 Verzeichnis(se),  8.096.706.560 Bytes frei

C:\quantum>time /t
22:47

C:\quantum>del mechanics.txt

C:\quantum>echo "temporal" >> mechanics.txt

C:\quantum>dir /T:C mechanics.txt
 Volume in Laufwerk C: hat keine Bezeichnung.
 Volumeseriennummer: 0CA5-E6F2

 Verzeichnis von C:\quantum

22.08.2007  22:46                 8 mechanics.txt
               1 Datei(en)              8 Bytes
               0 Verzeichnis(se),  8.096.706.560 Bytes frei

C:\quantum>

Notice the experiment starts at 22:46. A file is created and its creation timestamp is automatically set to 22:46 (dir /T:C shows the creation time). Then, at 22:47 the file is deleted and recreated with different content. Again the file creation date is 22:46 while the "Date Modified" timestamp is 22:47.

Looking at the timestamps of the log file mentioned above we noticed that the "Date Created" attribute was back several months. Apparently every time the file had been deleted and recreated the original timestamp had been restored as well.

Looking around the net for some explanation for this (we suspected a filesystem bug) after a fair amount of searching I finally came across "The Old New Thing: The apocryphal history of file system tunnelling" and through that Microsoft Knowledge Base article 172190, titled "Windows NT Contains File System Tunneling Capabilities".

Basically what happens is that Windows caches the timestamp of a deleted file for some time. In case a new file appears under the same name, it gets the cached value. The default maximum time period between deleting and re-creating the file is 15 seconds. However we could also see it happen with gaps of more than 8 minutes! I do not yet understand how that comes about...

Once I knew the keyword was "tunnelling" it was also easy to find this 2001 post from a discussion on Bugtraq. I completely agree with Ken Brown who wrote that post (especially concerning the 2nd paragraph):

"Tunnelling" is a long way from any keywords that I'd associate with file systems - and a search for "tunnelling and ntfs" turns up a great many references to VPNs and bits of networking. It now turns out that it isn't really a property of the file system at all, which obviously makes the search even harder. 
[...]
Obviously not serious, but I bet that someone, somewhere, has an application that depends on file creation dates and wonders why it goes wrong every now and again. That is a *mild* potential security problem, if only because it could cause confusion. Documentation bugs can be security problems. Unexpected or unwanted behaviour from a machine is always a potential security problem.
[...]
The accumulation of seemed-like-a-good-idea-at-the-time backwards-compatible gotchas in the Windows file systems [...] all combine to introduce uncertainty and unpredictability, which leaves gaps for security errors.

Because one of our scripts took more than 15 seconds (the default time-to-live for cache entries) between deleting and recreating the file - it was scheduled during a high-load time - it literally took advantage of the lack of resources: the cached entry had expired before the new file was created.

The other script, running during far less busy periods, had always been fast enough to trigger the tunnelling "feature" and got the cached creation time over and over again. Of course the workaround for the scripts is simple: consider the "Date Modified" attribute instead.

A more drastic approach would be to change the registry settings controlling this behavior. See the knowledge base article for details. 

But seriously - I will never understand (and even less appreciate) why Microsoft tends to always choose backwards compatibility over reliability. Maybe this was a sensible feature right when long filenames on FAT volumes came about, but is it really necessary to carry it all the way to the 2003 server? I bet it is still present in Vista as well...

Friday, August 17, 2007

Gnome Nautilus SSH fails when hostkey changed

Today I tried to upload some files to my server via Nautilus. Months ago I created an SSH connection to my home folder via the "Places - Connect to Server" option on the main menu. It allows you to use SSH transparently via the graphical user interface.

However for some reason double-clicking the desktop connection just did not do anything at all. Selecting the entry in an open file manager window led to a confusing error message:

Nautilus cannot display "ssh://shipdown.de".
Please select another viewer and try again.

Another connection, set up via WebDAV, worked without problems. It occurred to me that this might have something to do with the recent crash of the server, which had made it necessary to set it up from scratch. This of course included the generation of a new SSH host key. Trying to connect via the command line confirmed this:

ds@yavin:~$ ssh ds@shipdown.de
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Please contact your system administrator.
Add correct host key in /home/ds/.ssh/known_hosts to get rid of this message.
Offending key in /home/ds/.ssh/known_hosts:1
RSA host key for shipdown.de has changed and you have requested strict checking.
Host key verification failed.

Turns out that Nautilus fails for the same reason. Once I had edited the .ssh/known_hosts file and replaced the old key with the current one, the connection worked again.

There is an Ubuntu bug (#41738) as well as a Gnome upstream report (#322501) describing this. However as it is rated a low-priority bug and has been known since Gnome 2.14, I do not expect it to be fixed very soon, so I thought it was worth noting here.

Wednesday, August 15, 2007

GMail advanced search operators

Several rules in my GMail account apply various labels to mails and mostly archive them right away. This is handy for newsletters, mailing lists and the like. However over time - if you do not read them all - unread mails accumulate, scattered across several labels and too old to show up among the first 50 items.

Up to now I sometimes went through the following pages, using the "Select unread" and "Mark as read" functions. Today I stumbled across a page that mentions a search term to show only unread mails. I had suspected something like this must exist, but there is no GUI feature I know of to use it.

So in addition to any other search criteria you might have (e. g. "label:newsletter") you can just add "is:unread" and only get those mails that you haven't looked at before. Other things I just tried out and found to work:

  • is:starred - applies to messages with stars
  • is:unread - applies to unread messages
  • is:read - applies to read messages

Only after that did I find this page on the GMail Help Center. Sometimes life can be so easy, if you just know where to look... :-)

Sunday, August 05, 2007

Building XML the Groovy way in Java

When working with XML - i. e. creating XML documents - the Java DOM API is a little cumbersome. You have to ask all sorts of factories for instances of themselves, those instances for documents, elements and so forth. It is usually a lot of code to write, even if all you want is a little XML fragment with only a few elements, e. g. to be sent over the network to some server. One way to make your life easier is to resort to StringBuilder/StringBuffer and build the XML "by hand". However this is error-prone and not always easy to read.

Recently I had to implement a service that responded with XML over the net, building the document by collecting data from several sources and combining them. The first version I wrote used the DOM API and once finished was hard to read even for me. I would have liked to use Groovy's MarkupBuilder for this, however company policy does not allow that (yet).

So I looked around for a similarly easy to read solution in plain Java. I found this entry in The Ancient Art Of Programming. It discusses the problem of the verbosity of the Java solution and compares it to the clarity of .NET's rather new Linq feature. In one of the comments Erik-Jan Blanksma posted an implementation that mimics the syntax using variable argument lists. I took the classes and built a second version of my code with it. Now the Java code looks almost like the Groovy MarkupBuilder code, except that I do not need any additional libraries. Take a look at this example:

private String getResult() {
    XElement tResult = new XElement("result",
        new XElement("error", "0"),
        new XElement("receipt",
            new XElement("head",
                new XElement("id", adapter.getId()),
                new XElement("ctry", adapter.getCountryCode()),
                new XElement("region", adapter.getRegionCode()),
                new XElement("primary", adapter.isPrimary())
            )
        )
    );
    return tResult.toString();
}

This is really great to use. I added some tweaks to the code to make it more robust and suitable for my needs. Apart from adding some error checking (no empty element names, null values etc.) I also created an interface for classes that convert a java.util.Collection into a corresponding array of XElements. This was necessary, because the original version did not allow dynamic additions to the XML. This interface is intended to allow worker classes that somewhat mimic the Groovy syntax of just intermixing loops etc. A simple implementation could look like this:

public class SimpleWorker implements XCollectionWorker<String> {
    public XElement[] work(Collection<String> aCollection) {
        List<XElement> tList = new ArrayList<XElement>();
        for (String tString : aCollection) {
            tList.add(new XElement("string", tString));
        }
        return tList.toArray(new XElement[0]);
    }
}

Using it to add a list of Strings to an XML document can be done like this:

private String getResult(Collection<String> aStringCollection) {
    XElement tResult = new XElement("result",
        new XElement("error", "0"),
        new XElement("receipt",
            new XElement("head",
                new XElement("id", adapter.getId()),
                new XElement("ctry", adapter.getCountryCode()),
                new XElement("region", adapter.getRegionCode()),
                new XElement("primary", adapter.isPrimary())
            ),
            new XElement("someStrings", aStringCollection, new SimpleWorker())
        )
    );
    return tResult.toString();
}

I have packed the necessary files into this ZIP archive. Feel free to use them, if they suit your needs. If you make any improvements, I would be happy to hear from you.
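For those who only want the gist without downloading anything: the core trick is nothing more than a varargs constructor plus a recursive toString(). A deliberately minimal sketch (my own naming - the real XElement class adds error checking, attributes and proper escaping):

```java
// Minimal illustration of the varargs building pattern; not the real class.
public class XElement {
    private final String name;
    private final Object[] children; // nested XElements or plain values

    public XElement(String name, Object... children) {
        this.name = name;
        this.children = children;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("<").append(name).append(">");
        for (Object child : children) {
            sb.append(child); // nested XElements recurse via their own toString()
        }
        return sb.append("</").append(name).append(">").toString();
    }
}
```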

Wednesday, July 25, 2007

MySQL Index Analyzer updated

After several months I have again put a little work into the MySQL Index Analyzer I first published back in August of 2006.

I added a feature that will find duplicate columns inside an index, caused by the internal appending of the InnoDB primary key columns to each secondary index.

To get the code and read more about the new feature, including an example, go to the MySQL Index Analyzer Blog.

Monday, July 09, 2007

Back online

Two weeks after we moved to our new apartment we got DSL back. I have to admit I was sceptical at first, because usually I tend not to trust phone and internet providers too far when it comes to making changes to an existing setup. However apparently everything worked well. The ISDN line was switched over on the day we moved, keeping the same number without a glitch. The only thing that got lost on the way was the outgoing caller ID. However after two calls to the hotline (hey, this is still well within acceptable limits, isn't it?) they assured me it would be reactivated by tomorrow.

The DSL provider promised to get the line back up two weeks after the ISDN line had been established and they kept their promise. Today I plugged the Fritz!Box in and everything worked immediately. It is a real relief; I had forgotten how slow ISDN was, even with two channels bundled to 128kbit/s. My girlfriend is relieved, too: because she did not like to install the ISDN drivers she had been using my computer all the time.

From what I see the line should be good enough for 6MBit/s downstream, 3 at least. I will try to upgrade from the current 1MBit/s tomorrow. After all it's just 3€/month and includes a VoIP flat rate besides the internet flat rate. Let's hope this works smoothly as well :)

Wednesday, June 20, 2007

Thoughts on (Un)checked Exceptions

Brian Goetz shares his thoughts on the (recurring) idea of removing checked exceptions from the Java language in a future version, because they tend to make code too ugly to read or are just considered a "failed experiment".

I personally count myself among the faction in favor of checked exceptions, because I believe they can certainly make people think more about error handling than they usually would. It may be harsh to say, but in my opinion people - developers maybe even more so - sometimes need to be forced to do unpleasant things. Of course checked exceptions can be a pain in the neck when you need to create e. g. Runnables, however in those cases you can still resort to throwing an unchecked one. While this is a somewhat ugly technique, you are nevertheless forced to think about error handling.
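To illustrate the wrapping technique just mentioned (class and method names are made up for the example): Runnable.run() may not declare checked exceptions, so a checked IOException has to be smuggled out as an unchecked RuntimeException:

```java
import java.io.IOException;
import java.io.Reader;

public class UncheckedWrapper {
    // run() cannot declare the checked IOException, so it is wrapped
    // into an unchecked RuntimeException and rethrown.
    public static Runnable readerTask(final Reader reader) {
        return new Runnable() {
            public void run() {
                try {
                    reader.read();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        };
    }
}
```

The try/catch still forces you to decide explicitly what happens to the error, which is exactly the point.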

I suggest reading his proposals on what could be done to ease the symptoms. I mostly agree with the 2nd group of people and like the idea of having closures to better encapsulate all sorts of resource management hassles. Groovy has some nice examples of this, e. g. reading from a file and implicitly having all the boilerplate taken care of. To get some background on closures in Java, have a look at the closures-related posts on Neal Gafter's blog.

Concerning (checked) exceptions in general you might want to have a look at (or better, an ear to) an interview with Anders Hejlsberg - the principal mind behind Delphi and, more recently, .NET's C# language. You can find it in a series called "Thinking in Code" on Bruce Eckel's blog.

Sunday, June 17, 2007

Download Youtube Videos (and Google video)

Maybe this is obvious, but it was new to me. :)

Today I skimmed through the jroller.com homepage and found an article on Valerio Schiavoni's blog about revision control systems. He embedded a link to a video on YouTube showing Linus Torvalds giving a speech at Google about git and why he thinks it's better than other revision/software control systems.

When I started to watch I realized I would not have time to see it to the end, so I wanted to save it locally and watch it later without having to be online again. I googled for "download youtube videos" and the first three matches all pointed to one sort or another of special download tool.

Turns out you do not really need anything like that. By chance I found the completely downloaded video file in the /tmp directory, named something like FlashRItWZO. I tried some video on Google video, too; that works just as well. The files all start with "Flash" and are then followed by some random part.

You just have to wait until the whole video is buffered. You can then copy the file from the /tmp directory to some more permanent location. On Linux mplayer happily plays them.

One caveat, however: if you want the whole video you must not seek through it while it is still streaming. Otherwise you will end up with just the parts you saw/that were buffered. On the other hand you can of course use this to store only a part of a longer video, in case you do not need the whole thing.

Wednesday, June 13, 2007

MySQL server_errno=1236 when disk full

Yesterday I was asked for help concerning a replication problem with one of our test systems. My colleague had already installed a fresh dump he had created with mysqldump ... --master-data. The dump looked ok and contained a master configuration statement:

...
CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000127',MASTER_LOG_POS=4462223;
...

The slave was provided with the correct user, password and host name for this master. Nevertheless issuing a START SLAVE did not work. In the slave .err file we found this:

 
...
050327 20:54:51 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position (server_errno=1236)
050327 20:54:51 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
...

After some fiddling around and searching the net for solutions we had a look at the server as well. SHOW MASTER STATUS revealed seemingly correct values and SHOW MASTER LOGS also listed several binlog files, up to the one mentioned above (127).

Only by looking at the Windows console of the server did we find out what was wrong. Upon logging in, the operating system popped up a nice balloon message about "Low Disk Space On Drive E:". In fact "low" was a slight understatement: the drive was full up to the last byte.

Interestingly MySQL happily moved the current position counter in the output of SHOW MASTER STATUS forward, even though the file could not be written. The manual chapter on How MySQL Handles a Full Disk claims that the server should wait until enough space is available again, and I remember having seen a "Waiting for someone to free space" message before. I will have to look into this again and file a support request if I cannot find an explanation.

Saturday, June 09, 2007

Visualize hard disk temperature with gnuplot

When the kernel issue I blogged about hit me I first suspected a(nother) defective hard disk. I opened the case to find my 250GB Samsung Spinpoint SP2504C so hot that I could barely touch it without burning my fingers. Mentally preparing to reinstall Ubuntu on a disk yet to be bought I remembered that this disk was not needed to boot at all, because it just contains data files. (Anyone else losing track of what is stored where with the disk sizes these days?)

So I decided to just let everything cool down and then start again - in the meantime I had noticed that the latest kernel update had caused the effect and that booting with 2.6.20-15-386 would work. Once I was back at my desktop I installed some packages for hardware monitoring:

ds@yavin:~$ sudo apt-get install hddtemp sensors-applet lm-sensors

Once they (and their dependencies) had all been installed I first started toying with hddtemp:

ds@yavin:~$ sudo hddtemp /dev/sd[abc]
/dev/sda: IC35L080AVVA07-0                        : 46°C
/dev/sdb: SAMSUNG SP2504C                         : 34°C
/dev/sdc: SAMSUNG HD403LJ                         : 32°C
ds@yavin:~$ 

This little script collects the data from disks /dev/sd[abc] and stores it in a space-separated format in a logfile:

#!/bin/bash
logfile=/var/log/hddtemp.log
timestamp=$( date +%T );
temps=$( hddtemp /dev/sd[abc] | awk -F: ' { print $3 } ' | cut -c2-3 | tr "\n" " " );
echo "${timestamp} ${temps}" >> ${logfile}

It is invoked via cron once per minute; the following line was added to /etc/crontab:

*/1 *    * * *   root   /usr/local/bin/sht.sh 

This data can now be plotted graphically using gnuplot:

#!/usr/bin/gnuplot -persist
#
#    
#       G N U P L O T
#       Version 4.0 patchlevel 0
#       last modified Thu Apr 15 14:44:22 CEST 2004
#       System: Linux 2.6.20-15-386
#    
#       Copyright (C) 1986 - 1993, 1998, 2004
#       Thomas Williams, Colin Kelley and many others
#    
#       This is gnuplot version 4.0.  Please refer to the documentation
#       for command syntax changes.  The old syntax will be accepted
#       throughout the 4.0 series, but all save files use the new syntax.
#    
#       Type `help` to access the on-line reference manual.
#       The gnuplot FAQ is available from
#               http://www.gnuplot.info/faq/
#    
#       Send comments and requests for help to
#               <gnuplot-info@lists.sourceforge.net>
#       Send bugs, suggestions and mods to
#               <gnuplot-bugs@lists.sourceforge.net>
#    
# set terminal x11 
# set output
unset clip points
set clip one
unset clip two
set bar 1.000000
set border 31 lt -1 lw 1.000
set xdata time
set ydata
set zdata
set x2data
set y2data
set timefmt x "%H:%M:%S"
set timefmt y "%H:%M:%S"
set timefmt z "%H:%M:%S"
set timefmt x2 "%H:%M:%S"
set timefmt y2 "%H:%M:%S"
set timefmt cb "%H:%M:%S"
set boxwidth
set style fill empty border
set dummy x,y
set format x "% g"
set format y "% g"
set format x2 "% g"
set format y2 "% g"
set format z "% g"
set format cb "% g"
set angles radians
unset grid
set key title ""
set key right top Right noreverse enhanced box linetype -2 linewidth 1.000 samplen 4 spacing 1 width 0 height 0 autotitles
unset label
unset arrow
unset style line
unset style arrow
unset logscale
set offsets 0, 0, 0, 0
set pointsize 1
set encoding default
unset polar

unset parametric
unset decimalsign
set view 60, 30, 1, 1
set samples 100, 100
set isosamples 10, 10
set surface
unset contour
set clabel '%8.3g'
set mapping cartesian
set datafile separator whitespace
unset hidden3d
set cntrparam order 4
set cntrparam linear
set cntrparam levels auto 5
set cntrparam points 5
set size ratio 0 1,1
set origin 0,0
set style data points
set style function lines
set xzeroaxis lt -2 lw 1.000
set yzeroaxis lt -2 lw 1.000
set x2zeroaxis lt -2 lw 1.000
set y2zeroaxis lt -2 lw 1.000
set tics in
set ticslevel 0.5
set ticscale 1 0.5
set mxtics default
set mytics default
set mztics default
set mx2tics default
set my2tics default
set mcbtics default
set xtics border mirror norotate autofreq 
set ytics border mirror norotate autofreq 
set ztics border nomirror norotate autofreq 
set nox2tics
set noy2tics
set cbtics border mirror norotate autofreq 
set title "Disk Temperature History" 0.000000,0.000000  font ""
set timestamp "" bottom norotate 0.000000,0.000000  ""
set rrange [ * : * ] noreverse nowriteback  # (currently [0.00000:10.0000] )
set trange [ * : * ] noreverse nowriteback  # (currently ["31/12/99,23:59":"01/01/00,00:00"] )
set urange [ * : * ] noreverse nowriteback  # (currently ["31/12/99,23:59":"01/01/00,00:00"] )
set vrange [ * : * ] noreverse nowriteback  # (currently ["31/12/99,23:59":"01/01/00,00:00"] )
set xlabel "Time" 0.000000,0.000000  font ""
set x2label "" 0.000000,0.000000  font ""
set xrange [ "00:00:00" : "23:55:00" ] noreverse nowriteback
set x2range [ * : * ] noreverse nowriteback  # (currently [-10.0000:10.0000] )
set ylabel "Temperature (C) " 0.000000,0.000000  font ""
set y2label "" 0.000000,0.000000  font ""
set yrange [ 20.0000 : 60.0000 ] noreverse nowriteback
set y2range [ * : * ] noreverse nowriteback  # (currently [-10.0000:10.0000] )
set zlabel "" 0.000000,0.000000  font ""
set zrange [ * : * ] noreverse nowriteback  # (currently [-10.0000:10.0000] )
set cblabel "" 0.000000,0.000000  font ""
set cbrange [ * : * ] noreverse nowriteback  # (currently [-10.0000:10.0000] )
set zero 1e-08
set lmargin -1
set bmargin -1
set rmargin -1
set tmargin -1
set locale "C"
set pm3d scansautomatic flush begin noftriangles nohidden3d implicit corners2color mean
unset pm3d
set palette positive nops_allcF maxcolors 0 gamma 1.5 color model RGB 
set palette rgbformulae 7, 5, 15
set colorbox default
set colorbox vertical origin 0.9,0.2 size 0.1,0.63 bdefault
set loadpath 
set fontpath 
set fit noerrorvariables
plot '/var/log/hddtemp.log' using 1:2 with line title "hda", '/var/log/hddtemp.log' using 1:3 with line title "hdb", '/var/log/hddtemp.log' using 1:4 with line title "hdc"
#    EOF

Running the gnuplot script above displays a graph with the temperature curves for my three hard disks. You will need to modify both scripts to match your disk configuration. Furthermore, logrotate should be configured to start a new file every day; otherwise the graph will look somewhat strange.
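
Since the plot's x-axis spans a single day, a daily logrotate rule keeps the log in sync. This is just a minimal sketch (the path matches the collection script above; the rotation count and file permissions are my own guesses, adjust as needed). It could go into a file like /etc/logrotate.d/hddtemp:

```
/var/log/hddtemp.log {
    daily
    rotate 7
    missingok
    notifempty
    create 0644 root root
}
```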

sample graph

You can safely ignore the

sh: kpsexpand: not found
sh: kpsexpand: not found
sh: kpsexpand: not found
sh: kpsexpand: not found

messages when you run the script. This seems to be a packaging problem with gnuplot: kpsexpand belongs to the tetex package, which is not actually required for gnuplot to work.

Additionally I have installed the Gnome Sensors applet that now continuously displays data from several sensors, including CPU temperature and fan speeds.

BTW: The temperatures above were taken after I installed a fan in the front panel of my PC's case. Before that, all disks were 10-15°C warmer, bringing them close to their specified limits.

Monday, June 04, 2007

Beware of Ubuntu Kernel 2.6.20-16!

Anyone using Ubuntu Feisty Fawn should not install the kernel update to 2.6.20-16! It comes as a security update, but it includes some nasty trickery with the ATA/SATA drivers. After I installed it I could not boot anymore, because for some reason the drive names had changed!

All the SATA drives that had /dev/sdXX names before were now called /dev/hdXX. Even though my /etc/fstab contains only "by-uuid" entries, none of the drives except the PATA boot drive could be accessed.

I already suspected a hardware defect, because booting without the splash screen showed "lost interrupt" errors. I only became skeptical when I read that they were reported for drive /dev/sdg, which I simply do not have!

Unplugging both SATA drives at least let me boot into the system again. From what I read in Launchpad Bug #116996 booting with the previous version 2.6.20-15 should work in the meantime.

I have to admit I am pretty pissed off by Ubuntu at the moment. I really love the work they do, however sometimes they seem to be a little too fast with updates (remember the troublesome X11 update some months ago) that are critical to the system. Maybe I would not be so angry about, e.g., the sound card not working properly, but come on people: changing hard disk drivers silently as part of an unrelated security update?!

The bug is several days old already and Debian already has it fixed. However for some reason there does not seem to be much movement from Ubuntu...

Friday, May 25, 2007

MySQL Optimizer Bug 28554

When we tried to clean up a rather large (4,500,000 rows, 20GB) InnoDB table some days ago, we were astonished by the time MySQL took to complete the task. We had already LIMITed the transaction size, but every single chunk still took minutes to execute. The table itself contains some number columns, including a numeric primary key, and a blob. The delete condition was mainly based on the primary key (being smaller than a predefined value) and a status field. After some mails between the support crew and us, an optimizer bug was identified: MySQL Bug #28554.

The problem is that in some cases the optimizer makes a bad choice concerning which index to use. It will pick a secondary index that can be used to cover a WHERE indexed_column=<constant> condition, even though it will cause way more data to be scanned than necessary. The primary key for the second condition pk_column<=<constant> would be a far better choice, visiting only a much smaller number of rows.
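
Until a fixed server version is available, an explicit index hint is a possible workaround for this class of problem. The following is only an illustrative sketch with made-up table and column names (pk_column and status mirror the conditions described above, not our actual schema); prefixing the query with EXPLAIN shows whether the primary key is actually chosen:

```sql
-- Hypothetical example: force the optimizer to use the primary key
-- instead of the secondary index on the status column.
EXPLAIN SELECT COUNT(*)
FROM cleanup_candidates FORCE INDEX (PRIMARY)
WHERE pk_column <= 1000000
  AND status = 4;
```

With FORCE INDEX the optimizer treats a table scan as very expensive, so it will only fall back to one if the named index cannot be used at all.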

The bug has been verified in 4.1.12, 4.1.22 and 5.0.42-BK. Read the bug report for details and some example data.

Thursday, May 24, 2007

USB scanner not working on VMwared XP

I recently changed my health-insurance company. Today I got a letter asking me to return my insurance card or notify them in case I had destroyed it myself. So I thought I might just fax them an image of the destroyed card. Because Fritz!Fax works beautifully in my virtual XP box, I wanted to get my Canon CanoScan LiDE 70 installed in the VM (no Linux support...). I downloaded the most recent driver from Canon's website and installed it.

Unfortunately I only get blue screens as soon as I "connect" the scanner to the virtual machine using the VMware server console. Somewhat disappointed, I will now have to boot up the RealThing(TM) Windows partition again...

Tuesday, May 22, 2007

Linux EventQueue Problem

As a part of an ongoing project I had to write a little on-screen keyboard. It is a very simple thing: just a panel with a series of JButtons that - when clicked - insert KeyEvents into the EventQueue, so that any Swing text component can be used with it. I do not have the original code at hand right now, but I reconstructed it at home:

import java.awt.*;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.awt.event.KeyEvent;

import javax.swing.*;

public class TestTool extends JFrame {

    private JTextArea textArea;

    private JButton btnA;

    private JButton btnB;

    private KeyEvent createEvent(Component aSrc, char aChar) {
        return new KeyEvent(aSrc, KeyEvent.KEY_TYPED, System
                .currentTimeMillis(), 0, 0, aChar);
    }

    ActionListener btnLstn = new ActionListener() {
        public void actionPerformed(ActionEvent e) {
            EventQueue q = Toolkit.getDefaultToolkit().getSystemEventQueue();
            if (e.getSource() == btnA) {
                q.postEvent(createEvent(btnA, 'A'));
            } else if (e.getSource() == btnB) {
                q.postEvent(createEvent(btnB, 'B'));
            }
        }
    };

    public static void main(String[] args) {
        new TestTool().go();
    }

    protected void go() {
        Container tContent = getContentPane();
        tContent.setLayout(new BorderLayout());
        textArea = new JTextArea();
        tContent.add(textArea);
        btnA = new JButton("A");
        btnB = new JButton("B");
        btnA.setFocusable(false);
        btnB.setFocusable(false);
        btnA.addActionListener(btnLstn);
        btnB.addActionListener(btnLstn);
        JPanel pnl = new JPanel(new FlowLayout());
        pnl.add(btnA);
        pnl.add(btnB);
        tContent.add(pnl, BorderLayout.SOUTH);
        setSize(500, 380);
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        setVisible(true);
    }
}

Starting this will bring up a little window with two buttons "A" and "B". Clicking either of these will insert the corresponding letter into the text area. I tested this on Ubuntu Feisty Fawn with no problems at all. However on Red Hat 9 this code (well, not exactly this, but something very similar) does not work reliably; that code works on Windows, so it cannot be completely broken. On Red Hat 9 the button clicks just do not produce letters in the text area. It can be made to work when I insert a Thread.sleep(1500) before putting the new KeyEvent on the queue. This makes the button go "down", remain there for 1.5s and then pop back out, usually (but not always) producing a letter in the text area. At the moment I have no idea what causes this strange behavior. On Thursday I will compare the code above with the original and try it out on Red Hat...

Any ideas are of course very welcome :)

Saturday, May 12, 2007

MySQL: Add primary key to table with duplicates

Maybe this is obvious, but I post it anyway, just to remind myself should I need it again.

Recently I had to change a table that I had not completely thought through when I first created it. The structure was so simple, I did not think I could do anything wrong with it:

CREATE TABLE `parent` (
  `par_id` bigint(20) NOT NULL,
  `somevalue` varchar(20) default NULL,
  PRIMARY KEY  (`par_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `child` (
  `x_parid` bigint(20) default NULL,
  `value` bigint(10) default NULL,
  KEY `fk_parid` (`x_parid`),
  CONSTRAINT `child_ibfk_1` FOREIGN KEY (`x_parid`) REFERENCES `parent` (`par_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

There is a 1:0..* relationship between parent and child. Some sample data:

mysql> select * from parent;
+--------+--------------+
| par_id | somevalue    |
+--------+--------------+
|      1 | Parent No. 1 | 
|      2 | Parent No. 2 | 
|      3 | Parent No. 3 | 
+--------+--------------+
3 rows in set (0.00 sec)

mysql> select * from child;
+---------+-------+
| x_parid | value |
+---------+-------+
|       1 |    10 | 
|       1 |    12 | 
|       1 |    15 | 
|       2 |    25 | 
|       2 |    26 | 
|       2 |    26 | 
|       3 |    31 | 
|       3 |    31 | 
|       3 |    31 | 
+---------+-------+
9 rows in set (0.00 sec)

Clearly it is possible (and intended) that there can be several children referencing the same parent, even if they have equal values. Inserting data using my application was no problem. However, when I tried to read back a parent including its children, I got an error from the persistence framework (proprietary 3rd party) about a probable inconsistency in the data, because there were several identical children. It turns out the framework needs a primary key on every table to manage rows in memory.

So how do you add a primary key to the child table when it already contains data? First attempt:

mysql> ALTER TABLE child
    ->   ADD COLUMN child_id BIGINT(20) NOT NULL FIRST,
    ->   ADD PRIMARY KEY(child_id);
ERROR 1062 (23000): Duplicate entry '0' for key 1

Of course: adding a column fills it with either NULL or the data type's default value, in this case 0. Declaring it as a primary key is then bound to fail, because all 9 rows contain that same default value.

So we need some initial distinct values for the column (in the application, the persistence layer will provide generated keys once there is a primary key column). For this, the auto increment feature comes in handy:

mysql> ALTER TABLE child
    ->   ADD COLUMN child_id BIGINT(20) AUTO_INCREMENT NOT NULL FIRST,
    ->   ADD PRIMARY KEY(child_id);
Query OK, 9 rows affected (0.06 sec)
Records: 9  Duplicates: 0  Warnings: 0

mysql> select * from child;
+----------+---------+-------+
| child_id | x_parid | value |
+----------+---------+-------+
|        1 |       1 |    10 | 
|        2 |       1 |    12 | 
|        3 |       1 |    15 | 
|        4 |       2 |    25 | 
|        5 |       2 |    26 | 
|        6 |       2 |    26 | 
|        7 |       3 |    31 | 
|        8 |       3 |    31 | 
|        9 |       3 |    31 | 
+----------+---------+-------+
9 rows in set (0.00 sec)

Because we do not need the auto incrementing anymore, we can remove it again:

mysql> ALTER TABLE child MODIFY COLUMN child_id BIGINT(20)  NOT NULL;
Query OK, 9 rows affected (0.08 sec)
Records: 9  Duplicates: 0  Warnings: 0

Unfortunately this cannot be combined into a single ALTER TABLE statement, because the parser seems to first check whether all of the statement's parts are valid, which is not the case for the final MODIFY COLUMN clause, as the child_id column does not exist at that point.

Friday, May 04, 2007

FindBugs - Writing custom detectors (Part 2)

This is the second part of the "Howto write a FindBugs Bug Detector" (see the first part here). To understand why one would write the kind of detector mentioned here, you should read the first part if you do not already know it.

Last time I presented a detector that is able to detect static fields of the types java.util.Calendar and java.text.DateFormat. While declaring fields like this may be suspicious, there is not necessarily something wrong with the code. The real danger comes from calling methods on such fields, especially if those calls are not synchronized to protect the fields against concurrent access. So in this article we will extend and improve the existing detector to cope with this problem.

Something to chew on

This is a simple class that uses a static Calendar instance. It does not really do much, but the shorter the program, the easier it is to understand its bytecode (which is what we need to do this time).

import java.util.Calendar;

public class StaticCalendarSample3 {
    public static final Calendar cal = Calendar.getInstance();
    public void testCal() {
        cal.clear(); // this is dangerous
    }
}

What I "like" most about concurrency problems (or multi-threading bugs) is that even the simplest and most harmless looking code can be a source of nasty and hard to find problems. To emphasize this I even marked the cal field final, because that is advice you often hear (e.g. Securing Java, Ch. 7, Rule 3 or Effective Java, Item 13, "Favor immutability" (Google Account required to view)) and for good reason. However, in this particular case it does not help.

In order to write a detector that will alert the cal.clear(); line we need to take a look at the method's bytecode, because that is what the detector will have to look at:

ds@yavin:~/... $ javap -c StaticCalendarSample3
Compiled from "StaticCalendarSample3.java"
public class de.bismarck.fb.test.StaticCalendarSample3 extends java.lang.Object{
public static final java.util.Calendar cal;

static {};
  Code:
   0:   invokestatic    #10; //Method java/util/Calendar.getInstance:()Ljava/util/Calendar;
   3:   putstatic       #16; //Field cal:Ljava/util/Calendar;
   6:   return

public de.bismarck.fb.test.StaticCalendarSample3();
  Code:
   0:   aload_0
   1:   invokespecial   #21; //Method java/lang/Object."<init>":()V
   4:   return

public void testCal();
  Code:
   0:   getstatic       #16; //Field cal:Ljava/util/Calendar;
   3:   invokevirtual   #26; //Method java/util/Calendar.clear:()V
   6:   return

}

The javap tool is part of the JDK installation; the -c option outputs bytecode instead of trying to recreate the Java source. Here you see the "machine code" instructions the compiler generated from the .java file. In the static initializer you can see the invocation of Calendar.getInstance() and the subsequent store (putstatic) of the result of that call to a static field denoted by #16. javap automatically adds a more human-readable comment that allows you to identify the original variable name (cal) and type.

So a possible way to detect a call on a static Calendar is to look for a getstatic that accesses a Calendar (or a subclass thereof), immediately followed by an invokevirtual opcode. For a list of all opcodes and their meanings, you can check The Java Virtual Machine Specification. This would work, at least in simple cases, and in fact the first version of my detector worked like this: it stored the offset of the getstatic call and looked for an invokevirtual less than 4 bytes from there. There is a problem with this approach, however. Have a look at this Java code and its bytecode:

import java.util.Calendar;

public class StaticCalendarSample3 {
    public static final Calendar cal = Calendar.getInstance();
    public void testCal() {
        Calendar localCal = cal;
        System.out.println("Hello world");
        localCal.clear(); // still the static field!
    }
}
 
Compiled from "StaticCalendarSample3.java"
public class de.bismarck.fb.test.StaticCalendarSample3 extends java.lang.Object{
public static final java.util.Calendar cal;

static {};
  Code:
   0:   invokestatic    #10; //Method java/util/Calendar.getInstance:()Ljava/util/Calendar;
   3:   putstatic       #16; //Field cal:Ljava/util/Calendar;
   6:   return

public de.bismarck.fb.test.StaticCalendarSample3();
  Code:
   0:   aload_0
   1:   invokespecial   #21; //Method java/lang/Object."<init>":()V
   4:   return

public void testCal();
  Code:
   0:   getstatic       #16; //Field cal:Ljava/util/Calendar;
   3:   astore_1
   4:   getstatic       #26; //Field java/lang/System.out:Ljava/io/PrintStream;
   7:   ldc     #32; //String Hello world
   9:   invokevirtual   #34; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   12:  aload_1
   13:  invokevirtual   #40; //Method java/util/Calendar.clear:()V
   16:  return

}

In this example the call to the static field is still there, however it is not as easy to detect, because a reference to it first gets stored in a local variable, and then some other code follows. Only then is the Calendar accessed again (12: aload_1), this time from a register into which it was stored earlier (3: astore_1). In this situation the simple offset-based detector does not work.

The documentation on FindBugs is somewhat sketchy, but with some very kind help from the Findbugs-discuss mailing list, including some very helpful sample code, I managed to make it more reliable. These are the things that need to be done to generate a bug warning:

  • locate an invokevirtual call
  • determine if it is directed at a type of Calendar/DateFormat or a subclass thereof
  • if so, check if it corresponds to a static field
  • if it does, check if it is outside a synchronization block (assuming that if there is such a block, someone has already thought about multi-threading)

The first two were rather easy:

@Override
public void sawOpcode(int seen) {
 // we are only interested in method calls
 if (seen != INVOKEVIRTUAL) {
  return;
 }

 // determine type of the object the method is invoked on
 ObjectType tType = ObjectTypeFactory.getInstance(getClassConstantOperand());
  // if it is not compatible with Calendar or DateFormat, we are not
 // interested anymore
 if (!tType.subclassOf(calendarType) && !tType.subclassOf(dateFormatType)) {
  return;
 }
 ...

While the first two are rather easy, Bill Pugh provided valuable information for the third one: there is a class called OpcodeStack which allows you to identify the item a method was called on, by filtering out the "superfluous" code between 0: getstatic #16 and 13: invokevirtual #40 in the example above. In order to use it, your detector should subclass OpcodeStackDetector, which in turn inherits from BytecodeScanningDetector. As this was the base class of our detector before, too, we can simply change our parent class; this enables use of the OpcodeStack automatically. Have a look at the implementation for details. I am glad I did not have to figure that out myself :)

The code using it looks like this:

 ...
 // determine the number of arguments the method expects
 int numArguments = getNumberArguments(getSigConstantOperand());
 // go back on the stack to find what the receiver of the method is
 OpcodeStack.Item invokedOn = stack.getStackItem(numArguments);
 XField field = invokedOn.getXField();
 // find out, if the field is static. if not, we are not interested
 // anymore
 if (field == null || !field.isStatic()) {
  return;
 }
 ...

As for the synchronization check, David Hovemeyer provided me with the necessary pointers and some pseudocode, which resulted in the following:

 ...
 try {
  if (currentMethod != null && currentLockDataFlow != null && currentCFG != null) {
   Collection<Location> tLocations = currentCFG.getLocationsContainingInstructionWithOffset(getPC());
   for (Location tLoc : tLocations) {
    LockSet lockSet = currentLockDataFlow.getFactAtLocation(tLoc);
    if (lockSet.getNumLockedObjects() > 0) {
     // within a synchronized block
     return;
    }
   }
  }
 } catch (DataflowAnalysisException e) {
  reporter.logError("Synchronization check in Static Calendar Detector caught an error.", e);
 }
 ...

I still need to find more information about the implementation of the so-called CFG - the Control Flow Graph - to fully understand what the LockDataFlow, Location and LockSet classes do in detail. What you can see in the code above, however, is that it does not matter what the synchronization block uses as its monitor object. This is something that might be worth further improvement in the future, but for now it should suffice. Probably this is a case where JSR305 might come in handy.

Anyway, this is it. If all the above checks have not caused an early return from the method, we can generate a bug report:

 ...
 // if we get here, we want to generate a report, depending on the type
 String tBugType = null;
 if (tType.subclassOf(calendarType)) {
  tBugType = "STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE";
 } else if (tType.subclassOf(dateFormatType)) {
  tBugType = "STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE";
 }
 if (tBugType != null) {
  reporter.reportBug(new BugInstance(this, tBugType, NORMAL_PRIORITY)
   .addClassAndMethod(this).addCalledMethod(this)
   .addOptionalField(field).addSourceLine(this));
 }

The whole class is contained in the following listing. It especially contains some more housekeeping methods that take care of the CFG and LockDataFlow instances:

/*
 * FindBugs - Find bugs in Java programs
 * Copyright (C) 2003,2004 University of Maryland
 * 
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 * 
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 * 
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

package edu.umd.cs.findbugs.detect;

import java.text.DateFormat;
import java.util.Calendar;
import java.util.Collection;

import org.apache.bcel.classfile.Field;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;
import org.apache.bcel.generic.ObjectType;

import edu.umd.cs.findbugs.BugInstance;
import edu.umd.cs.findbugs.BugReporter;
import edu.umd.cs.findbugs.OpcodeStack;
import edu.umd.cs.findbugs.ba.*;
import edu.umd.cs.findbugs.bcel.OpcodeStackDetector;

/**
 * Detector for static fields of type {@link java.util.Calendar} or
 * {@link java.text.DateFormat} and their subclasses. Because {@link Calendar}
 * is unsafe for multithreaded use, static fields look suspicious. To work
 * correctly, all access would need to be synchronized by the client which
 * cannot be guaranteed.
 * 
 * @author Daniel Schneller
 */
public class StaticCalendarDetector extends OpcodeStackDetector {

 /** External Debug flag set? */
 private static final boolean DEBUG = Boolean.getBoolean("debug.staticcal");

 /**
  * External flag to determine whether to skip the test for synchronized
  * blocks (default: if a call on a static Calendar or DateFormat is detected
  * inside a synchronization block, it will not be reported). Setting this
  * to <code>true</code> will report method calls on static fields even if they
  * are in a synchronized block. As the check currently does not take the
  * lock's mutex into account, it may be useful to allow this.
  */
 private static final String PROP_SKIP_SYNCHRONIZED_CHECK = "staticcal.skipsynccheck";

 /** The reporter to report to */
 private BugReporter reporter;

 /** Name of the class being inspected */
 private String currentClass;

 /**
  * {@link ObjectType} for {@link java.util.Calendar}
  */
 private final ObjectType calendarType = ObjectTypeFactory.getInstance("java.util.Calendar");

 /**
  * {@link ObjectType} for {@link java.text.DateFormat}
  */
 private final ObjectType dateFormatType = ObjectTypeFactory.getInstance("java.text.DateFormat");

 /** Stores the current method */
 private Method currentMethod = null;

 /** Stores current Control Flow Graph */
 private CFG currentCFG;

 /** Stores current LDF */
 private LockDataflow currentLockDataFlow;

 /**
  * Creates a new instance of this Detector.
  * 
  * @param aReporter
  *            {@link BugReporter} instance to report found problems to.
  */
 public StaticCalendarDetector(BugReporter aReporter) {
  reporter = aReporter;
 }

 /**
  * Remembers the class name and resets temporary fields.
  */
 @Override
 public void visit(JavaClass someObj) {
  currentClass = someObj.getClassName();
  currentMethod = null;
  currentCFG = null;
  currentLockDataFlow = null;
  super.visit(someObj);
 }

 /**
  * Checks if the visited field is of type {@link Calendar} or
  * {@link DateFormat} or a subclass of either one. If so and the field is
  * static it is suspicious and will be reported.
  */
 @Override
 public void visit(Field aField) {
  super.visit(aField);
  String tFieldSig = aField.getSignature();
  if (aField.getType() instanceof ObjectType) {
   String tBugType = null;
   ObjectType tType = (ObjectType) aField.getType();
   try {
    if (tType.subclassOf(calendarType) && aField.isStatic()) {
     tBugType = "STCAL_STATIC_CALENDAR_INSTANCE";
    } else if (tType.subclassOf(dateFormatType) && aField.isStatic()) {
     tBugType = "STCAL_STATIC_SIMPLE_DATA_FORMAT_INSTANCE";
    }
    if (tBugType != null) {
     reporter.reportBug(new BugInstance(this, tBugType, NORMAL_PRIORITY).addClass(currentClass).addField(
             currentClass, aField.getName(), tFieldSig, true));
    }
   } catch (ClassNotFoundException e) {
    AnalysisContext.reportMissingClass(e);
   }
  }
 }

 /*
  * (non-Javadoc)
  * 
  * @see edu.umd.cs.findbugs.visitclass.BetterVisitor#visitMethod(org.apache.bcel.classfile.Method)
  */
 @Override
 public void visitMethod(Method obj) {
  try {
   super.visitMethod(obj);
   currentMethod = obj;
   currentLockDataFlow = getClassContext().getLockDataflow(currentMethod);
   currentCFG = getClassContext().getCFG(currentMethod);
  } catch (CFGBuilderException e) {
   reporter.logError("Synchronization check in Static Calendar Detector caught an error.", e);
  } catch (DataflowAnalysisException e) {
   reporter.logError("Synchronization check in Static Calendar Detector caught an error.", e);
  }
 }

 /**
  * Checks for method invocations ({@link org.apache.bcel.generic.INVOKEVIRTUAL})
  * call on a static {@link Calendar} or {@link DateFormat} fields. The
  * {@link OpcodeStack} is used to determine if an invocation is done on such
  * a static field.
  * 
  * @param seen
  *            An opcode to be analyzed
  * @see edu.umd.cs.findbugs.visitclass.DismantleBytecode#sawOpcode(int)
  */
 @Override
 public void sawOpcode(int seen) {
  // we are only interested in method calls
  if (seen != INVOKEVIRTUAL) {
   return;
  }

  try {
   // determine type of the object the method is invoked on
   ObjectType tType = ObjectTypeFactory.getInstance(getClassConstantOperand());

   // if it is not compatible with Calendar or DateFormat, we are not
   // interested anymore
   if (!tType.subclassOf(calendarType) && !tType.subclassOf(dateFormatType)) {
    return;
   }

   // determine the number of arguments the method expects
   int numArguments = getNumberArguments(getSigConstantOperand());
   // go back on the stack to find what the receiver of the method is
   OpcodeStack.Item invokedOn = stack.getStackItem(numArguments);
   XField field = invokedOn.getXField();
   // find out, if the field is static. if not, we are not interested
   // anymore
   if (field == null || !field.isStatic()) {
    return;
   }

   if (getNameConstantOperand().equals("equals") && numArguments == 1) {
    OpcodeStack.Item passedAsArgument = stack.getStackItem(0);
    field = passedAsArgument.getXField();
    if (field == null || !field.isStatic()) {
     return;
    }
   }

   if (!Boolean.getBoolean(PROP_SKIP_SYNCHRONIZED_CHECK)) {
    // check synchronization
    try {
     if (currentMethod != null && currentLockDataFlow != null && currentCFG != null) {
      Collection<Location> tLocations = currentCFG.getLocationsContainingInstructionWithOffset(getPC());
      for (Location tLoc : tLocations) {
       LockSet lockSet = currentLockDataFlow.getFactAtLocation(tLoc);
       if (lockSet.getNumLockedObjects() > 0) {
        // within a synchronized block
        return;
       }
      }
     }
    } catch (DataflowAnalysisException e) {
     reporter.logError("Synchronization check in Static Calendar Detector caught an error.", e);
    }
   }

   // if we get here, we want to generate a report, depending on the type
   String tBugType = null;
   if (tType.subclassOf(calendarType)) {
    tBugType = "STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE";
   } else if (tType.subclassOf(dateFormatType)) {
    tBugType = "STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE";
   }
   if (tBugType != null) {
    reporter.reportBug(new BugInstance(this, tBugType, NORMAL_PRIORITY).addClassAndMethod(this).addCalledMethod(this)
            .addOptionalField(field).addSourceLine(this));
   }
  } catch (ClassNotFoundException e) {
   AnalysisContext.reportMissingClass(e);
  }
 }

}
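To illustrate what the finished detector reports, here is a minimal, made-up example (class and field names are my own invention): a static DateFormat field invoked once without and once with synchronization. The detector would flag the first call as STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE, while the second is skipped because the LockSet at that location contains a locked object.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class StcalExample {
    // a DateFormat shared via a static field - exactly the pattern
    // the detector looks for on the OpcodeStack
    private static final SimpleDateFormat FORMAT = new SimpleDateFormat("yyyy-MM-dd");

    public static String format(Date date) {
        // INVOKEVIRTUAL on a static DateFormat field, no lock held: reported
        return FORMAT.format(date);
    }

    public static String formatSafely(Date date) {
        // inside a synchronized block the LockSet is non-empty: not reported
        synchronized (FORMAT) {
            return FORMAT.format(date);
        }
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        // single-threaded, both variants yield the same string;
        // only their behavior under concurrent access differs
        System.out.println(format(epoch).equals(formatSafely(epoch)));
    }
}
```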

In order to register the new bug types, the findbugs.xml and messages.xml files have to be updated as well. Add the following lines to them:

findbugs.xml:
  ...
      <BugPattern abbrev="STCAL" type="STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE" category="MT_CORRECTNESS" />
      <BugPattern abbrev="STCAL" type="STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE" category="MT_CORRECTNESS" />
  ...
  
  
messages.xml:
<BugPattern type="STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE">
<ShortDescription>Call to static Calendar</ShortDescription>
<LongDescription>Call to method of static java.util.Calendar in {1}</LongDescription>
<Details>
<![CDATA[
<p>Even though the JavaDoc does not contain a hint about it, Calendars are inherently unsafe for multithreaded use. 
The detector has found a call to an instance of Calendar that has been obtained via a static
field. This looks suspicious.</p>
<p>For more information on this see <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6231579">Sun Bug #6231579</a>
and <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6178997">Sun Bug #6178997</a>.</p>
]]>
</Details>
</BugPattern>

<BugPattern type="STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE">
<ShortDescription>Call to static DateFormat</ShortDescription>
<LongDescription>Call to method of static java.text.DateFormat in {1}</LongDescription>
<Details>
<![CDATA[
<p>As the JavaDoc states, DateFormats are inherently unsafe for multithreaded use. 
The detector has found a call to an instance of DateFormat that has been obtained via a static
field. This looks suspicious.</p>
<p>For more information on this see <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6231579">Sun Bug #6231579</a>
and <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6178997">Sun Bug #6178997</a>.</p>
]]>
</Details>
</BugPattern>
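For readers who run into these warnings in their own code: apart from synchronizing on the shared instance, the usual fix is to give each thread its own instance via a ThreadLocal. A sketch of that approach (the class name is my own):

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PerThreadDateFormat {
    // each thread lazily creates and reuses its own DateFormat, so no
    // instance is ever shared and no static Calendar/DateFormat field
    // remains for the detector to report
    private static final ThreadLocal<DateFormat> FORMAT = new ThreadLocal<DateFormat>() {
        @Override
        protected DateFormat initialValue() {
            return new SimpleDateFormat("yyyy-MM-dd");
        }
    };

    public static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```

This keeps the convenience of a single static entry point while avoiding both the data race and the warning.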

If you download the current version of FindBugs, either as a ZIP package or from the SVN trunk, it will already contain an earlier version of this check, although it is disabled by default in the findbugs.xml file. I will submit the newer one as a patch to the tracker on SourceForge, so I expect it will be included in one of the later builds.

Some final words

While I am still far from deeply understanding all the concepts and techniques in FindBugs, I found it relatively easy to get going, even though the primary source of "documentation" is the source code of the existing detectors. I was also very happy to receive quick, friendly and helpful responses on the mailing list. So I'd really like to encourage anyone who comes across a bug pattern that is not yet detected to give it a shot and implement a detector for it.

The future of FindBugs is open for ideas and discussions: just yesterday Bill Pugh started a FindBugs Google Group where you can take part in discussions and present any ideas you might have :)