Saturday, December 30, 2006

Java 5 crash - the saga continues

The crashes really do seem to be related to the GC options mentioned here, but there appears to be more to it. With the rather aggressive setting of only 6 seconds between full GCs we were able to reproduce the problem, but only on a very specific combination of hardware, OS and Java VM.

Running on RedHat 9 with Sun's Java VM 1.5.0_09, we could only see the problem on one machine, which appears to be identical to all the others we use. However, as this very same machine shows no problems at all when running our application with Java 1.4.2_08 or 1.6.0RC - even with the 6 second interval - I do not believe in faulty hardware.

I removed the machine's hard drive and put it into an identical system (as far as we can see and are told by its manufacturer). In that box (and two more I tried) I cannot get any crash with either Java 1.4, 1.5 or 1.6. Strangely enough even the drive from one of those other machines mounted into the problematic box showed no problems. We really do not have any clue here.

Nevertheless I have now moved everything back to where it was and have increased the GC interval to 4 hours. So far I have not seen any more crashes. The system will be running until January 3rd. If there are no crashes between now and then, we will consider this parameter a solution to our immediate problem, even though the underlying cause may never be found...
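To double-check that the longer interval really reaches the VM, the properties can be read back at runtime. This is only a minimal sketch: the class name GcIntervalCheck and the fallback constant are made up for illustration, and the fallback merely mirrors the one-minute default described for Java 5, it is not queried from the VM. Four hours correspond to 14400000 milliseconds.

```java
public class GcIntervalCheck {
    // Assumption: fall back to one minute when the property is not set,
    // mirroring the default described for Java 5. Not read from the VM itself.
    static final long ASSUMED_DEFAULT_MILLIS = 60000L;

    // Returns the interval configured via -Dsun.rmi.dgc.<side>.gcInterval,
    // where side is "server" or "client", or the assumed default if unset.
    static long configuredInterval(String side) {
        String key = "sun.rmi.dgc." + side + ".gcInterval";
        return Long.getLong(key, ASSUMED_DEFAULT_MILLIS).longValue();
    }

    public static void main(String[] args) {
        System.out.println("server: " + configuredInterval("server") + " ms");
        System.out.println("client: " + configuredInterval("client") + " ms");
    }
}
```

Started with -Dsun.rmi.dgc.server.gcInterval=14400000 and -Dsun.rmi.dgc.client.gcInterval=14400000, both lines should report the 4 hour value. Note that this only shows what was passed on the command line, not what the GC Daemon thread actually does with it.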

Friday, December 15, 2006

Some progress on Java 5 on Linux crash

In two previous posts (first here and second here) I reported on Java 5 VM crashes on Linux machines.

Digging deeper into the problem with external support led to some new evidence. Apparently the problem is in some way related to the regular garbage collections initiated by the so-called "GC Daemon" thread. It gets spawned when you use RMI in one form or another, and it triggers full GCs in order to get rid of unreachable remote objects.

One can specify the interval (in milliseconds) between calls to the garbage collector using -Dsun.rmi.dgc.server.gcInterval and -Dsun.rmi.dgc.client.gcInterval. As our application uses RMI to call remote services, we reduced this value to as little as 6 seconds. As we expected, this let us reproduce the problem much more often than before. In 4 days we observed 8 VM crashes, each of them with very similar hs_err files.
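Since the interval has to be given in milliseconds, it is easy to be off by a factor of 1000. The following is just a small sketch that builds the two command line options for the 6 second setting; the class name GcIntervalFlags and its flag helper are made up for this illustration, only the two property names are real.

```java
import java.util.concurrent.TimeUnit;

public class GcIntervalFlags {
    // Builds the -D option for one side of the RMI distributed GC.
    // side is expected to be "server" or "client" (not validated here).
    static String flag(String side, long intervalMillis) {
        return "-Dsun.rmi.dgc." + side + ".gcInterval=" + intervalMillis;
    }

    public static void main(String[] args) {
        // 6 seconds expressed in milliseconds, as the properties expect
        long sixSeconds = TimeUnit.SECONDS.toMillis(6);
        System.out.println(flag("server", sixSeconds));
        System.out.println(flag("client", sixSeconds));
    }
}
```

The two printed options are then added to the java command line that starts the application.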

This means that my test program might have revealed yet another bug: it did produce some crashes, but it does not use anything connected with RMI.

In the freshly released Java SE 6 the default interval was increased from one minute to one hour (see Bug #6200091 and the RMI release notes).

As a workaround we might just increase those values for our application explicitly to lessen the risk of related crashes. A real fix would be great, though.

Sunday, December 03, 2006

Follow Up: F-Secure's response

Not too long ago I wrote about a problem concerning F-Secure Anti-Virus 2007 and the Kerio Personal Firewall in this article. At the end of it I said I would inform F-Secure about the problem. I did and this is about their response.

On November 13th I used the support form on F-Secure's website (Germany) to report to them the problem I had experienced. My report was about 2K long (I still have it) and included precise information about the situation, what I had found out and what to do to prevent it. I suggested having the installer issue a warning concerning 3rd party personal firewalls, especially since those seemed to be no problem with the 2006 version. I also included a link to my even more verbose blog post.

On November 14th I got a response from one of their support agents. Apart from a lengthy auto-generated intro on how to issue correct problem reports (comes with every mail, does not have anything to do with your individual request) I got a very brief answer (I translated this from German, but I tried to be as exact as possible):

Dear F-Secure Customer, thank you very much for your request to our technical support. Unfortunately the firewall software you use is not compatible with our software. If you need a firewall, we recommend using our "Internet Security". Should you have further questions or need assistance with the actions above please call under ... We wish you a pleasant day. Regards, ...

If I were asked, I could probably tell the ready-made text blocks used in this answer apart...

I replied again, because this did not satisfy me at all. I told them that I would have expected a little more verbosity, especially since I had provided a detailed analysis of the problem. Furthermore I told them about the same problem happening to several people I know - all not too tech-savvy people who called me for assistance. Some of them were so pis^H^H^H dissatisfied that they will probably not buy a new subscription for F-Secure once their current one has expired.

I specifically asked two simple questions: a) Whether my analysis was correct and whether it could be used to prevent the problem from happening should I need to upgrade more machines. b) Whether it would be possible to follow my suggestion and simply have the installer display a warning that as of version 2007 personal firewalls might cause severe problems. This would have been enough from my standpoint - even though detecting the remaining files of a Kerio PFW would have been even nicer.

This was their answer (again translated):

Dear F-Secure Customer, thank you very much for your request to our technical support. regarding a) It is not possible to run version 2007 on systems on which Kerio Firewall has not been completely removed. regarding b) Our setup's sidegrade feature does unfortunately not detect all versions of the Kerio Firewall. Otherwise it will insist on uninstalling the Firewall before it starts to install. Should you have further questions or need assistance with the actions above please call under ... We wish you a pleasant day. Regards, ...

As the icing on the cake, on November 21st I got a feedback request asking how content I was with my recent support issue, in order to ensure optimum support with technical difficulties. I did in fact fill out the form, complaining about the very brief and impersonal answers that did not meet my expectations. I have not heard anything since then.

Is this great? I know someone who will seriously think about renewing his own subscription...