Friday, December 15, 2006

Some progress on Java 5 on Linux crash

In two previous posts (first here and second here) I reported on Java 5 VM crashes on Linux machines.

Digging deeper into the problem with external support turned up some new evidence. Apparently the problem is related in some way to the regular garbage collections initiated by the so-called "GC Daemon" thread. This thread is spawned as soon as you use RMI in one form or another, and it triggers full GCs at regular intervals so that unreachable remote objects get cleaned up.

You can specify the interval (in milliseconds) between these calls to the garbage collector with the system properties -Dsun.rmi.dgc.server.gcInterval and -Dsun.rmi.dgc.client.gcInterval. Since our application uses RMI to call remote services, we reduced the interval to as little as 6 seconds (6000 ms). As expected, this let us reproduce the problem much more often than before: in 4 days we observed 8 VM crashes, each with a very similar hs_err file.
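To double-check which interval a VM was actually started with, a tiny helper like the following can be handy. It is only a sketch: the class and method names are made up for this post, and the fallback of 60000 ms is simply the Java 5 default of one minute, assumed here for the case that the property is not set.

```java
// Hypothetical helper (not part of the JDK): prints the RMI DGC intervals
// the current VM was started with.
public class GcIntervalReport {

    // Assumption: Java 5 default of one minute, used when the property is unset.
    static final long ASSUMED_DEFAULT_MS = 60000L;

    // Returns the value of the given system property in milliseconds,
    // or the assumed default if it is missing or not a valid number.
    static long effectiveIntervalMs(String property) {
        return Long.getLong(property, ASSUMED_DEFAULT_MS).longValue();
    }

    public static void main(String[] args) {
        String[] properties = {
            "sun.rmi.dgc.server.gcInterval",
            "sun.rmi.dgc.client.gcInterval"
        };
        for (int i = 0; i < properties.length; i++) {
            System.out.println(properties[i] + " = "
                    + effectiveIntervalMs(properties[i]) + " ms");
        }
    }
}
```

Started with something like java -Dsun.rmi.dgc.server.gcInterval=6000 -Dsun.rmi.dgc.client.gcInterval=6000 GcIntervalReport it should report 6000 ms for both properties.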

This means that my test program may have revealed yet another bug: it did produce some crashes, but it does not use anything connected with RMI.

In the freshly released Java SE 6 the default interval was increased from one minute to one hour (see Bug #6200091 and the RMI release notes).

As a workaround we could simply set those values explicitly to a larger interval for our application to lessen the risk of related crashes. A real fix would be great, though.
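The simplest way to apply this workaround is to pass both -D options on the command line. If that is not possible, setting the properties programmatically might work as well. The following is a rough sketch under a few assumptions: the class and method names are invented for illustration, the one hour value just mirrors the new Java SE 6 default, and as far as I know the RMI runtime reads these properties only once when its classes are initialized, so the call has to happen before any RMI usage.

```java
// Hypothetical workaround helper (names are illustrative only).
public class GcIntervalWorkaround {

    // One hour in milliseconds, mirroring the Java SE 6 default.
    static final String ONE_HOUR_MS = "3600000";

    // Sets the property only if it was not given explicitly,
    // e.g. via -D on the command line.
    static void applyIfUnset(String property, String value) {
        if (System.getProperty(property) == null) {
            System.setProperty(property, value);
        }
    }

    // Assumption: must be called before the application touches RMI.
    static void install() {
        applyIfUnset("sun.rmi.dgc.server.gcInterval", ONE_HOUR_MS);
        applyIfUnset("sun.rmi.dgc.client.gcInterval", ONE_HOUR_MS);
    }

    public static void main(String[] args) {
        install();
        // Application startup, including any RMI lookups or exports,
        // would follow here.
        System.out.println("server interval: "
                + System.getProperty("sun.rmi.dgc.server.gcInterval"));
        System.out.println("client interval: "
                + System.getProperty("sun.rmi.dgc.client.gcInterval"));
    }
}
```

Values given on the command line still win, so the short interval used for reproducing the crash can be kept for test runs.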
