Digging deeper into the problem with external support turned up some new evidence. Apparently the problem is somehow related to the regular garbage collections initiated by the so-called "GC Daemon" thread. This thread is spawned when you use RMI in one form or another, and it triggers full GCs in order to get rid of unreachable remote objects.
The interval (in milliseconds) between these calls to the garbage collector can be set using
-Dsun.rmi.dgc.server.gcInterval and -Dsun.rmi.dgc.client.gcInterval. Since our application uses RMI to call remote services, we reduced this value to as little as 6 seconds (6000 ms). As expected, this let us reproduce the problem much more often than before. In 4 days we observed 8 VM crashes, each of them with very similar
This means that my test program might have revealed yet another bug: it did produce some crashes, but it does not use anything related to RMI.
As a workaround we could explicitly increase those values for our application to reduce the risk of related crashes; a real fix would be great, though.
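To make the workaround a bit more concrete, here is a minimal sketch of how one might check which intervals a JVM was actually started with. The class name GcIntervalCheck and the one-hour value in the comment are my own illustrative assumptions, not values we have tested; the only things taken from above are the two property names. The program only reads the system properties, it does not touch RMI itself.

```java
// Hypothetical helper, not part of our application.
// Example launch (3600000 ms = 1 hour is an illustrative value only):
//   java -Dsun.rmi.dgc.server.gcInterval=3600000 \
//        -Dsun.rmi.dgc.client.gcInterval=3600000 GcIntervalCheck
public class GcIntervalCheck {
    // Marker returned when the property is absent or not a valid number.
    static final long UNSET = -1L;

    // side is expected to be "server" or "client".
    static long configuredInterval(String side) {
        return Long.getLong("sun.rmi.dgc." + side + ".gcInterval", UNSET);
    }

    public static void main(String[] args) {
        for (String side : new String[] {"server", "client"}) {
            long ms = configuredInterval(side);
            System.out.println(side + " gcInterval: "
                    + (ms == UNSET ? "not set (JVM default applies)" : ms + " ms"));
        }
    }
}
```

Running this with the same -D flags as the real application is a cheap way to confirm that the larger intervals really reach the JVM, for example when they are passed through a wrapper script.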