Sunday, November 26, 2006

Java 5 crashes on Linux 2.6, too

Two and a half weeks ago I published a post about problems with random VM crashes using Java 5 on Linux with a 2.4 kernel.

Most of the feedback I got suggested upgrading to a more recent kernel version. Because this is a major undertaking for our application (several thousand clients deployed with Java 1.4.2 on RH9), we needed to be sure it would work.

Because all the crash reports (see the original post) seemed to point towards the garbage collector, I wrote a small test application to stress it. It creates a configurable number of threads, each of which repeatedly allocates a byte[] of random size. If an OutOfMemoryError occurs, the thread is replaced with a new one. You can find the code at the bottom of this post.

I started 4 instances of this tool under Ubuntu 6.06, each configured with 20 threads (first parameter) and up to 40MB of memory per thread (second parameter, in bytes).

After 4 days of continuous running (we had just started to feel a little safer), one of the processes crashed, leaving an hs_err file behind that again reported a full garbage collection in progress.

We have now filed a bug report with Sun, which they have yet to review. According to Sun, it currently takes around 3 weeks to get an official bug ID; however, I do not think a fix will arrive in time for us to use Java 5 in our next application release.

Does anyone know if there is a way to get crash files with debug symbols? Maybe this would allow us to do some more testing on our own. Is there some sort of debug VM available for download?

import java.util.Random;

// Usage: java GCStress <numThreads> <maxBytesPerAllocation>
public class GCStress {
    private static Thread[] threads;

    private static Random random = new Random();

    private static int maxBytes;

    private static int numThreads;

    public static void main(String[] args) {
        numThreads = Integer.parseInt(args[0]);
        maxBytes = Integer.parseInt(args[1]);
        threads = new Thread[numThreads];
        for (int x = 0; x < numThreads; x++) {
            threads[x] = new Thread(new Allocator(), "Thread_" + x);
            threads[x].start();
        }

        while (true) {
            for (int x = 0; x < numThreads; x++) {
                if (!threads[x].isAlive()) {
                    threads[x] = new Thread(new Allocator(), "Thread_r" + x);
                    threads[x].start();
                }
            }
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                // ignore
            }

        }
    }

    private static class Allocator implements Runnable {

        public void run() {
            while (true) {
                int tSize = random.nextInt(maxBytes);
                System.out.println(Thread.currentThread().getName()
                        + " allocating " + tSize / 1024 + "kb");
                // The array is never used; allocating it is the whole point.
                byte[] tMemFiller = new byte[tSize];
                try {
                    Thread.sleep(random.nextInt(200));
                } catch (InterruptedException e) {
                    // ignore
                }
            }
        }

    }
}

Tuesday, November 21, 2006

Amazingly simple - Collection Initialization

On Todd Huss' blog I just came across a very simple way of initializing a collection with a set of predefined values. It is so simple that I am surprised people do not use it more often. Personally, this is the first time I have seen instance initializers used this way, although they are nothing sooo special...
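The technique is commonly known as "double brace initialization". Below is a minimal sketch of my own (not the code from Todd Huss' post; the class and method names are made up for illustration): the outer braces create an anonymous subclass of the collection, and the inner braces are that subclass's instance initializer, which runs right after construction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollectionInit {
    // Outer braces: anonymous subclass of ArrayList.
    // Inner braces: instance initializer that fills the list on construction.
    public static List<String> colors() {
        return new ArrayList<String>() {{
            add("red");
            add("green");
            add("blue");
        }};
    }

    // Same idea for a Map.
    public static Map<String, Integer> ports() {
        return new HashMap<String, Integer>() {{
            put("http", 80);
            put("https", 443);
        }};
    }

    public static void main(String[] args) {
        System.out.println(colors());              // prints [red, green, blue]
        System.out.println(ports().get("https"));  // prints 443
    }
}
```

One thing to keep in mind: every use creates an additional anonymous class, and when it is used inside an instance method, that class also keeps a reference to the enclosing object.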

Saturday, November 18, 2006

MySQL/InnoDB slowness with Blobs

Reading about Peter Zaitsev's feature idea about Finding columns which a query needs to access (which I would really like to see implemented) reminded me of a bug report I filed in 2004 that bit me again only a few days ago. You can find it under Bug #7074 in the MySQL bug tracking tool. Although it is filed as a feature request, I think you should be aware of it, as it may cause problems in your applications (it did in ours).

Basically it is about explicitly specifying which columns you need in a result set instead of just using SELECT *. This is generally a good idea; however, if the table contains BLOB columns, it becomes even more important, because SELECT * may then hurt performance heavily and unexpectedly.

From the bug report:

MySQL first reads all the selected columns, and only after that checks the WHERE.

This may lead to long running queries, even if you do not use the BLOB column in the WHERE clause and even if there is no data to retrieve based on the query conditions.
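To illustrate, here is a minimal JDBC sketch, assuming a hypothetical table documents(id, name, content) in which content is a BLOB. It lists only the columns it actually needs instead of using SELECT *, so the BLOB column is never among the selected columns.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ExplicitColumns {
    // Hypothetical schema: documents(id INT, name VARCHAR(255), content BLOB).
    // Only the non-BLOB columns the caller needs are selected.
    public static final String SELECT_NAMES =
            "SELECT id, name FROM documents WHERE name LIKE ?";

    // What to avoid: per the bug report, MySQL may read the BLOB column
    // for every examined row before it evaluates the WHERE clause.
    public static final String SELECT_ALL =
            "SELECT * FROM documents WHERE name LIKE ?";

    /** Returns the names of matching documents without selecting the BLOB. */
    public static List<String> findNames(Connection con, String pattern) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(SELECT_NAMES)) {
            ps.setString(1, pattern);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }
}
```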

For more details see the bug report.

Monday, November 13, 2006

System Lockup: F-Secure AV 2007 and Kerio Firewall

Recently I received a notification about F-Secure Anti-Virus 2007 being available. As an F-Secure customer you are entitled to upgrade from the 2006 version if your subscription is valid. So I downloaded the installation package and performed the upgrade.

After the obligatory reboot, things started to fall apart. About 30 seconds after I had logged in, my computer would stop responding. Opening the Start menu still worked, sometimes even opening e. g. the Control Panel submenu, but nothing else worked after that point. Using Ctrl-Alt-Del to get the Task Manager just allowed me to "wipe" the Start menu from the screen; nothing else was possible.

What made me suspicious was a little dialog I had to dismiss right after logging in, informing me that the system-tray GUI could not find my Kerio Personal Firewall. Because conflicting firewalls are known to cause lockups like this, I had originally bought F-Secure Anti-Virus instead of the whole Internet Security package. Anti-Virus 2006 had been working fine alongside the separate personal firewall.

I rebooted to see if this was some sort of transient problem with the first reboot after the install. This time Explorer did not even launch to show me my desktop. Apart from the wallpaper and a mouse pointer I could not see anything. Hitting Ctrl-Alt-Del again let me launch the Task Manager. I tried to start explorer.exe from there, to no avail.

I decided to uninstall the personal firewall. I tried to boot into Safe Mode, only to see it die with a blue screen instead of coming up. To be fair, I have to say that I had not tried Safe Mode for a loong time, so I do not know whether it would have worked before my problems started.

My only way to resolve this was to boot into the Vista RC installation I had luckily not deleted yet and disable the startup of the firewall service in the XP installation. To do so, I loaded the XP windows\system32\config\system registry hive into the Vista regedit and, in the firewall service node of the active control set ControlSet001, set the startup type (ControlSet00x\Services\servicename\Start) to 4, which means disabled. You can see which control set is used for "normal" Windows startup by looking at the SYSTEM\Select\Default value.
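For reference, here is a small Java sketch of the documented values of a service's Start entry and of how the number in SYSTEM\Select\Default maps to a control set key name. The class and method names are hypothetical; the sketch only models the values and does not access the registry.

```java
public class ServiceStartTypes {
    // Values of the Start entry under <ControlSet>\Services\<servicename>.
    public enum StartType {
        BOOT(0), SYSTEM(1), AUTOMATIC(2), MANUAL(3), DISABLED(4);

        public final int value;

        StartType(int value) {
            this.value = value;
        }

        // Looks up the start type for a raw registry DWORD.
        public static StartType fromValue(int value) {
            for (StartType t : values()) {
                if (t.value == value) {
                    return t;
                }
            }
            throw new IllegalArgumentException("Unknown Start value: " + value);
        }
    }

    // SYSTEM\Select\Default holds a number such as 1, which refers to ControlSet001.
    public static String controlSetName(int selectDefault) {
        return String.format("ControlSet%03d", selectDefault);
    }

    public static void main(String[] args) {
        System.out.println(controlSetName(1));                // prints ControlSet001
        System.out.println(StartType.fromValue(4));           // prints DISABLED
    }
}
```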

After restarting, nothing had changed; the problem was the same as before. Because I was not sure whether just disabling the Kerio service had been enough, I decided to uninstall it. To do so, I had to disable F-Secure Anti-Virus, too. So I booted Vista again and opened the XP registry. Luckily the F-Secure services all have human-readable key names starting with "F-Secure", so it was very easy to disable them as well.

Back in XP, I was for the first time able to do more than wait for the lockup. I uninstalled Kerio using the Control Panel's "Add/Remove Programs" applet and rebooted, after setting the F-Secure services back to their original startup settings.

Guess what... It still did not work... I came to the conclusion that there must be some sort of bug in F-Secure Anti-Virus 2007. In the meantime my father had called, complaining about the same problem, which at the time seemed to support my theory. At that point, however, I did not yet know that he was using a personal firewall too, the Sunbelt Personal Firewall.

After going through the whole boot Vista - load XP registry - disable services - reboot to XP hassle, I finally uninstalled Anti-Virus 2007, rebooted and re-installed 2006. At this point I was back where I had originally started, minus the Kerio Personal Firewall.

For some reason I did not want to believe that F-Secure would ship such a lousy product. I fired up regedit again and opened the services subtree. There I reviewed every one of them, not knowing what exactly to look for. Finally I found these two entries:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\fwdrv]
"MaxBufferSize"=dword:00004000
"MaxBuffersAllocated"=dword:00000300
"WarnLog"=dword:00000000
"DebugLog"=dword:00000000
"DebugLogFlags"=dword:00000000
"DatagramRoutingExtent"=dword:4109891b
"StatInspEnabled"=dword:00000001
"AlwaysSecure"=dword:00000002
"FSSecEnabled"=dword:00000000
"RegSecEnabled"=dword:00000000
"AdapterNotificationDisabled"=dword:00000000
"BufCacheSize"=dword:00000060
"TCPConnectionTimeout"=dword:00000000
"BlockIPv6"=dword:00000000
"ErrLogFile"="\\SystemRoot\\System32\\drivers\\fwdrv.err"
"DebugLogFile"="\\SystemRoot\\System32\\drivers\\fwdrv.dbg"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\fwdrv\Enum]
"0"="Root\\LEGACY_FWDRV\\0000"
"Count"=dword:00000001
"NextInstance"=dword:00000001

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\khips]
"Type"=dword:00000001
"TraceLevel"=dword:00000000
"DisplayName"="Kerio HIPS Driver"
"TraceFile"="C:\\Programme\\Kerio\\Personal Firewall 4\\logs\\khips.log"
"Start"=dword:00000001
"ErrorControl"=dword:00000001
"ImagePath"=hex(2):5c,00,53,00,79,00,73,00,74,00,65,00,6d,00,52,00,6f,00,6f,00,\
  74,00,5c,00,73,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,00,5c,00,64,00,72,\
  00,69,00,76,00,65,00,72,00,73,00,5c,00,6b,00,68,00,69,00,70,00,73,00,2e,00,\
  73,00,79,00,73,00,00,00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\khips\Enum]
"0"="Root\\LEGACY_KHIPS\\0000"
"Count"=dword:00000001
"NextInstance"=dword:00000001

The ImagePath in the second node reads "\SystemRoot\system32\drivers\khips.sys" when viewed with regedit. Searching the net for that name reveals that it is the "Kerio Host Intrusion Prevention Service". Obviously this is a remnant of the Kerio Personal Firewall that I thought I had removed.
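The hex(2) notation in the export is a REG_EXPAND_SZ value stored as UTF-16LE bytes with a terminating NUL character. A minimal sketch of decoding it by hand with nothing but the JDK (the class name is hypothetical) confirms the path regedit shows:

```java
import java.nio.charset.Charset;

public class RegHexDecoder {
    // The ImagePath bytes from the khips export above, continuation backslashes removed.
    public static final String KHIPS_IMAGE_PATH =
            "5c,00,53,00,79,00,73,00,74,00,65,00,6d,00,52,00,6f,00,6f,00,"
            + "74,00,5c,00,73,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,00,5c,00,64,00,72,"
            + "00,69,00,76,00,65,00,72,00,73,00,5c,00,6b,00,68,00,69,00,70,00,73,00,2e,00,"
            + "73,00,79,00,73,00,00,00";

    /** Decodes a comma-separated .reg hex list (e.g. "5c,00,53,00") as UTF-16LE. */
    public static String decode(String hexList) {
        // Remove the line-continuation backslashes and whitespace regedit inserts.
        String cleaned = hexList.replace("\\", "").replaceAll("\\s+", "");
        String[] parts = cleaned.split(",");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        String s = new String(bytes, Charset.forName("UTF-16LE"));
        // Cut off the terminating NUL character.
        int end = s.indexOf('\0');
        return end >= 0 ? s.substring(0, end) : s;
    }

    public static void main(String[] args) {
        System.out.println(decode(KHIPS_IMAGE_PATH)); // prints \SystemRoot\system32\drivers\khips.sys
    }
}
```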

In the Device Manager you can also see this service when the "View/Show Hidden Devices" option is enabled. It shows up under "Non-PnP-Drivers" (sorry if the names are a little off; I am guessing them because I use a German Windows).

As soon as I had removed both of the registry keys above (fwdrv is referenced on kerio.uk.com, too) and rebooted, I could use F-Secure Anti-Virus 2007 without any problems. I will report this to F-Secure now...

JavaPosse podcast on Java GPL'ing

The guys of the JavaPosse have just released a special issue of their podcast in which they interview Mark Reinhold (chief engineer for Java SE), Rich Sands (community marketing manager for Java SE) and Eric Chu (senior director of the Client Systems Group and head of its Java ME initiatives).

Dick Wall posted a story on Digg.com called "Questions about Open Source Java? This Podcast may have the answers!", which links to the podcast. Very interesting stuff, especially concerning the famous question "Will my app have to become Open Source, too, if I use Sun's Java?".

Be sure to cast your vote for the item on digg.com :)

Saturday, November 11, 2006

Vista Aero and QuickTime

Very cool behaviour of QuickTime on Vista (ok, RC1, but I do not think this will get any better):

This is shown when you start the QuickTime control panel applet. Notice that the publisher information is displayed as "Microsoft Windows Publisher", so you cannot tell that this really is the QuickTime applet. It could just as well have been any other process in the background.

Tuesday, November 07, 2006

Java 5 random VM crashes

We are currently evaluating the consequences of migrating our application from Java 1.4 to Java 5. While initial tests revealed only simple issues (like variables called enum etc.) we are now seeing a much more severe problem: Random VM crashes.

Currently we only see this on Linux (Kernel 2.4), and even there we cannot reliably reproduce the problem. On a single machine we have seen two crashes in a week. Notably, the application was not being used; it had just been started and was waiting for user input. Some background threads are running in this situation, but they do not do any work either. They just poll some database tables for external changes, and there were none.

All of a sudden a VM would crash, leaving a hs_err_pid1234.txt behind. This is what they look like (shortened):

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x402989b9, pid=8736, tid=1094691632
#
# Java VM: Java HotSpot(TM) Client VM (1.5.0_09-b03 mixed mode, sharing)
# Problematic frame:
# V  [libjvm.so+0x2689b9]
#

---------------  T H R E A D  ---------------

Current thread (0x08099230):  VMThread [id=8737]

siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x00000008

Registers:
EAX=0x00000000, EBX=0x403b7aec, ECX=0x00000008, EDX=0x6861ce38
ESP=0x413fa3d0, EBP=0x413fa3e0, ESI=0x403aa980, EDI=0x403c5ac8
EIP=0x402989b9, CR2=0x00000008, EFLAGS=0x00010206

Top of Stack: (sp=0x413fa3d0)
0x413fa3d0:   0806c558 0806c558 403b7aec 403af5c4
0x413fa3e0:   413fa400 40298ccc 403af5c4 413fa448
0x413fa3f0:   413fa410 40329d7c 413fa448 403b7aec
0x413fa400:   413fa430 40334754 403c5ac8 403af5c4
0x413fa410:   413fa4f0 403262a1 413fa448 413fa454
0x413fa420:   403630e3 403b7aec 00000002 0806c438
0x413fa430:   413fa470 40170ff0 403c5ac8 00000000
0x413fa440:   00000001 00000001 42133220 00000001 

Instructions: (pc=0x402989b9)
0x402989a9:   08 49 89 08 8b 40 08 8b 14 88 8b 42 04 8d 48 08
0x402989b9:   8b 40 08 52 51 ff 50 58 8b 06 83 c4 10 8b 08 85 

Stack: [0x4137a000,0x413fb000),  sp=0x413fa3d0,  free space=512k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x2689b9]
V  [libjvm.so+0x268ccc]
V  [libjvm.so+0x304754]
V  [libjvm.so+0x140ff0]
V  [libjvm.so+0x1438d8]
V  [libjvm.so+0x143122]
V  [libjvm.so+0x14c8ad]
V  [libjvm.so+0x2f0dce]
V  [libjvm.so+0x1409ff]
V  [libjvm.so+0x141c7f]
V  [libjvm.so+0x32df33]
V  [libjvm.so+0x32dad6]
V  [libjvm.so+0x32d0e7]
V  [libjvm.so+0x32d355]
V  [libjvm.so+0x32cec0]
V  [libjvm.so+0x28bbe8]
C  [libpthread.so.0+0x4484]

VM_Operation (0x4a74d258): full generation collection, mode: safepoint, requested by thread 0x0815d230

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x40189396, pid=18790, tid=1094691632
#
# Java VM: Java HotSpot(TM) Client VM (1.5.0_09-b03 mixed mode, sharing)
# Problematic frame:
# V  [libjvm.so+0x159396]
#

---------------  T H R E A D  ---------------

Current thread (0x08099230):  VMThread [id=18791]

siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x59c83398

Registers:
EAX=0x59c83398, EBX=0x403b7aec, ECX=0x6eb6f518, EDX=0x5bc99a08
ESP=0x413fa0a8, EBP=0x413fa0c0, ESI=0x5bc99a04, EDI=0x6eb6f520
EIP=0x40189396, CR2=0x59c83398, EFLAGS=0x00010202

Top of Stack: (sp=0x413fa0a8)
0x413fa0a8:   6afc194c 5bc99a08 6eb6f524 403b7aec
0x413fa0b8:   403aa980 403c5ac8 413fa0e0 402989c1
0x413fa0c8:   6eb6f288 5bc999f8 0806c558 0806c558
0x413fa0d8:   403b7aec 403af5c4 413fa100 40298ccc
0x413fa0e8:   403af5c4 413fa148 413fa110 40329d7c
0x413fa0f8:   413fa148 403b7aec 413fa130 40334754
0x413fa108:   403c5ac8 403af5c4 413fa1f0 403262a1
0x413fa118:   413fa148 413fa154 403630e3 403b7aec 

Instructions: (pc=0x40189396)
0x40189386:   8d 14 86 89 55 ec 39 d6 73 18 8b 06 85 c0 74 0a
0x40189396:   8b 00 83 e0 03 83 f8 03 75 18 83 c6 04 3b 75 ec 

Stack: [0x4137a000,0x413fb000),  sp=0x413fa0a8,  free space=512k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x159396]
V  [libjvm.so+0x2689c1]
V  [libjvm.so+0x268ccc]
V  [libjvm.so+0x304754]
V  [libjvm.so+0x140ff0]
V  [libjvm.so+0x1438d8]
V  [libjvm.so+0x143122]
V  [libjvm.so+0x14c8ad]
V  [libjvm.so+0x2f0dce]
V  [libjvm.so+0x1409ff]
V  [libjvm.so+0x141c7f]
V  [libjvm.so+0x32df33]
V  [libjvm.so+0x32dad6]
V  [libjvm.so+0x32d0e7]
V  [libjvm.so+0x32d355]
V  [libjvm.so+0x32cec0]
V  [libjvm.so+0x28bbe8]
C  [libpthread.so.0+0x4484]

VM_Operation (0x48f8e4d8): full generation collection, mode: safepoint, requested by thread 0x08653a08

Looking through the Sun bug database, I found several reports about similar crashes; however, they were all closed as not reproducible. That is our problem, too: right now the application has been running for 5 days without a problem. Nevertheless this is not too comforting, as we would have several thousand VMs in production use. Should we decide to migrate, even a 0.1% chance of this crash per VM and day would leave us with several problem reports a day, which we cannot accept.
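A quick back-of-the-envelope check of that last claim, assuming (hypothetically) N = 3000 VMs and independent crashes with probability p = 0.001 per VM per day:

```latex
\begin{aligned}
\mathbb{E}[\text{crashes per day}] &= N p = 3000 \times 0.001 = 3 \\
P(\text{at least one crash per day}) &= 1 - (1 - p)^{N} = 1 - 0.999^{3000} \approx 1 - e^{-3} \approx 0.95
\end{aligned}
```

So under these assumptions we would see about three crash reports on an average day, and a crash-free day would be the exception.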

Any comments and hints will be greatly appreciated.

Thursday, November 02, 2006

Beryl on Edgy Eft

As announced previously, I spent some time getting Beryl to work on my newly upgraded Edgy Eft installation. Although it did not go as smoothly as I had hoped, it was not too troublesome either.

Dual head still seems to be a major problem in many areas of Linux. This is definitely something Windows users do not have to worry about as much, but ok, this may partly be because hardware vendors do not provide unified and/or open drivers.

Nevertheless it is now working, after some changes to my xorg.conf. Before making them, I always got an error message from Beryl complaining about a missing RandR extension.

The effects are really nice; however, some of them are too slow for my taste with the default settings. After speeding them up a little (I do not like waiting for a context menu to wobble into view if it wobbles for more than a fraction of a second), I really liked it. There are some issues left, but I assume this is due to the ongoing development. E. g. window resizing is a little strange if you grab a window's top edge and move the mouse up and down. One would expect the window's bottom edge to stay in place while the window gains or loses height at the top, i. e. where you drag. However, sometimes windows seem to be resized at the bottom.

Video playback is also choppy, but that seems to depend on the file I play back. This is probably due to different codecs, but I have not really looked into it more deeply.

From what I have seen so far, I believe there is very much potential in this :)

For those interested, these are the contents of my xorg.conf:

Section "Files"
        FontPath        "/usr/share/X11/fonts/misc"
        FontPath        "/usr/share/X11/fonts/cyrillic"
        FontPath        "/usr/share/X11/fonts/100dpi/:unscaled"
        FontPath        "/usr/share/X11/fonts/75dpi/:unscaled"
        FontPath        "/usr/share/X11/fonts/Type1"
        FontPath        "/usr/share/X11/fonts/100dpi"
        FontPath        "/usr/share/X11/fonts/75dpi"
        FontPath        "/usr/share/fonts/X11/misc"
        # path to defoma fonts
        FontPath        "/var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType"
EndSection

Section "Module"
        Load    "i2c"
        Load    "bitmap"
        Load    "ddc"
        Load    "dri"
        Load    "extmod"
        Load    "freetype"
        Load    "glx"
        Load    "int10"
        Load    "type1"
        Load    "vbe"
EndSection

Section "InputDevice"
  Driver       "kbd"
  Identifier   "Keyboard[0]"
  Option       "Protocol" "Standard"
  Option       "XkbLayout" "de"
  Option       "XkbModel" "pc105"
  Option       "XkbRules" "xfree86"
EndSection


Section "InputDevice"
  Driver       "mouse"
  Identifier   "Mouse[1]"
  Option       "Buttons" "6"
  Option       "Device" "/dev/input/mice"
  Option       "Name" "Logitech USB Wheel Mouse"
  Option       "Protocol" "explorerps/2"
  Option       "Vendor" "Sysp"
  Option       "ZAxisMapping" "4 5"
EndSection


Section "Monitor"
  HorizSync    30-121
  Identifier   "Monitor[0]"
  ModelName    "H750"
  VendorName   "Hansol"
  VertRefresh  56-70
EndSection

Section "Screen"
  DefaultDepth 24
  SubSection "Display"
    Depth      15
    Modes      "1280x1024" 
  EndSubSection
  SubSection "Display"
    Depth      16
    Modes      "1280x1024" 
  EndSubSection
  SubSection "Display"
    Depth      24
    Modes      "1280x1024" 
  EndSubSection
  SubSection "Display"
    Depth      8
    Modes      "1280x1024" 
  EndSubSection
  Device       "Device[0]"
  Identifier   "Screen[0]"
  Monitor      "Monitor[0]"
EndSection


Section "Device"
  BoardName    "GeForce 7600GS"
  BusID        "1:0:0"
  Driver       "nvidia"
  Identifier   "Device[0]"
  Option       "AddARGBGLXVisuals" "True"
  Option       "RenderAccel" "True"
  Option       "TwinView"
  Option       "SecondMonitorHorizSync" "60-71"
  Option       "MetaModes" "1280x1024,1280x1024"
  Option       "TwinViewOrientation" "RightOf"
  Option       "SecondMonitorVertRefresh" "50-160"
  VendorName   "NVidia"
EndSection


Section "ServerLayout"
  Identifier   "Layout[all]"
  InputDevice  "Keyboard[0]" "CoreKeyboard"
  InputDevice  "Mouse[1]" "CorePointer"
  Option       "Clone" "off"
  Option       "Xinerama" "off"
  Screen       "Screen[0]"
EndSection


Section "DRI"
    Group      "video"
    Mode       0660
EndSection

Section "Extensions"
EndSection