Monday, April 24, 2006

Awt_Robot and File Handles

In January I wrote about translucent windows with Swing. I found the technique in O'Reilly's Swing Hacks book and was pretty pleased with the results.

However, for some time we have been experiencing problems with our application in production (running Sun's JRE 1.4.2 on RedHat 9). They were not obviously GUI-related: for some strange reason a barcode reader attached to the machine randomly stopped working. We traced and debugged a whole lot, but could not find any cause apart from a bug in the native driver layer for these (JavaPOS) devices. As a result we went to the driver vendor and had them look into it at the driver level.

After a while they came back with results which revealed that there really was a problem in the native part of the driver. However, they could reproduce it only with our application, not with their own test tools.

Basically, what they found was a process called awt_robot that held a file handle on a device node in the /dev filesystem used to communicate with the barcode scanner. That process had not issued an open call on the file, however, but had been using the already open file handle right from the start. When our main application tried to close the handle at some point, the close call froze until someone killed awt_robot manually. Only then would the close call return and the application continue normally.
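A check like this can be approximated on Linux by asking which processes currently hold a given file open, much as lsof or fuser would. The following is only a minimal sketch in Python, assuming a Linux /proc filesystem. The function name find_holders is made up for illustration and has nothing to do with the vendor's tools.

```python
import os


def find_holders(path):
    """Return a sorted list of (pid, fd) pairs that have `path` open.

    Relies on the Linux-specific /proc/<pid>/fd symlinks. Processes we
    are not allowed to inspect are silently skipped.
    """
    target = os.path.realpath(path)
    holders = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process already gone, or access denied
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.append((int(entry), int(fd)))
            except OSError:
                continue  # descriptor was closed while we were looking
    return sorted(holders)
```

Called with the path of the scanner's device node (for example a hypothetical /dev/ttyS0), it returns one entry per process and descriptor that refers to that node. Seeing other users' processes usually requires root.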

So that explained where the problem was, but not how it came to pass. Armed with that knowledge about awt_robot we started looking around and found it to be a binary in $JAVA_HOME/jre/lib/i386/. Apparently it is part of the implementation of java.awt.Robot. No one here really knew that there was a separate program to back the Robot class, but when we debugged through the translucent-windows part of our code we could see an instance of this program being forked by the java process upon first invocation. Having been started once, it would not go away until the VM itself was shut down.

Up to then we had (at least I had) believed that the native robot stuff was part of the JVM itself. As they say: You never stop learning...

Anyway (learning even more here), since any process forked under Linux inherits the file handles of its parent (found this page from IBM with a quick Google search), awt_robot also inherits the (already open) handle on the device node. It seems that as long as the child process does not close this handle, the parent cannot do so either and has to wait for it. And since awt_robot normally does not exit before its parent java process does, the handle is never actually closed during the application's runtime.
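The inheritance itself is easy to demonstrate outside of Java. The sketch below is not our application code. It uses a pipe as a stand-in for the device node and a forked child as a stand-in for awt_robot, with Python's os module mapping fairly directly onto the underlying POSIX calls. All function names are made up for illustration.

```python
import os
import select
import signal


def eof_visible(read_fd, timeout=0.2):
    """Return True if read_fd reports end-of-file within the timeout."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    return bool(ready) and os.read(read_fd, 1) == b""


def demo_inherited_handle():
    """Show that a forked child keeps the parent's open handle alive."""
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: never uses write_fd, but holds an inherited copy of it.
        try:
            signal.pause()  # sit around until killed
        finally:
            os._exit(0)

    # Parent: close our own copy of the write end.
    os.close(write_fd)

    # No EOF yet, because the child's inherited copy is still open.
    before_kill = eof_visible(read_fd)

    # Kill the child; the kernel then closes its descriptors.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    after_kill = eof_visible(read_fd)

    os.close(read_fd)
    return before_kill, after_kill
```

Calling demo_inherited_handle() should return (False, True): the pipe only counts as closed once the child is gone, even though the parent closed its own handle long before. With a pipe the parent's close returns immediately, so this shows only the lingering handle, not the blocking close we saw with the driver.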

So in the end our hardware problem turned out to be a really very well hidden GUI problem. We commented out the code for the fancy windows and everything ran smoothly again. We did not test whether this problem also occurs on the Windows version of Java, but I believe one should at least be aware of the situation.
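For native code one controls, the usual guard against this kind of leak is the close-on-exec flag (FD_CLOEXEC), which makes the kernel drop a descriptor when a forked child executes a new program. The sketch below assumes awt_robot is started by a fork followed by an exec, which its being a separate binary suggests but which we did not verify. A temporary file stands in for the device node, and all names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Tiny program run in the child: does descriptor `fd` refer to the
# same file (device and inode number) as it did in the parent?
CHILD_CHECK = """
import os, sys
fd, dev, ino = (int(arg) for arg in sys.argv[1:4])
try:
    st = os.fstat(fd)
    print(int((st.st_dev, st.st_ino) == (dev, ino)))
except OSError:
    print(0)
"""


def child_sees_fd(fd):
    """Start a child without closing extra descriptors and ask it about fd."""
    st = os.fstat(fd)
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD_CHECK,
         str(fd), str(st.st_dev), str(st.st_ino)],
        close_fds=False,  # mimic a plain fork + exec
    )
    return out.strip() == b"1"


def demo_cloexec():
    with tempfile.TemporaryFile() as device:  # stand-in for the device node
        fd = device.fileno()
        os.set_inheritable(fd, True)   # FD_CLOEXEC cleared: handle leaks
        leaked = child_sees_fd(fd)
        os.set_inheritable(fd, False)  # FD_CLOEXEC set: handle is dropped
        protected = child_sees_fd(fd)
    return leaked, protected
```

Calling demo_cloexec() should return (True, False): without the flag the child ends up with the handle, with the flag it does not. This only helps where the code opening the device can be changed, which was not the case for us with a third-party driver.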

These comments originate from my old blog and I find them interesting enough to repost here:

Oh, wow, that's an interesting observation. I wrote the initial version of that code. We had to use an external process because of locking issues in AWT. That is, in 1.3 anyway, there were several cases where AWT_LOCK was held over long periods of time. AWT_LOCK is meant to prevent multiple concurrent calls to Xlib, because the Xlib of that time frame was not thread safe. It took several iterations and we ended up with having a separate process so that the Xlib of the child process could not in any way make concurrent Xlib calls. The child process is supposed to inherit a couple of pipe file handles, and communicate with the parent over those. It absolutely does not need any other file handles than the two pipes and the Xlib connection to the display. Have you filed a bug on this???

Posted by David Herron on April 24, 2006 at 06:50 PM CEST #

Not yet. First priority was to get the workaround jar for the customer ready. But I will post a report tomorrow.

Posted by Daniel Schneller on April 25, 2006 at 12:22 AM CEST #

I have now filed a bug report with Sun. The problem occurs when the file in /dev gets opened exclusively by the driver module. In that case it cannot be reopened by the main application once it has been closed, because awt_robot still keeps a handle on it. This prevents opening it again in exclusive mode.

Posted by Daniel Schneller on April 28, 2006 at 05:32 PM CEST #

This is the bug report I filed: It was closed as "not reproducible". However there is a reference to this one: Basically it says that the awt_robot child process is no longer needed in Mustang (Java 6). While this is not an option for us, at least there is hope for the future :)

Posted by Daniel Schneller on August 15, 2006 at 12:17 AM CEST #
