Monday, January 21, 2008

SAXParseException: -1:-1: Premature End Of File - Misleading error

Today I had to look at a piece of code a colleague had written, using my XPathAccessor class. She used it in a servlet which receives XML-formatted requests. As those are generated by an external third-party tool, we agreed on some XML schema definitions. Everything they send us needs to conform to its corresponding schema; each reply we send gets validated against a different set.

In order to allow independent testing on either side, we provided a little test kit that allows testing our system without having to set up a servlet engine. Basically it just takes a file, reads it into a String and hands that to the handler.

First it gets parsed without validation. This is necessary to find out which type of request we were sent (the address is the same for all of them). After the root element is known, it will be read again, this time using the right schema to verify the request.

Once that is done, some reply is put together and sent back to the client. So far, so good.

When I looked at the code I could see nothing wrong. Nevertheless each time a request was sent, we got a

2008-01-21 10:02:12,889 INFO  [STDOUT] [Fatal Error] :-1:-1: Premature end of file.
org.xml.sax.SAXParseException: Premature end of file.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)

I had a look at the source of the XML input and first suspected a superfluous CR-LF after the closing tag of the root element. On the net some people claimed that this might be a cause of the error above, but removing it did not help.

This is the relevant code that handles the request:

public String readXmlFromStream(InputSource anInputSource) {
    String tResult = null;
    try {
        XPathAccessor reader = new XPathAccessor(anInputSource);
        String type = reader.xp("rootElem", "reqType");
        if (type.startsWith("K")) {
            Schema schemaK = XElement.getSchema(this.getClass()
                .getResourceAsStream("/schema/K.xsd"));
            XPathAccessor validatingReader = new XPathAccessor(anInputSource, schemaK);
        ...

The last line throws the "Premature end of file" SAXParseException. The constructors of XPathAccessor look like this:

public XPathAccessor(InputSource aSource) throws SAXException, IOException, ParserConfigurationException {
    this(aSource, null);
} 

public XPathAccessor(InputSource aSource, Schema aSchema) 
    throws SAXException, IOException, ParserConfigurationException {

    Validator validator = null;
    builder = factory.newDocumentBuilder();
    document = builder.parse(aSource);
    if (aSchema != null) {
        validator = aSchema.newValidator();
        validator.validate(new DOMSource(document));
    }
} 

Curiously, in the case of the servlet no files are involved at all. Everything is in memory, so "Premature end of file" is not too helpful anyway. The solution to this mess can be found in the API documentation for InputSource (reading it does turn out to be helpful sometimes):

An InputSource object belongs to the application: the SAX parser shall never modify it in any way (it may modify a copy if necessary). However, standard processing of both byte and character streams is to close them on as part of end-of-parse cleanup, so applications should not attempt to re-use such streams after they have been handed to a parser.

This last sentence is the clue: Because the InputSource has been used to find out the type of request, it cannot be used again for the validating XPathAccessor. In that light the error message at least makes a little sense: The underlying stream has been read to its end and been closed, so one might call that the "end of file"; and because the (2nd) XPathAccessor has just tried to read it from the start, "premature" might be a valid qualifier...

Knowing that also explained why the test suite worked fine; it read the XML contents into a String, which another set of overloaded constructors for XPathAccessor can accept. Of course strings can be read as often as you like, so no problems there.
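A minimal sketch of the workaround, not the actual XPathAccessor code: keep the XML text around and wrap it in a fresh InputSource for every parse. The class and method names below are made up for illustration, and I am assuming here that "reqType" is an attribute of the root element. The second method reproduces the failure by handing the same stream-backed InputSource to a parser twice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original code base.
class FreshInputSourceDemo {

    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Safe: every call wraps the text in a brand-new InputSource,
    // so the parser always starts reading at the first character.
    static Document parse(String xml)
            throws SAXException, IOException, ParserConfigurationException {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Reproduces the problem: one stream-backed InputSource, parsed twice.
    static boolean secondParseFails(String xml) throws Exception {
        InputSource source = new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        newBuilder().parse(source);     // first pass consumes the stream
        try {
            newBuilder().parse(source); // second pass finds nothing left to read
            return false;
        } catch (SAXException | IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rootElem reqType=\"K1\"/>";

        // First pass: find out the request type (assumed to be an attribute).
        String type = parse(xml).getDocumentElement().getAttribute("reqType");

        // Second pass: works, because a fresh InputSource is created.
        Document again = parse(xml);
        System.out.println(type + " " + again.getDocumentElement().getNodeName());

        System.out.println("reuse fails: " + secondParseFails(xml));
    }
}
```

In a servlet the same idea would mean reading the request body into a String or byte array once and building a new InputSource from it for the non-validating and the validating pass.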

As the docs do not give an immediate hint, I hope this post saves someone some time.

Wednesday, January 16, 2008

Can't start server: Bind on TCP/IP port: No such file or directory (some progress)

Back in October last year I wrote about a peculiar MySQL error message:

Can't start server: Bind on TCP/IP port: No such file or directory

This error only seems to occur on Windows (Server 2003 in this case). While the first part is clear - a port could not be bound, because it was already occupied by another process - the second part does not make any sense.

We recently got in contact with MySQL support about this, because we came across the problem again on a machine that had received the "treatment" described in the earlier article, which did not seem to work. It turned out that the server had not been rebooted since, so the registry change was never activated.

However as a nice side-effect we now know (almost for sure) what that strange "No such file or directory" is all about: see MySQL Bug #33137. It has been categorized as a feature request - I doubt it will get any serious attention soon. So I will just summarize here to have it all in one place.

Apparently, when the bind() call that is issued on MySQL server startup fails, on Linux and other platforms the cause of that failure can be queried from the error number (errno). This does not seem to be the case on Windows. According to research by Axel Schwenke of MySQL support, this is well documented in the MSDN reference for the Winsock bind function: errno is not set to any meaningful value as a result of the bind() call. Instead you are advised to use WSAGetLastError() to find out what went wrong when the call fails.

MySQL currently does not follow that advice, so the error displayed (errno = ENOENT, i.e. "No such file or directory") does not have anything to do with the bind call at all. As a matter of fact, there is currently no way to tell what really happened when that call failed. The only "real world" reason that comes to mind is in fact - as mentioned above - another process already occupying that port.
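To illustrate the occupied-port case, here is a small sketch in Java (so not MySQL's code, and nothing to do with how MySQL queries Winsock). It binds a socket to a free port and then tries to bind a second one to the same port. The class and method names are made up for this example. The Java runtime looks up the real cause itself and reports it as a BindException, which is the kind of message one would hope to see from MySQL as well.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class for the "port already occupied" situation.
class PortInUseDemo {

    // Returns true if binding to the port fails because it is already taken.
    static boolean isOccupied(int port) throws IOException {
        try (ServerSocket probe = new ServerSocket(port)) {
            return false; // bind succeeded, the probe socket is closed again
        } catch (BindException e) {
            return true;  // the message carries the actual cause of the failure
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port; holding the socket keeps it occupied.
        try (ServerSocket holder = new ServerSocket(0)) {
            int port = holder.getLocalPort();
            System.out.println("port " + port + " occupied: " + isOccupied(port));
        }
    }
}
```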

Be sure, should that feature request ever be implemented, you will find the (final) post about it here :)

Tuesday, January 08, 2008

Preload-mania killing aging machines

Over the holidays a friend of mine asked me to have a look at his machine, because it was extremely slow and barely responding at all anymore. When he tried to print a few documents it took more than 10 minutes for the first page to come out of the printer after the job had been started.

First I suspected a hardware problem, but when I booted it up I quickly realized that this was a bad case of "freeware congestion" combined with "pre-load-mania". When I had last set up the machine I had just put F-Secure Anti-virus, the necessary hardware drivers, Word and Firefox on the machine.

Hardware-wise a Celeron 2.4GHz with 512MB RAM is not exactly a high-end machine, but for web surfing, some emails and the occasional letter it should be perfectly sufficient.

Now, about a year and a half later, the login screen was followed by a desktop building up icon by icon and a colorful ICQ login, even before the anti-virus software's splash screen. After that the hard drive kept working furiously for several minutes while about 15 icons trickled into the system tray.

Most of those were related to some clever tool, e. g. a weather monitor, a world clock, a picture-of-the-day screen saver control panel, graphics- and soundcard driver helpers, and various other utilities, including some Yahoo Widgets.

Of course most of these had registered themselves as an auto run, some even with a background service. Once all those programs were started, the task manager showed around 540MB of committed memory - more than the whole physical memory available, and that without a single "useful" foreground application having been opened.

Using Sysinternals' AutoRuns tool I had a look at all the different places that can be used for running software on logon or boot, and apart from all the (presumably) tiny gadgets and widgets I also found a lot of the ubiquitous pre-loading parts of all sorts of common software: Adobe's PDF Reader, Microsoft Office, something from iTunes and several others.

This is something I have always hated, and the older your machine gets, the worse it becomes: Automatic updates to the latest and shiniest new version of any given piece of software have become absolutely commonplace. While this takes the burden off the user to keep up-to-date and get patches for security vulnerabilities, it has a serious downside, too. With every new version software tends to become bigger and more bloated. Moreover every vendor seems to believe that the primary and sole purpose of any machine will be to run their and only their software. Given that - completely wrong - assumption it must seem only natural to them to pre-load seemingly all of their software on boot, sitting ready in the background, just waiting for you to click the icon to issue the final call to "MainWindow.setVisible(true)" and be up almost instantaneously.

This might actually work out if you really use only that one program all the time and have a sufficiently large amount of RAM in your machine. But this is completely unrealistic. Nobody in their right mind would boot their machine in the morning and manually launch Acrobat Reader, all MS Office apps and every application they might possibly use that day just to have them ready. It is immediately apparent to even the novice user that this is probably not making the machine more responsive.

But this is - almost - exactly what happens with all the auto run entries: You just don't see them on the screen immediately. So one thing I always do after installing any software is double-check whether it just registered some sort of auto run and if so remove it.

Along that road I tend to replace more and more of the "big" products with leaner and slicker alternatives. E. g. instead of the Adobe Reader I nowadays usually install FoxIt Reader. Instead of Paintshop Pro (nowadays owned by Corel) I use Paint.NET and IrfanView etc. etc.

This really helps a lot - try for yourself. The difference between Adobe and FoxIt Reader is especially impressive.

Back to my friend's machine: For about half an hour I tried to tidy things up, but the sheer number of auto run tools, services etc. was too overwhelming. So I decided to do a fresh install. Not wanting to back up all user data (around 70GB) to DVDs and not having an external drive at hand, I could not reformat the drive during the installation.

That meant that doing a new install in parallel to the existing one would reuse the existing "Program Files" and "Documents and Settings" folders - something I did not want in order to keep the new setup as clean as possible.

Because you cannot rename those folders from a running Windows I booted from a current Knoppix CD and mounted the NTFS partition. From here I just changed the folder names to "DocsAndSettings.old", "Programs.old", and "Windows.old".

Once I had done that I reinstalled XP from the CD I had already prepared with XP ISO Builder for the previous install. This allowed me to just leave the machine working, because it would automatically configure a user account, install all hardware drivers - even those not part of the default Windows CD-ROM - and set up the network to use DHCP. After about an hour I came back and inserted the c't offline update disk to bring Windows up to date.

Once all necessary applications had been reinstalled, I just moved the "My Documents" folder back into place. Provided that you have all setup disks available and do not have to download anything, the anti-virus program should be installed last to save some more time.

Boot time from POST to desktop is now around 35 seconds with everything ready to go after about 1:15 minutes. Memory load is around 310MB with a significant portion being taken by the inevitable anti-virus/anti-spyware software. However launching a browser or a home-banking app is now a matter of seconds instead of minutes, and printing a letter also no longer requires you to take a day off from work.

When I presented the results my friend had a hard time believing that I did nothing but completely dispose of all the gimmicks he had found useful and funny over time. We'll see how long he can resist this time.

Wednesday, January 02, 2008

Switch/Case and Autoboxing/Autounboxing

Because I have seen people trying this several times in the last couple of weeks, I decided I might just as well write a post about it: a seemingly common misconception about Java 5's auto(un)boxing.

Although we moved to Java 5 quite some time ago, only now are people slowly getting familiar with the new syntax features. Most of them learned Java beginning with 1.4 and have a history in the DBase or FoxPro world. So object oriented programming and Java as one implementation of it are understood, however maybe not as deeply as you would expect. Some are especially impressed by the ease of use autoboxing and -unboxing bring to the wrapper classes for primitives. I also find that feature quite useful, because objects falling out of the persistence framework have full-blown object types for boolean values or numbers. This makes it rather cumbersome to work with them. Autounboxing helps a lot there. Compare the same condition written without and with it:

if (theBusinessObject.isCompleted().booleanValue() && theBusinessObject.getNumber().intValue() > 5) {
...
}
if (theBusinessObject.isCompleted() && theBusinessObject.getNumber() > 5) {
...
}

Undeniably the second example is much easier to read. It becomes even more obvious once you start doing calculations based on wrapped primitives. Nevertheless problems may arise if you do not know what this new syntax will do under the covers. In the above case it is quite clear that the compiler will just put the calls to ".booleanValue()" and ".intValue()" into the bytecode on your behalf.

Consider this example:

public class Box {

 private static final Integer one = 1;
 private static final Integer two = 2;
 private static final Integer three = 3;

 public static void main(String[] args) {
  int myInt = three;
  switch (myInt) {
   case 1:
    System.out.println("One");
    break;
   case 2:
    System.out.println("Two");
    break;
   case 3:
    System.out.println("Three");
    break;
   default:
    System.out.println("None");
    break;
  }  
 }
}

Thanks to autoboxing the reference type variables "one", "two" and "three" can be assigned using a primitive int on the right hand side of the "=" sign. And because of the autounboxing of "three" in the first line of "main()" it can be assigned to "myInt". After that you find just a regular switch/case construct on that primitive int.

Decompiling this using e. g. javap reveals the "magic" behind this:


D:\temp>c:\jdk1.5.0_12\bin\javap -c Box
Compiled from "Box.java"
public class Box extends java.lang.Object{
...
public static void main(java.lang.String[]);
  Code:
   0:   getstatic       #2; //Field three:Ljava/lang/Integer;
   3:   invokevirtual   #3; //Method java/lang/Integer.intValue:()I
...

static {};
  Code:
   0:   iconst_1
   1:   invokestatic    #10; //Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   4:   putstatic       #11; //Field one:Ljava/lang/Integer;
   7:   iconst_2
   8:   invokestatic    #10; //Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   11:  putstatic       #12; //Field two:Ljava/lang/Integer;
   14:  iconst_3
   15:  invokestatic    #10; //Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   18:  putstatic       #2; //Field three:Ljava/lang/Integer;
   21:  return
}

At index 3 in "main()" you can see the automatically inserted call to java.lang.Integer.intValue(). This is the autounboxing. In the static initializer it goes the other way round: The compiler inserts java.lang.Integer.valueOf(int) at indexes 1, 8, and 15. Here the autoboxing takes place.

So far so easy. Now look at this:

public class Box {

 private static final Integer one = 1;
 private static final Integer two = 2;
 private static final Integer three = 3;

 public static void main(String[] args) {
  int myInt = three;
  switch (myInt) {
   case one:
    System.out.println("One");
    break;
   case two:
    System.out.println("Two");
    break;
   case three:
    System.out.println("Three");
    break;
   default:
    System.out.println("None");
    break;
  }  
 }

}

Trying to compile this will fail:

D:\temp>c:\jdk1.5.0_12\bin\javac Box.java
Box.java:10: constant expression required
                        case one:
                             ^
Box.java:13: constant expression required
                        case two:
                             ^
Box.java:16: constant expression required
                        case three:
                             ^
3 errors

I have seen this pattern numerous times, and whenever someone comes across it they seem to wonder what the difference is compared to the first example and why they get a compile error. They expect unboxing to happen at each "case". However, a case label must be a compile-time constant expression. An Integer field, even a static final one, is a reference whose primitive value can only be obtained through a method call (intValue()) under the covers - which of course is illegal in that context.
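For comparison, a sketch of a variant that does compile (the class and method names are made up for this example): if the constants are declared as primitive ints, they are compile-time constants and may be used as case labels. Unboxing is still allowed on the switch expression itself, where it happens exactly once.

```java
// Hypothetical variant of the Box class using primitive constants.
class BoxWithConstants {

    // Primitive static finals initialised with literals are compile-time constants.
    private static final int ONE = 1;
    private static final int TWO = 2;
    private static final int THREE = 3;

    static String describe(Integer boxed) {
        // The switch expression may be an Integer: it is unboxed once, right here.
        switch (boxed) {
            case ONE:
                return "One";
            case TWO:
                return "Two";
            case THREE:
                return "Three";
            default:
                return "None";
        }
    }

    public static void main(String[] args) {
        Integer three = 3;                   // autoboxing
        System.out.println(describe(three)); // prints "Three"
    }
}
```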

I have found it helpful to show people the bytecode output that gets generated in the first case. As a side effect they also usually learn for the first time about the existence of decompilers :)

One last piece of advice: Eclipse has a feature to specify a different syntax coloring for places where autoboxing and -unboxing occur. I recommend defining a clearly recognizable format, e. g. I use underlined text in a dark red color. I find it rather helpful as a reminder that in such situations a null-check is sometimes a good idea - after all, reference types might be null, as opposed to primitive values.
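A short sketch of why that null-check matters (names are made up for this example): because unboxing is just a hidden call to intValue(), it throws a NullPointerException when the reference is null.

```java
// Hypothetical demo class for unboxing a null reference.
class UnboxingNullDemo {

    // Null-safe variant of a check like "getNumber() > 5".
    static boolean isGreaterThanFive(Integer number) {
        // The null test runs first, so unboxing only happens on a real object.
        return number != null && number > 5;
    }

    // Shows what happens without the guard.
    static boolean unboxingNullThrows() {
        Integer missing = null;
        try {
            int value = missing; // compiled as missing.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isGreaterThanFive(null));
        System.out.println(isGreaterThanFive(7));
        System.out.println(unboxingNullThrows());
    }
}
```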