Sunday, April 09, 2006

Mind Deprecation Warnings

One of our projects can "emergency dump" some database contents to XML files. There are several reasons for this, unreliable networks being one of them, but that is not my point here.

When the code detects that it needs to dump data, it creates XML files that mainly contain base64-encoded CDATA sections, because this proved to be the least problematic way to handle certain types of content. Later, that data needs to be reinserted into the database. This works the other way round: the corresponding tool is called in a "reverse" mode and puts the decoded base64 data back into the database.
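To make the round trip concrete, here is a minimal sketch of the idea, not our actual tool. The class name DumpSketch, the element name "record", and both method names are made up for illustration, and java.util.Base64 (Java 8 and later) stands in for whatever codec the project really uses.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DumpSketch {

    /** Wraps arbitrary bytes as base64 text inside a CDATA section. */
    static String toDumpXml(byte[] content) {
        String encoded = Base64.getEncoder().encodeToString(content);
        // "record" is a hypothetical element name chosen for this sketch.
        // The base64 alphabet cannot contain "]]>", so the CDATA section stays intact.
        return "<record><![CDATA[" + encoded + "]]></record>";
    }

    /** Parses the dump XML and decodes the base64 payload back into bytes. */
    static byte[] fromDumpXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String encoded = doc.getDocumentElement().getTextContent();
        return Base64.getDecoder().decode(encoded.trim());
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "Gr\u00fc\u00dfe <&> \u20ac".getBytes(StandardCharsets.UTF_8);
        String xml = toDumpXml(original);
        byte[] restored = fromDumpXml(xml);
        System.out.println(xml);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Because the payload is plain ASCII once encoded, markup characters and non-ASCII text in the original content cannot upset the XML around it, which is presumably why this approach turned out to be the least problematic.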

The only thing the people using this stuff do not like is that they have no idea what is encoded in the base64 region. So I started to write a second output module that displays the data from the XML on the screen instead of putting it directly into the database. To make it look nice, I simply wanted to apply a stylesheet after decoding the base64 in memory.

On my way there I used a java.io.StringBufferInputStream without thinking about it. When I then had the transformer begin its work, it started to complain about

SAXParseException: Character conversion error: "Unconvertible UTF-8 character beginning with ..." (line number may be too low)

This drove me almost crazy: I did not see the problem, because the file was definitely correctly UTF-8 encoded. Only after two hours of surfing and searching did it strike me like lightning: the StringBufferInputStream constructor was underlined with a nice yellow line in Eclipse. When I hovered over that line, Eclipse happily showed me the deprecation warning: "This class does not properly convert characters into bytes. As of JDK 1.1, the preferred way to create a stream from a string is via the StringReader class."
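Following that advice in an XSLT setting might look roughly like the sketch below. It is an illustration under assumptions, not the module I actually wrote: the class name DisplaySketch and the method render are invented, the decoded payload is assumed to be a UTF-8 encoded XML document of its own, and java.util.Base64 requires Java 8 or later.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DisplaySketch {

    /** Decodes a base64 payload holding UTF-8 XML and applies the stylesheet to it. */
    static String render(String base64Payload, String stylesheet) throws Exception {
        byte[] raw = Base64.getDecoder().decode(base64Payload);
        // Convert bytes to characters explicitly; UTF-8 is an assumption of this sketch.
        String xml = new String(raw, StandardCharsets.UTF_8);

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        // A Reader hands characters to the parser, so no lossy
        // char-to-byte conversion happens on the way in.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String stylesheet =
              "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\"><xsl:value-of select=\"/note\"/></xsl:template>"
            + "</xsl:stylesheet>";
        String payload = Base64.getEncoder().encodeToString(
                "<note>Gr\u00fc\u00dfe</note>".getBytes(StandardCharsets.UTF_8));
        System.out.println(render(payload, stylesheet));
    }
}
```

If a byte stream is really needed, wrapping text.getBytes(StandardCharsets.UTF_8) in a ByteArrayInputStream is another option that keeps the encoding explicit.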

And in fact, after I had removed the offending line, it worked like a charm. What happens is described in the Javadoc of the read method: it "returns the low eight bits of the next character in this input stream's buffer [...]". So instead of handing the XML parser properly encoded UTF-8 bytes, the stream delivered one truncated byte per character. A non-ASCII character thus came out as a single byte rather than its multi-byte UTF-8 sequence, and the parser failed.
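The effect is easy to reproduce in isolation. The following sketch (class and method names are invented for the demonstration) reads the same one-character string once through the deprecated class and once through an explicit UTF-8 conversion, then prints the resulting bytes in hex.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringBufferInputStream;
import java.nio.charset.StandardCharsets;

class LowEightBitsDemo {

    /** Reads a stream to the end and returns everything it delivered. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    /** The broken way: each char is cut down to its low eight bits. */
    @SuppressWarnings("deprecation")
    static byte[] viaDeprecatedStream(String text) throws IOException {
        return drain(new StringBufferInputStream(text));
    }

    /** An explicit conversion: each char becomes its UTF-8 byte sequence. */
    static byte[] viaUtf8(String text) throws IOException {
        return drain(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00e4"; // a-umlaut, U+00E4
        System.out.println(hex(viaDeprecatedStream(text))); // E4
        System.out.println(hex(viaUtf8(text)));             // C3 A4
    }
}
```

A lone 0xE4 byte looks to a UTF-8 decoder like the start of a three-byte sequence whose continuation bytes never arrive, which fits the "Unconvertible UTF-8 character" complaint. Characters above U+00FF fare even worse, since their upper bits are simply dropped.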

Again I see that deprecation warnings are sometimes more than just warnings; using the marked classes and methods anyway can get you into serious trouble. However, I do not understand why Sun tends to mark things as deprecated and replace them with something else instead of repairing them in the first place. Backwards compatibility is one thing, but leaving the old and broken stuff lying around clearly seems "Microsoftesque".
