Friday, August 31, 2007

Comfortable XPath accessor

Three weeks ago I blogged about a "Groovy way" to create XML documents in Java. This post is about a convenient way to access XML that requires nothing more than a single class (downloadable here) on top of the JRE's own XML library functions.

In fact I wrote this class before I even looked for an easy way to create XML myself, because at the time I just had to parse some XML and extract the values in an easy to use fashion.

I believe it is easiest to show an example of how to use the class. Consider the following very simple Java class:

import java.math.BigDecimal;

/**
 * Simple value object for a contact.
 */
public class Contact {
    public String firstname;
    public String lastname;
    public boolean withAccount;
    public Integer numberOfCalls;
    public BigDecimal amountDue;
}

Usually the fields would not be public of course, but for the sake of the example, just imagine the getters and setters being there ;-)

Now consider getting an XML string back from an external system that you cannot change:

<representative>
   <address>
         <street>Somewhere</street>
         <city>Over The Rainbow</city>
         <details>null</details>
   </address>
   <personal>
       <name>Someone</name>
       <firstname>Special</firstname>
       <age>55</age>
       <acct>true</acct>
   </personal>
   <supportHistory>
       <phone>
          <total>14</total>
          <x11>5</x11>
          <x12>9</x12>
       </phone>
       <incident>
          <id>1</id>
          <cost>1.50</cost> 
       </incident>
       <incident>
          <id>2</id>
          <cost>2.50</cost>
       </incident>
       <incident>
          <id>3</id>
          <cost>3.50</cost>
       </incident>
       <incident>
          <id>4</id>
          <cost>4.50</cost>
       </incident>
   </supportHistory>
   <current> 
       <due>44.12</due>
       <total>100.88</total>
   </current> 
</representative>

What's the easiest way to get this into the “Contact” class above, without using persistence frameworks, mapping tools and the like? What about this:

XPathAccessor acc = new XPathAccessor(someXML);
Contact contact = new Contact();

contact.firstname = acc.xp("representative", "personal", "firstname");
contact.lastname = acc.xp("representative", "personal", "name");
contact.withAccount = acc.xpBool("representative", "personal", "acct");
contact.numberOfCalls = acc.xpInt("representative", "supportHistory", "phone", "total");
contact.amountDue = acc.xpBD("representative", "current", "due");

I find this rather straightforward. Of course the Strings should be declared as constants somewhere to prevent typos.

The example is really simple, because we just extract some plain values from the XML. However, you have the full power of XPath at your disposal. This means you could do something like this:

BigDecimal getTotalIncidentCost(String someXML) throws SAXException,
        IOException, ParserConfigurationException, XPathExpressionException {
    XPathAccessor acc = new XPathAccessor(someXML);
    // Select the context node once, then evaluate relative to it.
    Node historyNode = acc.getNode("representative", "supportHistory");
    return acc.xpBD(historyNode, "sum(incident/cost)");
}

The XPathAccessor instance could (and should) be reused, of course, depending on how often you need to access the document. As it caches XPath expressions that have already been used, it saves some cycles to re-use it for all accesses to a particular document.

Getting the “historyNode” first is of course not really necessary in this simple case. However, it can come in handy for keeping the expressions readable when you need to access deeper parts of the XML.
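To illustrate the caching idea mentioned above, here is a minimal sketch of how such an accessor could be built on the JRE's javax.xml.xpath API. It is not the downloadable XPathAccessor class: the name MiniXPathAccessor, its methods, and the choice to wrap checked exceptions in unchecked ones are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical stand-in for illustration, NOT the downloadable XPathAccessor. */
final class MiniXPathAccessor {
    private final Document document;
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    // Compiled expressions, keyed by their textual form, so each is compiled only once.
    private final Map<String, XPathExpression> cache = new HashMap<>();

    MiniXPathAccessor(String xml) {
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Joins the segments into an absolute path such as /a/b/c and returns its string value. */
    String xp(String... segments) {
        return evaluate(document, toPath(segments));
    }

    /** Returns the first node matching the absolute path built from the segments. */
    Node getNode(String... segments) {
        try {
            return (Node) compiled(toPath(segments)).evaluate(document, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }

    /** Evaluates an arbitrary expression relative to the given context node. */
    BigDecimal xpBD(Node context, String expression) {
        return new BigDecimal(evaluate(context, expression));
    }

    private static String toPath(String... segments) {
        return "/" + String.join("/", segments);
    }

    private String evaluate(Node context, String expression) {
        try {
            return compiled(expression).evaluate(context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }

    private XPathExpression compiled(String expression) throws XPathExpressionException {
        XPathExpression result = cache.get(expression);
        if (result == null) {
            result = xpath.compile(expression);
            cache.put(expression, result);
        }
        return result;
    }
}
```

With this sketch, new MiniXPathAccessor(someXML).xp("representative", "personal", "firstname") evaluates /representative/personal/firstname, and repeated calls with the same segments reuse the compiled expression.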

As with the XElement class, feel free to use this one, too. I would be happy to get feedback and/or improvements.

Thursday, August 30, 2007

How the Vulcan greeting came about

Ever wondered how the Vulcan greeting (you know, the V-shaped hand gesture) came about in Star Trek? Turns out this is a really fun and interesting story, not just some script writer conceiving it out of thin air. Have a look at this very funny video in which Leonard Nimoy, Mr. Spock, explains how it entered the show.

Wednesday, August 22, 2007

Windows Date Created Timestamp strangeness

I have been using Windows since version 3.0 and thought I had seen most of its subtleties. Today, however, I found a new "gem" I had not encountered before.

We use two Perl scripts to do some FTP transfers regularly, scheduled via the Windows "Scheduled Tasks" to run several times a day. Both scripts use a common function to append to their respective daily log file. If a file is older than 5 days, judging by its "Date Created" timestamp, it is deleted and a new one is created under the same name. This keeps the files from growing indefinitely.

While one of the jobs worked just fine, appending to its log files and rotating them after 5 days, the other seemed to overwrite its log file on each run. Strangely enough, as mentioned before, they both used the same log function, just with different file names.

After some poking around in the scripts' source we decided to run a lower-level test to see whether this had anything to do with either Perl or the "Scheduled Tasks". We created a file from the command line by simply redirecting an echo command. We then deleted the file and recreated it by the same means. This is what we got (German Windows XP, but the behavior was the same on Windows Server 2003 and Windows 2000 Professional):

C:\quantum>time /t
22:46

C:\quantum>echo "quantum" >> mechanics.txt

C:\quantum>dir /T:C mechanics.txt
 Volume in Laufwerk C: hat keine Bezeichnung.
 Volumeseriennummer: 0CA5-E6F2

 Verzeichnis von C:\quantum

22.08.2007  22:46                 8 mechanics.txt
               1 Datei(en)              8 Bytes
               0 Verzeichnis(se),  8.096.706.560 Bytes frei

C:\quantum>time /t
22:47

C:\quantum>del mechanics.txt

C:\quantum>echo "temporal" >> mechanics.txt

C:\quantum>dir /T:C mechanics.txt
 Volume in Laufwerk C: hat keine Bezeichnung.
 Volumeseriennummer: 0CA5-E6F2

 Verzeichnis von C:\quantum

22.08.2007  22:46                 8 mechanics.txt
               1 Datei(en)              8 Bytes
               0 Verzeichnis(se),  8.096.706.560 Bytes frei

C:\quantum>

Notice the experiment starts at 22:46. A file is created and its creation timestamp is automatically set to 22:46 (dir /T:C shows the creation time). Then, at 22:47 the file is deleted and recreated with different content. Again the file creation date is 22:46 while the "Date Modified" timestamp is 22:47.

Looking at the timestamps of the log file mentioned above we noticed that the "Date Created" attribute was back several months. Apparently every time the file had been deleted and recreated the original timestamp had been restored as well.

We suspected a filesystem bug and looked around the net for an explanation. After a fair amount of searching I finally came across "The Old New Thing: The apocryphal history of file system tunnelling" and, through that, Microsoft Knowledge Base article 172190, titled "Windows NT Contains File System Tunneling Capabilities".

Basically what happens is that Windows caches the timestamp of the deleted file for some time. In case a new file appears under the same name, it will get the cached value. The default maximum time period between deleting and re-creating the file is 15 seconds. However we could also see it happen with more than 8 minutes! I do not yet understand how this comes about...

Once I knew the keyword was "tunnelling" it was also easy to find this 2001 post from a discussion on Bugtraq. I completely agree with Ken Brown who wrote that post (especially concerning the 2nd paragraph):

"Tunnelling" is a long way from any keywords that I'd associate with file systems - and a search for "tunnelling and ntfs" turns up a great many references to VPNs and bits of networking. It now turns out that it isn't really a property of the file system at all, which obviously makes the search even harder. 
[...]
Obviously not serious, but I bet that someone, somewhere, has an application that depends on file creation dates and wonders why it goes wrong every now and again. That is a *mild* potential security problem, if only because it could cause confusion. Documentation bugs can be security problems. Unexpected or unwanted behaviour from a machine is always a potential security problem.
[...]
The accumulation of seemed-like-a-good-idea-at-the-time backwards-compatible gotchas in the Windows file systems [...] all combine to introduce uncertainty and unpredictability, which leaves gaps for security errors.

Because one of our scripts was scheduled during a high-load time, it took more than 15 seconds (the default time-to-live for cache entries) between deleting and recreating the file. It actually benefited from the lack of resources.

The other script, running during far less busy periods, had always been fast enough to trigger the tunnelling "feature" and got the cached creation time over and over again. Of course the workaround for the scripts is simple: consider the "Date Modified" attribute instead.

A more drastic approach would be to change the registry settings controlling this behavior. See the knowledge base article for details. 
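Our scripts are written in Perl, but the "Date Modified" workaround can be sketched in a few lines of Java as well. The class name LogRotation and the method isStale are made up for this example, and the 5-day limit is passed in by the caller. The point is only that the check reads the last-modified time, which tunnelling does not restore, rather than the creation time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration; not taken from the original scripts. */
final class LogRotation {
    private LogRotation() {
    }

    /**
     * Returns true if the file was last modified longer ago than maxAge,
     * measured against the given point in time.
     */
    static boolean isStale(Path file, Duration maxAge, Instant now) {
        try {
            // Deliberately uses "Date Modified" instead of "Date Created".
            Instant modified = Files.getLastModifiedTime(file).toInstant();
            return modified.isBefore(now.minus(maxAge));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would delete and recreate the log file only when LogRotation.isStale(logFile, Duration.ofDays(5), Instant.now()) returns true, and append to it otherwise.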

But seriously, I will never understand (and appreciate even less) why Microsoft tends to always choose backwards compatibility over reliability. Maybe this was a sensible feature back when long filenames on FAT volumes came about, but is it necessary to carry this all the way to Server 2003? I bet it is still present in Vista as well...

Friday, August 17, 2007

Gnome Nautilus SSH fails when hostkey changed

Today I tried to upload some files to my server via Nautilus. Months ago I had created an SSH connection to my home folder via the "Places - Connect to Server" option in the main menu. It allows you to use SSH transparently via the graphical user interface.

However, for some reason double-clicking the desktop connection did not do anything at all. Selecting the entry in an open file manager window led to a confusing error message:

Nautilus cannot display "ssh://shipdown.de".
Please select another viewer and try again.

Another connection, set up via WebDAV, worked without problems. It occurred to me that this might have something to do with the recent crash of the server, which had made it necessary to set it up from scratch. This of course included the generation of a new SSH host key. Trying to connect via the command line confirmed this:

ds@yavin:~$ ssh ds@shipdown.de
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Please contact your system administrator.
Add correct host key in /home/ds/.ssh/known_hosts to get rid of this message.
Offending key in /home/ds/.ssh/known_hosts:1
RSA host key for shipdown.de has changed and you have requested strict checking.
Host key verification failed.

Turns out that Nautilus fails for the same reason. Once I had edited the .ssh/known_hosts file and replaced the old key with the current one, the connection worked again.

There is an Ubuntu bug entry (#41738) as well as a Gnome upstream report (#322501) describing this. However, as this is rated a low-priority bug and has been known since Gnome 2.14, I do not expect it to be fixed very soon, so I thought it was worth noting here.

Wednesday, August 15, 2007

GMail advanced search operators

Several rules in my GMail account apply various labels to mails and mostly archive them right away. This is handy for newsletters, mailing lists and the like. However, if you do not read them all, unread mails accumulate over time, scattered across several labels and too old to show up among the first 50 items.

Up to now I sometimes went through the following pages, using the "Select unread" and "Mark as read" functions. Today I stumbled across a page that mentions a search term to show only unread mails. I had suspected something like this must exist, but there is no GUI feature I know of to use it.

So in addition to any other search criteria you might have (e. g. "label:newsletter") you can just add "is:unread" and only get those mails that you haven't looked at before. Other things I just tried out and found to work:

  • is:starred - applies to starred messages
  • is:unread - applies to unread messages
  • is:read - applies to read messages

Only after that I found this page on the GMail Help Center. Sometimes life can be so easy, if you just know where to look... :-)

 

Sunday, August 05, 2007

Building XML the Groovy way in Java

When working with XML, i.e. creating XML documents, the Java DOM API is a little cumbersome. You have to ask all sorts of factories for instances of themselves, those instances for documents, elements and so forth. It is usually a lot of code to write, even if all you want is a little XML fragment with only a few elements, e.g. to be sent over the network to some server. One way to make your life easier is to resort to StringBuilder/StringBuffer and build the XML "by hand". However, this is error-prone and not always easy to read.

Recently I had to implement a service that responded with XML over the net, building the document by collecting data from several sources and combining them together. The first version I wrote used the DOM API and once finished was hard to read even for me. I would have liked to use Groovy's MarkupBuilder for this, however company policy does not allow this (yet).

So I looked around for a similarly easy to read solution in plain Java. I found this entry in The Ancient Art Of Programming. It discusses the problem of the verbosity of the Java solution and compares it to the clarity of .NET's rather new Linq feature. In one of the comments Erik-Jan Blanksma posted an implementation that mimics the syntax using variable argument lists. I took the classes and built a second version of my code with it. Now the Java code looks almost like the Groovy MarkupBuilder code, except that I do not need any additional libraries. Take a look at this example:

private String getResult() {
    XElement tResult = new XElement("result",
        new XElement("error", "0"),
        new XElement("receipt",
            new XElement("head",
                new XElement("id", adapter.getId()),
                new XElement("ctry", adapter.getCountryCode()),
                new XElement("region", adapter.getRegionCode()),
                new XElement("primary", adapter.isPrimary())
            )
        )
    );
    return tResult.toString();
}

This is really great to use. I added some tweaks to the code to make it more robust and suitable for my needs. Apart from adding some error checking (no empty element names, null values etc.) I also created an interface for classes that convert a java.util.Collection into a corresponding array of XElements. This was necessary, because the original version did not allow dynamic additions to the XML. This interface is intended to allow worker classes that somewhat mimic the Groovy syntax of just intermixing loops etc. A simple implementation could look like this:

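import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class SimpleWorker implements XCollectionWorker<String> {
    public XElement[] work(Collection<String> aCollection) {
        List<XElement> tList = new ArrayList<XElement>();
        // Wrap every String of the collection in its own <string> element.
        for (String tString : aCollection) {
            tList.add(new XElement("string", tString));
        }
        // The typed array argument only determines the type of the result.
        return tList.toArray(new XElement[0]);
    }
}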

Using it to add a list of Strings to an XML document can be done like this:

private String getResult(Collection<String> aStringCollection) {
    XElement tResult = new XElement("result",
        new XElement("error", "0"),
        new XElement("receipt",
            new XElement("head",
                new XElement("id", adapter.getId()),
                new XElement("ctry", adapter.getCountryCode()),
                new XElement("region", adapter.getRegionCode()),
                new XElement("primary", adapter.isPrimary())
            ),
            new XElement("someStrings", aStringCollection, new SimpleWorker())
        )
    );
    return tResult.toString();
}
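For readers curious how the variable-argument trick makes the nested call style possible, here is a minimal sketch of the idea. It is not the XElement class from the archive: the name MiniElement, its two constructors and the simple escaping are assumptions made up for this example, and things like attributes or the collection workers are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical miniature of the idea, NOT the XElement class discussed above. */
final class MiniElement {
    private final String name;
    private final String text; // null when the element holds children instead
    private final List<MiniElement> children;

    /** Leaf element with text content. */
    MiniElement(String name, String text) {
        this.name = name;
        this.text = text;
        this.children = new ArrayList<>();
    }

    /** Container element; the varargs parameter is what allows nested constructor calls. */
    MiniElement(String name, MiniElement... children) {
        this.name = name;
        this.text = null;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(name).append('>');
        if (text != null) {
            sb.append(escape(text));
        }
        for (MiniElement child : children) {
            sb.append(child); // recursion via toString()
        }
        return sb.append("</").append(name).append('>').toString();
    }

    /** Escapes the characters that must not appear literally in element content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Because each constructor call is itself an expression, the Java source ends up mirroring the nesting of the resulting XML, which is what makes this style read almost like a builder DSL.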

I have packed the necessary files into this ZIP archive. Feel free to use them, if they suit your needs. If you make any improvements, I would be happy to hear from you.