Thursday, January 26, 2006

CVS: Encoding mixup

Having completed the migration of our CVS server I am now slowly getting around to tying up loose ends. One very unnerving thing is the now mixed encoding of log messages in the CVS histories. Comments that were saved before the migration now show up with little squares instead of the German umlaut characters, whereas only the new (UTF-8) ones are displayed correctly.

One might say this would not be too much of a problem, because the older a comment is, the less it is needed as development goes on. That's what I thought, too, until I tried to get the MySQL based commit database of ViewVC running. Setting up the schema was no problem; however, I had to make several columns wider than the default, because they were too small for our needs.

While fixing that I upgraded from MySQL 4.0 to 4.1, not expecting too much trouble. Already anticipating problems with older libraries and/or Python bindings, I disabled the password for the user updating the database. So far that worked, but as soon as ViewVC tried to insert anything into the database, lots of warnings concerning unknown character sets were issued. I had set the default character set of the MySQL server to UTF-8, because that's what we use on all our servers.

After some googling around I found myself building a new version of the Python MySQLdb module. That at least got rid of the annoying warnings all over the place.

However, I also had to adapt the cvsdb.py script, because when I tried to import all the existing commits into the database, it would insert the UTF-8 byte sequences of the newer comments directly into the database, making a mess of those. While I was willing to accept the distorted look of the old comments, I certainly was not for the more current ones. So after some more googling and fiddling with the Python code, I finally managed to get all of them right by first trying to UTF-8-decode the messages that come from CVS's file history output. In case they only contain ASCII characters, nothing happens. In case an old comment contains German umlauts, I get a UnicodeDecodeError, which I catch, and just insert the comment into the database as is. And, finally, the new comments containing UTF-8 sequences get cleanly decoded and inserted correctly into the database. Although I do not like the idea of flow control by exceptions very much, I accept it here, because I consider it just a helper construct to get the old comments into the DB correctly.
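Just to illustrate the logic: the actual change went into ViewVC's Python cvsdb.py, so the following is only a rough sketch of the same decode-then-fall-back idea expressed in Java, with made-up class and method names.

    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;

    public class LogMessageDecoder {

        // Attempt a strict UTF-8 decode first. Plain ASCII passes through
        // unchanged, valid UTF-8 (the new comments) decodes cleanly, and the
        // old ISO-8859-15 comments raise an exception and are kept as-is.
        public static String decode(byte[] rawMessage) {
            CharsetDecoder strictUtf8 = Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                return strictUtf8.decode(ByteBuffer.wrap(rawMessage)).toString();
            } catch (CharacterCodingException e) {
                // Old comment in the legacy encoding: keep it unchanged
                // (interpreted as ISO-8859-15 only to turn it into a String).
                return new String(rawMessage, Charset.forName("ISO-8859-15"));
            }
        }
    }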

Now the only thing I have to do is generate some nice looking changelogs from the database :)

If anyone's interested in the change I made to the ViewVC script, just post a comment and I will provide a few lines.

Sunday, January 22, 2006

Migrating CVS from Debian to RedHat

A busy week lies behind me: on Wednesday and Thursday we moved our CVS repository from Debian to RedHat Enterprise 4.

Ready

Three years ago we began development of a new (Java) software solution for our main customer. To handle the several teams and the code they produced, we introduced our very first CVS server. Because many people in our company have a thorough liking for Windows, we used a Windows 2000 Server installation with CVSNT. Nobody really cared about the warnings on the net about using Eclipse (2.1 at the time) with CVSNT.

However, it did not take too long until (almost) everybody realised that a Linux server was the way to go. We went for a RedHat Enterprise Linux 2 machine but installed a VMware GSX server on it. Inside that VMware we created two virtual servers with Debian Woody. One was to be a "dedicated" CVS server, the other a build and test machine. For almost three years now this has been a (mostly) reliable setup. Sometimes the VMware would crash and take both virtual machines down with it, but this happened only about once a year.

Over these three years our repository has grown steadily to currently about 20GB in size, containing all sorts of stuff. Most of it, in terms of the number of files, is Java source code and resource files; however, there are also lots of office files, zips, exes, rpms and so on. Especially branching and merging began to show the limits of the existing hardware (Dual P4, 2.4GHz, 2GB RAM, RAID5, minus the overhead of VMware). Some time ago we already moved the build server to a separate physical machine to ease the load and speed up CVS again. However, in November 2005 we ordered a new server (Dual Xeon, 3.2GHz, 4GB RAM, SAN connection) which was finally (I won't elaborate on the unfortunate details between then and now) handed over to us on Wednesday, readily installed with RedHat Enterprise 4.

Set

The main goal was to keep downtime at a minimum and have the developers not notice anything apart from the increased speed. All accounts and passwords were to be migrated to the new machine. Moreover the ViewCVS (now ViewVC, see its new homepage) service was to be installed on the new machine, too.

First thing I did was copy the relevant lines of /etc/passwd and /etc/shadow over. Luckily the transition went from Debian to RedHat and not the other way round, because Debian assigns UIDs from 1000 up while RedHat already starts at 500. So there were no clashes there that would have required later updates to the file permissions and ownerships.

Then I rsynced the old server's three repositories to the new machine. I had thought about using some combination of netcat and tar, but decided against it, because with rsync I could easily copy the data over and keep the old server online at the same time. Later I would just take it offline and transfer the delta, which I expected would dramatically shorten the window during which the old server was already gone and the new one not yet available.

Seemingly everything went fine, but some of the lines of rsync's output had scrolled by and somehow looked strange. When I looked a little closer I saw that all filenames containing non-ASCII characters had somehow been corrupted. This being a German system with lots of people working on it, there were quite a few such cases. The problem was the difference in file system name encodings between the somewhat older Debian installation and the new RedHat system. While the Debian system had stored all file names in ISO8859-15 (German, including the Euro symbol), RedHat now used a UTF-8 based file name encoding. I had not realized until then that file names are just byte streams to the system, and that whatever meaning they have depends solely on the encoding chosen by those who read them. This is somewhat different from Windows, where the file system contains some meta information about which encoding was used when the file was created. I did not dive into the details, but started looking for a solution. In the end I came across a Perl script called convmv that can recursively rename whole directory trees and translate between file name encodings on the fly. A dry run (a very nice feature of the script: you have to explicitly state "--notest" on the command line to have it actually change something on your disk) looked good, so I decided to convert the filenames once the last rsync run had finished.
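For reference, the invocation looked roughly like this (the repository path is made up and the exact options are from memory, so treat this as a sketch rather than a recipe):

    # dry run: only reports what would be renamed
    convmv -f iso-8859-15 -t utf-8 -r /data/cvsroot
    # add --notest to actually rename the files on disk
    convmv -f iso-8859-15 -t utf-8 -r --notest /data/cvsroot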

Another way to go would have been to change RedHat's idea about file names to ISO8859-15, but as that would probably only have pushed the problem farther into the future without solving it, I decided against it.

The next step was to make ViewCVS work on the new machine as well. Although I used Apache2 on both Debian and RedHat, there were some nasty little problems I came across. My first guess had been that a simple copy and paste of the Apache configuration (taking into account the IMHO much nicer organisation of the config files on Debian) would do it. However, in some details the servers seem to have been compiled with different presets, so I could not get it to run just like that. The most important thing was the order of two aliases that would allow me to have both http://cvs-server/viewcvs and http://cvs-server/viewcvs/ (note the trailing slash) be the same as http://cvs-server/cgi-bin/viewcvs.cgi. For some reason the alias definition for the URL including the slash had to come first; otherwise RedHat's setup led to a redirection cycle that Firefox complained about. However, after some fiddling to get the stylesheets and logos right, it finally worked like a charm (and way faster, especially when generating a graphical view of the branches and tags of a file with cvsgraph).

After checking the file permissions on some special files (e.g. the CVSROOT directory and its contents) I configured xinetd to offer the CVS pserver. However, just to make sure that nobody who had somehow learned the new server's address could connect yet, I had it listen on a port other than 2401 by changing the /etc/services file. Connecting my (Windows) Eclipse to this new location was simple enough; however, I was somewhat taken aback when I saw the names of files with umlauts showing up as raw UTF-8 sequences instead of the "real" characters. Changing the Server Encoding setting in the repository location's properties dialog solved the problem, but that now meant that all developers would need to update this setting, too. This was against the principle of not having anyone make any changes to work with the new machine, so the discussion about keeping the file names in ISO8859 format came up once again.

Making matters worse is what seems to be an Eclipse bug, which I filed as Bugzilla #124499. Eclipse always changes the encoding of a configured repository location back to the default when I change any of the other settings in the dialog. All users of the new CVS server now have to keep this in mind to avoid problems with file name encodings. In the end we still stuck with UTF-8 and prepared an instruction video with Wink, a freeware equivalent of Camtasia and the like. We will have to see how many problems we run into...

Go!

Then, finally, I took the old server offline and rsynced the latest changes to the new machine. I must say I had not used rsync before, but it fully met my expectations. While the initial transfer of all data had taken about 75 minutes, bringing the new machine up to date took only around 12 minutes, and a large share of that time was spent just preparing the (rather long) file list. I then ran the convmv script on the data and had our network people update the DNS alias for the CVS server to point to the new machine.

Finally, after some last tests, I could send the mail I had prepared earlier and tell people that CVS was back online and that they needed to change an Eclipse setting. From what I can tell so far, there have not been any problems worth mentioning. Some people needed their passwords reset because they had forgotten them and somehow managed to have Eclipse forget them as well, but that was about all.

What's left

One of the open ends is the cvsdb feature of ViewCVS. It allows you to query a Bonsai compatible database of CVS checkins, logs, people etc. ViewCVS provides scripts to fill such a database from an existing repository and scripts to hook into CVS's handlers. What is missing, however, is the code to also store CVS tags in the database. This would be the most important feature for us, because right now we use cvs2cl to generate changelogs between release builds, which are marked with CVS tags. Currently, generating a changelog requires a full update of the corresponding branch and a rather long-running collection of log messages and other information. At the moment it takes about 70-120 minutes to generate a changelog between two tags. I very much hope to accelerate this dramatically by using database queries instead of having to bother CVS every time for all files' histories. However, I have not been able to get it working so far. On the ViewVC mailing list someone pointed out a few patches, but those aim at older versions of ViewCVS. I still have not figured out how to make it work with the current revision. Maybe my Python is just not good enough (yet :)).

Thursday, January 12, 2006

Translucent Windows with Swing

By chance I stumbled across a solution to a problem that has been unnerving me for a long time: translucent windows with Swing. We are developing a Java application for till systems with a touchscreen GUI. Part of the application's style guide calls for dialogs and popups with rounded edges, somewhat like the windows in Mac OS X. While we managed to get the rounded edges and a tiled background right with images arranged precisely through a custom border, they were never really rounded in a technical sense. There were always small rectangular remains at the corners, because the JPanel itself is of course rectangular. All we could do was set the color of the dialog's background pane to something very closely matching the overall background color scheme to make these remains as unobtrusive as possible. Every person I asked told me about using JNI and some native code, provided the underlying platform supported alpha channels and things like that. As the application must run correctly on both Windows (2000/XP) and X11/Linux, this was not a viable option, and so we were prepared to resign ourselves to the fact that there was no way to do it... until today!

By pure coincidence, when looking for something completely different (not even Java related), I found this page. It is an excerpt from O'Reilly's Swing Hacks book. It contains a piece of code that gives you the effect of a seemingly translucent window by taking a snapshot of the screen contents behind it and using part of it as the background image for the dialog to be opened. This effectively makes the background shine through the parts of the dialog you do not explicitly use for something else. The only thing that does not work is the automatic update in case the dialog is moved around on the screen (resizing is handled, however). But this is no problem for us, as we only use undecorated popups that cannot be moved anyway.
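The core of the trick looks roughly like the following sketch. This is not the book's code, just my condensed reading of the idea; the class and method names are invented.

    import java.awt.AWTException;
    import java.awt.Graphics;
    import java.awt.Point;
    import java.awt.Rectangle;
    import java.awt.Robot;
    import java.awt.Toolkit;
    import java.awt.image.BufferedImage;
    import javax.swing.JPanel;

    // Sketch of the "fake translucency" idea: take a snapshot of the whole
    // screen before the popup is shown and paint the matching part of that
    // snapshot as the panel's background.
    public class FakeTranslucentPanel extends JPanel {

        private BufferedImage snapshot;

        // Call this just before the popup becomes visible, while the screen
        // area behind it is still unobstructed.
        public void captureScreen() throws AWTException {
            Rectangle screenBounds =
                    new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
            snapshot = new Robot().createScreenCapture(screenBounds);
        }

        protected void paintComponent(Graphics g) {
            if (snapshot != null) {
                // Shift the full-screen snapshot so that the area behind this
                // panel lines up with the panel's own coordinate system.
                Point onScreen = getLocationOnScreen();
                g.drawImage(snapshot, -onScreen.x, -onScreen.y, null);
            } else {
                super.paintComponent(g);
            }
        }
    }

One would call captureScreen() right before making the popup visible, which is also why moving the window afterwards invalidates the effect.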

I wonder if I should buy the book and see what other cool stuff can be done without resorting to JNI :-)

Update: I have written a follow-up more recently. Be sure to have a look at it, too.

Tuesday, January 10, 2006

MySQL Indices

I did not get around to tackling the sound problem with my new Fedora installation. However, I did write a little program I have had in mind for some time now: a MySQL duplicate index detection tool.

Where I work we use a Java application framework that generates database schemas from UML class models. Unfortunately, no one took care to deactivate a feature that creates an extra index on every table that is identical to the primary key. This is obviously a waste of space and makes updates slower. From what I understand from the MySQL manual, the optimizer will never use it, because doing so would be slower than accessing the primary key (which is the clustered index in InnoDB).

So instead of finding all those superfluous indices' (generated) names manually, I decided to write a little tool to help me. While I was at it, I expanded it a little to find any index B that is a complete prefix of a longer index A. Unless A is substantially longer than B, I guess it will be faster to use A for reading while dropping B altogether, thus saving the update time and disk space.
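The core check behind that is straightforward. Here is a minimal sketch (not the actual class; it assumes the column names are given in index order):

    import java.util.List;

    public class IndexPrefixCheck {

        // An index is obsoleted by another one if its column list is a
        // complete prefix of (or identical to) the other index's column list.
        public static boolean isObsoletedBy(List<String> candidate, List<String> other) {
            if (candidate.size() > other.size()) {
                return false;
            }
            for (int i = 0; i < candidate.size(); i++) {
                if (!candidate.get(i).equalsIgnoreCase(other.get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

In the example output below, IDX_3 on (X_usa_id) is a complete prefix of both the primary key and IDX_4, so it is reported as obsolete.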

I wrote it as a simple Java class that maybe someone will find useful. It produces output similar to this:

Table foo
--------------------------------------------------------------------------------
PRIMARY (cols: 1:X_usa_id, 2:agsl, 3:a_index)
   obsoletes IDX_1(cols: 1:X_usa_id, 2:agsl,3:a_index)
   obsoletes IDX_3(cols: 1:X_usa_id)
IDX_4 (cols: 1:X_usa_id, 2:a_index, 3:agsl)
   obsoletes IDX_3(cols: 1:X_usa_id)

Table bar
--------------------------------------------------------------------------------
...

Currently it does not take into account the uniqueness and/or cardinality of the indices, because that was not a concern in my project. Maybe I will update it to do so sometime. If anyone else cares to improve it, I would definitely like a copy.

The link works now: DuplicateIndexFinder.java

Monday, January 09, 2006

Fedora Core 4

For quite some time I have wanted to install FC4 next to my XP at home, but I never found the time. Last week I thought about it again and was on the verge of putting it off for another few weeks, because I was not in the mood to change my hard drive's partitioning. To my surprise I found 60GB of unpartitioned space lying around. Obviously I had left it free when I installed the new hard disk almost a year ago, probably for a Linux installation :)

I downloaded the FC4 DVD image, verified the checksum and burnt it onto a DVD-R. Then I booted from the DVD and was greeted by Fedora's initial screen asking for boot parameters (skip hardware detection, rescue mode etc.). Wanting a "regular" install I just hit return. As a good start, a few seconds later, right after loading the kernel and initializing the initrd, I got a kernel panic. I cannot say that I was particularly amused, not even getting anywhere near the actual setup sequence. After trying for about half an hour to modify the kernel parameters I got nowhere. On the verge of giving up and using the unpartitioned space for Windows, too, I googled for "fedora core 4 kernel panic". I did not hope for anything helpful with such generic search terms, but to my surprise one of the top results actually offered help.

It said something like "enter some garbage at the boot prompt, then hit enter". And in fact it worked! When the boot prompt had complained about the unknown input (something like sdjkghfskjdhf) I just pressed enter again and the installer started! Now that's something to remember, isn't it?

Installation went fine and after some fiddling with the xorg.conf I got the dual-head setup with the nVidia driver right as well. However there still seems to be some problem with the sound card. Maybe it is confused because there are both a SB Live and the onboard audio chip. I will look into that tonight. Maybe I just have to enter some garbage into a config file...

Thursday, January 05, 2006

Microsoft Natural Ergonomic Keyboard 4000

For a very long time one of the first Microsoft Natural Keyboards served me dutifully; however, when I read about the new 4000, I decided to retire the old one and replace it with its modern successor. I especially appreciate the "sane" layout of the Home, PageUp, PageDn etc. and cursor keys, one of the reasons I never bought any of the other variants of MS's keyboards, which had them shuffled around for no reason immediately apparent to me.

So I ordered a 4000 from amazon.de for 43€ and was very pleased with its look when it finally arrived. However, reality soon caught up with me when I found out that the volume control keys did not work at all. The mute key and all of the other special keys worked perfectly, however. So I tried the usual procedure of uninstalling and reinstalling the IntelliType driver software, numerous reboots and unplug/replug cycles. Bottom line: nothing made the volume keys work. I even began to suspect a hardware malfunction, but several people on the net reported the exact same problem. I tried on Windows 2000 SP4, others on XP at various patch levels; nothing seemed to help.

For a while I decided to ignore the problem and guess what... Someone else solved it in the meantime. Googling around, looking for some new ideas I stumbled across a forum entry (don't remember the URL) and found the following solution:

rundll32.exe hid.dll,HidservInstaller /install

It seems Windows sometimes gets confused about some HID settings, and running the above command from a CMD shell or via the Start Menu's Run... option fixes the keys. On one machine they started working immediately; on my other computer, which I tried with the same keyboard, I had to reboot once.

So thanks again to the unknown person who posted the original solution, you have saved my keyboard from returning to amazon :)

Update: In case this does not work for you, try one of the following tips (I have not had to try them out myself, but found them on www.codinghorror.com and am copying them here in case you arrived via a search engine):

  • MS Natural Ergonomic Keyboard 4000: Volume Control Keys not working...
    
    Volume control seems to work fine if you start the "HID Input Service".
    If you are not able to start it (happened to me), fire up the registry
    editor and find:
    
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HidServ\Parameters
    
    If there's a string called "ServiceDll" delete it, and create a new one
    (expandable string named "ServiceDll") with this value:
    
    %SystemRoot%\System32\hidserv.dll
    
    Gary
    Gary on December 29, 2005 10:46 PM
  • The post by Gary worked for me too. I had the same issue with the volume buttons...
    
    My problem was I could not start the HID service...
    
    After adding the registry value that Gary mentioned:
    
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HidServ\Parameters
    
    If there's a string called "ServiceDll" delete it, and create a new one
    (expandable string named "ServiceDll") with this value:
    
    %SystemRoot%\System32\hidserv.dll
    
    Then starting the HID service through the control panel, the volume button worked great.
    
    Thanks Gary!
    
    
    Ken on January 5, 2006 12:50 PM 

Be sure to have a look at the site mentioned above; there is lots of discussion going on there about the keyboard.

If all of the above still does not work, maybe this could help (found here):

  • Hi Sharon,
    
    Thanks for the response! Found the below from another MVP over in the
    general group and checked my system. Sure enough two of the three files are
    not on my XP installation. Will try this idea first as it probably can not
    start without the needed files.
    
    > Could not start the Human Interface Device Access service on Local
    > Computer. Error 126: the specified module could not be found
    >
    >  Extract hidserv.dll, mouhid.sys and mouclass.sys from drivers.cab in the
    >  Windows XP CD to "\Windows\System32\" to fix the issue.
    >
    >  1.Insert Windows XP Setup CD.
    >  2.Browse to :\i386\drivers.cab
    >  3.Double click the drivers.cab file. The compressed files in the Cab file
    >  will be listed.
    >  4.Copy the three files to \windows\system32\. Reboot the
    >  computer.
    >
    >  The Human Interface Device Access service will then be started.

Subversion vs CVS

Yesterday I tried to set up Subversion for the first time. At work we are currently managing a large project with a lot of subcomponents using CVS and Eclipse. For almost three years now everything has gone alright; however, the server is now having problems handling the steadily growing repository.

As a first step we are going to install new, more powerful hardware, but I have been wondering for some time whether Subversion would not be better for our needs. We do lots of labelling, and branches are not so uncommon either. Considering the size of our repository (some 16GB and lots and lots of source code, documentation and library files), some operations just take ages to complete.

But what I hate most are CVS's non-atomic commits and the fact that you cannot see which files were affected by a particular checkin. It is very hard to get people to repeat the names of everything they touched for a commit in the checkin comment. Some developers don't even try to commit everything at once but do it file by file, so even cvs2cl is only of limited use, because the timestamps vary within a single changeset.

Not having too much work between the holidays and the new year, I spent some time setting up a Subversion 1.2.3 server on an already present Debian system. I chose the Apache (mod_dav_svn) based variant and the file system based storage backend ("fsfs"), because I do not think I could sleep well knowing that the whole of our project depends on the "well-being" of a BDB database. That's the main reason we did not consider Subversion when we started our project: back then BDB was the only option you had, and the prospect of losing everything due to a corrupt database file was not so appealing to us.

The other major criterion that has to be met is integration with Eclipse. Because most of our developers are not too interested in the inner workings of a revision control system, they would not accept anything less comfortable than the current Eclipse CVS integration. And one has to admit that this is a very fine piece of software indeed.

So I installed Subclipse from the Subversion homepage into my Eclipse 3.1. I deliberately did not take another look at any documentation, just to get the "feeling" right and act as if I still had CVS under my hands. I connected to the repository without problems, except for a lower-case drive letter in my workspace path that Subclipse complained about. Then I created a simple Java project and a Hello World class. Sharing the project, i.e. putting it under version control, worked quite OK, and all the familiar markers and icons appeared in Eclipse's package explorer view.

Then I tried to place a continuously updated revision history view in my perspective. This alone worked without problems. What I did not manage to get working was the automatic update. Even though I checked the "Sync with Editor" option, I did not get the list of revisions of the file currently open in the active editor. Clicking the refresh-view button did not work either. So I looked at the new icons in the view's toolbar and clicked on "get all". This actually showed me the single revision of the file (or rather: the single revision of the repository that particular file state was in) that I expected. I wondered for a moment why the lowest revision number shown was "2", even though I had added the file in a single operation when sharing the whole project. So just out of curiosity I clicked on the "next 25 revisions" button, only to be confronted with an ugly message dialog complaining about a "path not present" error. However, on another machine I do not even see that button, so maybe I just have a pre-release of the plugin. I will look at that again next week; now it's time to get ready for the New Year's Eve party...

Sunday, January 01, 2006

Impressum

Daniel Schneller
Kamper Str. 14
42699 Solingen
Deutschland /Germany

This is a private website.
Internet: www.danielschneller.de / www.danielschneller.com
Liability note: Despite careful control of the content, we assume no liability for the content of external links. The operators of the linked pages are solely responsible for their content.
The layout and design of this website as well as the information it contains are protected by copyright law. This must also be observed when third-party material appearing on these pages is used or copied for informational purposes.
All information is provided without guarantee. Any liability for damages arising from the use of the published content is excluded.
Should the content or presentation of these pages violate the rights of third parties or statutory provisions, we ask for an appropriate notification without a fee note.
We use third-party companies to serve ads when you visit our website. These companies may use information about your visits to this and other websites in order to serve ads about products and services that may interest you. If you would like to learn more about these practices or about your options for not having this information used by these companies, please click here: http://www.google.de/privacy.html
This website uses Google Analytics, a web analytics service provided by Google Inc., to enable an analysis of the use of the website. The information generated by Google Analytics about your use of this website (including your IP address) is transmitted to a Google Inc. server in the USA and stored there. Google will use this information solely to analyze the use of the website by creating anonymized reports and charts on the number of visits, the number of pages viewed per user, and so on.
The copyright for images on www.danielschneller.de and www.danielschneller.com lies with Daniel Schneller or the persons indicated in each case.
This Impressum is based on the work of Martin Hamann