Tuesday, September 25, 2007

The value of a CVS commit database

Due to some discrepancies between the Eclipse 3.2.2 compiler and Sun's javac, we needed to upgrade our development environments to Eclipse 3.3. Otherwise we could not tell for sure that something that looked OK in Eclipse would compile in the daily build process.

Even though I had been using 3.3 privately for some time, there is always some tension when switching a whole bunch of developers in a really large project.

At first everything seemed fine, apart from some minor issues that could be easily worked around.

However, I ran into a nasty little bug in the CVS integration when I had to switch a workspace that had been checked out on HEAD to another branch. That branch had been created to keep the helpers and utilities refactoring I wrote about before separate from HEAD until it is complete.

Within Eclipse you can just select "Switch to another branch or version" from the context menu on any project (or file for that matter) and select the branch you would like. I had done this with 3.2.2 several times without any problems. So, not suspecting anything, I switched my local working copy to the branch and started checking in modified files after some more refactoring.

However, shortly after I had done that, a colleague complained that there were now compile errors on HEAD. It turns out that Eclipse 3.3.0 has a well-hidden bug in that feature: the version switch involves some requests to the CVS server. This works all right for files that have no changes in the working copy, but for files with outgoing changes the server response is not handled correctly and those files remain on HEAD. Because I had already made some changes before the switch, part of my changes went to the branch and the rest to HEAD, leaving both in an incomplete state. For details on the bug see Eclipse Bug #192392.

The files I had checked in spanned several projects and were of course committed in little chunks with different comments. At that point I was very glad that I had my ViewVC commit database to query for anything I had done over the last few hours that had gone to HEAD. Without it, it would probably have taken me hours just to find out which files I had checked in on the wrong branch. While it was still tedious work to actually restore everything to the state I wanted, identifying the affected files took no time with a rather simple SQL query.

I can only advise anyone working on a project with more than just a few files to set up a database that stores all commits by type (addition, change, deletion), file, branch, date and author. This was not just my life insurance in this case; in combination with a full text index on the commit comment field it is also a very good base for change logs. Using simple SQL they can be generated very flexibly and in the blink of an eye.
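
To give a rough idea of what such a database and its queries can look like, here is a minimal sketch. It is not the ViewVC schema: the table, the column names, the branch name REFACTORING and the sample rows are all made up for illustration, SQLite stands in for a real database server, and a plain LIKE match stands in for a proper full text index.

```python
import sqlite3

# Hypothetical schema (NOT ViewVC's): one row per committed file revision.
SCHEMA = """
CREATE TABLE commits (
    id        INTEGER PRIMARY KEY,
    kind      TEXT NOT NULL CHECK (kind IN ('add', 'change', 'remove')),
    file      TEXT NOT NULL,
    branch    TEXT NOT NULL,
    committed TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS', sorts correctly as text
    author    TEXT NOT NULL,
    comment   TEXT NOT NULL
);
CREATE INDEX commits_lookup ON commits (author, branch, committed);
"""

# Invented sample data, only there to make the queries return something.
SAMPLE = [
    ("change", "core/StringHelper.java", "HEAD", "2007-09-24 16:00:00",
     "anna", "Fix null handling in StringHelper"),
    ("change", "core/DateUtil.java", "HEAD", "2007-09-25 10:02:00",
     "daniel", "Refactoring: merge date helpers"),
    ("add", "util/Dates.java", "HEAD", "2007-09-25 10:02:05",
     "daniel", "Refactoring: merge date helpers"),
    ("change", "ui/LoginDialog.java", "REFACTORING", "2007-09-25 10:30:00",
     "daniel", "Refactoring: use new Dates class"),
    ("remove", "core/OldDateHelper.java", "HEAD", "2007-09-25 11:15:00",
     "daniel", "Refactoring: drop obsolete helper"),
]


def open_db():
    """Create an in-memory database and load the sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO commits (kind, file, branch, committed, author, comment) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        SAMPLE,
    )
    return db


def files_committed(db, author, branch, since):
    """Which files did this author commit to this branch since a given time?"""
    rows = db.execute(
        "SELECT DISTINCT file FROM commits "
        "WHERE author = ? AND branch = ? AND committed >= ? "
        "ORDER BY file",
        (author, branch, since),
    )
    return [row[0] for row in rows]


def change_log(db, branch, since, keyword):
    """One entry per author and comment, with the number of files touched."""
    rows = db.execute(
        "SELECT MIN(committed) AS first_seen, author, comment, COUNT(*) "
        "FROM commits "
        "WHERE branch = ? AND committed >= ? AND comment LIKE ? "
        "GROUP BY author, comment "
        "ORDER BY first_seen",
        (branch, since, f"%{keyword}%"),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = open_db()
    # Everything "daniel" put on HEAD since the morning of the accident.
    for name in files_committed(db, "daniel", "HEAD", "2007-09-25 09:00:00"):
        print(name)
    # A small change log for HEAD, filtered by a word in the comment.
    for entry in change_log(db, "HEAD", "2007-09-25 00:00:00", "helper"):
        print(entry)
```

The first query is the kind of question I had to answer after the botched branch switch; the second shows how per-file commit rows collapse into change log entries once you group them by author and comment.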

The version of ViewVC we use is rather old and contains some custom changes that probably would not be required with a more recent release. So I recommend taking a look at the current version the project offers.

9 comments:

Cristian said...

I can only advise anyone working on a project with more than just a few files to set up a database that stores all commits by type (addition, change, deletion), file, branch, date and author. This wasn't just my life insurance in this case, but in combination with a full text index on the commit comment field it is also a very good base for change logs - using simple SQL they can be generated very flexibly and within just the blink of an eye.

WTF?! This information is in the version control system. Any normal/decent system can offer you a log of what happened with the source code. And then you can search in that log for what you want.

Daniel Schneller said...

WTF?! This information is in the version control system. Any normal/decent system can offer you a log of what happened with the source code.

Oh, you're right, stupid me. While we're at it, why not get rid of all those useless search engines? After all, any normal/decent browser can get you from link to link and just download every page off the web. How long could it possibly take to have a look at everything to find what you are looking for?

Cristian said...

svn log followed by a grep. How hard could it be? I agree with the fact that a GUI (or other program) for helping with this search would be useful, but I don't think it's mandatory, or needed too often.

Daniel Schneller said...

I understand what you mean, and yes, you are right (that part of my previous comment was not ironic). However, as I tried to convey in the blog post, this database solution gives you a lot more flexibility.

Of course you can do all the same stuff with cvs/svn log commands and some shell scripts. But a) the database is *much* faster and b) you get a powerful and versatile query syntax for free.

For small projects it might be overkill, but we have more than 3 million lines of code in our product, which makes it a little hard to get by with just the CVS out-of-the-box tools.

Cristian said...

Indeed. Out of curiosity, why are you still using CVS?

Daniel Schneller said...

It's a very large project with tens of thousands of files and about 3 million lines of code. We have already considered the move to Subversion several times, but it (or the Eclipse plugins) does not seem to cope too well with this order of magnitude. We experienced lockups and other stability problems.

Fred said...

Have you considered git? (simple curiosity)

Daniel Schneller said...

No, we have not considered git (or any other kind of distributed VCS). We basically work in a one-location setup and would like to continue with a central repository. As we do not do much branching/merging, we would not make use of one of the key features of these systems anyway.

Anonymous said...

Have you considered SVN (just out of curiosity)?