CVS Performance: I/O Bottleneck Because of Locks
Today we found the reason for our sometimes abysmal CVS performance, even though the repository is located on a fast SAN: Locks were written to the local disks rather than the SAN.
For years we have been using CVS as our internal version control system. The repository has grown to about 20GB and covers 6 years of code and resources in several hundred thousand files and their revision histories.
Some time ago we migrated the repository data from an internal disk subsystem to a SAN based on 15k RPM hard drives. The server is a dual-Xeon HP box with 4GB of RAM serving about 50 users.
Even though we thought we had a decently fast setup, sometimes – especially when more than 10 or 15 people started to check out or synchronize their Eclipse workspaces – performance would degrade badly. In some cases working on branches was only possible after increasing the client timeout to 3 minutes!
Our datacenter administrators provided us with I/O statistics, from which we recently realized (don’t ask why they did not tell us about this earlier when we complained about bad I/O rates) that while the SAN interface was never really busy, the local disks were 100% busy most of the time – especially during the periods when we had the most complaints from users.
Turns out that in the early days, when we set up the repository configuration and obviously did not know enough about it, we configured the CVS LockDir variable in CVSROOT/config to /var/lock/cvs. This had never been changed, not even when we moved machines and put the data onto the SAN. In effect, each and every read or write operation on the repository caused lock files to be created and deleted on the local hard drive of the server – completely negating the supposedly superb I/O performance of the SAN.
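For illustration, the fix is a one-line change in CVSROOT/config. The old value is the one from our setup; the new path below is hypothetical and would simply be a directory on the same SAN volume as the repository:

```
# CVSROOT/config

# Old setting: lock files land on the server's local disk
#LockDir=/var/lock/cvs

# New setting: keep lock files on the fast SAN storage instead
# (/san/cvs/locks is a placeholder - any directory on the SAN
#  that is writable by all CVS users will do)
LockDir=/san/cvs/locks
```

Note that the lock directory must be writable by everyone who accesses the repository, and all clients/servers must agree on the setting, since CVS coordinates concurrent access purely through these lock files.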
As it is Friday afternoon, we changed the configuration to place the locks on the SAN as well, but could not really see the effect because most people had already left for the weekend. We will certainly take a close look at the system on Monday to see whether performance improves. If it does, we will probably set up a tmpfs RAM disk to hold the locks.
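Since CVS locks are transient by nature (they are created and removed around each operation and carry no state worth keeping across reboots), a RAM-backed tmpfs is a natural fit. A sketch of what such an fstab entry might look like – size and mount point are assumptions to be adjusted to the actual setup:

```
# /etc/fstab
# RAM-backed filesystem for CVS lock files only.
# size=64m is a guess; lock files are tiny, so even a busy
# repository needs very little space here.
tmpfs  /var/lock/cvs  tmpfs  size=64m,mode=0777  0  0
```

The trade-off: tmpfs contents vanish on reboot, which is harmless for locks, but the directory must exist and be mounted before the CVS server accepts connections, and LockDir in CVSROOT/config has to point at it.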
Up to this point I never knew for sure how and when CVS takes locks. I was under the impression that they were only required for commit or tagging operations. Because of that (false) assumption I would never have thought local I/O could become a problem – but since we have very complex directory structures, CVS takes a read lock in every directory it touches, and lots and lots of per-directory lock files can of course overwhelm a rather slow local disk.
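To get a feeling for how much lock traffic a single checkout generates, it is enough to count the directories in the repository, since CVS creates (and later removes) a read-lock file in each directory it reads. A small sketch – the repository path is a placeholder:

```python
import os

def count_dirs(repo_root):
    """Count directories under repo_root. CVS creates and deletes
    at least one read-lock file per directory during a checkout,
    so this gives a lower bound on lock-file I/O operations."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(repo_root):
        total += 1  # counts the directory currently being visited
    return total

if __name__ == "__main__":
    # Placeholder path; point this at the actual repository.
    n = count_dirs("/san/cvs/repository")
    # Each directory means at least one lock create + one delete,
    # multiplied by every concurrent checkout.
    print(f"{n} directories -> >= {2 * n} lock-file operations per checkout")
```

With tens of thousands of directories and a dozen concurrent workspace synchronizations, it becomes obvious how a single local disk serving all those tiny create/delete operations turns into the bottleneck.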
For more information on CVS locks (not related to RCS locks) see the CVS manual.