Saturday, May 31, 2008

How To Drive CVS Totally Insane

Okay, I do have to admit: posting on why CVS sucks or how to make it slow is the lamest thing ever.  I mean, duh!  Everyone knows that CVS is 1970's technology, 3 parts dope to 1 part disco, and there are better alternatives.

But...I think the performance situation I found is interesting because of when it happens.  (In particular, this performance problem has been with us for months and I only noticed now.)

The airport database is a 40 MB, 800,000 line text file.  We checked it into CVS.

Hey, stop laughing!  It seemed like a good idea at the time!

What's interesting is that CVS can (or at least could) check revisions onto the main branch fast enough that we didn't realize we were getting into trouble, and check out the head revision at linear speeds, limited only by the net connection.

Poking into the repository, it turns out that this is because the head revision is stored in its entirety, with diff instructions to step back one revision at a time toward the first revision.

This goes a ways toward explaining why CVS doesn't become incrementally slower at getting code as you use it: in truth it does, but the cost is borne by the older versions in the repository, not the newer ones!

Where we got into trouble was when I checked data into a branch.  The branch cannot be the head revision, so it is stored as a delta off the version it came from.  With a huge file, the cost of applying even one diff on the fly is quite large.
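To make the cost model concrete, here is a rough sketch of how many diffs have to be applied to rebuild a given revision under the layout described above.  This is a toy model, not CVS's actual code: the function names and the simple 1..head revision numbering are made up for illustration, and it counts diffs only, ignoring how expensive each one is on a 40 MB file.

```python
# Toy model of the storage layout described above (not CVS's real code).
# Assumption: trunk revisions are numbered 1..head, the head is stored
# whole, and each older trunk revision is one reverse diff further away.

def deltas_for_trunk(head: int, target: int) -> int:
    """Number of diffs to apply to rebuild trunk revision `target`."""
    if not 1 <= target <= head:
        raise ValueError("target must be between 1 and head")
    return head - target


def deltas_for_branch(head: int, branch_point: int, branch_rev: int) -> int:
    """Number of diffs to rebuild the `branch_rev`-th revision on a branch
    that forks from trunk revision `branch_point`."""
    if branch_rev < 1:
        raise ValueError("branch_rev must be at least 1")
    # Walk back from the head to the branch point, then forward on the branch.
    return deltas_for_trunk(head, branch_point) + branch_rev


if __name__ == "__main__":
    head = 10  # hypothetical number of trunk revisions
    print("head revision:", deltas_for_trunk(head, head))
    print("previous revision:", deltas_for_trunk(head, head - 1))
    print("first branch revision off head:", deltas_for_branch(head, head, 1))
```

The head needs zero diffs, while both the previous trunk revision and the first revision on a branch taken from the head need one, which matches the two slow cases we hit.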

In our case, getting the head revision takes less than a minute; getting either the previous revision of this file or the latest version on the branch takes over 45 minutes!

I am not yet sure what we can do about this - the fact that historical check-outs will be slow is annoying, but clearly we don't use that feature very often.  (Fortunately this is a rarely-changing file.)

1 comment:

  1. cvs add -kb?

    or after the fact:

    cvs admin -kb

    See the Cederqvist for details.