Feedback from evaluation in a corporate environment

Tue Jan 12 21:06:39 GMT 2010

Stephen J. Turnbull wrote:
> Uri Moszkowicz writes:
>  > Yes I think [some compression] is necessary. I could also use a
>  > pipe but I was hoping to keep it around. I expect that the
>  > conversion would still take a really long time though (maybe 1 week
>  > at the rate it was going?).

If you don't want to pipe the output into fast-import, you could at
least pipe it into gzip, bzip2, etc.  The fast-import stream should
compress pretty well.

In fact, you could "tee" it into both a compression program (so that you
have a permanent record of the contents) and a fast-import process.
Just make sure that whatever utility you use for this doesn't terminate
if the fast-import process happens to crash.
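
As a rough sketch of such a tee, assuming Python 3 is available: the
script below is not part of cvs2svn, and the name tee_stream and its
command-line layout are made up for illustration.  It copies a stream
into a gzip archive and into the stdin of a consumer command, and it
keeps archiving even if the consumer dies partway through.

```python
import gzip
import subprocess
import sys


def tee_stream(src, archive_path, consumer_cmd, chunk_size=1 << 16):
    """Copy the binary stream `src` to a gzip file and to a subprocess.

    Hypothetical helper: `consumer_cmd` is whatever fast-import command
    you would otherwise have piped into, given as an argument list.
    Returns the consumer's exit status.
    """
    consumer = subprocess.Popen(consumer_cmd, stdin=subprocess.PIPE)
    consumer_alive = True
    with gzip.open(archive_path, "wb") as archive:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Archive first, so the permanent record is never behind.
            archive.write(chunk)
            if consumer_alive:
                try:
                    consumer.stdin.write(chunk)
                    consumer.stdin.flush()
                except OSError:
                    # The consumer went away (e.g. broken pipe).  Stop
                    # feeding it, but keep writing the archive.
                    consumer_alive = False
    try:
        consumer.stdin.close()
    except OSError:
        pass  # unflushed data for a dead consumer; nothing to do
    return consumer.wait()


if __name__ == "__main__" and len(sys.argv) > 2:
    # argv[1] is the archive path; the rest is the consumer command.
    sys.exit(tee_stream(sys.stdin.buffer, sys.argv[1], sys.argv[2:]))
```

You would put it at the end of the pipeline in place of the importer,
passing the archive file name followed by the fast-import command you
would otherwise have run.  A nonzero exit status tells you the import
failed, but the compressed stream is still complete, so you can retry
the import from the archive without rerunning the conversion.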

> So what?  Surely you've already spent a week (calendar time) working
> on this.  Once you have *one* copy of a reasonably fresh repo, you're
> done with this.  With an adapted workflow, all the rest will use bzr
> protocol and be much faster (measurable in 10s of minutes at worst,
> and probably a handful of seconds).  A project with 10GB repos and
> 1000 developers can surely afford the cost of one workstation with 1TB
> of local storage, and a week of calendar time to do the conversion.
> That is the extent of the cost of conversion itself (if it works -- so
> it's a shame you didn't let it run for a week if necessary to find out).

The problem is not the cost of running a server for a week.  The problem
is that cvs2bzr is a one-shot conversion: it has to operate on a static
snapshot of the CVS repository, so nobody can commit to CVS for the
whole week (or however long) that the conversion takes.  Thus the cost
of the conversion is mostly the reduced efficiency of all of those
developers for the duration of the conversion.

cvs2svn/cvs2bzr/etc. conversions are rather expensive, especially in
terms of I/O.  Make sure you have lots of RAM and fast disk drives.
Multiple CPUs don't currently help much because most of cvs2svn is
single-threaded.  I have several ideas for speeding it up, but not much
time to work on it :-(

Michael
(the cvs2svn maintainer)