Call for testing: cvs2bzr

Michael Haggerty mhagger at alum.mit.edu
Thu Aug 20 21:49:20 BST 2009


Greg Ward wrote:
> On Wed, Aug 19, 2009 at 12:30 AM, Ian
> Clatworthy<ian.clatworthy at canonical.com> wrote:
>> That's pretty well it. We *could* handle a separate blobs file but it's
>> nicer w.r.t. memory consumption for us to go the inline blob path ala
>> hg. Unlike hg though, bzr has no limitations w.r.t. merge parent count.
> 
> But keep in mind that inline blobs make the dump file much much
> larger.

There doesn't seem to be a fundamental reason why this should be the
case.  In most cases, the blobs for different file revisions should be
different and have to be written anyway.  I suppose that the reason is
that during branch and tag creation, the filesystem contents are not
implicitly copied from the primary parent branch but rather created afresh.
 This does not create much overhead when using git and a separate blob
file (i.e., the original implementation), and in fact it saved cvs2git
some work in OutputPass.  But of course it is quite expensive if each
blob has to be output again for each branch/tag creation.

So I suggest that we work to make cvs2bzr avoid writing redundant blobs
when possible, rather than introducing yet another caching mechanism on
the importer side.
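
To make the suggestion concrete, here is a rough sketch (not actual
cvs2bzr code) of how a branch or tag creation could emit only the files
that differ from its primary parent, instead of writing every blob
again.  The function name branch_delta and the two dictionaries mapping
path to CVS revision number are hypothetical; I am only assuming that
the output pass can obtain such a mapping for both sides.

```python
def branch_delta(parent_files, branch_files):
    """Return (modified, deleted) paths that turn the parent tree into
    the branch tree.

    Both arguments are hypothetical dicts mapping path -> CVS revision
    number.  Only paths in `modified` need a blob written; everything
    else is inherited from the parent commit.
    """
    modified = sorted(
        path
        for path, rev in branch_files.items()
        if parent_files.get(path) != rev
    )
    deleted = sorted(path for path in parent_files if path not in branch_files)
    return modified, deleted


# Example: a branch that changes one file, adds one and drops one.
parent = {"a.c": "1.1", "b.c": "1.2", "c.c": "1.1"}
branch = {"a.c": "1.1", "b.c": "1.3", "d.c": "1.1"}
modified, deleted = branch_delta(parent, branch)
```

In this example only b.c and d.c would need their contents written, and
c.c would need a delete; a.c costs nothing because it is unchanged
relative to the parent.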

> (My "clever" idea for handling blobs: keep a dict mapping blob mark to
> file offset.  Then when we need a blob, seek to that offset and read
> the required number of bytes.  Never got around to implementing this,
> and I'm not sure if it would save much I/O.  Fewer writes I suppose.)
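
For reference, the mark-to-offset idea quoted above could look roughly
like the following.  This is only a sketch under my own assumptions: the
class name BlobIndex is made up, the blob file is any seekable binary
file, and the index is built while the blobs are written (building it
while scanning an existing blob file would work the same way).

```python
import io


class BlobIndex:
    """Map blob marks to (offset, length) within a seekable blob file."""

    def __init__(self, blob_file):
        self.blob_file = blob_file
        self.index = {}  # mark -> (offset, length); this part lives in memory

    def add(self, mark, data):
        # Append the blob and remember where it landed.
        self.blob_file.seek(0, io.SEEK_END)
        offset = self.blob_file.tell()
        self.blob_file.write(data)
        self.index[mark] = (offset, len(data))

    def read(self, mark):
        # Seek to the recorded offset and read exactly the blob's bytes.
        offset, length = self.index[mark]
        self.blob_file.seek(offset)
        return self.blob_file.read(length)


# An in-memory file stands in for the real blob file here.
blobs = BlobIndex(io.BytesIO())
blobs.add(1, b"abc")
blobs.add(2, b"defgh")
```

Note that the blob contents stay on disk, but the dict itself still
grows with the number of blobs.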

Regarding in-memory maps: The largest cvs2svn conversion that I know of
was KDE, which had about 3.3M CVS file revisions resulting in about 409k
Subversion commits.  If you want cvs2bzr and cvs2hg to be able to handle
conversions of this size, you have to be careful about what you keep in
memory.  On the other hand, one could argue that when using bzr and hg,
one typically wouldn't keep so many sub-projects in a single repository;
perhaps the largest cvs2bzr and cvs2hg conversions will be considerably
smaller.
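
As a back-of-the-envelope illustration of being careful about memory:
if marks happen to be assigned sequentially (an assumption on my part,
not something any of the tools guarantee), an offset table can be a flat
array indexed by mark rather than a dict.  The names below are
hypothetical.

```python
from array import array

N_REVISIONS = 3_300_000  # file revision count quoted for the KDE conversion

# Assumption: marks are handed out as 1, 2, 3, ..., so the mark itself is
# the array index and does not need to be stored.
offsets = array("Q", [0])  # slot 0 is unused padding


def record(mark, offset):
    """Store the file offset for the next sequential mark."""
    assert mark == len(offsets), "marks must arrive in order"
    offsets.append(offset)


# Small demonstration with made-up offsets.
for mark in range(1, 1001):
    record(mark, mark * 100)

# Size of a full table for a KDE-sized conversion.
estimated_bytes = N_REVISIONS * offsets.itemsize
```

With 8-byte items, a table for 3.3M revisions comes to roughly 26 MB,
which is the kind of figure one would want to check before deciding
what to keep in memory.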

Michael


