Call for testing: cvs2bzr

Ian Clatworthy ian.clatworthy at canonical.com
Thu Aug 20 01:08:28 BST 2009


Greg Ward wrote:
> On Wed, Aug 19, 2009 at 12:30 AM, Ian
> Clatworthy<ian.clatworthy at canonical.com> wrote:
>> That's pretty well it. We *could* handle a separate blobs file but it's
>> nicer w.r.t. memory consumption for us to go the inline blob path ala
>> hg. Unlike hg though, bzr has no limitations w.r.t. merge parent count.
> 
> But keep in mind that inline blobs make the dump file much much
> larger.  That'll be troublesome for large conversions.  I implemented
> a rather vile hack in hg-fastimport to make it handle separate blobs:
> write each blob to .hg/blobs/<blobmark>.  Then rm -rf .hg/blobs at the
> end of conversion.  It's slow and doubles the disk space overhead, but
> at least it doesn't suck up RAM.  And it's still less disk space than
> inline blobs.

bzr fast-import will handle blobs being defined once and reused over and
over again. The trouble is that it doesn't know which ones get reused
unless it does two passes, so it acts conservatively and keeps all of
them in memory. Fine for small imports but lousy for large ones. Reusing
mark idrefs or using inline blobs solves the problem implicitly.
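
For illustration only, here is a rough sketch of what the two-pass approach could look like. It is not how bzr fast-import is written. The event tuples ("blob", mark, data) and ("use", mark) are a made-up, pre-parsed stand-in for the real stream, and the function names are hypothetical. The first pass counts references per mark; the second keeps a blob cached only while references to it remain.

```python
from collections import Counter


def count_blob_uses(events):
    """First pass: count how many times each blob mark is referenced."""
    return Counter(mark for kind, mark, *_ in events if kind == "use")


def replay(events, uses):
    """Second pass: cache each blob only until its last reference.

    Returns the blob contents in the order they were used, plus the
    peak number of blobs held in memory at once.
    """
    remaining = dict(uses)
    cache = {}
    peak = 0
    used = []
    for kind, mark, *rest in events:
        if kind == "blob":
            # Blobs that are never referenced are not cached at all.
            if remaining.get(mark, 0) > 0:
                cache[mark] = rest[0]
                peak = max(peak, len(cache))
        else:
            used.append(cache[mark])
            remaining[mark] -= 1
            if remaining[mark] == 0:
                del cache[mark]  # last use: free the memory
    return used, peak
```

The cost is reading the stream twice, which is exactly what a single pass over stdin cannot do.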

> (My "clever" idea for handling blobs: keep a dict mapping blob mark to
> file offset.  Then when we need a blob, seek to that offset and read
> the required number of bytes.  Never got around to implementing this,
> and I'm not sure if it would save much I/O.  Fewer writes I suppose.)

That might be a problem when the data stream is stdin, though, since a
pipe can't be seeked.
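
A minimal sketch of the offset-index idea, to make the trade-off concrete. It assumes a seekable file object, blobs that always carry a mark, and the byte-count form of the data command ("data <n>"); the function names index_blobs and read_blob are hypothetical and come from neither tool.

```python
import io


def index_blobs(stream):
    """Map each blob mark to the (offset, length) of its payload.

    Assumes a seekable binary stream and 'data <n>' byte-count payloads.
    """
    index = {}
    in_blob = False
    mark = None
    while True:
        line = stream.readline()
        if not line:
            break
        if line == b"blob\n":
            in_blob, mark = True, None
        elif line.startswith(b"mark :"):
            mark = line.split()[1]          # b"mark :1" -> b":1"
        elif line.startswith(b"data "):
            length = int(line.split()[1])   # b"data 5" -> 5
            if in_blob and mark is not None:
                index[mark] = (stream.tell(), length)
            # Skip the payload (blob content or commit message) unread.
            stream.seek(length, io.SEEK_CUR)
            in_blob = False
    return index


def read_blob(stream, index, mark):
    """Seek back to a previously indexed blob and read its bytes."""
    offset, length = index[mark]
    stream.seek(offset)
    return stream.read(length)
```

Only the index lives in memory, so RAM use scales with the number of blobs rather than their total size. Both functions depend on seek(), which is why this works on a dump file on disk but not on a pipe.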

> But there is risk of conflict
> if you touched git_run_options.py or git_output_option.py: I
> refactored a bunch of stuff that is common to the git and hg backends
> out of those files.  If you start feeling like refactoring the heck
> out of one of those files, it'll cause pain for one of us.

I'm pretty sure I changed nothing in there.

> Absolutely!  But this whole idea of making conversion "user friendly"
> ... sheesh.  What are you *thinking*?!?

I was thinking I'll get fewer support questions if I set most of the
options correctly for each of the xxx-fast-export tools myself, rather
than documenting the options and hoping everyone (1) reads the doc, and
(2) follows it.

Radical I know. Some call it usability, others call it laziness. :-)

Ian C.



