Call for testing: cvs2bzr

Thu Aug 20 15:11:01 BST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> Greg Ward wrote:
>> On Wed, Aug 19, 2009 at 12:30 AM, Ian Clatworthy
>> <ian.clatworthy at canonical.com> wrote:
>>> That's pretty well it. We *could* handle a separate blobs file but it's
>>> nicer w.r.t. memory consumption for us to go the inline blob path ala
>>> hg. Unlike hg though, bzr has no limitations w.r.t. merge parent count.
>> But keep in mind that inline blobs make the dump file much much
>> larger.  That'll be troublesome for large conversions.  I implemented
>> a rather vile hack in hg-fastimport to make it handle separate blobs:
>> write each blob to .hg/blobs/<blobmark>.  Then rm -rf .hg/blobs at the
>> end of conversion.  It's slow and doubles the disk space overhead, but
>> at least it doesn't suck up RAM.  And it's still less disk space than
>> inline blobs.
> 
> bzr fast-import will handle blobs being defined once and reused over and
> over again. The trouble is that it doesn't know which ones get reused
> unless it does two passes, so it acts conservatively and keeps all of
> them in memory. Fine for small imports but lousy for large ones. Reusing
> mark idrefs or using inline blobs solves the problem implicitly.

Note that you could just track sha1 => (file_id, revision_id) and then
go back to the target repository to extract the text when you need it
later.

That way you only keep a mapping in memory, rather than the actual
content.
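
A minimal sketch of that idea, only to make the shape concrete. Every
name here is hypothetical: FakeRepository is an in-memory stand-in for
the target repository, not a bzrlib API, and BlobIndex is an invented
helper. It assumes a blob's text has already been committed under some
(file_id, revision_id) before a later commit asks for it again.

```python
import hashlib


class FakeRepository:
    """Hypothetical stand-in for the target repository (not bzrlib)."""

    def __init__(self):
        self._texts = {}

    def add_text(self, file_id, revision_id, data):
        # Store the file text under the key it was committed with.
        self._texts[(file_id, revision_id)] = data

    def get_text(self, file_id, revision_id):
        # Look the text up again from the "repository".
        return self._texts[(file_id, revision_id)]


class BlobIndex:
    """Keeps sha1 => (file_id, revision_id); never holds blob content."""

    def __init__(self, repo):
        self._repo = repo
        self._by_sha1 = {}

    def record(self, file_id, revision_id, data):
        # Remember only where the content lives, not the content itself.
        # The first location seen for a given sha1 wins.
        sha1 = hashlib.sha1(data).hexdigest()
        self._by_sha1.setdefault(sha1, (file_id, revision_id))
        return sha1

    def fetch(self, sha1):
        # Resolve the sha1 to a location and read the text back.
        file_id, revision_id = self._by_sha1[sha1]
        return self._repo.get_text(file_id, revision_id)


repo = FakeRepository()
index = BlobIndex(repo)

data = b"hello world\n"
repo.add_text("file-1", "rev-1", data)
key = index.record("file-1", "rev-1", data)

# A later commit that reuses the same blob reads it from the repository.
reused = index.fetch(key)
```

The cost is one repository lookup per reuse instead of a dictionary
hit, in exchange for memory that grows with the number of distinct
blobs rather than with their total size.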

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqNWXUACgkQJdeBCYSNAAP/4wCdFmIFHRxGCLmLY6H842LDbrWP
fOAAn2HemKZRJmdE8Cb2+DsPaxEqA9fj
=/VUs
-----END PGP SIGNATURE-----