Improved conversion from cvs => bzr

Sun Sep 3 19:47:57 BST 2006

I had an idea about how the conversion process could be changed to make
it faster.

Basically, the idea is that in a cvs => bzr conversion, you first read
the rlog and parse it to get all the revisions you want to generate,
similar to how tailor does it (and, I assume, cscvs). The difference is
that you assign the final revision id for every changeset up front.
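
A rough sketch of that first pass, assuming the rlog has already been
parsed into one record per (file, CVS revision). The FileRev tuple, the
300 second grouping window, and the "cvs-<timestamp>-<digest>" id format
are all made up for illustration. They are not what tailor, cscvs, or
bzr actually use.

```python
import hashlib
from collections import namedtuple

# Hypothetical record: one per (file, CVS revision) from a parsed rlog.
FileRev = namedtuple("FileRev", "path rev author timestamp message")


def group_changesets(file_revs, window=300):
    """Group per-file revisions into changesets and assign stable ids.

    Revisions with the same author and log message that fall within
    `window` seconds of each other are treated as one changeset.
    """
    changesets = []
    open_sets = {}  # (author, message) -> most recent changeset for that key
    for fr in sorted(file_revs, key=lambda f: (f.timestamp, f.path)):
        key = (fr.author, fr.message)
        cs = open_sets.get(key)
        # Start a new changeset if none is open, the window has expired,
        # or this file already has a revision in the open changeset.
        if (cs is None or fr.timestamp - cs["last"] > window
                or fr.path in cs["files"]):
            cs = {"author": fr.author, "message": fr.message,
                  "start": fr.timestamp, "last": fr.timestamp, "files": {}}
            open_sets[key] = cs
            changesets.append(cs)
        cs["files"][fr.path] = fr.rev
        cs["last"] = fr.timestamp

    # The id depends only on the changeset contents, so every later
    # per-file conversion sees the same id for the same changeset.
    for cs in changesets:
        payload = repr((cs["author"], cs["message"],
                        sorted(cs["files"].items())))
        digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
        cs["revision_id"] = "cvs-%d-%s" % (cs["start"], digest)
    return changesets
```

Because the ids are derived from the changeset contents rather than
from a counter, running the grouping twice gives the same ids, which is
what lets each file be converted on its own later.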

Then, you actually convert each file individually. You know what
revisions it changed in, so you could convert a CVS rcs file directly
into a bzr .knit + .kndx file.

So first, you go to every file in the cvs repository, and do a
conversion to knit form (you need the revision ids to be stable, so that
all files stay in sync).
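
To make the per-file step concrete, here is a minimal sketch. It
assumes the trunk texts of one file have already been extracted from
its ,v file, oldest first, and that rev_to_revid maps each CVS revision
to the revision id assigned earlier. The record layout is invented for
illustration and is not the real .knit/.kndx format. The point is only
that each delta is computed against a parent text that is still in
memory.

```python
import difflib


def build_file_records(texts, rev_to_revid):
    """Turn one file's history into a list of delta records.

    texts: list of (cvs_rev, lines) pairs, oldest first.
    rev_to_revid: dict mapping cvs_rev -> pre-assigned revision id.
    """
    records = []
    parent_id, parent_lines = None, []
    for cvs_rev, lines in texts:
        revid = rev_to_revid[cvs_rev]
        matcher = difflib.SequenceMatcher(None, parent_lines, lines)
        # Keep only changed regions: (start, end) in the parent text,
        # plus the lines that replace that region.
        delta = [(i1, i2, lines[j1:j2])
                 for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                 if tag != "equal"]
        records.append({
            "revision_id": revid,
            "parents": [parent_id] if parent_id else [],
            "delta": delta,
        })
        # The text just written becomes the in-memory parent for the next.
        parent_id, parent_lines = revid, lines
    return records


def apply_delta(parent_lines, delta):
    """Rebuild a text from its parent and a delta from build_file_records."""
    out, pos = [], 0
    for start, end, new_lines in delta:
        out.extend(parent_lines[pos:start])
        out.extend(new_lines)
        pos = end
    out.extend(parent_lines[pos:])
    return out
```

Branches and dead revisions are ignored here. A real converter would
have to handle both.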

Once all files have been converted, then you switch to creating
inventories and revisions.

I could see doing this in a couple of different ways. My favorite would
be to batch up inventories, say 50 at a time. So you find all of the
files that are involved in 50 revisions, and you convert each ,v file
into a complete knit. Then you convert the inventories for those 50
revisions, and the revision xml.

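Then as you go along, each new batch will contain a mix of files that
you have already converted and files that are new. Only the new ones
need converting, since each ,v file was turned into a complete knit the
first time it was seen.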
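
The batching could be planned roughly like this. It is a sketch only:
it assumes each changeset is a dict with a "files" entry listing the
paths it touches, and the function name is hypothetical. The batch size
of 50 is the number suggested above.

```python
def plan_batches(changesets, size=50):
    """Yield (batch, files_to_convert) pairs.

    changesets: ordered list of dicts, each with a "files" collection of
    paths touched by that changeset.
    files_to_convert lists only the files not seen in an earlier batch,
    since a file is converted to a complete knit the first time it
    shows up.
    """
    converted = set()
    for start in range(0, len(changesets), size):
        batch = changesets[start:start + size]
        touched = set()
        for cs in batch:
            touched.update(cs["files"])
        to_convert = sorted(touched - converted)
        converted.update(to_convert)
        yield batch, to_convert
```

For each pair, you would convert the listed ,v files first, then write
the inventories and revision xml for the changesets in that batch.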

The basic idea is just to fast-path the things that are structured the
same way in cvs and in bzr (one ,v file maps to one knit). By converting
one ,v => knit at a time, you can keep the full parent text cache in
memory, which should save a lot of extraction time. The same goes for
inventories: you don't have to keep reading all the important
inventories, writing them out, and then reading them in again.
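
For the inventory side, a minimal sketch of what "keep it in memory"
could look like. Here an inventory is simplified to a dict mapping each
path to the id of the revision that last changed it. That is a stand-in
for illustration, not bzr's actual inventory structure, and file
removals are not handled.

```python
def iter_inventories(changesets):
    """Yield (revision_id, inventory) for each changeset in order.

    changesets: ordered list of dicts with "revision_id" and "files".
    The running inventory is updated in place and never re-read from
    disk. Each yielded inventory is a snapshot copy.
    """
    inventory = {}
    for cs in changesets:
        for path in cs["files"]:
            # Record which revision last touched this path.
            inventory[path] = cs["revision_id"]
        yield cs["revision_id"], dict(inventory)
```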

I honestly think that doing it in this manner could make the conversion
10-100 times faster. That is counting only the time that 'bzr' spends
committing everything, and ignoring the time cvs has to pause to do any
updates. Though I would guess that the 'cvs update' time would also be
much faster, if you just read in a single ,v file and parse it into all
of its revisions right away.

John
=:->
