Data migration to Bazaar 2.x status update

Ian Clatworthy ian.clatworthy at canonical.com
Mon Aug 31 01:06:31 BST 2009


Thanks to those who helped with the data migration testing last week ...

I made more progress on bzr-fastimport late last week and over the
weekend. Building on the discussions with the fastimport guys from other
VCS projects, "round-tripping" of basic Bazaar repositories now mostly
works, i.e.

  bzr fast-export --no-plain my-branch my.fi.gz
  bzr fast-import my.fi.gz my-repo

will give you back a Bazaar repository with history (almost) equivalent
between my-branch and my-repo/trunk. To be specific, on the small
repositories I tried (e.g. bzr-fastimport itself with 600+ revisions),
running:

  bzr log -v --forward --include-merges my-branch > source.log
  bzr log -v --forward --include-merges my-repo/trunk > dest.log
  diff source.log dest.log

showed no differences. Hooray! Note that -v shows the history including
file operations: adds, modifies, renames, deletes, etc. If -v is
replaced with -v -p, then the diffs are shown as well. With -p, I'm
seeing some timestamp differences in the diff headers but that's all.
(I'm yet to track down why.)
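
Until the timestamp differences are understood, one way to keep the -p
comparison useful is to filter the timestamps out of the diff headers
before diffing the two logs. The following is only a sketch: it assumes
the header lines have the form "--- path<TAB>timestamp" and
"+++ path<TAB>timestamp" (possibly indented for merged revisions), and
the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only, not part of bzr-fastimport.
# Assumption: diff header lines look like "--- path<TAB>timestamp" and
# "+++ path<TAB>timestamp", optionally indented for merged revisions.

TAB=$(printf '\t')

# Read a log on stdin and drop everything from the first tab onwards
# on diff header lines; all other lines pass through unchanged.
strip_diff_timestamps() {
    sed -E "s/^([[:space:]]*(---|\+\+\+) [^${TAB}]*)${TAB}.*\$/\1/"
}

# Hypothetical helper: log both branches with -v -p, filter the
# timestamps, and diff the results.
# $1 = source branch, $2 = round-tripped branch (e.g. my-repo/trunk)
compare_patch_logs() {
    bzr log -v -p --forward --include-merges "$1" | strip_diff_timestamps > source.log
    bzr log -v -p --forward --include-merges "$2" | strip_diff_timestamps > dest.log
    diff source.log dest.log
}
```

With that loaded, "compare_patch_logs my-branch my-repo/trunk" should
report only differences other than header timestamps. A removed or added
content line that happens to look like a header would be filtered too,
but identically in both logs.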

On larger repositories, things aren't acceptable yet. On the bright
side, bzr-fastimport can export and import bzr.dev (at long last)
without falling over. On the downside, the revision total in the
generated trunk is off by a few so the conversion obviously failed in
some way. I'm yet to look into why.
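
To narrow down which revisions went missing, the saved logs can be
counted and compared by timestamp. This is a rough sketch with invented
function names. It assumes the default long log format, where each
revision has a "revno:" and a "timestamp:" line indented by a multiple
of four spaces, and that commit timestamps survive the round trip.

```shell
#!/bin/sh
# Sketch only: helpers for inspecting saved "bzr log -v" output.
# Assumption: long log format, with merged revisions indented by four
# spaces per nesting level and commit messages indented two further.

# Print the number of revisions recorded in log file $1.
count_revisions() {
    grep -cE '^(    )*revno: ' "$1"
}

# Print the sorted timestamp of every revision in log file $1.
revision_timestamps() {
    sed -nE 's/^(    )*timestamp: //p' "$1" | sort
}
```

For example:

  count_revisions source.log
  count_revisions dest.log
  revision_timestamps source.log > source.ts
  revision_timestamps dest.log > dest.ts
  comm -3 source.ts dest.ts

The last command lists timestamps present in only one of the two logs,
which gives a starting point for finding the affected revisions.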

I also tried converting:

* Firefox 3.5 from hg (26K revisions)
* Linux 2.6.30 from git (146K revisions)

Firefox took 2 hours to import and 30 minutes to export. The Linux
kernel took 13 hours to import. The export of the kernel from the
generated bzr repository fell over after 3.5 hours (112K revisions)
because a file-id was missing, i.e. thanks to a bug in the import. Damn.

To complete the migration stress test, I'll throw in MySQL 5.1 (58K
revisions) this week.

In summary, data migration isn't there yet. For small repositories (fewer
than 10K revisions, say), I suspect it's close enough in practice not to
matter much. For larger repositories, smooth, reliable migration is
still several days of engineering time (a few weeks elapsed time) away.

Ian C.

PS: Note that fast-export --no-plain uses experimental "features" yet to
be reviewed by the vcs-fastimport-devs. I'll discuss those features with
them real soon now once I'm sure they cover what we need. If you're
curious, the fast-export help in the latest Bazaar Data Migration Guide
(http://doc.bazaar-vcs.org/migration/en/data-migration/fast-export.html)
outlines the proposed features: "multiple-authors", "commit-properties"
and "empty-directories".
