New Emacs Bazaar Repository II

Ian Clatworthy ian.clatworthy at internode.on.net
Tue Feb 24 23:35:34 GMT 2009


Jason Earl wrote:

> For one thing, posting here allows me to thank the bzr developers for
> their fine work and specifically thank Ian for his work on the
> fast-import plugin.  I really appreciate it.

Thanks for the positive feedback.

> For another, the Emacs
> development crowd tends to be a bit judgmental about bzr and there are
> some issues with the newest conversion.  I'm hoping that I can get some
> help sorting things out before involving emacs-devel.

We certainly appreciate the honest feedback we're receiving from the Emacs
developers. We're obviously very focussed on the top priorities on our
roadmap - http://bazaar-vcs.org/Roadmap/ - because they help give a better
bzr experience to everyone. *But* Karl and I have also been chipping away
at the issues identified as Emacs adoption blockers (e.g. logging
subdirectories, tweaking bzr status).

> Anyway, I've made some progress with the Emacs bzr repository.
> Actually, it can more honestly be said that Ian's changes to fast-import
> now do a better job with the Emacs repository.  Specifically the missing
> tags have appeared.  Unfortunately, however, these tags have appeared as
> branches.  The overall size of the repository has also ballooned
> significantly, probably due to the increase in the number of branches.

*If* fast-import is working correctly, then branches will only be
created for repository heads, i.e. revisions which are not merged into
another in the repository. The per-branch overhead is very low, at least
until one checks out the working tree for one. Tags in bzr are a per-branch
thing (though they map to revisions), so fast-import collects all
the tag definitions in the fast-import stream and then determines which
tags are applicable to each branch as that branch is created.
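
To make the idea concrete, here is a rough Python sketch of head detection
and per-branch tag assignment. It is an illustration only, not the actual
bzr-fastimport code: the function names are made up, the revision graph is
a plain dict of revision id -> parent ids, and it assumes a tag is
"applicable" to a branch when the tagged revision lies in the ancestry of
that branch's head.

```python
def find_heads(parents):
    """Return the revisions that no other revision lists as a parent."""
    merged = {p for ps in parents.values() for p in ps}
    return sorted(rev for rev in parents if rev not in merged)


def ancestry(parents, head):
    """Return every revision reachable from head, including head itself."""
    seen = set()
    todo = [head]
    while todo:
        rev = todo.pop()
        if rev in seen:
            continue
        seen.add(rev)
        todo.extend(parents.get(rev, ()))
    return seen


def tags_for_heads(parents, tags):
    """Map each head to the tags whose revision is in that head's ancestry.

    Assumption: this is what "applicable" means; the real rule may differ.
    """
    result = {}
    for head in find_heads(parents):
        reachable = ancestry(parents, head)
        result[head] = {name: rev for name, rev in tags.items()
                        if rev in reachable}
    return result


if __name__ == "__main__":
    # Hypothetical graph: "c" and "d" are never merged, so both are heads.
    graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
    print(tags_for_heads(graph, {"v0": "a", "v1": "b"}))
```

In this sketch, a tagged revision that was never merged into anything else
comes out as a head in its own right, and so would get a branch.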

> In fact, it has almost doubled in size.  Even the smaller 1.9 formatted
> repository still weighs in at over 700M with no trees.  As a comparison
> the old repositories that were missing tags (but apparently not
> revisions) weighed in around 400M (without trees) and the git repository
> that the bzr repository is based on weighs in at a svelte 306M with a
> tree.

What version of bzr-fastimport did you use? I wonder if it has something
to do with the reset support Brian de Alwis and I recently added?

> You can get the new repositories at:
> 
> http://bzr.notengoamigos.org/emacs-merges.tar.gz (or lzma)
> http://bzr.notengoamigos.org/emacs-merges-ce.tar.gz (or lzma)
> 
> The emacs-merges-ce repository uses the 1.9 format, and the
> emacs-merges repository uses the 0.92 pack format.  These repositories
> are somewhat improved over their predecessors.  For example, I upgraded
> the branch format in the 1.9 format repository to be 1.9 as well.
> In the old ce repositories I used a 1.9 format for the repository but
> the branches were actually still in 0.92 format as that is the default.
> The parent of each branch in these new repositories is also set
> correctly, in the old branch it was set to one of my test servers.
> 
> Now, here are my observations (and questions) for the bzr community.
> 
> First of all, how would I go about turning some of these branches back
> into tags?

See above. According to fast-import's head tracking logic, these really
are branches. Note however that the head tracking logic was recently
tweaked to support resets. There's a chance I've broken it. FWIW,
fast-import-info still uses the *old* head tracking logic so I'd be
curious as to whether the branches that got created matched what f-i-i
said would happen.

> Secondly, I am somewhat confused about why 0.92 pack branches are the
> default branch type in a 1.9 format repository.  Everything worked the
> same, but you would get warnings when stacking on the branches.  To my
> limited experience that seems like a very odd default.  When pulling
> branches into a 1.9 formatted repository it seems like you should get
> 1.9 format branches.

If you haven't already, please raise this as a bug. It sounds high
priority to me.

> Thirdly, the fast-import-info bit of fast-import is broken with regards
> to the Emacs repository.  Emacs has so many "stickies" that it creates a
> ConfigObj that ConfigObj can't read in a reasonable amount of time (I
> let it run for 24 hours).  I fixed this by using a special cache_manager
> that put stickies in an anydbm hash (it creates a 27G file).  I am
> somewhat embarrassed by the sheer hackiness of this, but it actually
> works surprisingly well, and runs faster than scanning the huge
> fast-import file twice.  Perhaps if I had access to a machine with a
> ridiculous amount of memory this wouldn't be a problem, but with 4G of
> memory and twice that in swap I had problems.

Wow - that's weird/interesting. fast-import-info is typically really
fast. I tested it on the Linux kernel the other day and it only took
17 minutes (which sounds long but isn't, given the 300K blobs and 130K
revisions in that f-i stream). Which sections take up all the space?

Let me also state publicly that 4G ought to be plenty for bzr-fastimport
and the commands it provides.
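
For anyone following along, the kind of disk-backed cache Jason describes
can be sketched as below. This is only a minimal illustration under stated
assumptions: it uses Python's standard dbm module (the successor to
anydbm), the class and method names are hypothetical rather than
bzr-fastimport's cache_manager API, and it assumes the cached items are
byte strings looked up by a string key.

```python
import dbm
import os
import tempfile


class DiskBackedCache:
    """Hypothetical dict-like cache that keeps its values in a dbm file."""

    def __init__(self, path):
        # "c" opens the database read-write, creating it if needed.
        self._db = dbm.open(path, "c")

    def store(self, key, data):
        """Save a bytes value under a string key."""
        self._db[key.encode("utf-8")] = data

    def fetch(self, key):
        """Return the bytes value saved under key (KeyError if absent)."""
        return self._db[key.encode("utf-8")]

    def __contains__(self, key):
        return key.encode("utf-8") in self._db

    def close(self):
        self._db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = DiskBackedCache(os.path.join(tmp, "stickies"))
        cache.store(":1", b"contents of a blob")
        print(cache.fetch(":1"))
        cache.close()
```

The trade-off is the usual one: memory use stays flat, at the cost of disk
space and a lookup on disk for every access.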

> Finally, should I consider creating a 1.9-rich-root format repository?
> The fast-import plugin will now create rich-root repositories, but the
> serializer will not successfully convert between a rich-root repository
> and a non-rich-root repository.  It tries, but it fails with an XML
> error (which, I understand, is a known problem).  Is rich root likely to
> become a default anytime soon?  If it is going to become the default
> does it make sense to bite the bullet and save a re-serialization later?

We're pushing hard to get a much better repo format available by the end
of March. It ought to also remove the whole plain vs rich-root duality.

Also, I'm not done with bzr-fastimport. Here are my planned changes before
announcing 0.8, hopefully around the same time as bzr 1.13rc goes out.

1. Make it fast on brisbane-core - the test bench branch for the new
   repo format.

2. Better bzr-like & deterministic mapping of branch names:
   * refs/heads/master -> trunk
   * refs/heads/foo -> foo
   * refs/tags/foo -> foo.tag
   * refs/remotes/origin/foo -> foo.remote

3. Maybe add a --sandbox option to create a lightweight checkout switched
   to the current branch. That ought to make people migrating from git
   feel more at home. See http://bazaar-vcs.org/DraftSpecs/EasyWorkspaceSetup
   for some of my thinking here.

(2) will permit bzr fast-export to round-trip a repo to multiple branches
and not just support exporting of a single branch.
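
As a rough sketch of the mapping in (2), and of why a deterministic mapping
makes round-tripping possible, something like the following Python would
do. The function names are hypothetical and this is not code from
bzr-fastimport; it also assumes that refs outside the four listed patterns
are simply rejected, which the plan above does not specify.

```python
# (git ref prefix, suffix appended to the bzr branch name)
_RULES = [
    ("refs/heads/", ""),
    ("refs/tags/", ".tag"),
    ("refs/remotes/origin/", ".remote"),
]


def git_ref_to_branch_name(ref):
    """Map a git ref to a bzr branch name using the rules listed in (2)."""
    if ref == "refs/heads/master":
        return "trunk"
    for prefix, suffix in _RULES:
        if ref.startswith(prefix):
            return ref[len(prefix):] + suffix
    # Assumption: anything not covered by the listed rules is an error.
    raise ValueError("no mapping for ref: %s" % ref)


def branch_name_to_git_ref(name):
    """Inverse mapping, as an exporter would need when round-tripping."""
    if name == "trunk":
        return "refs/heads/master"
    for prefix, suffix in _RULES:
        if suffix and name.endswith(suffix):
            return prefix + name[:-len(suffix)]
    return "refs/heads/" + name


if __name__ == "__main__":
    for ref in ("refs/heads/master", "refs/heads/foo",
                "refs/tags/foo", "refs/remotes/origin/foo"):
        print(ref, "->", git_ref_to_branch_name(ref))
```

Note that the inverse is only unambiguous if no git branch is itself named
"trunk" or ends in ".tag" or ".remote"; the sketch does not handle those
collisions.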

Putting all the above together, the combination of new repo format +
faster networking operations + better fast-import mapping + better log
ought to go a long way to making bzr a pleasure for the Emacs developers
to use real soon now.

Ian C.



