New Emacs Bazaar Repository II

Wed Feb 25 00:19:40 GMT 2009

Ian Clatworthy <ian.clatworthy at internode.on.net> writes:

> Jason Earl wrote:
>
>> For one thing, posting here allows me to thank the bzr developers for
>> their fine work and specifically thank Ian for his work on the
>> fast-import plugin.  I really appreciate it.
>
> Thanks for the positive feedback.

You deserve it.  I apparently should have made it more positive though.
I'll try and rectify that with this round.

>> For another, the Emacs development crowd tends to be a bit judgmental
>> about bzr and there are some issues with the newest conversion.  I'm
>> hoping that I can get some help sorting things out before involving
>> emacs-devel.
>
> We certainly appreciate the honest feedback we're receiving from the
> Emacs developers. We're obviously very focussed on the top priorities
> on our roadmap - http://bazaar-vcs.org/Roadmap/ - because they help
> give a better bzr experience to everyone. *But* Karl and I have also
> been chipping away at the issues identified as Emacs adoption blockers
> (e.g. logging subdirectories, tweaking bzr status) as well.

When I first started playing with bzr I was not a fan.  I was simply
trying to help the Emacs development team move to something
(anything) besides CVS.  Thanks to the work I have done on this project
I have become a huge fan of bzr.  The bzr development team is doing a
great job.  I have been amazed by the amount of progress.  It has been
very gratifying to be even peripherally involved.

That being the case, the emacs-devel group is as opinionated as any
group around, and version control systems are one of those areas where
people have strong opinions.  Many in the Emacs community were upset
when bzr was chosen, and several of my original posts to emacs-devel
started minor flame-wars.

I wanted to post here first because it is quite possible that problems
with the conversion are my fault, and this list is less likely to take
my mistakes and blame them on bzr.

If it seemed that my post was in any way critical then I sincerely
apologize.

>> Anyway, I've made some progress with the Emacs bzr repository.
>> Actually, it can more honestly be said that Ian's changes to
>> fast-import now do a better job with the Emacs repository.
>> Specifically the missing tags have appeared.  Unfortunately, however,
>> these tags have appeared as branches.  The overall size of the
>> repository has also ballooned significantly, probably due to the
>> increase in the number of branches.
>
> *If* fast-import is working correctly, then branches will only be
> created for repository heads, i.e. revisions which are not merged into
> another in the repository. The per-branch overhead is very low, at
> least until one checks out the working tree for one. Tags in bzr are a
> per branch thing (though they map to revisions) so fast-import
> collects all the tag definitions in the fast-import stream and then
> determines which tags are applicable to each branch as that branch is
> created.

OK, that's what I understood.  As a concrete example, the
amigados-branch in my repository apparently corresponds to a tag in
the upstream git repository.

The git repo is available at:

git://repo.or.cz/emacs

If you are curious.
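
To make the head rule quoted above concrete, here is a minimal Python
sketch.  It is not fast-import's actual code; the function name and the
toy history are made up for illustration.  It treats a revision as a
head when no other revision lists it as a parent.

```python
def find_heads(parents_by_rev):
    """Return the revisions that no other revision names as a parent.

    parents_by_rev maps a revision id to the list of its parent ids.
    This mirrors the rule "a head is a revision not merged into
    another"; it is an illustrative sketch, not bzr-fastimport's logic.
    """
    merged = set()
    for parents in parents_by_rev.values():
        merged.update(parents)
    return sorted(rev for rev in parents_by_rev if rev not in merged)


# Hypothetical history: "e" merges "c" and "d"; "f" is never merged.
history = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["a"],
    "e": ["c", "d"],
    "f": ["b"],
}

print(find_heads(history))  # ['e', 'f']
```

In this toy history "e" and "f" would each become a branch, while "d"
would not, because it was merged into "e".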

>> In fact, it has almost doubled in size.  Even the smaller 1.9
>> formatted repository still weighs in at over 700M with no trees.  As
>> a comparison the old repositories that were missing tags (but
>> apparently not revisions) weighed in around 400M (without trees) and
>> the git repository that the bzr repository is based on weighs in at a
>> svelte 306M with a tree.
>
> What version of bzr-fastimport did you use? I wonder if it has
> something to do with the reset support Brian de Alwis and I recently
> added?

I am using a gently (maybe 4 lines) modified version of the head I
pulled from here:

http://bazaar.launchpad.net/~bzr/bzr-fastimport/fastimport.dev/

The change certainly did coincide with the reset support change.

>> You can get the new repositories at:
>> 
>> http://bzr.notengoamigos.org/emacs-merges.tar.gz (or lzma)
>> http://bzr.notengoamigos.org/emacs-merges-ce.tar.gz (or lzma)
>> 
>> The emacs-merges-ce repository uses the 1.9 format, and the
>> emacs-merges repository uses the 0.92 pack format.  These
>> repositories are somewhat improved over their predecessors.  For
>> example, I upgraded the branch format in the 1.9 format
>> repository to be 1.9 as well.  In the old ce repositories I used a
>> 1.9 format for the repository but the branches were actually still in
>> 0.92 format as that is the default.  The parent of each branch in
>> these new repositories is also set correctly, in the old branch it
>> was set to one of my test servers.
>> 
>> Now, here are my observations (and questions) for the bzr community.
>> 
>> First of all, how would I go about turning some of these branches
>> back into tags?
>
> See above. According to fast-import's head tracking logic, these
> really are branches. Note however that the head tracking logic was
> recently tweaked to support resets. There's a chance I've broken
> it. FWIW, fast-import-info still uses the *old* head tracking logic so
> I'd be curious as to whether the branches that got created matched
> what f-i-i said would happen.

I can test that.

This is going to get fixed, and I am going to be very glad I talked to
the bazaar people first :).

>> Secondly, I am somewhat confused about why 0.92 pack branches are the
>> default branch type in a 1.9 format repository.  Everything worked
>> the same, but you would get warnings when stacking on the branches.
>> In my limited experience that seems like a very odd default.  When
>> pulling branches into a 1.9 formatted repository it seems like you
>> should get 1.9 format branches.
>
> If you haven't already, please raise this as a bug. It sounds high
> priority to me.

OK, I will do that.

>> Thirdly, the fast-import-info bit of fast-import is broken with
>> regards to the Emacs repository.  Emacs has so many "stickies" that
>> it creates a ConfigObj that ConfigObj can't read in a reasonable
>> amount of time (I let it run for 24 hours).  I fixed this by using a
>> special cache_manager that put stickies in an anydbm hash (it creates
>> a 27G file).  I am somewhat embarrassed by the sheer hackiness of
>> this, but it actually works surprisingly well, and runs faster than
>> scanning the huge fast-import file twice.  Perhaps if I had access to
>> a machine with a ridiculous amount of memory this wouldn't be a
>> problem, but with 4G of memory and twice that in swap I had problems.
>
> Wow - that's weird/interesting. fast-import-info is typically really
> fast. I tested it on the Linux kernel the other day and it only took
> 17 minutes. (Which sounds long but isn't given the 300K blobs and 130K
> revisions in that f-i stream). Which sections take up all the space?
>
> Let me also state publicly that 4G ought to be plenty for
> bzr-fastimport and the commands it provides.

When I was still trying to use fast-import-info the problem was that it
created a configuration file that ConfigObj couldn't read in a
reasonable amount of time.  It would get caught in a regex and never
come out.

So fast-import-info would run and create a file, but when I then ran
fast-import with that file it would run forever without making headway.
Heck, it's even possible that this is fixed.  I haven't tried using a
fast-import-info config file in a long time.

Like I said, I will do some testing and get back to you.
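
For anyone curious about the disk-backed approach mentioned above, the
following is a rough sketch of the idea, not my actual cache_manager
change.  It uses Python 3's dbm module (the successor to anydbm), and
the class name, the key format, and the assumption that entries are
byte strings keyed by a mark are all hypothetical.

```python
import dbm
import os
import tempfile


class DiskBackedStore:
    """Keep key/value pairs in a dbm file instead of in memory.

    Hypothetical illustration only: keys are text, values are bytes.
    """

    def __init__(self, path):
        # "c" opens the database read/write, creating it if needed.
        self._db = dbm.open(path, "c")

    def put(self, key, value):
        self._db[key.encode("utf-8")] = value

    def get(self, key):
        return self._db[key.encode("utf-8")]

    def close(self):
        self._db.close()


def demo():
    """Store one entry in a throwaway database and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        store = DiskBackedStore(os.path.join(tmp, "stickies"))
        try:
            store.put(":42", b"contents of a blob")
            return store.get(":42")
        finally:
            store.close()


print(demo())
```

The trade-off is the one described above: lookups hit the disk, but
memory use stays flat no matter how many entries are stored.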

>> Finally, should I consider creating a 1.9-rich-root format
>> repository?  The fast-import plugin will now create rich-root
>> repositories, but the serializer will not successfully convert
>> between a rich-root repository and a non-rich-root repository.  It
>> tries, but it fails with an XML error (which, I understand, is a
>> known problem).  Is rich root likely to become a default anytime
>> soon?  If it is going to become the default does it make sense to
>> bite the bullet and save a re-serialization later?
>
> We're pushing hard to get a much better repo format available by end
> March.  It ought to also remove the whole plain vs rich-root duality.
>
> Also, I'm not done with bzr-fastimport. Here are my planned changes
> before announcing 0.8, hopefully around the same time as bzr 1.13rc
> goes out.
>
> 1. Make it fast on brisbane-core - the test bench branch for the new
>    repo format.
>
> 2. Better bzr-like & deterministic mapping of branch names:
>    * refs/heads/master -> trunk
>    * refs/heads/foo -> foo
>    * refs/tags/foo -> foo.tag
>    * refs/remotes/origin/foo -> foo.remote
>
> 3. Maybe add a --sandbox option to create a lightweight checkout switched
>    to the current branch. That ought to make people migrating from git
>    feel more at home. See http://bazaar-vcs.org/DraftSpecs/EasyWorkspaceSetup
>    for some of my thinking here.
>
> (2) will permit bzr fast-export to round-trip a repo to multiple branches
> and not just support exporting of a single branch.
>
> Putting all the above together, the combination of new repo format +
> faster networking operations + better fast-import mapping + better
> log ought to go a long way to making bzr a pleasure for the Emacs
> developers to use real soon now.
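
The branch-name mapping proposed in (2) could be sketched in Python
roughly as follows.  This only encodes the four rules listed above; the
function name is hypothetical, and raising an error for any other ref
is an assumption, since the proposal does not say what should happen
to refs outside those four patterns.

```python
def bzr_branch_name(ref):
    """Map a git ref to a bzr branch name using the four proposed rules."""
    if ref == "refs/heads/master":
        return "trunk"
    rules = (
        ("refs/heads/", ""),
        ("refs/tags/", ".tag"),
        ("refs/remotes/origin/", ".remote"),
    )
    for prefix, suffix in rules:
        if ref.startswith(prefix):
            return ref[len(prefix):] + suffix
    # Assumption: anything else is rejected rather than guessed at.
    raise ValueError("no mapping rule for ref: %s" % ref)


for ref in ("refs/heads/master", "refs/heads/foo",
            "refs/tags/foo", "refs/remotes/origin/foo"):
    print(ref, "->", bzr_branch_name(ref))
```

Because each rule adds a distinct suffix, a head, a tag, and a remote
that share the name "foo" map to three different branch names.  That
is what would let an export tell them apart again.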

I'm excited.  Bzr is already a far better system than CVS, and the
Emacs developers have been happy with CVS.  Right now the Emacs devel
group is doing most of its merging in Arch, and bzr is worlds better
than that as well.  Thanks to the support of the bzr team I think that
the conversion is likely to go very well.

Jason