observation about trac+bzr and bzr smart server with big repo

Fri Oct 24 09:57:49 BST 2008

Robert Collins writes:
 > Sorry for taking so long to reply;

No hurry.  A more measured approach to format changes is a long-term
issue, anyway.

 > Ok. Perhaps we have quite different userbases to git; I know we
 > have had many users complain when we *contemplated* forced upgrades
 > in the past.

I think you do have a different userbase.  Specifically:

 > I think missing object errors are really not that easy to debug for an
 > average user; for someone used to working with databases of various
 > kinds - sure; but a graphic artist, or a documentation writer, or
 > project manager - no way is that easy for them.

Well, maybe ... I think you underestimate the ability of the average
user to retain information about problems with simple remedies that
they've previously encountered.  True, this one is not susceptible to
solution by the catchall "just reboot Windows"<wink>, but it's not
that hard to remember.

However, there's no arguing that what gives me confidence in git is
not available to them:

 > you have a perception about git that isn't true. It is true that if
 > you are careful about the features and capabilities you use you can
 > force things out to a disk layout an older git will read,

Yes and no.  In a simple test, the only thing that the older git
(1.0.9) had problems with was cloning (I cp'd the object database from
the submodule into the project database -- in git, this is safe -- and
it worked), and of course it failed to check out the submodule.
Otherwise everything was there.  And the submodules are guaranteed to
be readable since submodules are not allowed to be recursive.  So I
could use rsync to clone, and then work with the older git, if I had
to.  (In practice, of course I'd upgrade.)

As you point out, your userbase includes a lot of people who may not
know an rsync from the kitchen sink, and shouldn't have to.  My point
about the stability and robustness of the git odb is that those of us
who *do* know better than to wash our hands at an rsync can use the
standard file utilities to profitably manipulate the git odb.  That
gives me a warm fuzzy feeling about my own git repos (and CVS for that
matter).
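
To make "standard file utilities" concrete, here is a minimal sketch
in Python of why plain file copies of loose objects are safe.  It
assumes a SHA-1 repository and loose (unpacked) objects only; the
function names are mine, not git's.  A loose object is a
zlib-compressed "<type> <size>\0<content>" stored under a path
derived from its own hash, so the same content always lands at the
same path.

```python
import hashlib
import zlib
from pathlib import Path
from typing import Tuple


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def write_loose_blob(objects_dir: Path, content: bytes) -> Path:
    """Store content as a loose blob under objects_dir (hypothetical helper)."""
    oid = git_blob_id(content)
    # Loose objects are sharded by the first two hex digits of the id.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # Identical content maps to an identical path, so an existing
        # file never needs to be overwritten.
        header = b"blob %d\x00" % len(content)
        path.write_bytes(zlib.compress(header + content))
    return path


def read_loose_object(path: Path) -> Tuple[str, bytes]:
    """Decompress a loose object file and return (type, content)."""
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\x00")
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return kind.decode("ascii"), body
```

Because the file name is a function of the content, copying one
objects directory into another with cp or rsync can only add objects
or rewrite byte-identical ones.  Packs are a different story, and
this sketch does not cover them.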

This is a point that goes back to Tom Lord's choice of the git object
store for his Arch 2 format.  That's a pretty strong recommendation
considering that Tom is the kind of hacker who makes me want to revise
the expansion of NIH to "not intelligible here".<wink>  But he always
did insist that the ability to use standard utilities to examine Arch
repos was a major strength of Arch.

 > I don't think that the odb level is 'finished' in bzr, hg, or
 > git. I recall a git changelog not that long ago that changed the
 > index layout in git packs - that's right down at the level that
 > makes you nervous.

Indeed.  But there's this claim in the git log:

commit c0a5e2d477baa9d3ebf7d3303a7d2b5dbc7c2ffe
Author: Nicolas Pitre <nico at cam.org>
Date:   Wed Jun 25 00:25:53 2008 -0400

    pack.indexversion config option now defaults to 2

    As announced for 1.6.0.

    Git older than version 1.5.2 (or any other git version with this option
    set to 1) may revert to version 1 of the pack index by manually deleting
    all .idx files and recreating them using 'git index-pack'.  Communication
    over the git native protocol is unaffected since the pack index is never
    transferred.

Git really does try to ensure that almost all of its metadata stores
are just caches.
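
As an illustration of how small the change in that commit is from the
outside, here is a rough Python sketch that tells the two pack index
layouts apart.  It relies on my understanding of the format: a
version 2 .idx file starts with the four magic bytes "\377tOc"
followed by a 4-byte big-endian version number, while version 1 has
no header and begins directly with the fan-out table.  The function
name is hypothetical, and this only inspects the header; it does not
validate the rest of the file.

```python
import struct

# Magic bytes that open a pack index of version 2 or later.
IDX_MAGIC = b"\xfftOc"


def pack_index_version(data: bytes) -> int:
    """Guess the pack index version from the first bytes of an .idx file."""
    if data[:4] == IDX_MAGIC:
        if len(data) < 8:
            raise ValueError("truncated pack index header")
        (version,) = struct.unpack(">I", data[4:8])
        return version
    # No magic: treat it as the headerless version 1 layout.
    return 1
```

Either way the index is derived entirely from the .pack file next to
it, which is why deleting it and rerunning 'git index-pack' is a
reasonable recovery procedure.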

 > Thank you for providing input here; I think I have more information
 > on the topic; though I suspect my conclusions may still differ from
 > yours :).

Oh, I expect that; your goals are different.  I just think it's a
shame that VCS projects keep inventing new object stores that don't
outperform git, which is still using its original model with a couple
of obvious speedups (for typical Unixoid file systems, anyway).