History horizons: how hard can they be?

Mon Nov 16 16:10:44 GMT 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> 2009/11/16 Robert Collins <robertc at robertcollins.net>:
>> On Sun, 2009-11-15 at 18:54 +0100, Jelmer Vernooij wrote:
>>> As far as I can tell the only things required to support history
>>> horizons properly are:
>>>
>>>  * have a --horizon X option to 'bzr branch' that limits the revisions
>>> to fetch
>>>  * have a branch.conf option that limits the revisions to fetch
>>>  * support ghosts on the mainline well (there are some open bugs in this
>>> area)
>>>
>>> Am I missing anything important?
>> Yes, stopping fetch ever grabbing those revisions again: history
>> horizons are _not_ the same thing as 'just download a bit less'. They
>> are intended as a 'hard stop' on history - the history behind the
>> horizon is forever inaccessible.
> 
> I think this is just a confusion of terminology, and Jelmer is talking
> about what has previously been called 'shallow branches'.  In that
> case I think his list is pretty much correct - and personally I would
> leave out the second until I had some experience with it.
> 
> There is another issue that may come up though and that's whether you
> care about revnos being different between the shallow and non-shallow
> copies of the branch.  it might be nice to at least avoid people
> thinking they're confusable when they have what should otherwise be a
> mirror.  That may be too hard for dotted revnos or when the cutoff of
> the local branch is 'ragged'.
> 
> It would be very cool.

So if we're talking about shallow branches, I'll mention something.

I did prototype this a while back, when the Python project was looking
at us. The idea was that 'bzr co --lightweight' could equivalently grab
a shallow branch, and whatever content was downloaded to build a working
tree would then be saved in the local shallow branch.

The #1 problem was that now "readonly" operations could cause the local
repository to write data. So things that would call "branch.lock_read()"
would really mean:
  local_branch.lock_write(); master_branch.lock_read()

Also, since repositories now need a
"start_write_group()/commit_write_group()" pair around any write, it is
non-trivial to work out when those calls should trigger.
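
To make the locking problem concrete, here is a minimal sketch in
Python. The classes are toy stand-ins and not bzrlib code: ToyLockable
and ShallowBranch are hypothetical, and only the method names mirror the
calls mentioned above. Giving each fetch its own write group is an
assumption made for the sketch, not an answer to the question of when
the group should start.

```python
class ToyLockable:
    """Toy stand-in that only records which calls were made on it."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def _record(self, call):
        self.log.append((self.name, call))

    def lock_read(self):
        self._record("lock_read")

    def lock_write(self):
        self._record("lock_write")

    def unlock(self):
        self._record("unlock")

    def start_write_group(self):
        self._record("start_write_group")

    def commit_write_group(self):
        self._record("commit_write_group")


class ShallowBranch:
    """Hypothetical shallow branch that caches data fetched from a master."""

    def __init__(self, local, master):
        self.local = local
        self.master = master

    def lock_read(self):
        # A "read" may have to store fetched data locally, so the local
        # side takes a write lock while the master only needs a read lock.
        self.local.lock_write()
        self.master.lock_read()

    def unlock(self):
        self.master.unlock()
        self.local.unlock()

    def cache_fetched(self):
        # Assumption: one write group per fetch.
        self.local.start_write_group()
        self.local.commit_write_group()


def demo_locking():
    """Run one 'read-only' operation and return the recorded calls."""
    log = []
    branch = ShallowBranch(ToyLockable("local", log),
                           ToyLockable("master", log))
    branch.lock_read()
    try:
        branch.cache_fetched()
    finally:
        branch.unlock()
    return log
```

Calling demo_locking() shows that a caller who only asked for a read
lock ends up write-locking and writing to the local side.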

The #2 problem is our layering, which assumes that if the 'revision_id'
is in 'repo.revisions' then the inventory is in inventories and the
texts are available, etc.

Matthew Fuller mentions the idea of "wraiths"; about 3 years ago I
called them "skeletons". The idea is to have the full revision-id graph,
but to mark which revisions are not *actually* present and are there
only for stuff like merge graphs, etc.
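
A rough sketch of what such a graph could look like, in Python. The
SkeletonGraph class and its method names are hypothetical and do not
correspond to anything in bzrlib; the point is only that graph questions
can be answered for every revision id, while a separate set records
which revisions really have their inventory and texts stored.

```python
class SkeletonGraph:
    """Full revision-id graph that tracks which revisions are really stored."""

    def __init__(self):
        self.parents = {}     # revision id -> tuple of parent ids
        self.present = set()  # ids whose inventory and texts are stored

    def add(self, rev_id, parents, present=True):
        self.parents[rev_id] = tuple(parents)
        if present:
            self.present.add(rev_id)

    def is_skeleton(self, rev_id):
        # Known to the graph, but with no content behind it.
        return rev_id in self.parents and rev_id not in self.present

    def ancestry(self, rev_id):
        """Return rev_id plus everything reachable through parent links."""
        seen = set()
        pending = [rev_id]
        while pending:
            rev = pending.pop()
            if rev in seen:
                continue
            seen.add(rev)
            pending.extend(self.parents.get(rev, ()))
        return seen

    def common_ancestors(self, rev_a, rev_b):
        # Merge-style queries work even when the answer is a skeleton.
        return self.ancestry(rev_a) & self.ancestry(rev_b)


def example_graph():
    """Linear history A - B - C - D where only C and D hold content."""
    graph = SkeletonGraph()
    graph.add("A", [], present=False)
    graph.add("B", ["A"], present=False)
    graph.add("C", ["B"])
    graph.add("D", ["C"])
    return graph
```

Code that wants content would have to check is_skeleton() first, which
is exactly the layering assumption that no longer holds.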

There are quite a few unanswered questions, too. For instance, what
happens if you get a "leak"? Say you have a history horizon of:

A - B - C - D
        ^- horizon

and then you get

A - B - C - D - G
     \        /
      - E - F

Do you mark a new horizon? Is it just 'all ancestors of C'? And so on.
(This is the strict horizon case rather than the shallow branch case,
though.)
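
The leak in the diagram above can be spotted mechanically. The Python
sketch below assumes one particular reading, namely that C is the last
visible revision and that everything strictly behind it (A and B) is
hidden. The function names are made up for illustration, and whether
this is the right definition of the horizon is exactly the open
question.

```python
# Parent map for the second diagram: E forks from B, G merges D and F.
PARENTS = {
    "A": (),
    "B": ("A",),
    "C": ("B",),
    "D": ("C",),
    "E": ("B",),
    "F": ("E",),
    "G": ("D", "F"),
}


def ancestors(parents, rev_id):
    """Return rev_id plus everything reachable through parent links."""
    seen = set()
    pending = [rev_id]
    while pending:
        rev = pending.pop()
        if rev in seen:
            continue
        seen.add(rev)
        pending.extend(parents[rev])
    return seen


def horizon_crossings(parents, horizon, tip):
    """List (child, parent) edges that lead from visible history into hidden.

    Assumption: 'hidden' means the strict ancestors of the horizon revision.
    """
    hidden = ancestors(parents, horizon) - {horizon}
    visible = ancestors(parents, tip) - hidden
    return sorted(
        (child, parent)
        for child in visible
        for parent in parents[child]
        if parent in hidden
    )
```

With the tip at D the only crossing is the expected one at C. With the
tip at G there is a second crossing at E, which is the leak: E and F
are not ancestors of C, but they reach back behind the horizon through
B.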

Anyway, this has become too long already...

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksBeYQACgkQJdeBCYSNAAMkZQCeM/KQyZCxwO2jDZPJtCaTsRJb
rd0AnR9IQPGCm2udJWdRGt7BKg26wB/W
=ijv5
-----END PGP SIGNATURE-----