rich roots conversion

John Arbash Meinel john at arbash-meinel.com
Wed Apr 15 12:34:52 BST 2009



...
>> So it was mostly a feeling that the reason we were generating these
>> extra nodes seemed a bit bogus, and it means we carry that cruft around
>> for a long time.
> 
> The ghost handling works, or we couldn't convert bzr.dev :).
> 
> So - I'm not working on this, Andrew and Martin aren't - AFAIK Ian isn't
> - if you or Aaron aren't, should I ignore this for planning?
> 
> -Rob

The idea Aaron had was that if we modified the root revision at every
commit, then if you had 2 conversions, one with a different number of
ghosts, you would still end up with identical results.

While this is true, you have the same problem with every single file and
directory in the entire tree. We don't advocate setting their
last-modified revision to a new value at every revision.

The problem is that if you have the graph:

  A *
  |/
  B

* being a ghost

If you have any changes between A & B, you don't know whether they were
merged in from '*', or newly introduced in B. As we don't know, we just
set the value to B. You can extend this with someone else having:

    *
    |
  A C
  |/
  B

And now *they* can attribute changes to C, which the previous conversion
could not. So in this second conversion, we may have some ie.revision =
C, which isn't possible in the former conversion.
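
To make the two graphs above concrete, here is a rough sketch of the
attribution. It is a toy model, not bzrlib code: `last_modified`, the
(text, revision) tuples, and the "first matching parent wins" rule are
all made up for illustration, and the real heuristics are more involved.

```python
def last_modified(rev_id, text, parent_entries):
    """Pick the last-modified revision for one file in revision rev_id.

    parent_entries holds (text, last_modified_rev) pairs for the parents
    that are actually present. Ghost parents are simply missing from it.
    Hypothetical helper, for illustration only.
    """
    for parent_text, parent_rev in parent_entries:
        if parent_text == text:
            # Unchanged relative to a parent we can see: inherit its value.
            return parent_rev
    # No visible parent has this text, so all we can say is "changed here".
    return rev_id


entry_in_a = ("old text", "A")
entry_in_c = ("new text", "C")

# First conversion: the second parent of B is a ghost.
conv1 = last_modified("B", "new text", [entry_in_a])

# Second conversion: someone has C, so the change can be attributed to it.
conv2 = last_modified("B", "new text", [entry_in_a, entry_in_c])
```

Same revision B, same file text, but conv1 and conv2 disagree purely
because of which parents were available.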

The only way to get a conversion to handle all possible ghosts is to set
every last-modified revision to the current revision.

A better solution is to recognize that a conversion + ghosts can never
give deterministic values, and instead work on making it so that when
ghosts are filled, we update the appropriate fields.

This, unfortunately, means regenerating an arbitrary number of
inventories (consider that a file could have been modified in C, and
then never modified again for the next 100k revisions). CHK makes that a
lot cheaper, but it still potentially mutates a lot of inventories.

(An alternative solution is to turn last-modified into a 'non-official'
value, somehow splitting it out of the inventory, etc.)

Anyway, to implement... we basically just change the extra loop that
inserts rich-roots, so that it only inserts one when the root actually
changed.
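
Roughly, the changed loop would look something like the sketch below.
This is only a toy: `revisions_needing_root_text`, the
(rev_id, parent_ids, root_state) tuples, and the root ids are
hypothetical stand-ins, not the real conversion code, and it assumes the
revisions arrive in topological order.

```python
def revisions_needing_root_text(revisions):
    """Return the revision ids for which a root entry should be inserted.

    revisions is an iterable of (rev_id, parent_ids, root_state) in
    topological order, where root_state is any hashable description of
    the root (here a (file_id, name) tuple). Hypothetical helper.
    """
    seen = {}      # rev_id -> root_state for revisions processed so far
    needed = []
    for rev_id, parent_ids, root_state in revisions:
        # Ghost parents are never in `seen`, so they are skipped here.
        parent_states = [seen[p] for p in parent_ids if p in seen]
        if root_state not in parent_states:
            # Root differs from every visible parent (or there is none).
            needed.append(rev_id)
        seen[rev_id] = root_state
    return needed


history = [
    ("A", [], ("TREE_ROOT", "")),
    ("B", ["A", "ghost-rev"], ("TREE_ROOT", "")),
    ("C", ["B"], ("new-root-id", "")),
]
needed = revisions_needing_root_text(history)
```

Here only A (the first appearance of the root) and C (where the root id
changes) get a root entry; B does not, since its root matches A's.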

I can work on it, but I wanted to make sure everyone had signed off on
it, since obviously enough people had signed off on the current design.
IIRC Robert & Aaron were the primary ones who proposed the current layout. I
feel like Aaron begrudgingly accepted. AFAICT, he would *really* like to
have certain "committed" data which never mutates, but given that we
have different serialization formats, etc, we will never have exactly
the same 'inventory_sha1'. (We could make the field format specific,
etc, but we need a specific answer, not just forcing fields in the
current structure.)

John
=:->