iter_changes, delta consistency and revert

Robert Collins robertc at robertcollins.net
Wed Aug 5 00:07:15 BST 2009


On Tue, 2009-08-04 at 09:50 -0400, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Robert Collins wrote:
> > Now, you may be wondering why 'f->j' is in the status output. Its there
> > because without it the path of j/b, if applied to an inventory delta,
> > would be wrong.
> 
> I'm not sure what you mean by "applied to an inventory delta". I think
> you mean if an inventory delta was generated from it?

Yes I do. My patch changes iter_changes, so that naively generating an
inventory delta from iter_changes generates consistent deltas in the
source->target direction. 


>   If so, you have
> choice in how you generate paths.  Generating all the paths in an
> inventory delta should be pretty cheap in bulk, since it's just a matter
> of concatenating the Inventory.name attributes that you already have.

Performance of generating the paths isn't really the concern for me: its
having them be accurate and understandable.

Take commit, for instance. If we run 'bzr commit a' in this sample tree,
the 'f->j' rename should take place because:
 - a/b is user request [under our children-of expansion]
 - j/b is the path a/b ended up at, and f->j is required to make that
correct.

If we don't commit the f->j rename, then commit would be committing f/b,
and we would have to choose between:
 - report a/b -> f/b as a rename, even though f/b isn't in the wt
 - report a/b -> j/b even though j/b isn't in the committed revision 

> > This matter for dirstate and other path based storage
> > systems. It also matters for humans, if we say we commit j/b, but
> > actually commit a/b, thats really confusing - at best.
> 
> True, but it is recoverable.
> 
> > There is some friction here:
> >  - revert wants 'enough changes to put specified things back the way
> >    they were'
> >  - commit and merge want 'enough changes so that the output tree looks
> >    like the user expects and isn't corrupted'
> 
> Can you expand on that description of merge?  The TreeTransform code is
> very robust, and it will create missing parent directories as needed,
> for example.

As I understand it, merge does a three way merge across Base, This,
Other, and uses iter_changes to get the difference from Base->Other. So
if a user says 'I want to merge a/b', should they end up with f/b or
j/b.

> >  - status wants to 'show what commit will do'
> >  - I think revert is using iter_changes in a symmetrical way, but
> >    inventory deltas built on iter_changes are not symmetric, and the
> >    iter_changes work I've been doing is likewise not symmetric.
> >    Specifically tree.iter_changes(basis, specific=foo) != 
> >    basis.iter_changes(tree, specific=foo), where expansion for 
> >    consistency is concerned.
> 
> Please fix this.  iter_changes is supposed to be symmetric.

In trunk it is symmetric. As we're changing it though, we have the
ability to decide if we want to keep it symmetric. Keeping it symmetric
will lead to significant additional work - calculating whats required
for bidirectional consistent deltas - that we don't need.

The drivers for asymmetry are unlinked files - via reparenting or
deletes. File path correctness is able to be made symmetric quite
easily.

For unlinked files, they can be caused two ways - by applying a delta
(whether inventory delta or just interpreting the iter_changes stream
directly) where one of the changes refers to a parent_id that isn't
present in the target or included in the delta. Secondly they can be
caused by applying a delta where a parent id that is still used is
removed or stops being a container, and any of its prior children are
not also included in the delta.

Given trees A and B, the expansion I've added is that:
all parents[1] fields in B.iter_changes(A) have their parent included if
it also has changed.
all entries in the changes list which have their kind[0] == 'directory'
and their kind[1] != 'directory' have their children in A also included
if the child has changed (and it always has, by being reparented at a
minimum).

Making these symmetrical would just mean making those tests symmetric -
check all parents[0] and [1], and check children in B when kind[0] !=
'directory' and kind[1] == 'directory'.

Expansion wise this would make for even larger commits [though generally
in corner cases]. For instance, a file in the basis tree that becomes a
directory in the target would have its new children all included, even
if the user had requested to commit just one child.

e.g.
bzr init
bzr touch foo
bzr commit
rm foo
mkdir foo
bzr touch foo/bar
bzr touch foo/quux
bzr commit foo/bar

With my patch, as it stands, this will commit [foo, foo/bar]. With
symmetrical expansion, it will commit [foo, foo/bar, foo/quux].

We expand to prevent invalid deltas; I don't think we should expand more
than needed to prevent that - and because deltas aren't symmetric we
have an expectation difference between iter_changes and inventory
deltas. To fix the long standing bug where commit has its own changes
generator we need to reconcile the differences here.

> > I have thought of of a few options:
> >  - inspect the iter_changes output inside revert, and discard things
> >    that are not needed.
> 
> As far as I'm concerned, revert is using iter_changes exactly the way it
> was meant to be used.  It may be that inventory delta generation needs
> something similar to, but not the same as, iter_changes.  Perhaps both
> could be built on the same primitive.

I could certainly rename the function called iter_changes in my branch
to _iter_changes, add enough data to permit stripping out the expansion
it does, and then have a trivial iter_changes that strips out the
expanded data. 

However, I think our original definition of iter_changes, while pretty
good, has permitted folk writing code on top of it to run with scissors.
Commit based on iter_changes is very fast, but we only just caught the
fact that there were major corruptions possible. While I do appreciate
an argument that we do need API's that allow users to cut themselves, it
doesn't make sense to me here - its non obvious to someone using
iter_changes, or inventory_deltas, that things can be wrong.

> >  - make consistent expansion an optional flag to iter_changes and have
> >    revert use that.
> > 
> > However, we've had a number of bugs (going from memory) to do with
> > revert and dirstate; I'm not at all sure that this is a coincidence -
> > the very bug I'm fixing, to ensure consistent deltas, may well prevent
> > those bugs in revert.
> 
> Revert is built on the TreeTransform code, and therefore cannot produce
> an inconsistent transform at a logical level.  Whatever bugs there are
> in the way TreeTransform generates inventory deltas need to be fixed
> once, and should stay fixed forever.

I don't think TT was generating bad deltas so much as starting with too
small a set of changes. Perhaps I misunderstand where the logic is
intended to flow from in these cases.

> > I'd rather not have more flags to iter_changes, I think being
> > conservative and expanding as needed is the safest option. I think
> > revert is well placed as a special-case
> 
> I don't.  In my thinking, I rely on the fact that iter_changes is
> symmetric, and it comes as a shock that this has changed.

It hasn't landed on trunk.

> Having iter_changes emit things that haven't changed is bad enough.
> Having it emit things that *have changed* but should be ignored is very bad.
>
> I think we're running into trouble because we're changing the original
> contract of iter_changes into something much more complex.  I think we
> should re-examine whether iter_changes is actually the issue here.
> Maybe we need a new API that gives the data inventory deltas need, instead.

So, the problem I see is that iter_changes doesn't provide any support
for intent - it doesn't know if you want to make a delta with the
output, or if you want to undo parts of the output / all the output.

I'm not aware of any code users of iter_changes other than revert that
will be unhappy with the proposed change - are you aware of some others
I can go look at?

-Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090805/5194dcb3/attachment-0001.pgp 


More information about the bazaar mailing list