_iter_changes API changes...

Sun Feb 25 23:28:09 GMT 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
>     I'm preparing to do a rather large commit to dirstate altering
> _iter_changes, and I'd like to run the conceptual differences past you
> to ensure I'm not doing something silly.
> 
> First I've folded the paths2ids call into _iter_changes, because: all
> users (I tracked em down) of _iter_changes were starting with user
> supplied pathnames, and folding the call into the iterator allows
> specialisation by dirstate, for extra tasty low overhead.

I'm not so keen on *only* allowing paths to be supplied, but if either
paths or file_ids could be supplied, that would be fine with me.  No
current callers use a different algorithm for translating user-supplied
paths into file ids, but commit does: there, supplying a pathname
selects both the children of the path and its ancestors.

That is, "foo/bar/baz" would select not only "foo/bar/baz/qux", but also
"foo" and "foo/bar".  So if we want to use iter_changes in commit, this
would be a serious drawback.
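
To illustrate the selection rule described above, here is a minimal
sketch in Python.  The function name select_for_commit and the flat list
of paths are hypothetical, made up for this example; this is not bzrlib
code, only one way the "children plus ancestors" rule could be expressed.

```python
def select_for_commit(all_paths, specified):
    """Return the paths selected when `specified` is named, commit-style.

    Hypothetical helper: selects the path itself, everything beneath it,
    and each of its ancestor directories.
    """
    specified = specified.strip("/")
    parts = specified.split("/")
    # For "foo/bar/baz" the ancestors are "foo" and "foo/bar".
    ancestors = {"/".join(parts[:i]) for i in range(1, len(parts))}
    selected = []
    for path in all_paths:
        if (path == specified
                or path.startswith(specified + "/")
                or path in ancestors):
            selected.append(path)
    return selected


paths = ["foo", "foo/bar", "foo/bar/baz", "foo/bar/baz/qux",
         "foo/other", "zed"]
print(select_for_commit(paths, "foo/bar/baz"))
```

Note that "foo/other" is not selected: only the direct ancestors are
pulled in, not their other children.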

The other issue is that the ids_across_trees algorithm seems very much a
UI thing.  I think our low-level interfaces should support more
precision than that.

> Second, the current API returns adds/changes before deletes, but with
> dirstate we get those in-order as we walk.

I don't consider "adds/changes before deletes" an important guarantee.

>  The fold-in of paths2ids
> also means that we can get some entries out of order: if we start with a
> user path of 'subdir', which includes in the source tree another dir
> called 'moved-dir' which was moved to '/moveddir', the entries for
> '/moveddir' should be output before those for subdir if we require
> sorted order. So I've changed the definition slightly to say that it
> *may* do this. I figure it's cheaper to sort a list of (say) 100 changed
> entries in the output layer, than to have to do two passes to identify
> all entries in the [large] memory structure and then diff them.

I believe revert expects parents to be emitted before children, but most
code won't care, and we can fix revert pretty cheaply.  So I think this
is fine.
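
As a rough sketch of how a caller that needs parents before children
could restore that order itself: the (path, status) tuples and the
parents_first name below are assumptions for illustration, not the real
_iter_changes output format.

```python
def parents_first(changes):
    """Sort (path, status) tuples so each directory precedes its contents.

    Splitting on "/" and comparing the component lists puts "subdir"
    ahead of "subdir/file", because a list sorts before any longer list
    that it is a prefix of.
    """
    return sorted(changes, key=lambda change: change[0].strip("/").split("/"))


changes = [
    ("subdir/file", "modified"),
    ("moveddir/a", "renamed"),
    ("subdir", "modified"),
    ("moveddir", "renamed"),
]
for path, status in parents_first(changes):
    print(path, status)
```

Sorting a short list of changed entries this way is cheap compared with
a second pass over the whole tree.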

> Thirdly, something I have not done yet is to expose unversioned files in
> the tree delta output, which will let tree delta clients avoid a
> tree-scan to find unknowns - they will need to filter the unversioned
> list to determine what is unknown, but that's relatively cheap.

Dealing with unversioned files in a delta interface is kinda wacky,
because you don't really care about the delta of unversioned files.

Say a working tree 'foo' has unknown file 'foo/bar'.  If you compare
that against the basis, or any revision tree, 'foo/bar' will always be
emitted, because revision trees cannot contain unversioned files.  But
if you compare 'foo' against itself, a true delta interface would not
emit 'foo/bar', because 'foo' is exactly the same as itself.

But I understand the performance improvement of avoiding a tree scan, so
we should probably do as you're suggesting, but make it clear that
unversioned files will be emitted even if they are unchanged.
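
The behaviour agreed on here can be sketched with a toy delta function.
Everything in it (toy_delta, the dict-based trees, the status strings) is
hypothetical and only meant to show unversioned files being reported even
when nothing has changed.

```python
def toy_delta(source, target, target_unversioned=()):
    """Yield (path, status) pairs for two {path: content} mappings.

    Versioned paths are reported only when they differ.  Unversioned
    paths in the target are always reported, changed or not.
    """
    for path in sorted(set(source) | set(target)):
        if path not in source:
            yield path, "added"
        elif path not in target:
            yield path, "removed"
        elif source[path] != target[path]:
            yield path, "modified"
    # Reported unconditionally, so the caller can skip a separate scan.
    for path in sorted(target_unversioned):
        yield path, "unversioned"


working = {"foo/a": "1"}
unversioned = ["foo/bar"]
# Comparing the tree with itself still reports the unversioned file.
print(list(toy_delta(working, working, unversioned)))
```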

> Lastly, I have not yet, but I plan to today, changed the path output
> field to (oldpath, newpath), to eliminate all id lookups.

That would be fine.  I was trying to avoid unnecessary old_path lookups,
but as it turns out, almost all clients need the old paths.
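
A small sketch of why carrying both paths is convenient for clients: the
record layout (file_id, (old_path, new_path)) and the describe function
are assumed for the example and may not match the final output format.

```python
def describe(change):
    """Summarise a change using only the paths it carries.

    `change` is a hypothetical (file_id, (old_path, new_path)) record;
    None stands for "not present in that tree".  No id-to-path lookup
    is needed to tell adds, removes and renames apart.
    """
    file_id, (old_path, new_path) = change
    if old_path is None:
        return "added %s" % new_path
    if new_path is None:
        return "removed %s" % old_path
    if old_path != new_path:
        return "renamed %s => %s" % (old_path, new_path)
    return "modified %s" % new_path


print(describe(("id-1", ("subdir/moved-dir", "moveddir"))))
print(describe(("id-2", (None, "newfile"))))
```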

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF4huJ0F+nu1YWqI0RAsvOAJ9HkJhaS0pz0xOgMMhFxk9EJ+q7kwCfdFvY
HffhIm/oMf7lTCLMVdKeR+U=
=bi7Y
-----END PGP SIGNATURE-----