_iter_changes API changes...
Aaron Bentley
aaron.bentley at utoronto.ca
Sun Feb 25 23:28:09 GMT 2007
Robert Collins wrote:
> I'm preparing to do a rather large commit to dirstate altering
> _iter_changes, and I'd like to run the conceptual differences past you
> to ensure I'm not doing something silly.
>
> First I've folded the paths2ids call into _iter_changes, because: all
> users (I tracked em down) of _iter_changes were starting with user
> supplied pathnames, and folding the call into the iterator allows
> specialisation by dirstate, for extra tasty low overhead.
I'm not so keen on *only* allowing paths to be supplied, but if either
paths or file_ids could be supplied, that would be fine with me. While
no callers of _iter_changes currently use a different algorithm for
translating user-supplied paths into file ids, commit does: there,
supplying a pathname causes both the children of the path and its
ancestors to be selected.
That is, "foo/bar/baz" would select not only "foo/bar/baz/qux", but also
"foo" and "foo/bar". So if we want to use iter_changes in commit, this
would be a serious drawback.
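
To make the commit-style rule concrete, here is a rough sketch of it. It
assumes paths are plain '/'-separated strings, and the helper name
commit_style_selection is hypothetical; it is not bzrlib's actual API or
implementation.

```python
def commit_style_selection(all_paths, specified):
    """Return the paths selected when `specified` paths are supplied.

    Hypothetical helper: a path is selected if it is one of the
    specified paths, a descendant of one, or an ancestor of one.
    """
    selected = set()
    for path in all_paths:
        for spec in specified:
            is_same = path == spec
            is_child = path.startswith(spec + "/")
            is_ancestor = spec.startswith(path + "/")
            if is_same or is_child or is_ancestor:
                selected.add(path)
                break
    return selected


tree_paths = [
    "foo",
    "foo/bar",
    "foo/bar/baz",
    "foo/bar/baz/qux",
    "foo/other",
    "zed",
]
picked = commit_style_selection(tree_paths, ["foo/bar/baz"])
```

Here picked contains "foo", "foo/bar", "foo/bar/baz" and
"foo/bar/baz/qux", but neither "foo/other" nor "zed".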
The other issue is that the ids_across_trees algorithm seems very much a
UI thing. I think our low-level interfaces should support more
precision than that.
> Second, the current API returns adds/changes before deletes, but with
> dirstate we get those in-order as we walk.
I don't consider "adds/changes before deletes" an important guarantee.
> The fold-in of paths2ids
> also means that we can get some entries out of order: if we start with a
> user path of 'subdir', which includes in the source tree another dir
> called 'moved-dir' which was moved to '/moveddir', the entries for
> '/moveddir' should be output before those for subdir if we require
> sorted order. So I've changed the definition slightly to say that it
> *may* do this. I figure it's cheaper to sort a list of (say) 100 changed
> entries in the output layer, than to have to do two passes to identify
> all entries in the [large] memory structure and then diff them.
I believe revert expects parents to be emitted before children, but most
code won't care, and we can fix revert pretty cheaply. So I think this
is fine.
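
For a caller like revert that wants parents before children, sorting in
the output layer could look something like the sketch below. The
(file_id, path) tuple shape and the name parents_first are assumptions
made for illustration, not what _iter_changes actually yields.

```python
def parents_first(changes):
    """Sort (file_id, path) entries so a directory precedes its contents.

    Hypothetical helper: comparing lists of path components means that
    ["subdir"] sorts before ["subdir", "a"], so a parent always comes
    before any of its children.
    """
    return sorted(changes, key=lambda entry: entry[1].split("/"))


unordered = [
    ("c-id", "subdir/a/b"),
    ("a-id", "subdir"),
    ("m-id", "moveddir"),
    ("b-id", "subdir/a"),
]
ordered = parents_first(unordered)
ordered_paths = [path for _file_id, path in ordered]
```

Sorting a short list of changed entries this way is cheap compared with
a second pass over the whole tree.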
> Thirdly, something I have not done yet is to expose unversioned files in
> the tree delta output, which will let tree delta clients avoid a
> tree-scan to find unknowns - they will need to filter the unversioned
> list to determine what is unknown, but that's relatively cheap.
Dealing with unversioned files in a delta interface is kinda wacky,
because you don't really care about the delta of unversioned files.
Say a working tree 'foo' has unknown file 'foo/bar'. If you compare
that against the basis, or any revision tree, 'foo/bar' will always be
emitted, because revision trees cannot contain unversioned files. But if you
compare 'foo' against itself, a true delta interface would not emit
'foo/bar', because 'foo' is exactly the same as itself.
But I understand the performance improvement of avoiding a tree scan, so
we should probably do as you're suggesting, but make it clear that
unversioned files will be emitted even if they are unchanged.
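
A toy model of that behaviour, to show why it is not a true delta. The
dict-based tree representation, the want_unversioned flag and the
(file_id, path, versioned) result shape are all assumptions made for
this sketch, not the real interface.

```python
def toy_iter_changes(source, target, want_unversioned=False):
    """Yield (file_id, path, versioned) for a toy pair of trees.

    Each tree is a hypothetical dict of the form
    {"versioned": {file_id: path}, "unversioned": set_of_paths}.
    """
    src = source["versioned"]
    tgt = target["versioned"]
    for file_id in sorted(set(src) | set(tgt)):
        if src.get(file_id) != tgt.get(file_id):
            # Report the target path, or the source path for removals.
            yield file_id, tgt.get(file_id, src.get(file_id)), True
    if want_unversioned:
        # Unversioned files are reported unconditionally; they are
        # never compared against the source tree.
        for path in sorted(target.get("unversioned", ())):
            yield None, path, False


working = {"versioned": {"foo-id": "foo"}, "unversioned": {"foo/bar"}}
basis = {"versioned": {"foo-id": "foo"}, "unversioned": set()}

against_basis = list(toy_iter_changes(basis, working, want_unversioned=True))
against_self = list(toy_iter_changes(working, working, want_unversioned=True))
strict_self = list(toy_iter_changes(working, working))
```

Both against_basis and against_self report 'foo/bar', even though
comparing the working tree with itself has no real changes; only
strict_self, with the flag off, comes back empty.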
> Lastly, I have not yet, but I plan to today, changed the path output
> field to (oldpath, newpath), to eliminate all id lookups.
That would be fine. I was trying to avoid unnecessary old_path lookups,
but as it turns out, almost all clients need the old paths.
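
With both paths in hand, a client can classify an entry without going
back to the inventory. A small sketch, assuming None stands for "not
present in that tree"; the function name and the labels are
hypothetical.

```python
def describe_paths(old_path, new_path):
    """Classify a change from its (old_path, new_path) pair alone.

    Hypothetical helper: None means the file is absent from that tree.
    """
    if old_path is None and new_path is None:
        raise ValueError("entry is present in neither tree")
    if old_path is None:
        return "added"
    if new_path is None:
        return "removed"
    if old_path != new_path:
        return "renamed"
    return "same path"


labels = [
    describe_paths(None, "foo/new"),
    describe_paths("foo/old", None),
    describe_paths("subdir/moved-dir", "moveddir"),
    describe_paths("foo", "foo"),
]
```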
Aaron