[RFC] Tree.iter_changes
Aaron Bentley
aaron.bentley at utoronto.ca
Wed Sep 20 23:33:29 BST 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> Aaron Bentley wrote:
>
>>John Arbash Meinel wrote:
>>
>>So from my POV, an interface that only supports diffing against parents
>>is too limited.
>
>
> I realize it is limited, but it is a common case that our data storage
> model (both dirstate and our current .knit format) heavily optimize.
I agree that it's valuable to optimize this case, but I think the
proposed iter_changes can support that.
>>>>For example, dirstate lets us iterate through the whole inventory, and
>>>>compare against all parents as we go.
>>
>>AFAIK, commit is the only command that can make use of that functionality.
>
> Sure, but it would be helpful for 'bzr status' and 'bzr diff' to not
> have to create a complete inventory, just for the 3 files that have changed.
That's not what I meant. I meant that most commands only compare two
trees, and so comparing against more than one parent efficiently doesn't
help them.
I agree that comparing against one parent efficiently is a win for other
commands like diff, but I think we can support maximum efficiency doing
that for iter_changes, as it stands.
>>The patch implements the functionality as an InterTree method. So if
>>the basis tree is a special type, e.g. a BasisTree or ParentTree, then
>>we can implement an optimizer for that case, to use dirstate in the most
>>efficient way possible.
> I like the approach. But I think we need a way to get at this stuff
> without having to build the inventory.
I don't understand. Why would that require building the inventory?
By default, it only emits information about modified files, and it does
not emit inventory entries, just tuples.
> It is also useful for even stuff like 'log -v', which wants the changed
> file list for each revision.
Well, there's always the file_ids_altered_by_revision_ids hack, but
doing it right for a comparison interface requires another inventory
format, I think.
> As an example of how much we could benefit from it...
>
> Consider a tree with 10,000 files. In any given commit, you probably
> only change < 10 of them. With your method, you would need to extract
> the basis tree (possibly by doing a bunch of patch applications to the
> inventory texts),
The inputs are trees, but I don't think it's necessary to generate the
inventories.
> and then reading the full text, and creating 10,000
> InventoryEntry objects at 3us per python object.
This method certainly doesn't need InventoryEntry objects. It does emit
data such as name and executability, but those are essential to the task.
> And contrast that to the time it takes to just extract the 10 single
> line changes from the .knit file, and print them out.
I think this API can accomodate that kind of optimization.
> This is some of the motivation of dirstate. To create tuples, and
> process as much as possible in tuple form, before we get into Inventory
> objects.
Yes, I already understood that object creation can be expensive, which
is why iter_changes outputs only tuples.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFFEcG50F+nu1YWqI0RAqe2AJ4ujXukxRwvwJVZIUjnmt6vbqW6pQCdE0Od
zXmNlOahfoXvlmTCtmUdDOo=
=V0pH
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list