[RFC] Tree.iter_changes

John Arbash Meinel john at arbash-meinel.com
Wed Sep 20 21:14:54 BST 2006


Aaron Bentley wrote:
> John Arbash Meinel wrote:
> 
>>> I haven't looked over the complete patch, but from my understanding, I
>>> think it is still a little bit too low level.
>>>
>>> Specifically, dirstate is optimized for the case of 'compare current
>>> versus parents'. And I think we also have a similar possibility in our
>>> storage model. We want to be able to produce changes of this versus a
>>> parent, without having to create a complete inventory.
> 
> That aspect of dirstate is nice, but covering only that case is
> inadequate.  Most commands that compare against basis can also compare
> against any arbitrary revision.  (e.g. revert, status).
> 
> And some, (e.g. diff, merge --uncommitted) compare a working tree
> against another working tree.
> 
> So from my POV, an interface that only supports diffing against parents
> is too limited.

I realize it is limited, but it is a common case that our data storage
model (both dirstate and our current .knit format) heavily optimize.
Meaning we can get the answer very cheaply. It may be a single function
that might get special cased by a more generic function. But I think it
is both a common case, and easy to optimize with our current storage model.

> 
>>> For example, dirstate lets us iterate through the whole inventory, and
>>> compare against all parents as we go.
> 
> AFAIK, commit is the only command that can make use of that functionality.
>

Sure, but it would be helpful for 'bzr status' and 'bzr diff' to not
have to create a complete inventory, just for the 3 files that have changed.


>>> I wonder if it wouldn't be worthwhile to have an 'iter_changes' that
>>> only worked against the leftmost parent. Because I think it is
>>> frequently needed, and our storage models make it reasonably cheap to
>>> compute. (We could extract it without having to extract the complete
>>> inventories, and diffing them)
> 
> The patch implements the functionality as an InterTree method.  So if
> the basis tree is a special type, e.g. a BasisTree or ParentTree, then
> we can implement an optimizer for that case, to use dirstate in the most
> efficient way possible.
> 
> Aaron

I like the approach. But I think we need a way to get at this stuff
without having to build the inventory. It may be that we can do it with
lazy RevisionTree objects. But really, if we want to scale to large
trees, we need to avoid creating and diffing complete inventories when
possible.

It is also useful for even stuff like 'log -v', which wants the changed
file list for each revision.

As an example of how much we could benefit from it...

Consider a tree with 10,000 files. In any given commit, you probably
only change < 10 of them. With your method, you would need to extract
the basis tree (possibly by doing a bunch of patch applications to the
inventory texts), and then reading the full text, and creating 10,000
InventoryEntry objects at 3us per python object. Meaning it will take at
least 30ms to just get the basis inventory.

And contrast that to the time it takes to just extract the 10 single
line changes from the .knit file, and print them out.

Now, admittedly our storage format isn't quite up to snuff for doing
inventory changes, because we don't record deletes properly. And renames
may do weird things.

Just to show the difference in creating times:
liliana % python -m timeit -s "class foo(object):
dquote>   def __init__(self, x, y, z, d):
dquote>     self.x = x
dquote>     self.y = y
dquote>     self.z = z
dquote>     self.d = d
dquote> " "x = foo(1,2,3,'4')"
100000 loops, best of 3: 3.03 usec per loop

liliana % python -m timeit "x = (1,2,3,'4')"
10000000 loops, best of 3: 0.102 usec per loop

liliana % python -m timeit "x = [1,2,3,'4']"
1000000 loops, best of 3: 0.434 usec per loop

So it takes 3us to create a single object, versus 0.1us to create a
tuple, and 0.4us to create a list.

This is some of the motivation of dirstate. To create tuples, and
process as much as possible in tuple form, before we get into Inventory
objects.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060920/5fe981df/attachment.pgp 


More information about the bazaar mailing list