loggerhead takes a while to fetch revisions
Robey Pointer
robey at lag.net
Mon Jan 15 21:42:19 GMT 2007
On 4 Jan 2007, at 12:50, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robey Pointer wrote:
> ...
>
>> One thing that would help also is including the two RevisionTree objects
>> in a Delta. Would that break any abstractions? The reason it would
>> help is: When I fetch a pile of deltas, the repository collects a
>> temporary cache of RevisionTrees to build them up, but doesn't save
>> them. If I want to compute 'diffs' for these deltas, I have to go fetch
>> these same RevisionTrees all over again.
>
> I think we could return them with an alternative api.
>
> The original point (IMO) of having something like get_revision_delta()
> was so that you could create one without having to actually create 2
> full Revision Trees. Theoretically most of the info for the delta is
> already stored in the files. Which would be a lot better than creating 2
> full revision trees, and then finding the delta between them.
>
> However, I see your point, and it is reasonable to have a better
> api for it.
>
> Something like:
>
> def get_deltas_for_revisions_with_trees(self, revisions):
[...]
> Basically, just a copy of the old code, and changing the yield
> statement.
>
> Actually, everything in there is a public api, so you could write the
> same thing as a helper function.
So I did just that. :)
I'm not sure it's much faster, but it's obviously less redundant, so
I'm going to keep it.
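
A minimal sketch of what such a helper could look like, for anyone following along. It is not the code loggerhead actually uses. The helper name get_deltas_with_trees and the three Fake* classes are hypothetical stand-ins so the example runs without bzrlib; the only repository and tree methods it relies on (revision_trees, revision_tree, get_revision_id, changes_from) are assumed to behave as the quoted discussion implies.

```python
def get_deltas_with_trees(repository, revisions):
    """Yield (old_tree, new_tree, delta) per revision, fetching trees in bulk."""
    # Every revision needs its own tree plus the tree of its first parent.
    required = set()
    for revision in revisions:
        required.add(revision.revision_id)
        required.update(revision.parent_ids[:1])
    # One bulk fetch; keep the trees, keyed by revision id, instead of
    # throwing them away once the deltas are built.
    trees = dict((tree.get_revision_id(), tree)
                 for tree in repository.revision_trees(required))
    for revision in revisions:
        if revision.parent_ids:
            old_tree = trees[revision.parent_ids[0]]
        else:
            # No parent: compare against the empty tree.
            old_tree = repository.revision_tree(None)
        new_tree = trees[revision.revision_id]
        yield old_tree, new_tree, new_tree.changes_from(old_tree)


# --- Hypothetical stand-ins, only here to make the sketch runnable ---

class FakeRevision(object):
    def __init__(self, revision_id, parent_ids):
        self.revision_id = revision_id
        self.parent_ids = parent_ids


class FakeTree(object):
    def __init__(self, revision_id, files):
        self._revision_id = revision_id
        self.files = set(files)

    def get_revision_id(self):
        return self._revision_id

    def changes_from(self, other):
        # Stand-in "delta": names present in exactly one of the two trees.
        return sorted(self.files ^ other.files)


class FakeRepository(object):
    def __init__(self, contents):
        self._contents = contents
        self.trees_built = 0  # counts how many trees were constructed

    def revision_tree(self, revision_id):
        self.trees_built += 1
        return FakeTree(revision_id, self._contents.get(revision_id, ()))

    def revision_trees(self, revision_ids):
        return [self.revision_tree(r) for r in revision_ids]


repo = FakeRepository({"r1": ["a"], "r2": ["a", "b"]})
revs = [FakeRevision("r1", []), FakeRevision("r2", ["r1"])]
results = list(get_deltas_with_trees(repo, revs))
```

Because the trees come back alongside each delta, a caller that later wants file-level diffs can reuse old_tree and new_tree rather than asking the repository to rebuild them.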
>> I took a new lsprof snapshot, this time of get_changes() fetching 100
>> revisions, and posted it here:
>>
>> http://www.lag.net/~robey/code/get_change2.html
>>
>> The one thing I notice right away is that 4 seconds out of 9 seem to be
>> spent in xml_serializer.py.
>>
>> robey
>
> This is probably exacerbated by having to create a RevisionTree a second
> time if you have already created 1. Does 'get_changes()' actually
> compute the file-level diffs for everything? Or is it just the
> inventory-level diffs?
It's only the inventory-level diffs in the common case: just the list
of files touched, etc.
> I'm guessing it is just inventory-level, since I don't see any other
> diffs going on in the lsprof results.
>
> One thing to note, lsprof does penalize xml_serializer a little bit more
> than other functions. So while it is slow, it isn't quite as slow as
> lsprof says.
>
> You might try writing a helper for get_revision_deltas_with_trees(), and
> see if that helps at all.
Looking at the lsprof, I guess it did help by around 0.75 seconds. I
posted the new lsprof output here:
http://www.lag.net/robey/code/get_change3.html
This may not be an interesting thing to optimize, if it's not done
often outside of tools like loggerhead, but I thought I'd post the
lsprof in case anyone is curious.
robey