loggerhead takes a while to fetch revisions

Robey Pointer robey at lag.net
Mon Jan 15 21:42:19 GMT 2007


On 4 Jan 2007, at 12:50, John Arbash Meinel wrote:

> Robey Pointer wrote:
> ...
>
>> One thing that would help also is including the two RevisionTree objects
>> in a Delta.  Would that break any abstractions?  The reason it would
>> help is: When I fetch a pile of deltas, the repository collects a
>> temporary cache of RevisionTrees to build them up, but doesn't save
>> them.  If I want to compute 'diffs' for these deltas, I have to go
>> fetch these same RevisionTrees all over again.
>
> I think we could return them with an alternative api.
>
> The original point (IMO) of having something like get_revision_delta()
> was so that you could create one without having to actually create 2
> full Revision Trees. Theoretically most of the info for the delta is
> already stored in the files. Which would be a lot better than creating 2
> full revision trees, and then finding the delta between them.
>
> However, I see your point, and it is reasonable to have a better api
> for it.
>
> Something like:
>
> def get_deltas_for_revisions_with_trees(self, revisions):

[...]

> Basically, just a copy of the old code, and changing the yield statement.
>
> Actually, everything in there is a public api, so you could write the
> same thing as a helper function.

So I did just that. :)

I'm not sure it's much faster, but it's obviously less redundant, so  
I'm going to keep it.
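
For anyone following along, a helper of this kind could look roughly like
the sketch below. It is an illustration, not the code that went into
loggerhead: it assumes a repository object offering revision_trees() and
revision_tree(), revisions carrying revision_id and parent_ids, and trees
offering get_revision_id() and changes_from(), which is how I understand
the bzrlib API of the time. The (delta, new_tree, old_tree) tuple shape is
my own choice, and the Fake* classes are hypothetical stand-ins so the
sketch runs without bzrlib.

```python
def get_deltas_for_revisions_with_trees(repository, revisions):
    """Yield (delta, new_tree, old_tree) for each revision.

    Assumption: the tuple shape is illustrative only; the thread does not
    say what the real helper yields.
    """
    revisions = list(revisions)
    # Collect every tree we need: each revision plus its first parent.
    required = set()
    for revision in revisions:
        required.add(revision.revision_id)
        required.update(revision.parent_ids[:1])
    # Fetch all of them in a single batch and index them by revision id.
    trees = dict((tree.get_revision_id(), tree)
                 for tree in repository.revision_trees(required))
    for revision in revisions:
        if revision.parent_ids:
            old_tree = trees[revision.parent_ids[0]]
        else:
            # No parent: compare against the empty tree.
            old_tree = repository.revision_tree(None)
        new_tree = trees[revision.revision_id]
        # Hand the trees back with the delta so callers can reuse them.
        yield new_tree.changes_from(old_tree), new_tree, old_tree


# Hypothetical stand-ins, only here to make the sketch runnable.
class FakeRevision(object):
    def __init__(self, revision_id, parent_ids):
        self.revision_id = revision_id
        self.parent_ids = list(parent_ids)


class FakeTree(object):
    def __init__(self, revision_id, files):
        self.revision_id = revision_id
        self.files = frozenset(files)

    def get_revision_id(self):
        return self.revision_id

    def changes_from(self, other):
        # A toy "delta": just the file names added and removed.
        return {"added": sorted(self.files - other.files),
                "removed": sorted(other.files - self.files)}


class FakeRepository(object):
    def __init__(self, contents):
        self.contents = contents  # revision id -> iterable of file names
        self.fetches = 0          # counts how many trees were built

    def revision_tree(self, revision_id):
        self.fetches += 1
        return FakeTree(revision_id, self.contents.get(revision_id, ()))

    def revision_trees(self, revision_ids):
        return [self.revision_tree(rid) for rid in revision_ids]
```

The point of yielding the trees is that a caller who later wants
file-level diffs for a delta already holds both trees and does not have to
ask the repository for them a second time.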


>> I took a new lsprof snapshot, this time of get_changes() fetching 100
>> revisions, and posted it here:
>>
>>     http://www.lag.net/~robey/code/get_change2.html
>>
>> The one thing I notice right away is that 4 seconds out of 9 seem to be
>> spent in xml_serializer.py.
>>
>> robey
>
> This is probably exacerbated by having to create a RevisionTree a second
> time if you have already created 1. Does 'get_changes()' actually
> compute the file-level diffs for everything? Or is it just the
> inventory-level diffs?

It's only the inventory-level diffs in the common case: just the list  
of files touched, etc.


> I'm guessing it is just inventory-level, since I don't see any other
> diffs going on in the lsprof results.
>
> One thing to note, lsprof does penalize xml_serializer a little bit more
> than other functions. So while it is slow, it isn't quite as slow as
> lsprof says.
>
> You might try writing a helper for get_revision_deltas_with_trees(), and
> see if that helps at all.

Looking at the lsprof, I guess it did help by around 0.75 seconds.  I  
posted the new lsprof output here:

     http://www.lag.net/robey/code/get_change3.html

This may not be an interesting thing to optimize, if it's not done  
often outside of tools like loggerhead, but I thought I'd post the  
lsprof in case anyone is curious.

robey



