loggerhead takes a while to fetch revisions

Robey Pointer robey at lag.net
Mon Dec 18 04:33:22 GMT 2006

On 15 Dec 2006, at 8:17, John Arbash Meinel wrote:

> Hash: SHA1
> Robey Pointer wrote:
>> (See previous post about what loggerhead is.)
>> One thing I ran into when writing a web front-end was that the process
>> of fetching and processing a revision from a bazaar repository can take
>> a while.  I don't think I'm doing that much post-processing on a
>> revision (the relevant code is history.py:get_change) so I suspect
>> something is going on within bzrlib.
>> I added a caching system that saves the pickled data structure that I
>> create after fetching a revision and its delta (specifically: the files
>> that are involved), and it helps IMMENSELY.  With a full cache, almost
>> every operation can complete in <= 2 seconds, even annotating a file
>> like bzr.dev's NEWS file.  Without it, some operations take 2 or more
>> *minutes*.
>> Maybe this is normal behavior, but I thought it was worth
>> investigating.  I did an lsprof of the get_change() call and posted it here:
>>     http://www.lag.net/~robey/code/get_change.html
> Looking at the performance graph, I'm pretty sure the bulk of time is
> spent here:
> http://www.lag.net/~robey/code/get_change.html#_..._bzrlib_knit.py_+1062(__init__)
> That seems to be 654/937ms.  And of those 650ms, I'm guessing the bulk
> of it is spent reading the inventory.knit header. Considering:
> http://www.lag.net/~robey/code/get_change.html#_..._bzrlib_repository.py_+460(get_inventory_weave)
> get_inventory_weave() is taking 395ms.
> So there are a couple things...
> 1) If you can maintain a read lock in between actions, then I thought it
> keeps the inventory knit cached in memory, and doesn't have to parse it
> again. So while a lock won't help much for a single action, between
> actions it should help a lot.
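The behavior John describes can be sketched in miniature.  The class below is a toy stand-in (NOT real bzrlib): it "parses" its inventory lazily and keeps it cached only while a read lock is held, so holding one lock across several actions pays the parse cost once instead of once per action.

```python
# Toy model of "the inventory knit stays cached while a read lock is
# held".  Repository here is a stand-in, not the bzrlib class; in real
# code the calls would be repo.lock_read() / repo.unlock().

class Repository:
    def __init__(self):
        self._inventory_cache = None
        self.parse_count = 0  # how many times the expensive parse ran

    def lock_read(self):
        pass

    def unlock(self):
        self._inventory_cache = None  # cache only lives for the lock

    def _inventory(self):
        if self._inventory_cache is None:
            self.parse_count += 1  # expensive header parse happens here
            self._inventory_cache = {"rev-1": "tree-1", "rev-2": "tree-2"}
        return self._inventory_cache

    def get_revision(self, revid):
        return self._inventory()[revid]

# Lock per action: the cache is rebuilt every time.
repo = Repository()
for revid in ("rev-1", "rev-2"):
    repo.lock_read()
    try:
        repo.get_revision(revid)
    finally:
        repo.unlock()
print(repo.parse_count)  # parsed once per action: 2

# One lock across both actions: parsed once.
repo2 = Repository()
repo2.lock_read()
try:
    for revid in ("rev-1", "rev-2"):
        repo2.get_revision(revid)
finally:
    repo2.unlock()
print(repo2.parse_count)  # 1
```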

Grabbing the repository lock doesn't seem to help, but it led me in
the right direction.  Fetching revisions using get_revisions()
instead of N calls to get_revision() helps a LOT (from 70 revs/minute
to 460).

Hmm, and there's a get_deltas_for_revisions() method which bumps it
up to about 700/minute.  That's huge!
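Why batching wins can be shown with a stub that charges a fixed per-call setup cost, as the knit-index parsing effectively does (FakeRepo and its cost accounting are illustrative, not the bzrlib API; the method names match the ones mentioned above):

```python
# Toy model of batched vs. one-at-a-time revision fetching.  Each call
# pays one SETUP_COST (standing in for per-call index/parse overhead),
# so fetching 100 revisions in one get_revisions() call pays it once.

class FakeRepo:
    SETUP_COST = 1

    def __init__(self, revisions):
        self._revisions = revisions
        self.setup_cost_paid = 0

    def get_revision(self, revid):
        self.setup_cost_paid += self.SETUP_COST
        return self._revisions[revid]

    def get_revisions(self, revids):
        self.setup_cost_paid += self.SETUP_COST
        return [self._revisions[r] for r in revids]

repo = FakeRepo({"r%d" % i: "rev %d" % i for i in range(100)})
revids = ["r%d" % i for i in range(100)]

one_by_one = [repo.get_revision(r) for r in revids]  # 100 setups
cost_slow = repo.setup_cost_paid

repo.setup_cost_paid = 0
batched = repo.get_revisions(revids)                 # 1 setup
cost_fast = repo.setup_cost_paid

assert one_by_one == batched
print(cost_slow, cost_fast)  # 100 1
```

The loggerhead-side change amounts to replacing a per-revision loop with one batched call over the whole list of revision ids.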

With a couple of other tweaks, that knocks the load time for the  
revision-list down to around 6-7 seconds, which is almost good enough  
to do without the cache.  Not quite (I would prefer around 1-2  
seconds, max) but much, much better.  At least generating the cache  
will no longer take all night.

One other thing that would help is including the two RevisionTree
objects in a Delta.  Would that break any abstractions?  The reason
it would help: when I fetch a pile of deltas, the repository collects
a temporary cache of RevisionTrees to build them up, but doesn't save
them.  If I want to compute diffs for these deltas, I have to fetch
those same RevisionTrees all over again.
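Until something like that lands, the caller can keep its own tree cache so the trees used to build deltas are still around when the diffs are computed.  A minimal sketch (TreeCache and the fetch function are hypothetical, not bzrlib API):

```python
# Memoize tree fetches in the caller so the second pass (diffs) reuses
# the trees fetched in the first pass (deltas).

class TreeCache:
    def __init__(self, fetch_tree):
        self._fetch_tree = fetch_tree  # expensive: hits the repository
        self._trees = {}
        self.fetches = 0

    def get(self, revid):
        if revid not in self._trees:
            self.fetches += 1
            self._trees[revid] = self._fetch_tree(revid)
        return self._trees[revid]

cache = TreeCache(lambda revid: ("tree-for", revid))

# Pass 1: build deltas (needs old and new tree for each pair).
for old, new in [("r1", "r2"), ("r2", "r3")]:
    cache.get(old), cache.get(new)

# Pass 2: compute diffs for the same pairs -- no refetching.
for old, new in [("r1", "r2"), ("r2", "r3")]:
    cache.get(old), cache.get(new)

print(cache.fetches)  # r1, r2, r3 each fetched once: 3
```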

> 2) We need to work on getting _KnitIndex.__init__() faster anyway.
> Dmitry has a couple of optimizations he has proposed, and I have
> approved, but nobody else has looked at them.
> http://bundlebuggy.aaronbentley.com/request/%3C45700C04.6040706 at hlabs.spb.ru%3E
> In deference to Aaron, I want to ask that we avoid the spurious
> whitespace changes (changing whitespace just for cleanliness' sake,
> since it might make future merging conflict more).

I tried this patch, and it didn't appear to help (even after the
above improvements), so it may be a red herring.  But the patch also
didn't apply cleanly (at least a half-dozen conflicts), so I can't
honestly say I tried a functional version of it; my merged copy may
have been messed up.  A new, clean-applying version of the patch
would probably be worth trying.

> 3) We really want to switch so that revision_ids are just UTF-8 strings
> in memory, rather than having to encode/decode them as they are
> read/written. It means adding another encode() and decode() step in the
> XML serializers, because cElementTree wants to work in Unicode objects.
> So in the short term, it won't help a whole lot. But it will help while
> parsing the knit index files. And in the long run, we'll probably switch
> away from cElementTree, and then it will help a lot there too.

My gut says that cElementTree versus pickle is the root cause of the
remaining speed difference (which looks like another 10x).
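That hunch is easy to smoke-test with the standard library: round-trip the same toy record through XML and pickle and time the parse side.  (The record below is made-up and tiny; real inventories are far larger, so treat the ratio as suggestive only.)

```python
# Rough micro-benchmark: parsing the same small record from XML
# (ElementTree) vs. from a pickle.  Toy data, not bzrlib's formats.

import pickle
import timeit
import xml.etree.ElementTree as ET

record = {"revision_id": "rev-123",
          "committer": "someone@example.com",
          "message": "a commit message"}

xml_blob = ET.tostring(ET.Element("revision", record))
pickle_blob = pickle.dumps(record, protocol=2)

def parse_xml():
    return ET.fromstring(xml_blob).attrib

def parse_pickle():
    return pickle.loads(pickle_blob)

# Both round-trips recover the same data.
assert dict(parse_xml()) == parse_pickle() == record

xml_time = timeit.timeit(parse_xml, number=10000)
pickle_time = timeit.timeit(parse_pickle, number=10000)
print("xml: %.3fs  pickle: %.3fs" % (xml_time, pickle_time))
```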

> 4) We might also be able to greatly revise how the index files are
> parsed. Right now, we do a readlines() call and then iterate through
> the lines. It might actually be a lot faster to do:
> text = fp.read()
> recs = [line.split() for line in text.split('\n')
>                        if line.endswith(':')]
> for rec in recs:
>   parents = []
>   for value in rec[4:-1]:
>     if value.startswith('.'):
>       parents.append(value[1:]) # No decode step, right?
>     else:
>       parents.append(self._history[int(value)])

Obviously this means reading the entire index file into memory.  Are  
those generally small?

> Also, I honestly wonder if we are better off inlining the functions. I
> know --lsprof makes it look bad, but in my experience it overestimates
> the function call overhead.
> It might also be possible to switch it around and do something like
> several generators. So you have:
> def parse_parents(recs):
>   for rec in recs:
>     parents = []
>     for value in rec[4:-1]:
>       if value.startswith('.'):
>         parents.append(value[1:]) # No decode step, right?
>       else:
>         parents.append(self._history[int(value)])
>     yield parents
> recs = [line.split() for line in fp.read().split('\n')
>                       if line.endswith(':')]
> for rec, parents in izip(recs, parse_parents(recs)):
>   version_id = rec[0]
>   if version_id not in self._cache:
>     index = len(self._history)
>   else:
>     index = self._cache[version_id][5]
>   self._cache[version_id] = (version_id,
>                              rec[1].split(','),
>                              int(rec[2]),
>                              int(rec[3]),
>                              parents,
>                              index,
>                             )
> I don't really know if it is faster, but if the penalty is function
> call overhead, a generator avoids that overhead. I would actually guess
> that it is slower, because of the double loop overhead...
> Anyway, it would be nice to have the option to split
> _KnitIndex.__init__() into stuff that could be profiled a little bit
> better, to figure out where the time is actually being spent.
> It may not even be the encode/decode steps, since those seem to only
> account for 1000 calls, and about 8ms of the total time.

I took a new lsprof snapshot, this time of get_changes() fetching 100
revisions, and posted it here:


The one thing I notice right away is that 4 seconds out of 9 seem to
be spent in xml_serializer.py.


More information about the bazaar mailing list