loggerhead takes a while to fetch revisions
Robey Pointer
robey at lag.net
Mon Dec 18 04:33:22 GMT 2006
On 15 Dec 2006, at 8:17, John Arbash Meinel wrote:
> Robey Pointer wrote:
>> (See previous post about what loggerhead is.)
>>
>> One thing I ran into when writing a web front-end was that the
>> process of fetching and processing a revision from a bazaar
>> repository can take a while. I don't think I'm doing that much
>> post-processing on a revision (the relevant code is
>> history.py:get_change) so I suspect something is going on within
>> bzrlib.
>>
>> I added a caching system that saves the pickled data structure I
>> create after fetching a revision and its delta (specifically: the
>> files that are involved), and it helps IMMENSELY. With a full
>> cache, almost every operation can complete in <= 2 seconds, even
>> annotating a file like bzr.dev's NEWS file. Without it, some
>> operations take 2 or more *minutes*.
>>
>> Maybe this is normal behavior, but I thought it was worth
>> investigating. I did an lsprof of the get_change() call and posted
>> it here:
>>
>> http://www.lag.net/~robey/code/get_change.html
>
>
> Looking at the performance graph, I'm pretty sure the bulk of the
> time is spent here:
> http://www.lag.net/~robey/code/get_change.html#_..._bzrlib_knit.py_+1062(__init__)
>
> That seems to be 654/937ms. And of those 650ms, I'm guessing the
> bulk of it is spent reading the inventory.knit header. Considering:
>
> http://www.lag.net/~robey/code/get_change.html#_..._bzrlib_repository.py_+460(get_inventory_weave)
>
> get_inventory_weave() is taking 395ms.
>
> So there are a couple of things...
>
> 1) If you can maintain a read lock in between actions, then I think
> it keeps the inventory knit cached in memory and doesn't have to
> parse it again. So while a lock won't help much for a single
> action, between actions it should help a lot.
Grabbing the repository lock doesn't seem to help, but it led me in
the right direction. Fetching revisions using get_revisions() instead
of N calls to get_revision() helps a LOT (from 70 revs/minute to 460).
Hmm, and there's a get_deltas_for_revisions() method which bumps it
up to about 700/minute. That's huge!
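In case it helps anyone else, the batched pattern looks roughly like
this (just a sketch against the Repository API as I understand it;
the branch path and the loop body are placeholders):

    from itertools import izip
    from bzrlib.branch import Branch

    branch = Branch.open('/path/to/branch')          # placeholder path
    repo = branch.repository
    revision_ids = branch.revision_history()[-100:]  # last 100 revs

    repo.lock_read()   # hold one read lock across the whole batch
    try:
        # one call instead of 100 calls to get_revision()
        revisions = repo.get_revisions(revision_ids)
        # ...and one call for all the deltas (the ~700 revs/min case)
        deltas = repo.get_deltas_for_revisions(revisions)
        for rev, delta in izip(revisions, deltas):
            pass   # build the per-revision change entry here
    finally:
        repo.unlock()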
With a couple of other tweaks, that knocks the load time for the
revision-list down to around 6-7 seconds, which is almost good enough
to do without the cache. Not quite (I would prefer around 1-2
seconds, max) but much, much better. At least generating the cache
will no longer take all night.
One thing that would also help is including the two RevisionTree
objects in a Delta. Would that break any abstractions? The reason it
would help: when I fetch a pile of deltas, the repository collects a
temporary cache of RevisionTrees to build them up, but doesn't save
them. If I want to compute 'diffs' for these deltas, I have to go
fetch those same RevisionTrees all over again.
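To make the redundancy concrete, here's roughly what happens now (a
sketch; revision_tree() is the one-at-a-time call, and I'm assuming
single-parent revisions for brevity):

    # get_deltas_for_revisions() builds RevisionTrees internally in
    # order to compare each revision against its parent, and then
    # discards them.
    deltas = list(repo.get_deltas_for_revisions(revisions))

    # To render the actual text diffs, I have to rebuild the very
    # same trees a second time:
    for rev in revisions:
        new_tree = repo.revision_tree(rev.revision_id)
        old_tree = repo.revision_tree(rev.parent_ids[0])  # one parent
        # ... diff old_tree against new_tree for each changed file ...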
> 2) We need to work on getting _KnitIndex.__init__() faster anyway.
> Dmitry has proposed a couple of optimizations, which I have
> approved, but nobody else has looked at them:
> http://bundlebuggy.aaronbentley.com/request/%3C45700C04.6040706 at hlabs.spb.ru%3E
>
> In deference to Aaron, I want to ask that we avoid spurious
> whitespace changes (changing whitespace just for cleanliness' sake),
> since they might make future merges conflict more.
I tried this patch, and it didn't appear to help (even after the
above improvements), so it may be a red herring. But the patch also
didn't apply cleanly (at least a half-dozen conflicts), so I can't
honestly say that I tried a functional version of it; it may have
been mangled. A new, cleanly-applying version of the patch would
probably be worth trying.
> 3) We really want to switch so that revision_ids are just UTF-8
> strings in memory, rather than having to encode/decode them as they
> are read/written. It means adding another encode() and decode() step
> in the XML serializers, because cElementTree wants to work with
> Unicode objects. So in the short term it won't help a whole lot, but
> it will help while parsing the knit index files. And in the long run
> we'll probably switch away from cElementTree, and then it will help
> a lot there too.
My gut says that cElementTree versus pickle is the root cause of the
remaining speed difference (which looks like another 10x).
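That 10x guess is easy to sanity-check outside bzrlib with a toy
benchmark (a sketch using made-up stand-in data, not a real
inventory):

    import cPickle
    import time
    import cElementTree

    # Stand-in data: an XML document and the same records pickled.
    rows = [('file-%d' % i, 'name%d' % i, i) for i in xrange(1000)]
    xml_blob = '<inventory>%s</inventory>' % ''.join(
        '<entry file_id="%s" name="%s" size="%d"/>' % r for r in rows)
    pickle_blob = cPickle.dumps(rows, 2)

    def bench(fn, n=100):
        start = time.time()
        for _ in xrange(n):
            fn()
        return time.time() - start

    print 'cElementTree: %.3fs' % bench(
        lambda: cElementTree.fromstring(xml_blob))
    print 'cPickle:      %.3fs' % bench(
        lambda: cPickle.loads(pickle_blob))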
> 4) We might also be able to greatly revise how the index files are
> parsed. Right now, we do a readlines() call and then iterate
> through the lines. It might actually be a lot faster to do:
>
> text = fp.read()
> recs = [line.split() for line in text.split('\n')
>         if line.endswith(':')]
> for rec in recs:
>     parents = []
>     for value in rec[4:-1]:
>         if value.startswith('.'):
>             parents.append(value[1:])  # No decode step, right?
>         else:
>             parents.append(self._history[int(value)])
Obviously this means reading the entire index file into memory. Are
those generally small?
> Also, I honestly wonder if we are better off inlining the functions.
> I know --lsprof makes it look bad, but in my experience it
> overestimates the function call overhead.
>
> It might also be possible to switch it around and do something like
> several generators. So you have:
>
> def parse_parents(recs):
>     for rec in recs:
>         parents = []
>         for value in rec[4:-1]:
>             if value.startswith('.'):
>                 parents.append(value[1:])  # No decode step, right?
>             else:
>                 parents.append(self._history[int(value)])
>         yield parents
>
> recs = [line.split() for line in fp.read().split('\n')
>         if line.endswith(':')]
> for rec, parents in izip(recs, parse_parents(recs)):
>     version_id = rec[0]
>     if version_id not in self._cache:
>         index = len(self._history)
>     else:
>         index = self._cache[version_id][5]
>     self._cache[version_id] = (version_id,
>                                rec[1].split(','),
>                                int(rec[2]),
>                                int(rec[3]),
>                                parents,
>                                index,
>                                )
>
> I don't really know if it is faster, but if the penalty is function
> call overhead, a generator avoids that overhead. I would actually
> guess that it is slower, because of the double loop overhead...
>
> Anyway, it would be nice to have the option to split
> _KnitIndex.__init__() into pieces that could be profiled a little
> more precisely, to figure out where the time is actually being
> spent.
>
> It may not even be the encode/decode steps, since those seem to
> account for only 1000 calls, and about 8ms of the total time.
I took a new lsprof snapshot, this time of get_changes() fetching 100
revisions, and posted it here:
http://www.lag.net/~robey/code/get_change2.html
The one thing I notice right away is that 4 seconds out of 9 seem to
be spent in xml_serializer.py.
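(For the record, the snapshot was captured with bzrlib's lsprof
wrapper, along these lines; this is a sketch, assuming the
bzrlib.lsprof API in current bzr.dev, with revid_list standing in
for the 100 revision ids:)

    from bzrlib.lsprof import profile

    # run get_changes() under the profiler; returns the normal
    # result plus a Stats object
    changes, stats = profile(history.get_changes, revid_list)
    stats.sort()     # order by inline time
    stats.pprint()   # dump the report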
robey