slower performance of log and annotate in 1.6?

John Arbash Meinel john at arbash-meinel.com
Tue Aug 12 02:52:08 BST 2008



John Arbash Meinel wrote:
> Martin Pool wrote:
>> The benchmarks <http://benchmark.bazaar-vcs.org/usertest/log/usertest.log> running on Orcadas seem to show that log and annotate may be slower in trunk than they were in 1.5, which would be a concern for 1.6. It may just be measurement error on that machine, so there is no immediate need to panic, but if anyone either knows of a change that may have caused this or would like to try to reproduce it, that would be helpful.
> 
> 
>> dev/min	bzr.dev	bzr.1.3	bzr.1.5	Action  (times in seconds)
>> 1.23	49.437	44.208	40.159	script_common.LogTask:log
>> 1.84	77.094	41.851	48.409	script_common.LogTask:logOneFile
>> 1.20	8.787	8.733	7.332	script_common.AnnotateTask:annotate
> 
>> 1.20	7.627	7.837	6.356	script_common.TagsTask:tags
> 
> 
>> -- 
>> Martin <http://launchpad.net/~mbp/>
> 
> 
> Well, I notice that 'bzr annotate' takes about the same time in 1.3 and bzr.dev.
> 
> If I were guessing, I would say that we made access to the per-file graph
> slower than it used to be. This would impact logOneFile and annotate.
> But I'm purely speculating at this point.
> 
> I know that for 'annotate', Robert also changed it to annotate the working
> tree, which would be at least one more 'diff' operation, though that
> shouldn't account for 1.5s of time.
> 
> John
> =:->

I just did an --lsprof run against libmysqld/libmysqld.c for both 1.3 and 1.5.
In wall-clock time the difference was something like 33s versus 49s
(baseline 'bzr log' for the whole tree was 30s for 1.3).

Under lsprof things get *much* worse: the run goes from 1m24s to 3m27s.

Where the time goes changes dramatically, but I think that is because
Robert changed how we bisect pack indices.

In bzr-1.3 the most expensive calls according to --lsprof are:
     1869 24.7378 17.2688   bzrlib.index:861(_parse_lines)
 +1203501  4.5148  4.5148   +<method 'split' of 'str' objects>
  +634008  1.5348  1.5348   +<len>
  +883973  1.4194  1.4194   +<method 'append' of 'list' objects>
        5 38.8056 11.5699   bzrlib.index:287(_buffer_all)
       +5 21.3826 15.0642   +bzrlib.index:861(_parse_lines)
  +252489  5.2040  4.4091   +bzrlib.index:384(_resolve_references)
  2597163 62.8746 17.6399   bzrlib.index:430(_parsed_key_index)
 +2597163 45.2347 23.6188   +bzrlib.index:403(_find_index)
  5763935 12.3487 12.3487   <method 'append' of 'list' objects>
  2207371 42.7289 11.2234   bzrlib.index:417(_parsed_byte_index)
 +2207371 31.5055 17.2078   +bzrlib.index:403(_find_index)
    13234 16.5868  0.5536   bzrlib.bisect_multi:24(bisect_multi_bytes)


In bzr.dev, the most expensive calls were:

  4804534  76.7402 40.8266   bzrlib.index:403(_find_index)
 +4804534  21.7914 21.7914   +<_bisect.bisect_right>
 +4804534  14.1222 14.1222   +<len>
     6232 169.5549 38.2373   bzrlib.index:571(_lookup_keys_via_location)
 +2597163  62.8746 17.6399   +bzrlib.index:430(_parsed_key_index)
 +2203763  42.6633 11.2050   +bzrlib.index:417(_parsed_byte_index)
     6162 178.6479  5.5336   bzrlib.bisect_multi:24(bisect_multi_bytes)

So we have 2.6M calls to _parsed_key_index (under _lookup_keys_via_location), and 4.8M calls to _find_index.


The thing that stands out, IMO, is the _buffer_all call. We already know
that calling _buffer_all works a lot better when the majority of history
is accessed, especially under lsprof, which penalizes all of the
function calls in a bisect_*** action.
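
To make that trade-off concrete, here is a self-contained toy model
(illustration only -- the key counts are made up and this is not bzrlib's
index code; the point is the shape of the trade-off, not the exact numbers):

import bisect
import random
import time

N_KEYS = 250000
keys = sorted('rev-%08d' % i for i in range(N_KEYS))

def buffer_all_lookups(wanted):
    # _buffer_all style: pay a one-time cost to parse everything into a
    # dict, then every lookup (hit or miss) is a cheap dict probe.
    table = dict((k, i) for i, k in enumerate(keys))
    return [table.get(k) for k in wanted]

def bisect_lookups(wanted):
    # bisect style: every key pays its own O(log n) search, and under a
    # profiler every one of those calls is also charged tracing overhead.
    found = []
    for k in wanted:
        pos = bisect.bisect_left(keys, k)
        found.append(pos if pos < N_KEYS and keys[pos] == k else None)
    return found

# Ask for most of history (90% of the keys) and compare the two.
wanted = random.sample(keys, int(N_KEYS * 0.9))
t = time.time(); buffer_all_lookups(wanted); print('buffer_all: %.2fs' % (time.time() - t))
t = time.time(); bisect_lookups(wanted); print('bisect:     %.2fs' % (time.time() - t))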

The _buffer_all seems to be triggered by _filter_revisions_touching_file_id,
which was calling knit.versions(), which in turn called
_strip_prefix(iter_all_entries()).

iter_all_entries() triggers _buffer_all.

So in the end...

Performance was better in 1.3 because we didn't try to be lazy about
loading the whole index. In bzr.dev, everything is going through
_lookup_keys_via_location, which is slower when accessing most of history.

The new code does:

parent_map = dict(((key, value) for key, value in
    graph.iter_ancestry(mainline_revisions[1:]) if value is not None))

^- grab the complete revision graph for the whole ancestry of the branch.

text_keys = [(file_id, rev_id) for rev_id in sorted_rev_list]
modified_text_versions = branch.repository.texts.get_parent_map(text_keys)

^- Assume that this file might have been modified in *every* revision,
and then use a get_parent_map() call to find out which keys are actually
present.

The problem is that the second call probably has ~90% misses, and misses
are actually pretty expensive in pack indices.
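
A toy sketch of why that is (not the real pack-index code, just the shape
of the problem): a hit can stop at the first index that knows the key, but
a miss has to be bisected in *every* pack index before we can say it is
absent, so a workload that is mostly misses does close to the maximum
amount of searching.

import bisect

def lookup(pack_indices, key):
    # pack_indices: sorted key lists standing in for the per-pack indices;
    # returns ('hit'|'miss', number of indices that had to be searched).
    probes = 0
    for idx in pack_indices:
        probes += 1
        pos = bisect.bisect_left(idx, key)
        if pos < len(idx) and idx[pos] == key:
            return 'hit', probes         # can stop early
    return 'miss', probes                # had to search every index

# five fake pack indices, each holding a fifth of the keys
pack_indices = [sorted('rev-%04d' % i for i in range(n, 5000, 5))
                for n in range(5)]
print(lookup(pack_indices, 'rev-0123'))      # hit after a few probes
print(lookup(pack_indices, 'no-such-key'))   # miss: probed all five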

The only other way I see that we could do it would be to grab the tip
revision tree, grab the inventory, find the file's last-modified
revision, and then use graph.iter_ancestry([(file_id, ie.revision)]) to
grab only the keys that are present.
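
Roughly, I'm picturing something like this (just a sketch; other than
iter_ancestry() and ie.revision the names are from memory and I haven't
checked them against bzr.dev, so treat the details as assumptions):

from bzrlib.graph import Graph   # assumption: Graph will accept any object
                                 # with get_parent_map(), such as repo.texts

def per_file_parent_map(branch, file_id):
    # Find the file's last-modified revision from the tip inventory and use
    # that as the entry point, so we only ever ask about keys that actually
    # exist in the texts index.
    rev_tree = branch.repository.revision_tree(branch.last_revision())
    ie = rev_tree.inventory[file_id]
    graph = Graph(branch.repository.texts)
    return dict((key, parents)
                for key, parents in graph.iter_ancestry([(file_id, ie.revision)])
                if parents is not None)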

As I see it, the problem is that to find the per-file graph, you have to
indirect through the inventory, as we no longer have a way to say "give
me the graph of *this* file_id" because we don't have VersionedFile
(singular). It was bad to use "VF.versions()" because it had to load the
whole index, but we don't have an *entry-point* into the partial graph
anymore.

John
=:->