[RFC] Change to use 'merge_sort' for per-file-log
John Arbash Meinel
john at arbash-meinel.com
Mon Sep 22 22:58:14 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
...
>> I don't really like hacking around GraphIndex in log.py code, as it is likely
>> to be worse for BTreeIndex when we get around to them. However, it does show a
>> dramatic improvement, so I think we should consider doing *something*.
>
> Agreed. Given the above, making chunk_size a parameter to the routine in
> question and defaulting the value to 1000 looks a good step forward.
>
> Now for a review of the other bits ...
I'm not sure what "routine in question" you are mentioning.
"get_parent_map()" ? It is extending the get_parent_map api, so we would
need to update all of the other implementations to accept it. If we
*were* to do that, I would probably prefer to make a 'Hints' class and
pass it in that way.
The layers we have to go through is:
KnitVersionedFiles.get_parent_map()
KVF._get_parent_map_with_sources()
CombinedGraphIndex.get_parent_map()
CombinedGraphIndex.iter_entries()
GraphIndex.iter_entries()
Inside GI.iter_entries() is where we would like to know that this is a
clustered, likely-to-fail request, so we don't really want to trigger
'buffer_all'.
>
>> IMPROVEMENTS:
>>
>> + * The per-file log algorithm now uses a lighter-weight way of tracking
>> + when changes to a file are merged into another revision. The old
>> + method created a new set at every merge point, which was wasteful
>> + when the merge did not introduce new revisions.
>> + (John Arbash Meinel)
>> +
>
> This latest version has changed the output behaviour so that needs to be
> mentioned under the CHANGES section, not IMPROVEMENTS. I'm not sure the
> 2nd sentence is needed either - the first is a clear enough description of
> the change IMO.
>
Well, the output change is a CHANGES but there is a distinct improvement
in performance. (The old form would consume enough memory to start
swapping before the log would actually complete.)
Anyawy, I'll move it to CHANGES. Though I'll note that I really don't
think people understand what 'bzr log file' is really doing or how it is
changing and why. I know Robert is thinking to change his vote over it,
because he thinks it only displays the entries in the per-file graph.
I'll also note that if you have 2 branches that have merged eachother
(such that they have 99% the same revisions), but a different mainline.
Then "bzr log --show-ids FILE" can give a different set of revision ids
in each branch. (The existing form at least gives all the same
revisions. And one that showed *only* the per-file graph would also show
the same values.)
For most people, those extra revisions are "noise" and make it harder to
understand (look at 'bzr log bzr.dev/bzr' and try to understand why you
see so many 'Merged bzr.dev' entries.)
>> def _filter_revisions_touching_file_id(branch, file_id, mainline_revisions,
>> - view_revs_iter):
>> + view_revs_iter, direction):
>> """Return the list of revision ids which touch a given file id.
>>
>> The function filters view_revisions and returns a subset.
>
> The docstring needs updating. It refers to view_revisions when the param
> is now called view_revs_iter and the direction param needs a :param: bit.
> I also like the example you gave in the email more so perhaps you could
> use it in place of the one given. The existing text needs a grammar fix
> as well ...
>
> s/will also can be/will also be/
>
I actually switched the code back to using "view_revisions" as it
assumes it is a list, and not an iterable (it does multiple passes over
the list). I also got rid of "mainline_revisions" because it is
no-longer used. And if I'm going to do an incompatible change by
bringing in 'direction', I may as well get rid of old cruft.
>> + assert depth <= len(current_merge_stack)
>> + if depth == len(current_merge_stack):
>> + current_merge_stack.append(info)
>> else:
>
> We don't allow asserts any more. Please extend the following if/else block
> to produce an AssertionError if you want to keep the check.
Yeah, 'test_source' caught that for me.
>
> The changes also break the test suite so I'm voting resubmit instead of
> tweak. Making the direction parameter optional may be enough to fix that
> though?
>
> bb:resubmit
>
> Ian C.
>
I fixed it up in my local branch, but I'll resubmit it for you anyway.
(though this is technically not even tracked in BB :).
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkjYFPYACgkQJdeBCYSNAAOp1ACgy5mh/x9vkGBgoi/L0cE5BGMB
BUwAniZgdN07Jgb6x5jrCzlGky2HCzaY
=sAj4
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list