[RFC] Change to use 'merge_sort' for per-file-log

John Arbash Meinel john at arbash-meinel.com
Mon Sep 22 22:58:14 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


...

>> I don't really like hacking around GraphIndex in log.py code, as it is likely
>> to be worse for BTreeIndex when we get around to them. However, it does show a
>> dramatic improvement, so I think we should consider doing *something*.
> 
> Agreed. Given the above, making chunk_size a parameter to the routine in
> question and defaulting the value to 1000 looks a good step forward.
> 
> Now for a review of the other bits ...

I'm not sure what "routine in question" you are mentioning.
"get_parent_map()" ? It is extending the get_parent_map api, so we would
need to update all of the other implementations to accept it. If we
*were* to do that, I would probably prefer to make a 'Hints' class and
pass it in that way.

The layers we have to go through is:

 KnitVersionedFiles.get_parent_map()
 KVF._get_parent_map_with_sources()
 CombinedGraphIndex.get_parent_map()
 CombinedGraphIndex.iter_entries()
 GraphIndex.iter_entries()

Inside GI.iter_entries() is where we would like to know that this is a
clustered, likely-to-fail request, so we don't really want to trigger
'buffer_all'.

> 
>>    IMPROVEMENTS:
>>  
>> +    * The per-file log algorithm now uses a lighter-weight way of tracking
>> +      when changes to a file are merged into another revision. The old
>> +      method created a new set at every merge point, which was wasteful
>> +      when the merge did not introduce new revisions.
>> +      (John Arbash Meinel)
>> +
> 
> This latest version has changed the output behaviour so that needs to be
> mentioned under the CHANGES section, not IMPROVEMENTS. I'm not sure the
> 2nd sentence is needed either - the first is a clear enough description of
> the change IMO.
> 

Well, the output change is a CHANGES but there is a distinct improvement
in performance. (The old form would consume enough memory to start
swapping before the log would actually complete.)

Anyawy, I'll move it to CHANGES. Though I'll note that I really don't
think people understand what 'bzr log file' is really doing or how it is
changing and why. I know Robert is thinking to change his vote over it,
because he thinks it only displays the entries in the per-file graph.

I'll also note that if you have 2 branches that have merged eachother
(such that they have 99% the same revisions), but a different mainline.
Then "bzr log --show-ids FILE" can give a different set of revision ids
in each branch. (The existing form at least gives all the same
revisions. And one that showed *only* the per-file graph would also show
the same values.)

For most people, those extra revisions are "noise" and make it harder to
understand (look at 'bzr log bzr.dev/bzr' and try to understand why you
see so many 'Merged bzr.dev' entries.)


>>  def _filter_revisions_touching_file_id(branch, file_id, mainline_revisions,
>> -                                       view_revs_iter):
>> +                                       view_revs_iter, direction):
>>      """Return the list of revision ids which touch a given file id.
>>  
>>      The function filters view_revisions and returns a subset.
> 
> The docstring needs updating. It refers to view_revisions when the param
> is now called view_revs_iter and the direction param needs a :param: bit.
> I also like the example you gave in the email more so perhaps you could
> use it in place of the one given. The existing text needs a grammar fix
> as well ...
> 
>   s/will also can be/will also be/
> 

I actually switched the code back to using "view_revisions" as it
assumes it is a list, and not an iterable (it does multiple passes over
the list). I also got rid of "mainline_revisions" because it is
no-longer used. And if I'm going to do an incompatible change by
bringing in 'direction', I may as well get rid of old cruft.


>> +        assert depth <= len(current_merge_stack)
>> +        if depth == len(current_merge_stack):
>> +            current_merge_stack.append(info)
>>          else:
> 
> We don't allow asserts any more. Please extend the following if/else block
> to produce an AssertionError if you want to keep the check.

Yeah, 'test_source' caught that for me.

> 
> The changes also break the test suite so I'm voting resubmit instead of
> tweak. Making the direction parameter optional may be enough to fix that
> though?
> 
> bb:resubmit
> 
> Ian C.
> 

I fixed it up in my local branch, but I'll resubmit it for you anyway.
(though this is technically not even tracked in BB :).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjYFPYACgkQJdeBCYSNAAOp1ACgy5mh/x9vkGBgoi/L0cE5BGMB
BUwAniZgdN07Jgb6x5jrCzlGky2HCzaY
=sAj4
-----END PGP SIGNATURE-----



More information about the bazaar mailing list