[RFC] Change to use 'merge_sort' for per-file-log
Ian Clatworthy
ian.clatworthy at canonical.com
Mon Sep 22 06:54:56 BST 2008
John Arbash Meinel wrote:
> I personally believe the new algorithm is overall better, because these sorts
> of changes are 'noise'.
Agreed.
> With unlimited chunk-size (all at once) we trigger buffer_all (the current
> code). Memory is around 244MB (down from 400MB with the earlier code), and it
> takes 28s to process.
>
> With chunk size of 1 (the other extreme), we never buffer all, memory
> consumption stays low (82MB), but we still take 23s.
>
> Chunk_size = 100 we are at 82MB and 12s.
>
> Chunk_size = 1000 and we get buffer_all on the smaller indexes, and it takes
> 10.9s and 94MB.
>
> I don't really like hacking around GraphIndex in log.py code, as it is likely
> to be worse for BTreeIndex when we get around to them. However, it does show a
> dramatic improvement, so I think we should consider doing *something*.
Agreed. Given the above, making chunk_size a parameter to the routine in
question and defaulting the value to 1000 looks a good step forward.
Now for a review of the other bits ...
> IMPROVEMENTS:
>
> + * The per-file log algorithm now uses a lighter-weight way of tracking
> + when changes to a file are merged into another revision. The old
> + method created a new set at every merge point, which was wasteful
> + when the merge did not introduce new revisions.
> + (John Arbash Meinel)
> +
This latest version has changed the output behaviour so that needs to be
mentioned under the CHANGES section, not IMPROVEMENTS. I'm not sure the
2nd sentence is needed either - the first is a clear enough description of
the change IMO.
> def _filter_revisions_touching_file_id(branch, file_id, mainline_revisions,
> - view_revs_iter):
> + view_revs_iter, direction):
> """Return the list of revision ids which touch a given file id.
>
> The function filters view_revisions and returns a subset.
The docstring needs updating. It refers to view_revisions when the param
is now called view_revs_iter and the direction param needs a :param: bit.
I also like the example you gave in the email more so perhaps you could
use it in place of the one given. The existing text needs a grammar fix
as well ...
s/will also can be/will also be/
> + assert depth <= len(current_merge_stack)
> + if depth == len(current_merge_stack):
> + current_merge_stack.append(info)
> else:
We don't allow asserts any more. Please extend the following if/else block
to produce an AssertionError if you want to keep the check.
The changes also break the test suite so I'm voting resubmit instead of
tweak. Making the direction parameter optional may be enough to fix that
though?
bb:resubmit
Ian C.
More information about the bazaar
mailing list