[MERGE] Fix for poor bzr rm * performance in trunk
John Arbash Meinel
john at arbash-meinel.com
Tue May 5 15:16:03 BST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Johan Walles wrote:
> Here's a patch that fixes the poor bzr rm * performance in bzr trunk:
> https://bugs.launchpad.net/bugs/180116
>
> The patch does two things:
> * It calls osutils.minimum_path_selection() from within
> workingtree_4.iter_changes() instead of using a copy of
> osutils.minimum_path_selection() in there.
> * It converts osutils.minimum_path_selection() into an O(n log n) algorithm.
>
> Timings of bzr trunk without this patch (notice how it's slower than
> 1.13.1, not my fault though :-):
> $ time rm --keep *
> real 0m11.291s
> user 0m7.304s
> sys 0m3.880s
>
> New timings on my system with this patch applied:
> $ time bzr rm *
> real 0m11.124s
> user 0m7.088s
> sys 0m3.920s
>
> $ time bzr rm --keep *
> real 0m9.936s
> user 0m6.768s
> sys 0m3.076s
>
> So bzr rm now performs about the same with and without --keep. As a
> side effect, anybody else calling osutils.minimum_path_selection() will
> be faster as well.
>
> Have fun :-) //Johan
I'm happy to see this being worked on.
+ search_paths = []
+ sorted_paths = list(paths)
+ sorted_paths.sort()
^- I think you can just do "sorted_paths = sorted(paths)"
Which should be approximately the same, but is a bit more
straightforward (and possibly optimizable by python, etc.)
+ for path in sorted_paths:
+ if len(search_paths) == 0:
+ # Result is empty, add first path
+ search_paths.append(path)
+ continue
+ if not is_inside(search_paths[-1], path):
+ # This path is unique, add it
+ search_paths.append(path)
+ continue
+ return set(search_paths)
^- So I believe the only question is whether this optimization is always
legal. Once sorted, will a containing entry always be the directly
previous entry added to search_paths....
My quick answer is that it might depend on the sorting. Consider:
a
a/b
a-b
a0b
if you sort this, you get:
a
a-b
a/b
a0b
See the potential problem there? If I understand it correctly, 'a/b'
won't be properly filtered from 'a', because 'a-b' will be the last
added to the unique search paths.
I think the correct fix is to do:
def sort_key(path):
return path.split('/')
sorted_paths = sorted(paths, key=sort_key)
I think it is important for us to add some test cases for these
edge-cases in bzrlib/tests/test_osutils.py
BB:resubmit
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkoASiIACgkQJdeBCYSNAAOw6wCfc/cj2vF1jdhxe9IKaEOUDVUF
mq8An2K4K7KofLdfnFJRmwdD5goggmwg
=mI6V
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list