[MERGE] Fix for poor bzr rm * performance in trunk

John Arbash Meinel john at arbash-meinel.com
Tue May 5 15:16:03 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johan Walles wrote:
> Here's a patch that fixes the poor bzr rm * performance in bzr trunk:
> https://bugs.launchpad.net/bugs/180116
> 
> The patch does two things:
> * It calls osutils.minimum_path_selection() from within
> workingtree_4.iter_changes() instead of using a copy of
> osutils.minimum_path_selection() in there.
> * It converts osutils.minimum_path_selection() into an O(n log n) algorithm.
> 
> Timings of bzr trunk without this patch (notice how it's slower than
> 1.13.1, not my fault though :-):
> $ time rm --keep *
> real    0m11.291s
> user    0m7.304s
> sys     0m3.880s
> 
> New timings on my system with this patch applied:
> $ time bzr rm *
> real    0m11.124s
> user    0m7.088s
> sys     0m3.920s
> 
> $ time bzr rm --keep *
> real    0m9.936s
> user    0m6.768s
> sys     0m3.076s
> 
> So bzr rm now performs about the same with and without --keep.  As a
> side effect, anybody else calling osutils.minimum_path_selection() will
> be faster as well.
> 
>  Have fun :-) //Johan


I'm happy to see this being worked on.

+    search_paths = []
+    sorted_paths = list(paths)
+    sorted_paths.sort()

^- I think you can just do "sorted_paths = sorted(paths)"

Which should be approximately the same, but is a bit more
straightforward (and possibly optimizable by python, etc.)


+    for path in sorted_paths:
+        if len(search_paths) == 0:
+            # Result is empty, add first path
+            search_paths.append(path)
+            continue
+        if not is_inside(search_paths[-1], path):
+            # This path is unique, add it
+            search_paths.append(path)
+            continue
+    return set(search_paths)

^- So I believe the only question is whether this optimization is always
legal. Once sorted, will a containing entry always be the directly
previous entry added to search_paths....

My quick answer is that it might depend on the sorting. Consider:

 a
 a/b
 a-b
 a0b

if you sort this, you get:

 a
 a-b
 a/b
 a0b

See the potential problem there? If I understand it correctly, 'a/b'
won't be properly filtered from 'a', because 'a-b' will be the last
added to the unique search paths.

I think the correct fix is to do:

def sort_key(path):
  return path.split('/')
sorted_paths = sorted(paths, key=sort_key)

I think it is important for us to add some test cases for these
edge-cases in bzrlib/tests/test_osutils.py

BB:resubmit

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoASiIACgkQJdeBCYSNAAOw6wCfc/cj2vF1jdhxe9IKaEOUDVUF
mq8An2K4K7KofLdfnFJRmwdD5goggmwg
=mI6V
-----END PGP SIGNATURE-----



More information about the bazaar mailing list