[MERGE] Fix for poor bzr rm * performance in trunk

Johan Walles johan.walles at gmail.com
Wed May 6 07:23:17 BST 2009


And here comes the patch as well...

2009/5/6 Johan Walles <johan.walles at gmail.com>:
> Thanks for the quick review John!
>
> Here's an updated patch with the following new changes in it:
> * Existing unit test updated.
> * The resulting test failure fixed.
> * Updated the NEWS file.
>
> I've also sent a copyright-assignment e-mail to mbp at canonical.com as
> requested on http://bazaar-vcs.org/BzrDevelopment.
>
>  Cheers //Johan
>
> 2009/5/5 John Arbash Meinel <john at arbash-meinel.com>:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Johan Walles wrote:
>>> Here's a patch that fixes the poor bzr rm * performance in bzr trunk:
>>> https://bugs.launchpad.net/bugs/180116
>>>
>>> The patch does two things:
>>> * It calls osutils.minimum_path_selection() from within
>>> workingtree_4.iter_changes() instead of using a copy of
>>> osutils.minimum_path_selection() in there.
>>> * It converts osutils.minimum_path_selection() into an O(n log n) algorithm.
>>>
>>> Timings of bzr trunk without this patch (notice how it's slower than
>>> 1.13.1, not my fault though :-):
>>> $ time rm --keep *
>>> real    0m11.291s
>>> user    0m7.304s
>>> sys     0m3.880s
>>>
>>> New timings on my system with this patch applied:
>>> $ time bzr rm *
>>> real    0m11.124s
>>> user    0m7.088s
>>> sys     0m3.920s
>>>
>>> $ time bzr rm --keep *
>>> real    0m9.936s
>>> user    0m6.768s
>>> sys     0m3.076s
>>>
>>> So bzr rm now performs about the same with and without --keep.  As a
>>> side effect, anybody else calling osutils.minimum_path_selection() will
>>> be faster as well.
>>>
>>>  Have fun :-) //Johan
>>
>>
>> I'm happy to see this being worked on.
>>
>> +    search_paths = []
>> +    sorted_paths = list(paths)
>> +    sorted_paths.sort()
>>
>> ^- I think you can just do "sorted_paths = sorted(paths)"
>>
>> Which should be approximately the same, but is a bit more
>> straightforward (and possibly optimizable by python, etc.)
>>
>>
>> +    for path in sorted_paths:
>> +        if len(search_paths) == 0:
>> +            # Result is empty, add first path
>> +            search_paths.append(path)
>> +            continue
>> +        if not is_inside(search_paths[-1], path):
>> +            # This path is unique, add it
>> +            search_paths.append(path)
>> +            continue
>> +    return set(search_paths)
>>
>> ^- So I believe the only question is whether this optimization is always
>> legal. Once sorted, will a containing entry always be the directly
>> previous entry added to search_paths....
>>
>> My quick answer is that it might depend on the sorting. Consider:
>>
>>  a
>>  a/b
>>  a-b
>>  a0b
>>
>> if you sort this, you get:
>>
>>  a
>>  a-b
>>  a/b
>>  a0b
>>
>> See the potential problem there? If I understand it correctly, 'a/b'
>> won't be properly filtered from 'a', because 'a-b' will be the last
>> added to the unique search paths.
>>
>> I think the correct fix is to do:
>>
>> def sort_key(path):
>>  return path.split('/')
>> sorted_paths = sorted(paths, key=sort_key)
>>
>> I think it is important for us to add some test cases for these
>> edge-cases in bzrlib/tests/test_osutils.py
>>
>> BB:resubmit
>>
>> John
>> =:->
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.9 (Cygwin)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iEYEARECAAYFAkoASiIACgkQJdeBCYSNAAOw6wCfc/cj2vF1jdhxe9IKaEOUDVUF
>> mq8An2K4K7KofLdfnFJRmwdD5goggmwg
>> =mI6V
>> -----END PGP SIGNATURE-----
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bzr-quick-rm-star.patch
Type: text/x-diff
Size: 13899 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090506/8988c5da/attachment-0001.bin 


More information about the bazaar mailing list