[merge] Don't .split() on basenames Was: x.split('/') < y.split('/') is still wrong

John Arbash Meinel john at arbash-meinel.com
Tue Jul 10 21:16:33 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> I've been looking into my pyrex dirstate reader, and I figured I would cleanup
> the bisect-on-disk code while I'm there. And it turns out that some of our
> assumptions about the paths were sorted are incorrect.
> 
> At one point we were comparing "path1 < path2" which we know is incorrect.
> So we switched to "path1.split('/') < path2.split('/')".
> But I now think that it is incorrect as well.
> 
> Our sorting is actually:
> 
>  1) create an entry for the root, set root as 'current' directory
>  2) list all children of the current directory
>  3) put all children which are directories into a (FIFO) queue
>  4) move to the next directory in the queue
>  5) goto step 2 until done
> 
> If you are more programmatically minded it is:
> 
>         def _key(entry):
>             # sort by: directory parts, file name, file id
>             return entry[0][0].split('/'), entry[0][1], entry[0][2]
> 
> This is subtly, but importantly, different from "path.split('/')".


I just worked out why _iter_changes() is not effected by this. Specifically, we
only compare by directory blocks. So either we are comparing basename (which
doesn't need to be split, and can be compared exactly) or we are comparing by
directory name (which can always be split).

So effectively we *are* doing the correct comparison. The attached patch makes
it a little bit more obvious, by not calling .split('/') on the basename.

Also, I looked at 'set_state_from_inventory()' it is also using (effectively):

  dirname.split('/') < dirname2.split('/')
  and basename < basename
  ...

So it is also correct, though mostly by accident than because I specifically
designed it to work.

So my fears are temporarily unfounded. Though it is something I need to think
about, because I have some C code that I need to decide how it should work.
(cmp_by_dirs).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGk+kgJdeBCYSNAAMRAtGaAJ9C3xBWX3AqB0P+s79oxGZogL+owQCgmaLv
tczzjA2BwVsttRw/c1dbmaM=
=ha4n
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: simple_no_split.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070710/aba479da/attachment.diff 


More information about the bazaar mailing list