[MERGE/RFC] Support symlinks to Unicode file names (bug #272444)

Daniel Clemente dcl441-bugs at yahoo.com
Mon Feb 16 22:03:50 GMT 2009


  Hi,

  I'm sending a proto-patch that appears to fix bug 272444 (https://bugs.launchpad.net/bzr/+bug/272444), "Support symlinks to non-ascii file names".

  I don't know if it's correct because I have no experience at all with the inner workings of Bazaar. But it works, at least under these conditions:
 - on my system: GNU/Linux Ubuntu, with a UTF-8 locale (LANG=ca_ES.UTF-8), Python 2.5.2 and the latest bzr
 - with a simple test case like this one:
       cd /tmp; bzr=/w/bzr/arre_272444/bzr
       rm -rf br1 br2; mkdir br1; cd br1
       $bzr init .
       touch més; ln -s més prova
       $bzr add prova
       $bzr commit -m "link to utf-8 file name"
       cd ..; $bzr branch br1 br2
 - with all the other tests from "bzr selftest --no-plugins"


  I had to work out the right encoding for each piece of data. After collecting many useful comments (especially from John Arbash Meinel), I came to think the following (warning: these are just suppositions and may be wrong):

- when you read a symlink target from disk, you must decode it to Unicode via the file system encoding
- when you write a symlink to disk, you must encode it from Unicode to the file system encoding
- internally, Bazaar will use Unicode objects in memory and UTF-8 strings in files
- dirstate, according to its documentation, stores/handles everything in UTF-8. I think this is for performance
- something called "inventory text", which I don't understand well, must be stored in UTF-8
- I think that dirstate has one part on disk and another in memory; the on-disk part is in UTF-8 and the in-memory part in Unicode
- the fingerprint (which for symlinks is equal to the symlink target) is UTF-8 because it is part of dirstate
- inv_entry.symlink_target should be Unicode
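
  The suppositions above can be sketched roughly as follows. This is only an illustration written for Python 3 (where str is Unicode), not the Python 2.5 code in the patch and not Bazaar's actual API; the helper names read_link_target, write_link, fingerprint and demo are hypothetical.

```python
import os
import sys
import tempfile


def read_link_target(link_path):
    """Read a symlink target from disk and decode it to Unicode.

    Passing a bytes path makes os.readlink return raw bytes, which are
    then decoded with the file system encoding.
    """
    raw = os.readlink(os.fsencode(link_path))
    return raw.decode(sys.getfilesystemencoding())


def write_link(target, link_path):
    """Create a symlink, encoding the Unicode target for the file system."""
    raw_target = target.encode(sys.getfilesystemencoding())
    os.symlink(raw_target, os.fsencode(link_path))


def fingerprint(target):
    """Assumed dirstate-style fingerprint: the link target as UTF-8 bytes."""
    return target.encode("utf-8")


def demo():
    """Round-trip a non-ASCII link target through a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "prova")
        write_link("més", link)           # Unicode in memory -> bytes on disk
        target = read_link_target(link)   # bytes on disk -> Unicode in memory
    return target, fingerprint(target)


if __name__ == "__main__":
    print(demo())
```

  The point of the sketch is that the file system encoding is used only at the disk boundary, while the UTF-8 form is produced separately from the in-memory Unicode value; on a system whose file system encoding is not UTF-8 the two byte strings would differ.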


  I hope you can correct or improve my patch, especially:
- test in other systems (other platforms, other Python versions, other encodings)
- check that it is complete (for instance, when reading from disk we should probably decode the link target to Unicode -- I don't know if the patch does this)
- remove outdated and incorrect comments
- correct misunderstandings about encodings... and maybe write more explanations about this in the code or in the wiki

  Feel free to modify it as you like. 

  Thanks,
Daniel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_272444_v1.patch
Type: text/x-diff
Size: 6013 bytes
Desc: first version of the patch for #272444
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090216/08d15ad3/attachment-0001.bin 

