unicode symlink_target handling

John Arbash Meinel john at arbash-meinel.com
Thu Jun 5 22:41:20 BST 2008


Just a quick note...

It seems we are very loose with our 'symlink_target' handling. Specifically, we
tend to treat it as an 8-bit string, yet we store it in XML, which would break
if the target were non-ascii.

If you grep the code base for "\<readlink\>" you'll find a few places that are
using it:

  DirState._read_link() returns the raw os.readlink()
  _PreviewTree.get_symlink_target() returns the raw os.readlink()
  bzrlib.transform._content_match() also uses the raw os.readlink()
  WorkingTree.path_content_summary() uses the raw os.readlink()
  WT.get_symlink_target() does as well.

In fact, I didn't find any places that encode the symlink target.
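
For illustration, here is a minimal sketch of what decoding at the readlink
boundary could look like. It is written for Python 3 rather than the Python 2
that bzrlib targets, and decode_link_target() and read_link_as_text() are
hypothetical names, not existing bzrlib functions. It also assumes the target
was written in the filesystem encoding, which may not hold on every system.

```python
import os
import sys


def decode_link_target(raw_target, fs_encoding=None):
    """Return a symlink target as text.

    raw_target is what os.readlink() returns for a bytes path.
    fs_encoding defaults to the filesystem encoding, which is only an
    assumption about how the target was originally written.
    """
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    if isinstance(raw_target, bytes):
        return raw_target.decode(fs_encoding)
    return raw_target  # already text, nothing to decode


def read_link_as_text(path):
    """Read a symlink and always hand back text (hypothetical helper)."""
    # Passing a bytes path makes os.readlink() return the raw bytes.
    return decode_link_target(os.readlink(os.fsencode(path)))
```

With something like this, every caller would see one type for the target
instead of whatever os.readlink() happened to return.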

Now WT4._generate_inventory() does do:

   inv_entry.symlink_target = utf8_decode(fingerprint)[0]


But DirStateRevisionTree.get_symlink_target() just does:
            # At present, none of the tree implementations supports non-ascii
            # symlink targets. So we will just assume that the dirstate path is
            # correct.
            return entry[1][parent_index][1]

xml8.write_inventory uses:

    append('<symlink file_id="%s name="%s%s%s revision="%s '
        'symlink_target="%s />\n' % (
        _encode_and_escape(ie.file_id),
        _encode_and_escape(ie.name),
        parent_str, parent_id,
        _encode_and_escape(ie.revision),
        _encode_and_escape(ie.symlink_target)))

_encode_and_escape writes non-ascii characters as XML character references
rather than as UTF-8 bytes (&#229; instead of '\xc3\xa5' for u'\xe5'). So
cElementTree would read those values back as Unicode objects if they contain
non-ascii characters, or as plain 8-bit strings if they don't.
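
The round trip can be sketched with the standard library alone. This is not
bzrlib's _encode_and_escape: escape_non_ascii() is a hypothetical stand-in that
only replaces non-ascii characters and does not escape XML metacharacters such
as & or <. It is written for Python 3, where ElementTree always returns text, so
the str/unicode split described here does not show up in it.

```python
import xml.etree.ElementTree as ET


def escape_non_ascii(text):
    """Replace each non-ascii character with an XML character reference."""
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


# u'\xe5' is written out as the reference &#229; rather than as UTF-8 bytes.
escaped = escape_non_ascii('target-\xe5')

# The parser resolves the reference again when the attribute is read back.
element = ET.fromstring('<symlink symlink_target="%s" />' % escaped)
round_tripped = element.get('symlink_target')
```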

I suppose the bug is that we just don't support non-ascii symlink targets. I'm
just trying to work out the right solution for:
   https://bugs.launchpad.net/bzr/+bug/135320

The problem there is that *sometimes* the symlink_target is a Unicode object,
and sometimes it is a plain 'str'. I suppose I'll do the easy thing and just
call str(fingerprint), since the rest of the code doesn't support non-ascii
symlink targets anyway.
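
A less fragile alternative to a bare str() call would be to normalise both
cases to one type. The sketch below is only an illustration in Python 3 terms
(bytes and str instead of Python 2's str and unicode). target_as_utf8() is a
hypothetical helper, and it assumes any bytes it receives are already UTF-8.

```python
def target_as_utf8(target):
    """Return a symlink target as UTF-8 bytes, or None if there is none."""
    if target is None:
        return None
    if isinstance(target, bytes):
        return target  # assumed to be UTF-8 already; not verified here
    return target.encode('utf-8')
```

Unlike str() on a Python 2 unicode object, this would not raise on a non-ascii
target; it would just produce the encoded bytes.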

John
=:->


