[MERGE] Test for bug #272444 (symlinks to Unicode file names)

Daniel Clemente dcl441-bugs at yahoo.com
Mon Oct 27 01:26:10 GMT 2008

Andrew Bennetts <andrew.bennetts at canonical.com> writes:

> I don't know what needs fixing, but I do think that it would be better for
> commit to fail than to make an unbranchable/uncheckoutable branch.

  I don't think there's much point in moving the place where the error occurs, when we can try to fix bug #272444 itself and make both commit and branch work.
  Maybe later, commit() could emit a warning if it can't detect the system encoding for the symlink target, but that's a separate feature to implement, and for a very special case.

> Interestingly, when I try from the command line to commit a symlink to adiós I
> get a traceback,

  I can do that commit. Try this test; for me (with Python 2.5) it fails on branch, not on commit: https://bugs.launchpad.net/bzr/+bug/272444/comments/1
  If you see other strange behaviour, you could write a new test or expand the existing ones.

>>   ...should I change something else? I'll send the patch when we decide if we need both tests or not.
> Unfortunately, yes: make the tests pass with Python 2.4, and ideally Python 2.6.

I attach a new patch which:
 - breaks long lines
 - provides in a comment the path of the other test's file
 - gives some reasons for having two tests
 - passes on Python 2.4, 2.5 and 2.7a0 (so probably on 2.6 too); this applies to both tests

Still remaining:
 - Aaron, could you tell me whether a branch_implementation test is needed? (see parent message)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_272444_v8.patch
Type: text/x-diff
Size: 11605 bytes
Desc: eighth version of new test
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20081027/7da42941/attachment.bin 
-------------- next part --------------

> And ideally fix the UnicodeWarnings that the tests emit in 2.5.

  I think this is part of actually fixing Bazaar, not the tests.
  I don't yet have the knowledge to write a proper and clean patch, but I'm taking my first steps towards fixing the bug. Let me explain what I noticed:

  The problem shows up in repository.py, around line 400:

        elif kind == 'symlink':
            current_link_target = content_summary[3]
            if not store:
                # symlink target is not generic metadata, check if it has
                # changed.
                if current_link_target != parent_entry.symlink_target:
                    store = True

  At that line, current_link_target is '\xce\xa9' (a UTF-8 byte string) but parent_entry.symlink_target is u'\u03a9' (a unicode object).
  My tentative interpretation is that the second one is wrong because it should have been encoded to UTF-8 before being written into the repository. Both values should be either unicode or UTF-8, and I think it must be UTF-8 because that's what fingerprints are.
  My guess is that whoever wrote that u'\u03a9' to the inventory should have encoded it to UTF-8.
  And that would be workingtree_4.py, around line 1590:

                elif kind == 'symlink':
                    inv_entry.executable = False
                    inv_entry.text_size = None
                    inv_entry.symlink_target = utf8_decode(fingerprint)[0]

  That could be changed to just "inv_entry.symlink_target = fingerprint" to always use UTF-8. Other code would have to be adapted accordingly.
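
  To illustrate the mismatch outside Bazaar, here is a minimal sketch. It is written for Python 3, where a bytes value and a text value never compare equal; as far as I understand, in Python 2.5 the same comparison emits a UnicodeWarning and is also treated as unequal. The helper same_target() and the variable names are hypothetical and not part of bzrlib; utf8_decode is assumed to behave like the standard codecs UTF-8 decoder.

```python
import codecs

# Assumption: a stand-in for the utf8_decode used in the quoted code.
# The standard decoder returns a (text, bytes_consumed) tuple.
utf8_decode = codecs.getdecoder('utf-8')


def same_target(a, b):
    """Hypothetical helper: compare two symlink targets as UTF-8 bytes."""
    def to_utf8(value):
        # Text is encoded; byte strings are assumed to be UTF-8 already.
        return value.encode('utf-8') if isinstance(value, str) else value
    return to_utf8(a) == to_utf8(b)


# What the working tree reports: the raw UTF-8 fingerprint.
current_link_target = b'\xce\xa9'
# What ends up in the inventory after decoding the fingerprint.
stored_target = utf8_decode(current_link_target)[0]

# Comparing the two representations directly always reports a change.
mismatch = current_link_target != stored_target

# Normalizing both sides to one representation removes the false change.
equal_after_normalizing = same_target(current_link_target, stored_target)
```

  Whether the single representation should be UTF-8 or unicode is exactly the open question; the sketch only shows that mixing the two makes the "has the target changed?" check report a change every time.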

  Is that the right way? Either way, I'd appreciate some pointers on how to implement this.

