[MERGE][1.14] Fix bug #355454 by casting to unicode.

Vincent Ladeuil v.ladeuil+lp at free.fr
Tue Apr 7 17:22:24 BST 2009


>>>>> "jam" == John Arbash Meinel <john at arbash-meinel.com> writes:

    jam> Vincent Ladeuil wrote:
    >> This fixes the problem encountered by Bob while preparing the
    >> 1.14rc1 release.
    >> 
    >> I'm a bit surprised that we need to use absolute paths here since
    >> the call sites should be mainly dirstate where relative paths
    >> should be cheaply available and may avoid that double handling.
    >> 
    >> Anyway, this was blocking the release so we may work on a better
    >> fix later.
    >> 
    >> Vincent
    >> 
    >> 

    jam> BB:comment

    jam> I think this is probably the wrong fix, but I'm not 100% so I'm
    jam> commenting rather than resubmitting.


    jam> If someone is passing in an abspath that isn't Unicode,
    jam> then the caller is not transmitting a proper path.

As mentioned in the cover letter, dirstate is the expected caller.

./_dirstate_helpers_c.pyx:1148:                                    self.state._sha1_provider.stat_and_sha1(
./dirstate.py:3063:                                    self.state._sha1_provider.stat_and_sha1(

seems to be the only non-test code calling stat_and_sha1.

sha1 only relevant call seems to be:
./dirstate.py:1613:        return self._sha1_provider.sha1(abspath)

    jam> John
    jam> =:->

    jam> *paths* in memory should be unicode objects.

Except when they are utf8 encoded as this bug revealed :)

    jam> The fix you supplied here will break when whatever is
    jam> calling this function supplies an encoded string that
    jam> has non-ascii characters.

AFAICT this is called with an utf8 encoded path hence the need to
decode it. I suspect it can be called with unicode paths directly
too so I can't decode unconditionally.

I'd be very surprised if it can be called with invalid paths
though.

    jam> I might be wrong, but I *think* this is exposing a bug
    jam> in whoever is calling these functions.

I suspect the bug is that the API should accept utf8 encoded
*relative* paths, not arbitrary absolute paths, but that's far
more than I'm willing to address here.

We should certainly provides a better fix, I brought this one to
un-block Bob.

I'll continue investigating but at least we have that fix
available in the mean time.

       Vincent



More information about the bazaar mailing list