[RFC] decoding environment variables to unicode? (was: bugfix #131100)

Martin Pool mbp at sourcefrog.net
Tue Sep 4 08:07:46 BST 2007


On 9/4/07, Alexander Belchenko <bialix at ukr.net> wrote:
> My patch for #131100 reveals a hidden bug in show_version.
> If the BZR_HOME environment variable is set to a non-ASCII value,
> current bzr.dev shows it as is (probably in the wrong encoding),
> but with my patch it raises UnicodeDecodeError, because config.config_dir()
> returns a non-ASCII plain string. This is a very rare case, and I hit
> it only because I tried to unicodize BZR_HOME in the test cases.
>
> In module win32utils I have a special function, _ensure_unicode(),
> that decodes plain strings to unicode using the user encoding.
>
> What is the best way to handle this case?
>
> 1) convert the location of the config dir to unicode in the config_dir()
> function? But how should non-Windows platforms be handled? Is it OK to
> use _ensure_unicode for them?
>
> 2) convert the location of the config dir to unicode only in the
> show_version() function? Is it OK to use win32utils._ensure_unicode on
> non-win32 platforms? (In that case I need to move this function to
> osutils, IMO.)
>
> 3) ignore this case until real users file a bug report?
>
> I think that variant 1 is wrong and fragile on non-win32 platforms.
> And variant 3 is here only because I think it's a very, very rare case.

In general we keep paths in memory as unicode, and option 1 is consistent
with that approach.  I guess this will be a problem if the user's
$HOME or similar is in an encoding we can't decode, even though the raw
bytes do work when passed to filesystem functions.  I think you should
try to decode it using the user encoding.
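
As a rough illustration of decoding with the user encoding, here is a
minimal sketch.  It is not the actual win32utils._ensure_unicode; the
name ensure_unicode, its signature, and the fallback to replacement
characters are assumptions made for this example, and it is written for
a current Python, where a "plain string" corresponds to bytes.

```python
import locale


def ensure_unicode(value, encoding=None):
    """Return value as text, decoding bytes with the user's encoding.

    Hypothetical helper for illustration only; not bzrlib's API.
    """
    if isinstance(value, str):
        # Already text: nothing to decode.
        return value
    if encoding is None:
        # Assumption: "user encoding" means the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False) or "ascii"
    try:
        return value.decode(encoding)
    except UnicodeDecodeError:
        # Assumption: for display it is better to substitute U+FFFD
        # than to raise; the result is no longer a usable path.
        return value.decode(encoding, "replace")
```

The fallback branch only makes sense for display, such as the output of
show_version; a path that went through it no longer round-trips to the
filesystem, so the original bytes would have to be kept for that.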

-- 
Martin


