[RFC] external diff on win32 and locale

Alexander Belchenko bialix at ukr.net
Fri Dec 22 17:31:29 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil пишет:
>>>>>> "bialix" == Alexander Belchenko <bialix at ukr.net> writes:
> 
>     bialix> There are 2 tests for external diff that failed on win32
>     bialix> if gettext library installed:
> 
>     bialix> test_diff.TestDiff.test_external_diff_binary_lang_c                                               ERROR
>     bialix> 129ms
>     bialix>     external diff failed with exit code 2; command: ['diff', '--label', 'old',
>     bialix> 'c:\\docume~1\\modul98\\locals~1\\temp\\bzr-diff-old-wuhbly', '--label', 'new',
>     bialix> 'c:\\docume~1\\modul98\\locals~1\\temp\\bzr-diff-new--qepfs', '--binary', '-u']
>     bialix> test_diff.TestDiffFiles.test_external_diff_binary                                                 ERROR
>     bialix> 440ms
>     bialix>     external diff failed with exit code 2; command: ['diff', '--label', 'old',
>     bialix> 'c:\\docume~1\\modul98\\locals~1\\temp\\bzr-diff-old-2neogw', '--label', 'new',
>     bialix> 'c:\\docume~1\\modul98\\locals~1\\temp\\bzr-diff-new-nk4wyp', '--binary', '-u']
> 
> 
>     bialix> IIRC Vincent try to address this problem,
> 
> Initially no. But a discussion followed that arrived to the
> locale problems on windows.
> 
>     bialix> but I don't remember what result of review of his
>     bialix> patch.
> 
> As I had no solution, I excluded the concerned part of the patch
> in order to have it merged.
> 
>     bialix> I'm quickly looking at gettext sources (from
>     bialix> http://gnuwin32.sf.net project) and see in comments:
> 
>     bialix> /* Determine the user's language preferences, as a colon separated list of
>     bialix>    locale names in XPG syntax
>     bialix>      language[_territory[.codeset]][@modifier]
>     bialix>    The result must not be freed; it is statically allocated.
>     bialix>    The LANGUAGE environment variable does not need to be considered; it is
>     bialix>    already taken into account by the caller.  */
> 
>     bialix> This is from the code related to MacOs support but I try to check
>     bialix> $LANGUAGE environment variable. And it really works, i.e. diff
>     bialix> response for this variable and switch from my language (russian)
>     bialix> to english:
> 
>     bialix> $ set LANGUAGE=C
>     bialix> $ python bzr --no-plugins selftest -v test_external_diff_binary
> 
>     bialix> test_diff.TestDiff.test_external_diff_binary_lang_c                                                  OK
>     bialix> 110ms
>     bialix> test_diff.TestDiffFiles.test_external_diff_binary                                                  FAIL
>     bialix> 430ms
>     bialix>     ['Files old and new differ\r\n', '\n'] != ['Files old and new differ\n', '\n']
> 
>     bialix> So, I think our current support of external diff need
>     bialix> to be adjusted for win32.
> 
>     bialix> Does my analysis correct?
> 
> I would say yes, but last time, the discussion went like this:
> 
>>>>>> "jam" == John Arbash Meinel <john at arbash-meinel.com> writes:
> 
>     jam> Andrew Bennetts wrote:
> 
> <snip/>
> 
>     >> 
>     >> 
>     >>> out, err = self.run_bzr('init', 'subdir2/nothere', retcode=3)
>     >>> self.assertEqual('', out)
>     >>> +        # Rely on bzr error message part only or we fail on
>     >>> +        # non-english systems
>     >>> self.assertContainsRe(err,
>     >>> -            r'^bzr: ERROR: .*'
>     >>> -            '\[Errno 2\] No such file or directory')
>     >>> -        
>     >>> +            r'^bzr: ERROR: No such file:.*')
>     >>> +
>     >> 
> 
>     >> Can we do the equivalent of setting LANG=C on windows to
>     >> avoid this problem?  run_bzr happens in-process, so
>     >> perhaps it should do "locale.setlocale(locale.LC_ALL,
>     >> 'C')" (with an appropriate cleanup to restore the locale
>     >> to the previous state)?
> 
>     jam> Well, I have been unsuccessful in getting LANG=C to mean
>     jam> anything on Windows. So I'm not sure how it determines
>     jam> the locale and how to generate internationalized error
>     jam> messages.
> 
>     jam> We have a good resource in Alexander, since his machine
>     jam> uses a non-english Windows install.
> 
>     jam> It does seem like you can call 'setlocale'. But I wonder
>     jam> if there is something that you could use when calling
>     jam> out to another process. (We would like to use it when
>     jam> spawning an external diff).
> 
> 
> I.e. in the general case, setting LANGUAGE is not good enough,
> but I, for one, knows no better way to reliably set an
> environment under windows with the correct locale settings.

OK. We have 2 different corner cases so I'll try to divide one from other.

1. External diff used by bzr and locale settings.
2. Locale settings for bzr itself at runtime.

You point me to discussion about 2. I miss this discussion,
probably because then I can't say anything useful to you.

My initial point in my original post is about 1: external diff
and locale settings on win32. But it leads to version of used
diff utility and version of gettext library that used by diff
utility.

May I say something about 1 (diff and locale) before
I say something about 2 (bzr and locale)? OK? OK.

(1) About external diff and locale.

Documentation for GNU gettext (info gettext) has section
"10.5 Being a `gettext' grok". In particular it says:

<blockquote>
     In the function `dcgettext' at every call the current setting of
     the highest priority environment variable is determined and used.
     Highest priority means here the following list with decreasing
     priority:

       1. `LANGUAGE'

       2. `LC_ALL'

       3. `LC_xxx', according to selected locale

       4. `LANG'
</blockquote>


IIUC, $LANGUAGE has highest priority to specify language settings
for GNU gettext.

GNU gettext (ported to another platforms) still use $LANGUAGE for detecting
language settings but also use platform-specific methods to determine
locale. Again, IIUC, unix/linux platform-specific is $LC_ALL and $LANG.
On Windows GNU gettext port does not try to look at this variables
and use windows-specific methods. But maybe it's not true in more
wide cases. Especially, gettext library in Python itself is different:
it require $LANG and don't use windows-specific methods to determine
language (I check this with simple test, and I was very surprised).

So my resume: when bzr try to use external diff it could assume
that it is GNU diff + GNU gettext, and $LANGUAGE should work.
So this code in diff.py:

def _spawn_external_diff(diffcmd, capture_errors=True):
    """Spawn the externall diff process, and return the child handle.

    :param diffcmd: The command list to spawn
    :param capture_errors: Capture stderr as well as setting LANG=C
        and LC_ALL=C. This lets us read and understand the output of diff,
        and respond to any errors.
    :return: A Popen object.
    """
    if capture_errors:
        if sys.platform == 'win32':
            # Win32 doesn't support preexec_fn, but that is
            # okay, because it doesn't support LANG and LC_ALL either.
            preexec_fn = None
        else:
            preexec_fn = _set_lang_C
        stderr = subprocess.PIPE
    else:
        preexec_fn = None
        stderr = None

    try:
        pipe = subprocess.Popen(diffcmd,
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                stderr=stderr,
                                preexec_fn=preexec_fn)
    except OSError, e:
        if e.errno == errno.ENOENT:
            raise errors.NoDiff(str(e))
        raise

    return pipe


Should be adjusted to set $LANGUAGE for diff to 'C'.
This could be done by pass corresponding argument named `env'
to subprocess.Popen() function. And this should be portable
solution instead of `preexec_fn' argument that not supported
on win32 at all.

So, we went to (2): locale and win32. Look at the code above.
You could see the comment:

            # Win32 doesn't support preexec_fn, but that is
            # okay, because it doesn't support LANG and LC_ALL either.

Last line was added by Wouter van Heyst. This is the answer
for comment of Andrew Bennetts to your patch.

Reading gettext sources and later MSDN I found this explanation
of locale settings on win32:

<blockquote>
The system assigns a locale to each thread. Initially, the system assigns the system default locale to the thread. This
default locale is set by the user when the system is installed or through the International applet of the Control Panel.
If a thread is run in a process belonging to a user, the system assigns the user default locale to the thread. An
application can override either default by using the the SetThreadLocale function to explicitly set the locale for a thread.

There are two predefined locale identifiers: LOCALE_SYSTEM_DEFAULT, which identifies the system default locale, and
LOCALE_USER_DEFAULT, which identifies the locale of the current user. An application can retrieve the current locale
identifiers by using the GetSystemDefaultLCID and GetUserDefaultLCID functions. Similarly, an application can retrieve
the current language identifiers by using the GetSystemDefaultLangID and GetUserDefaultLangID functions.
</blockquote>

There is Pywin32 library for Python that provide mentioned functions,
like GetThreadLocale, SetThreadLocale, GetSystemDefaultLCID
and GetUserDefaultLCID.

So theoretically we can use pair (GetThreadLocale, SetThreadLocale)
to save, change and restore locale settings. I can write
corresponding patch if you can test on your machine those
cases that rely on locale settings (on my russian machine
I yet can't see the problems with locales). If this will help
then we will need to add pywin32 to obligatory dependencies
of bzr on win32.


Alexander

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFjBZwzYr338mxwCURAkVJAKCBjYTwBj3fDVAdpp6v+vGI1nSGVwCeOvS/
aWuP5wrr2zKZzIANHU8mqcQ=
=f69h
-----END PGP SIGNATURE-----





More information about the bazaar mailing list