[BUG] bzr changeset generation fails with non-ascii characters
Robey Pointer
robey at lag.net
Sat Jul 16 06:42:13 BST 2005
On 15 Jul 2005, at 13:50, Aaron Bentley wrote:
> Hi all,
>
> My python installation thinks 'ascii' is a good character encoding, and
> who am I to argue? This means that William Dodé is my constant nemesis,
> because wherever his distinctly non-ascii name appears, trouble is sure
> to follow.
>
> In this case, I get an error with bzr changeset (full traceback below).
> Essentially, it says that bzrlib.diff.internal_diff can't convert
> 0xc3 (acute e) to ASCII. That sounds fair enough, but what may not be
> obvious here is that it shouldn't need to. internal_diff should be
> operating in a binary/8-bit fashion on all sequence data -- otherwise,
> you can get lossy character conversions, or errors because certain
> Unicode codepoints are undefined. Bzr isn't interested in these files
> as text; it's their byte streams that matter.
>
> So we need to figure out what is provoking unicode handling of this
> data, and get it to use an 8-bit, encoding-ignorant approach instead.

The problem, of course, is line 103 of changeset/__init__.py:

    outf = codecs.getwriter(user_encoding)(sys.stdout,
                                           errors='replace')

FWIW, I agree that the cset should be treated as being in no encoding
(using whatever encoding is used for each file), and that means being
8-bit clean with no codec. In my python (2.3.5) it appears that you
can write 8-bit clean data to sys.stdout even in US-ASCII mode
(LC_ALL=C) so I bet just removing that codec line above will fix it.
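
To illustrate the difference between the two output paths, here is a rough sketch. It is written for Python 3, not the Python 2.3 discussed in this thread, so it does not reproduce the exact failure in the traceback; it only shows that a codec writer with errors='replace' is lossy while a plain byte write is 8-bit clean. The payload and the in-memory streams are made up for the example and are not taken from bzr.

```python
import codecs
import io

# Hypothetical diff payload: UTF-8 bytes of a name with a non-ASCII character.
payload = "William Dod\u00e9\n".encode("utf-8")

# 8-bit clean path: write the bytes to a binary stream with no codec at all.
raw_out = io.BytesIO()
raw_out.write(payload)
raw_bytes = raw_out.getvalue()

# Codec path: wrap a binary stream in an ASCII writer with errors='replace',
# in the same shape as the getwriter() call quoted above.
codec_out = io.BytesIO()
writer = codecs.getwriter("ascii")(codec_out, errors="replace")
writer.write(payload.decode("utf-8"))  # the writer expects text, not bytes
lossy_bytes = codec_out.getvalue()

print(raw_bytes == payload)    # bytes survive unchanged on the raw path
print(lossy_bytes == payload)  # the codec path has altered the data
```

On the raw path the output is byte-for-byte identical to the input, which is the property a changeset needs; on the codec path the non-ASCII character is replaced and the original bytes cannot be recovered.
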
robey