[merge][0.11] bugfix 56815: export unicode files to tar+zip
John Arbash Meinel
john at arbash-meinel.com
Fri Sep 22 21:19:57 BST 2006
Alexander Belchenko wrote:
> John Arbash Meinel wrote:
>> There are two patches attached. The first fixes 'bzr export' for both
>> tar and zipfiles so that if the tree contains unicode filenames, they
>> will be exported as utf-8 paths.
>>
>> This fixes bug:
>> https://launchpad.net/products/bzr/+bug/56815
>>
>> We had done some discussion, and it is possible that you would want to
>> use a different path depending on the platform. But utf-8 is probably
>> the most consistent thing. And it only might fail on windows. But since
>> right now we fail on all platforms, it is still better to do this.
>
> -0.5 on using utf-8 encoding for filenames in zip archives on Windows.
> Such archives will be practically unusable with most Windows zip archivers.
Well, for right now, not having any encoding is 100% unusable by anyone.
>
> I see you have already submitted this work, but it does not work on
> Windows, so my comments are late. Personally, I would prefer to have no
> unicode support for zip archives at all.
>
> Zip archives should use the OEM encoding for filenames because zip was born
> in DOS times. I could write a simple decoder, but it would work only on Windows.
>
> --
> Alexander
So I understand your point, and I think we would accept patches, but for
now, some functionality is better than 0.
I just did a test, trying to have Windows create a zipfile, and it just
complained that 'Cannot create zipfile because it contains a name that
zipfiles cannot contain'. At least that is true for "جوجو.txt".
So windows itself doesn't know how to handle these.
Now, I also tried "bågfors.txt", and there both 7zip and Windows create
a zip file with 'b\x86gfors.txt'. This seems to be 'cp437' encoding. And
I haven't figured out why it would use that code page.
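For illustration, here is a minimal sketch (Python 3, not part of the patch) of how the two test names encode under the candidate encodings. It reproduces the 'b\x86gfors.txt' bytes seen above and shows that the Arabic name has no cp437 representation at all:

```python
# Illustration only: encode the test filenames under each candidate encoding.
name = "b\u00e5gfors.txt"  # "bågfors.txt", written with an escape to avoid ambiguity

cp437_bytes = name.encode("cp437")    # DOS/OEM code page
cp1252_bytes = name.encode("cp1252")  # Windows "ANSI" code page
utf8_bytes = name.encode("utf-8")

print(cp437_bytes)   # b'b\x86gfors.txt'
print(cp1252_bytes)  # b'b\xe5gfors.txt'
print(utf8_bytes)    # b'b\xc3\xa5gfors.txt'

# The Arabic name cannot be represented in cp437 at all.
try:
    "جوجو.txt".encode("cp437")
    arabic_fits_cp437 = True
except UnicodeEncodeError:
    arabic_fits_cp437 = False

print(arabic_fits_cp437)  # False
```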
None of the locale.*encoding values is cp437; the only ones I can
find are cp1252, and mbcs for the filesystem. Maybe cp437 is the MS-DOS
codepage?
http://en.wikipedia.org/wiki/CP437
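One way to check this is sketched below (Python 3). The function name report_encodings is made up for illustration. On Windows it asks the Win32 GetOEMCP() call for the OEM code page, which is a separate setting from the values Python's locale module reports; on other platforms the "oem" entry is left as None. The assumption that zip tools use the OEM code page for names is only a guess based on the bytes observed above.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports, plus the Windows OEM code page if available."""
    info = {
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        "oem": None,  # only filled in on Windows
    }
    if sys.platform == "win32":
        import ctypes
        # GetOEMCP() returns the numeric OEM code page, e.g. 437 on a US system.
        info["oem"] = "cp%d" % ctypes.windll.kernel32.GetOEMCP()
    return info


print(report_encodings())
```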
So we could choose cp437 if we prefer.
However, I can say that Linux 'zip' creates the files as utf8 (I assume
it just uses whatever the filesystem path is).
And we *can* create utf8 filenames in the zipfiles, and windows will
survive. (It just will treat them as the wrong paths when it extracts them).
But Linux will create the wrong filenames if we use cp437. So for now, I
think we should discuss usability, etc. And perhaps we will end up with
a flag to 'bzr export' that defines what encoding the final paths should
be in.
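To make the mismatch in both directions concrete, here is a small sketch (Python 3, illustration only). It assumes the extracting tool decodes names as cp437 in the first case and as utf-8 in the second:

```python
name = "b\u00e5gfors.txt"  # "bågfors.txt"

# utf-8 bytes in the archive, read back by a tool that assumes cp437:
seen_by_cp437_reader = name.encode("utf-8").decode("cp437")
print(seen_by_cp437_reader)  # b├Ñgfors.txt

# cp437 bytes in the archive, read back by a tool that assumes utf-8.
# 0x86 is not a valid utf-8 sequence, so a strict decode would raise;
# errors="replace" substitutes U+FFFD instead.
seen_by_utf8_reader = name.encode("cp437").decode("utf-8", errors="replace")
print(seen_by_utf8_reader == "b\ufffdgfors.txt")  # True
```

Either way the extracted name differs from the original, which is why a single fixed encoding cannot suit every platform.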
I think the patch as it stands still has a lot of utility. And we can
expand it as necessary for people. (We can end up with cp437, but as far
as I can tell, .zip doesn't declare its encoding, so no matter what we
do, the zip files aren't going to be portable).
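As a rough sketch of what a configurable path encoding might look like (Python 3): the helper name encode_archive_path, its encoding parameter, and the utf-8 fallback are all hypothetical and do not come from the patch or from bzr.

```python
def encode_archive_path(path, encoding="utf-8"):
    """Encode a tree path for storage in an archive.

    Hypothetical helper: neither the name nor the fallback rule comes from the patch.
    """
    try:
        return path.encode(encoding)
    except UnicodeEncodeError:
        # The requested code page cannot represent this name, so fall back
        # to utf-8 rather than failing the whole export.
        return path.encode("utf-8")


print(encode_archive_path("b\u00e5gfors.txt", "cp437"))  # b'b\x86gfors.txt'
print(encode_archive_path("b\u00e5gfors.txt"))           # b'b\xc3\xa5gfors.txt'
```

The fallback means one archive could end up with names in two encodings; raising an error instead would be the stricter choice.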
John
=:->