[merge][0.11] bugfix 56815: export unicode files to tar+zip
Alexander Belchenko
bialix at ukr.net
Sat Sep 23 10:08:15 BST 2006
John Arbash Meinel wrote:
> Alexander Belchenko wrote:
>> John Arbash Meinel wrote:
>>> There are two patches attached. The first fixes 'bzr export' for both
>>> tar and zipfiles so that if the tree contains unicode filenames, they
>>> will be exported as utf-8 paths.
>>>
>>> This fixes bug:
>>> https://launchpad.net/products/bzr/+bug/56815
>>>
>>> We had done some discussion, and it is possible that you would want to
>>> use a different path depending on the platform. But utf-8 is probably
>>> the most consistent thing. And it only might fail on windows. But since
>>> right now we fail on all platforms, it is still better to do this.
>> -0.5 on using utf-8 encoding for filenames in zip archives on Windows.
>> Such archives will be unusable with most Windows zip archivers.
>
> Well, for right now, not having any encoding is 100% unusable by anyone.
That's not true. It's unusable only if the tree contains non-ASCII filenames.
>> I see you have already submitted this work, but it does not work on
>> Windows. So my comments are late. But I personally would prefer to have
>> no unicode support at all for zip archives.
>>
>> Zip archives should use the OEM encoding for filenames because zip was
>> born in DOS times. I could write a simple converter, but it will work
>> only on Windows.
>>
> So I understand your point, and I think we would accept patches, but for
> now, some functionality is better than 0.
Maybe yes, maybe not. It's just a question of time before Windows users
file a new bug report saying that bzr creates zip files that cannot be
opened on Windows.
> I just did a test, trying to have Windows create a zipfile, and it just
> complained that 'Cannot create zipfile because it contains a name that
> zipfiles cannot contain'. At least that is true for "جوجو.txt".
>
> So windows itself doesn't know how to handle these.
>
> Now, I also tried "bågfors.txt", and there both 7zip and Windows create
> a zip file with 'b\x86gfors.txt'. This seems to be 'cp437' encoding. And
> I haven't figured out why it would use that code page.
>
> None of locale.*encoding values are for cp437, the only things I can
> find are cp1252, and mbcs for the filesystem. Maybe cp437 is MS-DOS
> codepage?
> http://en.wikipedia.org/wiki/CP437
>
> So we could choose cp437 if we prefer.
No.
For each ANSI codepage (the one used by Windows) there is a corresponding
OEM codepage (dating back to DOS days). So if your ANSI codepage is cp1252,
then your OEM codepage is cp437. But for Russian users it's different: my
ANSI codepage is cp1251 and my OEM codepage is cp866.
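To make the ANSI/OEM difference concrete, here is a small sketch using
Python's built-in codecs. The Cyrillic filename is a made-up example, not
one from the bug report; the other two names are the ones John tested.

```python
# Encode filenames with an ANSI code page and with the matching OEM code
# page, using only Python's built-in codecs.
latin_name = "b\u00e5gfors.txt"                 # "bågfors.txt"
cyrillic_name = "\u0444\u0430\u0439\u043b.txt"  # "файл.txt" (made-up example)
arabic_name = "\u062c\u0648\u062c\u0648.txt"    # "جوجو.txt"

latin_ansi = latin_name.encode("cp1252")
latin_oem = latin_name.encode("cp437")
cyr_ansi = cyrillic_name.encode("cp1251")
cyr_oem = cyrillic_name.encode("cp866")

print(repr(latin_ansi), repr(latin_oem))
print(repr(cyr_ansi), repr(cyr_oem))

# A name with characters outside the OEM code page cannot be encoded at all.
try:
    arabic_name.encode("cp437")
    arabic_fits = True
except UnicodeEncodeError:
    arabic_fits = False
print("Arabic name fits in cp437:", arabic_fits)
```

The same name gives different bytes under each code page, and the Arabic
name has no cp437 representation, which would be consistent with Windows
refusing to create that zipfile.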
On Windows there is an API function named CharToOem that can translate
unicode or ANSI-encoded strings to their OEM equivalent. I think using
this function is the only right way on Windows.
See:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/resources/strings/stringreference/stringfunctions/chartooem.asp
My proposal is to use CharToOem to convert unicode filenames to OEM ones.
But it will work only on Windows. So bzr could have different zip
exporters for the Linux and Windows platforms.
And how can I write a test for it?
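One possible shape for this, as a rough sketch only: it does not call
CharToOem, but asks Windows for the OEM code page number with GetOEMCP
(via ctypes) and then uses the Python codec of that name. The helper names
oem_encoding and zip_member_name are hypothetical and are not part of bzr.
Unlike CharToOem, str.encode raises an error for characters that do not
fit, and a Python codec may not exist for every OEM code page.

```python
import sys


def oem_encoding():
    """Return the OEM code page name on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll exists only on Windows
    return "cp%d" % ctypes.windll.kernel32.GetOEMCP()


def zip_member_name(path, encoding=None):
    """Encode a unicode path for use as a zip entry name.

    Hypothetical helper: uses the explicit encoding when given, otherwise
    the OEM code page on Windows and utf-8 on other platforms.
    """
    if encoding is None:
        encoding = oem_encoding() or "utf-8"
    return path.encode(encoding)


# Passing the encoding explicitly lets a test run on any platform.
print(repr(zip_member_name("b\u00e5gfors.txt", "cp437")))
print(repr(zip_member_name("\u0444\u0430\u0439\u043b.txt", "cp866")))
```

Because the encoding is a parameter, a test could check the cp437 and
cp866 results on Linux too, and only the automatic detection would need a
Windows-only test.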
> However, I can say that Linux 'zip' creates the files as utf8 (I assume
> it just uses whatever the filesystem path is).
To me that means zip files from Linux will never be usable on Windows.
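A small sketch of what goes wrong, assuming the Windows archiver reads the
stored bytes as the OEM code page (cp437 or cp866 here):

```python
# A utf-8 name written on Linux, then read back as an OEM code page.
name = "b\u00e5gfors.txt"                # "bågfors.txt"
stored = name.encode("utf-8")            # bytes as stored in the zip entry
seen_on_cp437 = stored.decode("cp437")   # what a cp437 reader would show
seen_on_cp866 = stored.decode("cp866")   # what a cp866 reader would show

print(repr(stored))
print(seen_on_cp437)
print(seen_on_cp866)
```

The file still extracts, but under a garbled name, which matches John's
remark that Windows will treat the paths as the wrong ones.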
>
> And we *can* create utf8 filenames in the zipfiles, and windows will
> survive. (It just will treat them as the wrong paths when it extracts them).
>
> But Linux will create the wrong filenames if we use cp437. So for now, I
> think we should discuss usability, etc. And perhaps we will end up with
> a flag to 'bzr export' that defines what encoding the final paths should
> be in.
>
> I think the patch as it stands still has a lot of utility. And we can
> expand it as necessary for people. (We can end up with cp437, but as far
> as I can tell, .zip doesn't declare its encoding, so no matter what we
> do, the zip files aren't going to be portable).
>
> John
> =:->
>
--
Alexander