[Bug 580961] Re: unzip fails to deal correctly with filename encodings

Alkis Georgopoulos 580961 at bugs.launchpad.net
Thu Nov 7 21:56:06 UTC 2013


Thanks, that partially solves the problem (Precise, 6.0-4ubuntu2).

The question marks are gone, so LP: #1199239 can be marked 'Fix released'.
This bug (LP: #580961) is NOT fully addressed, though.
The problem that remains is that only a few codepages are supported.

For example, unzipping a file that contains Greek (cp737) filenames doesn't work out of the box:
$ unzip -l biografiko.zip
biografiko/ОЫЮЪхЬк ймгзвуирйЮк ШхлЮйЮк_Щ.doc (wrong codepage)

On the other hand, if someone manually specifies the codepage, it does work:
$ unzip -O cp737 -l biografiko.zip
biografiko/Οδηγίες συμπλήρωσης αίτησης_β.doc (right codepage)
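
The two listings above can be reproduced without the archive. The following is a minimal sketch, assuming the name is stored in the archive as cp737 bytes; decoding those bytes as CP866 appears to yield exactly the garbled name shown in the first listing, which would be consistent with the "UTF-8" -> "CP866" row of the mapping table being picked. It uses only Python's built-in codecs, and the variable names are illustrative.

```python
# Greek filename as it appears in the correct listing.
name = "Οδηγίες συμπλήρωσης αίτησης_β.doc"

# Assumption: the archive stores the name as raw cp737 bytes.
raw = name.encode("cp737")

# Decoding the same bytes with the wrong and the right DOS codepage.
wrong = raw.decode("cp866")  # Cyrillic DOS codepage
right = raw.decode("cp737")  # Greek DOS codepage

print("cp866:", wrong)
print("cp737:", right)
```

The bytes are identical in both cases; only the codepage used to interpret them differs, which is why passing -O cp737 is enough to get the right names.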

I'm guessing it would work if we added this line:
+    { "CP1253", "CP737" },
...in the mapping table defined in debian/patches/06-unzip60-alt-iconv-utf8:

+/* A mapping of local <-> archive charsets used by default to convert filenames
+ * of DOS/Windows Zip archives. Currently very basic. */
+static CHARSET_MAP dos_charset_map[] = {
+    { "ANSI_X3.4-1968", "CP850" },
+    { "ISO-8859-1", "CP850" },
+    { "CP1252", "CP850" },
+    { "UTF-8", "CP866" },
+    { "KOI8-R", "CP866" },
+    { "KOI8-U", "CP866" },
+    { "ISO-8859-5", "CP866" }
+};
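
To see what the extra row would and would not change, here is a rough Python model of the table lookup. It is only a sketch: it assumes the table is searched by the local charset (first column) with a case-insensitive match and that the first hit wins, which is what the "local <-> archive" comment suggests but is not shown in the excerpt above. The function name guess_archive_charset is hypothetical.

```python
# Rows copied from the table above, as (local charset, archive charset).
DOS_CHARSET_MAP = [
    ("ANSI_X3.4-1968", "CP850"),
    ("ISO-8859-1", "CP850"),
    ("CP1252", "CP850"),
    ("UTF-8", "CP866"),
    ("KOI8-R", "CP866"),
    ("KOI8-U", "CP866"),
    ("ISO-8859-5", "CP866"),
]

# The row proposed in this comment.
PROPOSED_ROW = ("CP1253", "CP737")


def guess_archive_charset(local_charset, table):
    """Return the archive charset of the first row whose local charset
    matches (case-insensitively), or None if no row matches."""
    for local, archive in table:
        if local.lower() == local_charset.lower():
            return archive
    return None


table = DOS_CHARSET_MAP + [PROPOSED_ROW]
print(guess_archive_charset("CP1253", table))  # CP737
print(guess_archive_charset("UTF-8", table))   # CP866
```

Under these assumptions the new row is chosen only when the local charset is CP1253. A system whose local charset is UTF-8 would still match the existing "UTF-8" row and get CP866, so Greek archives would still need -O cp737 there.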


I'm not changing the verification-needed flag because it's only partially fixed,
which might mean "verification-failed, affected people do list your codepages here so that we add them to the mapping table before we upload this to -updates",
or it might mean "verification-done and open a new bug report for adding more codepages",
your call. Thanks!

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to unzip in Ubuntu.
https://bugs.launchpad.net/bugs/580961

Title:
  unzip fails to deal correctly with filename encodings

Status in File Roller:
  Confirmed
Status in The Linux Mint Distribution:
  Triaged
Status in Ubuntu Japanese Kaizen Project:
  Fix Committed
Status in unzip - free software .zip unarchiver:
  Unknown
Status in “unzip” package in Ubuntu:
  Triaged
Status in “unzip” source package in Precise:
  Fix Committed
Status in “unzip” source package in Quantal:
  Fix Committed
Status in “unzip” source package in Raring:
  Fix Committed
Status in “unzip” package in Debian:
  Confirmed
Status in Gentoo Linux:
  Won't Fix
Status in “unzip” package in Mandriva:
  Unknown
Status in “unzip” package in openSUSE:
  Fix Released

Bug description:
  Binary package hint: unzip

  This is a fairly annoying bug that's been around and known at least
  since 2005.  It's very visible as it will very often make exchange of
  zip files with Windows users impossible, for example.  As such, it
  has gathered its fair share of "me too" and "how dare you haven't fixed
  this yet!!111!" comments.

  Problem description:
  zip/unzip and the specification fall short when dealing with non-ASCII filenames not encoded in UTF-8

  test case:
  do an "unzip -l" on the file http://tinyurl.com/2aofpxs and witness the question marks

  affected programs:
  the problem is in unzip itself, but it affects GUI front ends such as xarchiver, file-roller, etc. that rely on unzip for decompression

  suggested solutions (most are workarounds, not proper fixes):
   a) reintroduce patch for codepage-based zip filenames: bug 477755, http://tinyurl.com/2aqdbqg (Ubuntu blueprint)
   b) unzip filename according to locale: bug 203609
   c) Ubuntu JP has a patch, probably not generally applicable, bug 269482
   d) Russian altlinux distro uses natspec lib and patched zip binary

  natspec was mentioned in bug 477755 comment #2 and may indeed be a
  proper fix, but it needs closer inspection (I haven't really looked
  yet).  As discussed in https://bugzilla.gnome.org/show_bug.cgi?id=306403
  there is no failsafe, straightforward way to fix this in all cases.
  Nonetheless, the current situation can and should be improved.
  There are some good ideas floating around.  Somebody needs to pull
  them together and wrap them up.

  It's unfortunate the FOSS community so far hasn't been able to fix
  this rather visible problem.  I'm opening this ticket as a master bug
  and clean slate to document the issue and current status.  Please
  don't ruin it by making above-mentioned unhelpful comments, they
  actually slow things down!  Please don't nominate for a release.

  Unless you're a dev and can provide a patch, you should think VERY
  carefully before doing anything but

  1) subscribe yourself to this ticket
  2) mark this bug as affecting you
  3) tell me via mail about other bugs you think are a duplicate of this one, discussing the same problem

  1) to 3) will show the devs how many people are affected, and
  that is the only real chance we have of getting somebody to take a
  serious look.  "Me too" comments do the opposite, so again, please
  don't post them.

To manage notifications about this bug go to:
https://bugs.launchpad.net/file-roller/+bug/580961/+subscriptions
