Update on package import failures involving non-ascii filenames

Martin Packman martin.packman at canonical.com
Wed Oct 26 08:56:58 UTC 2011

Yesterday we deployed a fix for <http://pad.lv/508258> to how the
package importer handles non-ascii filenames. Some packages have now
been successfully imported, and better feedback, including the problem
filename, is now given where there are still errors.

Jonathan Riddell correctly noted in the bug that some packages have
filenames that aren't UTF-8, but the issue was also preventing some
that did have names the importer could decode from succeeding, such


Several of the remaining failures similarly appear to come from test
suites, judging by the full filenames now listed in the error message. It's
encouraging that programs care about this and want to test they handle
non-ascii filenames correctly, though generating them at test time
would be a better approach. :)

The overall results are:
* Before there were 69 packages with failures across four similar
UnicodeDecodeError signatures.
* Now there are 36 across three BadFilenameEncoding signatures, plus 3
new failures with InvalidNormalization, and 1 other issue exposed.

Filenames in encodings other than UTF-8:

Many of these are filenames in legacy single byte encodings, some are
double byte, and some are clearly junk. A complete fix needs
<http://pad.lv/63324> in bzr core resolving, but hacking some fallback
into the importer would be possible if it was deemed worthwhile.
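
As a rough illustration of what such a fallback could look like, here is
a minimal sketch. The function name decode_filename and the choice of
latin-1 as the fallback are assumptions for illustration only, not
anything that exists in bzr or the importer:

```python
def decode_filename(raw, fallbacks=("latin-1",)):
    """Decode a raw filename, returning (text, encoding_used).

    Hypothetical helper: tries UTF-8 first, then each fallback in order.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing worked, so report the offending bytes to the caller.
    raise ValueError("cannot decode filename %r" % (raw,))


utf8_name = decode_filename(b"caf\xc3\xa9")
legacy_name = decode_filename(b"caf\xe9")
```

Note that latin-1 accepts any byte sequence, so a fallback like this
never fails but will happily mis-decode double byte or junk names. That
is the main reason to treat it as a stopgap rather than a real fix.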

Packages with filenames that are not in unicode normal form 'NFC':

This is a symptom of the incomplete unicode normalisation code in bzr,
which OSX <http://pad.lv/172383> also runs into.
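
For anyone unfamiliar with the normal form issue, the following sketch
shows how the same visible name can have two different code point
sequences, only one of which is NFC. It uses only the standard library
unicodedata module; the helper name is_nfc is made up for this example
and is not how bzr itself performs the check:

```python
import unicodedata


def is_nfc(name):
    """Return True if the unicode string is already in normal form NFC."""
    return unicodedata.normalize("NFC", name) == name


composed = "caf\u00e9"      # e-acute as a single code point (NFC)
decomposed = "cafe\u0301"   # plain 'e' followed by a combining acute accent

composed_ok = is_nfc(composed)
decomposed_ok = is_nfc(decomposed)
renormalised = unicodedata.normalize("NFC", decomposed)
```

The two strings render identically but compare unequal until one of
them is normalised, which is how a tree can end up with a name that
does not round trip.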

Funky symlink issue:

Probably a bug in the bzr-builddeb (copied from bzrtools) import_dir function.

There were a few wrinkles in getting this deployed. Some existing
fallout from recent lp:udd changes needed tackling first, and then
using the latest lp:bzr-builddeb broke a few things that needed
interface updates on the udd side. Most excitingly the first
normalisation failure caused a loop that led to infinitely repeating
tracebacks. Fortunately we were monitoring the process at the time so
could kill it and fix the problem.

