Python 3 port of ubiquity

Thu May 17 17:51:05 UTC 2012

I uploaded a Python 3 port of ubiquity today, which I've been working
on, on and off, for a while.  It was easier than I expected, although
there were a few interesting roadblocks.  I mostly
embarked on this to get some practice in writing Python 3 code, with a
side order of thinking that it would be a good idea to attack the stack
of things to port from the top down rather than the bottom up, in case
any of the work became unnecessary as a result.

I'd done a Python 3 port before, namely germinate (assisted at the time
by http://python3porting.com/), so I had a rough idea of how I like to
attack these problems.  In particular, my preference is definitely to
avoid using 2to3 except as a sort of first cut at a
to-do list.  A great deal of the problem can be demolished simply by
porting to a sufficiently modern Python 2 style first.

Thus, I started with a few old standbys: print function, 'except
Exception as value', module renamings, and so on.  To start with, I
found I could either guess randomly at common problems or run 2to3 in
its default report-only mode, find a single category of problem (for
instance, "map returns an iterator rather than a list"), grep for it
through the whole codebase, and think of a way to make all occurrences
valid in both Python 2 and 3.  After a while I got to the point where it
was worth adding --python2 and --python3 options to ubiquity's test
suite so that I could try both (the test runner re-execs itself, so I
couldn't just run it under different interpreters), and could continue
until the test suite passed.
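
As a rough illustration of what "sufficiently modern Python 2 style"
means here, the following sketch should run unchanged under Python 2.6+
and Python 3.  The function and variable names are made up for the
example and aren't taken from ubiquity.

```python
from __future__ import print_function

# Module renaming: try the Python 3 name first, then fall back.
try:
    import configparser
except ImportError:
    import ConfigParser as configparser


def parse_sizes(fields):
    """Hypothetical helper: convert a list of strings to a list of ints."""
    try:
        # map() returns an iterator in Python 3, so wrap it in list()
        # wherever a real list is needed.
        return list(map(int, fields))
    except ValueError as value:  # not "except ValueError, value"
        print("bad size:", value)
        return []


sizes = parse_sizes(["1", "2", "3"])
print(sizes)
```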

A few specific notes on things I did in this stage:

 * The test suite used things like mock.patch("__builtin__.open").  I
   defined a helper like this:

     import sys

     import mock

     # Python 3 renamed the __builtin__ module to builtins.
     if sys.version_info[0] >= 3:
         def builtin_patch(name):
             return mock.patch("builtins.%s" % name)
     else:
         def builtin_patch(name):
             return mock.patch("__builtin__.%s" % name)

 * gettext.install only takes unicode=1 (or unicode=True or whatever) in
   Python 2; Python 3 always returns text, so the argument is gone
   there.  The neatest approach I found was:

     import gettext
     import sys

     kwargs = {}
     if sys.version_info[0] < 3:
         # Python 2 only: make _() return unicode objects.
         kwargs['unicode'] = 1
     gettext.install(domain, LOCALEDIR, **kwargs)

 * If you're using python-apt, you *must* port entirely to the 0.8 API.
   python-apt tolerated the old API under Python 2, but it is compiled
   out under Python 3.  /usr/share/python-apt/migrate-0.8.py may be of
   some partial help.

 * It's probably not news to anyone that you have to get your binary vs.
   text data model solid when porting to Python 3.  There was one
   wrinkle I hadn't thought of, though.  ubiquity uses subprocesses
   quite a bit, and they return binary data by default.  Initially I
   tried .decode(), but after a while I realised that you can pass
   universal_newlines=True to subprocess.Popen (etc.) to get text output
   directly.  This is much neater, works under both Python 2 and 3, and
   even improves non-Unix support under Python 2 if you care ;-), so I
   recommend it.  There were a couple of exceptions in ubiquity: places
   that need straight-up binary data, and places that deal with text
   that's potentially in mixed encodings indicated by field names.

 * Three-arg raise is particularly awful for compatibility, because the
   Python 2 form is an uncatchable SyntaxError in Python 3; you have to
   use exec() to work around this.  ubiquity only had one instance of
   this, so I used six.reraise().

 * pyflakes got upset with functions/methods defined two ways depending
   on sys.version, so I had to add some exclusions.

 * python-libxml2 hasn't been ported.  If you're currently using this,
   consider whether you can just use something in the standard library
   instead, rather than the larger python(3)-lxml.  I switched ubiquity
   over to xml.etree.cElementTree, and aside from the expected
   footprint-related virtues of using the stdlib, it actually ran faster
   in our case.

 * I had to port PyICU to get the test suite working.  This had been
   done upstream, but required packaging.

 * Be careful with what 2to3 says about list/iterator/view-related
   changes on dictionary methods and similar.  Its conservative approach
   is to add list() more or less everywhere, but if you're actually
   using .items() etc. only as an iterator, you can leave it as-is.
   Watch out for cases where you modify the dictionary while iterating
   over it, though; in that case you'll indeed need list().

 * I know others (including Barry) advocate 'from __future__ import
   unicode_literals'.  I found this a good approach in tests, but it
   makes me nervous in library code since it could easily end up
   changing your API.  I'd say only attempt this if you have
   exceptionally good test coverage.
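
To make the subprocess point above concrete, here is a minimal sketch of
asking for text output directly.  run_text is a hypothetical helper, not
something from ubiquity, and the current interpreter stands in for a
real child process.

```python
import subprocess
import sys


def run_text(argv):
    """Hypothetical helper: run a command and return its stdout as text."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()
    return out


# Use the current interpreter as a stand-in for a real subprocess.
output = run_text([sys.executable, "-c", "print('hello')"])
first_line = output.splitlines()[0]
```

Without universal_newlines=True, Python 3 would hand back bytes here and
every caller would need its own .decode().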
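
The dictionary-method caveat above comes down to the difference between
these two loops.  This is only a sketch; the helper names and the
settings dictionary are invented for illustration.

```python
def drop_empty(settings):
    """Hypothetical helper: delete keys whose values are empty, in place."""
    # list() takes a snapshot, so deleting inside the loop is safe.  Without
    # it, Python 3 raises "RuntimeError: dictionary changed size during
    # iteration".
    for key, value in list(settings.items()):
        if not value:
            del settings[key]


def total_length(settings):
    """Read-only iteration: no list() needed under either version."""
    return sum(len(value) for _key, value in settings.items())


settings = {"hostname": "ubuntu", "proxy": "", "user": "colin"}
drop_empty(settings)
```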
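
For the three-arg raise problem above, the exec() workaround looks
roughly like this.  It is written in the spirit of what a helper such as
six.reraise() does, but it is my own sketch rather than a copy of six,
and risky/call_and_reraise are placeholder names.

```python
import sys

if sys.version_info[0] >= 3:
    def reraise(tp, value, tb=None):
        """Re-raise value with traceback tb (Python 3 spelling)."""
        if value is None:
            value = tp()
        raise value.with_traceback(tb)
else:
    # "raise tp, value, tb" cannot appear literally in a file that Python 3
    # has to parse, so hide it inside a string and exec() it.
    exec("def reraise(tp, value, tb=None):\n    raise tp, value, tb\n")


def risky():
    raise ValueError("original failure")


def call_and_reraise():
    try:
        risky()
    except ValueError:
        exc_info = sys.exc_info()
        # ... cleanup could happen here ...
        reraise(*exc_info)
```

With only one call site, pulling in an existing helper is less effort
than maintaining this yourself.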

Now, ubiquity's test suite isn't everything it might be, although I was
pleased to find that it got me most of the way.  However, I still had to
attack some frontend/backend glue code, and the KDE frontend is
currently untested (any volunteers?).  PyKDE4 had been ported upstream,
but required packaging; thanks to Philip Muškovac for reviewing and
uploading my patch there.  There were then a few other things to fix,
including calling sip.setapi("QVariant", 1) until I figure out what's
going on with the new QVariant API, and joyously discovering that most
bits of PyQt4 finally return ordinary Unicode strings in Python 3 rather
than messing about with QString objects.

But, finally, it all works at least in my tests.  I expect there'll be a
bit of shakedown time, but once things have settled I anticipate the
main benefit being that we stop having failures only in languages that
use non-ASCII characters, which has been a headache for us in the past.
I also expect to be able to drop the compatibility code once everything
is working and it's clear that we're past the point of no return in
using only Python 3 on the desktop image.

I definitely felt a tipping point here: once I'd ported a couple of
packages, my approach to subsequent ones has been to go through all the
changes I made for previous ports and duplicate each of them, which
really speeds things up.  Plus, of course, each ported library unblocks
another batch of packages.  Now that both GTK (via PyGI) and PyKDE are
usable, it should be possible to attack quite a few multiple-frontend
programs in Ubuntu; so please do!

-- 
Colin Watson                                       [cjwatson at ubuntu.com]