Python 3 port of ubiquity
Colin Watson
cjwatson at ubuntu.com
Thu May 17 17:51:05 UTC 2012
I uploaded a Python 3 port of ubiquity today, which I've been working on
on and off for a while. It was easier than I expected, although there
were a few interesting roadblocks. I mostly
embarked on this to get some practice in writing Python 3 code, with a
side order of thinking that it would be a good idea to attack the stack
of things to port from the top down rather than the bottom up, in case
any of the work became unnecessary as a result.
I'd done a Python 3 port before, namely germinate (assisted at the time
by http://python3porting.com/), so I had a rough idea of how I like to
attack these problems. In particular, my preference
is definitely to avoid using 2to3 except as a sort of first cut at a
to-do list. A great deal of the problem can be demolished simply by
porting to a sufficiently modern Python 2 style first.
Thus, I started with a few old standbys: print function, 'except
Exception as value', module renamings, and so on. To start with, I
found I could either guess randomly at common problems or run 2to3 in
its default report-only mode, find a single category of problem (for
instance, "map returns an iterator rather than a list"), grep for it
through the whole codebase, and think of a way to make all occurrences
valid in both Python 2 and 3. After a while I got to the point where it
was worth adding --python2 and --python3 options to ubiquity's test
suite so that I could try both (the test runner re-execs itself, so I
couldn't just run it under different interpreters), and could continue
until the test suite passed.
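As a rough illustration of the "valid in both Python 2 and 3" style described above, here is a minimal sketch covering a module renaming, 'except ... as', and the map-returns-an-iterator case. The function names (parse_ports, safe_int) are made up for this example and are not taken from ubiquity.

```python
# Module renaming: try the Python 3 name first, fall back to Python 2.
try:
    from configparser import ConfigParser
except ImportError:
    from ConfigParser import ConfigParser


def parse_ports(fields):
    """Convert string fields to ints (hypothetical helper)."""
    # map() returns an iterator in Python 3, so wrap it in list() when
    # the caller needs to index the result or iterate over it twice.
    return list(map(int, fields))


def safe_int(text, default=0):
    """Return int(text), or default if text is not a number."""
    try:
        return int(text)
    except ValueError as e:  # 'except ValueError, e' is Python 2 only
        return default
```

A list comprehension such as [int(f) for f in fields] would do equally well in place of list(map(...)); which reads better is a matter of taste.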
A few specific notes on things I did in this stage:
* The test suite used things like mock.patch("__builtin__.open"). I
defined a helper like this:
    if sys.version >= '3':
        def builtin_patch(name):
            return mock.patch("builtins.%s" % name)
    else:
        def builtin_patch(name):
            return mock.patch("__builtin__.%s" % name)
* gettext.install only takes unicode=1 (or unicode=True or whatever) in
Python 2; that's unnecessary in Python 3. The neatest approach I
found was:
    kwargs = {}
    if sys.version < '3':
        kwargs['unicode'] = 1
    gettext.install(domain, LOCALEDIR, **kwargs)
* If you're using python-apt, you *must* port entirely to the 0.8 API.
python-apt tolerated the old API under Python 2, but it is compiled
out under Python 3. /usr/share/python-apt/migrate-0.8.py may be of
some partial help.
* It's probably not news to anyone that you have to get your binary vs.
text data model solid when porting to Python 3. There was one
wrinkle I hadn't thought of, though. ubiquity uses subprocesses
quite a bit, and they return binary data by default. Initially I
tried .decode(), but after a while I realised that you can pass
universal_newlines=True to subprocess.Popen (etc.) to get text output
directly; this is much neater, works under both Python 2 and 3, even
improves non-Unix support under Python 2 if you care ;-), and I
recommend this approach. There were a couple of exceptions in
ubiquity: code that needs straight-up binary data, and code that
deals with text potentially in mixed encodings indicated by field
names.
* Three-arg raise is particularly awful for compatibility, because the
Python 2 form is an uncatchable SyntaxError in Python 3; you have to
use exec() to work around this. ubiquity only had one instance of
this, so I used six.reraise().
* pyflakes got upset with functions/methods defined two ways depending
on sys.version, so I had to add some exclusions.
* python-libxml2 hasn't been ported. If you're currently using this,
consider whether you can just use something in the standard library
instead, rather than the larger python(3)-lxml. I switched ubiquity
over to xml.etree.cElementTree, and aside from the expected
footprint-related virtues of using the stdlib, it actually ran faster
in our case.
* I had to port PyICU to get the test suite working. This had been
done upstream, but required packaging.
* Be careful with what 2to3 says about list/iterator/view-related
changes on dictionary methods and similar. Its conservative approach
is to add list() more or less everywhere, but if you're actually
using .items() etc. only as an iterator, you can leave it as-is.
Watch out for cases where you modify the dictionary while iterating
over it, though; in that case you'll indeed need list().
* I know others (including Barry) advocate 'from __future__ import
unicode_literals'. I found this a good approach in tests, but it
makes me nervous in library code since it could easily end up
changing your API. I'd say only attempt this if you have
exceptionally good test coverage.
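To make the subprocess note above concrete, here is a minimal sketch of the two modes. The helper names run_text and run_bytes are invented for illustration. Note that under Python 3 the text is decoded using the locale's preferred encoding, which is one reason the mixed-encoding cases mentioned above need the binary form.

```python
import subprocess
import sys


def run_text(argv):
    """Run argv and return its stdout as text (hypothetical helper)."""
    # universal_newlines=True makes the pipe a text stream: str under
    # both Python 2 and 3, with line endings normalised to "\n".
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()
    return out


def run_bytes(argv):
    """Run argv and return its stdout undecoded (hypothetical helper)."""
    # Without universal_newlines the output is bytes under Python 3;
    # use this where the data is binary or in mixed encodings.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out
```

For example, run_text([sys.executable, "-c", "print('hello')"]) gives a str that can be split or compared directly, with no .decode() call.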
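For the three-arg raise problem above, the following is a simplified sketch of the exec() workaround, roughly in the spirit of what six.reraise() does. It is not six's actual source, and WrappedError and convert are hypothetical names used only to exercise it.

```python
import sys

if sys.version_info[0] >= 3:
    def reraise(tp, value, tb=None):
        # Python 3 spelling: attach the traceback to the exception.
        raise value.with_traceback(tb)
else:
    # 'raise tp, value, tb' cannot even be parsed by Python 3, so the
    # Python 2 definition has to be hidden inside a string.
    exec("def reraise(tp, value, tb=None):\n    raise tp, value, tb\n")


class WrappedError(Exception):
    """Hypothetical application-level error."""


def convert(text):
    """Return int(text), re-raising failures as WrappedError."""
    try:
        return int(text)
    except ValueError:
        # Keep the original traceback while changing the exception type.
        tb = sys.exc_info()[2]
        reraise(WrappedError, WrappedError("bad input: %r" % text), tb)
```

With only one call site, as in ubiquity, depending on six is probably less trouble than carrying a helper like this around.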
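The dictionary point above comes down to the difference between these two loops. This is a small sketch with invented names (count_nonempty, drop_empty), not code from ubiquity.

```python
def count_nonempty(settings):
    """Count entries with a true value (hypothetical helper)."""
    count = 0
    # Read-only iteration: .items() is fine as-is under Python 2 and 3,
    # so the list() that 2to3 suggests is unnecessary here.
    for key, value in settings.items():
        if value:
            count += 1
    return count


def drop_empty(settings):
    """Delete entries with a false value, in place (hypothetical helper)."""
    # The dictionary changes size inside the loop, so take a snapshot
    # with list() first; iterating the live view raises RuntimeError
    # under Python 3.
    for key, value in list(settings.items()):
        if not value:
            del settings[key]
    return settings
```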
Now, ubiquity's test suite isn't everything it might be, although I was
pleased to find that it got me most of the way. However, I still had to
attack some frontend/backend glue code, and the KDE frontend is
currently untested (any volunteers?). PyKDE4 had been ported upstream,
but required packaging; thanks to Philip Muškovac for reviewing and
uploading my patch there. There were then a few other things to fix,
including calling sip.setapi("QVariant", 1) until I figure out what's
going on with the new QVariant API, and joyously discovering that most
bits of PyQt4 finally return ordinary Unicode strings in Python 3 rather
than messing about with QString objects.
But, finally, it all works at least in my tests. I expect there'll be a
bit of shakedown time, but once things have settled I anticipate the
main benefit being that we stop having failures only in languages that
use non-ASCII characters, which has been a headache for us in the past.
I also expect to be able to drop the compatibility code once everything
is working and it's clear that we're past the point of no return in
using only Python 3 on the desktop image.
I definitely felt a tipping point here: once I'd ported a couple of
packages, my approach to subsequent ones has been to go through all the
changes I made for previous ports and duplicate each of them, which
really speeds things up. Plus, of course, each ported library unblocks
another batch of packages. Now that both GTK (via PyGI) and PyKDE are
usable, it should be possible to attack quite a few multiple-frontend
programs in Ubuntu; so please do!
--
Colin Watson [cjwatson at ubuntu.com]