[Bug 1473533] Re: CountryNameDict function trying to parse UTF-8 iso3166.tab as US-ASCII

Fri Oct 2 02:27:38 UTC 2015

** Description changed:

+ [Impact]
+ 
+  * The latest tzdata update changed the encoding of iso3166.tab
+    and zone.tab to UTF-8, which causes an exception in pytz.
+ 
+ [Test Case]
+ 
+ $ apt-get install -y python-tz python3-tz
+ 
+ # The country_names commands should raise an exception. The
+ # country_timezones commands are included because zone.tab has
+ # also switched to UTF-8 but does not yet contain any non-ASCII
+ # characters. You can hand-edit /usr/share/zoneinfo/zone.tab to
+ # include a UTF-8 character to force the exception, and then
+ # test the proposed package.
+ 
+ $ python -c 'import pytz
+ for item in pytz.country_names.items():
+   pass'
+ 
+ $ python -c 'import pytz
+ for item in pytz.country_timezones.items():
+   pass'
+ 
+ $ python3 -c 'from pytz import country_timezones
+ for item in country_timezones.items():
+   pass'
+ 
+ $ python3 -c 'from pytz import country_names
+ for item in country_names.items():
+   pass'
+ 
+ # Reproducing the bug raises an exception like this:
+ Traceback (most recent call last):
+   File "<string>", line 2, in <module>
+   File "/usr/lib/python3.4/_collections_abc.py", line 497, in __iter__
+     for key in self._mapping:
+   File "/usr/lib/python3/dist-packages/pytz/lazy.py", line 41, in __iter__
+     self._fill()
+   File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 350, in _fill
+     line = line.decode('US-ASCII')
+ UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
+ 
+ [Regression Potential]
+ 
+  * Older tzdata releases are ASCII-only, and ASCII is a subset of
+    UTF-8, so they should be unaffected (this has been tested).
+ 
+ [Other Info]
+  
+  * None
+ 
+ ---- Original Description ----
+ 
  Since tzdata-2015e there are UTF-8 characters in iso3166.tab, see:
  http://mm.icann.org/pipermail/tz/2015-May/022258.html
  http://mm.icann.org/pipermail/tz/2015-June/022306.html

  pytz/__init__.py:_CountryNameDict(LazyDict) is using:
-         zone_tab = open_resource('iso3166.tab')
-         try:
-             for line in zone_tab.readlines():
-                 line = line.decode('US-ASCII')
+         zone_tab = open_resource('iso3166.tab')
+         try:
+             for line in zone_tab.readlines():
+                 line = line.decode('US-ASCII')

  to read it and fails on the AX, CI and RE lines. Decoding as UTF-8
  fixes the issue and should work OK even with older tzdata releases.

** Patch added: "precise debdiff"
   https://bugs.launchpad.net/pytz/+bug/1473533/+attachment/4481458/+files/precise.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to python-tz in Ubuntu.
https://bugs.launchpad.net/bugs/1473533

Title:
  CountryNameDict function trying to parse UTF-8 iso3166.tab as US-ASCII

Status in pytz:
  Fix Released
Status in python-tz package in Ubuntu:
  Fix Released
Status in python-tz source package in Precise:
  In Progress
Status in python-tz source package in Trusty:
  In Progress
Status in python-tz source package in Vivid:
  In Progress
Status in python-tz package in Debian:
  New

Bug description:
  [Impact]

   * The latest tzdata update changed the encoding of iso3166.tab
     and zone.tab to UTF-8, which causes an exception in pytz.

  [Test Case]

  $ apt-get install -y python-tz python3-tz

  # The country_names commands should raise an exception. The
  # country_timezones commands are included because zone.tab has
  # also switched to UTF-8 but does not yet contain any non-ASCII
  # characters. You can hand-edit /usr/share/zoneinfo/zone.tab to
  # include a UTF-8 character to force the exception, and then
  # test the proposed package.

  $ python -c 'import pytz
  for item in pytz.country_names.items():
    pass'

  $ python -c 'import pytz
  for item in pytz.country_timezones.items():
    pass'

  $ python3 -c 'from pytz import country_timezones
  for item in country_timezones.items():
    pass'

  $ python3 -c 'from pytz import country_names
  for item in country_names.items():
    pass'

  # Reproducing the bug raises an exception like this:
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "/usr/lib/python3.4/_collections_abc.py", line 497, in __iter__
      for key in self._mapping:
    File "/usr/lib/python3/dist-packages/pytz/lazy.py", line 41, in __iter__
      self._fill()
    File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 350, in _fill
      line = line.decode('US-ASCII')
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

  [Regression Potential]

   * Older tzdata releases are ASCII-only, and ASCII is a subset of
     UTF-8, so they should be unaffected (this has been tested).

  [Other Info]

   * None

  ---- Original Description ----

  Since tzdata-2015e there are UTF-8 characters in iso3166.tab, see:
  http://mm.icann.org/pipermail/tz/2015-May/022258.html
  http://mm.icann.org/pipermail/tz/2015-June/022306.html

  pytz/__init__.py:_CountryNameDict(LazyDict) is using:
          zone_tab = open_resource('iso3166.tab')
          try:
              for line in zone_tab.readlines():
                  line = line.decode('US-ASCII')

  to read it and fails on the AX, CI and RE lines. Decoding as UTF-8
  fixes the issue and should work OK even with older tzdata releases.
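
  The failure and the fix can be illustrated without touching the
  installed tzdata. This is only a minimal sketch: SAMPLE is a
  hand-written stand-in modelled on iso3166.tab entries, and
  parse_country_lines and codes_failing_ascii are hypothetical helpers,
  not pytz functions. In the AX sample line the first non-ASCII byte is
  0xc3 at position 3, which is the same byte and offset reported in the
  traceback above.

```python
# Hypothetical sample modelled on iso3166.tab entries; not read from tzdata.
SAMPLE = (
    "# comment line\n"
    "AX\t\u00c5land Islands\n"
    "CI\tC\u00f4te d'Ivoire\n"
    "RE\tR\u00e9union\n"
    "US\tUnited States\n"
).encode("UTF-8").splitlines(True)


def parse_country_lines(lines, encoding):
    """Decode raw tab-file lines and return {code: name}, skipping comments."""
    names = {}
    for raw in lines:
        text = raw.decode(encoding)
        if text.startswith("#"):
            continue
        code, name = text.split(None, 1)
        names[code] = name.strip()
    return names


def codes_failing_ascii(lines):
    """Return the country codes of lines that cannot be decoded as US-ASCII."""
    failing = []
    for raw in lines:
        try:
            raw.decode("US-ASCII")
        except UnicodeDecodeError:
            # The two-letter code itself is always plain ASCII.
            failing.append(raw.split(None, 1)[0].decode("US-ASCII"))
    return failing


# Lines that break the US-ASCII decode.
print(codes_failing_ascii(SAMPLE))

# Decoding as UTF-8 handles every line, including the ASCII-only ones.
print(sorted(parse_country_lines(SAMPLE, "UTF-8")))
```

  Because every ASCII byte sequence is also valid UTF-8, the same
  parse_country_lines call with "UTF-8" works on ASCII-only input, which
  is why older tzdata releases are not expected to regress.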
