[Bug 1029640] Re: Bad characters in Python logger output when using rsyslog

Scott Kitterman ubuntu at kitterman.com
Sat Jul 28 06:23:39 UTC 2012


** Description changed:

+ [IMPACT]
+ 
+ Any UTF-8 messages that are sent to syslog by a Python application are
+ corrupted.
+ 
+ [TESTCASE]
+ 
+ Run the code in comment #9.  You can either do this by running the
+ python interpreter and pasting the code into the python shell or
+ creating a file with the code and running it as python foo where foo is
+ the name of the file.
+ 
+ Then check /var/log/syslog for the message "AUDIT: TEST LOGER FROM
+ PYTHON".  There will be a few characters of garbage or odd looking
+ numbers before the word AUDIT.  If you see that, you've recreated the
+ problem.
+ 
+ Install the updated packages from -proposed and re-run the python code
+ from comment #9.  Now there should be no garbage or unusual characters.
+ Something like:
+ 
+ root: AUDIT: TEST LOGER FROM PYTHON
+ 
+ If you get that, the fix is verified.
+ 
+ [Regression Potential]
+ 
+ Nil.  Patch is backported from upstream and is easily visually verified
+ as correct.
+ 
+ [Other Info]
+ 
+ I ran this by Barry Warsaw and he agreed it would be important to get
+ into 12.04.1.
+ 
+ 
+ Original Bug:
+ 
  Ubuntu 12.04 LTS 64bit
  python2.7-minimal                  2.7.3-0ubuntu3
  rsyslog                            5.8.6-1ubuntu8
  
  Python converts all syslog messages to UTF-8 before sending to syslog. It
  also prepends the Byte Order Mark (BOM) of the Unicode Standard.  This
  prepended BOM causes bad characters when using rsyslog (have not
  verified with std syslog or syslog-ng).
  
  Example log line:
  
  Jul 25 13:36:03 mc ï»¿2012-07-25 13:36:03 INFO nova.api.openstack.wsgi
  [req-48a555a5-6d2a-4a38-8384-3b4684357e72
  19f932a5b0b34655989f4cb761522bb3 2617e657fdf84569a6be7977318e46c8]
  http://MASKED:8774/v1.1/2617e657fdf84569a6be7977318e46c8/os-
  hosts/MASKED.json?ignore_awful_caching1343248563 returned with HTTP 200
  
  Note the 'ï»¿' before the date field.
  
  Interesting find on issues from another site:
  
  "Yes, "ï»¿" is the Byte Order Mark (BOM) of the Unicode Standard.
  Specifically it is the hex bytes EF BB BF, which form the UTF-8
  representation of the BOM, misinterpreted as ISO 8859/1 text instead of
  UTF-8.
  
  Probably what it means is that you are using a text editor that is
  saving files in UTF-8 with the BOM, when it should be saving without the
  BOM. It could be PHP files that have the BOM, in which case they'd
  appear as literal text on your page. Or it could be translated text you
  pasted into Joomla! edit windows.
  
  The Unicode Consortium's FAQ on the Byte Order Mark is at
  http://www.unicode.org/faq/utf_bom.html#BOM ."
  
  Note that if I edit the file:  /usr/lib/python2.7/logging/handlers.py as shown in this patch, the bad characters go away:
  ----------------------------------------------------------
  @@ -797,9 +797,10 @@
                                               self.mapPriority(record.levelname))
           # Message is a string. Convert to bytes as required by RFC 5424
           if type(msg) is unicode:
  +            # Morph
               msg = msg.encode('utf-8')
  -            if codecs:
  -                msg = codecs.BOM_UTF8 + msg
  +            #if codecs:
  +            #    msg = codecs.BOM_UTF8 + msg
           msg = prio + msg
           try:
               if self.unixsocket:
  
  ----------------------------------------------------
  
  Perhaps something is wrong with the 'codecs' condition??

** Changed in: python2.7 (Ubuntu Precise)
       Status: Triaged => In Progress

** Changed in: python2.7 (Ubuntu Quantal)
     Assignee: (unassigned) => Scott Kitterman (kitterman)

** Changed in: python2.7 (Ubuntu Precise)
     Assignee: (unassigned) => Scott Kitterman (kitterman)

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to python2.7 in Ubuntu.
https://bugs.launchpad.net/bugs/1029640

Title:
  Bad characters in Python logger output when using rsyslog

Status in “python2.7” package in Ubuntu:
  Fix Released
Status in “python2.7” source package in Precise:
  In Progress
Status in “python2.7” source package in Quantal:
  Fix Released

Bug description:
  [IMPACT]

  Any UTF-8 messages that are sent to syslog by a Python application are
  corrupted.

  [TESTCASE]

  Run the code in comment #9.  You can either do this by running the
  python interpreter and pasting the code into the python shell or
  creating a file with the code and running it as python foo where foo
  is the name of the file.

  Then check /var/log/syslog for the message "AUDIT: TEST LOGER FROM
  PYTHON".  There will be a few characters of garbage or odd looking
  numbers before the word AUDIT.  If you see that, you've recreated the
  problem.

  Install the updated packages from -proposed and re-run the python code
  from comment #9.  Now there should be no garbage or unusual
  characters.  Something like:

  root: AUDIT: TEST LOGER FROM PYTHON

  If you get that, the fix is verified.
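The manual check above could also be scripted. The following is only a sketch under stated assumptions: the helper names `has_bom_garbage` and `check_syslog` are hypothetical, and the escaped form `#357#273#277` is an assumption about how rsyslog may render the three BOM bytes as octal when it is configured to escape 8-bit characters. It is written for Python 3, not the affected python2.7.

```python
import codecs

# Assumption: depending on its configuration, rsyslog may write the BOM bytes
# raw or as escaped octal sequences (EF BB BF is 357 273 277 in octal).
ESCAPED_BOM = b"#357#273#277"


def has_bom_garbage(line):
    """Return True if a raw syslog line contains a UTF-8 BOM or its escaped form."""
    return codecs.BOM_UTF8 in line or ESCAPED_BOM in line


def check_syslog(path="/var/log/syslog", marker=b"AUDIT: TEST LOGER FROM PYTHON"):
    """Yield (line, corrupted) pairs for every log line containing the marker."""
    # Read in binary mode so the BOM bytes are not hidden by text decoding.
    with open(path, "rb") as fh:
        for line in fh:
            if marker in line:
                yield line, has_bom_garbage(line)
```

Iterating over `check_syslog()` before and after installing the packages from -proposed should, if the assumptions hold, report `True` for the old entries and `False` for the new ones.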

  [Regression Potential]

  Nil.  Patch is backported from upstream and is easily visually
  verified as correct.

  [Other Info]

  I ran this by Barry Warsaw and he agreed it would be important to get
  into 12.04.1.

  
  Original Bug:

  Ubuntu 12.04 LTS 64bit
  python2.7-minimal                  2.7.3-0ubuntu3
  rsyslog                            5.8.6-1ubuntu8

  Python converts all syslog messages to UTF-8 before sending to syslog.
  It also prepends the Byte Order Mark (BOM) of the Unicode Standard.
  This prepended BOM causes bad characters when using rsyslog (have not
  verified with std syslog or syslog-ng).
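To make the framing described above concrete, here is a minimal sketch of how the bytes handed to syslog differ with and without the BOM. It is not the handler's actual code: `frame` is a hypothetical helper, the facility and severity (user.info) are chosen only for illustration, and it is written for Python 3 rather than the affected python2.7.

```python
import codecs
from logging.handlers import SysLogHandler


def frame(msg, with_bom):
    """Build '<PRI>' + optional UTF-8 BOM + UTF-8 encoded message."""
    # Syslog priority value: facility * 8 + severity.
    pri = (SysLogHandler.LOG_USER << 3) | SysLogHandler.LOG_INFO
    payload = msg.encode("utf-8")
    if with_bom:
        # The step the bug report identifies as the source of the bad characters.
        payload = codecs.BOM_UTF8 + payload
    return ("<%d>" % pri).encode("ascii") + payload
```

With these assumptions, `frame("hello", True)` gives `b'<14>\xef\xbb\xbfhello'` and `frame("hello", False)` gives `b'<14>hello'`. The three extra bytes after the priority are what a receiver that does not expect a BOM would write into the log as garbage.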

  Example log line:

  Jul 25 13:36:03 mc ï»¿2012-07-25 13:36:03 INFO nova.api.openstack.wsgi
  [req-48a555a5-6d2a-4a38-8384-3b4684357e72
  19f932a5b0b34655989f4cb761522bb3 2617e657fdf84569a6be7977318e46c8]
  http://MASKED:8774/v1.1/2617e657fdf84569a6be7977318e46c8/os-
  hosts/MASKED.json?ignore_awful_caching1343248563 returned with HTTP
  200

  Note the 'ï»¿' before the date field.

  Interesting find on issues from another site:

  "Yes, "ï»¿" is the Byte Order Mark (BOM) of the Unicode Standard.
  Specifically it is the hex bytes EF BB BF, which form the UTF-8
  representation of the BOM, misinterpreted as ISO 8859/1 text instead
  of UTF-8.

  Probably what it means is that you are using a text editor that is
  saving files in UTF-8 with the BOM, when it should be saving without
  the BOM. It could be PHP files that have the BOM, in which case they'd
  appear as literal text on your page. Or it could be translated text
  you pasted into Joomla! edit windows.

  The Unicode Consortium's FAQ on the Byte Order Mark is at
  http://www.unicode.org/faq/utf_bom.html#BOM ."
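The byte-level claim in the quote can be checked directly. This is a small Python 3 illustration (the variable names are arbitrary) that takes the UTF-8 BOM from the standard `codecs` module and decodes it both ways:

```python
import codecs

bom = codecs.BOM_UTF8

# In Python 3, iterating over bytes yields integers.
hex_bytes = " ".join("%02X" % b for b in bom)

# Decoded correctly as UTF-8, the BOM is the single code point U+FEFF.
as_utf8 = bom.decode("utf-8")

# Misread as ISO 8859-1, each byte becomes its own visible character.
as_latin1 = bom.decode("iso-8859-1")
```

Here `hex_bytes` is `'EF BB BF'`, `as_utf8` is `'\ufeff'`, and `as_latin1` is the three-character string `'ï»¿'`, which matches the characters reported in the log output.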

  Note that if I edit the file:  /usr/lib/python2.7/logging/handlers.py as shown in this patch, the bad characters go away:
  ----------------------------------------------------------
  @@ -797,9 +797,10 @@
                                               self.mapPriority(record.levelname))
           # Message is a string. Convert to bytes as required by RFC 5424
           if type(msg) is unicode:
  +            # Morph
               msg = msg.encode('utf-8')
  -            if codecs:
  -                msg = codecs.BOM_UTF8 + msg
  +            #if codecs:
  +            #    msg = codecs.BOM_UTF8 + msg
           msg = prio + msg
           try:
               if self.unixsocket:

  ----------------------------------------------------

  Perhaps something is wrong with the 'codecs' condition??
