[Bug 319994] [NEW] nrss can't parse non-UTF-8 encoded feed that contains non-ASCII characters

Launchpad Bug Tracker 319994 at bugs.launchpad.net
Thu Jan 22 18:17:16 GMT 2009


You have been subscribed to a public bug by Joseph Smidt (jsmidt):

Binary package hint: nrss

nrss 0.3.9-1 gets an error parsing a feed encoded in ISO 8859-1 that
contains non-ASCII (e.g. accented) characters. Sometimes only the first
item gets displayed, if it contains no accented characters.

I've tried this feed:
http://rss.golem.de/rss.php?feed=RSS2.0

But the Atom feed from the same site works flawlessly (it is encoded in UTF-8):
http://rss.golem.de/rss.php?feed=ATOM1.0

The problem seems to be that XML_ParserCreate is called in parse.c with the encoding set to "UTF-8", which overrides the encoding declared in the document. When called without an explicitly set encoding, Expat honors the document's encoding declaration.
I've tested it with XML_ParserCreate(NULL) and that works. Patch attached.
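
For illustration only, here is a minimal sketch of why a parser forced to UTF-8 would stumble on this feed while still getting through ASCII-only items. ISO 8859-1 stores each accented letter as a single byte in the range 0x80-0xFF, and such a byte on its own is not well-formed UTF-8. The helper name looks_like_utf8 is hypothetical; this is not code from nrss or Expat, and it is a simplified structural check (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>

/* Hypothetical helper, not part of nrss or Expat.
 * Returns 1 if buf is structurally well-formed UTF-8, 0 otherwise.
 * Simplified: only lead bytes and continuation bytes are checked. */
static int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;   /* number of continuation bytes expected */

        if (c < 0x80)
            extra = 0;                      /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;                      /* 2-byte sequence */
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;                      /* 3-byte sequence */
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;                      /* 4-byte sequence */
        else
            return 0;                       /* not a valid lead byte */

        if (extra > len - i - 1)
            return 0;                       /* sequence is truncated */

        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */
        }
        i += extra + 1;
    }
    return 1;
}
```

With this check, the ISO 8859-1 bytes for "Grüße" (0x47 0x72 0xFC 0xDF 0x65) are rejected, the UTF-8 bytes for the same word (0x47 0x72 0xC3 0xBC 0xC3 0x9F 0x65) are accepted, and pure ASCII text is accepted in either case, which would be consistent with items lacking accented characters still being displayed.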

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
Package: nrss 0.3.9-1
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nrss
Uname: Linux 2.6.28-4-generic i586

** Affects: nrss (Ubuntu)
     Importance: Undecided
         Status: In Progress


** Tags: apport-bug bitesize feed rss xml
-- 
nrss can't parse non-UTF-8 encoded feed that contains non-ASCII characters
https://bugs.launchpad.net/bugs/319994
You received this bug notification because you are a member of Ubuntu Sponsors for universe, which is a direct subscriber.