[Bug 319994] [NEW] nrss can't parse non-UTF-8 encoded feed that contains non-ASCII characters
Launchpad Bug Tracker
319994 at bugs.launchpad.net
Thu Jan 22 18:17:16 GMT 2009
You have been subscribed to a public bug by Joseph Smidt (jsmidt):
Binary package hint: nrss
nrss 0.3.9-1 reports an error when parsing a feed encoded in ISO 8859-1 that
contains non-ASCII (e.g. accented) characters. Sometimes only the first item
is displayed, if it contains no accented characters.
I've tried this feed:
The Atom feed from the same site, which is encoded in UTF-8, works flawlessly.
The problem seems to be that parse.c calls XML_ParserCreate with the encoding explicitly set to "UTF-8", which overrides the encoding declared in the document. When no encoding is passed, Expat honors the document's encoding declaration.
I've tested it with XML_ParserCreate(NULL) and that works. Patch attached.
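To illustrate why forcing "UTF-8" breaks on an ISO 8859-1 feed, here is a small, self-contained C sketch. It is not code from nrss or Expat. The helper names is_valid_utf8 and latin1_to_utf8 are hypothetical.

In ISO 8859-1, "é" is the single byte 0xE9, which is not a complete UTF-8 sequence, so a parser told that the input is UTF-8 has to reject it. Decoding the same bytes as Latin-1 maps each byte to the Unicode code point of the same value. That code point re-encodes as the two UTF-8 bytes C3 A9. This Latin-1 to UTF-8 step is, roughly, what Expat does for you when it is created with a NULL encoding and reads encoding="ISO-8859-1" from the XML declaration, since it passes character data to handlers as UTF-8 by default.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first len bytes of s form valid
 * UTF-8, 0 otherwise. Simplified sketch: it checks lead/continuation byte
 * structure only, not overlong forms or surrogate code points. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;
        if (c < 0x80)
            extra = 0;              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;              /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;              /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;              /* 4-byte sequence */
        else
            return 0;               /* stray continuation or invalid lead byte */
        if (len - i < extra + 1)
            return 0;               /* sequence truncated by end of input */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;           /* expected a 10xxxxxx continuation byte */
        i += extra + 1;
    }
    return 1;
}

/* Hypothetical helper: converts ISO 8859-1 bytes to UTF-8. Each Latin-1
 * byte is the code point of the same value, so bytes >= 0x80 become a
 * two-byte UTF-8 sequence. out must hold at least 2 * len bytes.
 * Returns the number of bytes written to out. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* 110000xx */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* 10xxxxxx */
        }
    }
    return o;
}
```

For example, the Latin-1 bytes for "café" end in a lone 0xE9, which is_valid_utf8 rejects. After latin1_to_utf8, the same text ends in C3 A9 and validates.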
DistroRelease: Ubuntu 9.04
Package: nrss 0.3.9-1
Uname: Linux 2.6.28-4-generic i586
** Affects: nrss (Ubuntu)
Status: In Progress
** Tags: apport-bug bitesize feed rss xml
You received this bug notification because you are a member of Ubuntu Sponsors for universe, which is a direct subscriber.