[Bug 319994] [NEW] nrss can't parse non-UTF-8 encoded feed that contains non-ASCII characters
Launchpad Bug Tracker
319994 at bugs.launchpad.net
Thu Jan 22 18:17:16 GMT 2009
You have been subscribed to a public bug by Joseph Smidt (jsmidt):
Binary package hint: nrss
nrss 0.3.9-1 fails with an error when parsing a feed encoded in ISO 8859-1
that contains non-ASCII characters. Sometimes only the first item gets
displayed, if it contains no accented characters.
I've tried this feed:
http://rss.golem.de/rss.php?feed=RSS2.0
The Atom feed from the same site works flawlessly (it is encoded in UTF-8):
http://rss.golem.de/rss.php?feed=ATOM1.0
The problem seems to be that XML_ParserCreate is called in parse.c with the encoding set to "UTF-8", which overrides the encoding declared in the document. When called without an explicitly set encoding (a NULL argument), Expat honors the document's encoding declaration.
I've tested it with XML_ParserCreate(NULL) and that works. Patch attached.
ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
Package: nrss 0.3.9-1
ProcEnviron:
PATH=(custom, user)
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: nrss
Uname: Linux 2.6.28-4-generic i586
** Affects: nrss (Ubuntu)
Importance: Undecided
Status: In Progress
** Tags: apport-bug bitesize feed rss xml
--
nrss can't parse non-UTF-8 encoded feed that contains non-ASCII characters
https://bugs.launchpad.net/bugs/319994
You received this bug notification because you are a member of Ubuntu Sponsors for universe, which is a direct subscriber.