[Bug 320829] Re: Bogofilter seems to fail decoding base64

Loïc Minier lool at dooz.org
Sat Apr 10 10:28:31 BST 2010

Actually this bug uncovers an important issue with parsing of the first
line of the body; bumping to high.

** Changed in: bogofilter (Ubuntu)
       Status: Confirmed => Fix Committed

** Changed in: bogofilter (Ubuntu)
   Importance: Undecided => Medium

** Changed in: bogofilter (Ubuntu)
     Assignee: (unassigned) => Loïc Minier (lool)

** Also affects: bogofilter (Ubuntu Lucid)
   Importance: Medium
     Assignee: Loïc Minier (lool)
       Status: Fix Committed

** Changed in: bogofilter (Ubuntu Lucid)
   Importance: Medium => High

Bogofilter seems to fail decoding base64
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is a direct subscriber.

Status in Bogofilter: Bayesian Mail Filtering: Confirmed
Status in “bogofilter” package in Ubuntu: Fix Committed
Status in “bogofilter” source package in Lucid: Fix Committed

Bug description:
Binary package hint: bogofilter-bdb

Description: Ubuntu 8.04.1
Release: 8.04
Package: bogofilter-bdb
Source-Package: bogofilter
Version: 1.1.5-2ubuntu5

Over the last few days I received a lot of similar spam that bogofilter passed as ham. Even after I tagged many of these mails (>50) as spam, the classification did not improve, neither for the already tagged mails nor for new ones.

Looking at the raw mail text, I found that the mails, although plain text in the cp1251 charset, were base64 encoded. I therefore first assumed that bogofilter might be unable to handle base64 encoding. But base64 support has been integrated since version 0.10 and should therefore still be present in 1.1.5-2ubuntu5, which I have installed here.
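
For reference, the kind of message described above can be reproduced with a small sketch. It uses Python's standard email module, not bogofilter, and the sample text, addresses and subject are made up for illustration. It only shows what a correct decode of a base64-encoded cp1251 text/plain body should yield, i.e. the text a filter would need to see in order to tokenize the body.

```python
import base64
from email import message_from_bytes

# Hypothetical sample text; any Cyrillic string encodable in cp1251 would do.
text = "Привет, это тест"

# Encode the body the way the spam mails were: cp1251 bytes, then base64.
payload = base64.encodebytes(text.encode("cp1251"))

# Minimal raw message (all header values are made up for this sketch).
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=windows-1251\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + payload
)

msg = message_from_bytes(raw)

# get_payload(decode=True) undoes the Content-Transfer-Encoding (base64);
# the resulting bytes are then decoded using the declared charset.
decoded = msg.get_payload(decode=True).decode(msg.get_content_charset())
print(decoded)
```

Without the base64 step being undone, the body is just an opaque run of ASCII characters, which would fit the observation that tagging such mails yields no usable body tokens.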

A brief test brought up the following:

I tagged one of the spam mails into a new database using "bogofilter -s" and compared the database contents (retrieved via "bogoutil -d") with those of another new database where I tagged the same mail, but with the body and subject decoded.

In the first DB only information on header fields was present. In the second DB there was also information regarding the body of the mail.
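
The comparison of the two dumps can be sketched roughly as follows. This is only an illustration: it assumes each dump line starts with the token followed by whitespace-separated counts, and the sample dumps and the helper names (tokens, body_only_tokens) are invented here, not real bogoutil output or part of bogofilter.

```python
def tokens(dump):
    """Return the set of tokens in a dump, taking the first field of each line."""
    return {line.split()[0] for line in dump.splitlines() if line.strip()}


def body_only_tokens(encoded_dump, decoded_dump):
    """Tokens present only in the database built from the decoded mail."""
    return tokens(decoded_dump) - tokens(encoded_dump)


# Hypothetical dumps: the first database holds header tokens only,
# the second also holds tokens taken from the mail body.
encoded_dump = """\
from:example 1 0 20090124
subj:test 1 0 20090124
"""
decoded_dump = """\
from:example 1 0 20090124
subj:test 1 0 20090124
cheap 1 0 20090124
offer 1 0 20090124
"""

print(sorted(body_only_tokens(encoded_dump, decoded_dump)))
```

If the set difference is non-empty only for the decoded mail, the body of the encoded mail contributed nothing to the first database.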

I therefore conclude that bogofilter did not manage to decode the mail, whereas KMail does this flawlessly.

I have attached an mbox folder with a selection of these mails.
