spamassassin setup with Evo - Edgy

Fri Mar 30 14:37:59 UTC 2007

On Thu, 2007-03-29 at 21:29 -0400, Jeffrey F. Bloss wrote:
> John Dangler wrote:
> 
> > I have been putting spam mails into a folder called 'Possible_Spam' for
> > the last hour or so (it has 65 messages in it now), and have run
> > sa-learn --spam --mbox ./Possible_Spam against it... once the messages
> > in the folder have been run through sa-learn, do I need to leave them
> > there, or is it okay to delete them ? (I'm not sure I can see the reason
> > for running sa-learn multiple times against the same messages) ...
> 
> You can move them or whatever, but it's always nice to have a set of
> known spam messages handy in case you need to re-train so I wouldn't
> blow them up. Unless you have an unlimited supply...
According to the man page on sa-learn:
--ham
Learn the input message(s) as ham.   If you have previously learnt any
of the messages as spam, SpamAssassin will forget them first, then
re-learn them as ham.  Alternatively, if you have previously learnt them
as ham, it’ll skip them this time around.  If the messages have already
been filtered through SpamAssassin, the learner will ignore any
modifications SpamAssassin may have made.

--spam
Learn the input message(s) as spam.   If you have previously learnt any
of the messages as ham, SpamAssassin will forget them first, then
re-learn them as spam.  Alternatively, if you have previously learnt
them as spam, it’ll skip them this time around.  If the messages have
already been filtered through SpamAssassin, the learner will ignore any
modifications SpamAssassin may have made.

So re-running sa-learn over messages it has already learned is OK: it
skips anything already learned under the same classification, and
forgets and re-learns anything that is being reclassified.
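
To make the rule in the man page text concrete, here is a toy model in
Python. It is only a sketch of the documented behaviour, not
SpamAssassin's code: the dict standing in for the learn database and the
learn() helper are hypothetical names made up for illustration.

```python
# Toy model of the re-learning rules described in the sa-learn man page.
# NOT SpamAssassin's implementation; db and learn() are hypothetical.

def learn(db, msg_id, label):
    """Record msg_id under label ('ham' or 'spam') and report what happened."""
    previous = db.get(msg_id)
    if previous == label:
        # Already learnt with the same classification: skip it.
        return "skipped"
    # Either new, or learnt the other way: the old entry is replaced,
    # which stands in for "forget first, then re-learn".
    db[msg_id] = label
    return "learned" if previous is None else "relearned"


db = {}
results = [
    learn(db, "msg-1", "spam"),  # first time seen
    learn(db, "msg-1", "spam"),  # same message, same classification
    learn(db, "msg-1", "ham"),   # same message, reclassified
]
print(results)  # ['learned', 'skipped', 'relearned']
```

The second call is the case of running sa-learn --spam twice over the
same folder: nothing changes.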

> 
> It's not a good thing to re-run sa-learn across the same set of
> messages because it muddies up SA's database of tokens I think, or
> rather there's a command line switch that needs thrown to make
> 'sa-learn --forget' about a message before re-learning it as spam or
> ham. ;)
I'm not sure that's necessary, given the man page text above, although
there is also --clear, which wipes the learn database so you can start
over...
> 
> And don't neglect to run 'sa-learn --ham' on a goodly amount of known
> ham messages, it's equally important for initial setup.
Yeah - about this... my email tree is set up so that all of my folders
are at the same level as Inbox.  Yet, when I run sa-learn against my
Inbox folder, which currently has 746 messages in it, sa-learn reports:
root@croatus: sa-learn --ham --mbox ./Inbox
Learned tokens from 236 message(s) (2617 message(s) examined)

I'm not sure how this is possible: the folder shows 746 messages, yet
sa-learn examined 2617.  If it's looking at every folder on this pass,
then it's taking my Spam folder and re-learning it as ham every time...
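
One way to narrow this down is to count how many messages are actually
in the mbox file on disk, independent of what the mail client displays.
Below is a minimal sketch in Python, assuming ./Inbox is a plain mbox
file. count_mbox_messages is a hypothetical helper name, and the demo
builds a throwaway three-message file so it runs anywhere.

```python
import mailbox
import os
import tempfile


def count_mbox_messages(path):
    """Return how many messages Python's mbox parser finds in the file."""
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()


# Demo on a temporary mbox with three tiny messages (made-up content).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "Inbox")
    with open(demo_path, "w") as f:
        for i in range(3):
            f.write("From sender@example.com Fri Mar 30 14:37:59 2007\n")
            f.write("Subject: test %d\n\n" % i)
            f.write("body %d\n\n" % i)
    demo_count = count_mbox_messages(demo_path)

print(demo_count)
```

Running count_mbox_messages("./Inbox") and comparing the result with the
746 shown in the client and the 2617 reported by sa-learn would at least
show whether the extra messages are in the file itself or whether
sa-learn is reading more than that one file.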