The problem with blacklists and false positives

Chan Chung Hang Christopher christopher.chan at bradbury.edu.hk
Fri May 8 13:33:31 UTC 2009


R Kimber wrote:
> On Fri, 08 May 2009 07:35:39 +0800
> Christopher Chan wrote:
>
>   
>> Moderators would have to be able to train bogofilter and in fact they 
>> would have to do that from the very start and approve each and every 
>> mail until bogofilter becomes sufficiently accurate to leave only a 
>> small workload if it ever gets to that point.
>>     
>
> No.  You can collect a corpus of posts that are agreed to be good and
> another that are agreed to be unacceptable, and then train on these.
> This is not an unduly onerous task, and certainly avoids having to look
> at each and every post.  Bogofilter, in my experience, quickly becomes
> fairly accurate.
>
>   

I don't see how you avoid training bogofilter in any case. Whether you 
start with fresh new posts to the list or scour the archives for 'good' 
and 'bad' posts, someone still has to do the legwork of reading and 
sorting them. In the end, you still have to 'look at each and every 
post' that goes into the initial training, as I said.

Bogofilter quickly becomes accurate, yes, IF the stuff it looks at does 
not change. Good luck with that. There is a reason Wikipedia has an 
entry for 'Bayesian poisoning'. I am not saying members of the list 
will be out to poison the database, only that the same effect will 
happen naturally as the content of posts shifts over time.
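
To make the drift point concrete, here is a toy sketch in Python. It is 
a plain naive Bayes word counter, not bogofilter's actual scoring, and 
the names train and spam_probability as well as the sample posts are 
made up purely for illustration.

```python
import math
from collections import Counter


def train(docs):
    """Count word occurrences across a list of posts (hypothetical helper)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def spam_probability(text, good, bad):
    """Toy naive Bayes score with add-one smoothing and equal priors."""
    vocab = set(good) | set(bad)
    good_total = sum(good.values()) + len(vocab)
    bad_total = sum(bad.values()) + len(vocab)
    log_odds = 0.0
    for word in text.lower().split():
        if word not in vocab:
            continue  # a word never seen in training carries no evidence
        log_odds += math.log((bad[word] + 1) / bad_total)
        log_odds -= math.log((good[word] + 1) / good_total)
    return 1 / (1 + math.exp(-log_odds))


# Invented training corpora: posts agreed to be acceptable and unacceptable.
good = train([
    "how do I install the nvidia driver",
    "apt upgrade broke my wireless driver",
    "grub fails after kernel upgrade",
])
bad = train([
    "windows is better you idiots",
    "you idiots should switch to windows",
    "stupid question read the manual idiots",
])

# Wording the filter was trained on: scored as clearly unacceptable.
familiar = spam_probability("you idiots", good, bad)

# Same intent, new spelling: every word is unseen, so the filter has no opinion.
drifted = spam_probability("u 1d10ts sh0uld sw1tch", good, bad)

# Same insult surrounded by ordinary list vocabulary: the score is diluted.
padded = spam_probability("you idiots install driver upgrade kernel", good, bad)

print(f"familiar={familiar:.2f} drifted={drifted:.2f} padded={padded:.2f}")
```

The filter is confident only about the first message. Once the wording 
changes, or an unacceptable post happens to carry plenty of on-topic 
words, the score falls away, and someone has to go back and retrain.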



