Automatic SPAM and HAM learning


saschahb
11th July 2005, 08:33 AM
Hi all,

I've written a script that lets all IMAP users use SpamAssassin's sa-learn function automatically...

It works like this:
Users create a SPAM and/or HAM folder within their root INBOX directory (via their email client).
Users then move spam that was not correctly identified as spam into the SPAM folder,
and mail that was flagged as spam but is not spam into the HAM folder.
The script runs, for example, every hour or once a day. It checks every SPAM and HAM folder for mail and uses sa-learn to learn the content.
SPAM mails are deleted automatically after learning (this can be turned off within the script).
HAM mails are moved back to the INBOX after learning so the user can sort them into another folder.
I hope you'll like this script. Of course, you can post any suggestions.
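
The folder scan described above could be sketched roughly like this. This is not the qauto-salearn Perl script itself, just a minimal Python illustration. It assumes a Maildir++ layout in which the folders show up as `.SPAM` and `.HAM` with `cur` and `new` subdirectories; the function name `plan_learning` is made up for this example. It only builds the `sa-learn --spam` / `sa-learn --ham` command lines and does not run them.

```python
import os

# Assumption: Maildir++ layout, i.e. <maildir>/.SPAM/{cur,new} and <maildir>/.HAM/{cur,new}.
FOLDERS = {".SPAM": "--spam", ".HAM": "--ham"}


def plan_learning(maildir):
    """Return the sa-learn command lines (as argv lists) for one mailbox."""
    plan = []
    for folder, flag in sorted(FOLDERS.items()):
        for sub in ("cur", "new"):
            directory = os.path.join(maildir, folder, sub)
            if not os.path.isdir(directory):
                continue  # the user has not created this folder
            for name in sorted(os.listdir(directory)):
                path = os.path.join(directory, name)
                if os.path.isfile(path):
                    plan.append(["sa-learn", flag, path])
    return plan
```

A caller would then run each command (for example with `subprocess.run`) and afterwards delete the SPAM message or move the HAM message back to the INBOX, as described in the post.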

You just have to add this script to root's crontab.
For example (at every full hour):
0 * * * * /usr/local/sbin/qauto-salearn

Source:
qauto-salearn perlscript (http://www.realriot.de/coding/qauto-salearn)

PS: There are some config parameters within the script. I use the script with Debian 3.1 and Plesk 7.5.3.

saschahb
12th July 2005, 03:34 PM
Bugfix and a new feature!
Version 1.1

- I fixed a small permission problem with mail directories where email is disabled within Plesk.
- Another fix: the script now uses sudo to run sa-learn on new mails and to create the Bayes databases.
In the last version the Bayes databases had root permissions, so SpamAssassin's own autolearn function didn't work correctly; only the learning done by qauto-salearn did.

Within the script you have to define the path to sudo and the user that should own the Bayes databases (it's "popuser" on Debian 3.1 with Plesk 7.5.3).
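
To illustrate the idea (again only a Python sketch, not the script's actual Perl code), running sa-learn as another user through sudo could look like the following. The function name `sa_learn_command` and the `set_home` parameter are invented for this example. `-u` and `-H` are standard sudo options; `-H` asks sudo to set HOME to the target user's home directory, which matters because sa-learn keeps its default database under `~/.spamassassin`.

```python
def sa_learn_command(sudo_path, user, kind, message_path, set_home=False):
    """Build an argv list that runs sa-learn as `user` via sudo."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    argv = [sudo_path, "-u", user]
    if set_home:
        # sudo -H: set HOME to the target user's home directory.
        argv.append("-H")
    argv += ["sa-learn", "--" + kind, message_path]
    return argv
```

For example, `sa_learn_command("/usr/bin/sudo", "popuser", "spam", path)` gives a command line that learns one message as popuser, so the database files end up owned by that user rather than by root. The sudo path here is only an example value.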

Have fun...


Download V1.1 of qauto-salearn (http://www.realriot.de/coding/qauto-salearn)

jamesyeeoc
13th July 2005, 01:45 AM
Thank you saschahb. I am sure there will be many who will make use of your script; I know there have been a number of posts wanting to know how to do this.

saschahb
13th July 2005, 02:12 AM
Thanks,

I'll keep developing this script because I (and perhaps many more people) need this function daily... :D

If you've got any suggestions, don't hesitate to tell me...

jamesyeeoc
13th July 2005, 02:24 AM
I believe the paths and popuser should be fine for RH boxes as well; I'm not too sure whether that holds if they are running on a VPS/Virtuozzo type server, though.

If you ever get additional OS input from others and want to put in a check for OS type, I can give you a list of what files to check for to determine OS and version. Gave the same to lvalics for powertoys. Then maybe auto-set the paths/options per detected OS?
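
A file-based OS check of that kind might look like the sketch below. The list of marker files is only an assumption for illustration (these release files are common on Debian, Red Hat and SuSE systems); it is not the list mentioned in the post, and `detect_os` is a made-up name.

```python
import os

# Assumed marker files, checked in order; not the list referred to in the post.
RELEASE_FILES = [
    ("debian", "etc/debian_version"),
    ("redhat", "etc/redhat-release"),
    ("suse", "etc/SuSE-release"),
]


def detect_os(root="/"):
    """Return (os_name, first line of the release file), or ("unknown", "")."""
    for name, relative_path in RELEASE_FILES:
        path = os.path.join(root, relative_path)
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as handle:
                return name, handle.readline().strip()
    return "unknown", ""
```

A script could then pick its default paths and user name from a small table keyed on the detected name, and fall back to manual configuration for "unknown".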

Herby
13th July 2005, 06:33 AM
Hi,

Running on a RHEL box (Whitebox, in fact).

Except for these lines, it seemed to work:

bayes expire_old_tokens: lock: 7278 cannot create tmp lockfile /root/.spamassassin/bayes.lock.<FQDN>.7278 for /root/.spamassassin/bayes.lock: Keine Berechtigung [German for "Permission denied"]

br
herby

saschahb
14th July 2005, 10:41 AM
Hm, I don't know why the script tries to update root's Bayes database...

Are you running the script as root? In fact, you have to. Perhaps you can give me some more information about this problem. I can't see what the matter is :(

jamesyeeoc
15th July 2005, 02:27 AM
bayes expire_old_tokens: lock: 7278 cannot create tmp lockfile /root/.spamassassin/bayes.lock.<FQDN>.7278 for /root/.spamassassin/bayes.lock: Keine Berechtigung

In general, this Permission Denied error would usually indicate a problem with the bayes database path.

Make sure your /etc/sysconfig/spamassassin file has a -H /var/qmail on the end of it.

cat /etc/sysconfig/spamassassin
SPAMDOPTIONS="-d -u qmailq -q -x -c -H /var/qmail "

(Note: found this on ART's forum)
[Edit] - ah, but would this be proper for a user by user training vs. systemwide...?

atomicturtle
15th July 2005, 07:41 AM
Depends on whether you're storing bayes/awl data in SQL or not. The -q flag means "use mysql"; if you are using mysql, then your training system would need to be modified to get sa-learn to use the correct syntax. Unfortunately that is extremely ugly, since sa-learn doesn't let you specify the user on the command line. You've got to do some messy stuff with local.cf to get it into the right place.

It's something I've been working out in atomic-psa; I'll have something together in another week or so, I reckon.

saschahb
21st February 2006, 06:30 AM
Bugfixes and a new feature!
Version 1.2

- Some permissions fixed. All bayes etc. files are now set to user popuser.popuser...

- If a special directory is configured, this script will change the default .qmail user file so that every detected spam mail is moved to this directory.
For example: if a user creates a directory "Perhapsspam" in his IMAP account and this script is configured to move all detected spam to "Perhapsspam", detected SPAM will never reach his INBOX anymore; it always goes to this directory.

Requirements: safecat has to be installed on the server.

PLEASE! Read the first lines of this script. You have to configure some things.

I've tested this script under RHEL4. Debian should work, too.

Have fun...


Download V1.2 of qauto-salearn (http://www.realriot.de/coding/qauto-salearn)

Next feature: Retention time for detected SPAM in the special directory

saschahb
22nd February 2006, 02:51 AM
Version 1.3 is out...

New feature (due to several requests):

+ SPAM retention time on special folder
( SPAM will be automatically deleted after a given amount of days)
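
Just to show the idea behind such a retention time (a Python sketch under my own assumptions, not the code from the script): messages in the special folder are selected by file modification time and removed once they are older than the configured number of days. The names `expired_messages` and `purge` are invented for this example.

```python
import os
import time


def expired_messages(directory, max_age_days, now=None):
    """List files in `directory` whose mtime is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    expired = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            expired.append(path)
    return expired


def purge(directory, max_age_days, now=None):
    """Delete expired messages and return the paths that were removed."""
    removed = expired_messages(directory, max_age_days, now)
    for path in removed:
        os.remove(path)
    return removed
```

In a Maildir the function would be called once for the folder's `cur` directory and once for `new`. Using the file's modification time instead of the mail's Date header is a simplification made here.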

Download V1.3 of qauto-salearn (http://www.realriot.de/coding/qauto-salearn)

beam
26th February 2006, 10:32 AM
Thanks a lot for that script.
You should make the Folders variable.

Beam

saschahb
26th February 2006, 10:45 AM
Originally posted by beam
Thanks a lot for that script. You should make the Folders variable.


You mean the SPAM and HAM Foldername?

beam
26th February 2006, 10:49 AM
Originally posted by saschahb
You mean the SPAM and HAM Foldername?

Yes, I mean SPAM and HAM foldernames.

Beam

beam
27th February 2006, 01:17 AM
I have a problem with it on my server. sa-learn tries to learn into the Bayes database of the running user. sudo doesn't change the home directory, so it tries /root/.spamassassin.
If I use the -H option of sudo, it learns into /var/qmail/.spamassassin.
I found the problem in sa-learn: the db-path only works for dump and import and not for spam and ham.
Maybe a solution is to symlink the Bayes files of the actual user into the home of popuser...
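
A rough Python sketch of that symlink idea follows, under several assumptions: the direction (links in the target directory pointing at the source files) is only one reading of the suggestion, the function name `link_bayes_files` is made up, and only SpamAssassin's default database file names `bayes_toks` and `bayes_seen` are handled.

```python
import os

# SpamAssassin's default Bayes database file names.
BAYES_FILES = ("bayes_toks", "bayes_seen")


def link_bayes_files(source_dir, target_dir):
    """Create symlinks in target_dir that point at the Bayes files in source_dir."""
    os.makedirs(target_dir, exist_ok=True)
    created = []
    for name in BAYES_FILES:
        source = os.path.join(source_dir, name)
        target = os.path.join(target_dir, name)
        # Never overwrite an existing file or link in the target directory.
        if os.path.exists(source) and not os.path.lexists(target):
            os.symlink(source, target)
            created.append(target)
    return created
```

Whether this works in practice also depends on file permissions: the user that sa-learn runs as still needs write access to the linked files and to the lock file location.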

MMaverick
31st May 2006, 10:06 AM
Does this script work with the command "thisisspam" of spamassassin?
If not, why not?

I read about this command in this article: http://support.pa.msu.edu/help/faqs/other/faq-spamassassin-part-3.html

I'm surely going to use this script, thanks a lot!

MMaverick
3rd June 2006, 05:17 PM
Hi,

I got the following error when running the script:

bayes expire_old_tokens: lock: 21979 cannot create tmp lockfile /root/.spamassassin/bayes.lock.dedicational.com.21979 for /root/.spamassassin/bayes.lock: Permission denied

EDIT: Problem solved: Only had to change the $sluser value in the script. (Changed it to 'root' since the crontab was being run as root.)

dl2rbi
16th November 2006, 03:53 AM
Looks very interesting...
Has anyone tested this script with PLESK 8.0.1?
I think some changes are necessary...

Werner (using PLESK 8.0.1 / SuSE 10/64bit)

MMaverick
16th November 2006, 04:03 AM
If you have the money, you should just buy 4PSA Spam Guardian.
It's really worth your money. It has a very nice user interface, easy to maintain, stable and has the spam learn script which works fine.

Check the website for more information.
http://www.4psa.com/products/4psasguardian.php