Improve SpamAssassin accuracy – sa-learn and Spam Trap

SpamAssassin is probably the most popular anti-spam filter for self-hosted mail servers. Its mail analysis stops a large part of spam, but spam bots keep adapting their content to avoid being flagged. SpamAssassin comes with a large set of rules, yet its accuracy won’t change much unless you teach it!

Training SA will help improve its accuracy. To be effective, however, you need a similar amount of Ham (non-spam) and Spam mails, or more Ham than Spam: if you only train on Spam, SA’s Bayes database will be biased toward Spam and could generate false positives.
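Before training, it can help to check how balanced your folders actually are. The following is a minimal sketch, assuming Maildir folders with cur/ and new/ subdirectories; the function names count_mail and check_balance are made up for this illustration and are not part of SpamAssassin.

```shell
# Count the messages in one Maildir folder (files under cur/ and new/).
count_mail() {
    find "$1/cur" "$1/new" -type f 2>/dev/null | wc -l | tr -d ' '
}

# check_balance HAM_DIR SPAM_DIR
# Print both counts and warn when spam outnumbers ham.
check_balance() {
    ham=$(count_mail "$1")
    spam=$(count_mail "$2")
    echo "ham=$ham spam=$spam"
    if [ "$spam" -gt "$ham" ]; then
        echo "warning: more spam than ham, training may be biased toward spam"
    fi
}

# Usage (paths are assumptions; adjust to your layout):
# check_balance ~/Maildir ~/Maildir/.Junk
```

If the warning shows up, consider adding more ham (for example an archive folder) before training on the whole Spam folder.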

Obviously, the teaching has to come from you. If you find Ham in your Spam folder, move it to your regular Inbox, and if you find Spam in your regular Inbox, move it to Spam. (Basic learning, I’d say :) )

I assume you have a working mail setup, with SpamAssassin running and moving spam into a Spam folder through Procmail or Sieve rules.

If not, here is how to install a Postfix + Dovecot mail system with SSL and here is how to protect your mails with SpamAssassin + ClamAV with Amavis and Procmail.

I train continuously in two ways. If you just want a quick script to do so, skip to the end of the article.


1) SA Learning on current INBOX and SPAM mailboxes

The official documentation has a lot of detail, and I recommend checking it if you want to know more.

SpamAssassin comes with its own tool to learn Spam and Ham, called sa-learn. It is very easy to use with Maildir or mbox, …

To learn Spam, simply run

sa-learn --no-sync --spam ~/Maildir/.Junk/{cur,new}

where .Junk is the name of your Spam folder inside your Maildir folder. (If you are using mbox, just add the option --mbox before the file name.)

and to learn Ham:

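sa-learn --no-sync --ham ~/Maildir/{cur,new}

Note that in the usual Maildir++ layout (Dovecot’s default) the inbox is the top-level Maildir itself, so there is no .INBOX subfolder; adjust the path if your setup differs. We also don’t need to learn from the tmp folder, as it should be empty most of the time.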

Once done, you need to tell SA to rebuild its database with the command:

sa-learn --sync

You can then build your own rules, script or cron job based on these commands.
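As a starting point for such a script, here is a minimal sketch of a wrapper around the commands above. The function name learn_folder and the SA_LEARN override variable are assumptions introduced for this example; only --no-sync, --spam, --ham and --sync come from sa-learn itself.

```shell
# SA_LEARN is a hypothetical override: set SA_LEARN=echo to preview the
# command line without touching the Bayes database.

# learn_folder TYPE DIR
# Feed cur/ and new/ of one Maildir folder to sa-learn.
# TYPE must be "spam" or "ham".
learn_folder() {
    kind=$1
    dir=$2
    case $kind in
        spam|ham) ;;
        *) echo "learn_folder: type must be spam or ham" >&2; return 2 ;;
    esac
    if [ ! -d "$dir/cur" ] || [ ! -d "$dir/new" ]; then
        echo "learn_folder: $dir is not a Maildir folder" >&2
        return 1
    fi
    ${SA_LEARN:-sa-learn} --no-sync "--$kind" "$dir/cur" "$dir/new"
}

# Usage (folder names are assumptions; adjust to your layout):
# learn_folder spam ~/Maildir/.Junk && learn_folder ham ~/Maildir && sa-learn --sync
```

Checking the folder first avoids handing sa-learn a path that does not exist, and the explicit type check keeps a typo from being passed through as an unknown option.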


2) Spam Trap

You may want to create a dedicated mailbox to hand out to known spammy websites, like (Go ahead bots, send emails to this mailbox, they will all be learned as spam).

To do so, after creating this mailbox, you can either adapt the previous commands into a script or cron job so that SA learns everything in it as Spam (regardless of whether it is in the Spam folder or not), or set up a procmail rule that learns any mail sent there as Spam, such as:

:0c: spamassassin.spamlock
* ^To:.*
| sa-learn --spam

But in my case, I prefer to have a single script that deals with all of this.


SA-Learn script coupled with Spam Trap

If you are looking for a full script that scans the ham and spam folders of all users, automatically deletes old spam from your system and keeps a backup of SpamAssassin’s learned (Bayes) data, in case you want to use it on a different system or just as a plain backup, here is a script you could use.

Create the script file (wherever you want), for example:

mkdir /home/script
nano /home/script/sa-learn

and paste:

#!/bin/bash
### From, modified by Karibu (

# Specify users names, space separated [user=(user1 user2 user3)] or leave it empty [user=()] to include all users. All users is considered uid ≥ 1000.
user=()

# Specify Spam Trap user name (Comment out to disable spamtrap)
# NOTE: "spamtrap" is a placeholder account name, replace it with your own.
spamtrap=spamtrap

# After how many days should Spam be deleted?
# NOTE: 30 is a placeholder value, adjust to taste.
cleanafter=30

# Backup path (Comment out to disable backup)
bk=/home/backup/sa-learn_bayes_$(date +%F).backup

# Log file to keep record
# NOTE: placeholder path, adjust to taste.
log=/var/log/sa-learn.log

####### BEGINNING OF THE SCRIPT ########

echo -e "\n$(date +%c)" >> "$log" 2>&1

# No user listed: take every regular account, except the spam trap.
if [ ${#user[@]} -eq 0 ]; then
    echo "No user mentioned - Using all users from system"
    user=( $(awk -v exclude="$spamtrap" -F':' '$3 >= 1000 && $3 < 65534 && $1 != exclude {print $1}' /etc/passwd) )
fi

# Learn Spam from .Junk and Ham from the inbox of each user.
for u in "${user[@]}"; do
    if [ ! -d "/home/$u/Maildir" ]; then
        echo "No such Maildir for $u" >> "$log" 2>&1
    else
        echo "Proceeding with ham and spam training on user \"$u\""

        echo "$u Spam Scan" >> "$log" 2>&1
        sa-learn --no-sync --spam "/home/$u/Maildir/.Junk"/{cur,new} >> "$log" 2>&1

        echo "$u Ham Scan" >> "$log" 2>&1
        sa-learn --no-sync --ham "/home/$u/Maildir"/{cur,new} >> "$log" 2>&1
    fi
done

# Everything received by the spam trap is learned as Spam.
if [ -n "$spamtrap" ]; then
    echo "SpamTrap Scan" >> "$log" 2>&1
    sa-learn --no-sync --spam "/home/$spamtrap/Maildir/.Junk"/{cur,new} >> "$log" 2>&1
    sa-learn --no-sync --spam "/home/$spamtrap/Maildir"/{cur,new} >> "$log" 2>&1
fi

# Only delete old spam when the sync succeeded.
echo "Sync SA base" >> "$log" 2>&1
if sa-learn --sync >> "$log" 2>&1; then
    for u in "${user[@]}"; do
        echo "deleting spam for $u older than $cleanafter days" >> "$log" 2>&1
        find "/home/$u/Maildir/.Junk/cur/" -type f -mtime +"$cleanafter" -exec rm {} \; >> "$log" 2>&1
    done
else
    echo "sa-learn wasn't able to sync. Something is broken. Skipping spam cleanup"
fi

echo "Statistics:" >> "$log" 2>&1
sa-learn --dump magic >> "$log" 2>&1
echo ============================== >> "$log" 2>&1

if [ -n "$bk" ]; then
    echo "Backup writing to $bk" >> "$log" 2>&1
    sa-learn --backup > "$bk"
fi

Just set the users you want to monitor (leave the list empty for all users), the spam trap account if any, the number of days after which old spam is deleted, the backup path and the log file.
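If you leave the user list empty, it is worth checking which accounts the script would pick up. The sketch below reproduces the uid-based selection as a standalone function; the name list_mail_users is hypothetical, and comparing the excluded name exactly (rather than as a pattern) is an assumption about the intended behaviour.

```shell
# list_mail_users PASSWD_FILE [EXCLUDED_USER]
# Print login names with 1000 <= uid < 65534, skipping EXCLUDED_USER.
list_mail_users() {
    awk -F: -v exclude="${2:-}" \
        '$3 >= 1000 && $3 < 65534 && $1 != exclude { print $1 }' "$1"
}

# Usage ("spamtrap" is a placeholder account name):
# list_mail_users /etc/passwd spamtrap
```

An exact comparison also keeps working when no spam trap is set, since an empty name never equals a real login.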

The script first scans the selected users (or all of them, excluding the spam trap account) and learns Ham from cur/new of the INBOX and Spam from cur/new of the Spam folder. It then scans the spam trap account and learns all of its mails as Spam (from both the INBOX and the Spam folder). Finally it syncs the SA database, removes old spam (if the sync succeeded), logs the statistics and creates a backup file of SA’s learned data.
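The statistics written to the log come from sa-learn --dump magic, which includes the number of spam (nspam) and ham (nham) messages learned so far. With default settings, SpamAssassin only starts using its Bayes results once it has learned at least 200 of each. Here is a small sketch to check this; the function names bayes_count and bayes_ready are invented for this example, and the parsing assumes the usual layout where the count is the third column and the name is the last word of the line.

```shell
# bayes_count NAME
# Read `sa-learn --dump magic` output on stdin and print the value
# recorded for NAME (for example nspam or nham).
bayes_count() {
    awk -v name="$1" '$NF == name { print $3 }'
}

# bayes_ready [MIN]
# Read the same output on stdin; succeed when both nspam and nham
# have reached MIN (200 unless given).
bayes_ready() {
    dump=$(cat)
    min=${1:-200}
    nspam=$(printf '%s\n' "$dump" | bayes_count nspam)
    nham=$(printf '%s\n' "$dump" | bayes_count nham)
    echo "nspam=${nspam:-0} nham=${nham:-0}"
    [ "${nspam:-0}" -ge "$min" ] && [ "${nham:-0}" -ge "$min" ]
}

# Usage:
# sa-learn --dump magic | bayes_ready || echo "Bayes is not active yet"
```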

Don’t forget to make it executable (chmod +x /home/script/sa-learn) and to set up a cron job, for example every day at 1am:

0 1 * * * /home/script/sa-learn

or, if the mailboxes are not used often, only once a month, on the first day of the month at 1am:

0 1 1 * * /home/script/sa-learn
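Since the cleanup step deletes files, you may want to preview what it would remove before letting cron run it unattended. This is a minimal sketch using the same find test as the script, but printing instead of deleting; the function name list_old_spam and the example path are assumptions.

```shell
# list_old_spam DIR DAYS
# Print (without deleting) the files in DIR/cur that find's
# "-mtime +DAYS" test matches, i.e. the ones the cleanup would remove.
list_old_spam() {
    find "$1/cur" -type f -mtime +"$2" -print
}

# Usage (user name and age are placeholders):
# list_old_spam /home/alice/Maildir/.Junk 30
```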





