
Improve SpamAssassin accuracy – sa-learn and Spam Trap

SpamAssassin is probably the most popular anti-spam solution for your own server. Its mail analysis stops a large part of the Spam, but spam bots keep improving their content so that it is not flagged as Spam. SpamAssassin comes with a large set of rules, but it won't get much better unless you teach it!

Training SA helps improve its accuracy. To be effective, however, you need a similar amount of Ham (non-Spam) and Spam, or more Ham than Spam: if you only train on Spam, SA's Bayes filter will be biased toward Spam and could generate false positives.

Obviously, the teaching has to come from you. If you find Ham in the Spam folder, move it to your regular Inbox, and if you find Spam in your regular Inbox, move it to the Spam folder. (Basic learning, I'd say :) )

I assume you have a working mail configuration, with SpamAssassin running and moving Spam into a Spam folder through Procmail or Sieve rules.

If not, here is how to install a Postfix + Dovecot mail system with SSL, and here is how to protect your mail with SpamAssassin + ClamAV using Amavis and Procmail.

I'm training in two complementary, continuous ways; if you just want a quick script to do it, check the end of the article.


1) SA Learning on current INBOX and SPAM mailboxes

The official documentation has a lot of details, and I recommend checking it if you want to know more.

SpamAssassin comes with its own tool to learn Spam and Ham, called sa-learn. It is very easy to use with Maildir or mbox, …

To learn Spam, simply run

where .Junk is the name of your Spam folder inside your Maildir folder. (If you are using mbox, just add the option --mbox before the folder name.)

and to learn Ham:

Note that we don't need to run the learning on the tmp folder, as it should be empty most of the time.
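The two sa-learn calls above can be wrapped in a small shell function. This is a minimal sketch assuming a Maildir at the path you pass in and a Spam folder named .Junk; the names `learn_user` and `run`, and the `DRY_RUN=1` preview switch, are hypothetical. `--no-sync` postpones the database rebuild so that a single `sa-learn --sync` can follow.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# learn_user MAILDIR [SPAM_FOLDER]
# Spam is learnt from the Spam folder, Ham from the regular INBOX.
# Only cur/ and new/ are read; tmp/ is normally empty.
learn_user() {
  maildir=$1
  spam=${2:-.Junk}
  run sa-learn --no-sync --spam "$maildir/$spam/cur" "$maildir/$spam/new"
  run sa-learn --no-sync --ham "$maildir/cur" "$maildir/new"
}

# Example (alice is a placeholder user):
#   DRY_RUN=1 learn_user /home/alice/Maildir   # preview
#   learn_user /home/alice/Maildir             # learn for real
```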

Once done, you need to tell SA to rebuild its database with the command:

You can then build your own rules, script or cron job based on these commands.
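To keep an eye on the Ham/Spam balance discussed at the start, the output of `sa-learn --dump magic` can be checked after `sa-learn --sync`. This sketch assumes the usual layout of that output, where the count is the third column of the `nspam` and `nham` lines; `bayes_counts` and `check_balance` are hypothetical names.

```shell
# Read `sa-learn --dump magic` output on stdin and print "<nspam> <nham>".
bayes_counts() {
  awk '/non-token data: nspam/ { s = $3 }
       /non-token data: nham/  { h = $3 }
       END { print s + 0, h + 0 }'
}

# Succeed when at least as much Ham as Spam has been learnt; warn otherwise.
check_balance() {
  set -- $(bayes_counts)
  if [ "$1" -gt "$2" ]; then
    echo "warning: $1 spam vs $2 ham learnt, consider training more ham" >&2
    return 1
  fi
  echo "ok: $1 spam vs $2 ham learnt"
}

# Usage, as the user whose Bayes database you trained:
#   sa-learn --sync && sa-learn --dump magic | check_balance
```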


2) Spam Trap

You may want to create a dedicated mailbox to give to known spammy websites, like (Go ahead, bots, send emails to this mailbox, they will all be learnt as spam).

To do so, after creating this mailbox, you can either adapt the previous commands or run a cron job so that SA learns from it (regardless of whether the mail is in the Spam folder or not), or you can set up a rule in your procmail so that any mail sent there is learnt as Spam, such as:

But in my case, I prefer to have a single script to deal with all of this.
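For the cron-job variant of the spam trap, everything the trap mailbox receives can be learnt as Spam, whichever folder it ended up in. A minimal sketch, assuming the trap account uses a Maildir with a .Junk Spam folder like the other accounts; `learn_trap` and `run` are hypothetical names, and the function would be called from a daily cron job.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# learn_trap MAILDIR [SPAM_FOLDER]
# Every message in the trap is Spam: learn the INBOX and the Spam folder alike,
# then rebuild the Bayes database once.
learn_trap() {
  maildir=$1
  spam=${2:-.Junk}
  run sa-learn --no-sync --spam \
    "$maildir/cur" "$maildir/new" "$maildir/$spam/cur" "$maildir/$spam/new"
  run sa-learn --sync
}
```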


SA-Learn script coupled with Spam Trap

If you are looking for a full script that scans all users' Ham and Spam folders, automatically deletes old Spam from your system and provides a backup of what SpamAssassin has learnt (in case you want to use it on a different system, or just as a plain backup), here is a script you could use.

Create the script file (wherever you want), for example:

and paste:

Just set the users you want to monitor (leave it blank for all), the spam trap if any, the old Spam deletion timeframe and the backup folder path.

The script will then scan the selected users (all of them if none are set, but always excluding the spam trap), learning Ham from the INBOX cur/new folders and Spam from the Spam folder's cur/new folders. It then scans the spam trap account and flags all its mail as Spam (from both the INBOX and the Spam folder). Finally, it syncs the SA database, removes old Spam (if requested), prints the statistics and creates a backup file of the learnt SA data.
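A sketch of a script with the behaviour described above, assuming every account keeps its mail in `/home/<user>/Maildir` with a `.Junk` Spam folder. The file name `sa-learn-all.sh`, the variable and function names and the `DRY_RUN=1` preview switch are hypothetical. Keep in mind that sa-learn writes to the Bayes database of the user it runs as (SpamAssassin's `bayes_path` setting), so make sure the learning ends up in the database your scanner actually reads.

```shell
#!/bin/sh
# sa-learn-all.sh (hypothetical name): train SpamAssassin from all mailboxes.
# Assumes /home/<user>/Maildir with a ".Junk" Spam folder; paths without spaces.

USERS=""                 # e.g. "alice bob"; empty = every user with a Maildir
SPAMTRAP=""              # spam-trap account, empty if none
SPAM_FOLDER=".Junk"
DELETE_SPAM_DAYS=30      # delete Spam older than this many days; 0 = never
BACKUP_DIR="/var/backups/spamassassin"
HOME_ROOT="/home"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Users to train on, always excluding the spam trap.
list_users() {
  if [ -n "$USERS" ]; then
    for u in $USERS; do
      if [ "$u" != "$SPAMTRAP" ]; then printf '%s\n' "$u"; fi
    done
  else
    for d in "$HOME_ROOT"/*/Maildir; do
      [ -d "$d" ] || continue
      u=${d%/Maildir}
      u=${u##*/}
      if [ "$u" != "$SPAMTRAP" ]; then printf '%s\n' "$u"; fi
    done
  fi
}

learn_all() {
  for u in $(list_users); do
    md="$HOME_ROOT/$u/Maildir"
    run sa-learn --no-sync --ham "$md/cur" "$md/new"
    run sa-learn --no-sync --spam "$md/$SPAM_FOLDER/cur" "$md/$SPAM_FOLDER/new"
  done
  if [ -n "$SPAMTRAP" ]; then
    md="$HOME_ROOT/$SPAMTRAP/Maildir"
    # Everything in the trap is Spam, whichever folder it landed in.
    run sa-learn --no-sync --spam "$md/cur" "$md/new" \
      "$md/$SPAM_FOLDER/cur" "$md/$SPAM_FOLDER/new"
  fi
  run sa-learn --sync
}

# Only message files in cur/ and new/ are removed (not Dovecot index files).
delete_old_spam() {
  [ "$DELETE_SPAM_DAYS" -gt 0 ] || return 0
  for u in $(list_users) $SPAMTRAP; do
    md="$HOME_ROOT/$u/Maildir"
    dirs="$md/$SPAM_FOLDER/cur $md/$SPAM_FOLDER/new"
    # The trap's INBOX holds nothing but Spam as well.
    if [ "$u" = "$SPAMTRAP" ]; then dirs="$dirs $md/cur $md/new"; fi
    run find $dirs -type f -mtime "+$DELETE_SPAM_DAYS" -delete
  done
}

report_and_backup() {
  run sa-learn --dump magic
  backup="$BACKUP_DIR/sa-learn-backup-$(date +%Y%m%d).txt"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "sa-learn --backup > $backup"
  else
    mkdir -p "$BACKUP_DIR" && sa-learn --backup > "$backup"
  fi
}

main() {
  learn_all
  delete_old_spam
  report_and_backup
}

# Only run when executed under this name, so the functions can also be sourced.
case ${0##*/} in
  sa-learn-all.sh) main ;;
esac
```

Running it once with `DRY_RUN=1 ./sa-learn-all.sh` prints every command without touching anything, which is a cheap way to check the user list and paths before enabling the cron job. A backup made this way can later be loaded with `sa-learn --restore`.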

Don't forget to make it executable (chmod +x) and to set up a cron job, for example every day at 1am:

or, for example, only once a month (if the box is not used often), on the first day of the month at 1am:
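Editing the crontab by hand works fine; if you prefer to script it, the sketch below appends an entry only when it is not already present, so re-running it doesn't create duplicates. `add_cron_line` is a hypothetical helper and `/usr/local/bin/sa-learn-all.sh` a placeholder path. In cron syntax, `0 1 * * *` means every day at 01:00 and `0 1 1 * *` the first day of each month at 01:00.

```shell
# Copy a crontab from stdin to stdout, appending ENTRY unless it is already there.
add_cron_line() {
  entry=$1
  existing=$(cat)
  if [ -n "$existing" ]; then printf '%s\n' "$existing"; fi
  if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
    printf '%s\n' "$entry"
  fi
}

# Usage, as the user that should run the job (root here):
#   crontab -l 2>/dev/null | add_cron_line '0 1 * * * /usr/local/bin/sa-learn-all.sh' | crontab -
# or once a month, on the 1st at 01:00:
#   crontab -l 2>/dev/null | add_cron_line '0 1 1 * * /usr/local/bin/sa-learn-all.sh' | crontab -
```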



Reduce SPAM and improve security – Amavis + SpamAssassin + ClamAV + Procmail + PostScreen

More than 90% of mail traffic is actually SPAM… and you will quickly need to implement Spam protection, whether through global blacklists, learning algorithms or even checks that the SMTP protocol is respected.

The most popular way to block SPAM on your mail server is probably SpamAssassin. It's a free and open source spam filter written in Perl. It performs a wide range of tests on headers and body text to determine how likely a mail is to be spam. You can then make SpamAssassin learn from its mistakes (Ham) or reinforce its correct decisions (SPAM). It's a powerful and very flexible tool. The downside is its resource footprint, as it scans all your mail to assign a score to each message, and basically more than 90% of them will be SPAM.

Other solutions exist that are more resource-efficient, but they have other downsides. This is the case for RBLs (Real-time Blackhole Lists): databases of known spammy IPs, from Spamhaus for example. You can choose which list of spammy IPs to block (some are larger than others). The downside is that you might block legitimate senders, as only one domain on an IP may actually be spamming while all the others on the same IP are legitimate. Or worse, in some cases Spamhaus and co. have blocked a full range of IPs…

But there are also other ways to do it, like Postscreen. As most Spam is sent by zombie computers that have only a very limited amount of time to deliver their spammy mail before being blacklisted, they tend to cut corners in their SMTP protocol implementation: for example, they may speak before their turn, or ignore responses from the SMTP server and keep sending mail even when the server tells them not to, etc. Postscreen checks whether clients respect the SMTP protocol and, if they do, allows the mail to be delivered.

I think this process is quite efficient and could save a lot of resources, as SpamAssassin will not have to scan all the mail, only the mail that has passed Postscreen's first tests. However, when a mail is rejected, the client needs to send it again (spammers usually don't), and in that case you can face a long delay (several minutes to several hours depending on the client…). For this reason I do not use it, but if you are under heavy load because of spam and SpamAssassin can't keep up or uses all your resources, it's a good workaround.

Another aspect to cover is having an AntiVirus. For Linux? you will say. Well, first of all, Linux is not perfect (although it manages authorization and system access much better than Windows) and you could suffer from some viruses. But most importantly, you may not be the only one reading mail coming from your server: you could offer access to family, friends, …, read your mail on different systems including Windows, or simply forward a mail to other people. That's why I think having a proper AntiVirus for your mail is important.

But here again, an AntiVirus that scans all your mail for viruses will use a significant amount of resources (30-50 MB of RAM probably?), and this is again where Postscreen could help, by avoiding scanning Spam mail too.

Actually, to make this configuration work, you will also need an additional package, Amavis, to close the loop:

Postscreen removes a significant part of the Spam at the earliest stage (the part not respecting the SMTP protocol implementation) and passes the rest to Postfix. Amavis then acts as the bridge between Postfix and SpamAssassin + ClamAV to check for Spam and viruses, and finally Procmail dispatches everything into the local mailboxes. (Note that Sieve in Dovecot could do this too.)

So let’s see how to install and configure all this.

PS: I don't use Postscreen, and if you want no delay on your mail, you shouldn't use it either.


and we will also add some compression tools to be able to scan the archives for viruses too.
Postscreen is part of Postfix and does not require additional package.
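On Debian/Ubuntu, the installation step might look like the sketch below. The package names are the usual Debian ones, but they can vary between releases, so treat the list as an assumption to check against your distribution; `install_filters` and `run` are hypothetical names.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

install_filters() {
  run apt-get install -y amavisd-new spamassassin spamc clamav clamav-daemon procmail
  # Archive tools Amavis uses to unpack attachments before virus scanning.
  run apt-get install -y arj bzip2 cabextract cpio lhasa p7zip-full unzip zip
}
```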


  • ClamAV:

By default, ClamAV automatically updates its database every hour. If you want to update it now, you can run:

Then, to avoid ownership issues during scans by ClamAV and Amavis, we need to add the ClamAV and Amavis users to each other's groups:
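A sketch of these two steps for a Debian-style system, where the updater runs as the clamav-freshclam service and holds a lock on the database, so it is stopped around a manual update. The user names clamav and amavis are the Debian package defaults; `update_clamav_now`, `share_groups` and `run` are hypothetical names.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Manual signature update while the freshclam service is paused.
update_clamav_now() {
  run systemctl stop clamav-freshclam
  run freshclam
  run systemctl start clamav-freshclam
}

# Put each user in the other's group; restart clamav-daemon and amavis
# afterwards so the new membership is picked up.
share_groups() {
  run usermod -a -G amavis clamav
  run usermod -a -G clamav amavis
}
```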

  • Amavis:

You will need to make Amavis and Postfix communicate.

In /etc/postfix/, below the line:


so that it looks like this:

And at the end of the file add:

then in /etc/postfix/, add:

Now you need to configure Amavis itself. In /etc/amavis/conf.d/15-content_filter_mode, make sure the 2 variables

are uncommented. You can now move on to SpamAssassin.
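As a sketch of the Postfix side, assuming the two files meant above are master.cf and main.cf and that Amavis uses its Debian defaults (listening on 127.0.0.1:10024 and re-injecting mail on port 10025), the usual entries can be appended like this. The service definitions are a trimmed form of the common Debian/Ubuntu Amavis setup; `add_amavis_services`, `enable_amavis_filter` and `run` are hypothetical names, and the master.cf path is a parameter so you can try it on a copy first.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Append the Amavis transport (Postfix -> Amavis) and the re-injection
# listener (Amavis -> Postfix) to master.cf, only once.
add_amavis_services() {
  cf=${1:-/etc/postfix/master.cf}
  if grep -q '^smtp-amavis[[:space:]]' "$cf"; then return 0; fi
  cat >> "$cf" <<'EOF'

smtp-amavis unix -       -       -       -       2       smtp
  -o smtp_data_done_timeout=1200
  -o smtp_send_xforward_command=yes
  -o max_use=20
127.0.0.1:10025 inet n   -       -       -       -       smtpd
  -o content_filter=
  -o local_recipient_maps=
  -o smtpd_client_restrictions=permit_mynetworks,reject
  -o smtpd_recipient_restrictions=permit_mynetworks,reject
  -o mynetworks=127.0.0.0/8
  -o receive_override_options=no_header_body_checks,no_unknown_recipient_checks
EOF
}

# main.cf: hand incoming mail to Amavis.
enable_amavis_filter() {
  run postconf -e 'content_filter = smtp-amavis:[127.0.0.1]:10024'
}
```

The empty `content_filter=` on the 10025 listener matters: without it, mail coming back from Amavis would be sent to Amavis again in a loop.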

  • SpamAssassin:

I suggest creating a dedicated user to run SpamAssassin, to better control the process and have dedicated logs.

As root (su), type:

Its configuration file is located at /etc/default/spamassassin. You will need to modify a few things to enable SpamAssassin:

and change the following to 1:

You will also need to modify the OPTIONS line to become:

and add a new line with:
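Put together, the user creation and the /etc/default/spamassassin changes could look like the sketch below. It follows a widely used Debian setup in which the spamd user's home, /var/log/spamassassin, also holds its log; the exact OPTIONS values are assumptions to adapt. SAHOME is written before OPTIONS because OPTIONS expands it when the file is read. On systemd-based releases the service may also need `systemctl enable spamassassin`, as the ENABLED flag may be ignored there. `setup_spamd_user`, `configure_spamd_defaults` and `run` are hypothetical names.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Dedicated, non-login user whose home also holds the spamd log.
setup_spamd_user() {
  run groupadd spamd
  run useradd -g spamd -s /bin/false -d /var/log/spamassassin spamd
  run mkdir -p /var/log/spamassassin
  run chown spamd:spamd /var/log/spamassassin
}

# Edit the given file in place (GNU sed -i); defaults to the Debian path.
configure_spamd_defaults() {
  f=${1:-/etc/default/spamassassin}
  sed -i -e 's/^ENABLED=0/ENABLED=1/' -e 's/^CRON=0/CRON=1/' \
         -e '/^SAHOME=/d' -e '/^OPTIONS=/d' "$f"
  # SAHOME first: OPTIONS refers to it when the file is sourced.
  cat >> "$f" <<'EOF'
SAHOME="/var/log/spamassassin/"
OPTIONS="--create-prefs --max-children 2 --username spamd -H ${SAHOME} -s ${SAHOME}spamd.log"
EOF
}
```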

Now you need to configure Postfix to use SpamAssassin.

At the line:

add below it, on a new line:

then at the end of the file, add:
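As a sketch of these two master.cf changes, assuming the Debian layout (spamc in /usr/bin, sendmail in /usr/sbin) and the spamd user created earlier: the filter pipes each message through spamc, which adds the SpamAssassin headers, and hands it back to Postfix through sendmail. Be aware that an `-o content_filter=` on the smtp service overrides any content_filter set in main.cf for that service, including the Amavis one, so mail arriving over SMTP would then go through spamc instead of Amavis. `add_spamc_filter` is a hypothetical name.

```shell
# Add "-o content_filter=spamassassin" under the smtp/inet service and
# define the spamassassin pipe service at the end of master.cf, only once.
add_spamc_filter() {
  cf=${1:-/etc/postfix/master.cf}
  if grep -q '^spamassassin[[:space:]]' "$cf"; then return 0; fi
  awk '{ print }
       $1 == "smtp" && $2 == "inet" { print "  -o content_filter=spamassassin" }' \
    "$cf" > "$cf.new" && cat "$cf.new" > "$cf" && rm -f "$cf.new"
  cat >> "$cf" <<'EOF'
spamassassin unix -     n       n       -       -       pipe
  user=spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
EOF
}
```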

Finally, restart all the services you have touched.

If any issue happens during the restart, the error message should tell you what to do. If there is no issue, you should now be protected from Spam and viruses.

You can check that it works by sending a fake spam to your mailbox. Simply send yourself an email with the content:

or try with a harmless test virus from The European Expert Group For IT-Security.
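The usual "fake spam" is the GTUBE test string, which SpamAssassin always scores as spam. A small sketch that prints a ready-to-send test message (`gtube_message` is a hypothetical name). Sending it from an outside account is the surest way to follow the same path as real mail, because locally submitted mail may bypass the content filter depending on your Postfix settings.

```shell
# Print a minimal test message containing the GTUBE string.
gtube_message() {
  printf 'To: %s\nSubject: GTUBE test\n\n%s\n' "$1" \
    'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X'
}

# Local example (you@example.com is a placeholder):
#   gtube_message you@example.com | /usr/sbin/sendmail -t
```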

  • Procmail:

You may want to make sure Spam is stored in your Junk folder to keep it separate from your regular inbox. This is where Procmail comes in (although Sieve in Dovecot could do the same).

First, you will need to tell Postfix to use Procmail.

add the following line:

Then we need to configure the rules.

The Dovecot wiki states that Procmail seems to have intermittent delivery problems if you use the system-wide configuration (/etc/procmailrc) with Maildir-style mailboxes, so you should use $HOME/.procmailrc instead.

Hence, to avoid having to configure this for every new email user, we will use the skel system to ensure our .procmailrc is copied to every new user.

As root, create the /etc/skel/.procmailrc file

and copy this simple configuration:

This will route Spam into the .Junk folder. (You should be able to subscribe to this folder from your favourite email client, like Thunderbird, …)
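A minimal sketch of both Procmail pieces: the Postfix mailbox_command (in the form given in Postfix's own sample main.cf, assuming procmail is in /usr/bin) and a skeleton .procmailrc. The recipe matches the `X-Spam-Flag: YES` header that SpamAssassin (and Amavis, for the domains it treats as local) adds to mail it considers spam; if your setup tags mail differently, adjust that line. `enable_procmail`, `write_procmailrc` and `run` are hypothetical names.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# main.cf: deliver local mail through procmail.
enable_procmail() {
  run postconf -e 'mailbox_command = /usr/bin/procmail -a "$EXTENSION"'
}

# Write the skeleton .procmailrc (default: /etc/skel/.procmailrc).
write_procmailrc() {
  f=${1:-/etc/skel/.procmailrc}
  cat > "$f" <<'EOF'
# Deliver into ~/Maildir/ (the trailing slash selects Maildir format).
MAILDIR=$HOME/Maildir/
DEFAULT=$MAILDIR

# Mail tagged as spam goes to the .Junk folder.
:0
* ^X-Spam-Flag: YES
.Junk/
EOF
}
```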

When you create a new user, they will get this .procmailrc in their home directory and their email filtering should work right away.

As explained in the first part of this tutorial, to create a new user (as root):
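The skel copy only happens when an account is created, so users that already existed won't get the file. A sketch that gives them one, assuming homes under /home and GNU coreutils (for `chown --reference`); `copy_procmailrc_to_existing` is a hypothetical name, and existing .procmailrc files are left untouched.

```shell
# Copy the skel .procmailrc into every home that has a Maildir but no
# .procmailrc yet, giving it the same owner and group as the home directory.
copy_procmailrc_to_existing() {
  skel=${1:-/etc/skel/.procmailrc}
  root=${2:-/home}
  for md in "$root"/*/Maildir; do
    [ -d "$md" ] || continue
    home=${md%/Maildir}
    if [ -e "$home/.procmailrc" ]; then continue; fi
    cp "$skel" "$home/.procmailrc"
    chown --reference="$home" "$home/.procmailrc"
    echo "installed $home/.procmailrc"
  done
}
```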

A long tutorial, but you should now have a secure mail system.


If you want to use Postscreen as an additional layer of Spam protection, you can follow the tutorial below:

  • Postscreen:

In your /etc/postfix/, add a section for Postscreen as follows:

A few explanations:


When a client connects to Postscreen, Postscreen starts by sending the first part of its banner ("Please wait to be seated") and, 6 seconds later, the remaining information on the SMTP identity. According to the SMTP protocol, the client has to wait for the entire banner. Spam bots will probably not wait (as they are configured to send as many mails as possible), and Postscreen will not accept their mail.


Initially, before ESMTP (Extended SMTP), the protocol was half-duplex, meaning the server and the client had to send one command at a time and wait for the other's answer. Enabling this option tells the client that it must send one command at a time, as Postscreen "does not" announce support for ESMTP command pipelining. Here again, Spam bots will most probably not respect that and send the entire set of commands at once.


This test is a simple filter that blocks the commands CONNECT, GET and POST, used by spam bots when they go through proxies. This filter is actually already implemented in Postfix (since version 2.2), but having it upstream should help reduce the load on the smtpd daemon.


This test is still very simple, but a lot of Spam bots fail it: in the SMTP protocol, each line must end with <CR><LF> ("Carriage Return & Line Feed"), but a lot of zombies only use <LF> at the end of their lines.

Obviously, many more options exist, and you should read the official documentation to learn more.
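The four tests described above correspond to postscreen parameters in main.cf. A minimal sketch enabling them with `enforce` as the action (postscreen then rejects the client's mail with a 550 reply); the greeting wait is left at its default. The last three are "deep protocol" tests: even a client that passes them has to disconnect and reconnect before its mail is accepted, which is where the delivery delay mentioned below comes from. `enable_postscreen_tests` and `run` are hypothetical names.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

enable_postscreen_tests() {
  for setting in \
    'postscreen_greet_action = enforce' \
    'postscreen_pipelining_enable = yes' \
    'postscreen_pipelining_action = enforce' \
    'postscreen_non_smtp_command_enable = yes' \
    'postscreen_non_smtp_command_action = enforce' \
    'postscreen_bare_newline_enable = yes' \
    'postscreen_bare_newline_action = enforce'
  do
    run postconf -e "$setting"
  done
}
```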

Then you need to modify /etc/postfix/ to enable Postscreen and let it route validated mail to smtpd (as root):

and replace the line


and then restart Postfix.
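The master.cf change can also be scripted. This sketch assumes a Debian-style master.cf: the active `smtp inet ... smtpd` line is commented out and replaced by postscreen entries mirroring the commented-out ones in Debian's stock master.cf (chroot column `y`). Any `-o` lines that belonged to the old smtp service are moved under the new `smtpd pass` service, and dnsblog/tlsproxy are only added if they are not already defined. `enable_postscreen_service` is a hypothetical name; try it on a copy first.

```shell
enable_postscreen_service() {
  cf=${1:-/etc/postfix/master.cf}
  # Already converted? Then leave the file alone.
  if grep -q '^smtp[[:space:]].*postscreen' "$cf"; then return 0; fi
  have_dnsblog=$(grep -c '^dnsblog[[:space:]]' "$cf" || true)
  have_tlsproxy=$(grep -c '^tlsproxy[[:space:]]' "$cf" || true)
  awk -v dnsblog="$have_dnsblog" -v tlsproxy="$have_tlsproxy" '
    function flush() {
      print "smtp      inet  n       -       y       -       1       postscreen"
      print "smtpd     pass  -       -       y       -       -       smtpd" opts
      if (dnsblog + 0 == 0)  print "dnsblog   unix  -       -       y       -       0       dnsblog"
      if (tlsproxy + 0 == 0) print "tlsproxy  unix  -       -       y       -       0       tlsproxy"
      pending = 0
    }
    # Continuation lines of the old smtp entry move to the new smtpd entry.
    pending && (substr($0, 1, 1) == " " || substr($0, 1, 1) == "\t") { opts = opts "\n" $0; next }
    pending { flush() }
    $1 == "smtp" && $2 == "inet" && $NF == "smtpd" { print "#" $0; pending = 1; opts = ""; next }
    { print }
    END { if (pending) flush() }
  ' "$cf" > "$cf.new" && cat "$cf.new" > "$cf" && rm -f "$cf.new"
}
```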

However, you will receive mail with a delay ranging from a few minutes (5 minutes from Hotmail and 20 minutes from Gmail in my previous tests) to a few hours depending on the client side… that's why I don't use Postscreen, in fact.