Howto: Reduce Spam using SpamAssassin Bayesian filters in cPanel

Introduction

While this article is intended to demonstrate how to train and use Bayesian filters on cPanel hosted sites, you can use this technique wherever SpamAssassin and fetchmail are installed.

Requirements

  1. cPanel based host
  2. SpamAssassin
  3. Access to the cron scheduler within cPanel
  4. Access to the file manager within cPanel
  5. Fetchmail must be installed and usable on your hosted server

Audience

This article assumes a working knowledge of cPanel, file editing, and email protocols. It is useful for anyone wanting to increase the accuracy of SpamAssassin's spam detection.

SpamAssassin

SpamAssassin inspects emails being delivered to your email account, looking for signs that an email is spam. It looks for a number of signatures, each of which carries a weighting, and the weightings of the signatures found are added up to give the email a spam “score”. If the score reaches a threshold, the email is considered spam. A number of actions can be taken when an email is designated as spam. One of them is to alter the Subject line to show that the email is spam, so that a rule in your email client can automatically delete it or move it to a junk email folder.
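The scoring described above can be sketched in a few lines of shell. This is only an illustration of the arithmetic, not SpamAssassin's own code: the function name `classify`, the rule names, and the weights are all invented for the example, and the threshold of 5 is simply the value this article configures later.

```shell
# classify THRESHOLD
# Reads "RULE WEIGHT" lines on standard input, adds up the weights, and
# reports whether the total reaches the threshold.
# (Hypothetical helper for illustration; not part of SpamAssassin.)
classify() {
    awk -v required="$1" '
        { total += $2 }
        END {
            verdict = (total >= required) ? "spam" : "ham"
            printf "score=%.1f required=%.1f verdict=%s\n", total, required, verdict
        }'
}

# Two invented rule hits whose weights stay below the threshold.
printf '%s\n' 'EXAMPLE_RULE_A 2.4' 'EXAMPLE_RULE_B 1.1' | classify 5

# Two invented rule hits whose weights add up to exactly the threshold.
printf '%s\n' 'EXAMPLE_RULE_A 2.5' 'EXAMPLE_RULE_B 2.5' | classify 5
```

The first call reports a ham verdict and the second a spam verdict, because a total that exactly equals the threshold already counts as spam.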

Bayesian Filters

Bayesian filtering is a method SpamAssassin can use to learn about the kinds of email you receive, improving its ability to identify which emails are spam and which are not. This also helps reduce the number of false positives, where an email is marked as spam when it should not have been.

These filters work by identifying key elements of emails that are common to spam, and those that are common to non-spam email. To achieve this, the filters must be given samples of both to learn from, and they require a minimum of 200 spam emails and 200 non-spam emails. Unfortunately, for many, obtaining 200 spam emails is a trivial task.

The Setup

This article assumes that you will be using a separate email address for spam training. This can make things easier to manage, especially if you use POP3 as your primary means for downloading email.

The basic steps are as follows:

  1. Enable and configure SpamAssassin
  2. Create a Spam email account with a non-spam email folder
  3. Configure .fetchmailrc to allow SpamAssassin to download emails
  4. Configure cron to automatically train SpamAssassin
  5. Copy 200 spam emails to the spam account's inbox, and 200 non-spam emails to its nonspam folder

Enable SpamAssassin

  1. Navigate to the Mail / SpamAssassin menu in cPanel
  2. Click Enable SpamAssassin. This will enable SpamAssassin on any incoming email. SpamAssassin will already identify much spam at this point; the remainder of the article shows how to increase its accuracy.
  3. Click Configure SpamAssassin
  4. Ensure the configuration is as follows:

add_header: all

This adds headers to every email so that you can see why it has been classed as spam or non-spam.

report_safe: 0

This makes changes to the email “in situ”. If you would prefer that the email is left unaltered, set this option to “1”: a new email will then be created with the spam details in its content and the original, unaltered email as an attachment.

required_score: 5

SpamAssassin uses a series of criteria to determine whether an email is spam, and each criterion is given a score. The email is classed as spam if its total score is 5 or higher.

rewrite_header subject: **SPAM** _HITS_ (_BAYES_)

This changes the subject line of the email. The subject is prefixed with the text above, with _HITS_ replaced by the spam score for the email and _BAYES_ replaced by the Bayesian score.

score BAYES_99: 5

This adds five “spam” points to the score of any email that Bayesian analysis rates as 99% or more likely to be spam. Note that this alone brings the email up to the 5 point threshold defined above, and so automatically marks the email as spam.
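For reference, the settings above can also be written in SpamAssassin's own configuration syntax. The fragment below is a sketch only: it assumes that cPanel stores per-account settings in ~/.spamassassin/user_prefs, which you should confirm with your host before editing anything by hand. The add_header setting is left out because the native directive also needs a header name and a template, which this article does not specify.

```
# ~/.spamassassin/user_prefs (assumed location on a cPanel account)

# Modify spam in place rather than wrapping it in a report email
report_safe 0

# Total score at which an email is classed as spam
required_score 5

# Prefix the subject with the spam score and the Bayesian score
rewrite_header Subject **SPAM** _HITS_ (_BAYES_)

# Weight of the BAYES_99 rule
score BAYES_99 5
```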

Create Spam Account

This email address will be used to train SpamAssassin; it is not intended that this account be used for sending or receiving normal emails.

  1. Navigate to Mail / Manage/Add/Remove Accounts
  2. Click Add Account
  3. Create a new account with an appropriate name, such as spam@mydomain.com, and define a password. SpamAssassin will empty this account on a regular basis, so it is not necessary to define a quota.
  4. Click Back and you should see a list of email accounts; click Read Webmail for the spam account you just created. Enter the password for the account, and you will be taken to the webmail selection page. Choose your preferred webmail client.
  5. Create a new folder called “nonspam”.

Configure Fetchmail

SpamAssassin uses fetchmail to read the emails from the spam account to train its Bayesian filters.

  1. Navigate to File Manager in cPanel
  2. You will be presented with a list of folders, followed by a list of files. At the bottom of the folder list, click Create New File
  3. On the right-hand side, type .fetchmailrc in the dialogue box, and leave the document type as Text Document
  4. The .fetchmailrc file should appear in the file list in the left-hand panel. Click the filename and then select Edit File from the menu on the right-hand side.
  5. Complete the file as follows, using the username and password you selected for your spam account above:

poll 127.0.0.1
with protocol IMAP
username spam+mydomain.com password spampassword is spam

  6. Save the file
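Because .fetchmailrc holds a password, fetchmail checks the file's permissions and will not use a run control file that other users can access. A minimal sketch of tightening the permissions follows, assuming you have shell access; the helper name `secure_rc` is invented for this example, and the same mode can usually be set from the permissions option in File Manager instead.

```shell
# secure_rc FILE
# Restrict FILE to its owner (read and write only) and show the result.
# (Hypothetical helper name, used here for illustration.)
secure_rc() {
    chmod 600 "$1" && ls -l "$1"
}

# Apply it to the run control file created above, if it exists.
if [ -f "$HOME/.fetchmailrc" ]; then
    secure_rc "$HOME/.fetchmailrc"
fi
```

After running this, the mode column printed by ls should read -rw-------.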

Configure cron to Automatically Train SpamAssassin

Cron is a task scheduler, and so can carry out tasks at specific times of the day. We will configure cron to run the SpamAssassin training process for spam and non-spam once a day.

  1. Navigate to Cron jobs in cPanel
  2. Click Advanced
  3. Each time cron runs, it can send an email to an administrator to report the results of each task. Enter an appropriate email address in the box provided
  4. First we want SpamAssassin to train for non-spam. Set the time at which this task should run by entering the minute in the first box and the hour in the second box. For example, to have the non-spam training task run at 1:30am, enter 30 in the minute column and 1 in the hour column. The command to run the non-spam training task is:

fetchmail -v -u spam+mydomain.com -a -n --folder nonspam 127.0.0.1 -m 'sa-learn --ham --siteconfigpath /usr/share/spamassassin'

  5. The next entry tells SpamAssassin to train for spam. Enter a later time in the minute and hour fields, such as 35 for the minute and 1 for the hour, to have the command run at 1:35am. The command to train for spam is:

fetchmail -v -u spam+mydomain.com -a -n 127.0.0.1 -m 'sa-learn --spam --siteconfigpath /usr/share/spamassassin'
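If you prefer to read or enter the schedule in standard crontab syntax, the two jobs above would look roughly like the fragment below. It is a sketch built from the times and commands given in this article. The MAILTO address is a placeholder, and the remaining three time fields (day of month, month, day of week) are set to * so that the jobs run every day.

```
# Address that receives the output of each job (placeholder)
MAILTO=you@mydomain.com

# minute hour day-of-month month day-of-week command
30 1 * * * fetchmail -v -u spam+mydomain.com -a -n --folder nonspam 127.0.0.1 -m 'sa-learn --ham --siteconfigpath /usr/share/spamassassin'
35 1 * * * fetchmail -v -u spam+mydomain.com -a -n 127.0.0.1 -m 'sa-learn --spam --siteconfigpath /usr/share/spamassassin'
```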

Provide Training Emails

As noted before, SpamAssassin requires 200 spam emails and 200 non-spam emails before it will start applying the Bayesian filters to incoming email.

To provide the emails, you must first access the spam email account in your preferred email client. Follow the instructions for your email client to access an IMAP email account, using the account name and password you selected above. When you have created the account and accessed it, you should see the Inbox and nonspam folders. Copy 200 spam emails into the inbox, and 200 non-spam emails into the nonspam folder. The next time the cron jobs are scheduled to execute, SpamAssassin will create a database from those emails that it will use to determine whether any future emails are spam.

Ongoing Management

Any emails that are identified as spam will be marked as such with SPAM in the subject line. You can use a rule or filter in your email client to automatically deal with these emails, perhaps by putting them into a folder the contents of which are automatically deleted after a defined time period.

SpamAssassin gets better the more data it has to work with, but from time to time you will still get false positives, where an email is classed as spam that should not have been. If this happens, copy the email to the nonspam folder of the spam account and SpamAssassin will update its “nonspam” database. Likewise, some spam will not be identified, and these emails should be copied to the inbox of the spam account.

Troubleshooting

Most issues that may be encountered with using this method to train SpamAssassin can be diagnosed by reviewing the email received from the cron job. They are usually related to the fetchmail process executed through cron:

  1. The SpamAssassin global configuration file is not at /usr/share/spamassassin. Check with your hosting provider as to its location.
  2. Fetchmail is not installed. Ask your hosting provider if it can be installed.
  3. Emails do not have a Bayesian spam score. This is usually because SpamAssassin has not been provided with enough emails to build a profile of what is and isn't spam. Check that the inbox and nonspam folders are being emptied each day, and that you have provided 200 samples of each type of email.
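To check the third point directly, you can ask SpamAssassin how many messages of each kind it has learned so far. The sketch below assumes shell access (or a one-off cron job) and that `sa-learn --dump magic` prints its usual "non-token data: nspam" and "non-token data: nham" lines; the function name `bayes_counts` is invented for this example.

```shell
# bayes_counts
# Reads the output of `sa-learn --dump magic` on standard input and prints
# how many spam and ham (non-spam) messages have been learned.
# (Hypothetical helper name, used here for illustration.)
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham  = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Usage, if sa-learn is available on the PATH:
if command -v sa-learn >/dev/null 2>&1; then
    sa-learn --dump magic | bayes_counts
fi
```

If either count is below 200, the Bayesian filters are not yet in use, so keep copying samples into the spam account.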