SpamD Options

This forum contains features that have been archived. This section contains implemented features, duplicate requests, and requests which we have decided not to implement.

Do you need this feature?

Yes
30
88%
No
4
12%
 
Total votes: 34

Shiloh
Normal user
Posts: 163
Joined: 2006-04-14 00:00

SpamD Options

Post by Shiloh » 2006-04-14 20:18

I would like to see an option in hMailServer to select the hostname and port of a SpamD server to connect to for handling SpamAssassin processing of inbound email. I would also like the ability to set thresholds on a per user basis for when to tag and/or discard an email based on its spam score.

This would give us a very scalable way to add SpamAssassin support to hMailServer, because the SpamD code could run on a cluster of FreeBSD boxes and hMailServer could simply connect to SpamD via TCP. We have already done something similar using XMail.

This feature could be accomplished by merging the source code from SpamC into hMailServer and then adding the threshold fields to the MySQL table that holds the user information.

mbreitba
Senior user
Posts: 340
Joined: 2006-04-14 22:25

Post by mbreitba » 2006-04-15 21:00

This sounds like an excellent idea, but can you tell us more about these clustered FreeBSD systems?

Shiloh
Normal user
Posts: 163
Joined: 2006-04-14 00:00

Post by Shiloh » 2006-04-16 23:42

More about the clustered SpamD boxes:

What we did to increase the scalability of SpamAssassin is to run SpamD instead of launching SA every time we want to parse an email. SpamD stays running between hits so there is no startup overhead per hit. We tried running SpamD on Windows but found that SpamD runs a lot better under FreeBSD or Linux. This has little to do with the performance of the OS itself. The SpamD application is simply written such that it runs better under Unix than Windows.

When we deployed SpamD on a FreeBSD box, it occurred to us that we could just as easily deploy a large cluster of identical FreeBSD boxes running SpamD. Then we use a load balancer to balance the SpamD requests evenly across the cluster of FreeBSD boxes. This is an excellent way to build a very scalable SpamAssassin-based spam filtering solution.

Since quite a few of our servers use this SpamD cluster for filtering, it is important to have the SpamD cluster simply score the email. The SpamD cluster does not decide when to drop an email due to the score. The scripts on the email servers decide on a per mailbox basis what to do based on various spam scores. For example, a certain score might mean the email should be deleted while another score would mean it should just be tagged as potential spam. Each end user gets to decide what scores are right for them.
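
The per-mailbox decision described above can be sketched roughly as follows. This is only an illustration, assuming hypothetical names (`MailboxThresholds`, `decide`) and made-up threshold values; it is not the actual script running on the email servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MailboxThresholds:
    """Hypothetical per-mailbox settings; field names are illustrative."""
    tag_at: float      # score at or above which the message is tagged
    delete_at: float   # score at or above which the message is deleted


def decide(score: float, thresholds: MailboxThresholds) -> str:
    """Return 'delete', 'tag', or 'deliver' for one scored message."""
    if score >= thresholds.delete_at:
        return "delete"
    if score >= thresholds.tag_at:
        return "tag"
    return "deliver"


# Two mailboxes with different tolerance (values are made up).
strict = MailboxThresholds(tag_at=3.0, delete_at=8.0)
relaxed = MailboxThresholds(tag_at=6.0, delete_at=15.0)
```

With these example values, a message scored 9.5 would be deleted for the `strict` mailbox but only tagged for the `relaxed` one, while the SpamD cluster itself stays unaware of either policy.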

Also, we deployed ClamD on all of the FreeBSD boxes. We added a spam rule into SpamD that checks the email for a virus. SpamD connects to ClamD (both running on the FreeBSD boxes). If the email has a virus, the spam score is incremented by 200 points and the email is deleted at the spam filtering boxes. This solution allowed us to move the overhead of virus filtering off of the email servers.

This overall solution runs like a champ for spam and virus filtering. We currently launch a custom script that launches SpamC, and SpamC connects to the SpamD cluster. However, more performance can be obtained by integrating the SpamC code and the per-mailbox thresholds directly into the email server code. This way the only overhead per hit would be the TCP connection to the SpamD cluster. There would not be any additional processes getting launched every time an email was received by the email server. The net result would be even more performance and less overall load on the email servers.
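
To show what "only a TCP connection per hit" could look like, here is a minimal sketch of a client that speaks the spamd wire protocol directly instead of launching SpamC. It assumes the plain-text `CHECK` request and the `Spam: True ; score / threshold` response header as documented for the spamd protocol, and port 783 as spamd's usual default. The function names are hypothetical, and this is not the code from SpamC or hMailServer.

```python
import re
import socket
from typing import Tuple


def build_check_request(message: bytes) -> bytes:
    """Build a CHECK request: score the message, do not return its body."""
    header = b"CHECK SPAMC/1.2\r\nContent-length: %d\r\n\r\n" % len(message)
    return header + message


_SPAM_HEADER = re.compile(
    r"^Spam:\s*(True|False)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
    re.MULTILINE | re.IGNORECASE,
)


def parse_check_response(response: bytes) -> Tuple[bool, float, float]:
    """Return (is_spam, score, threshold) parsed from a spamd response."""
    text = response.decode("ascii", errors="replace")
    status_line = text.split("\r\n", 1)[0]
    if not status_line.startswith("SPAMD/") or "EX_OK" not in status_line:
        raise ValueError("unexpected spamd status line: %r" % status_line)
    match = _SPAM_HEADER.search(text)
    if match is None:
        raise ValueError("no Spam: header in spamd response")
    return (match.group(1).lower() == "true",
            float(match.group(2)),
            float(match.group(3)))


def check_message(message: bytes, host: str, port: int = 783,
                  timeout: float = 10.0) -> Tuple[bool, float, float]:
    """Send one message to spamd over TCP and return its verdict."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_check_request(message))
        conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_check_response(b"".join(chunks))
```

The mail server would call `check_message` once per inbound email and hand the returned score to its per-mailbox logic, so no extra process is started per message.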

mbreitba
Senior user
Posts: 340
Joined: 2006-04-14 22:25

Post by mbreitba » 2006-04-17 17:33

Very neat indeed. Might have to look into this more.

Shiloh
Normal user
Posts: 163
Joined: 2006-04-14 00:00

Post by Shiloh » 2006-11-29 19:32

Here is a link to how I currently connect to SpamD from hMail.
http://www.howiblockspam.com/index.php? ... icle&sid=2

It accomplishes the same thing I am voting for in this poll, but uses a COM object and a custom VBS script to interface with SpamD. Doing this directly in hMail will be easier and offer a bit better performance. Anyway, feel free to use my code until this has been integrated into hMail.

martin
Developer
Posts: 6846
Joined: 2003-11-21 01:09
Location: Sweden

Post by martin » 2007-07-14 13:11

What method / load balancer did you use to balance the SpamD requests evenly?

Is it enough to be able to specify a single IP address / port of the SpamD in hMailServer? And then if you want load balancing you'll have to set up a load balancer at that host which forwards traffic to SpamD.

(I'll probably not integrate SpamC in hMailServer but instead write my own code for it.)

GlenC
Senior user
Posts: 680
Joined: 2004-08-17 23:31
Location: Santiago, Chile

Post by GlenC » 2007-07-14 14:35

The original post is kinda old but I thought I should point out that a lot of this capability is already built into SpamAssassin. Perhaps not to the degree that Shiloh needs in his organization, but I feel I should point it out in order to help prevent needless duplication of features that might already exist.

SpamC already provides a simple form of load balancing. One can supply a list of hosts to connect to and then configure SpamC to randomly select one of those hosts to use for its submission. This might be enough for most installations. SpamC man page here: http://spamassassin.apache.org/full/3.2 ... spamc.html
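
The idea of picking a host at random from a list, and moving on to another one if it does not answer, can be sketched like this. It is only an illustration of the technique, not SpamC's actual implementation; `pick_in_random_order` and the `try_host` callback are hypothetical names.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def pick_in_random_order(hosts: Sequence[str],
                         try_host: Callable[[str], T],
                         rng: Optional[random.Random] = None) -> T:
    """Try each host once, in random order, and return the first success.

    try_host is expected to raise OSError when a host is unreachable.
    """
    order = list(hosts)
    (rng or random).shuffle(order)
    for host in order:
        try:
            return try_host(host)
        except OSError:
            continue  # this host failed, fall through to the next one
    raise ConnectionError("no SpamD host reachable")
```

In practice `try_host` would open the connection to SpamD and submit the message; here it is left as a parameter so the selection logic stays independent of the network code.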

As for user settings as described above, if you configure SA to use SQL tables then this becomes very easy. The user can easily configure their own scores, whitelists and blacklists, and a few other things via a web interface. IMAP authentication makes it simple since you don't have to do anything special for each user to set them up. An example interface showing what options the user can change can be found here: http://www.misak.dk/phpsaadmin/index.php
username: demo password: demo

In my mind, IF hMailServer were to provide an interface of some sort for SpamAssassin, it should only concern itself with providing the username and other SpamC options. Perhaps it could also provide the option of skipping checks on locally generated mail or on mails already identified as spam by hMail's built-in spam checks.

Just some food for thought.

Shiloh
Normal user
Posts: 163
Joined: 2006-04-14 00:00

Post by Shiloh » 2007-08-24 18:09

SpamC has its own load balancing, but it just does a simple round-robin DNS type of thing. What we use for true load balancing is a pfSense box. It can tell if a SpamD box is down and will not route any traffic to a downed node. It works slick. In our setup, we just have one address that SpamD clients connect to, and that is the IP of the pfSense cluster. Then all of our SpamD boxes sit behind the pfSense box.
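
For readers who want to see the general idea of "do not route to a downed node", here is a small sketch of round-robin selection that skips nodes failing a health probe. It is an illustration only and says nothing about how pfSense implements its checks; `tcp_is_up` and `healthy_round_robin` are hypothetical names, and the probe is simply a TCP connect.

```python
import socket
from typing import Callable, Iterator, Sequence, Tuple

Node = Tuple[str, int]  # (host, port)


def tcp_is_up(node: Node, timeout: float = 1.0) -> bool:
    """Assumed health probe: the node is up if a TCP connect succeeds."""
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        return False


def healthy_round_robin(nodes: Sequence[Node],
                        is_up: Callable[[Node], bool] = tcp_is_up
                        ) -> Iterator[Node]:
    """Yield nodes in rotation, skipping any node the probe reports down."""
    if not nodes:
        raise ValueError("at least one node is required")
    index = 0
    while True:
        # Look at each node at most once per request.
        for _ in range(len(nodes)):
            node = nodes[index]
            index = (index + 1) % len(nodes)
            if is_up(node):
                yield node
                break
        else:
            raise RuntimeError("no healthy SpamD node available")
```

Each `next()` on the generator returns the node for one request, so a node that goes down simply stops being handed out until the probe succeeds again.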

I personally do not like using SQL with SA, because it adds another point that can easily fail once the volume is really high. I would never dream of having all of my SA boxes look up per-user stuff in a shared MySQL server. At our volume, the MySQL box would deadlock and fail. What I prefer to do is have SA simply score the email and then have logic on the email server side that decides on a per-user basis what to do based on the score returned from SA. This is how I implemented the COM object for SA and the sample event handler for hMailServer.
