Score Spam Based on Blacklists in Google Sheets

This section contains scripts that hMailServer has contributed with. hMailServer 5 is needed to use these.
Post Reply
rcauvin
New user
New user
Posts: 15
Joined: 2017-01-02 16:39

Score Spam Based on Blacklists in Google Sheets

Post by rcauvin » 2017-01-08 21:14

I have a script that scores messages based on domain and IP blacklists stored in a Google sheet. When someone updates the blacklists in the Google spreadsheet, the script automatically picks up the changes. A blacklist consists of rows in the spreadsheet mapping regular expression patterns to scores. For example, here is an IP address blacklist:

Pattern Score
5\.39\.218\.\d{1,3} 6
46\.166\.184\.\d{1,3} 6
77\.81\.10[5-8]\.\d{1,3} 4
107\.17[2-5]\.\d{1,3}\.\d{1,3} 3
173\.199\.119\.22[4-9] 6
173\.199\.119\.2[3-5]\d 6
204\.188\.19[2-9]\.\d{1,3} 4
204\.188\.2\d\d\.\d{1,3} 4
216\.245\.19[2-9]\.\d{1,3} 5
216\.245\.2[01]\d\.\d{1,3} 5
216\.245\.22[0-3]\.\d{1,3} 5

For messages with an initial spam score greater than zero, the script matches the patterns against the Received header of a message in OnAcceptMessage. If there is a match, it adds the corresponding score to the message's cumulative spam score. After adding the scores, it uses the spam delete threshold already configured in hMailServer to determine whether to reject the message.

My complete spreadsheet lives here. You are welcome to use the script to access my spreadsheet. For now, I'm keeping the spreadsheet read-only for everyone except me. I may create a Google form where you can submit domains and IP addresses you think should be added to the list.

Below is the complete code for EventHandlers.js. It could use much more robust exception handling. You will have to replace HMAILSERVER_USER_NAME and HMAILSERVER_PASSWORD in the script with the login credentials for your hMailServer Administrator tool.

Code: Select all

function OnAcceptMessage(oClient, oMessage)
  {
  Result.Value = 0; // Accept the message by default.
  
  var oApp = new ActiveXObject("hMailServer.Application");
  oApp.Authenticate("<HMAILSERVER_USER_NAME>", "HMAILSERVER_PASSWORD");
  var SPAM_DELETE_TRESHOLD = oApp.Settings.AntiSpam.SpamDeleteThreshold;
  
  var spamScore = Number(oMessage.HeaderValue("X-hMailServer-Reason-Score"));
  if (spamScore > 0)
    {
    EventLog.Write("--- Message from " + oMessage.From + " to " + oMessage.To + " with initial spam score of " + spamScore + ". ---");
	
    var received = oMessage.HeaderValue("Received");

    var domainBlacklist = loadDomainBlacklist();
    spamScore += scoreBlacklist(oMessage, received, domainBlacklist, "domain");
	
    var ipBlacklist = loadIPBlacklist();
    spamScore += scoreBlacklist(oMessage, received, ipBlacklist, "IP");
	  
    EventLog.Write("Final spam score for message is " + spamScore + ".");
    oMessage.HeaderValue("X-hMailServer-Script-Reason-Score") = spamScore;
    oMessage.Save();  
    }
  
  if (spamScore >= SPAM_DELETE_TRESHOLD)
    {
    Result.Message = "Rejected by hMailServer spam detection script.";
    Result.Value = 2;
    EventLog.Write("Rejected message as spam.");
    }
  }
  
function scoreBlacklist(oMessage, received, blacklist, blacklistName)
  {
  var listScore = 0;
  
  var numServers = blacklist.length;
  EventLog.Write("Checking " + numServers + " server patterns in " + blacklistName + " blacklist.");
  for (var index = 0; index < numServers; index++)
    {
    var server = blacklist[index];
    var serverScore = matchPattern(received, server);
	listScore += serverScore;
	if (serverScore > 0)
	  oMessage.HeaderValue("X-hMailServer-Script-Reason-" + (index + 1)) = "Matched blacklisted server pattern '" + server.pattern + "' with a score of " + serverScore + ".";
    }
	
  return listScore;
  }
  
function loadDomainBlacklist()
  {
  var spreadsheetID = "1qy2cB0C0MvpqibE7KgZq3GQxuxMNWM2xQO26sE3CPC4";
  var gridID = "okp0ef5";
  var blacklist = loadBlacklist(spreadsheetID, gridID);

  return blacklist;
  }
  
function loadIPBlacklist()
  {
  var spreadsheetID = "1qy2cB0C0MvpqibE7KgZq3GQxuxMNWM2xQO26sE3CPC4";
  var gridID = "ofvnxkf";
  var blacklist = loadBlacklist(spreadsheetID, gridID);

  return blacklist;
  }
  
function loadBlacklist(spreadsheetID, gridID)
  {
  var baseURL = "https://spreadsheets.google.com/feeds/list/";
  var urlSuffix = "/public/values?alt=json";
  var url = baseURL + spreadsheetID + "/" + gridID  + urlSuffix;
  
  var contentType = "application/json";
  var responseText = httpGet(url, contentType);
  
  var responseJSON = eval("(" + responseText + ")");
  var blacklist = responseJSON.feed.entry;

  return blacklist;
  }
  
function httpGet(url, contentType)
  {
  var requestMethod = "GET";
  var httpRequest = new ActiveXObject("MSXML2.ServerXMLHTTP.6.0");
  httpRequest.open(requestMethod, url, false);
  if (contentType != null)
    httpRequest.setRequestHeader("Content-Type", contentType);
  httpRequest.send();

  return httpRequest.responseText;
  }
  
function matchPattern(received, server)
  {
  var score = 0;
  var pattern = new RegExp(server.gsx$pattern.$t);
  var matches = pattern.exec(received);
  if (matches != null)
    {
    var numMatches = matches.length;
    var match = matches[0];
    score = Number(server.gsx$score.$t);
    EventLog.Write("Server pattern '" + pattern + "' matched '" + match + "' with a score of " + score + ".");
    }
	
  return score;
  }

User avatar
SorenR
Senior user
Senior user
Posts: 3274
Joined: 2006-08-21 15:38
Location: Denmark

Re: Score Spam Based on Blacklists in Google Sheets

Post by SorenR » 2017-01-08 22:15

Nice !!

On a side note... JSON exists native to JScript from version 5.8, so your approach earlier was right on - EXCEPT, for some reason Microsoft told their JScript 5.8 engine to be JScript 5.7 compliant by default... Image

Anywho... This could point you in the right direction... https://msdn.microsoft.com/en-us/librar ... s.84).aspx

And...
Starting with JScript 5.8, by default, the JScript scripting engine supports the language feature set as it existed in version 5.7. This is to maintain compatibility with the earlier versions of the engine. To use the complete language feature set of version 5.8, the Windows Script interface host has to invoke IActiveScriptProperty::SetProperty.

Internet Explorer 8 opts into the JScript 5.8 language features when the document mode for Internet Explorer 8 is "Internet Explorer 8 Standards" mode. For other document modes, Internet Explorer uses the version 5.7 feature set.

JScript 5.8 includes native JavaScript Object Notation (JSON) support and the accessor methods for Document Object Model (DOM) prototypes.
https://msdn.microsoft.com/en-us/librar ... s.84).aspx
SørenR.

“With age comes wisdom, but sometimes age comes alone.”
- Oscar Wilde

rcauvin
New user
New user
Posts: 15
Joined: 2017-01-02 16:39

Re: Score Spam Based on Blacklists in Google Sheets

Post by rcauvin » 2017-01-08 23:12

SorenR wrote:Nice !!

On a side note... JSON exists native to JScript from version 5.8, so your approach earlier was right on - EXCEPT, for some reason Microsoft told their JScript 5.8 engine to be JScript 5.7 compliant by default... Image

Anywho... This could point you in the right direction... https://msdn.microsoft.com/en-us/librar ... s.84).aspx
Good catch! I'll see if setting a script property enables the version 5.8 capabilities.

In the meantime, I found a minor bug in my code.

Code: Select all

if (serverScore > 0)
  oMessage.HeaderValue("X-hMailServer-Script-Reason-" + (index + 1)) = "Matched blacklisted server pattern '" + server.pattern + "' with a score of " + serverScore + ".";
should instead be

Code: Select all

if (serverScore > 0)
  oMessage.HeaderValue("X-hMailServer-Script-Reason-" + (index + 1)) = "Matched blacklisted server pattern '" + server.gsx$pattern.$t + "' with a score of " + serverScore + ".";

User avatar
SorenR
Senior user
Senior user
Posts: 3274
Joined: 2006-08-21 15:38
Location: Denmark

Re: Score Spam Based on Blacklists in Google Sheets

Post by SorenR » 2017-01-08 23:30

The "domain" tab in the spreadsheet... Is that supposed to be a content filter in RegEx ??
SørenR.

“With age comes wisdom, but sometimes age comes alone.”
- Oscar Wilde

rcauvin
New user
New user
Posts: 15
Joined: 2017-01-02 16:39

Re: Score Spam Based on Blacklists in Google Sheets

Post by rcauvin » 2017-01-08 23:37

SorenR wrote:The "domain" tab in the spreadsheet... Is that supposed to be a content filter in RegEx ??
No. The Domain sheet contains domain name patterns to check against the Received header.

For example:

(\w+\.)+us(?:\b) 1

matches 'better.givenailtoecare.us' in this Received header:

Received: from better.givenailtoecare.us (sa1107.azar-a.net [91.219.239.156]) by xxxxxxx.org with ESMTP ; Sun, 8 Jan 2017 14:16:37 -0600

and, since there is a match, adds 1 to the spam score.

A content filter blacklist is certainly possible, though, and it could use almost exactly the same infrastructure!

User avatar
SorenR
Senior user
Senior user
Posts: 3274
Joined: 2006-08-21 15:38
Location: Denmark

Re: Score Spam Based on Blacklists in Google Sheets

Post by SorenR » 2017-01-08 23:50

rcauvin wrote:
SorenR wrote:The "domain" tab in the spreadsheet... Is that supposed to be a content filter in RegEx ??
No. The Domain sheet contains domain name patterns to check against the Received header.

For example:

(\w+\.)+us(?:\b) 1

matches 'better.givenailtoecare.us' in this Received header:

Received: from better.givenailtoecare.us (sa1107.azar-a.net [91.219.239.156]) by xxxxxxx.org with ESMTP ; Sun, 8 Jan 2017 14:16:37 -0600

and, since there is a match, adds 1 to the spam score.

A content filter blacklist is certainly possible, though, and it could use almost exactly the same infrastructure!
Well, it matches any .us domain.

"better.givenailtoecare.us" AKA oClient.HELO :mrgreen:

One thing tho... WhatIf a Backup-MX is used? You would have to find the next (untrusted) "Received:" header to read...
SørenR.

“With age comes wisdom, but sometimes age comes alone.”
- Oscar Wilde

rcauvin
New user
New user
Posts: 15
Joined: 2017-01-02 16:39

Re: Score Spam Based on Blacklists in Google Sheets

Post by rcauvin » 2017-01-09 00:12

SorenR wrote:One thing tho... WhatIf a Backup-MX is used? You would have to find the next (untrusted) "Received:" header to read...
Good point. Plenty of room for improvement and covering all the bases.

User avatar
SorenR
Senior user
Senior user
Posts: 3274
Joined: 2006-08-21 15:38
Location: Denmark

Re: Score Spam Based on Blacklists in Google Sheets

Post by SorenR » 2017-01-10 15:51

Just in case the attribute setting is not working ... :wink:

Code: Select all

    function includeFile (filename) {
        var fso = new ActiveXObject ("Scripting.FileSystemObject");
        var fileStream = fso.openTextFile (filename);
        var fileData = fileStream.readAll();
        fileStream.Close();
        eval(fileData);
    }

Code: Select all

includeFile("externalFile1.js");
includeFile("externalFile2.js");
includeFile("etc.js");
http://stackoverflow.com/questions/9282 ... -it-in-ano

JSON...
https://github.com/douglascrockford/JSON-js
SørenR.

“With age comes wisdom, but sometimes age comes alone.”
- Oscar Wilde

rcauvin
New user
New user
Posts: 15
Joined: 2017-01-02 16:39

Re: Score Spam Based on Blacklists in Google Sheets

Post by rcauvin » 2017-01-10 16:54

I updated the script and Email Spam Blacklist spreadsheet to include keyword matching against the incoming mail subject line. I added a "Keyword" tab to the spreadsheet that contains regular expression patterns to match against the subject line. You could also modify the script to match the keyword patterns against the body of the message, but for now I'm just matching against the subject.

The latest code for the script is below.

Code: Select all

function OnAcceptMessage(oClient, oMessage)
  {
  Result.Value = 0; // Accept the message by default.
  
  var oApp = new ActiveXObject("hMailServer.Application");
  oApp.Authenticate("<HMAILSERVER_USER_NAME>", "<HMAILSERVER_PASSWORD>");
  var SPAM_DELETE_TRESHOLD = oApp.Settings.AntiSpam.SpamDeleteThreshold;
  
  var spamScore = Number(oMessage.HeaderValue("X-hMailServer-Reason-Score"));
  if (spamScore > 0)
    {
    EventLog.Write("--- Message from " + oMessage.From + " to " + oMessage.To + " with initial spam score of " + spamScore + ". ---");
	
    var received = oMessage.HeaderValue("Received");

    var domainBlacklist = loadDomainBlacklist();
    spamScore += scoreBlacklist(oMessage, received, domainBlacklist, "domain");
	
    var ipBlacklist = loadIPBlacklist();
    spamScore += scoreBlacklist(oMessage, received, ipBlacklist, "IP");
	
    var subject = oMessage.Subject;
    var keywordBlacklist = loadKeywordBlacklist();
    spamScore += scoreBlacklist(oMessage, subject, keywordBlacklist, "subject");
	  
    EventLog.Write("Final spam score for message is " + spamScore + ".");
    oMessage.HeaderValue("X-hMailServer-Script-Reason-Score") = spamScore;
    oMessage.Save();
    }
  
  if (spamScore >= SPAM_DELETE_TRESHOLD)
    {
    Result.Message = "Rejected by hMailServer spam detection script.";
    Result.Value = 2;
    EventLog.Write("Rejected message as spam.");
    }
  }
  
function scoreBlacklist(oMessage, received, blacklist, blacklistName)
  {
  var listScore = 0;
  
  var numItems = blacklist.length;
  EventLog.Write("Checking " + numItems + " patterns in " + blacklistName + " blacklist.");
  for (var index = 0; index < numItems; index++)
    {
    var item = blacklist[index];
    var itemScore = matchPattern(received, item);
    listScore += itemScore;
    if (itemScore > 0)
      oMessage.HeaderValue("X-hMailServer-Script-Reason-" + (index + 1)) = "Matched pattern '" + item.gsx$pattern.$t + "' in " + blacklistName + " blacklist with score " + itemScore + ".";
    }
	
  return listScore;
  }
  
function loadDomainBlacklist()
  {
  var spreadsheetID = "1qy2cB0C0MvpqibE7KgZq3GQxuxMNWM2xQO26sE3CPC4";
  var gridID = "okp0ef5";
  var blacklist = loadBlacklist(spreadsheetID, gridID);

  return blacklist;
  }
  
function loadIPBlacklist()
  {
  var spreadsheetID = "1qy2cB0C0MvpqibE7KgZq3GQxuxMNWM2xQO26sE3CPC4";
  var gridID = "ofvnxkf";
  var blacklist = loadBlacklist(spreadsheetID, gridID);

  return blacklist;
  }
  
function loadKeywordBlacklist()
  {
  var spreadsheetID = "1qy2cB0C0MvpqibE7KgZq3GQxuxMNWM2xQO26sE3CPC4";
  var gridID = "oslxlge";
  var blacklist = loadBlacklist(spreadsheetID, gridID);

  return blacklist;
  }
  
function loadBlacklist(spreadsheetID, gridID)
  {
  var baseURL = "https://spreadsheets.google.com/feeds/list/";
  var urlSuffix = "/public/values?alt=json";
  var url = baseURL + spreadsheetID + "/" + gridID  + urlSuffix;
  
  var contentType = "application/json";
  var responseText = httpGet(url, contentType);
  
  var responseJSON = eval("(" + responseText + ")");
  var blacklist = responseJSON.feed.entry;

  return blacklist;
  }
  
function httpGet(url, contentType)
  {
  var requestMethod = "GET";
  var httpRequest = new ActiveXObject("MSXML2.ServerXMLHTTP.6.0");
  httpRequest.open(requestMethod, url, false);
  if (contentType != null)
    httpRequest.setRequestHeader("Content-Type", contentType);
  httpRequest.send();

  return httpRequest.responseText;
  }
  
function matchPattern(received, server)
  {
  var score = 0;
  var pattern = new RegExp(server.gsx$pattern.$t);
  var matches = pattern.exec(received);
  if (matches != null)
    {
    var numMatches = matches.length;
    var match = matches[0];
    score = Number(server.gsx$score.$t);
    EventLog.Write("Pattern '" + pattern + "' matched '" + match + "' with score " + score + ".");
    }
	
  return score;
  }

rcauvin
New user
New user
Posts: 15
Joined: 2017-01-02 16:39

Re: Score Spam Based on Blacklists in Google Sheets

Post by rcauvin » 2017-04-29 14:55

Just an update on the status of this project, some of the progress I've made, and some of the lessons I've learned. It has made a huge difference in the amount of spam I receive.

To review, there are three blacklists, each in a tab in a Email Spam Blacklist Google Sheet. All the blacklists define regular expressions to match against the email headers.

The Domain blacklist matches domains in the Received header, the IP blacklist matches IP addresses in the Received header, and the Keyword blacklist matches text in the Subject line.

I have personally curated each of the blacklists, researching every spam message to determine the IP address ranges, subject line patterns, and domain patterns of repeat offenders. I've assigned scores to each pattern, taking into consideration the source of the spam and the likelihood that legitimate email may also match the pattern.

At this moment, the spreadsheet contains 80 offending domain patterns, 469 offending IP address patterns, and 97 offending keyword patterns. It also contains hundreds of "candidate" IP addresses that aren't yet scored but that have sent exactly one spam message.

A few lessons learned:
  1. More than 90% of non-spam emails I receive are from trusted email address patterns that I can add to hMailServer's "White listing" and bypass spam checks for them completely.
  2. Since I use my ISP's DNS, many of the DNS blacklists I've added to my hMailServer configuration hit their rate limits at some point during the day and don't detect spam.
  3. For really bad repeat offenders that I never expect to send me legitimate emails, I add the IP address ranges to hMailServer's "Advanced > IP Ranges" configuration and block them completely. I've done so with about 128 IP address ranges so far.
  4. The zz.countries.nerd.dk DNS blacklist is great for detecting possible spam based on the country of origin. My hMailServer configuration assigns the lowest possible score (1) for emails from outside my country (USA), but an initial spam score of 1 is enough to trigger my script to perform its additional checks.
One of the reasons I thought a script based on blacklists in a Google Sheet would be helpful is that the sheet serves as a human-readable list that multiple people can curate, and that people could use without using the corresponding script. Let me know if you have any questions or want to help curate the list.

Post Reply