SpamAssassin Bootcamp (sa-learn) train BAYES

This section contains contributed scripts for hMailServer. hMailServer 5 is needed to use them.
SorenR
Senior user
Posts: 3837
Joined: 2006-08-21 15:38
Location: Denmark

Post by SorenR » 2014-08-09 13:31

The success of SpamAssassin relies on a well-trained Bayes database. There are many ways to train your Bayes database; this is my shot at doing it.

NB! How to obtain, install and configure SpamAssassin is NOT described here. Search elsewhere on this forum for that information.

The idea came from my play/toy mail server (Postfix, Dovecot & MailScanner) on my Synology DS209+II NAS. There it is basically a no-brainer to set up, as everything works off of Maildirs.

So, how to do this on an hMailServer 5.4.2-B1964 system ...

1 -- Build a script that uses the COM API to find and extract the relevant emails. What could be more natural than to assume that INBOX is good and SPAM is bad? Also, if HAM ends up in SPAM (or vice versa), you move it to the correct folder (INBOX or SPAM) and at the next scheduled run the email will be classified differently. I execute this script using the Windows Task Scheduler at 04:00 in the morning, when everyone is (supposed to be) asleep.

2 -- A global rule in hMailServer to move emails tagged as SPAM into a SPAM folder. Setting this up as a global rule ensures that ALL users of hMailServer are covered. If the SPAM folder does not exist, hMailServer will create it automatically.

Rule name: sa-learn

Code: Select all

Criteria -> Custom header field -> X-hMailServer-Spam = YES
Action -> Move to IMAP folder -> IMAP folder = SPAM

Some admins, like me, may want to monitor what is tagged as SPAM, so I forward a copy to spam@my-domain.tld with this revised global rule.

Rule name: sa-learn (BigBrother version) :: Use AND

Code: Select all

Criteria -> Custom header field -> X-hMailServer-Spam = YES
Criteria -> Custom header field -> X-hMailServer-LoopCount < 1
Action -> Move to IMAP folder -> IMAP folder = SPAM
Action -> Forward email -> To = spam@my-domain.tld

"X-hMailServer-LoopCount" is used to prevent loops. By checking for a value < 1 we make sure the rule only runs once, so SPAM will stay in the spam@my-domain.tld INBOX.

3 -- Now that we have both good and bad emails defined, we need to pull them off the server. For that I have chosen VBScript to interact with the hMailServer COM API.

Functional description:

The VBScript (sa-learn.vbs) works with ONE domain at present, as I only have one domain. Using the COM API it locates and processes all account addresses for that domain, except the addresses listed in the exception list.
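
The script checks the exception list with InStr, which is a substring search, so an address that happens to be part of a listed address would also count as listed. A minimal sketch of an exact, case-insensitive check follows. IsExcluded is a hypothetical helper, not part of the posted script, and it assumes the list is comma-separated as in the configuration shown in this post.

```vbscript
Option Explicit

' Hypothetical helper: True when sAddress is one of the comma-separated
' entries in sList (exact match, case-insensitive).
Function IsExcluded(sList, sAddress)
   Dim sEntry
   IsExcluded = False
   For Each sEntry In Split(sList, ",")
      If StrComp(Trim(sEntry), sAddress, vbTextCompare) = 0 Then
         IsExcluded = True
         Exit Function
      End If
   Next
End Function

Dim sList
sList = "spam@my-domain.tld, surveillance@my-domain.tld"

' The first address is listed (case differs only); the second is merely a
' substring of a listed address and is therefore not treated as excluded.
WScript.Echo "Spam@my-domain.tld excluded: " & IsExcluded(sList, "Spam@my-domain.tld")
WScript.Echo "pam@my-domain.tld excluded: " & IsExcluded(sList, "pam@my-domain.tld")
```

In the main loop the test would then read "If Not IsExcluded(...) And objAccount.Active Then".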

The script will do two passes, one for INBOX and one for SPAM, and generate two .cmd files (HAMCopy.cmd & SPAMCopy.cmd) to be run by the script.

During the two passes, the number of messages in the respective folder is checked and only the last 20 messages (at most) are processed. This procedure is based on the assumption that the COM API returns data sorted by table ID; from examining the database, it appears that hMailServer simply adds new message IDs to the table rather than reusing deleted message IDs.
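
The "last 20 messages" window described above boils down to picking a start index. A minimal sketch, where FirstIndex is a hypothetical helper name and the message count is a made-up number rather than a value read from hMailServer:

```vbscript
Option Explicit

' Hypothetical helper: zero-based index of the first message to process so
' that at most nMax of the newest messages are covered.
Function FirstIndex(nCount, nMax)
   If nCount > nMax Then
      FirstIndex = nCount - nMax
   Else
      FirstIndex = 0
   End If
End Function

Dim nCount
nCount = 25   ' pretend the folder holds 25 messages

' With 25 messages and a window of 20, processing starts at index 5,
' so items 5..24 (the 20 newest, if the ordering assumption holds) are used.
WScript.Echo "First index: " & FirstIndex(nCount, 20)
```

This only selects the newest messages if the collection really is ordered by message ID, which is the assumption stated in the post.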

HAMCopy.cmd and SPAMCopy.cmd simply copy the selected .eml files to a HAM or SPAM directory.

A third .cmd file (sa-learn.cmd) is also executed by the script. It contains the commands to run sa-learn --spam, sa-learn --ham, sa-learn --sync and sa-learn --backup, as is customary on Unix-type systems.

On a dual-core 3 GHz system with 4 GB RAM, SATA disks and Windows Server 2003 R2, it took almost 20 minutes to process ~4,200 HAM and ~5,600 SPAM emails.

sa-learn.vbs

Code: Select all

Option Explicit
   '
   ' Version 0.1.0 09-08-2014, Soren Rathje - Initial version.
   '

   Dim hmAdmin, hmPassword, hmDomain, hmSPAMFolder, hmSPAMDir, hmHAMFolder, hmHAMDir, hmExcludeAddress
   Dim i, j, s, objApp, objDomain, objAccount, objIMAPFolder, objMessage
   Dim fsoSPAM, fsoHAM, fsoSALearn, objFSO, objSPAM, objHAM, objShell

   '
   ' Configuration parameters - BEGIN
   '
   hmAdmin = "Administrator"          ' hMailServer Administrator user
   hmPassword = "********"            ' hMailServer Administrator password
   hmDomain = "my-domain.tld"         ' Domain name
   hmSPAMFolder = "SPAM"              ' SPAM IMAP folder
   hmSPAMDir = "C:\hMailServer\SPAM"  ' You need to create this directory!
   hmHAMFolder = "INBOX"              ' HAM IMAP Folder
   hmHAMDir = "C:\hMailServer\HAM"    ' You need to create this directory!
   hmExcludeAddress = "spam@my-domain.tld, surveillance@my-domain.tld"

   fsoSPAM = "C:\hMailServer\Events\SPAMCopy.cmd"
   fsoHAM = "C:\hMailServer\Events\HAMCopy.cmd"
   fsoSALearn = "C:\hMailServer\Events\sa-learn.cmd"
   '
   ' Configuration parameters - END
   '

   Set objShell = WScript.CreateObject("WScript.Shell")
   Set objFSO = CreateObject("Scripting.FileSystemObject")
   Set objApp = CreateObject("hMailServer.Application")
   Call objApp.Authenticate(hmAdmin, hmPassword)
   Set objDomain = objApp.Domains.ItemByName(hmDomain)

   '
   ' Find SPAM messages
   '
   Set objSPAM = objFSO.CreateTextFile(fsoSPAM,True)
   For i = 0 to objDomain.Accounts.Count -1
      Set objAccount = objDomain.Accounts.Item(i)

      ' DO NOT process excluded and non-active accounts.
      If InStr(1, hmExcludeAddress, objAccount.Address, vbTextCompare) = 0 And objAccount.Active Then

         Set objIMAPFolder = objAccount.IMAPFolders.ItemByName(hmSPAMFolder)

         ' If no messages - skip
         If objIMAPFolder.Messages.Count > 0 Then

            s = 0
            If objIMAPFolder.Messages.Count - 20 > 0 Then s = objIMAPFolder.Messages.Count - 20

            For j = s to objIMAPFolder.Messages.Count -1
               Set objMessage = objIMAPFolder.Messages.Item(j)
               objSPAM.Write "COPY " & objMessage.FileName & " " & hmSPAMDir & " /Y" & vbCrLf
            Next
         End If
      End If
   Next
   objSPAM.Close

   '
   ' Find HAM messages
   '
   Set objHAM = objFSO.CreateTextFile(fsoHAM,True)
   For i = 0 to objDomain.Accounts.Count -1
      Set objAccount = objDomain.Accounts.Item(i)

      ' DO NOT process excluded and non-active accounts.
      If InStr(1, hmExcludeAddress, objAccount.Address, vbTextCompare) = 0 And objAccount.Active Then

         Set objIMAPFolder = objAccount.IMAPFolders.ItemByName(hmHAMFolder)

         ' If no messages - skip
         If objIMAPFolder.Messages.Count > 0 Then

            s = 0
            If objIMAPFolder.Messages.Count - 20 > 0 Then s = objIMAPFolder.Messages.Count - 20

            For j = s to objIMAPFolder.Messages.Count -1
               Set objMessage = objIMAPFolder.Messages.Item(j)
               objHAM.Write "COPY " & objMessage.FileName & " " & hmHAMDir & " /Y" & vbCrLf
            Next
         End If
      End If
   Next
   objHAM.Close

   '
   ' Execute file copy and sa-learn.exe - sequentially - no StdOut.
   '
   objShell.Run fsoSPAM, 0, true
   objShell.Run fsoHAM, 0, true
   objShell.Run fsoSALearn, 0, true
sa-learn.cmd

Code: Select all

C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --spam "C:\hMailServer\SPAM\*.eml"
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --ham "C:\hMailServer\HAM\*.eml"
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --sync
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --backup > "C:\Documents and Settings\Default User\.spamassassin\bayes_backup"
REM DEL C:\hMailServer\SPAM\*.eml /Q
REM DEL C:\hMailServer\HAM\*.eml /Q

Disclaimer: I take no responsibility for what you may or may not do with the above script/shell script. It works for me - it may not work for you. I DO NOT GUARANTEE THIS CODE TO BE BUG-FREE! USE AT YOUR OWN RISK! WHATEVER YOU DO - IT'S NOT MY FAULT!

AND remember; Real men do NOT backup - but they CRY a lot!

Please feel free to adopt and modify.
SørenR.

" I will initiate self-destruct. " — IG-11.

percepts
Senior user
Posts: 5282
Joined: 2009-10-20 16:33
Location: Sceptred Isle

Post by percepts » 2014-08-09 15:24

How well does it work if all your users have moved all their INBOX email into their own IMAP folder structure before 04:00? :mrgreen:


Post by SorenR » 2014-08-09 15:36

Once you have a solid foundation of HAM and SPAM to begin with, I don't think a few HAMs really matter.
Anyway, it's not like all communication stops when the sun goes down :mrgreen:


Post by percepts » 2014-08-09 19:53

I think you need all the hams so that when a new rule is auto created it can be tested against them to be sure it doesn't mark them as spam. i.e. the ham is as important as the spam for sa-learn to produce reliable rules.


Post by SorenR » 2014-08-09 20:35

Well, I can trust my son and my wife not to move or delete emails frequently. 90% of the "first run" Bayes database I built yesterday came from them, collected over the last 7 years or so... :mrgreen: Almost 8,000 emails, close to 50:50.

I have about 3,500 SPAM mails in my "spam account", collected since March 2014, that I have not processed yet as they contain false positives. I estimate that when I am done training SA I will probably have handled well over 10,000 emails, one way or another. Pretty good for a server with only 6 human accounts - and a few "machine" accounts. :mrgreen:

If I manage to train SA properly, Bayes will hopefully become more or less self-maintaining or require very little maintenance. That will allow me to switch off some of the spam checks in hMailServer, such as the HELO test, the DKIM test and some of the RBL checks.

Post by percepts » 2014-08-09 22:38

How often do you get false positives and false negatives anyway? Out of the box SA is pretty good. And what percentage of the false negatives is really opted-in mail, where you and your users have given out your email address when you shouldn't have?


Post by SorenR » 2014-08-10 01:21

Hard to say... I'm not sure my definition of SPAM is shared by SpamAssassin, so I have some pretty heavy blacklists on the server to capture the Danish-language "Great Discount/Bargain" emails. ;-)

I probably have a 95% hit rate on "real" SPAM and less than 2% HAM. My blacklists are not perfect (I even have whitelists for my blacklists), so HAM does get caught, and I have told my family to move it into INBOX as part of the Bayes training.

Most of the opt-out emails - probably 95% - just appear out of the blue, and for some of them you have to opt out within 24 hours or the opt-out link goes dead :evil:
I even get Swedish and Norwegian opt-out emails :roll: I know large parts of Sweden and Norway (and the UK for that matter) used to be Danish - but it has to stop - that was 1,000 years ago.

I would love to make an internal website listing all the opt-out emails we get, so people can select or deselect what they want, and then run an automatic opt-out procedure every time they come in... But as usual, something else gets in the way. :wink:

So far today, with 1 day of Bayes training, I tag more SPAM than "Blacklist", and if this trend continues I can begin minimizing my blacklists and hopefully see a decline in the false negatives.

I hardly ever give out my email address, but somehow during the last 3-4 months my weekly SPAM count went from 2-3 to 50+. Real SPAM, not the "we are sending you this because bla bla..." (in Danish) lame excuse...

From memory, I have given out my email address to "VW Heritage - UK" and a Danish appliances e-shop (new brushes for our washing machine) in 2014 - at the other places where I shop online I have been a customer for years. I'm usually very careful about who I give my email address to.

The rest of the family do get their fair share of "stuff", but here too the trend is that they are getting more "real" SPAM than before.

Post by SorenR » 2014-08-11 13:28

Modified sa-learn.cmd

Code: Select all

echo %DATE% %TIME% - START >> C:\hMailServer\Logs\sa-learn.log
echo SPAM: >> C:\hMailServer\Logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --spam "C:\hMailServer\SPAM\*.eml" >> C:\hMailServer\Logs\sa-learn.log
echo HAM: >> C:\hMailServer\Logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --ham "C:\hMailServer\HAM\*.eml" >> C:\hMailServer\Logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --sync
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --backup > "C:\Documents and Settings\Default User\.spamassassin\bayes_backup"
del C:\hMailServer\SPAM\*.eml /Q
del C:\hMailServer\HAM\*.eml /Q
echo %DATE% %TIME% - STOP >> C:\hMailServer\Logs\sa-learn.log

This morning's harvest... :mrgreen:

Code: Select all

11-08-2014  4:00:48,53 - START 
SPAM: 
Learned tokens from 20 message(s) (114 message(s) examined)
HAM: 
Learned tokens from 11 message(s) (120 message(s) examined)
11-08-2014  4:01:18,19 - STOP 

Kriztan
Normal user
Posts: 39
Joined: 2010-03-17 16:12
Location: Germany

Post by Kriztan » 2014-09-03 10:38

Hi,
thanks for this nice script. On my Windows Server 2008 R2 the script fails with the following error (see attached file).
Line 45 is

Code: Select all

Set objIMAPFolder = objAccount.IMAPFolders.ItemByName(hmSPAMFolder)

Any ideas?

Thanks, Kriztan

Attachment: Unbenannt.png


Post by SorenR » 2014-09-03 12:05

A "subscript out of range" means you're trying to access an item that doesn't exist. Did you change "SPAM" in line 16 to your actual SPAM folder?

I did put "SPAM" in the script, but my actual SPAM folder is called "Junk E-mail" as it was initially created by Outlook :wink:

Post by Kriztan » 2014-09-03 12:23

SorenR wrote:Did you change "SPAM" in line 16 to your actual SPAM folder?
Yes I did, but I have some administrative accounts like info@domain.tld, and that is where it failed. Now I've created an IMAP folder called Junk and the script works fine.
Thanks


Post by SorenR » 2014-09-03 12:43

Ah.. Yes... The script could do with some error checking :oops:
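
A minimal sketch of the kind of error checking mentioned above, assuming ItemByName raises a runtime error when the folder does not exist (as the "subscript out of range" report suggests). TryGetFolder and the FakeFolders class are hypothetical: FakeFolders only stands in for an account's IMAPFolders collection so that the sketch runs without hMailServer.

```vbscript
Option Explicit

' Hypothetical stand-in for objAccount.IMAPFolders. It knows only "INBOX"
' and raises error 9 for any other name.
Class FakeFolders
   Private dict

   Private Sub Class_Initialize()
      Set dict = CreateObject("Scripting.Dictionary")
      dict.CompareMode = vbTextCompare
      dict.Add "INBOX", CreateObject("Scripting.Dictionary")
   End Sub

   Public Function ItemByName(sName)
      If Not dict.Exists(sName) Then
         Err.Raise 9, "FakeFolders", "Subscript out of range"
      End If
      Set ItemByName = dict.Item(sName)
   End Function
End Class

' Hypothetical helper: returns the folder object, or Nothing when the
' lookup fails, instead of aborting the whole script.
Function TryGetFolder(oFolders, sName)
   Set TryGetFolder = Nothing
   On Error Resume Next
   Set TryGetFolder = oFolders.ItemByName(sName)
   If Err.Number <> 0 Then
      Set TryGetFolder = Nothing
      Err.Clear
   End If
   On Error GoTo 0
End Function

Dim oFolders, oFolder
Set oFolders = New FakeFolders

Set oFolder = TryGetFolder(oFolders, "SPAM")
If oFolder Is Nothing Then
   WScript.Echo "No SPAM folder - account skipped"
Else
   WScript.Echo "SPAM folder found"
End If
```

In the real script the loop would simply continue with the next account whenever the helper returns Nothing, so accounts without a SPAM folder no longer need to be created by hand or excluded.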

Post by SorenR » 2014-09-21 19:43

Revised script...

Code: Select all

Option Explicit
   '
   ' Version 0.2.0 30/8-2014, Soren Rathje - Selection changed to DAYS.
   '
   ' Configuration parameters
   '
   ' NB! SPAMFolder and HAMFolder MUST exist for every account to be processed.
   '     Administrative and automation accounts can be excluded from processing
   '     by defining them in "ListExclude"
   '
   Const Administrator = "Administrator"
   Const Secret        = "Secret"
   Const Domain        = "MyDomain.com"
   Const SPAMFolder    = "SPAM"
   Const HAMFolder     = "INBOX"
   Const ListExclude   = "postmaster@MyDomain.com, blog@MyDomain.com"
   Const SPAMdir       = "C:\hMailServer\SPAM"
   Const HAMdir        = "C:\hMailServer\HAM"
   Const SPAMCopy      = "C:\SpamAssassin\SPAMCopy.cmd"
   Const HAMCopy       = "C:\SpamAssassin\HAMCopy.cmd"
   Const SALearn       = "C:\SpamAssassin\sa-learn.cmd"
   Const RetainDays    = 30

   Sub MakeFileList(oDomain,sFolder,sDir,sFile,sExclude)
      Dim i, j, oFile, oAccount, oIMAPFolder, oMessage
      Set oFile = oFSO.CreateTextFile(sFile,True)
      For i = 0 to oDomain.Accounts.Count -1
         Set oAccount = oDomain.Accounts.Item(i)

         ' DO NOT process excluded and non-active accounts.
         If InStr(1, sExclude, oAccount.Address, vbTextCompare) = 0 And oAccount.Active Then
            Set oIMAPFolder = oAccount.IMAPFolders.ItemByName(sFolder)

            ' If no messages - skip
            If oIMAPFolder.Messages.Count > 0 Then
               For j = 0 to oIMAPFolder.Messages.Count -1
                  Set oMessage = oIMAPFolder.Messages.Item(j)
                  If oMessage.InternalDate > dRetainDate Then
                     oFile.Write "COPY " & oMessage.FileName & " " & sDir & " /Y" & vbCrLf
                  End If
               Next
            End If
         End If
      Next
      oFile.Close
   End Sub

   '
   ' Define variables/objects
   '
   Dim oApp, oDomain, oShell, oFSO
   Dim dRetainDate

   '
   ' Initialize environment
   '
   Set oShell = WScript.CreateObject("WScript.Shell")
   Set oFSO = CreateObject("Scripting.FileSystemObject")
   Set oApp = CreateObject("hMailServer.Application")
   Call oApp.Authenticate(Administrator, Secret)
   Set oDomain = oApp.Domains.ItemByName(Domain)

   '
   ' Start from this date...
   '
   dRetainDate = CDate(Now - RetainDays)

   '
   ' Find SPAM messages
   '
   Call MakeFileList(oDomain,SPAMFolder,SPAMdir,SPAMCopy,ListExclude)

   '
   ' Find HAM messages
   '
   Call MakeFileList(oDomain,HAMFolder,HAMdir,HAMCopy,ListExclude)

   '
   ' Execute file copy and sa-learn.exe - sequentially - no StdOut.
   '
   oShell.Run SPAMCopy, 0, true
   oShell.Run HAMCopy, 0, true
   oShell.Run SALearn, 0, true
   

saschadd
New user
Posts: 13
Joined: 2014-07-16 14:58

Post by saschadd » 2014-10-27 22:16

Hi SorenR,
this script looks very helpful and I would like to try it.
But at the moment I am a bit stuck, as I can't find the files needed:
C:\SpamAssassin\SPAMCopy.cmd
C:\SpamAssassin\HAMCopy.cmd

Maybe I installed the wrong version of SpamAssassin?!
I installed the version from JAM Software, but just found the one on SourceForge, which is not up to date.

Maybe you could tell me which version you are using?

At the moment it feels like my SpamAssassin doesn't do anything at all. ;)
Therefore I have to train it. ;)


Post by SorenR » 2014-10-28 01:02

Version 0.4.0 of sa-learn.vbs
Attachment: sa-learn.vbs.rar (1.44 KiB)

Post by SorenR » 2014-10-28 01:04

My script builds the two files (SPAMCopy.cmd and HAMCopy.cmd) every time you run it. :wink:

Post by saschadd » 2014-10-28 09:05

SorenR wrote: My script will build the two files every time you run the script. :wink:
:lol: This was too easy! ;)

Well, one more noobish question... how do I start this script?
In hMailServer there is the possibility to enter scripts, so do I copy the script to EventHandlers.vbs?
And when will the script be run then?


Post by SorenR » 2014-10-28 12:40

The script "sa-learn.vbs" is run by the Windows Task Scheduler as user SYSTEM. I run mine every day at 04:00...

Post by saschadd » 2014-10-28 20:34

Hi Soren,
sorry for asking again, but I am still stuck because now I can't find the folder

.spamassassin\bayes

which might be a problem of different versions of SpamAssassin.
Could you please tell me which version you are using, and from where?

The one I have installed is possibly a 64-bit version, as I am able to find a .spamassassin folder under
"C:\Windows\SysWOW64\config\systemprofile\.spamassassin"

There is another .spamassassin folder under a user account, but that one is empty; there is no bayes folder in it.

I am a bit confused :shock:


Post by SorenR » 2014-10-28 22:22

The Bayes database is created and maintained by sa-learn.exe when it reads and classifies the emails (HAM or SPAM).
Usually the Bayes database is located in %USERPROFILE%\.spamassassin (~/.spamassassin in Unix terms).

When I run the script (unattended via the Scheduler), the Bayes database is stored in "C:\Documents and Settings\Default User\.spamassassin" - I have a Windows 2003 R2 Server.

You can override this location in "sa-learn.cmd", but then you also have to specify the matching "bayes_path" (for example "bayes_path c:\whatever\.spamassassin\bayes") in the SpamAssassin configuration (local.cf).
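
Because %USERPROFILE% depends on the account that runs the script, a quick way to see where the database would end up is to expand it under that same account. A minimal sketch using only WScript.Shell and FileSystemObject; it assumes the default layout where the database is a set of files whose names start with "bayes" (such as bayes_toks and bayes_seen) rather than a folder called bayes:

```vbscript
Option Explicit

Dim oShell, oFSO, oFile, sBayesDir
Set oShell = CreateObject("WScript.Shell")
Set oFSO = CreateObject("Scripting.FileSystemObject")

' Run this the same way the scheduled task runs (for example as SYSTEM),
' otherwise %USERPROFILE% points at a different profile.
sBayesDir = oShell.ExpandEnvironmentStrings("%USERPROFILE%") & "\.spamassassin"

If oFSO.FolderExists(sBayesDir) Then
   WScript.Echo "Found: " & sBayesDir
   ' List the files that belong to the Bayes database, if any.
   For Each oFile In oFSO.GetFolder(sBayesDir).Files
      If LCase(Left(oFile.Name, 5)) = "bayes" Then
         WScript.Echo "  " & oFile.Name & " (" & oFile.Size & " bytes)"
      End If
   Next
Else
   WScript.Echo "Not found: " & sBayesDir
End If
```

An empty .spamassassin folder, as described in the question, would then simply mean that sa-learn has not yet written a database under that profile.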

blakito
Normal user
Posts: 30
Joined: 2010-09-15 14:27

Post by blakito » 2014-10-29 17:11

Hi SorenR, congrats and thanks for the great work! It will save me a lot of time.

I'm not a programmer, but since RetainDays wasn't working correctly for me, I did a little workaround.

Code: Select all

Where
                   If oMessage.InternalDate > dRetainDate Then
Replace
                   If DateDiff("y",oMessage.InternalDate,dDateToday) <= RetainDays Then

Where
   Dim dRetainDate, EventLog, RetCode
Replace
   Dim dDateToday, EventLog, RetCode

Where
   dRetainDate = CDate(Now - RetainDays)
Replace
   dDateToday = CDate(Now)

I also changed ListExclude to work as ListInclude, since I don't trust every user to train my filter, and updated sa-learn.cmd to work with JAM Software Spam Assassin Box, which uses spamc to train instead of sa-learn.
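
For comparison, the two date tests discussed here can be run side by side on fixed dates. This is only a sketch: the dates are made up, and it uses the "d" interval, which in DateDiff gives the same day count as the "y" interval used in the workaround. It does not explain why the original test failed on one system; it only shows where the two tests differ.

```vbscript
Option Explicit

Const RetainDays = 30

Dim dNow, dMsg
dNow = DateSerial(2014, 10, 31) + TimeSerial(1, 0, 0)   ' pretend "Now" is 31 Oct 01:00
dMsg = DateSerial(2014, 10, 1) + TimeSerial(0, 30, 0)   ' message stamped 1 Oct 00:30

' Original test: exact cutoff, RetainDays * 24 hours before "now".
' The cutoff is 1 Oct 01:00, so this message is just too old (false).
WScript.Echo "Cutoff test:   " & CBool(dMsg > CDate(dNow - RetainDays))

' Workaround: whole calendar days between the two dates.
' 1 Oct to 31 Oct is 30 day boundaries, so the message is kept (true).
WScript.Echo "DateDiff test: " & CBool(DateDiff("d", dMsg, dNow) <= RetainDays)
```

The two tests only disagree for messages within a day of the boundary, so either should select roughly the same training set.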

Best regards


Post by SorenR » 2014-10-29 18:49

Well, my code is not perfect as I'm not a programmer by profession but I learn as I go :mrgreen:

It's a work in progress and primarily based on my own needs :wink:

Post by SorenR » 2014-11-25 22:08

Version 0.4.1 - Fixed some compatibility issues..

Temporary HAM and SPAM folders are now located under C:\SpamAssassin.
HAMCopy and SPAMCopy are now executed with "cmd /C ..." to fix compatibility issues with Windows Server.

Error logging is done in the hMailServer EventLog.
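
The attached 0.4.1 script is not shown in the thread, so the following is only a sketch of how steps could be run through "cmd /C" one after another while checking each exit code. The commands are placeholders (they just return a fixed exit code), and reporting via WScript.Echo is an assumption; the attached script logs to the hMailServer EventLog instead.

```vbscript
Option Explicit

Dim oShell, aSteps, sStep, nExit
Set oShell = CreateObject("WScript.Shell")

' Placeholder commands. In the real script these would be the generated
' SPAMCopy.cmd, HAMCopy.cmd and sa-learn.cmd files.
aSteps = Array("cmd /C exit 0", "cmd /C exit 3", "cmd /C exit 0")

For Each sStep In aSteps
   ' Window style 0 = hidden; True = wait for the command and return its exit code.
   nExit = oShell.Run(sStep, 0, True)
   If nExit <> 0 Then
      WScript.Echo "Step failed (exit code " & nExit & "): " & sStep
      Exit For
   End If
Next
```

Stopping at the first failing step avoids training sa-learn on a half-copied set of messages.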
Attachment: sa-learn.0.4.1.rar (1.77 KiB)

johnyu2012
Normal user
Posts: 108
Joined: 2012-09-11 06:33

Post by johnyu2012 » 2014-11-26 04:23

Just make a SPAM folder for each account manually. Looks like it runs without error. I also use the latest version of Soren's script, but how can I tell that everything does what it is supposed to do?


Post by SorenR » 2015-03-14 13:19

Just noticed I never responded to the last post.

If we assume this configuration

Code: Select all

   Const SPAMdir       = "C:\SpamAssassin\temp\SPAM"
   Const HAMdir        = "C:\SpamAssassin\temp\HAM"
   Const SPAMCopy      = "C:\SpamAssassin\temp\SPAMCopy.cmd"
   Const HAMCopy       = "C:\SpamAssassin\temp\HAMCopy.cmd"
   Const SALearn       = "C:\SpamAssassin\sa-learn.cmd"

then in directory C:\SpamAssassin\temp we should have two batch files, HAMCopy.cmd and SPAMCopy.cmd. They are created when the script runs, and their contents should look similar to this...

Code: Select all

COPY C:\hMailServer\Data\lolle.org\benjamin\85\{85E86914-C145-416C-8C7B-B6E3AB36CF08}.eml C:\SpamAssassin\temp\HAM /Y
COPY C:\hMailServer\Data\lolle.org\benjamin\71\{719FC286-413E-4611-B8D4-229F312A8DE8}.eml C:\SpamAssassin\temp\HAM /Y
COPY C:\hMailServer\Data\lolle.org\benjamin\27\{27D5B9CB-24E5-49A1-BFDD-1A7B06164061}.eml C:\SpamAssassin\temp\HAM /Y
and...

Code: Select all

COPY C:\hMailServer\Data\lolle.org\benjamin\4C\{4C47C9FC-3CA0-4B19-9AD1-02E7CF69D666}.eml C:\SpamAssassin\temp\SPAM /Y
COPY C:\hMailServer\Data\lolle.org\benjamin\74\{74456571-5425-4691-8C3E-1936AC7844F7}.eml C:\SpamAssassin\temp\SPAM /Y
COPY C:\hMailServer\Data\lolle.org\benjamin\DD\{DD988FF8-6DA6-475D-A764-A88083688DAB}.eml C:\SpamAssassin\temp\SPAM /Y

These are the message files that will be copied into C:\SpamAssassin\temp\HAM and C:\SpamAssassin\temp\SPAM.

When the script executes C:\SpamAssassin\sa-learn.cmd, SA-LEARN will process both directories and store token information in the Bayesian database, synchronise the database with the journal data and back up the database.

SA-LEARN will also echo console output into C:\SpamAssassin\logs\sa-learn.log...

Code: Select all

13-03-2015  4:00:31,69 - START 
SPAM: 
Learned tokens from 25 message(s) (244 message(s) examined)
HAM: 
Learned tokens from 31 message(s) (257 message(s) examined)
13-03-2015  4:01:09,91 - STOP 

Both directories (C:\SpamAssassin\temp\HAM and C:\SpamAssassin\temp\SPAM) are cleared after use.

DART
New user
Posts: 29
Joined: 2015-03-04 10:07

Post by DART » 2015-03-15 05:44

Having an issue with SPAMCopy and HAMCopy not running. I traced it to a path issue with the COPY command: my hMailServer is installed under Program Files, which has a space in the path.

This ...
COPY C:\Program Files (x86)\hMailServer\Data\mydomain.com\kerrie\CD\ (snip)

should look like this (which works):
COPY "C:\Program Files (x86)"\hMailServer\Data\mydomain.com\kerrie\CD\ (snip)

I have no idea how to add the "" in the script. Any ideas appreciated.

TIA


Post by DART » 2015-03-15 09:03

Amazing what a little googling will do!

I changed line 52 from this
oFile.Write "COPY " & oMessage.FileName & " " & sDir & " /Y" & vbCrLf

to this
oFile.Write "COPY " & """" & oMessage.FileName & """" & " " & sDir & " /Y" & vbCrLf

Problem solved.

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-03-15 13:43

OR....

use Chr(34) where you want the speechmark to appear (instead of """")

eg,

Code: Select all

oFile.Write "COPY " & Chr(34) & oMessage.FileName & Chr(34) & " " & sDir & " /Y" & vbCrLf

FYI.
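The same quoting idea, shown as a small Python sketch purely for illustration (the script in this thread is VBScript). The helper name copy_line and the example paths are hypothetical. Unlike the one-line fix above, this sketch wraps both the source and the target path in double quotes, which also covers a target directory containing spaces.

```python
def copy_line(source, target_dir):
    """Build one batch COPY line with both paths wrapped in double quotes."""
    return 'COPY "{}" "{}" /Y'.format(source, target_dir)


# Hypothetical paths, chosen only because the source path contains spaces.
line = copy_line(
    r"C:\Program Files (x86)\hMailServer\Data\example.com\user\AB\message.eml",
    r"C:\SpamAssassin\temp\SPAM",
)
print(line)
```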
5.7 on test.
SpamassassinForWindows 3.4.0 spamd service
AV: Clamwin + Clamd service + sanesecurity defs : https://www.hmailserver.com/forum/viewtopic.php?f=21&t=26829

SorenR
Senior user
Posts: 3837
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2015-03-15 13:45

DART wrote:
DART wrote: This ...
COPY C:\Program Files (x86)\hMailServer\Data\mydomain.com\kerrie\CD\ (snip)

Should look like this (which works)
COPY "C:\Program Files (x86)"\hMailServer\Data\mydomain.com\kerrie\CD\ (snip)
Amazing what a little googling will do!

changed line 52 from this
oFile.Write "COPY " & oMessage.FileName & " " & sDir & " /Y" & vbCrLf

to this
oFile.Write "COPY " & """" & oMessage.FileName & """" & " " & sDir & " /Y" & vbCrLf

problem solved
Ah yes... The Microsoft Curse... Spaces in paths and filenames :roll:

It has been a problem for as long as I can remember... IIRC I noticed it for the first time with Windows 3.11 or was it Windows for Workgroups? :evil:

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-03-15 16:42

SorenR wrote: with Windows 3.11 or was it Windows for Workgroups?
Erm... If I remember rightly, they are one and the same. WfW was 3.11.

Just saying, but I am very PROBABLY wrong. :roll:

SorenR
Senior user
Posts: 3837
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2015-03-15 22:28

jimimaseye wrote:
SorenR wrote: with Windows 3.11 or was it Windows for Workgroups?
Erm... If I remember rightly, that is one in the same. WfW was 3.11.

Just saying, but I am very PROBABLY wrong. :roll:
Nope... Windows 3.11 was a bugfix for 3.1 and WfW 3.11 the first to require a 386 processor.

Oh man... I'm getting old. ;-)

DART
New user
Posts: 29
Joined: 2015-03-04 10:07

Re: SpamAssassin Bootcamp (sa-learn)

Post by DART » 2015-03-16 02:28

SorenR wrote: Oh man... I'm getting old. ;-)
we could chat about running BBS's under desqview if you want! :D :D :D I feel your pain :shock:


I am using the JAM Software build of SpamAssassin for Windows and modified sa-learn.cmd to suit. The reporting functions don't work.

Code: Select all

@echo off
:: Modified spam learning script
:: spamd service must be run with --allow-tell switch
:: c:\spamassassin\logs dir must exist
::
:: Learning spam
set TRAINTYPE=spam
set FOLDER=c:\spamassassin\spam\

echo %DATE% - %TIME% - Learning %TRAINTYPE%  >> c:\spamassassin\logs\trainspam.log
for %%X in ("%FOLDER%*") do spamc -L %TRAINTYPE% < "%%X"
spamc -r >> c:\spamassassin\logs\trainspam.log


:: Learning HAM
set TRAINTYPE=ham
set FOLDER=c:\spamassassin\ham\

echo %DATE% - %TIME% - Learning %TRAINTYPE%  >> c:\spamassassin\logs\trainspam.log
for %%X in ("%FOLDER%*") do spamc -L %TRAINTYPE% < "%%X"
spamc -r >> c:\spamassassin\logs\trainspam.log

:: Remove all messages from spam/ham directories
del C:\SpamAssassin\SPAM\*.eml /Q
del C:\SpamAssassin\HAM\*.eml /Q
echo %DATE% %TIME% - Finished >> C:\SpamAssassin\logs\trainspam.log
echo . >> c:\spamassassin\logs\trainspam.log

SorenR
Senior user
Posts: 3837
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2015-03-16 12:23

DART wrote:
SorenR wrote: Oh man... I'm getting old. ;-)
we could chat about running BBS's under desqview if you want! :D :D :D I feel your pain :shock:


I am using the Jam software spamassassin windows version and modified the sa-learn.cmd to suit. The reporting functions dont work.

Code: Select all

@echo off
:: Modified spam learning script
:: spamd service must be run with --allow-tell switch
:: c:\spamassassin\logs dir must exist
::
:: Learning spam
set TRAINTYPE=spam
set FOLDER=c:\spamassassin\spam\

...
...
That is one way of doing it.

With "spamc" you have to submit every single mail to SpamAssassin for processing, whereas "sa-learn" will simply read a whole directory at once.

You will be "sharing" the SpamAssassin daemon with hMailServer, and if your training overloads SpamAssassin then hMailServer cannot do inbound mail scoring.
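To make the difference concrete, here is a hedged Python sketch that only builds the command lines and runs nothing. The flags are the ones already used in this thread (spamc -L and sa-learn --spam / --ham with a wildcard); the function names and paths are invented for the example.

```python
from pathlib import PureWindowsPath


def spamc_commands(files, learn_type):
    """One spamc call per message; each file has to be fed to spamc on stdin."""
    return [(["spamc", "-L", learn_type], str(path)) for path in files]


def sa_learn_command(directory, learn_type):
    """A single sa-learn call that takes a wildcard covering the whole directory."""
    pattern = str(PureWindowsPath(directory) / "*.eml")
    return ["sa-learn", "--" + learn_type, pattern]


# Hypothetical message files waiting in the SPAM training directory.
spam_files = [
    r"C:\SpamAssassin\temp\SPAM\1.eml",
    r"C:\SpamAssassin\temp\SPAM\2.eml",
    r"C:\SpamAssassin\temp\SPAM\3.eml",
]

print(len(spamc_commands(spam_files, "spam")), "spamc calls (one per message)")
print(sa_learn_command(r"C:\SpamAssassin\temp\SPAM", "spam"))
```

With a few hundred messages per night, the spamc route means a few hundred requests to the daemon that is also scoring inbound mail, while sa-learn is one separate process.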

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-05-12 11:04

DART wrote:
I am using the Jam software spamassassin windows version and modified the sa-learn.cmd to suit. The reporting functions dont work.
I don't understand why you needed to change this. The SA-LEARN command is included with the JAM version (is there anyone using SpamAssassin for Windows that ISN'T a JAM version? :) ). I suspect that if you didn't get it working, it was only a matter of file paths, in particular changing
this

Code: Select all

C:\Documents and Settings\Default User\.spamassassin\bayes
to this

Code: Select all

C:\Documents and Settings\Administrator\.spamassassin\bayes
where 'Administrator' is the high-level user that performs your SpamAssassin checks. Note that the above is the old XP notation. Modern systems such as Win7 and the Server OSes use the directory path format

Code: Select all

C:\Users\Administrator\.spamassassin\bayes
(There is no "Default User" on my Server 2008 or on my Win7 client, although Win7 does have a "Default" profile, and this is likely to cause the problem.)

However, I am not sure the --dbpath parameter is relevant or required if you run sa-learn as the same user that installed SpamAssassin and runs your mail checks. The directory C:\Users\Administrator\.spamassassin holds the Bayes DB that is created at installation time. It is the default DB, and it is the one used when no override path is given (assuming your SA checking runs under the same user as the installation).

I just ran this command on my Server 2008 OS with the JAM software. It all worked fine and populated the tokens database in my C:\Documents and Settings\Administrator\.spamassassin (default) folder.

Code: Select all

sa-learn.exe --ham "D:\spam\*.eml"
so your (DART's) workaround shouldn't be necessary; you could stay with Soren's original suggestion.

DART
New user
Posts: 29
Joined: 2015-03-04 10:07

Re: SpamAssassin Bootcamp (sa-learn)

Post by DART » 2015-05-12 13:03

Paths have changed in the latest version of SpamAssassin; it's all described earlier in this thread. It was an issue with spaces in the path, and even Soren responded in agreement.

Yes, you could adjust the paths during install, but most people won't. It is the tyranny of the default, and you don't realise it's a problem until after the install completes.

cheers

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-05-12 16:16

I saw the references to directory paths containing spaces in the '*copy.cmd' files; I wasn't referring to that. I was referring to your comment:
DART wrote: I am using the Jam software spamassassin windows version and modified the sa-learn.cmd to suit. The reporting functions dont work.

Code: Select all

@echo off
:: Modified spam learning script
:: spamd service must be run with --allow-tell switch
:: c:\spamassassin\logs dir must exist
::
:: Learning spam
set TRAINTYPE=spam
set FOLDER=c:\spamassassin\spam\

echo %DATE% - %TIME% - Learning %TRAINTYPE%  >> c:\spamassassin\logs\trainspam.log
for %%X in ("%FOLDER%*") do spamc -L %TRAINTYPE% < "%%X"
spamc -r >> c:\spamassassin\logs\trainspam.log

:: Learning HAM
set TRAINTYPE=ham
set FOLDER=c:\spamassassin\ham\

echo %DATE% - %TIME% - Learning %TRAINTYPE%  >> c:\spamassassin\logs\trainspam.log
for %%X in ("%FOLDER%*") do spamc -L %TRAINTYPE% < "%%X"
spamc -r >> c:\spamassassin\logs\trainspam.log
...
...
where you changed from using SA-LEARN to using SPAMC instead. (at least, that is how it reads to me)

DART
New user
Posts: 29
Joined: 2015-03-04 10:07

Re: SpamAssassin Bootcamp (sa-learn)

Post by DART » 2015-05-13 01:27

jimimaseye wrote:I saw the references to path directories containing spaces in the '*copy.cmd' files, I wasnt referring to that. I was referring to your comment:

where you changed from using SA-LEARN to using SPAMC instead. (at least, that is how it reads to me)
Oh ok, I see what you mean. Yes, that was a conscious decision as spamc talks directly to spamd (service / daemon) whereas sa-learn is more of a user based approach.

see Jam software docs.
A: The SpamAssassin Bayes filter can be trained either using the sa-learn.exe or the spamc.exe. The main difference on a Windows system is that sa-learn will run under the currents user credentials while spamc will pass the mails that shall be trained to spamd (the SpamAssassin Daemon) which then trains the mail under the user credentials of spamd. This is especially important if you run SpamD for spam filtering under a separate user account, e.g. the Windows system user account (which is the default when using SpamAssassin in a Box for example). In this case you have to use spamc for training, as using sa-learn would just train the Bayes database for the local user.
On my win7 system, even if no one logs into the console, everything still runs. According to the above, that might not be the case if I had used sa-learn. Also a daemon is usually more efficient.

cheers

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-05-13 11:03

Hmmm.... ok, understood.

I am now left wondering.....

....so if the SA-LEARN method (as stated by Soren) uses a DB in a specified path (the "--dbpath" parameter), and the quoted path points to the DEFAULT user, and the Server 2008 OS doesn't have such a path (as I identified earlier), then what happens on these Server OSes? As the SPAMD service (usually) runs as the SYSTEM user, I would imagine that on systems like Win7 that do have the DEFAULT USER directory, both refer to the same database. That in turn makes me think (if it is true) that your method is not required on those OSes, yet it is required on Server OSes.

Does this make sense?

In other words, my question is:

Code: Select all

                        XP/Win7         Server 2008
                        ^^^^^^^         ^^^^^^^^^^^

Has 'DEFAULT user'        Yes              No
    DB path?

SPAMD run as              Yes              Yes
System Process

Will refer SA-LEARN
to Default User DB        Yes              No - so where is the Bayes DB stored?

Will refer SPAMC -L
to current SPAMD          Yes              Yes - where is THIS Bayes DB stored?
Bayes DB

DART
New user
Posts: 29
Joined: 2015-03-04 10:07

Re: SpamAssassin Bootcamp (sa-learn)

Post by DART » 2015-05-13 14:31

jimimaseye wrote:Hmmm.... ok, understood.

I am now left wondering.....

In other words, my question is:

Code: Select all

                        XP/Win7         Server 2008
                        ^^^^^^^         ^^^^^^^^^^^

Has 'DEFAULT user'        Yes              No
    DB path?

SPAMD run as              Yes              Yes
System Process

Will refer SA-LEARN
to Default User DB        Yes              No - so where is the Bayes DB stored?

Will refer SPAMC -L
to current SPAMD          Yes              Yes - where is THIS Bayes DB stored?
Bayes DB
Searching the Win7 filesystem, I found the bayes database in "C:\Windows\SysWOW64\config\systemprofile\.spamassassin"

Not sure where server 2008 OS puts it.

I should also explain that I put SpamAssassin in c:\spamassassin for my own reasons; nobody has to do this if they don't want to. Sometimes DOS / command-line programs don't like long Windows paths. I have been bitten by this before, and it is habit for me now.

I also hand off all spam scoring to SpamAssassin; I don't use the HMS spam tools at all.

cheers

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-05-13 16:53

Good find, thanks for that. I have just checked, and yes, the tokens database resides in C:\Windows\SysWOW64\config\systemprofile\.spamassassin

Seeing its size and modified time led me to find that, in my ignorance and naivety, this Bayes database was ALREADY being updated every time a spam check is performed. So I checked, and I see that Bayes scoring and database updating are ENABLED BY DEFAULT: the Bayesian classifier "use_bayes" and Bayesian classifier auto-learning "bayes_auto_learn" are not overridden in local.cf, and therefore both have their default of 1 (enabled).

So I checked again and, at the command line, using the default Administrator account, I was able to run sa-learn and update that same database:

Code: Select all

sa-learn.exe --dbpath "C:\Windows\SysWOW64\config\systemprofile\.spamassassin\bayes" --ham "C:\path etc\*.eml"
So to conclude my original thoughts and further ask:

1, @DART: is it really necessary to use the modified version that runs SPAMC -L, when the original suggestion of using SA-LEARN can do the job without loading the SPAMD process (thereby avoiding delays on the service for real-time incoming scans)?
2, @Soren: unless "bayes_auto_learn" (above) has been set to zero/turned off, is it necessary to run this 'training' daily/regularly with the scheduler (which seems to be the point of the whole script in the first place)? It seems that yes, you run the training initially to build the database, but then, as I found out above, the tokens database gets updated continually AS mail comes in (so running the training again on the same messages will result in unnecessary duplicate checking).
3, @Soren: is it worth noting this in your original tutorial, or modifying the 'sa-learn' command there, so that the training results are applied to the default Bayes database that the Spamd service actually uses?
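A quick way to see which Bayes switches are in effect is sketched below in Python. It is a deliberately simplified reading of local.cf: it only looks at "use_bayes" and "bayes_auto_learn", treats both as defaulting to 1 as described above, and ignores include files, per-user preferences and everything else SpamAssassin itself would consider. The function name is hypothetical.

```python
# Defaults as described in the post above: both settings are on unless overridden.
DEFAULTS = {"use_bayes": "1", "bayes_auto_learn": "1"}


def effective_bayes_settings(local_cf_text):
    """Return the two Bayes switches after applying any overrides found in the text."""
    settings = dict(DEFAULTS)
    for raw in local_cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in settings:
            settings[parts[0]] = parts[1].strip()
    return settings


# A local.cf where both lines are still commented out, so the defaults apply.
untouched = "# use_bayes 1\n# bayes_auto_learn 1\n"
print(effective_bayes_settings(untouched))

# A local.cf where auto-learning has been switched off explicitly.
print(effective_bayes_settings("use_bayes 1\nbayes_auto_learn 0\n"))
```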

SorenR
Senior user
Posts: 3837
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2015-05-13 22:40

jimimaseye wrote:2, @Soren: unless the above setting "bayes_auto_learn", (above) has been set to Zero/turned off, is it necessary to run this 'training' daily/regularly with scheduler (which seems to be the point of the whole script in the first place)? It seems that yes, you run the training initially to build the database but then, as I found out above, the tokens database then continually gets updated AT THE TIME it comes in (so then running the training again on the same messages will result in unnecessary duplication checking).
3, @soren: is it worth noting or modifying the 'sa-learn' command in your original tutorial to reflect the above findings in order that it does apply the original training results to the default bayes database that is then used by the Spamd service?
2... The same message is not really learned twice. It may be learned differently the second time, should there be a HAM/SPAM message to counteract what was learned earlier. SpamAssassin generally knows which messages it has already looked at... ;-)

The initial learning is done with sufficient messages to form a trustworthy base. As time passes, SpamAssassin WILL clean out old entries from the Bayesian database, so it is normal for the database to "grow smaller" over time.

The purpose of learning new messages every day is to optimize the statistical data and to adapt to "seasons greetings".

3... hMailServer does not support user-specific Bayesian databases, so the database has to be stored under the default user profile, or the scripts/commands have to be edited to match a common location for the database, by adding options to the command line and to the local.cf file based on the location dictated by the spamd daemon.

The only benefit I see using SPAMC is that you can have your SpamAssassin on a different box (Linux or Windows).

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-05-13 22:57

Understood about point 2, but can I ask further on point 3....
SorenR wrote:hMailServer do not support user specific Bayesian databases
I don't understand what any of this has to do with hMailServer. Isn't this all about the configuration of SpamAssassin only, with no relevance to HMS?
SorenR wrote:.....thus the database has to be stored under the default user profile
Yeah, it was this that I suggested earlier would need modifying (or even omitting) in the case of Win7 and Server OSes, i.e. 'DEFAULT USER' doesn't exist on a Server OS, but in all cases the default SPAMD installation maintains the Bayes DB under C:\Windows\SysWOW64\config\systemprofile\.spamassassin. That is why I was asking about changing your script to reflect this, as it currently seems to suggest saving the token DB in a directory that doesn't exist and that the DEFAULT Spamd service doesn't refer to.

Can you clarify for me please?

DART
New user
Posts: 29
Joined: 2015-03-04 10:07

Re: SpamAssassin Bootcamp (sa-learn)

Post by DART » 2015-05-13 23:37

jimimaseye wrote:
1, @DART: is it really necessary to use the modified version of running SPAMC -L, when the original suggestion of using SA-LEARN can do the job without loading the SPAMD process (thereby avoiding the delays on the service for realtime incoming scans) and also
To be honest, I am not sure I can answer that. It would require a discussion between the HMS dev, the SpamAssassin devs and Soren, as well as a lot of testing, and even then they may not all agree! :)

What I can say for sure is that going back to application and network fundamentals ...

- Using SpamAssassin in general is going to introduce delays in real-time scanning, whether you use SPAMD or sa-learn. HMS has to hand off to SA, wait for it to finish and then respond. I know nothing about how HMS (or SA) works internally, but watching the logs, HMS seems to take note of this: it continues processing other mail while SA is "thinking" and resumes handling that particular piece of mail once SA reports back.

- In general, it is more desirable to use a daemon or service process than a userspace one (spamd over sa-learn). Daemons are often developed differently and are more efficient in terms of memory and CPU usage, though they may not always be faster.

- The same type of (delay) issue would occur for any AV scanning as well if using an external program.

- These are "acceptable" delays and generally don't cause issues as the developers understand that this has to happen and "program" around the application behaviour.

- My system gets about 3k mails per day across half a dozen domains; about 1k of those are spam and about 50 are viruses. On the same box I run 4 or 5 low-volume websites, all on a 2.8 GHz 64-bit VM with 4 GB RAM, and it all runs like a purring kitten.

The only time the CPU goes over about 10% is when running Soren's most excellent BAYES learning script, but the CPU issue is file copy I/O and nothing else, and it only lasts a few minutes anyway.

cheers

SorenR
Senior user
Posts: 3837
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2015-05-14 00:05

jimimaseye wrote:
SorenR wrote:hMailServer do not support user specific Bayesian databases
I dont under what any of this has to do with Hmailserver. Isnt this all about configuration of Spamassassin only and has no relevance to HMS
SorenR wrote:.....thus the database has to be stored under the default user profile
Yeah, it was this that I was suggesting earlier would need modifying (or even omitting) in the case of Win7 and Server OS's ie, 'DEFAULT USER' doesnt exist on Server OS but in all cases the default SPAMD installation is maintaining the bayes DB under C:\Windows\SysWOW64\config\systemprofile\.spamassassin. It is for this I was asking about relevance of changing your script to reflect this (as it currently seems to suggest saving the token DB in a directory thats doesnt exist and that the DEFAULT Spamd service doesnt refer to.

Can you clarify for me please?
By default SpamAssassin will store data under the profile of the Windows user running the process/command/daemon...

local.cf:
# bayes_path /path/filename (default: ~/.spamassassin/bayes)
should provide a means to override this. Do NOT forget to assign proper RWX permissions.
bayes_path /path/filename (default: ~/.spamassassin/bayes)
This is the directory and filename for Bayes databases. Several databases will be created, with this as the base directory and filename, with _toks, _seen, etc. appended to the base. The default setting results in files called ~/.spamassassin/bayes_seen, ~/.spamassassin/bayes_toks, etc.

By default, each user has their own in their ~/.spamassassin directory with mode 0700/0600. For system-wide SpamAssassin use, you may want to reduce disk space usage by sharing this across all users. However, Bayes appears to be more effective with individual user databases.
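PS.. Linux "~" is roughly equal to Windows "%USERPROFILE%" (usually the same as "%HOMEDRIVE%%HOMEPATH%"; "%HOMEPATH%" on its own lacks the drive letter).

I'm assuming that once "bayes_path" is added to "local.cf", it is no longer necessary to use the "--dbpath" parameter... Tests will show.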
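As a small illustration of the "base directory and filename" idea quoted above, this Python sketch expands a bayes_path value into the two file names the documentation mentions by name (_toks and _seen). The home-directory expansion here is Python's own os.path.expanduser, used as a stand-in; it is an assumption that SpamAssassin resolves "~" to the same place on a given Windows box. The function name is hypothetical.

```python
import os


def bayes_files(bayes_path="~/.spamassassin/bayes"):
    """Expand the base path and append the suffixes named in the quoted documentation."""
    base = os.path.expanduser(bayes_path)
    return [base + suffix for suffix in ("_toks", "_seen")]


# Default setting: the files land in the profile of whoever runs the process.
print(bayes_files())

# A shared location set explicitly (hypothetical path), independent of the user.
print(bayes_files("/srv/spamassassin/bayes"))
```

The first call gives a different answer for every account that runs it, which is exactly why a SYSTEM-run spamd and an Administrator-run sa-learn can end up with two separate databases.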

SorenR
Senior user
Posts: 3837
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2015-05-14 00:23

jimimaseye wrote:
SorenR wrote:hMailServer do not support user specific Bayesian databases
I dont under what any of this has to do with Hmailserver. Isnt this all about configuration of Spamassassin only and has no relevance to HMS.
Well... Yes... And No...

hMail is "prepared" for per-user scanning (it needs some more code, but the info is in the code as remarks) in that it can send a username along with the email to the SpamAssassin daemon, for scanning with that specific user's preferences.... It really only makes sense with file-based Bayesian databases under Postfix/Dovecot (Linux). The proper solution is to have SpamAssassin store the Bayesian data in a SQL database, and this is where SpamAssassin becomes A LOT more complicated :mrgreen:

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-05-14 00:24

SorenR wrote: By default SpamAssassin will store data under the profile of the Windows user running the process/command/daemon...

local.cf:
# bayes_path /path/filename (default: ~/.spamassassin/bayes)
should provide a means to override this. Do NOT forget to assign proper RWX permissions.
That explains it, and it fits. C:\Windows\SysWOW64\config\systemprofile\ is the default profile for the SYSTEM account that Spamd runs under, and as the path has not been changed in local.cf, the database is kept there. So it seems that, quite simply, you do not need to pass the --dbpath parameter to SA-LEARN (as it will pick up whatever the SpamD service has already been configured (or not!) to use).

To be clear, I ask these questions just to try to simplify things by removing possibly unnecessary parameters and/or opportunities for failure due to OS platform inconsistencies.

SorenR
Senior user
Posts: 3837
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2015-05-14 00:31

jimimaseye wrote:
SorenR wrote: By default SpamAssassin will store data under the profile of the Windows user running the process/command/daemon...

local.cf:
# bayes_path /path/filename (default: ~/.spamassassin/bayes)
should provide a means to override this. Do NOT forget to assign proper RWX permissions.
That explains it and fits. C:\Windows\SysWOW64\config\systemprofile\ is the default profile for the SYSTEM account that Spamd is running again and as it has not been changed in LOCAL.Cf, it applies the database there. So it seems that quite simply you do not need to enter the --DBPATH parameter in SA-LEARN (as it will pick up from whatever the SpamD service has already been configured (or not!) to use).

To be sure I ask these questions just trying to simply things by removing possibly unnecessary paramters and/or opportunities for failures due to OS platform inconsistencies.
SA-LEARN and SPAMD need to run as the same Windows user if you want to eliminate "--dbpath". Tested on XP, so I assume it's the same on other versions. :wink:

My daemon (SPAMD) runs under the system user and my SA-LEARN runs as Administrator ...

saschadd
New user
Posts: 13
Joined: 2014-07-16 14:58

Re: SpamAssassin Bootcamp (sa-learn)

Post by saschadd » 2015-10-31 13:15

Hi,

Was anyone able to change the max message size for Bayes training?
I am using the free JAM Software edition of SpamAssassin on SBS 2011 Essentials, and I couldn't find a way to change the max message size SpamAssassin will read.

Happy Halloween, Sascha

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2015-10-31 13:36

sa-learn [options]....

--max-size <b> Skip messages larger than b bytes;
defaults to 256 KB, 0 implies no limit
http://spamassassin.apache.org/full/3.4 ... learn.html
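To check beforehand which messages would fall foul of that limit, something like the following Python sketch could be used. It only mirrors the rule as quoted (skip anything larger than the limit, 0 means no limit) and assumes that "256 KB" means 256 * 1024 bytes. The function name and the throwaway test files are made up for the example.

```python
import os
import tempfile

DEFAULT_MAX_SIZE = 256 * 1024  # assumption: the quoted "256 KB" is 256 * 1024 bytes


def split_by_size(paths, max_size=DEFAULT_MAX_SIZE):
    """Return (kept, skipped); files larger than max_size are skipped, 0 disables the limit."""
    kept, skipped = [], []
    for path in paths:
        too_big = max_size > 0 and os.path.getsize(path) > max_size
        (skipped if too_big else kept).append(path)
    return kept, skipped


# Demo with two throwaway files: one of 1 KB and one of 300 KB.
with tempfile.TemporaryDirectory() as folder:
    small = os.path.join(folder, "small.eml")
    large = os.path.join(folder, "large.eml")
    with open(small, "wb") as handle:
        handle.write(b"x" * 1024)
    with open(large, "wb") as handle:
        handle.write(b"x" * (300 * 1024))
    kept, skipped = split_by_size([small, large])
    print(len(kept), "kept,", len(skipped), "skipped")
```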

johnyu2012
Normal user
Posts: 108
Joined: 2012-09-11 06:33

Re: SpamAssassin Bootcamp (sa-learn)

Post by johnyu2012 » 2016-04-29 06:43

For a better output format in sa-learn.log, you can try this.

Code: Select all

echo %DATE% %TIME% - START >> C:\hMailServer\Logs\sa-learn.log
<nul set /p = SPAM: >> C:\HmailServer\Logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --spam "C:\hMailServer\SPAM\*.eml" >> C:\hMailServer\Logs\sa-learn.log
echo. >> C:\HmailServer\Logs\sa-learn.log
<nul set /p = HAM: >> C:\HmailServer\Logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --ham "C:\hMailServer\HAM\*.eml" >> C:\hMailServer\Logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --sync
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --backup > "C:\Documents and Settings\Default User\.spamassassin\bayes_backup"
echo. >> C:\HmailServer\Logs\sa-learn.log
del C:\hMailServer\SPAM\*.eml /Q
del C:\hMailServer\HAM\*.eml /Q
echo %DATE% %TIME% - STOP >> C:\hMailServer\Logs\sa-learn.log
Last edited by jimimaseye on 2016-04-29 18:23, edited 1 time in total.
Reason: added CODE formatting

smedberg
New user
Posts: 1
Joined: 2016-05-06 22:53

Re: SpamAssassin Bootcamp (sa-learn)

Post by smedberg » 2016-05-06 23:12

Here is alternative code for use with Public folders, based on the script in this thread.
I'm only a copy/paste coder, so it might not be fancy, but it works.

You can have two public folders where users can drop mails for learning.
I also changed the way the files are copied, since I got errors from the curly brackets in the file names when running sa-learn.exe. The files are now saved under the hMail message ID instead.

Code: Select all

Option Explicit

Const HMSADMINUSER = "administrator"
Const HMSADMINPWD  = "password"  ' change me

' Public Spam / Ham folders
Const SPAMFolder = "Spam"
Const HAMFolder  = "Ham"
Const SPAMdir    = "C:\Spamassassin\SPAM"
Const HAMdir     = "C:\Spamassassin\HAM"
Const SPAMCopy   = "C:\Spamassassin\Learn\SPAMCopy.cmd"
Const HAMCopy    = "C:\Spamassassin\Learn\HAMCopy.cmd"
Const SALearn    = "C:\Spamassassin\sa-learn.cmd"

Dim oApp, oFSO, oShell

Set oFSO   = CreateObject("Scripting.FileSystemObject")
Set oApp   = CreateObject("hMailServer.Application")
Set oShell = WScript.CreateObject("WScript.Shell")
Call oApp.Authenticate(HMSADMINUSER, HMSADMINPWD)

' Write one COPY line per message in public folder sFolder to the batch file sCmdFile.
' Each message is copied to sDir as <message ID>.eml, so the target name has no curly brackets.
Sub MakeList(sFolder, sDir, sCmdFile)
   Dim oFile, oIMAPFolder, oMessage, j

   Set oFile = oFSO.CreateTextFile(sCmdFile, True)
   Set oIMAPFolder = oApp.Settings.PublicFolders.ItemByName(sFolder)

   ' The loop body is skipped when the folder holds no messages
   For j = 0 To oIMAPFolder.Messages.Count - 1
      Set oMessage = oIMAPFolder.Messages.Item(j)
      oFile.Write "COPY " & Chr(34) & oMessage.FileName & Chr(34) & " " & sDir & "\" & oMessage.ID & ".eml /Y" & vbCrLf
   Next

   ' Close the batch file so it is flushed to disk before cmd.exe runs it
   oFile.Close
End Sub

' Find SPAM messages
Call MakeList(SPAMFolder, SPAMdir, SPAMCopy)

' Find HAM messages
Call MakeList(HAMFolder, HAMdir, HAMCopy)

' Run the generated copy scripts, then the learning script, waiting for each to finish
oShell.Run "cmd.exe /C " & SPAMCopy, 0, True
oShell.Run "cmd.exe /C " & HAMCopy, 0, True
oShell.Run "cmd.exe /C " & SALearn, 0, True

    

johnyu2012
Normal user
Posts: 108
Joined: 2012-09-11 06:33

Re: SpamAssassin Bootcamp (sa-learn)

Post by johnyu2012 » 2016-05-18 06:45

I recently got some eml files copied to the spam folder that cannot be deleted through the script. I tried to delete them manually, and it says I don't have sufficient permission, even though I am the administrator of the server. It was all good before. Any guesses?

johnyu2012
Normal user
Posts: 108
Joined: 2012-09-11 06:33

Re: SpamAssassin Bootcamp (sa-learn)

Post by johnyu2012 » 2016-05-18 08:04

I booted into safe mode and could delete them there, but I still have to purge those spams from the system so they won't be copied over again.

mattg
Moderator
Posts: 21115
Joined: 2007-06-14 05:12
Location: 'The Outback' Australia

Re: SpamAssassin Bootcamp (sa-learn)

Post by mattg » 2016-05-18 08:44

What version of Windows?
Where is the data directory?
Just 'cause I link to a page and say little else doesn't mean I am not being nice.
https://www.hmailserver.com/documentation

johnyu2012
Normal user
Posts: 108
Joined: 2012-09-11 06:33

Re: SpamAssassin Bootcamp (sa-learn)

Post by johnyu2012 » 2016-05-18 09:12

Windows Server 2008 R2 Standard
F:\hMailServer\Data

Not all of the spam messages are a problem to delete. Only a few.

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2016-05-18 10:27

johnyu2012 wrote: Some eml files were recently copied to the spam folder and they cannot be deleted by the script. I tried to delete them manually and it says I don't have enough permission, even though I am the administrator of the server. It was all good before. Any guesses?
Recently I had some seriously strange, confusing behaviour on one of our machines too regarding permissions. Although it is not related to hMailServer (it's with Sage Accounting software), it has some very strange symptoms similar to yours.

In short:
From WITHIN the software you can generate a file and save it somewhere on your disk. Yet it wouldn't allow saving to any of the usual MY DOCUMENTS/home folders belonging to the user running the software (even if you right-click and CREATE a new folder from the browse window first). Even when the permissions on those folders were changed (by admin) to 'full control' for 'EVERYONE', it would still return 'permission denied'. And even when the software was run with full Administrator credentials (i.e. logged in as Administrator), saving to the Administrator's home folders, it still said the same. BUT... from OUTSIDE the software (just at Explorer/OS level), the folders pose no problems (just as you would expect). It is completely baffling that even the highest-ranking user cannot save to its OWN home folder from within the software (and the software runs as a normal 'user' account according to the logon). And yet you can save files to the user's DESKTOP, which is in the same branch of the folder tree as the DOCUMENTS folder (c:\users\user\...).

Naturally, as you would expect, Sage Support didn't know anything and blamed Windows permissions... despite actually looking at it with me (shadowing my session) and ending up just as confused, scratching their heads like me.

Our PCs are static, with no configuration or software changes; the only thing that changes is Windows Updates. And this problem never happened prior to November.

So... is there something weird going on with Windows and recent 'Windows Updates' causing these weird permission problems?

It could be related, but I don't have an answer (and remain perplexed by our problem). Maybe it gives you something to look at locally to solve your problem.
5.7 on test.
SpamassassinForWindows 3.4.0 spamd service
AV: Clamwin + Clamd service + sanesecurity defs : https://www.hmailserver.com/forum/viewtopic.php?f=21&t=26829

johnyu2012
Normal user
Posts: 108
Joined: 2012-09-11 06:33

Re: SpamAssassin Bootcamp (sa-learn)

Post by johnyu2012 » 2016-05-18 11:03

Yeah, weird. Even file-shredding software cannot delete those files; it has to be done in safe mode.

mattg
Moderator
Posts: 21115
Joined: 2007-06-14 05:12
Location: 'The Outback' Australia

Re: SpamAssassin Bootcamp (sa-learn)

Post by mattg » 2016-05-18 11:06

Almost sounds like an infected system

jimimaseye
Moderator
Posts: 8780
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn)

Post by jimimaseye » 2016-05-18 11:07

johnyu2012 wrote: Yeah, weird. Even file-shredding software cannot delete those files; it has to be done in safe mode.
Are you sure it's not 'hooked'/locked by some AV software or similar (i.e. locked rather than permissions disallowed)?

EHCanadian
New user
Posts: 11
Joined: 2011-05-28 07:35
Location: Canada

Re: SpamAssassin Bootcamp (sa-learn)

Post by EHCanadian » 2016-05-28 18:50

Quick question, under Server 2008:

SpamAssassin v3.4.1.36 binaries from jam-software

The { and } in the file names cause an error: archive-iterator: unable to open

Example error:
archive-iterator: unable to open C:\SrvData\HMailServer\Training\Spam\{FBE83817-9CC9-4FEF-9B63-1AA3627CE3EC}.eml: No such file or directory
Learned tokens from 0 message(s) (0 message(s) examined)

So I renamed two message files:
Learned tokens from 2 message(s) (2 message(s) examined)

So the question is: are you able to alter your script so that it removes the { } from the file names when exporting?
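
A minimal sketch of how the braces could be stripped, assuming the exported file name is built from a GUID-style name like the one in the error above. StripBraces is a hypothetical helper, not part of the original script; it only uses the built-in VBScript Replace function. Whether sa-learn then accepts the files has not been verified here beyond the manual rename reported above.

```vbscript
Option Explicit

' Hypothetical helper (not in the original script): returns sName with
' every "{" and "}" removed, leaving all other characters unchanged.
Function StripBraces(sName)
	StripBraces = Replace(Replace(sName, "{", ""), "}", "")
End Function

' Usage: the GUID-style file name from the error message above.
' Run with cscript.exe so the result is printed to the console.
WScript.Echo StripBraces("{FBE83817-9CC9-4FEF-9B63-1AA3627CE3EC}.eml")
```

In MakeList, the destination part of the COPY line could be passed through such a helper before it is written to the batch file. The source path must keep its braces, since that is the name of the file hMailServer actually stored.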
