SpamAssassin Bootcamp (sa-learn) train BAYES

This section contains scripts contributed by hMailServer users. hMailServer 5 is required to use them.
SorenR
Senior user
Posts: 3564
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2016-05-28 19:50

EHCanadian wrote:Quick Question.. Under Server 2008..

SpamAssassin v3.4.1.36 binaries from jam-software

The { and } in the file names cause an error: archive-iterator: unable to open

Example Error
archive-iterator: unable to open C:\SrvData\HMailServer\Training\Spam\{FBE83817-9CC9-4FEF-9B63-1AA3627CE3EC}.eml: No such file or directory
Learned tokens from 0 message(s) (0 message(s) examined)

So I renamed two message files:
Learned tokens from 2 message(s) (2 message(s) examined)

So the question is: are you able to alter your script to remove the { } from the file names when exporting?
Change

Code:

oFile.Write "COPY " & Chr(34) & oMessage.FileName & Chr(34) & " " & sDir & " /Y" & vbCrLf
to

Code:

oFile.Write "COPY " & Chr(34) & oMessage.FileName & Chr(34) & " " & sDir & "\" & CLng(oMessage.ID) & ".eml /Y" & vbCrLf
It looks like the culprit is a Windows version of bug #7296
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7296
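A minimal sketch of what the changed line produces. `BuildCopyLine` is a hypothetical helper (not part of the original script), and the source path and message ID are made-up example values. It only builds the text, so it runs under cscript without hMailServer:

```vbscript
Option Explicit

' Hypothetical helper (not in the original script): builds one COPY line
' whose target file is named after the numeric message ID, so the {GUID}
' braces never reach sa-learn.
Function BuildCopyLine(sSource, sDir, lID)
   BuildCopyLine = "COPY " & Chr(34) & sSource & Chr(34) & " " & _
                   sDir & "\" & CLng(lID) & ".eml /Y"
End Function

' Example values are assumptions for illustration only.
WScript.Echo BuildCopyLine("C:\Data\{FBE83817-9CC9-4FEF-9B63-1AA3627CE3EC}.eml", _
                           "C:\SpamAssassin\temp\SPAM", 4711)
```

Note that the target directory is not quoted, exactly as in the line above, so this form assumes a target path without spaces.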
SørenR.

“With age comes wisdom, but sometimes age comes alone.”
- Oscar Wilde

EHCanadian
New user
Posts: 11
Joined: 2011-05-28 07:35
Location: Canada

Re: SpamAssassin Bootcamp (sa-learn)

Post by EHCanadian » 2016-05-28 19:59

Thx from canada! :)

Agostino
New user
Posts: 17
Joined: 2014-01-19 19:29

Re: SpamAssassin Bootcamp (sa-learn)

Post by Agostino » 2016-11-17 01:52

Hi,
As I understand it, the script at the head of this thread searches the spam folders of the accounts in one domain and copies the mail into a single shared "spam" folder for all accounts. Is that right?
Is it also possible to search more than one domain?
Thanks in advance

katip
Senior user
Posts: 747
Joined: 2006-12-22 07:58
Location: Istanbul

Re: SpamAssassin Bootcamp (sa-learn)

Post by katip » 2017-04-27 09:14

sorry for resurrecting this almost half a year old thread. it's very enlightening, but this doesn't work for me:

Code:

C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --spam "C:\hMailServer\SPAM\*.eml"
log shows "Learned tokens from 0 message(s) (0 message(s) examined)" no matter how many files (eml) there are.

however this (just the path to folder) works fine:

Code:

C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --spam "C:\hMailServer\SPAM"
log shows "Learned tokens from 3 message(s) (4 message(s) examined)" as expected.

btw, i'd suggest to add --use-ignores (if you maintain a bayes_ignore_from/to list like i do) too.

just for reference in case another SA novice gets confused like me.
Katip
--
HMS 5.7.0 x64, MariaDB 10.4.10 x64, SA 3.4.2, ClamAV 0.101.2 + SaneS

SorenR
Senior user
Posts: 3564
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn)

Post by SorenR » 2017-04-27 10:26

Just had a look at my install, seems like I missed an update...

I found that sa-learn already knows the site config path, so "--siteconfigpath" could be dropped. That in turn means "--dbpath" can be set in the config instead: hMailServer does not support SpamAssassin per-user configuration, so one common setting is just fine.

local.cf:

Code:

bayes_path C:\SpamAssassin\bayes_db\bayes
It seems my server/OS/SpamAssassin don't really care if I specify *.EML so I removed it and it works :wink:

sa-learn.cmd:

Code:

echo %DATE% %TIME% - START >> C:\SpamAssassin\logs\sa-learn.log

echo HAM: >> C:\SpamAssassin\logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --ham C:\SpamAssassin\temp\HAM >> C:\SpamAssassin\logs\sa-learn.log
DEL C:\SpamAssassin\temp\HAM\*.eml /Q

echo SPAM: >> C:\SpamAssassin\logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --spam C:\SpamAssassin\temp\SPAM >> C:\SpamAssassin\logs\sa-learn.log
DEL C:\SpamAssassin\temp\SPAM\*.eml /Q

C:\SpamAssassin\sa-learn.exe --sync >> C:\SpamAssassin\logs\sa-learn.log

C:\SpamAssassin\sa-learn.exe --backup > "C:\SpamAssassin\bayes_db\bayes_backup"

echo %DATE% %TIME% - STOP >> C:\SpamAssassin\logs\sa-learn.log

scottn
New user
Posts: 1
Joined: 2017-05-16 01:53

Re: SpamAssassin Bootcamp (sa-learn)

Post by scottn » 2017-05-16 02:30

SorenR wrote:It seems my server/OS/SpamAssassin don't really care if I specify *.EML so I removed it and it works :wink:
If you run the SPAM command in a CMD window, it will output more information. It looks like a bug in the ArchiveIterator.pm script: it doesn't include the "\" needed to make a valid path to the files. Removing *.EML worked, though.

Code:

C:\Program Files>"C:\Program Files\SpamAssassin\sa-learn.exe" --siteconfigpath="C:\Program Files\SpamAssassin\etc\spamassassin" --dbpath "C:\Users\Default\AppData\Roaming\spamassassin\bayes" --spam "C:\SvrData\MailServer\SPAM\*.eml"
archive-iterator: no access to C:\SvrData\MailServer\SPAM{B1640347-67F2-45A7-BC11-F1C7A94DDC75}.eml: No such file or directory at Mail/SpamAssassin/ArchiveIterator.pm line 833.
archive-iterator: no access to C:\SvrData\MailServer\SPAM{BE5B671E-2F7A-441A-9759-042461EBA03D}.eml: No such file or directory at Mail/SpamAssassin/ArchiveIterator.pm line 833.
archive-iterator: no access to C:\SvrData\MailServer\SPAM{CC3238F1-4226-44D5-A7CB-DD37C9C58FBE}.eml: No such file or directory at Mail/SpamAssassin/ArchiveIterator.pm line 833.
archive-iterator: unable to open C:\SvrData\MailServer\SPAM{B1640347-67F2-45A7-BC11-F1C7A94DDC75}.eml: No such file or directory
archive-iterator: unable to open C:\SvrData\MailServer\SPAM{BE5B671E-2F7A-441A-9759-042461EBA03D}.eml: No such file or directory
archive-iterator: unable to open C:\SvrData\MailServer\SPAM{CC3238F1-4226-44D5-A7CB-DD37C9C58FBE}.eml: No such file or directory
Learned tokens from 0 message(s) (0 message(s) examined)

jimimaseye
Moderator
Posts: 8527
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by jimimaseye » 2018-03-20 23:06

After all this time, I've only just become aware that, for those of us using the JAM Windows version of SpamAssassin, they provide their own script to do the Bayes training.

It is found in the SpamAssassin for Windows program directory and is called TRAINBAYES.BAT. It does pretty much the same thing as Soren's script (leading this thread). It prompts for a choice of HAM or SPAM and for the directory of your emails. Then it does the training through the spamd process, which should already be running; if spamd is not available, it falls back to sa-learn.

The BAT file (trainbayes.bat) is as follows (full acknowledgement of JAM's work accepted):

Code:

@echo off

echo.
echo SpamAssassin Bayes training script v1.0
echo Copyright (c) 1999-2014 JAM Software GmbH
echo.

echo Please specify the full path where the messages that shall be trained reside (e.g. C:\spam\).
echo Path MUST END with a backslash. Quotes are added automatically.
echo.
set /P FOLDER=
echo.

if not exist "%FOLDER%" (
	echo Directory "%FOLDER%" does not exist, stopping here.
	goto end
)

set /P TRAINTYPE=Train as spam or ham? (S/h)
if {%TRAINTYPE%}=={h} (
	set TRAINTYPE=ham
) else (
	set TRAINTYPE=spam
)
echo.

echo Checking if SpamAssassin daemon (spamd) is available on local host...
spamc -K > nul
echo.
if %errorlevel%==0 (
	echo Spamd is available. Using spamc for training.
	set METHOD=spamc
) else (
	echo Spamd is not available. Using sa-learn for training.
	set METHOD=sa-learn
)

if %METHOD% == spamc (
	goto spamc
) else (
	goto sa-learn
)

:spamc
for %%X in ("%FOLDER%*") do spamc -L %TRAINTYPE% < "%%X"
if %errorlevel%==74 (
	echo Learning is not allowed by spamd, please start spamd with --allow-tell switch.
	goto end
)
goto end

:sa-learn:
for %%X in ("%FOLDER%*") do sa-learn --%TRAINTYPE% < "%%X"
goto end

:end
echo.
echo Bayes training with learntype "%TRAINTYPE%" finished.
echo Press any key to exit script.
pause>nul
5.7 on test.
SpamassassinForWindows 3.4.0 spamd service
AV: Clamwin + Clamd service + sanesecurity defs : https://www.hmailserver.com/forum/viewtopic.php?f=21&t=26829

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn)

Post by palinka » 2018-07-14 22:19

SorenR wrote:
2014-08-11 13:28

This morning's harvest.. :mrgreen:

Code:

11-08-2014  4:00:48,53 - START 
SPAM: 
Learned tokens from 20 message(s) (114 message(s) examined)
HAM: 
Learned tokens from 11 message(s) (120 message(s) examined)
11-08-2014  4:01:18,19 - STOP 
I installed and ran this script (0.4.1 version) a few times. My log is empty.

Code:

Sat 07/14/2018 14:38:00.89 - START 
SPAM: 
HAM: 
Sat 07/14/2018 14:38:00.92 - STOP 
That's the last of 4 runtime entries and they all look like that.

The first successful run did not create a log at all.

Before the last run, I moved a bunch of false positives from spam to inbox. I thought these would be picked up.

Spamcopy.cmd and hamcopy.cmd both have entries in them so I think the script is working. Any idea why the log is empty?

SorenR
Senior user
Posts: 3564
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-14 23:34

What does your sa-learn.cmd look like ??

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-15 03:08

SorenR wrote:
2018-07-14 23:34
What does your sa-learn.cmd look like ??

Code:

echo %DATE% %TIME% - START >> X:\xampp\htdocs\website.tld\status\sa-log\sa-learn.log
echo SPAM: >> X:\xampp\htdocs\website.tld\status\sa-log\sa-learn.log
C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe --siteconfigpath="C:\Program Files\JAM Software\SpamAssassin for Windows\etc\spamassassin" --dbpath "C:\Users\user\.spamassassin\bayes" --spam "X:\sa-train\SPAM\*.eml" >> X:\xampp\htdocs\website.tld\status\sa-log\sa-learn.log
echo HAM: >> X:\xampp\htdocs\website.tld\status\sa-log\sa-learn.log
C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe --siteconfigpath="C:\Program Files\JAM Software\SpamAssassin for Windows\etc\spamassassin" --dbpath "C:\Users\user\.spamassassin\bayes" --ham "X:\sa-train\HAM\*.eml" >> X:\xampp\htdocs\website.tld\status\sa-log\sa-learn.log
C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe --siteconfigpath="C:\Program Files\JAM Software\SpamAssassin for Windows\etc\spamassassin" --dbpath "C:\Users\user\.spamassassin\bayes" --sync >> X:\xampp\htdocs\website.tld\status\sa-log\sa-learn.log
C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe --siteconfigpath="C:\Program Files\JAM Software\SpamAssassin for Windows\etc\spamassassin" --dbpath "C:\Users\user\.spamassassin\bayes" --backup > "C:\Users\user\.spamassassin\bayes_backup"
del X:\sa-train\SPAM\*.eml /Q
del X:\sa-train\HAM\*.eml /Q
echo %DATE% %TIME% - STOP >> X:\xampp\htdocs\website.tld\status\sa-log\sa-learn.log
I put the log in a web accessible folder so I could look at it easier. You think that might have anything to do with it? Also moved the SPAM & HAM folders outside of the spamassassin installation folder.

SorenR
Senior user
Posts: 3564
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-15 11:04

https://support.microsoft.com/en-us/hel ... tion-marks

I would use quotes around this ...

"C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe"
SørenR.

“With age comes wisdom, but sometimes age comes alone.”
- Oscar Wilde
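
The quoting advice above can also be applied when a command line is built from a script. This is only a sketch: `QuotePath` is a hypothetical helper, and the two paths are the ones posted earlier in this thread.

```vbscript
Option Explicit

' Hypothetical helper: wraps a path in double quotes unless it is already
' quoted, so paths containing spaces survive as a single argument.
Function QuotePath(sPath)
   If Left(sPath, 1) = Chr(34) Then
      QuotePath = sPath
   Else
      QuotePath = Chr(34) & sPath & Chr(34)
   End If
End Function

Dim sExe, sCmd
sExe = "C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe"
sCmd = QuotePath(sExe) & " --spam " & QuotePath("X:\sa-train\SPAM")

' Prints the command line that could be written into a .cmd file.
WScript.Echo sCmd
```

Without the quotes, cmd.exe stops reading the program name at the first space ("C:\Program"), which would explain a log that contains only the START/STOP lines.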

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-15 13:22

SorenR wrote:
2018-07-15 11:04
https://support.microsoft.com/en-us/hel ... tion-marks

I would use quotes around this ...

"C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe"
Thanks! I'm making progress.

Code:

Sun 07/15/2018  6:53:57.11 - START 
SPAM: 
Learned tokens from 0 message(s) (0 message(s) examined)
HAM: 
Learned tokens from 0 message(s) (0 message(s) examined)
Sun 07/15/2018  6:54:12.25 - STOP 
Sun 07/15/2018  7:05:37.79 - START 
SPAM: 
Learned tokens from 0 message(s) (0 message(s) examined)
HAM: 
Learned tokens from 0 message(s) (0 message(s) examined)
Sun 07/15/2018  7:05:52.88 - STOP 
It ran from windows scheduler early this morning, and as mentioned, it ran a few times yesterday. The log snippet is from manually running it just now after adding the quotes.

After the first run (after fixing the quotes), I saw that no messages were examined, so I deleted hamcopy.cmd and spamcopy.cmd and ran it again. Still 0 messages examined. Both spamcopy.cmd and hamcopy.cmd were full of entries both before deleting and after they were rebuilt by running the script. Is this 0 messages examined normal? (I did run it minutes apart)

One last thing. I'm not good at scripting. Is there a way to reverse the log entries so the newest is at the top instead of the bottom? This is not so important, obviously, but since I set it up to view on the web, it would be easier to look at. :-) Thanks again.
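
One possible approach to the log-order question, sketched in VBScript. It assumes every run begins with a line containing " - START", as in the log above; `ReverseLogBlocks` is a hypothetical helper, not part of the sa-learn scripts.

```vbscript
Option Explicit

' Hypothetical helper: returns the log text with its START..STOP blocks in
' reverse order (newest run first). Empty lines are dropped.
Function ReverseLogBlocks(sText)
   Dim aLines, i, sBlock, sResult
   aLines = Split(Replace(sText, vbCrLf, vbLf), vbLf)
   sBlock = "" : sResult = ""
   For i = 0 To UBound(aLines)
      ' A line containing " - START" opens a new block; put the finished
      ' block in front of everything collected so far.
      If InStr(aLines(i), " - START") > 0 And sBlock <> "" Then
         sResult = sBlock & sResult
         sBlock = ""
      End If
      If aLines(i) <> "" Then sBlock = sBlock & aLines(i) & vbCrLf
   Next
   ReverseLogBlocks = sBlock & sResult
End Function

' Small in-memory demo with two runs; the timestamps are made up.
Dim sDemo
sDemo = "Day1 - START" & vbCrLf & "SPAM:" & vbCrLf & "Day1 - STOP" & vbCrLf & _
        "Day2 - START" & vbCrLf & "SPAM:" & vbCrLf & "Day2 - STOP" & vbCrLf
WScript.Echo ReverseLogBlocks(sDemo)
```

For a real log, the text could be read with Scripting.FileSystemObject (OpenTextFile(...).ReadAll) and the result written to a second file for the web view, leaving the original log untouched so sa-learn.cmd can keep appending to it.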

SorenR
Senior user
Posts: 3564
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-15 22:28

Version 0.4.2 that never made it to this thread as an archive ;-)

Remember to update your local.cf, as the reference to the Bayes DB has been removed from the sa-learn.exe command line (it is not needed there if present in local.cf).
Attachments
sa-learn.0.4.2.rar
(1.96 KiB) Downloaded 121 times

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-16 00:12

Awesome. Thank you. I'll try it and report back if any issues.

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-16 01:35

Wow, this is great. I thought it wasn't working at first since there's no spinny "thinking" cursor or any other kind of notice. The log wasn't populating. Turns out it was just working the whole time. Took a few minutes (3 minutes 22 seconds to be exact). Yes sir, it's working great. I can't wait to see how SA/HMS deal with spam now. Thanks so much for this great script.

Code:

Sun 07/15/2018 19:23:53.66 - START 
SPAM: 
Learned tokens from 131 message(s) (132 message(s) examined)
HAM: 
Learned tokens from 1494 message(s) (1494 message(s) examined)
expired old bayes database entries in 7 seconds
149527 entries kept, 98139 deleted
token frequency: 1-occurrence tokens: 70.21%
token frequency: less than 8 occurrences: 20.76%
Sun 07/15/2018 19:27:15.91 - STOP 

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-24 16:38

I just wanted to follow up on this. Everything working fine, but I've been doing a lot of reading and thinking and I came up with an idea to help the process.

First of all, I'm trying to train for real spam, not bulk commercial messages. Secondly, I wanted to see the messages that get deleted by HMS. I had my HMS spam delete threshold set to 8. I changed that to 8000. Why not 800? Because GTUBE scores ~1000. I guess I could have set it to anything 1100 or more, but the 8 was already there. :lol:

However, I still want to delete the messages scored 8+, so I created the following 2 rules. I also created a spam repository by creating a domain and an account just for spam (spam@spamdomain.com).

Spam to Spam Folder - all marked spam below the delete threshold should be moved to the spam folder and also forwarded to the spam repository.
Criteria: Use AND
X-hMailServer-Spam = YES
X-hMailServer-LoopCount < 1
X-hMailServer-Reason-Score < 8
Actions:
Forward email to: spam@spamdomain.com
Move to IMAP folder: Spam

Spam Delete - all marked spam above the delete threshold should be deleted and also forwarded to the spam repository.
Criteria: Use AND
X-hMailServer-Spam = YES
X-hMailServer-LoopCount < 1
X-hMailServer-Reason-Score > 7.99
Actions:
Forward email to: spam@spamdomain.com
Delete Email

Now I can see and sort all spam for all accounts while users will never receive spam scored > 8. My goal with the spam repository is to collect everything marked as spam and manually sort REAL spam (sent to spam folder of spam@spamdomain.com) and bulk commercial messages that got caught as false positives (left in the inbox of spam@spamdomain.com). Then the train bayes script can learn from these.
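
The combined effect of the two rules can be sketched as a small decision function. This is an illustration only, not how hMailServer evaluates rules: `RuleActions` is a hypothetical helper, and the second rule's "> 7.99" is simplified here to "8 or more".

```vbscript
Option Explicit

Const DeleteScore = 8   ' the delete threshold used by the two rules

' Hypothetical helper: returns a description of what the two rules would do
' for a message with the given spam flag, loop count and score.
Function RuleActions(bSpam, nLoopCount, dScore)
   If Not bSpam Or nLoopCount >= 1 Then
      RuleActions = "none"                                          ' neither rule matches
   ElseIf dScore < DeleteScore Then
      RuleActions = "forward to repository + move to Spam folder"   ' "Spam to Spam Folder"
   Else
      RuleActions = "forward to repository + delete"                ' "Spam Delete"
   End If
End Function

WScript.Echo RuleActions(True, 0, 5.2)
WScript.Echo RuleActions(True, 0, 12)
WScript.Echo RuleActions(False, 0, 0)
```

The loop-count condition means a message that has already been forwarded once (for example, the copy arriving at the repository account) matches neither rule.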

A noble goal. :roll: But I have a question. Obviously, with lazy and/or ignorant users not filtering their own mail, AND with most of the false positives I'm able to sort in the repository still sitting in the spam folder of most users, how much of an effect should I be able to get? I'm in the very early stages of this, so training is limited. I'll report back any observations I'm able to make.

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-24 18:25

I thought about it some more and decided it would be better to send all spam to users' inboxes, so that it will "force" them to decide if it's spam or ham. They may delete it, they may just leave it there, or they may actually move real spam to the spam folder.

To do this, I changed the "Spam to Spam Folder" rule above to only forward the message to the spam repository. That way, the message will land in the user's inbox and also in my repository. No matter what the user decides to do, I will have the opportunity to sort ham from spam. That should build up a better database to pass what are currently false positives.

The problem with the way I set it up in the above post is that if a false positive lands in the user's spam folder, I'm not able to move it back to the inbox for processing. If *everything* (except what's deleted above the threshold) goes to the user's inbox, the spam that I'm sorting in the repository won't conflict with false positives that most users won't bother to look at, much less move. AND, with a spam tag in the subject line, it's probably a little more likely the message will get deleted by the user, if not moved to the spam folder. At least, that's what I'm hoping for. :)

Edit - I just remembered this only works on one domain. I guess I'll have to move my spam repository to that domain.

Is it possible to add more domains or make it system wide folder scanning?

Edit 2 - possible domain workaround - have one script for each domain and run sequentially from task scheduler? All pointing to the same bayes database?

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-24 20:17

Possible domain workaround #2. Set the script target domain to be the spam repository. That excludes all user interaction and should produce the best results since only I, as administrator, will have the ability to sort ham from spam. I think that's what I'll do, but first I'll reduce the spam marking score to eliminate the possibility of actual spam not getting marked (false negatives).

I only see a dozen messages marked spam per day, so sorting messages is a pretty easy task.

SorenR
Senior user
Posts: 3564
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-24 22:55

My system:

SPAM score is 3 and SPAM delete threshold is 1000. Max message size is 1024 KB.

Global rules:
- SPAM score < 7 go to "Junk E-mail" folder AND spam@mydomain.tld INBOX
- SPAM score > 6 go to spam@mydomain.tld INBOX ONLY!

SpamAssassin runs every night with a 7-day window, registering INBOX as HAM and the "Junk E-mail" folder as SPAM.
- Note2Self: I really must modify the procedure soon to accept alternate locations of SPAM, if the user is using a non-English mail client.

Users have 7 days to re-scan/re-label mail by moving it to/from the Inbox or Spam folder.

Various rules/script functions will label SPAM by adding score (blacklist), clear score (whitelist) or add custom headers.

User rules for spam@mydomain.tld:
1: Rule to move SnowShoe SPAM to "UCE - SnowShoe" folder.
2: Rule to move SPAM score > 6 to "UCE - HighScore" folder.
3: Rule to move SpamAssassin detections to "UCE - SpamAssassin" folder.
4: Rule to move blacklisted SPAM to "UCE - Custom Rule" folder.
5: Rule to move remaining SPAM to "UCE - SPAM".

The folder "UCE - FPs" is used for registering False Positives. For this account only, SpamAssassin will include "UCE - FPs" as HAM and "UCE - HighScore" as SPAM.

That's it, basically :mrgreen:

Oh, by the way. One more rule for spam@mydomain.tld that I pulled back in from the cold. I want to see if I can do something sensible with it. Still need to figure out how to wait for a scripted page update.

Criteria: "X-hMailServer-Reason-Score" Greater than 6
Actions: Run function "Unsubscribe"

Code:

   Sub Unsubscribe(oMessage)
      Dim strRegEx, Match, Matches
      strRegEx = "<(http|https):[\s\S]*?>"
      With CreateObject("VBScript.RegExp")
         .Pattern = strRegEx
         .Global = True
         .MultiLine = True
         .IgnoreCase = True
         Set Matches = .Execute(oMessage.HeaderValue("List-Unsubscribe"))
      End With
      If Matches.Count > 0 Then
         Dim sMember, sURL
         For Each Match In Matches
            If (InStr(Match.Value, "<") > 0) Then
               sURL = Mid(Trim(Match.Value), 2, Len(Trim(Match.Value))-2)
            Else
               sURL = Trim(Match.Value)
            End If
            On Error Resume Next
            With CreateObject("MSXML2.ServerXMLHTTP.6.0")
               ' Option 2 = SXH_OPTION_IGNORE_SERVER_SSL_CERT_ERROR_FLAGS,
               ' 13056 = SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
               .setOption 2, 13056
               .open "GET", sURL, False
               .setRequestHeader "User-Agent", "online link validator (http://www.dead-links.com/)"
               .send ("")
            End With
            ' Check Err before "On Error Goto 0", which resets the Err object.
            If (Err.Number <> 0) Then
               Eventlog.Write( "ERROR: Sub Unsubscribe(oMessage)" )
               Eventlog.Write( "Error       : " & Err.Number )
               Eventlog.Write( "Error (hex) : 0x" & Hex(Err.Number) )
               Eventlog.Write( "Source      : " & Err.Source )
               Eventlog.Write( "Description : " & Err.Description )
               Err.Clear
               On Error Goto 0
               Exit Sub
            End If
            On Error Goto 0
         Next
      End If
   End Sub

palinka
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-25 00:17

Oh yeah. I'm trying that unsubscribe script.

Question for you: is your spam@domain.TLD on the same domain as your "regularly used" domain? Do you have more than 1 domain and if so, are you running multiple instances of the sa-learn script?

Just curious because I have two well used domains, and a couple of others for machine use, etc. I'd like to incorporate both of the well used domains into sa-learn. The only way I can figure it will work is to have 2 completely separate instances with the Bayes database as the only thing in common. 2 sets of scripts, 2 sets of spam and ham folders, 2 scheduled tasks, etc. I haven't tried it yet.
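
As a possible alternative to two complete instances, a single script could loop over a list of domain names. The sketch below only shows the list handling; `SplitList` is a hypothetical helper and the two domain names are placeholders.

```vbscript
Option Explicit

' Hypothetical helper: splits "a.tld, b.tld" into an array of trimmed names.
Function SplitList(sList)
   Dim aItems, i
   aItems = Split(sList, ",")
   For i = 0 To UBound(aItems)
      aItems(i) = Trim(aItems(i))
   Next
   SplitList = aItems
End Function

Dim sName
For Each sName In SplitList("first.tld, second.tld")
   ' Assumption: in a real script this is where the domain would be fetched
   ' with oApp.Domains.ItemByName(sName) and its accounts scanned.
   WScript.Echo "Would scan domain: " & sName
Next
```

Because all domains would feed the same SPAM and HAM temp folders, one sa-learn run and one Bayes database would cover them all.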

SorenR
Senior user
Posts: 3564
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-25 10:26

Multiple domains would require some rewriting, will look into that.
As it is now you could try this version and run two copies of the script with different domains. I think 1 SPAM account should be fine.

sa-learn.vbs

Code:

Option Explicit
   '
   ' Version 0.4.3 30/05-2018, Soren Rathje - Introduced two special folders; non-delivered SPAM and False Positives.
   ' Version 0.4.2 28/05-2016, Soren Rathje - Compatibility issues (curly brackets bug in sa-learn).
   ' Version 0.4.1 25/11-2014, Soren Rathje - Compatibility issues.
   ' Version 0.4.0 27/10-2014, Soren Rathje - Changed error logging.
   ' Version 0.3.0 11/10-2014, Soren Rathje - Bugfixing & Log error to Eventlog if IMAPFolder is missing
   ' Version 0.2.0 30/08-2014, Soren Rathje - Selection changed to DAYS.
   ' Version 0.1.0 09-08-2014, Soren Rathje - Initial version.
   '
   ' Configuration parameters
   '
   ' NB! SPAMFolder and HAMFolder MUST exist for every account to be processed.
   '     Administrative and automation accounts can be excluded from processing
   '     by defining them in "ListExclude"
   '
   Const Administrator   = "Administrator"
   Const Secret          = "########"
   Const Domain          = "mydomain.tld"
   Const SPAMFolder      = "Junk E-mail"
   Const SPAMSpecial     = "UCE - HighScore"
   Const HAMFolder       = "INBOX"
   Const HAMSpecial      = "UCE - FPs"
   Const ListExclude     = "postmaster@mydomain.tld, blog@mydomain.tld, spam@mydomain.tld"
   Const SPAMUser        = "spam@mydomain.tld"
   Const SPAMdir         = "C:\SpamAssassin\temp\SPAM"
   Const HAMdir          = "C:\SpamAssassin\temp\HAM"
   Const SPAMCopy        = "C:\SpamAssassin\temp\SPAMCopy.cmd"
   Const SPAMCopySpecial = "C:\SpamAssassin\temp\SPAMCopySpecial.cmd"
   Const HAMCopy         = "C:\SpamAssassin\temp\HAMCopy.cmd"
   Const HAMCopySpecial  = "C:\SpamAssassin\temp\HAMCopySpecial.cmd"
   Const SALearn         = "C:\SpamAssassin\sa-learn.cmd"
   Const RetainDays      = 7
   ' Const RetainDays      = 30

   Sub MakeFileList(oDomain,sFolder,sDir,sFile,sExclude)
      Dim i, j, oFile, oAccount, oIMAPFolder, oMessage, Flag : Flag = False
      Set oFile = oFSO.CreateTextFile(sFile,True)

      For i = 0 to oDomain.Accounts.Count -1
         Set oAccount = oDomain.Accounts.Item(i)

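         ' DO NOT process excluded and non-active accounts.
         ' Flag is recomputed for every account so an earlier match cannot carry over.
         Flag = (InStr(sExclude, oAccount.Address) = 0) And oAccount.Active
         ' An empty exclude list marks the "special" folders: only SPAMUser is processed.
         If (sExclude = "") And (oAccount.Address <> SPAMUser) Then Flag = False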

         If Flag Then
            On Error Resume Next
            Set oIMAPFolder = oAccount.IMAPFolders.ItemByName(sFolder)
            If Err.Number Then
               EventLog.Write ("ERROR: VBScript SA-Learn")
               EventLog.Write ("Exception: Set oIMAPFolder = oAccount.IMAPFolders.ItemByName(sFolder)")
               EventLog.Write ("Exception: sFolder = " & sFolder)
               EventLog.Write ("Exception: oAccount = " & oAccount.Address)
               EventLog.Write ("Error       : " & Err.Number)
               EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
               EventLog.Write ("Source      : " & Err.Source)
               EventLog.Write ("Description : " & Err.Description)
               Err.Clear
            ElseIf oIMAPFolder.Messages.Count > 0 Then             ' If no messages - skip
               For j = 0 to oIMAPFolder.Messages.Count -1
                  Set oMessage = oIMAPFolder.Messages.Item(j)
                  If oMessage.InternalDate > dRetainDate Then
                     oFile.Write "COPY " & Chr(34) & oMessage.FileName & Chr(34) & " " & sDir & "\" & CLng(oMessage.ID) & ".eml /Y" & vbCrLf
                  End If
               Next
            End If
            On Error GoTo 0


         End If
      Next

      oFile.Close
   End Sub

   '
   ' Define variables/objects
   '
   Dim oApp, oDomain, oShell, oFSO
   Dim dRetainDate, EventLog, RetCode

   '
   ' Initialize environment
   '
   Set oShell = WScript.CreateObject("WScript.Shell")
   Set oFSO = CreateObject("Scripting.FileSystemObject")
   Set oApp = CreateObject("hMailServer.Application")
   Call oApp.Authenticate(Administrator, Secret)
   Set EventLog = CreateObject("hMailServer.EventLog")
   Set oDomain = oApp.Domains.ItemByName(Domain)

   '
   ' Start from this date...
   '
   dRetainDate = CDate(Now - RetainDays)

   '
   ' Find SPAM messages
   '
   Call MakeFileList(oDomain,SPAMFolder,SPAMdir,SPAMCopy,ListExclude)
   Call MakeFileList(oDomain,SPAMSpecial,SPAMdir,SPAMCopySpecial,"")

   '
   ' Find HAM messages
   '
   Call MakeFileList(oDomain,HAMFolder,HAMdir,HAMCopy,ListExclude)
   Call MakeFileList(oDomain,HAMSpecial,HAMdir,HAMCopySpecial,"")

   '
   ' Execute file copy and sa-learn.exe - sequentially - no StdOut.
   '
   On Error Resume Next

   oShell.Run "cmd.exe /C " & SPAMCopy, 0, true
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oShell.Run SPAMCopy, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If

   oShell.Run "cmd.exe /C " & SPAMCopySpecial, 0, true
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oShell.Run SPAMCopySpecial, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If

   oShell.Run "cmd.exe /C " & HAMCopy, 0, true
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oShell.Run HAMCopy, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If

   oShell.Run "cmd.exe /C " & HAMCopySpecial, 0, true
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oShell.Run HAMCopySpecial, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If

   oShell.Run "cmd.exe /C " & SALearn, 0, true
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oShell.Run SALearn, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If

   On Error GoTo 0
sa-learn.cmd

Code:

echo %DATE% %TIME% - START >> C:\SpamAssassin\logs\sa-learn.log

echo HAM: >> C:\SpamAssassin\logs\sa-learn.log
rem C:\SpamAssassin\sa-learn.exe --ham "C:\SpamAssassin\temp\HAM\*.eml" >> C:\SpamAssassin\logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --ham C:\SpamAssassin\temp\HAM >> C:\SpamAssassin\logs\sa-learn.log
del C:\SpamAssassin\temp\HAM\*.eml /Q

echo SPAM: >> C:\SpamAssassin\logs\sa-learn.log
rem C:\SpamAssassin\sa-learn.exe --spam "C:\SpamAssassin\temp\SPAM\*.eml" >> C:\SpamAssassin\logs\sa-learn.log
C:\SpamAssassin\sa-learn.exe --spam C:\SpamAssassin\temp\SPAM >> C:\SpamAssassin\logs\sa-learn.log
del C:\SpamAssassin\temp\SPAM\*.eml /Q

C:\SpamAssassin\sa-learn.exe --sync >> C:\SpamAssassin\logs\sa-learn.log

C:\SpamAssassin\sa-learn.exe --backup > "C:\SpamAssassin\bayes_db\bayes_backup"

echo %DATE% %TIME% - STOP >> C:\SpamAssassin\logs\sa-learn.log

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-25 16:15

Remember to make a copy of your C:\SpamAssassin\bayes_db\bayes_backup file in case you need to restore your Bayes database.

To restore, run "sa-learn.exe --restore bayes_backup". Please remember that this process is destructive - it WILL overwrite your current Bayes database.

sa-learn.vbs Version 0.5.0

Code: Select all

Option Explicit
   '
   ' Version 0.5.0 25/07-2018, Soren Rathje - Multiple domains, skipping non-existing folders plus reworked code.
   ' Version 0.4.3 30/05-2018, Soren Rathje - Introduced two special folders; non-delivered SPAM and False Positives.
   ' Version 0.4.2 28/05-2016, Soren Rathje - Compatibility issues (curly brackets bug in sa-learn).
   ' Version 0.4.1 25/11-2014, Soren Rathje - Compatibility issues.
   ' Version 0.4.0 27/10-2014, Soren Rathje - Changed error logging.
   ' Version 0.3.0 11/10-2014, Soren Rathje - Bugfixing & Log error to Eventlog if IMAPFolder is missing
   ' Version 0.2.0 30/08-2014, Soren Rathje - Selection changed to DAYS.
   ' Version 0.1.0 09-08-2014, Soren Rathje - Initial version.
   '
   ' Configuration parameters
   '
   '     Administrative and automation accounts can be excluded from processing
   '     by defining them in "ExcludeList"
   '
   Const Administrator   = "Administrator"
   Const Secret          = "********"

   Const ExcludeList     = "postmaster@mydomain.tld, blog@mydomain.tld"
   Const DomainList      = "mydomain.tld, acme.inc"
   Const SPAMFolders     = "SPAM, Junk E-mail, Uønsket e-mail, UCE - HighScore"
   Const HAMFolders      = "INBOX, UCE - FPs"

   Const SATemp          = "C:\SpamAssassin\temp\"
   Const SALearn         = "C:\SpamAssassin\sa-learn.cmd"
   Const RetainDays      = 7


   Sub BuildList(a, mExcludes, b, mDays, mTemp, mType)
      Dim i, j, k, l, oFile, oDomain, oAccount, oMessage, oMessages, mDomain, mDomains, mFolder, mFolders
      Set oFile = oFSO.CreateTextFile(mTemp & mType & ".CMD",True)
      mDomains = Split(a, ",")
      mFolders = Split(b, ",")
      WScript.Echo "Type: " & mType
      For Each mDomain in mDomains
         mDomain = Trim(mDomain)
         Set oDomain = oApp.Domains.ItemByName(mDomain)
         WScript.Echo "     Domain: " & oDomain.Name
         For i = 0 to oDomain.Accounts.Count - 1
            Set oAccount = oDomain.Accounts.Item(i)
            If InStr(mExcludes, oAccount.Address) = 0 And oAccount.Active Then
               WScript.Echo "          Account: " & oAccount.Address
               For Each mFolder In mFolders
                  mFolder = Trim(mFolder)
                  On Error Resume Next
                  If oAccount.IMAPFolders.ItemByName(mFolder) Is Nothing Then
                     On Error GoTo 0
                  Else
                     On Error GoTo 0
                     WScript.Echo "               Folder: " & mFolder
                     Set oMessages  = oAccount.IMAPFolders.ItemByName(mFolder).Messages
                     If Not IsNull(oMessages) Then
                        For j = 0 to oMessages.Count - 1
                           Set oMessage = oMessages.Item(j)
                           If oMessage.InternalDate > CDate(Now - mDays) Then
                              oFile.Write "COPY " & Chr(34) & oMessage.FileName & Chr(34) & " " & mTemp & mType & "\" & CLng(oMessage.ID) & ".eml /Y" & vbCrLf
                           End If
                        Next
                     End If
                  End If
               Next
            End If
         Next
      Next
      oFile.Close
      Set oFile = Nothing
   End Sub

   '
   ' Define variables/objects
   '
   Dim oShell, oFSO, oApp
   Dim EventLog

   '
   ' Initialize environment
   '
   Set oShell = WScript.CreateObject("WScript.Shell")
   Set oFSO = CreateObject("Scripting.FileSystemObject")
   Set oApp = CreateObject("hMailServer.Application")
   Call oApp.Authenticate(Administrator, Secret)
   Set EventLog = CreateObject("hMailServer.EventLog")

   '
   ' Find SPAM messages
   '
   Call BuildList(DomainList, ExcludeList, SPAMFolders, RetainDays, SATemp, "SPAM")

   '
   ' Find HAM messages
   '
   Call BuildList(DomainList, ExcludeList, HAMFolders, RetainDays, SATemp, "HAM")

   '
   ' Execute file copy and sa-learn.exe - sequentially - no StdOut.
   '
   On Error Resume Next

   WScript.Echo "Copying SPAM mails"
   oShell.Run "cmd.exe /C " & SATemp & "SPAM.CMD", 0, true
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oShell.Run SPAMCopy, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If

   WScript.Echo "Copying HAM mails"
   oShell.Run "cmd.exe /C " & SATemp & "HAM.CMD", 0, true
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oShell.Run HAMCopy, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If

   WScript.Echo "Starting the learning process ... "
   oShell.Run "cmd.exe /C " & SALearn, 0, true
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oShell.Run SALearn, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If

   On Error GoTo 0

palinka
Senior user
Senior user
Posts: 1916
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-25 16:48

SorenR wrote:
2018-07-25 16:15
Remember to make a copy of your C:\SpamAssassin\bayes_db\bayes_backup file in case you need to restore your Bayes database.

"sa-learn.exe --restore bayes_backup" and please remember that this process is destructive - it WILL overwrite your current Bayes database.

Interesting. I hope I didn't screw something up. My bayes_backup only has the following:

Code: Select all

v	3	db_version # this must be the first line!!!
v	0	num_spam
v	0	num_nonspam
I have run this script on 2 different domains. Perhaps that's why it's 0 / 0? Should it be any different? I haven't run your v0.5.0 script yet.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-25 17:06

Also, here:

Code: Select all

   Const SPAMFolders     = "SPAM, Junk E-mail, Uønsket e-mail, UCE - HighScore"
   Const HAMFolders      = "INBOX, UCE - FPs"
Are * wildcards OK? The name of the "junk" folder in particular seems to vary willy-nilly between Outlook, Roundcube, etc. Thanks.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-25 17:57

I ran v5 with ALL my domains. Hey, why not? :) Anyway, 2 issues:

1) I'm getting "Windows Script Host" pop-ups for every domain, account and folder the script comes across, and each one requires me to push the OK button before it proceeds. I don't think this will run unattended. Never mind: I put "cscript" ahead of the script name and it printed the output to the cmd window instead. I'm a total novice, so bear with me.

2) The log is showing 0 tokens learned. I don't know if that's a problem, but I think it should at least not be "0 messages examined".

Code: Select all

----------------------- 
Wed 07/25/2018 11:39:01.61 - START 
HAM: 
Learned tokens from 0 message(s) (0 message(s) examined)
SPAM: 
Learned tokens from 0 message(s) (0 message(s) examined)
Wed 07/25/2018 11:39:19.23 - STOP 
I copied the format of your last posting of sa-learn.cmd for this. Afterwards I moved "del C:\SpamAssassin\temp\HAM\*.eml /Q" (and the same for SPAM) to the bottom, like you had it in earlier versions, but I still got "0 messages examined".

3) FYI - bayes_backup repopulated with something like 200k lines.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-25 18:44

1) fixed by running with cscript ahead of vbs

2) I left off the trailing "\" in >>Const SATemp = "C:\SpamAssassin\temp\"<<

Also, SPAM and HAM folders MUST EXIST in SATemp folder. All good now. :-) Thanks again!
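Both mistakes can be guarded against in the script itself. A minimal sketch, assuming the 0.5.0 layout in which SPAM and HAM are folders below SATemp; the helper names EnsureTrailingSlash and EnsureFolder are made up for illustration and are not part of sa-learn.vbs:

```vbscript
Option Explicit
' Sketch only: EnsureTrailingSlash and EnsureFolder are hypothetical helpers.
Const SATemp = "C:\SpamAssassin\temp"      ' trailing "\" left off on purpose

Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")

' Append a backslash if the path does not already end with one
Function EnsureTrailingSlash(ByVal sPath)
   If Right(sPath, 1) <> "\" Then sPath = sPath & "\"
   EnsureTrailingSlash = sPath
End Function

' Create the folder if it does not exist yet.
' CreateFolder only creates the last path element, so the parent must already exist.
Sub EnsureFolder(ByVal sPath)
   If Not oFSO.FolderExists(sPath) Then oFSO.CreateFolder sPath
End Sub

Dim sTemp : sTemp = EnsureTrailingSlash(SATemp)
EnsureFolder sTemp & "SPAM"
EnsureFolder sTemp & "HAM"
WScript.Echo "Temp folder in use: " & sTemp
```

Run it with cscript so the message goes to the console. In the real script the normalized value would be passed to BuildList in place of the SATemp constant.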

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-25 19:03

It seems I left a few "WScript.Echo" in there. Just comment them out if they cause any problems.

Yes, directories need to be in place.

Wildcards will NOT work.
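If looser matching is wanted anyway, one option is to walk each account's folders and compare the names yourself instead of calling ItemByName. A sketch under assumptions: MatchesAny is a hypothetical helper, and the keyword list is only an example.

```vbscript
Option Explicit
' Sketch only: MatchesAny is a hypothetical helper, not part of sa-learn.vbs.
' Returns True when the folder name contains any keyword from a comma-separated list.
Function MatchesAny(ByVal sFolderName, ByVal sKeywords)
   Dim sKey
   MatchesAny = False
   For Each sKey In Split(sKeywords, ",")
      sKey = Trim(sKey)
      If Len(sKey) > 0 Then
         ' vbTextCompare makes the search case-insensitive
         If InStr(1, sFolderName, sKey, vbTextCompare) > 0 Then
            MatchesAny = True
            Exit Function
         End If
      End If
   Next
End Function

' Small demonstration with made-up folder names
Dim sName
For Each sName In Array("INBOX", "Junk E-mail", "Junk", "Spam", "Sent Items")
   WScript.Echo sName & " -> " & CStr(MatchesAny(sName, "junk, spam"))
Next
```

Inside BuildList this would replace the ItemByName lookup with a loop over oAccount.IMAPFolders, testing each folder's Name. That assumes the collection offers Count and Item(index) like the Accounts collection the script already uses. Keep the keywords narrow: "spam" would also match a folder such as "Not spam".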

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-26 11:43

This has really cut down the false positives. It seems to be a giant improvement so far.

One other thing I did was go back to the default required_score = 5 in SpamAssassin. Anything below that gave a lot of false positives. I guess it's the default for a reason. Those guys know what they're doing.

Also, like you, I set the HMS spam mark threshold at 3, which also gives HMS an opportunity to catch spam based on RBL, SPF, etc. Seems to be highly accurate now.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-26 14:53

I use these 4 categories:

RBL's = 5
SURBL's = 5
SPF = 5
SpamAssassin = Actual value

I have been "playing". This is bleeding edge code that may - or may not - declare war on Asgaard and/or Jotunheim - use at your own discretion. :mrgreen:

sa-learn.exe does not really like curly braces "{}" in file paths, and that was the primary reason why the code was split and mails were copied into the SPAM and HAM folders.

I had a brain fart this morning and after some experimenting I believe I've cracked the nut - anyways, it works on my server.

Code: Select all

C:\hMailServer\Data\mydomain.tld\spam\D9\{D9EC1A93-6382-4DA6-AA69-C6D97C8CD944}.eml

... is simply translated to ...

C:/hMailServer/Data/mydomain.tld/spam/D9/\{D9EC1A93-6382-4DA6-AA69-C6D97C8CD944\}.eml
This is all that is needed, no cmd files and no HAM/SPAM folder, NO copying mails back and forth.
Temp, Log and Bayes_DB folders must exist however.

sa-learn.vbs Version 0.5.a

Code: Select all

Option Explicit
   '
   ' Version 0.5.a 26/07-2018, Soren Rathje - Experimental rewrite of filelist to fix curly brace problem in sa-learn.
   ' Version 0.5.0 25/07-2018, Soren Rathje - Multiple domains, skipping non-existing folders plus reworked code.
   ' Version 0.4.3 30/05-2018, Soren Rathje - Introduced two special folders; non-delivered SPAM and False Positives.
   ' Version 0.4.2 28/05-2016, Soren Rathje - Compatibility issues (curly brackets bug in sa-learn).
   ' Version 0.4.1 25/11-2014, Soren Rathje - Compatibility issues.
   ' Version 0.4.0 27/10-2014, Soren Rathje - Changed error logging.
   ' Version 0.3.0 11/10-2014, Soren Rathje - Bugfixing & Log error to Eventlog if IMAPFolder is missing
   ' Version 0.2.0 30/08-2014, Soren Rathje - Selection changed to DAYS.
   ' Version 0.1.0 09-08-2014, Soren Rathje - Initial version.
   '
   ' Configuration parameters
   '
   '     Administrative and automation accounts can be excluded from processing
   '     by defining them in "ExcludeList"
   '
   Const Administrator   = "Administrator"
   Const Secret          = "########"

   Const ExcludeList     = "postmaster@mydomain.tld, blog@mydomain.tld"
   Const DomainList      = "mydomain.tld, acme.inc"
   Const SPAMFolders     = "SPAM, Junk E-mail, Uønsket e-mail, UCE - HighScore"
   Const HAMFolders      = "INBOX, UCE - FPs"

   Const TempDir         = "C:\SpamAssassin\temp\"       ' Need permission for create, read & write
   Const LogDir          = "C:\SpamAssassin\logs\"       ' Need permission for create, read & write
   Const BayesDir        = "C:\SpamAssassin\bayes_db\"   ' Need permission for create, read & write
   Const RetainDays      = 7
   Const Verbose         = 0

   Sub BuildList(a, mExcludes, b, mDays, mTemp, mType)
      Dim i, j, k, l, strFileName
      Dim oFile, oDomain, oAccount, oMessage, oMessages
      Dim mDomain, mDomains, mFolder, mFolders

      Set oFile = oFSO.CreateTextFile(mTemp & mType, True)
      mDomains = Split(a, ",")
      mFolders = Split(b, ",")
      If Verbose Then WScript.Echo "Type: " & mType
      For Each mDomain in mDomains
         mDomain = Trim(mDomain)
         Set oDomain = oApp.Domains.ItemByName(mDomain)
         If Verbose Then WScript.Echo "     Domain: " & oDomain.Name
         For i = 0 to oDomain.Accounts.Count - 1
            Set oAccount = oDomain.Accounts.Item(i)
            If InStr(mExcludes, oAccount.Address) = 0 And oAccount.Active Then
               If Verbose Then WScript.Echo "          Account: " & oAccount.Address
               For Each mFolder In mFolders
                  mFolder = Trim(mFolder)
                  On Error Resume Next
                  If oAccount.IMAPFolders.ItemByName(mFolder) Is Nothing Then
                     On Error GoTo 0
                  Else
                     On Error GoTo 0
                     If Verbose Then WScript.Echo "               Folder: " & mFolder
                     Set oMessages  = oAccount.IMAPFolders.ItemByName(mFolder).Messages
                     If Not IsNull(oMessages) Then
                        For j = 0 to oMessages.Count - 1
                           Set oMessage = oMessages.Item(j)
                           If oMessage.InternalDate > CDate(Now - mDays) Then
                              strFileName = Replace(oMessage.FileName,"\","/")
                              strFileName = Replace(strFileName,"{","\{")
                              strFileName = Replace(strFileName,"}","\}")
                              oFile.Write strFileName & vbCrLf
                           End If
                        Next
                     End If
                  End If
               Next
            End If
         Next
      Next
      oFile.Close
   End Sub

   Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")
   Dim oApp : Set oApp = CreateObject("hMailServer.Application")
   Call oApp.Authenticate(Administrator, Secret)

   '
   ' Find SPAM messages
   '
   Call BuildList(DomainList, ExcludeList, SPAMFolders, RetainDays, TempDir, "SPAM")

   '
   ' Find HAM messages
   '
   Call BuildList(DomainList, ExcludeList, HAMFolders, RetainDays, TempDir, "HAM")

   '
   ' Execute n'Stuff ...
   '
   CreateObject("WScript.Shell").Run "cmd /c Echo %DATE% %TIME% - START >> " & LogDir & "sa-learn.log", Verbose, True

   If Verbose Then WScript.Echo "Processing HAM mails"
   CreateObject("WScript.Shell").Run "cmd /c Echo HAM: >> " & LogDir & "sa-learn.log", Verbose, True
   CreateObject("WScript.Shell").Run "cmd /c sa-learn.exe --ham --folders=" & TempDir & "HAM >> " & LogDir & "sa-learn.log", Verbose, True

   If Verbose Then WScript.Echo "Processing SPAM mails"
   CreateObject("WScript.Shell").Run "cmd /c Echo SPAM: >> " & LogDir & "sa-learn.log", Verbose, True
   CreateObject("WScript.Shell").Run "cmd /c sa-learn.exe --spam --folders=" & TempDir & "SPAM >> " & LogDir & "sa-learn.log", Verbose, True

   If Verbose Then WScript.Echo "Synchronizing Bayes"
   CreateObject("WScript.Shell").Run "cmd /c sa-learn.exe --sync >> " & LogDir & "sa-learn.log", Verbose, True

   If Verbose Then WScript.Echo "Backing up Bayes"
   CreateObject("WScript.Shell").Run "cmd /c sa-learn.exe --backup > " & BayesDir & "bayes_backup", Verbose, True

   CreateObject("WScript.Shell").Run "cmd /c Echo %DATE% %TIME% - STOP >> " & LogDir & "sa-learn.log", Verbose, True

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-26 19:21

OK, this is slowly getting out of hand ... :mrgreen:

Support for "spamc.exe -L" and "sa-learn.exe". For spamc to work you must run spamd with "--allow-tell".

When Batch = 0 (spamc) mode is selected there will be no sync of the database and no backup. Spamd is doing the sync on-the-fly and backup is on you.

sa-learn.vbs Version 0.6.0

Code: Select all

Option Explicit
'
' Version 0.6.0 26/07-2018, Soren Rathje - Experimental support for both sa-learn AND spamc -L.
' Version 0.5.a 26/07-2018, Soren Rathje - Experimental rewrite of filelist to fix curly brace problem in sa-learn.
' Version 0.5.0 25/07-2018, Soren Rathje - Multiple domains, skipping non-existing folders plus reworked code.
' Version 0.4.3 30/05-2018, Soren Rathje - Introduced two special folders; non-delivered SPAM and False Positives.
' Version 0.4.2 28/05-2016, Soren Rathje - Compatibility issues (curly brackets bug in sa-learn).
' Version 0.4.1 25/11-2014, Soren Rathje - Compatibility issues.
' Version 0.4.0 27/10-2014, Soren Rathje - Changed error logging.
' Version 0.3.0 11/10-2014, Soren Rathje - Bugfixing & Log error to Eventlog if IMAPFolder is missing
' Version 0.2.0 30/08-2014, Soren Rathje - Selection changed to DAYS.
' Version 0.1.0 09-08-2014, Soren Rathje - Initial version.
'
' Configuration parameters
'
'     Administrative and automation accounts can be excluded from processing
'     by defining them in "ExcludeList"
'
Const Administrator   = "Administrator"
Const Secret          = "########"

Const ExcludeList     = "postmaster@mydomain.tld, blog@mydomain.tld"
Const DomainList      = "mydomain.tld, acme.inc"
Const SPAMFolders     = "SPAM, Junk E-mail, Uønsket e-mail, UCE - HighScore"
Const HAMFolders      = "INBOX, UCE - FPs"

Const Batch           = 1                             ' 0 = spamc, 1 = sa-learn
Const TempDir         = "C:\SpamAssassin\temp\"       ' Need permission for create, read & write
Const LogDir          = "C:\SpamAssassin\logs\"       ' Need permission for create, read & write
Const BayesDir        = "C:\SpamAssassin\bayes_db\"   ' Need permission for create, read & write
Const RetainDays      = 7
Const Verbose         = 1

Sub BuildList(a, mExcludes, b, mDays, mTemp, mType)
   Dim i, j, k, l, strFileName
   Dim oFile, oDomain, oAccount, oMessage, oMessages
   Dim mDomain, mDomains, mFolder, mFolders
   
   If Batch Then Set oFile = oFSO.CreateTextFile(mTemp & mType, True)
   mDomains = Split(a, ",")
   mFolders = Split(b, ",")
   If Verbose Then WScript.Echo "Type: " & mType
   For Each mDomain In mDomains
      mDomain = Trim(mDomain)
      Set oDomain = oApp.Domains.ItemByName(mDomain)
      If Verbose Then WScript.Echo "     Domain: " & oDomain.Name
      For i = 0 To oDomain.Accounts.Count - 1
         Set oAccount = oDomain.Accounts.Item(i)
         If InStr(mExcludes, oAccount.Address) = 0 And oAccount.Active Then
            If Verbose Then WScript.Echo "          Account: " & oAccount.Address
            For Each mFolder In mFolders
               mFolder = Trim(mFolder)
               On Error Resume Next
               If oAccount.IMAPFolders.ItemByName(mFolder) Is Nothing Then
                  On Error Goto 0
               Else
                  On Error Goto 0
                  If Verbose Then WScript.Echo "               Folder: " & mFolder
                  Set oMessages  = oAccount.IMAPFolders.ItemByName(mFolder).Messages
                  If Not IsNull(oMessages) Then
                     For j = 0 To oMessages.Count - 1
                        Set oMessage = oMessages.Item(j)
                        If oMessage.InternalDate > CDate(Now - mDays) Then
                           If Batch Then
                              strFileName = Replace(oMessage.FileName,"\","/")
                              strFileName = Replace(strFileName,"{","\{")
                              strFileName = Replace(strFileName,"}","\}")
                              oFile.Write strFileName & vbCrLf
                           Else
                              oCMD.Run "cmd.exe /C spamc.exe -d " & SAHost & " -p " & SAPort & " -L " & mType & " < " & Chr(34) & oMessage.FileName & Chr(34), Verbose, True
                           End If
                        End If
                     Next
                  End If
               End If
            Next
         End If
      Next
   Next
   If Batch Then oFile.Close
End Sub

Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")
Dim oApp : Set oApp = CreateObject("hMailServer.Application")
Dim oCMD : Set oCMD = CreateObject("WScript.Shell")
Call oApp.Authenticate(Administrator, Secret)
Dim SAHost : SAHost = oApp.Settings.AntiSpam.SpamAssassinHost
Dim SAPort : SAPort = oApp.Settings.AntiSpam.SpamAssassinPort

If Batch Then oCMD.Run "cmd.exe /C Echo %DATE% %TIME% - START >> " & LogDir & "sa-learn.log", Verbose, True

'
' Find HAM messages
'
If Verbose Then WScript.Echo "Processing HAM mails"
Call BuildList(DomainList, ExcludeList, HAMFolders, RetainDays, TempDir, "ham")

If Batch Then 
   oCMD.Run "cmd.exe /C echo HAM: >> " & LogDir & "sa-learn.log", Verbose, True
   oCMD.Run "cmd.exe /C sa-learn.exe --ham --folders=" & TempDir & "HAM >> " & LogDir & "sa-learn.log", Verbose, True
End If

'
' Find SPAM messages
'
If Verbose Then WScript.Echo "Processing SPAM mails"
Call BuildList(DomainList, ExcludeList, SPAMFolders, RetainDays, TempDir, "spam")

If Batch Then 
   oCMD.Run "cmd.exe /C echo SPAM: >> " & LogDir & "sa-learn.log", Verbose, True
   oCMD.Run "cmd.exe /C sa-learn.exe --spam --folders=" & TempDir & "SPAM >> " & LogDir & "sa-learn.log", Verbose, True
End If

'
' Execute n'Stuff ...
'
If Batch Then
   If Verbose Then WScript.Echo "Synchronizing Bayes"
   oCMD.Run "cmd.exe /C sa-learn.exe --sync >> " & LogDir & "sa-learn.log", Verbose, True

   If Verbose Then WScript.Echo "Backing up Bayes"
   oCMD.Run "cmd.exe /C sa-learn.exe --backup > " & BayesDir & "bayes_backup", Verbose, True

   oCMD.Run "cmd.exe /C echo %DATE% %TIME% - STOP >> " & LogDir & "sa-learn.log", Verbose, True
End If

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-27 00:26

SorenR wrote:
2018-07-26 19:21
OK, this is slowly getting out of hand ... :mrgreen:

Firstly, it's QUICKLY, not slowly. :lol: Secondly, it's not getting out of hand. You seem to have a firm grasp on things. :D

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-27 01:39

Yeah well... I'm thinking of reorganizing my servers into a 'nix box with a Windows VM. If I move SpamAssassin and ClamAV to Linux, then hMailServer is the only thing left on Windows, so for that I need spamc.

An AMD Ryzen based system is less € per core and less KWh than Intel. The AMD Ryzen 7 2700 might be the thing with 65W idle and 137W at 56 FPS - and it's less than 300 USD.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-27 02:59

Meh... When I get rid of my 80 amps worth of stove and clothes dryer I'll start worrying about server efficiency. :mrgreen:

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-27 11:59

I'm not seeing any Bayes related tests in the header reports so I did a little digging.

Code: Select all

spamassassin -D --lint
There is a minimum number of messages that must be learned before SA actually uses Bayes. The default is 200 each for spam and ham. I have 196 currently so I guess I'll just leave it, but you can change the minimums with bayes_min_spam_num and bayes_min_ham_num.

So it seems my drop in false positives was due solely to rules changes. That's good to know. After Bayes is in use a while, I'll tighten up the rules bit by bit to find a good balance.
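The learned message counts can also be read from the header of the bayes_backup file, which starts with num_spam and num_nonspam lines as shown earlier in the thread. A rough sketch, assuming the backup path from sa-learn.cmd, tab-separated fields, and header lines that begin with "v"; 200 is the default threshold, so adjust MinMessages if bayes_min_spam_num or bayes_min_ham_num were changed:

```vbscript
Option Explicit
' Sketch only: reads the header of bayes_backup and compares the counts to the minimum.
Const BackupFile  = "C:\SpamAssassin\bayes_db\bayes_backup"   ' path assumed from sa-learn.cmd
Const MinMessages = 200                                       ' default minimum per class
Const ForReading  = 1

Dim oFSO, oFile, sLine, aParts, nSpam, nHam
Set oFSO = CreateObject("Scripting.FileSystemObject")
nSpam = -1 : nHam = -1

If oFSO.FileExists(BackupFile) Then
   Set oFile = oFSO.OpenTextFile(BackupFile, ForReading)
   Do Until oFile.AtEndOfStream
      sLine = oFile.ReadLine
      ' Assumption: the "v" header lines come first, so stop at the first other line
      If Left(sLine, 1) <> "v" Then Exit Do
      aParts = Split(sLine, vbTab)
      If UBound(aParts) >= 2 Then
         If Left(aParts(2), 8) = "num_spam" Then nSpam = CLng(aParts(1))
         If Left(aParts(2), 11) = "num_nonspam" Then nHam = CLng(aParts(1))
      End If
   Loop
   oFile.Close
End If

If nSpam < 0 Or nHam < 0 Then
   WScript.Echo "Could not find num_spam / num_nonspam in " & BackupFile
Else
   WScript.Echo "Spam learned: " & nSpam & ", ham learned: " & nHam
   If nSpam >= MinMessages And nHam >= MinMessages Then
      WScript.Echo "Both counts reach " & MinMessages & ", so Bayes should be active."
   Else
      WScript.Echo "Below " & MinMessages & " for at least one class, so Bayes is not used yet."
   End If
End If
```

An empty or missing backup file ends up in the "Could not find" branch, which is also a quick way to spot a backup that was overwritten with nothing.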

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-27 12:30

palinka wrote:
2018-07-27 11:59
I'm not seeing any Bayes related tests in the header reports so I did a little digging.

Code: Select all

spamassassin -D --lint
There is a minimum number of tokens to be learned before SA actually implements Bayes. The default is 200 messages. I have 196 currently so I guess I'll just leave it, but you can change the minimum number with bayes_min_spam_num.

So it seems my drop in false positives was due solely to rules changes. That's good to know. After Bayes is in use a while, I'll tighten up the rules bit by bit to find a good balance.

200 is peanuts... Not sure where in the world you are, but you could try my database. Remember to save your own backup copy first! I have been getting SPAM from all around the world for the past 4 years.
sa-learn.exe --restore bayes_backup will clear your existing database and load the database from the backup file.
https://www.lolle.org/images/bayes_back ... 7_2018.rar
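Since --restore wipes the existing database, it may be worth scripting the safety copy. A minimal sketch, assuming the bayes_backup path used in sa-learn.cmd; the date-stamped file name is just one possible convention:

```vbscript
Option Explicit
' Sketch only: keep a dated copy of bayes_backup before running --restore.
Const BackupFile = "C:\SpamAssassin\bayes_db\bayes_backup"   ' path assumed from sa-learn.cmd

Dim oFSO, sStamp, sCopy
Set oFSO = CreateObject("Scripting.FileSystemObject")

' Build a sortable yyyymmdd stamp from today's date
sStamp = Year(Date) & Right("0" & Month(Date), 2) & Right("0" & Day(Date), 2)
sCopy = BackupFile & "_" & sStamp

If oFSO.FileExists(BackupFile) Then
   oFSO.CopyFile BackupFile, sCopy, True   ' True = overwrite a copy made earlier the same day
   WScript.Echo "Saved " & sCopy
Else
   WScript.Echo "No backup found at " & BackupFile
End If
```

Only run the restore once the "Saved" message has appeared and the copy has a plausible size.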

Also. I use KAM ...

Code: Select all

sa-update.exe -v --nogpg --channelfile UpdateChannels
wget -q http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf -O c:\spamassassin\etc\spamassassin\KAM.cf
net stop spamassassin
net start spamassassin
Last edited by SorenR on 2018-07-27 12:32, edited 1 time in total.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-27 12:32

I'm just starting. Baby steps. :mrgreen:

Thanks for the info and the db. I'll give it a whirl.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-28 00:44

I'm trying out 0.6 but I get "permission denied" error (line 39 char 18). I ran from command prompt as administrator.

Code: Select all

   If Batch Then Set oFile = oFSO.CreateTextFile(mTemp & mType, True)
Not sure if this has anything to do with it, but I know TempDir, LogDir and BayesDir "Need permission for create, read & write", and both SYSTEM and Administrators have full control over all those folders.

I then tried 0.5.a and got the same error:

Code: Select all

      Set oFile = oFSO.CreateTextFile(mTemp & mType, True)
It's monkey see, monkey do with me and scripting. :( But I can usually muddle through if I get pointed in the right direction.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-28 00:51

Do you have a folder called HAM or SPAM in the temp directory ??

My c:\spamassassin\temp had a HAM and a SPAM folder and the script was driving me mad with the same "no permission" error ... DOH! I forgot that in the final version the script writes its lists to files named HAM and SPAM with no extension, so folders with the same names get in the way. :mrgreen:
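The name clash can be detected before BuildList runs. A small sketch, assuming the TempDir value from the 0.5.a and 0.6.0 configuration, where HAM and SPAM must be plain list files:

```vbscript
Option Explicit
' Sketch only: warn when a leftover HAM or SPAM folder would block the list file.
Const TempDir = "C:\SpamAssassin\temp\"

Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")
Dim sName, bClash
bClash = False

For Each sName In Array("HAM", "SPAM")
   If oFSO.FolderExists(TempDir & sName) Then
      WScript.Echo "Folder in the way: " & TempDir & sName
      bClash = True
   End If
Next

If bClash Then
   WScript.Echo "Remove or rename the folder(s) above; CreateTextFile cannot overwrite a folder."
   WScript.Quit 1
End If
WScript.Echo "No clash, list files can be created."
```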

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-28 04:16

Deleted spam/ham folders in temp and the script ran without permission issues. I ran it verbose and it seems to work. It went through all the accounts, created ham and spam files in temp, but the log is empty:

Fri 07/27/2018 21:29:43.65 - START
HAM:
SPAM:
Fri 07/27/2018 21:30:43.50 - STOP

Also, it obliterated my bayes_backup. Since I'm not sure what is going on, I restored the backup that I backed up before running 0.6. After running 0.6 again, bayes_backup is still 0 KB. Before it was 5 MB.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-28 12:46

palinka wrote:
2018-07-28 04:16
Deleted spam/ham folders in temp and the script ran without permission issues. I ran it verbose and it seems to work. It went through all the accounts, created ham and spam files in temp, but the log is empty:

Fri 07/27/2018 21:29:43.65 - START
HAM:
SPAM:
Fri 07/27/2018 21:30:43.50 - STOP

Also, it obliterated my bayes_backup. Since I'm not sure what is going on, I restored the backup that I backed up before running 0.6. After running 0.6 again, bayes_backup is still 0 kb. Before it was 5mb.

Hmm... That would indicate it is not running sa-learn.exe, as any output is appended (">>") to sa-learn.log.

What OS are you on?

Any spaces in any paths?

You never tried 0.5.a ?

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-28 13:43

Ok... Two steps forward and one step back...

I suspect there continues to be an issue with curly brackets.

Version 0.6.1 is back to creating HAM.CMD and SPAM.CMD that will copy mails to c:\spamassassin\temp\ham and c:\spamassassin\temp\spam. SA-LEARN.CMD does the actual learning.

DO NOT forget to delete the HAM and SPAM files in .\temp or you WILL get the permission issue. You need to recreate the HAM and SPAM folders in .\temp.
Attachments
sa-learn.0.6.1.rar (2.22 KiB)

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-28 14:05

SorenR wrote:
2018-07-28 12:46
Hmm... That would indicate it is not running sa-learn.exe as any output is piped ">>" into sa-learn.log

What OS are you on?

Any spaces in any paths?

You never tried 0.5.a ?

* Windows 10 Pro

* No path in the user variables has spaces, BUT! I just realized one critical path does, but it does not appear in 0.5.a or 0.6

C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe

I tried to add the full path manually, but got a script error. I think it might have something to do with changing paths - in my case all the bayes, temp, etc are in X: while spamassassin is in C:. Or it could be the quotation marks I used around the path because it had spaces in it. Not sure.

Like I said before, it's monkey see, monkey do for me. :mrgreen: But I can read and follow directions and I know the path issue came up before in this thread. I'll have a re-read and see how the poop was thrown so I can throw poop the same way. :mrgreen:

If nothing else works, I'll try moving temp, etc to C:

And I saw your latest version, but I have to go on a bike riding date with my wife, so this stuff will have to wait until this afternoon. :)

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-28 14:49

Windows is notorious for its weird handling of spaces in paths and filenames. I have been advocating for years - actually since WFW 3.1 - to avoid spaces in both. :mrgreen:
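When a space cannot be avoided, as with the JAM Software install path mentioned above, every path has to be quoted, and cmd.exe /C needs one extra pair of quotes around the whole command. A sketch under assumptions: Q and BuildCmd are hypothetical helpers, and the log path is taken from the earlier scripts. It only prints the command so the quoting can be inspected:

```vbscript
Option Explicit
' Sketch only: Q and BuildCmd are hypothetical helpers, not part of sa-learn.vbs.
Const SAExe   = "C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe"
Const LogFile = "C:\SpamAssassin\logs\sa-learn.log"

' Wrap a string in double quotes
Function Q(ByVal s)
   Q = Chr(34) & s & Chr(34)
End Function

' Build a cmd.exe command line that appends sa-learn output to the log.
' cmd.exe /C strips the outermost pair of quotes, leaving the inner quoting intact.
Function BuildCmd(ByVal sArgs)
   BuildCmd = "cmd.exe /C " & Q(Q(SAExe) & " " & sArgs & " >> " & Q(LogFile))
End Function

WScript.Echo BuildCmd("--sync")

' To execute instead of print:
' CreateObject("WScript.Shell").Run BuildCmd("--sync"), 0, True
```

Without the outer pair, cmd.exe removes the first and last quote it finds on the line, which breaks the path to sa-learn.exe.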

Happy trailing ... I remember the last bike ride I had with my wife was up L'Alpe d'Huez about 8 years ago. I was driving my Audi S4 2.7T Avant videofilming and carrying the cold water :mrgreen:

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-28 16:25

Wife late getting ready for bike ride. **shocker** :mrgreen: So I had a few minutes to check out 0.6.1 and everything is working again. I guess it's about the same as 0.5 except for the spamc part?

One tiny issue: Const Verbose is set to 0 but I still get Windows Script Host popups. I added "If Verbose Then" in front of WScript.Echo "Copying SPAM mails" and the two other WScript.Echo lines that were missing it, and that fixed it.

All good again. :D


edit - forgot to mention that the bayes backup is working again. I guess that's due to sa-learn.exe working.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-28 18:01

Yeah, it seems like I did forget to add that :oops:

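Not all prompts can be suppressed; the Verbose setting only handles the shell, not individual programs that don't comply with Windows standards. My server is running "headless" so I never see them anyway.

You can also make CScript the default script host, so WScript.Echo writes to the console instead of opening a popup:

C:\>WScript //H:CScript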

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-28 18:51

I'm not sure if sa-learn --sync is working. I took a dump. :lol:

Code: Select all

C:\Program Files\JAM Software\SpamAssassin for Windows>sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        206          0  non-token data: nspam
0.000          0       4736          0  non-token data: nham
0.000          0     184009          0  non-token data: ntokens
0.000          0 1531406382          0  non-token data: oldest atime
0.000          0 1532786693          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1532787700          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       2470          0  non-token data: last expire reduction count
Notice that "last journal sync atime" is 0: the journal didn't sync when the script ran, so I synced manually.

Code: Select all

C:\Program Files\JAM Software\SpamAssassin for Windows>sa-learn --sync
bayes: synced databases from journal in 0 seconds: 1034 unique entries (1045 total entries)
That worked and per the SA documentation, my journal file was deleted after syncing.

Took another dump.

Code: Select all

C:\Program Files\JAM Software\SpamAssassin for Windows>sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        206          0  non-token data: nspam
0.000          0       4736          0  non-token data: nham
0.000          0     184009          0  non-token data: ntokens
0.000          0 1531406382          0  non-token data: oldest atime
0.000          0 1532795366          0  non-token data: newest atime
0.000          0 1532795545          0  non-token data: last journal sync atime
0.000          0 1532787700          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       2470          0  non-token data: last expire reduction count
This time the journal shows a sync date. I'll wait for some more spam to get processed and tomorrow I'll try again and see if the journal sync date changes when the script runs. But right now it appears that it's not syncing the journal. BUT IT IS creating the journal.

From sa-learn.CMD:

Code: Select all

"C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe" --sync >> X:\xampp\htdocs\domain1.tld\status\sa-log\sa-learn.log
I just realized that nothing appears in the log for --sync
Sat 07/28/2018 10:21:29.65 - START
HAM:
Learned tokens from 0 message(s) (407 message(s) examined)
SPAM:
Learned tokens from 0 message(s) (33 message(s) examined)
???SOMETHING MISSING HERE???
Sat 07/28/2018 10:22:16.97 - STOP
I'll try simply deleting the log redirect from that line and see what happens - like this:

Code: Select all

"C:\Program Files\JAM Software\SpamAssassin for Windows\sa-learn.exe" --sync
But I still think I should wait for some spam to come in or there may be no journal to begin with.
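
The atime values in the dumps above are Unix epoch seconds, and the first dump shows 0 for "last journal sync atime" before any sync had been recorded. A small sketch for reading them, assuming Python 3 is available; parse_dump_magic and atime_to_utc are hypothetical names, and SAMPLE is cut down from the second dump above:

```python
from datetime import datetime, timezone


def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    fields = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue  # skips the prompt line and anything that is not a magic row
        columns, label = line.split("non-token data:", 1)
        parts = columns.split()
        if len(parts) >= 3:
            fields[label.strip()] = int(parts[2])
    return fields


def atime_to_utc(seconds):
    """Return an ISO timestamp (UTC) for an epoch value, or None when it is 0."""
    if seconds == 0:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        206          0  non-token data: nspam
0.000          0 1532795545          0  non-token data: last journal sync atime
"""

if __name__ == "__main__":
    magic = parse_dump_magic(SAMPLE)
    print(atime_to_utc(magic["last journal sync atime"]))
```

For the value in the second dump this prints 2018-07-28T16:32:25+00:00.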

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-28 19:38

Please ignore my last post completely. I cannot replicate it after running with and without >> logfile.log after sa-learn --sync

The journal seems to sync fine.

But that still leaves me with one small question - should there be a log entry for sa-learn --sync?

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-07-28 22:36

I reused the system call and left the data pipe in there, just in case. I believe it is supposed to be quiet.

You could spice it up with sa-learn --sync -D ;-)

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-28 22:45

SorenR wrote:
2018-07-28 22:36
You could spice it up with sa-learn --sync -D ;-)
I like jalapeños but that's a little too spicy for muh log file. :-)

I've noticed that sometimes SpamAssassin doesn't pick up changes immediately, even after restarting the service. There's definitely some weirdness going on once in a while. I guess chalk the journal thing up to that.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-07-29 01:22

SorenR wrote:
2018-07-28 22:36
I reused the system call and left the data pipe in there, just in case. I believe it is supposed to be quiet.
I did get this from running it manually.

Code: Select all

C:\Program Files\JAM Software\SpamAssassin for Windows>sa-learn --sync
bayes: synced databases from journal in 0 seconds: 1034 unique entries (1045 total entries)
So it looks like something should come out of it.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-08-01 12:22

Code: Select all

Wed 08/01/2018  4:30:16.74 - START 
HAM: 
Learned tokens from 85 message(s) (426 message(s) examined)
SPAM: 
Learned tokens from 5 message(s) (38 message(s) examined)
bayes: synced databases from journal in 0 seconds: 43 unique entries (43 total entries)
Wed 08/01/2018  4:31:04.94 - STOP 
I found a Bayes sync line in today's log entry. It's the first one I've seen from an unattended run. I wonder if SA waits for a certain number of new tokens or number of days before syncing the journal.
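
For checking the log each morning, a rough sketch (Python 3 assumed; runs_with_sync is a hypothetical helper, not part of the script) that walks sa-learn.log and reports, for every START ... STOP block, whether a "bayes: synced databases from journal" line was written. LOG is pasted from the two runs shown in this thread:

```python
def runs_with_sync(log_text):
    """Return (start_line, synced) for every START ... STOP block in the log."""
    results = []
    start, synced = None, False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.endswith("- START"):
            start, synced = line, False
        elif line.startswith("bayes: synced databases from journal"):
            synced = True
        elif line.endswith("- STOP") and start is not None:
            results.append((start, synced))
            start = None
    return results


LOG = """\
Sat 07/28/2018 10:21:29.65 - START
HAM:
Learned tokens from 0 message(s) (407 message(s) examined)
SPAM:
Learned tokens from 0 message(s) (33 message(s) examined)
Sat 07/28/2018 10:22:16.97 - STOP
Wed 08/01/2018  4:30:16.74 - START
HAM:
Learned tokens from 85 message(s) (426 message(s) examined)
SPAM:
Learned tokens from 5 message(s) (38 message(s) examined)
bayes: synced databases from journal in 0 seconds: 43 unique entries (43 total entries)
Wed 08/01/2018  4:31:04.94 - STOP
"""

if __name__ == "__main__":
    for start, synced in runs_with_sync(LOG):
        print(start, "->", "sync logged" if synced else "no sync line")
```

This only tells you whether the line reached the log, not whether the sync itself happened.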

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-08-01 13:05

I was actually thinking of skipping --sync and using --force-expire instead, to see if that makes a difference. A journal sync is included in the process of expiring records in the Bayes database, so --force-expire covers it.

https://wiki.apache.org/spamassassin/BayesForceExpire

https://discussions.apple.com/thread/1077083

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-08-01 13:40

From the links you provided, it looks like (in relation to the script) it's 6 of one, half a dozen of the other.

When using the script, auto learn is off. Therefore tokens are collected only on running the script (daily for me). The script runs --sync, which, it appears to me, both syncs the journal and expires old tokens according to the default settings (again, for me, because I did not modify max db size, etc in local.cf).

I think force expire may only have an effect if auto learn is on? Am I understanding that correctly?

The only question is: does the journal sync every time the script is run? The link says it runs "opportunistically". I have watched, while running the script manually, the journal disappear from the File Explorer window. I do believe it's getting synced every time --sync is called. And because the log output for --sync is so inconsistent, I can't help but think it's a bug of some kind.

Or maybe that's all bs because I have a journal file in the db folder *right now*. :roll: That could mean it's not syncing new tokens until it decides on some random schedule and all my Bayes learning is not being put into action until that time????

I think I want to wait and see. I'll keep an eye on the journal size and try to figure out next time it syncs with the db. You try force expire and see if it syncs daily. :D

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-08-01 13:52

SpamAssassin does not do anything on its own, only when called. So ... If I call SpamAssassin to rate an email and it decides to sync/expire data, it could take (from what I read) up to 25 seconds before it proceeds to rate the email, and by that time hMailServer has moved on to the next email. I do occasionally see errors in my log that SpamAssassin did not respond in time ...

Anyways, I've modded my script (sa-learn.cmd) and local.cf, so we'll see after a while if there is any change.

BTW, I don't see my journal. My Bayes is stored in MySQL.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-08-01 14:51

I looked at my journal. It was created this morning, when the first message arrived after the script ran (the script's --sync had deleted the previous journal). So a new journal is created at the first message SpamAssassin processes.

From documentation:
bayes_journal
While SpamAssassin is scanning mails, it needs to track which tokens it uses in its calculations. To avoid the contention of having each SpamAssassin process attempting to gain write access to the Bayes DB, the token timestamps are written to a 'journal' file which will later (either automatically or via sa-learn --sync) be used to synchronize the Bayes DB.
So the journal records which previously learned tokens were used while scanning messages. This is a reinforcing mechanism. Therefore, I'm going back to my original assessment: --force-expire and --sync are effectively the same if run on a schedule (vs automatically).

If auto learn is off, no tokens should be created or expired, nor should the journal sync, until --sync (or --force-expire) is called.

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-08-01 15:20

SorenR wrote:
2018-08-01 13:52
SpamAssassin does not do anything on its own, only when called.
Exactly
So ... If I call SpamAssassin to rate an email and it decides to sync/expire data, it could take (from what I read) up to 25 seconds before it proceeds to rate the email and by that time hMailServer has moved on to the next email. I do occasionally see errors in my log that SpamAssassin did not respond in time ...
Aha! See above:
SpamAssassin does not do anything on its own, only when called.
Including syncing journal, learning and expiring tokens. ;-) (unless auto learn is on, in which case it appears spamassassin decides when to do these things)

I think your timeouts are unrelated. Or, I'm getting too big for my britches, which could be the case since I'm making a lot of assumptions. :mrgreen:

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-08-01 18:54

Assumption is the Mother of All F*** Ups. :mrgreen:

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2018-08-03 23:13

Was going through my local.cf ... Well ...

# bayes_learn_to_journal (default: 0)

Exactly, my system has been learning directly into the database with no journal all the time.

I think I'll skip the --sync and --force-expire altogether :mrgreen:

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-08-06 01:57

That is interesting. I'll have to look at my local.cf. I don't think that entry exists in mine, commented out or not.

I definitely have a journal and it disappears after --sync.
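
To see what a local.cf really sets, commented-out lines included, here is a small sketch. It assumes Python 3, treats everything after # as a comment (SpamAssassin also allows an escaped \#, which this ignores), and lets the last uncommented line win. effective_setting is a hypothetical helper, and the default of 0 is the one quoted from SorenR's local.cf above:

```python
def effective_setting(cf_text, name, default):
    """Return the last uncommented value of `name` in local.cf text, else `default`."""
    value = default
    for raw in cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        parts = line.split(None, 1)  # option name, then the rest of the line
        if parts[0] == name and len(parts) == 2:
            value = parts[1].strip()
    return value


# Example text only: the first line is the commented-out entry quoted above.
EXAMPLE_CF = """\
# bayes_learn_to_journal (default: 0)
bayes_auto_learn 0
"""

if __name__ == "__main__":
    print(effective_setting(EXAMPLE_CF, "bayes_learn_to_journal", "0"))
```

With the example text the commented-out line is ignored, so the helper falls back to the default it was given.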

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-08-27 11:25

Every morning I check the log (because I'm anal like that). I like the way Jimimaseye's backup & cleardown script emails a report, so I used that as a template for a task script that runs after sa-learn.vbs.

1) wait for sa-learn to finish - I think because sa-learn.cmd runs separately, the task scheduler moves on before sa-learn.cmd is done. So if the maintenance tasks run before that, it will screw up everything and you'll get a partial log. So let it wait. I set it to 5 minutes.
2) email the log using BLAT.exe (must have blat.exe to work) - assumes auth required
3) cycle the log - rename it to today's date. That way the one that gets emailed is only for that day, not a cumulative history
4) delete logs older than X days

sa-learn-tasks.bat

Code: Select all

@Echo Off
rem   #### CONFIG START ####
rem  *******  FILL OUT VARIABLES BELOW  ***************************
set emailRecipient=admin@mydomain.tld
set emailFrom="SA-Learn <system@mydomain.tld>"
set emailSubject="Daily SA-Learn Log Report"
set authUser=system@mydomain.tld
set authPass=SecretPassword
set emailPort=587

rem ---  SET FULL PATH TO sa-learn.log  *** WARNING - ASSUMES LOG NAME IS sa-learn.log!!!!! *** ----
set learnLog="X:\sa-learn\sa-log\sa-learn.log"

rem --- SET LOG PATH (NO TRAILING SLASH "\") ---
set logPath="X:\sa-learn\sa-log"

rem --- SET NUMBER OF DAYS TO KEEP LOGS (DELETE AFTER "X" DAYS)
set daysToKeep=7

rem --- SET FULL PATH TO blat.exe ----
set BLATpath="C:\blat\full\blat.exe"

rem  ****  FILL OUT VARIABLES ABOVE  *******************************
rem   #### CONFIG END ####

rem --- GIVE SA-LEARN TIME TO FINISH OR YOU WILL BE SENDING PARTIAL LOG --- TIME IN SECONDS ---
:LETITFINISH
PING -n 300 127.0.0.1>nul

rem --- MAIL THE LOG USING BLAT.EXE ---
:MAILIT
%BLATpath% %learnLog% -mailfrom %emailFrom% -to %emailRecipient% -subject %emailSubject% -server localhost -u %authUser% -pw %authPass% -port %emailPort%

rem --- CYCLE THE LOG NAME TO TODAY'S DATE ---
:RENAMEIT
IF EXIST %learnLog% GOTO RENAME

IF NOT EXIST %learnLog% GOTO NOEXIST

rem *** WARNING - ASSUMES LOG NAME IS sa-learn.log!!! CHANGE BELOW IF DIFFERENT!!! ***
:RENAME
for /F "usebackq tokens=1,2 delims==" %%i in (`wmic os get LocalDateTime /VALUE 2^>NUL`) do if '.%%i.'=='.LocalDateTime.' set ldt=%%j
ren %learnLog% sa-learn-%ldt:~0,4%-%ldt:~4,2%-%ldt:~6,2%.log

GOTO DELETEIT

:NOEXIST

ECHO NO FILE EXISTS

rem --- DELETE LOG FILES OLDER THAN X DAYS ---
:DELETEIT
forfiles /p %logPath% /s /m *.* /D -%daysToKeep% /C "cmd /c del @path"
Run this as second action after sa-learn.vbs in the sa-learn scheduled task.

One thing I noticed is that Task Scheduler returns 0x1 after it completes the task. This is because the "delete older than x days" step throws an error when forfiles finds no files older than x days (when you first set it up there is only the one log file to be renamed and nothing old enough to delete). I threw some older hMailServer logs into the sa-learn log folder and the task completed normally. I think the task will keep completing with 0x1, with everything else still working, until x days have passed and the error goes away. Or just do what I did and throw some old hMailServer logs in to prevent the error. They will disappear after x days and anyway, the report gets emailed, so you don't even need to look at the log anymore.
Last edited by palinka on 2018-08-27 11:37, edited 1 time in total.
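
If the 0x1 from forfiles bothers you, the cleanup step can be written so that "nothing old enough to delete" is a normal result. A sketch under assumptions: Python 3 on the server, hypothetical helper names, file age judged by last-modified time, and no recursion into subfolders (forfiles /s does recurse):

```python
import os
import time
from datetime import date


def rotated_name(today):
    """Name for the cycled log, matching the batch file's sa-learn-YYYY-MM-DD.log."""
    return "sa-learn-{:%Y-%m-%d}.log".format(today)


def prune_old_logs(log_dir, days_to_keep, now=None):
    """Delete files in log_dir last modified more than days_to_keep days ago.

    Returns the deleted file names; an empty list is a normal result, not an error.
    """
    now = time.time() if now is None else now
    cutoff = now - days_to_keep * 86400
    # Collect first, delete afterwards, so the directory is not changed mid-scan.
    with os.scandir(log_dir) as entries:
        old = [e.path for e in entries if e.is_file() and e.stat().st_mtime < cutoff]
    for path in old:
        os.remove(path)
    return sorted(os.path.basename(path) for path in old)


if __name__ == "__main__":
    print(rotated_name(date.today()))
```

Calling prune_old_logs on the log folder with days_to_keep set to 7 would mirror daysToKeep=7 in the batch file, and on a fresh setup it simply returns an empty list.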
