Need help creating rule

palinka
Senior user
Posts: 4455
Joined: 2017-09-12 17:57

Post by palinka » 2022-05-15 23:59

As part of my pURI-BL project, I want to create a custom ruleset. I can write basic rules, but my issue here is that I'm looking to test the body against hundreds or thousands of URLs, so it's *probably* too long to create a single regex string from all the URLs.

Ideally, a custom plugin would be better, so the ruleset could be created using one URL per line, but I'm totally lost with Perl.
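To make the one-URL-per-line idea concrete, here is a rough sketch in Python rather than Perl. It is not a SpamAssassin plugin, and the function name `build_url_pattern` and the sample entries are made up for illustration. It only shows that a list file can be turned into a single escaped alternation, assuming blank lines and `#` comments should be skipped.

```python
import re

def build_url_pattern(lines):
    """Compile one alternation regex from URL strings, one per line (hypothetical helper)."""
    urls = []
    for line in lines:
        entry = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored
        if entry and not entry.startswith("#"):
            urls.append(entry)
    if not urls:
        # '(?!)' never matches, so an empty list cannot flag every message
        return re.compile(r"(?!)")
    # Longest entries first so a longer URL wins over one that is its prefix
    urls.sort(key=len, reverse=True)
    # re.escape keeps '.', '?' and '/' in URLs from acting as regex syntax
    return re.compile("|".join(re.escape(u) for u in urls), re.IGNORECASE)

# Made-up sample entries standing in for a list file read with open(...)
sample = ["example.test/bad", "# comment", "", "spam.invalid/offer"]
pattern = build_url_pattern(sample)
hit = pattern.search("visit http://Example.TEST/bad now")
```

Whether a pattern with thousands of alternatives performs acceptably would need to be measured; a plain set lookup on extracted URLs is another option.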

Any hints or ideas?

katip
Senior user
Posts: 1158
Joined: 2006-12-22 07:58
Location: Istanbul

Re: Need help creating rule

Post by katip » 2022-05-16 09:39

I would suggest testing the body prior to delivery.
To avoid performance issues, checking just the first 100 lines might be sufficient, I think.
oMessage.Body (or HTMLBody) is a string, and I suppose it can be split on vbCrLf with VBScript's Split. You regex-check the first 100 lines; if any contains a URL (see below for a good pattern), extract it and push it into an array, then look up each item in your DB table.

Code:

(("[^<>@\\]+")|([^<> @\"]+))@(\[([0-9]{1,3}\.){3}[0-9]{1,3}\]|(?=.{1,255}$)((?!-|\.)[a-zA-Z0-9-]{0,62}[a-zA-Z0-9])(|\.(?!-|\.)[a-zA-Z0-9-]{0,62}[a-zA-Z0-9]){1,126})
According to the result (bad URL found or not), you add a header such as X-HMS-BadURL = True.
Then you can play with rules as you like, based on the existence of this header.
Just throwing this out off the top of my head :)
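The steps above could be sketched roughly as follows. This is Python for illustration only, not the VBScript that hMailServer would actually run. `URL_RE` is a deliberately simplified matcher, `find_bad_urls` and `bad_url_header` are hypothetical names, and a lowercase set stands in for the DB table.

```python
import re

# Simplified URL matcher for illustration only (an assumption, not a vetted pattern)
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def find_bad_urls(body, bad_urls, max_lines=100):
    """Return URLs from the first max_lines lines of body that appear in bad_urls."""
    # Same idea as Split(body, vbCrLf), keeping only the first lines
    lines = body.split("\r\n")[:max_lines]
    found = []
    for line in lines:
        for url in URL_RE.findall(line):
            # Drop trailing punctuation that commonly follows a URL in prose
            found.append(url.rstrip(".,;)"))
    # bad_urls is a set of lowercase strings standing in for the DB lookup
    return [u for u in found if u.lower() in bad_urls]

def bad_url_header(body, bad_urls):
    """Return the header to add as a (name, value) pair, or None if the body is clean."""
    if find_bad_urls(body, bad_urls):
        return ("X-HMS-BadURL", "True")
    return None

bad = {"http://spam.invalid/offer"}
body = "Hello\r\nsee http://spam.invalid/offer.\r\nbye"
header = bad_url_header(body, bad)
```

A rule can then score on the presence of the header instead of re-scanning the body.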
Katip
--
HMS 5.7, MariaDB 10.4.10, SA 4.0.0, ClamAV 0.103.8

Re: Need help creating rule

Post by katip » 2022-05-16 10:29

Sorry for the nonsense pattern. Here is a good one:
https://regexr.com/39nr7

Re: Need help creating rule

Post by palinka » 2022-05-16 12:43

Good idea. I already split the body at "</head>" when it exists, so I don't pick up things like w3.org and style URLs.
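For anyone following along, the "</head>" split looks roughly like this. It is a Python sketch rather than the actual script, `strip_head` is a hypothetical name, and matching the tag case-insensitively is an assumption.

```python
import re

def strip_head(html):
    """Return the part of html after the first </head>, or html unchanged if absent."""
    # maxsplit=1 splits only at the first closing head tag
    parts = re.split(r"</head>", html, maxsplit=1, flags=re.IGNORECASE)
    # With no match, re.split returns [html], so parts[-1] is the whole body
    return parts[-1]

page = '<html><head><link href="http://www.w3.org/x"></head><body>http://spam.invalid</body>'
rest = strip_head(page)
```

Only `rest` is then scanned for URLs, so anything referenced inside the head section is never extracted.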
