Subject Decoding

Use this forum if you have problems with a hMailServer script, such as hMailServer WebAdmin or code in an event handler.
Post Reply
User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Subject Decoding

Post by RvdH » 2021-04-25 20:07

Probably one for SorenR :)

How do i decode the subject value? I like to save the mail as eml based on the datetime combined with subject

oMessage.Subject: [PossibleSpam] [4.5] Restock alert:=?utf-8?B?IGxhZ2UgY2hlbHNlYSBib290cyDinaTvuI8=?=

This is how the message is tried to be saved (which fails due to the ? characters)

Output: E:\Email\Spam\2021042519342835_Restock alert=?utf-8?B?IGxhZ2UgY2hlbHNlYSBib290cyDinaTvuI8=?=.eml

Code: Select all

dim mySubject : mySubject = Empty
dim myRegExp
Set myRegExp = New RegExp
myRegExp.Global = False
myRegExp.IgnoreCase = True
REM myRegExp.Pattern = "^(.+)?(\[(PossibleSpam|Spam)\]\s?\[\d+?\.\d\]\s?)|[\\\/\:\*\" & chr(63) & "\" & chr(34) & "\<\>\|]"
myRegExp.Pattern = "^(.+)?(\[(PossibleSpam|Spam)\]\s?\[\d+?\.\d\]\s?)"
If (oMessage.Subject <> "") then
	oMessage.EncodeFields = False
	If (myRegExp.Test(oMessage.Subject)) then
		mySubject = myRegExp.Replace( oMessage.Subject, "$1" )
	End If
	REM A filename in Windows cannot contain any of the following characters: \ / : * ? " < > |
	REM myRegExp.Pattern = "[\\\/\:\*\?\""\<\>\|]"
	myRegExp.Pattern = "([\x2F\x5C\x3A\x2A\x3F\x22\x3C\x3E\x7C]+)"
	If (Not IsEmpty(mySubject) And myRegExp.Test(mySubject)) then
		mySubject = myRegExp.Replace( mySubject, Empty )
	ElseIf (myRegExp.Test(oMessage.Subject)) then
		mySubject = myRegExp.Replace( oMessage.Subject, Empty )
	End If
End if
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 20:09

would this addition work?

Code: Select all

REM Remove UTF8
myRegExp.Pattern = "[^\x00-\x7F]+"
If (myRegExp.Test(mySubject)) then
	mySubject = myRegExp.Replace( mySubject, Empty )
End If

Code: Select all

dim mySubject : mySubject = Empty
dim myRegExp
Set myRegExp = New RegExp
myRegExp.Global = False
myRegExp.IgnoreCase = True
REM myRegExp.Pattern = "^(.+)?(\[(PossibleSpam|Spam)\]\s?\[\d+?\.\d\]\s?)|[\\\/\:\*\" & chr(63) & "\" & chr(34) & "\<\>\|]"
myRegExp.Pattern = "^(.+)?(\[(PossibleSpam|Spam)\]\s?\[\d+?\.\d\]\s?)"
If (oMessage.Subject <> "") then
	oMessage.EncodeFields = False
	REM EventLog.Write("Subject (before):" & oMessage.Subject)
	If (myRegExp.Test(oMessage.Subject)) then
		mySubject = myRegExp.Replace( oMessage.Subject, "$1" )
	End If
	REM Remove UTF8
	myRegExp.Pattern = "[^\x00-\x7F]+"
	If (myRegExp.Test(mySubject)) then
		mySubject = myRegExp.Replace( mySubject, Empty )
	End If
	REM A filename in Windows cannot contain any of the following characters: \ / : * ? " < > |
	REM myRegExp.Pattern = "[\\\/\:\*\?\""\<\>\|]"
	myRegExp.Pattern = "([\x2F\x5C\x3A\x2A\x3F\x22\x3C\x3E\x7C]+)"
	If (Not IsEmpty(mySubject) And myRegExp.Test(mySubject)) then
		mySubject = myRegExp.Replace( mySubject, Empty )
	ElseIf (myRegExp.Test(oMessage.Subject)) then
		mySubject = myRegExp.Replace( oMessage.Subject, Empty )
	End If
End if
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 20:26

mmmm.... or does oMessage.EncodeFields = true to make that removal of UFT8 regex work?
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
SorenR
Senior user
Senior user
Posts: 4493
Joined: 2006-08-21 15:38
Location: Denmark

Re: Subject Decoding

Post by SorenR » 2021-04-25 22:05

RvdH wrote:
2021-04-25 20:26
mmmm.... or does oMessage.EncodeFields = true to make that removal of UFT8 regex work?

Code: Select all

Dim ECFlag : ECFlag = oMessage.EncodeFields
oMessage.EncodeFields = False

'Do stuff with raw headers...

oMessage.EncodeFields = ECFlag
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 22:08

SorenR wrote:
2021-04-25 22:05
RvdH wrote:
2021-04-25 20:26
mmmm.... or does oMessage.EncodeFields = true to make that removal of UFT8 regex work?

Code: Select all

Dim ECFlag : ECFlag = oMessage.EncodeFields
oMessage.EncodeFields = False

'Do stuff with raw headers...

oMessage.EncodeFields = ECFlag
Thx
Will give it shot.... have to wait for a message containing [PossibleSpam] en emoji in Subject coming thru....not much spam on sundays :mrgreen:
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
SorenR
Senior user
Senior user
Posts: 4493
Joined: 2006-08-21 15:38
Location: Denmark

Re: Subject Decoding

Post by SorenR » 2021-04-25 22:13

Code: Select all

Function Base64Encode(ByVal sText)
   Dim fAsUtf16LE
   fAsUtf16LE=0

   ' Use an aux. XML document with a Base64-encoded element.
   ' Assigning the byte stream (array) returned by StrToBytes() to .NodeTypedValue
   ' automatically performs Base64-encoding, whose result can then be accessed
   ' as the element's text.
   With CreateObject("Msxml2.DOMDocument").CreateElement("aux")
      .DataType = "bin.base64"
      If fAsUtf16LE Then
         .NodeTypedValue = StrToBytes(sText, "utf-16le", 2)
      Else
         .NodeTypedValue = StrToBytes(sText, "utf-8", 3)
      End If
      Base64Encode = .Text
   End With

End Function


' Decodes the specified Base64-encoded string.
' If the decoded string's original encoding was:
' * UTF-8, pass False for fIsUtf16LE.
' * UTF-16 LE, pass True for fIsUtf16LE.
Function Base64Decode(ByVal sBase64EncodedText)
   Dim fAsUtf16LE
   fAsUtf16LE=0
   Dim sTextEncoding
   If fIsUtf16LE Then sTextEncoding = "utf-16le" Else sTextEncoding = "utf-8"

   ' Use an aux. XML document with a Base64-encoded element.
   ' Assigning the encoded text to .Text makes the decoded byte array
   ' available via .nodeTypedValue, which we can pass to BytesToStr()
   With CreateObject("Msxml2.DOMDocument").CreateElement("aux")
      .DataType = "bin.base64"
      .Text = sBase64EncodedText
      Base64Decode = BytesToStr(.NodeTypedValue, sTextEncoding)
   End With

End Function


' Returns a binary representation (byte array) of the specified string in
' the specified text encoding, such as "utf-8" or "utf-16le".
' Pass the number of bytes that the encoding's BOM uses as iBomByteCount;
' pass 0 to include the BOM in the output.
Function StrToBytes(ByVal sText, ByVal sTextEncoding, ByVal iBomByteCount)

   ' Create a text string with the specified encoding and then
   ' get its binary (byte array) representation.
   With CreateObject("ADODB.Stream")
      ' Create a stream with the specified text encoding...
      .Type = 2  ' adTypeText
      .Charset = sTextEncoding
      .Open
      .WriteText sText
      ' ... and convert it to a binary stream to get a byte-array
      ' representation.
      .Position = 0
      .Type = 1  ' adTypeBinary
      .Position = iBomByteCount ' skip the BOM
      StrToBytes = .Read
      .Close
   End With

End Function

' Returns a string that corresponds to the specified byte array, interpreted
' with the specified text encoding, such as "utf-8" or "utf-16le".
Function BytesToStr(ByVal byteArray, ByVal sTextEncoding)

   If LCase(sTextEncoding) = "utf-16le" Then
      ' UTF-16 LE happens to be VBScript's internal encoding, so we can
      ' take a shortcut and use CStr() to directly convert the byte array
      ' to a string.
      BytesToStr = CStr(byteArray)
   Else ' Convert the specified text encoding to a VBScript string.
      ' Create a binary stream and copy the input byte array to it.
      With CreateObject("ADODB.Stream")
         .Type = 1 ' adTypeBinary
         .Open
         .Write byteArray
         ' Now change the type to text, set the encoding, and output the
         ' result as text.
         .Position = 0
         .Type = 2 ' adTypeText
         .CharSet = sTextEncoding
         BytesToStr = .ReadText
         .Close
      End With
   End If

End Function
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 22:21

But is it BASE64 encoded?

Btw, this is also a bit weird....

Subject: [PossibleSpam] [4.5] Restock alert:=?utf-8?B?IGxhZ2UgY2hlbHNlYSBib290cyDinaTvuI8=?=
X-Spam-Prev-Subject: Restock alert: lage chelsea boots =?UTF-8?B?4p2k77iP?=
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
SorenR
Senior user
Senior user
Posts: 4493
Joined: 2006-08-21 15:38
Location: Denmark

Re: Subject Decoding

Post by SorenR » 2021-04-25 22:47

RvdH wrote:
2021-04-25 22:21
But is it BASE64 encoded?

Btw, this is also a bit weird....

Subject: [PossibleSpam] [4.5] Restock alert:=?utf-8?B?IGxhZ2UgY2hlbHNlYSBib290cyDinaTvuI8=?=
X-Spam-Prev-Subject: Restock alert: lage chelsea boots =?UTF-8?B?4p2k77iP?=
"?utf-8?B?" = utf-8 base64

Try paste it here.. https://www.base64decode.org/

You forgot the heart ;-) "lage chelsea boots ❤️"
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 23:07

I don't get it....why is the subject changing?

So I always have to strip =?utf-8?B? before decoding....thats crap!

Windows is able to write emoji's to filename's, right? I simply need a raw subject....no encodings
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 23:32

So this is/was the original subject:

X-Spam-Prev-Subject: Restock alert: lage chelsea boots =?UTF-8?B?4p2k77iP?=

[PossibleSpam] is added by hmailserver when score > Spam mark threshold

[4.5] is added by spamassassin
rewrite_header Subject [_HITS_]

Subject: [PossibleSpam] [4.5] Restock alert:=?utf-8?B?IGxhZ2UgY2hlbHNlYSBib290cyDinaTvuI8=?=


All of the sudden "lage chelsea boots" is encoded as well
Something fishy going on here.... i can live with it if i only get the ASCI part en ignore emoji's
Last edited by RvdH on 2021-04-25 23:40, edited 1 time in total.
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
SorenR
Senior user
Senior user
Posts: 4493
Joined: 2006-08-21 15:38
Location: Denmark

Re: Subject Decoding

Post by SorenR » 2021-04-25 23:38

The form is: "=?charset?encoding?encoded text?="

https://en.wikipedia.org/wiki/MIME#Encoded-Word
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

User avatar
SorenR
Senior user
Senior user
Posts: 4493
Joined: 2006-08-21 15:38
Location: Denmark

Re: Subject Decoding

Post by SorenR » 2021-04-25 23:41

RvdH wrote:
2021-04-25 23:32
So this is/was the original subject:

X-Spam-Prev-Subject: Restock alert: lage chelsea boots =?UTF-8?B?4p2k77iP?=

[PossibleSpam] is added by hmailserver when score > Spam mark threshold

[4.5] is added by spamassassin
rewrite_header Subject [_HITS_]

Subject: [PossibleSpam] [4.5] Restock alert:=?utf-8?B?IGxhZ2UgY2hlbHNlYSBib290cyDinaTvuI8=?=


All of the sudden "lage chelsea boots" is encoded as well
Something fishy going on here....
=?UTF-8?B?4p2k77iP?= is the heart. It is perfectly legal to mix clear text and base64 encoded binary.
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 23:44

SorenR wrote:
2021-04-25 23:41
RvdH wrote:
2021-04-25 23:32
So this is/was the original subject:

X-Spam-Prev-Subject: Restock alert: lage chelsea boots =?UTF-8?B?4p2k77iP?=

[PossibleSpam] is added by hmailserver when score > Spam mark threshold

[4.5] is added by spamassassin
rewrite_header Subject [_HITS_]

Subject: [PossibleSpam] [4.5] Restock alert:=?utf-8?B?IGxhZ2UgY2hlbHNlYSBib290cyDinaTvuI8=?=


All of the sudden "lage chelsea boots" is encoded as well
Something fishy going on here....
=?UTF-8?B?4p2k77iP?= is the heart. It is perfectly legal to mix clear text and base64 encoded binary.
All of the sudden "lage chelsea boots" is encoded as well?
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 23:45

I don't care if it legal in a Subject...i need/like to extract valid string to write as a filename as described
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
SorenR
Senior user
Senior user
Posts: 4493
Joined: 2006-08-21 15:38
Location: Denmark

Re: Subject Decoding

Post by SorenR » 2021-04-25 23:46

I have previously used this to remove EMOJI's - then I upgraded MySQL to use utf8mb4 :mrgreen:

Code: Select all

Sub FixEMOJI(oMessage)
    '
    '   With surrogate pairs, a Unicode code point from range U+D800 to U+DBFF (called "high surrogate")
    '   gets combined with another Unicode code point from range U+DC00 to U+DFFF (called "low surrogate")
    '   to generate a whole new character, allowing the encoding of over one million additional characters.
    '
    '   http://www.unicode.org/Public/emoji/11.0/emoji-test.txt
    '
    '   Dim strEmoji : strEmoji = "(?:[\u2700-\u27BF]|(?:\uD83C[\uDDE6-\uDDFF]){2}|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\u0023-\u0039]\uFE0F?\u20E3|\u3299|\u3297|\u303D|\u3030|\u24C2|\uD83C[\uDD70-\uDD71]|\uD83C[\uDD7E-\uDD7F]|\uD83C\uDD8E|\uD83C[\uDD91-\uDD9A]|\uD83C[\uDDE6-\uDDFF]|[\uD83C[\uDE01-\uDE02]|\uD83C\uDE1A|\uD83C\uDE2F|[\uD83C[\uDE32-\uDE3A]|[\uD83C[\uDE50-\uDE51]|\u203C|\u2049|[\u25AA-\u25AB]|\u25B6|\u25C0|[\u25FB-\u25FE]|\u00A9|\u00AE|\u2122|\u2139|\uD83C\uDC04|[\u2600-\u26FF]|\u2B05|\u2B06|\u2B07|\u2B1B|\u2B1C|\u2B50|\u2B55|\u231A|\u231B|\u2328|\u23CF|[\u23E9-\u23F3]|[\u23F8-\u23FA]|\uD83C\uDCCF|\u2934|\u2935|[\u2190-\u21FF])"
    '   Dim strEmoji : strEmoji = "(?:[\uD800-\uDBFF][\uDC00-\uDFFF])"
    '   Dim strEmoji : strEmoji = "^[\0\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\u2600-\u26FF]"
    '
    Dim strEmoji : strEmoji = "(?:[\uD800-\uDBFF][\uDC00-\uDFFF])"
    If Lookup(strEmoji, oMessage.Subject) Then
        With CreateObject("VBScript.RegExp")
            .Pattern = strEmoji
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            oMessage.HeaderValue("X-EMOJI-Subject") = oMessage.Subject
            oMessage.Subject = Trim(.Replace(oMessage.Subject, ""))
            oMessage.Save
        End With
    End If
    If Lookup(strEmoji, oMessage.From) Then
        With CreateObject("VBScript.RegExp")
            .Pattern = strEmoji
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            oMessage.HeaderValue("X-EMOJI-From") = oMessage.From
            oMessage.From = Trim(.Replace(oMessage.From, ""))
            oMessage.Save
        End With
    End If
End Sub
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-25 23:51

SorenR wrote:
2021-04-25 23:46
I have previously used this to remove EMOJI's - then I upgraded MySQL to use utf8mb4 :mrgreen:

Code: Select all

Sub FixEMOJI(oMessage)
    '
    '   With surrogate pairs, a Unicode code point from range U+D800 to U+DBFF (called "high surrogate")
    '   gets combined with another Unicode code point from range U+DC00 to U+DFFF (called "low surrogate")
    '   to generate a whole new character, allowing the encoding of over one million additional characters.
    '
    '   http://www.unicode.org/Public/emoji/11.0/emoji-test.txt
    '
    '   Dim strEmoji : strEmoji = "(?:[\u2700-\u27BF]|(?:\uD83C[\uDDE6-\uDDFF]){2}|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\u0023-\u0039]\uFE0F?\u20E3|\u3299|\u3297|\u303D|\u3030|\u24C2|\uD83C[\uDD70-\uDD71]|\uD83C[\uDD7E-\uDD7F]|\uD83C\uDD8E|\uD83C[\uDD91-\uDD9A]|\uD83C[\uDDE6-\uDDFF]|[\uD83C[\uDE01-\uDE02]|\uD83C\uDE1A|\uD83C\uDE2F|[\uD83C[\uDE32-\uDE3A]|[\uD83C[\uDE50-\uDE51]|\u203C|\u2049|[\u25AA-\u25AB]|\u25B6|\u25C0|[\u25FB-\u25FE]|\u00A9|\u00AE|\u2122|\u2139|\uD83C\uDC04|[\u2600-\u26FF]|\u2B05|\u2B06|\u2B07|\u2B1B|\u2B1C|\u2B50|\u2B55|\u231A|\u231B|\u2328|\u23CF|[\u23E9-\u23F3]|[\u23F8-\u23FA]|\uD83C\uDCCF|\u2934|\u2935|[\u2190-\u21FF])"
    '   Dim strEmoji : strEmoji = "(?:[\uD800-\uDBFF][\uDC00-\uDFFF])"
    '   Dim strEmoji : strEmoji = "^[\0\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\u2600-\u26FF]"
    '
    Dim strEmoji : strEmoji = "(?:[\uD800-\uDBFF][\uDC00-\uDFFF])"
    If Lookup(strEmoji, oMessage.Subject) Then
        With CreateObject("VBScript.RegExp")
            .Pattern = strEmoji
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            oMessage.HeaderValue("X-EMOJI-Subject") = oMessage.Subject
            oMessage.Subject = Trim(.Replace(oMessage.Subject, ""))
            oMessage.Save
        End With
    End If
    If Lookup(strEmoji, oMessage.From) Then
        With CreateObject("VBScript.RegExp")
            .Pattern = strEmoji
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            oMessage.HeaderValue("X-EMOJI-From") = oMessage.From
            oMessage.From = Trim(.Replace(oMessage.From, ""))
            oMessage.Save
        End With
    End If
End Sub
No oMessage.EncodeFields = False or similar stuff here? Or is that not needed?
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-26 00:10

Darn...i think my script in 2nd post actually works fine...without oMessage.EncodeFields = False that is....
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
SorenR
Senior user
Senior user
Posts: 4493
Joined: 2006-08-21 15:38
Location: Denmark

Re: Subject Decoding

Post by SorenR » 2021-04-26 00:18

RvdH wrote:
2021-04-25 23:51
SorenR wrote:
2021-04-25 23:46
I have previously used this to remove EMOJI's - then I upgraded MySQL to use utf8mb4 :mrgreen:

Code: Select all

Sub FixEMOJI(oMessage)
    '
    '   With surrogate pairs, a Unicode code point from range U+D800 to U+DBFF (called "high surrogate")
    '   gets combined with another Unicode code point from range U+DC00 to U+DFFF (called "low surrogate")
    '   to generate a whole new character, allowing the encoding of over one million additional characters.
    '
    '   http://www.unicode.org/Public/emoji/11.0/emoji-test.txt
    '
    '   Dim strEmoji : strEmoji = "(?:[\u2700-\u27BF]|(?:\uD83C[\uDDE6-\uDDFF]){2}|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\u0023-\u0039]\uFE0F?\u20E3|\u3299|\u3297|\u303D|\u3030|\u24C2|\uD83C[\uDD70-\uDD71]|\uD83C[\uDD7E-\uDD7F]|\uD83C\uDD8E|\uD83C[\uDD91-\uDD9A]|\uD83C[\uDDE6-\uDDFF]|[\uD83C[\uDE01-\uDE02]|\uD83C\uDE1A|\uD83C\uDE2F|[\uD83C[\uDE32-\uDE3A]|[\uD83C[\uDE50-\uDE51]|\u203C|\u2049|[\u25AA-\u25AB]|\u25B6|\u25C0|[\u25FB-\u25FE]|\u00A9|\u00AE|\u2122|\u2139|\uD83C\uDC04|[\u2600-\u26FF]|\u2B05|\u2B06|\u2B07|\u2B1B|\u2B1C|\u2B50|\u2B55|\u231A|\u231B|\u2328|\u23CF|[\u23E9-\u23F3]|[\u23F8-\u23FA]|\uD83C\uDCCF|\u2934|\u2935|[\u2190-\u21FF])"
    '   Dim strEmoji : strEmoji = "(?:[\uD800-\uDBFF][\uDC00-\uDFFF])"
    '   Dim strEmoji : strEmoji = "^[\0\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\u2600-\u26FF]"
    '
    Dim strEmoji : strEmoji = "(?:[\uD800-\uDBFF][\uDC00-\uDFFF])"
    If Lookup(strEmoji, oMessage.Subject) Then
        With CreateObject("VBScript.RegExp")
            .Pattern = strEmoji
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            oMessage.HeaderValue("X-EMOJI-Subject") = oMessage.Subject
            oMessage.Subject = Trim(.Replace(oMessage.Subject, ""))
            oMessage.Save
        End With
    End If
    If Lookup(strEmoji, oMessage.From) Then
        With CreateObject("VBScript.RegExp")
            .Pattern = strEmoji
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            oMessage.HeaderValue("X-EMOJI-From") = oMessage.From
            oMessage.From = Trim(.Replace(oMessage.From, ""))
            oMessage.Save
        End With
    End If
End Sub
No oMessage.EncodeFields = False or similar stuff here? Or is that not needed?
Not with this one as I'm comparing binary 4 byte data.
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-26 00:20

SorenR wrote:
2021-04-26 00:18
Not with this one as I'm comparing binary 4 byte data.

Code: Select all

REM Remove 4ByteUFT8
myRegExp.Pattern = "[^\x00-\x7F]+"
If (myRegExp.Test(mySubject)) then
	mySubject = myRegExp.Replace( mySubject, Empty )
End If

this is a function i used in a classic asp website years back

Code: Select all

Function Remove4ByteUFT8(strString)
	Dim objRegEx : Set objRegEx = CreateObject("VBScript.RegExp")
	objRegEx.Global = True   
	objRegEx.IgnoreCase = True
	objRegEx.Pattern = "[^\x00-\x7F]+"
	Remove4ByteUFT8 = objRegEx.Replace(strString, Empty)
	Set objRegEx = Nothing
End Function
Last edited by RvdH on 2021-04-26 00:27, edited 2 times in total.
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

User avatar
SorenR
Senior user
Senior user
Posts: 4493
Joined: 2006-08-21 15:38
Location: Denmark

Re: Subject Decoding

Post by SorenR » 2021-04-26 00:21

This I used in OnError() to capture EMOJI's that was not found by the first one..

Code: Select all

Sub FixEMOJIAgain(sDescription)
    '
    '   ixV(0) metadata_accountid
    '   ixV(1) metadata_folderid
    '   ixV(2) metadata_messageid
    '   ixV(3) metadata_dateutc
    '   ixV(4) metadata_from
    '   ixV(5) metadata_subject
    '   ixV(6) metadata_to
    '   ixV(7) metadata_cc
    '
    Dim oApp : Set oApp = CreateObject("hMailServer.Application")
    Call oApp.Authenticate(ADMIN, PASSWORD)
    Dim strV, ixV, oDomain, oAccount, oFolder, oMessage
    Dim i, strValue, hexValue, SaveIt, Emoji, ECFlag
    If (Left(sDescription, 30) = "MySQL: Incorrect string value:") Then
        strV = Mid(sDescription, InStr(sDescription, "VALUES") + 8)
        strV = Left(strV, Len(strV) - 2)
        strV = Replace(strV, Chr(34), "")
        strV = Replace(strV, "'", Chr(34))
        ixV = Eval("Array(" & strV & ")")
        strV = Mid(ixV(6), InStrRev(ixV(6), "@") + 1)
        strV = Replace(strV, ">", "")
        EventLog.Write( "Domain lookup value: " & strV )
        Set oDomain  = oApp.Domains.ItemByName(strV)
        EventLog.Write( "Found Domain name: " & oDomain.Name )
        Set oAccount = oDomain.Accounts.ItemByDBID(ixV(0))
        EventLog.Write( "Found Account address: " & oAccount.Address )
        Set oFolder  = oAccount.IMAPFolders.ItemByDBID(ixV(1))
        EventLog.Write( "Found Folder name: " & oFolder.Name )
        Set oMessage = oFolder.Messages.ItemByDBID(ixV(2))
        EventLog.Write( "Found MessageFile: " & oMessage.Filename )
        ECFlag = oMessage.EncodeFields
        If (InStr(sDescription, "for column 'metadata_subject'") > 0) Then
            oMessage.EncodeFields = False
            EventLog.Write( "Found Message Subject: " & oMessage.Subject )
            oMessage.EncodeFields = ECFlag
            '
            '   Fix 4-byte Emoji's in Subject
            '
            SaveIt = False : Emoji = False : strValue = ""
            For i = 1 To Len(oMessage.Subject)
                hexValue = Hex(AscW(Mid(oMessage.Subject, i, 1)))
                EventLog.Write( "test --> " & hexValue & " / " & Len(hexValue) & " / " & Emoji )
                If (Len(hexValue) = 4) Then
                    If (InStr("D8:26", Left(hexValue, 2)) > 0) Or (Emoji = True) Then
                        Emoji = Not Emoji
                        SaveIt = True
                    Else
                        strValue = strValue & Mid(oMessage.Subject, i, 1)
                        Emoji = False
                    End If
                Else
                    strValue = strValue & Mid(oMessage.Subject, i, 1)
                    Emoji = False
                End If
            Next
            If SaveIt Then
                EventLog.Write( "Saving Message Subject: " & strValue )
                oMessage.HeaderValue("X-EMOJI-Subject") = oMessage.Subject
                oMessage.Subject = strValue
                oMessage.Save
            End If
        End If
        If (InStr(sDescription, "for column 'metadata_from'") > 0) Then
            oMessage.EncodeFields = False
            EventLog.Write( "Found Message From: " & oMessage.From )
            oMessage.EncodeFields = ECFlag
            '
            '   Fix 4-byte Emoji's in From
            '
            SaveIt = False : Emoji = False : strValue = ""
            For i = 1 To Len(oMessage.From)
                hexValue = Hex(AscW(Mid(oMessage.From, i, 1)))
                EventLog.Write( "test --> " & hexValue & " / " & Len(hexValue) & " / " & Emoji )
                If (Len(hexValue) = 4) Then
                    If (InStr("D8:26", Left(hexValue, 2)) > 0) Or (Emoji = True) Then
                        Emoji = Not Emoji
                        SaveIt = True
                    Else
                        strValue = strValue & Mid(oMessage.From, i, 1)
                        Emoji = False
                    End If
                Else
                    strValue = strValue & Mid(oMessage.From, i, 1)
                    Emoji = False
                End If
            Next
            If SaveIt Then
                EventLog.Write( "Saving Message From: " & strValue )
                oMessage.HeaderValue("X-EMOJI-From") = oMessage.From
                oMessage.From = strValue
                oMessage.Save
            End If
        End If
    End If
    Set oApp = Nothing
End Sub
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

User avatar
RvdH
Senior user
Senior user
Posts: 1422
Joined: 2008-06-27 14:42
Location: Netherlands

Re: Subject Decoding

Post by RvdH » 2021-04-26 11:21

Removing oMessage.EncodeFields = False did the trick, without it i can strip the 4ByteUFT8 chars and save the message to disk
(saving the file with subject embedded in it's name simplifies the spam/ham check at first sight, 'How to enlarge your penis by 35% minimum' is not likely ham :mrgreen: )
CIDR to RegEx: d-fault.nl/CIDRtoRegEx
DNS Lookup: d-fault.nl/DNSTools
DNSBL Lookup: d-fault.nl/DNSBLLookup
GEOIP Lookup: d-fault.nl/GeoipLookup

palinka
Senior user
Senior user
Posts: 2614
Joined: 2017-09-12 17:57

Re: Subject Decoding

Post by palinka » 2021-04-26 12:19

RvdH wrote:
2021-04-26 11:21
Removing oMessage.EncodeFields = False did the trick, without it i can strip the 4ByteUFT8 chars and save the message to disk
(saving the file with subject embedded in it's name simplifies the spam/ham check at first sight, 'How to enlarge your penis by 35% minimum' is not likely ham :mrgreen: )
I was always curious how pfizer implements spam rules. Must be extremely difficult. :mrgreen:

Post Reply