Post new topic Reply to topic  [ 16 posts ] 

Do you need this feature?
Yes 40% [ 6 ]
No 60% [ 9 ]
Total votes : 15
 Post subject: Saving messages to the database
PostPosted: 2006-05-08 14:39 
New user

Joined: 2006-05-08 01:32
Posts: 12
I know there's a locked topic on this subject, but I can't post to that because it's locked and I want to add to the discussion.

I found and installed hMailServer specifically because I thought, from what I had read, that it used MySQL for storing messages, which would suit my purposes splendidly.

I have since learned otherwise and wish it were not so.

With the mail server as the only machine exposed to the web, I'd be happy to restrict its access to file systems and have it communicate with the file/database server solely via named pipes. That would make for a more secure platform.

It's more convenient to include a database in backup procedures.

On a file server configured with large file allocation blocks (to store its databases and large binary files efficiently), it is extremely wasteful to use one block per 3 to 4 KB email message.
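To put a rough number on that waste, here is a small sketch of the arithmetic. The 64 KB allocation unit is only an assumed example value, and the calculation ignores file-system specifics (such as very small files being stored inside the file table), so treat it as an illustration rather than a measurement.

```python
def allocated_bytes(file_size, cluster_size):
    """Space a file occupies when disk space is handed out in whole clusters."""
    clusters = (file_size + cluster_size - 1) // cluster_size  # round up
    return clusters * cluster_size


def slack_bytes(file_size, cluster_size):
    """Allocated but unused space at the end of the last cluster."""
    return allocated_bytes(file_size, cluster_size) - file_size


message_size = 4 * 1024    # a 4 KB message, as in the post
cluster_size = 64 * 1024   # ASSUMED allocation unit, for illustration only

wasted = slack_bytes(message_size, cluster_size)
print(f"{wasted} of {cluster_size} allocated bytes are unused")
```

With smaller allocation units the slack per message shrinks accordingly, which is the trade-off against efficient storage of large files.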

I also would prefer a database of messages to develop new applications against rather than files.

Martin posted that he'd experimented with saving messages to the database but found it an awkward conglomeration of partitioning, escaping and other complexities.

I can understand worrying that escaping the contents would be processing-intensive, though I don't quite see why you'd have to: text should go in text fields with little need for escaping, and binary attachments in BLOBs. (I know nothing of the programming environment hMailServer is developed in.) But I don't get the discussion about breaking messages into chunks. MySQL is quite capable of handling single records of any reasonable required size.

I think the discussion misinterpreted adipose's message, which only meant that whether you use files or a database you'd still do well to restrict message sizes; there was no intent to mix the two methods. I certainly wouldn't expect such a thing.


 Post subject:
PostPosted: 2006-05-08 15:40 
Normal user

Joined: 2005-11-01 16:25
Posts: 224
Location: CPH
I had the same considerations.

However, realizing that the eml file is the "format" of Internet mail and that messages can be 10 to 100 MB in size, it makes sense to store the complete message as such.
On the other hand, handling the messages is quite another business, and I think it is a good compromise that the info you need to handle the mail is stored in the database - for example the table hm_messages. The actual eml file is read only when you need to transport the mail.

Also, don't forget that the disk operating/file system NTFS (or FAT) is a database on its own, highly specialized for - well - storing files.
As for the disk consumption, you are right; most mails are small. If this is critical, use a separate volume for the eml files or - given the fast machines today - use disk compression for the data folder.

Your comments on security are well put but Martin is working on that (see very recent thread elsewhere) for the next version.


 Post subject:
PostPosted: 2006-05-10 01:42 
New user

Joined: 2006-05-08 01:32
Posts: 12
Gustav wrote:
Also, don't forget that the disk operating/file system NTFS (or FAT) is a database on its own, highly specialized for - well - storing files.


This may be true, but file systems aren't as good at searching, ordering, recombining, connecting and examining data as purpose-designed databases.


 Post subject:
PostPosted: 2006-05-10 23:54 
Developer

Joined: 2003-11-21 01:09
Posts: 6311
Location: Sweden
You're free to implement this functionality at any time. There are several good books on C++. :) I might implement it myself some day, but I don't prioritize it.

I'm not against the idea of only using the database for persistence. But no one has been able to tell me how to implement it without reducing performance a lot (and making the software much more complex).

I'm far from convinced that storing the information in the database generally would give better performance than storing it on disk. Say for example that you have a 4MB message in a MySQL row and you want to read the To-header. Today hMailServer can easily parse just the header and doesn't have to read the entire message from disk. But if the entire message was stored in a MySQL blob, hMailServer would have to run several statements to find out where the header begins and ends, or it would have to download all 4MB into memory and then parse it. Then say that the user wants to sort all messages in a folder based on the To:-header. How would that work smoothly if the entire messages were stored in the database?
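As an illustration of the header-only read described above, here is a minimal Python sketch (not hMailServer's actual C++ code). It reads lines from a message stream only up to the first blank line, which ends the header block in Internet mail, and parses just that part. The function names and the sample message are made up for the example.

```python
import email
import email.policy
import io


def read_header_block(stream):
    """Collect lines up to the first blank line; the body is never read."""
    lines = []
    for line in stream:
        if line in (b"\r\n", b"\n"):
            break
        lines.append(line)
    return b"".join(lines)


def get_to_header(stream):
    """Parse only the header block and return the To header (or None)."""
    header_bytes = read_header_block(stream)
    msg = email.message_from_bytes(header_bytes, policy=email.policy.default)
    return msg["To"]


# Usage: a 4 MB body follows the headers, but only the headers are consumed.
raw = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: test\r\n"
    b"\r\n"
) + b"x" * (4 * 1024 * 1024)
stream = io.BytesIO(raw)
to_value = get_to_header(stream)
bytes_consumed = stream.tell()
print(to_value, bytes_consumed)
```

With a real file opened in binary mode the operating system and Python will still buffer a block or so beyond the headers, but nowhere near the whole message.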

Quote:
MySQL is quite capable of handling single records of any reasonable required size.

Depends on what MySQL version you're using. And try browsing the MySQL bugs database. I'm not sure that most people would agree with you that everything works smoothly when the records are larger than 16MB. :)


 Post subject:
PostPosted: 2006-05-11 17:17 
Normal user

Joined: 2005-11-01 16:25
Posts: 224
Location: CPH
If you store the complete message in the database, it may quickly grow into the 100+ GB range like the MS Exchange storage can do.

A compromise is what GroupWise does. Small messages are stored in the database while large ones (attachments) are stored "offline" as separate files.

An option could be to extract the factual data of the messages, like addresses, dates, subject, id, routing, names of attachments, flags and headers, etc., into a standard normalized relational database structure where they could be indexed and easily searched.

However, when you consider encrypted and/or digitally signed messages and the complexity of the MIME format, this may not at all be possible. Also, given the new requirements for corporations to archive mail for five years in its original form, the only realistic option may very well be the single file storage.


 Post subject:
PostPosted: 2006-05-11 20:38 
Normal user

Joined: 2006-04-17 20:29
Posts: 150
Location: Needham, MA 02492 USA
Also, given the new requirements for corporations to archive mail for five years in its original form

spam too? that'd be nasty...

- Al Weiner -


 Post subject:
PostPosted: 2006-05-13 21:06 
Normal user

Joined: 2005-11-01 16:25
Posts: 224
Location: CPH
No.
One simple option is to insert a spam filter (proxy) in front of the mail server. Let it kill virus-infected mail as well.


 Post subject:
PostPosted: 2006-05-13 21:59 
Normal user

Joined: 2006-04-17 20:29
Posts: 150
Location: Needham, MA 02492 USA
I figured not, but (conspiracy mode on) I could envision data-storage companies getting a law like that passed - imagine the storage requirements if you *did* have to store all the spam everyone in a company received!
(conspiracy mode off)

- Al Weiner -


 Post subject:
PostPosted: 2006-05-14 14:42 
Site Admin

Joined: 2005-07-29 16:18
Posts: 13810
Location: UK
If such a law was passed you could be damn sure they would do more than they currently are to fight spam emails ;)


 Post subject: What about search?
PostPosted: 2006-07-18 16:09 
New user

Joined: 2006-07-18 16:06
Posts: 4
Hi,

For me, the big need for storing messages (subject, body, from, to) in the database is the ability to provide FAST SEARCH of the messages later on.

Otherwise, what are the options for searching inside .eml files?

Is there any other tool for indexing .eml messages?

Thx.


 Post subject:
PostPosted: 2006-07-18 16:20 
Normal user

Joined: 2005-11-01 16:25
Posts: 224
Location: CPH
That would require a database that could read MIME messages natively (haven't heard of any) - or a new field into which the MIME content, converted to plain text, was copied when the message was stored - and then the ability to maintain a full-text index.


 Post subject: ...
PostPosted: 2006-07-18 17:16 
New user

Joined: 2006-07-18 16:06
Posts: 4
No, what I mean is:

- keep the messages just like now in .eml files

BUT

- in addition, keep a copy of the message subject, body and recipients in the database, just for fast search.

General search (across all accounts) is very important, at least for me.

I think this would bring more users to hmail.


 Post subject:
PostPosted: 2006-07-18 18:25 
Normal user

Joined: 2005-11-01 16:25
Posts: 224
Location: CPH
If you wish to index and search the body, you need to decode it - including HTML-only bodies.
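To show what that decoding step involves, here is a rough Python sketch using only the standard library. It picks the text/plain part if there is one, falls back to text/html, undoes the transfer encoding and charset, and strips tags. The function names and the sample message are invented for the example, and the tag stripping is deliberately crude.

```python
import email
import email.policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def searchable_body(raw_bytes):
    """Return the message body as whitespace-normalized plain text."""
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    text = part.get_content()  # decodes transfer encoding and charset
    if part.get_content_type() == "text/html":
        text = html_to_text(text)
    return " ".join(text.split())


# Usage: an HTML-only, quoted-printable message (sample data).
html_only = (
    b"From: a@example.com\r\n"
    b"To: b@example.com\r\n"
    b"Subject: hi\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<html><head><style>p {color: red}</style></head>"
    b"<body><p>Hello <b>world</b> caf=C3=A9</p></body></html>\r\n"
)
text = searchable_body(html_only)
print(text)
```

Encrypted messages cannot be handled this way at all, and attachments are ignored here on purpose.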


 Post subject:
PostPosted: 2006-07-19 10:54 
New user

Joined: 2005-09-21 15:07
Posts: 17
Just to put my penny's worth in.

I have several clients running MS Exchange, which uses a database to store the emails. Believe me, when you get corruption in the database, you REALLY wish that MS had stored each email in a separate file.
Any corruption means taking the whole DB offline to fix, and this can take several hours (I have a client with multiple DBs of over 100GB each).

I have found several clients wishing that they hadn't gone to Exchange for exactly this reason.

I do, however, think that a better way of searching would be great. Maybe, as a previous post suggested, insert just the To, From, Subject, and maybe a text-parsed version of the body (not attachments) into the database for indexing and searching purposes, while leaving the actual email in file format. Obviously, when an email file is deleted, you would then delete its associated entry in the DB.
Presumably, when an email comes in, HMS generates a unique GUID for the email, so that could be used in the DB to relate the entry to the file.
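A side index of that kind could look roughly like the sketch below. It uses Python with SQLite purely for illustration; the table name, column names and the GUID key are hypothetical and are not hMailServer's schema. The search is a plain LIKE scan, so a real deployment would probably want a proper full-text index instead.

```python
import email
import email.policy
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the hypothetical side index."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS message_index (
               message_guid TEXT PRIMARY KEY,  -- relates the row to the .eml file
               sender       TEXT,
               recipients   TEXT,
               subject      TEXT,
               body_text    TEXT
           )"""
    )
    return db


def index_message(db, message_guid, raw_bytes):
    """Copy the searchable fields of one message into the index."""
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    body = msg.get_body(preferencelist=("plain",))
    body_text = body.get_content() if body is not None else ""
    db.execute(
        "INSERT OR REPLACE INTO message_index VALUES (?, ?, ?, ?, ?)",
        (
            message_guid,
            str(msg["From"] or ""),
            str(msg["To"] or ""),
            str(msg["Subject"] or ""),
            body_text,
        ),
    )
    db.commit()


def remove_message(db, message_guid):
    """Drop the index row when the email file is deleted."""
    db.execute("DELETE FROM message_index WHERE message_guid = ?", (message_guid,))
    db.commit()


def search(db, term):
    """Return the GUIDs of messages mentioning `term` in any indexed field."""
    pattern = f"%{term}%"
    rows = db.execute(
        """SELECT message_guid FROM message_index
           WHERE sender LIKE ? OR recipients LIKE ?
              OR subject LIKE ? OR body_text LIKE ?
           ORDER BY message_guid""",
        (pattern, pattern, pattern, pattern),
    )
    return [row[0] for row in rows]


# Usage with sample data; "guid-1" stands in for whatever id the server assigns.
sample = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Invoice 42\r\n"
    b"\r\n"
    b"Please find the invoice attached.\r\n"
)
db = open_index()
index_message(db, "guid-1", sample)
hits_before = search(db, "invoice")
remove_message(db, "guid-1")
hits_after = search(db, "invoice")
print(hits_before, hits_after)
```

The email itself stays in its file; the index only holds a copy of the searchable fields, so losing or rebuilding it never loses mail.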


just a thought.


 Post subject: db store
PostPosted: 2006-07-19 15:27 
New user

Joined: 2006-07-18 16:06
Posts: 4
It is a small matter to store a copy of the subject and body in DB fields; please implement it, it would help a lot.

Otherwise, how can I do this myself? Some guidance would be appreciated.

Thx.


 Post subject:
PostPosted: 2006-07-19 15:35 
New user

Joined: 2005-09-21 15:07
Posts: 17
Actually, as HMS has an event trigger, it wouldn't be too hard to get the email properties (to, from, subject, etc.) within the event handler and dump this info into a new table in the DB.

Unfortunately, the event handlers are by no means complete, so you would have problems keeping the DB up to date when users delete mail.

Maybe you could have a scheduled task that deletes old entries...
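For what it's worth, such a scheduled cleanup could be as simple as the following Python sketch. It assumes a hypothetical index table that records the file name of each message, and it removes every row whose .eml file no longer exists in the data folder. None of the names come from hMailServer itself.

```python
import os
import sqlite3
import tempfile


def purge_orphans(db, data_dir):
    """Delete index rows whose message file is gone; return how many were removed."""
    rows = db.execute("SELECT message_guid, file_name FROM message_index").fetchall()
    stale = [
        (guid,)
        for guid, file_name in rows
        if not os.path.exists(os.path.join(data_dir, file_name))
    ]
    db.executemany("DELETE FROM message_index WHERE message_guid = ?", stale)
    db.commit()
    return len(stale)


# Usage: one indexed message still has its file, the other was deleted by the user.
with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "kept.eml"), "wb") as f:
        f.write(b"Subject: still here\r\n\r\nbody\r\n")

    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE message_index (message_guid TEXT PRIMARY KEY, file_name TEXT)"
    )
    db.executemany(
        "INSERT INTO message_index VALUES (?, ?)",
        [("guid-kept", "kept.eml"), ("guid-gone", "gone.eml")],
    )

    removed = purge_orphans(db, data_dir)
    remaining = [row[0] for row in db.execute("SELECT message_guid FROM message_index")]
    db.close()

print(removed, remaining)
```

Run from a scheduled task, this keeps the index eventually consistent even without a delete event, at the cost of stale search hits between runs.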

dunno...just a thought


While I'm on the subject, Martin, is there any chance of a more advanced/comprehensive event handler system which would include...

OnEmailDelete
OnEmailSend (so that I could change the IP of the server it sends to and override the default, etc.)

cheers

