Search is slow and consumes a lot of CPU

rosali
Normal user
Posts: 101
Joined: 2008-01-13 18:32

Search is slow and consumes a lot of CPU

Post by rosali » 2014-03-21 20:14

UID SORT (DATE) US-ASCII TEXT some keywords to find
----------------------------------------------------------------------

The above SEARCH request is terribly slow and consumes a lot of CPU for huge mailboxes. Are there any plans to store keywords of the first text part (or alternatively the first HTML part [html2text]) in the hm_message_metadata database table, to perform fast database searches instead of parsing email source files from the filesystem?
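For readers who want to reproduce the request, here is a minimal sketch of how a client could issue it with Python's standard `imaplib`. The helper names `build_sort_args` and `sorted_text_search` are made up for illustration, and the sketch assumes the server advertises the SORT extension (RFC 5256). The keywords are sent as one quoted IMAP string.

```python
import imaplib  # the connection passed in below is expected to be an imaplib.IMAP4 object


def build_sort_args(keywords, sort_key="DATE", charset="US-ASCII"):
    """Return the arguments for: UID SORT (<key>) <charset> TEXT "<keywords>"."""
    # Quote the search string so multi-word keywords travel as one IMAP string.
    escaped = keywords.replace("\\", "\\\\").replace('"', '\\"')
    return (f"({sort_key})", charset, "TEXT", f'"{escaped}"')


def sorted_text_search(conn, keywords):
    """Run UID SORT on the currently selected mailbox and return UIDs as ints."""
    status, data = conn.uid("SORT", *build_sort_args(keywords))
    if status != "OK":
        raise RuntimeError(f"SORT failed: {data!r}")
    # An empty result arrives as an empty byte string (or None).
    return [int(uid) for uid in (data[0] or b"").split()]
```

With a logged-in `imaplib.IMAP4_SSL` connection that has already selected a folder, `sorted_text_search(conn, "some keywords to find")` returns the matching UIDs in date order. The server still has to read every message body to answer the TEXT criterion, which is where the CPU time goes.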

I'm a great fan of the hMailServer project. SEARCH performance and the lack of THREADED MESSAGE LISTING are the only reasons why I don't use hMailServer in production for bigger projects.

BTW, it would be nice to have a roadmap of your development goals to let the community know where you are heading. Is there one? I can't find it.

Bill48105
Developer
Posts: 6192
Joined: 2010-04-24 23:16
Location: Michigan, USA

Re: Search is slow and consumes a lot of CPU

Post by Bill48105 » 2014-03-22 00:40

rosali wrote:UID SORT (DATE) US-ASCII TEXT some keywords to find
----------------------------------------------------------------------

The above SEARCH request is terribly slow and consumes a lot of CPU for huge mailboxes. Are there any plans to store keywords of the first text part (or alternatively the first HTML part [html2text]) in the hm_message_metadata database table, to perform fast database searches instead of parsing email source files from the filesystem?

I'm a great fan of the hMailServer project. SEARCH performance and the lack of THREADED MESSAGE LISTING are the only reasons why I don't use hMailServer in production for bigger projects.

BTW, it would be nice to have a roadmap of your development goals to let the community know where you are heading. Is there one? I can't find it.
Certain things in hmail, or any mail server, require a lot of work, and searches are one of them. hmail's indexing helps with SOME of it, but not everything can be indexed, or we might as well store messages in the database, and there are no plans to do that. The pros and cons of email in files vs. in a database have been discussed umpteen times, so there is no need to go into it again, but it boils down to reasonable compromises for the general hmail community vs. specialized features for a select few, especially if those features cause slowness or overhead for people who don't want or need them. Searching & threading can be done with the client, so changes to hmail are not NEEDED. Server-side support would just be nice for those who use those things, and I'm not sure how many really do. Personally I absolutely despise threaded email, so I can't ever see myself working on that, but since it's open source, feel free to get to work on it. :) Otherwise, people who really need those features look elsewhere. You could always use Exchange. That bloated bitch uses more RAM & CPU per minute than a week of hmail doing your searches. :D

There is no official roadmap, just the feature request voting results & discussions in IRC & the forums regarding what to work on. Otherwise we focus on issues as they come up & try to fix bugs asap. Since we are just volunteers, not employees of a company getting paid to work on hmailserver, we squeeze in time where we can, and that means prioritizing, so things like doc updates, web site updates or working on a roadmap are very much on the back burner.

Btw, here are links to the voting results if you want to check them out:
http://www.hmailserver.com/?page=featur ... g_extended
http://www.hmailserver.com/?page=feature_voting
We use the extended one in particular to help decide what things to work on when we get time.
Bill
hMailServer build LIVE on my servers: 5.4-B2014050402
#hmailserver on FreeNode IRC https://webchat.freenode.net/?channels=#hmailserver
*** ABSENT FROM hMail! Those in IRC know how to find me if urgent. ***

mattg
Moderator
Posts: 20837
Joined: 2007-06-14 05:12
Location: 'The Outback' Australia

Re: Search is slow and consumes a lot of CPU

Post by mattg » 2014-03-22 01:12

Bill48105 wrote:Searching & threading can be done with the client
+1
Thunderbird (with some add-ons for threading) does this nicely for me, thanks.
Just 'cause I link to a page and say little else doesn't mean I am not being nice.
https://www.hmailserver.com/documentation

rosali
Normal user
Posts: 101
Joined: 2008-01-13 18:32

Re: Search is slow and consumes a lot of CPU

Post by rosali » 2014-03-22 05:59

I see your priorities but I can't follow your general arguments.

Webmail clients usually do not store a (local) copy of the email, so SEARCHES can only be performed on the server side. The same applies to THREADED message listing. One of the best examples is Roundcube webmail. This webmail client has become so good that one does not need a desktop client anymore. Unfortunately this is not true for those who provide email services with hMailServer.

We are living in the age of cloud hosting, and you should consider web applications when you prioritize things. Thanks!

BTW, I did not say you should store the entire message in the database. I see the pros of the filesystem. I said you should store the first text part (or alternatively the first HTML part as a text version). I think this is a big difference. As you already store metadata (a few headers), you are already parsing the email on arrival. It should not be too hard to extend that by saving one body part. I'm coding PHP. Maybe I'll have to learn C++.
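To make "the first text part, or alternatively the first HTML part as a text version" concrete, here is a rough sketch using Python's standard `email` package. It is only an illustration of the idea, not how hMailServer parses mail. The function name `first_text_part` is hypothetical, and the HTML-to-text step is a deliberately crude stand-in for a real html2text conversion.

```python
from email import message_from_string, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes only; a crude stand-in for a real html2text step."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def first_text_part(raw_message):
    """Return the first text/plain part, else the first text/html part as text."""
    msg = message_from_string(raw_message, policy=policy.default)
    html_fallback = None
    for part in msg.walk():
        # Skip multipart containers and attachments; only inline leaf parts count.
        if part.is_multipart() or part.get_content_disposition() == "attachment":
            continue
        content_type = part.get_content_type()
        if content_type == "text/plain":
            return part.get_content().strip()
        if content_type == "text/html" and html_fallback is None:
            html_fallback = part.get_content()
    if html_fallback is None:
        return ""
    extractor = _TextExtractor()
    extractor.feed(html_fallback)
    extractor.close()
    # Collapse runs of whitespace left over from the markup.
    return " ".join(" ".join(extractor.chunks).split())
```

The returned string is the piece that would be written to a searchable column at delivery time, so that a later search does not have to reopen the message file.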

rosali
Normal user
Posts: 101
Joined: 2008-01-13 18:32

Re: Search is slow and consumes a lot of CPU

Post by rosali » 2014-03-22 16:20

OK. I have checked what I can do for Roundcube webmail with the existing information in the hm_message_metadata table, and I already have a basic threaded message listing working.

All I really need to implement THREADS and BODY SEARCHES properly in Roundcube (or other webmail clients) is:

#1- THREADS

Extend the hm_message_metadata table with the fields 'metadata_message_header_messageid' (Message-ID header), 'metadata_header_inreplyto' (In-Reply-To header) and 'metadata_header_references' (References header).

#2- SEARCH

Extend the hm_message_metadata table with the field 'metadata_body' (containing the Body property of the Message object).
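A small sketch of what a webmail client could do with those four columns. SQLite stands in for whatever database hMailServer is configured with, and the table is a simplified assumption: only the four proposed column names come from the list above, while `metadata_id`, `metadata_subject` and both function names are illustrative. The thread grouping is far simpler than the REFERENCES algorithm in RFC 5256.

```python
import sqlite3

# Simplified, assumed layout; only the four proposed columns are from the post.
SCHEMA = """
CREATE TABLE hm_message_metadata (
    metadata_id INTEGER PRIMARY KEY,
    metadata_subject TEXT,
    metadata_message_header_messageid TEXT,
    metadata_header_inreplyto TEXT,
    metadata_header_references TEXT,
    metadata_body TEXT
)
"""


def search_body(db, keyword):
    """Return ids of messages whose stored body contains the keyword."""
    rows = db.execute(
        "SELECT metadata_id FROM hm_message_metadata "
        "WHERE metadata_body LIKE ? ORDER BY metadata_id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


def build_threads(db):
    """Group message ids by thread root, using References, then In-Reply-To."""
    rows = db.execute(
        "SELECT metadata_id, metadata_message_header_messageid, "
        "metadata_header_inreplyto, metadata_header_references "
        "FROM hm_message_metadata ORDER BY metadata_id"
    ).fetchall()
    threads = {}
    for row_id, message_id, in_reply_to, references in rows:
        refs = (references or "").split()
        # The first References entry is the oldest known ancestor. Without it,
        # fall back to the direct parent, or to the message itself.
        root = refs[0] if refs else (in_reply_to or message_id)
        threads.setdefault(root, []).append(row_id)
    return threads
```

Note that `LIKE '%keyword%'` still scans every row; it avoids opening and parsing message files, but a very large mailbox would probably also want a full-text index on the body column.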

I don't think this feature request would be really hard for you to implement, and I believe the performance of hMailServer wouldn't suffer.

You would make a lot of hMailServer admins who are using Roundcube webmail very happy. Thank you.

percepts
Senior user
Posts: 5282
Joined: 2009-10-20 16:33
Location: Sceptred Isle

Re: Search is slow and consumes a lot of CPU

Post by percepts » 2014-03-22 18:25

What's the big deal with Roundcube and webmail? It's dying out. Now so many people have smart phones, tablets and laptops which are wifi enabled, they don't need webmail, they can just use the email client which is installed on their device and bypass the slow and bandwidth-hungry HTTP used to get to a web client which then has to emulate an email client. I think some mailserver administrators need to get with the times.

There's a lot more to implementing a mail body search and the effect it will have on performance than you will ever know.

rosali
Normal user
Posts: 101
Joined: 2008-01-13 18:32

Re: Search is slow and consumes a lot of CPU

Post by rosali » 2014-03-22 18:32

That's not a valid argument. If you knew Roundcube and its existing plugins, you would realize that it is in fact a replacement for mobile and desktop clients.

yoni5002
Normal user
Posts: 36
Joined: 2010-07-13 15:04

Re: Search is slow and consumes a lot of CPU

Post by yoni5002 » 2014-03-23 00:33

rosali wrote:That's not a valid argument. If you knew Roundcube and its existing plugins, you would realize that it is in fact a replacement for mobile and desktop clients.
I have to agree on this one. For the second time, I see myself deploying another Linux server to move users off hMailServer to a *NIX solution because of these same features, especially the way searches are handled.
mattg wrote:
Bill48105 wrote:Searching & threading can be done with the client
+1
Thunderbird (with some add ons for threading) does this nicely for me thanks
Yes, a client application could be an alternative that most people will think of as a "solution" when in fact it isn't. While hMailServer offers good value, especially for its price (free), there are things that could be done to improve it with regard to searches. As an example (and I believe I'm not alone), we have 23 accounts in 2 domains that we have to move to an alternative backend server because hMailServer + webmail cannot keep up with the demand.

These two domains have something in common, and that's big mailboxes used for records archiving purposes. In accounts with 10-20 folders and between 17,000 and 25,000 emails per folder, it is impossible to perform a search with webmail when running hMailServer in the backend. Webmail is a big plus here; a desktop email client is not. These records are accessed by several users, and having them sync with a client application makes no sense for more than a few reasons. I'm not sure if you have ever attempted to synchronize a mailbox this big with Outlook or even Thunderbird. Outlook is the first to crash, and Thunderbird can only manage it on PCs with SSDs and with "Message Synchronization" disabled on the local computer.

As a true alternative, webmail shines when it comes to handling these big mailboxes, Roundcube being our preferred front-end email client. Not only is access to these emails really fast and configuration-free for the users we support, but searches are also performed in the blink of an eye AS LONG AS we give up hMailServer and put those emails on a Linux box. Yes, Dovecot handles it a lot better, and I personally had a discussion here before regarding hMailServer. You were kind enough to respond at that time to Bill and a few things were done in that regard with the 5.4 release.

I have no way to express my satisfaction when it comes to hMailServer, but I have been burned a few times already by this problem. Don't tell me to just replace my hMailServer boxes; I haven't lost hope that someday this will be addressed :)

There you go, a real example of something that is in fact very common in business nowadays... especially for institutions such as schools; by law, they have to keep email records for at least a few years in many places, and that's how I know I can't be alone in this boat ;)

Thanks for hMailServer!

mattg
Moderator
Posts: 20837
Joined: 2007-06-14 05:12
Location: 'The Outback' Australia

Re: Search is slow and consumes a lot of CPU

Post by mattg » 2014-03-23 03:15

If I'm not mistaken, doesn't Roundcube utilise a database?
Why can't this be done on a client level by those who want it?

If you want to add message content to your hmailserver database, then use this script: http://www.hmailserver.com/forum/viewto ... 20&t=13890

Doesn't mean that this is indexed or used for sorting, but that can be done in roundcube if you need it...
percepts wrote:. Now so many people have smart phones, tablets and laptops which are wifi enabled, they don't need webmail, they can just use the email client which is installed on their device
+1 from me

If you want to go back to *nix from hmailserver, don't forget to ask for a refund for what you paid for hmailserver

Bill48105
Developer
Posts: 6192
Joined: 2010-04-24 23:16
Location: Michigan, USA

Re: Search is slow and consumes a lot of CPU

Post by Bill48105 » 2014-03-23 04:21

yoni5002 wrote: I have to agree on this one. For the second time, I see myself deploying another Linux server to move users off hMailServer to a *NIX solution because of these same features, especially the way searches are handled.
There you go choosing the better tool for the job. You get a gold star for making the right decision.
mattg wrote:
percepts wrote:. Now so many people have smart phones, tablets and laptops which are wifi enabled, they don't need webmail, they can just use the email client which is installed on their device
+1 from me

If you want to go back to *nix from hmailserver, don't forget to ask for a refund for what you paid for hmailserver
+1 as well on both counts. Webmail use is down, and there are other mail servers with varying features and pros/cons.

Gotta love people bitching about free software with free support. Hey, we're all glad if hmailserver suits someone's needs, but there is NO loss of sleep if there is a better option available to you & you use it. Heck, I manage Exchange servers and *nix servers too, because they fit that particular need better than hmailserver. I mean, if you need to move from Texas to Ohio, you don't go to the Chevy dealer & complain that your stuff won't fit in your car & they should make it bigger. You go get a U-Haul and load it up, because that's the better tool for your needs.

Anyway, I'm done ranting. It's simple: if hmailserver does what you need it to, then great, enjoy the free software with free support, all done by volunteers with real jobs & lives outside of hmail. If not, it's open source, so feel free to work on it or hire someone if you are unable to do it yourself. Or of course, as you pointed out, you can find another tool that better fits your needs.
Bill

rosali
Normal user
Posts: 101
Joined: 2008-01-13 18:32

Re: Search is slow and consumes a lot of CPU

Post by rosali » 2014-04-01 21:52

#1- Generally
Does Open Source mean that users are not heard when posting legit feature requests? Is hMailServer a closed shop for people who are satisfied with the existing features, with everyone else locked out? Is it really an offense to mention other applications where certain things are implemented better, for instance *NIX Dovecot, and not only with regard to threaded message listing and search query performance? I'm programming a lot of web related stuff, and when my code is compared with existing better solutions, my goal is to beat the competitors.

#2- In regards to Webmail
Please name just one ISP that does not offer webmail access for desktop and mobile devices to its customers. Are you really saying ISPs should not use hMailServer because it does not fit their needs and will never be developed in a way to reach such goals?

#3- The background of my request
I have solved everything with a lot of effort. From my point of view it was wasted time, because all I was asking for was to add a few database fields which are filled by the hMailServer core at runtime. That's really nothing fancy, and it's the straightforward way. Finally, for all who say that such things should be done on the client side: desktop/mobile clients can easily store their stuff in a local cache. Webmail clients can't. In other words, when you say the client should do such things, you accept duplicating data (hMailServer database/filesystem plus webmail cache) and resource consumption. Is this really the right approach?

#4- @ mattg
Thanks for the link to the script. It was the only useful hint in this thread. Unfortunately it only gets halfway to the goal. When thinking about it, you may notice that there is no hMailServer event which handles messages that are not delivered by SMTP. You can't index messages which are saved via the IMAP protocol (APPEND, for example when an outgoing message is saved in the "Sent" folder or when messages are imported ...).

mattg
Moderator
Posts: 20837
Joined: 2007-06-14 05:12
Location: 'The Outback' Australia

Re: Search is slow and consumes a lot of CPU

Post by mattg » 2014-04-02 00:16

1. Not at all. There is a 'feature request' section of this forum. Use it to add a request. There are many requests, and these are implemented in rough order of priority when volunteers have the time and the inclination to implement them. OPEN SOURCE means that you can download the source and modify it yourself (there are licence conditions prohibiting you from re-selling).

2. Again, not at all. Web mail use is on the decline (as detailed by some real providers of this stuff)

3. Importantly to me, your solution didn't slow my server down. Glad that you sorted this

4. I'm sure that you can manually run a re-index every hour or every day, from a scheduled task.
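A scheduled re-index along those lines might look roughly like the sketch below, which also picks up messages that arrived via IMAP APPEND because it walks the files rather than relying on a delivery event. Everything here is an assumption for illustration: it presumes messages are stored as `.eml` files under a data directory, uses SQLite and a made-up `body_index` table in place of the real hMailServer database, and the function name `reindex` is hypothetical.

```python
import sqlite3
from email import message_from_bytes, policy
from pathlib import Path


def reindex(db, data_dir):
    """Index every .eml file under data_dir that is not in the table yet."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS body_index (path TEXT PRIMARY KEY, body TEXT)"
    )
    known = {row[0] for row in db.execute("SELECT path FROM body_index")}
    added = 0
    for path in sorted(Path(data_dir).rglob("*.eml")):
        if str(path) in known:
            continue  # already indexed by an earlier run
        msg = message_from_bytes(path.read_bytes(), policy=policy.default)
        # Prefer the plain-text body, fall back to the HTML body.
        part = msg.get_body(preferencelist=("plain", "html"))
        body = part.get_content() if part is not None else ""
        db.execute("INSERT INTO body_index VALUES (?, ?)", (str(path), body))
        added += 1
    db.commit()
    return added
```

Run from a scheduled task every hour or every day, each pass only touches files it has not seen before. It does not handle deleted or moved messages; a real job would also need to prune stale rows.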

Bill48105
Developer
Posts: 6192
Joined: 2010-04-24 23:16
Location: Michigan, USA

Re: Search is slow and consumes a lot of CPU

Post by Bill48105 » 2014-04-02 00:59

mattg wrote:1. Not at all. There is a 'feature request' section of this forum. Use it to add a request. There are many requests, and these are implemented in rough order of priority when volunteers have the time and the inclination to implement them. OPEN SOURCE means that you can download the source and modify it yourself (there are licence conditions prohibiting you from re-selling).

2. Again, not at all. Web mail use is on the decline (as detailed by some real providers of this stuff)

3. Importantly to me, your solution didn't slow my server down. Glad that you sorted this

4. I'm sure that you can manually run a re-index every hour or every day, from a scheduled task.
Indeed. What mattg said.
This thread is still going on? wth, I thought it was done already. Look rosali, use hmailserver or don't use it. It either suits your needs or it doesn't. Obviously we are open to suggestions & constructive criticism, but at the same time whining & bitching gets very old to those of us who VOLUNTEER to support hmailserver. Not sure what you don't get about that. It's not a job. We don't get paid. We like hmailserver & we use hmailserver, so we want to return something to the community. If you think your need is so important, then work on it or hire someone. Otherwise, why does it get to jump ahead of the 100's of other requests that have been waiting YEARS?! As mattg said, check for a feature request or post one & get some votes. Votes are a HUGE factor in deciding the order to do things.

Obviously security issues & bug fixes are done ASAP (feel free to check the experimental build & official change logs to see that's the case. If urgent enough I often get something posted within hours when possible), but for everything else we weigh many factors which have been gone over numerous times in the forums. First & foremost it should follow the RFCs. It should help as many people as possible without causing problems for others. The polls help decide that, but even if tons of people vote for something, if it carries a high risk of breaking something it is likely passed over. STARTTLS is a good example of that. There is no doubt it was WANTED BY MANY, but we knew there were very big risks in doing it, so no one wanted to take that on. Eventually I got the time & inclination to work on it & it's partially done. Other times things have low votes or no poll at all, but they are easily done with little risk, so they are done just to improve hmail.

There are other factors as well, but in the end nothing gets done unless someone volunteers to do it, and I'm not sure how likely someone is to volunteer for something that doesn't suit their own needs, especially if someone is being difficult, which I know gives me zero inclination to work on it. Not to mention how much time has been wasted in this thread already.

Feel free to do a feature request & add a poll assuming one doesn't already exist.