## External File Store, multiple issues

yinrunning




### External File Store, multiple issues

I've had hMailServer using a Samba share for its message files for about a week now. We've had multiple issues with message files. Examples:

- Can't send message for several tries, multiple clients, error message something along the lines of "failed to save files". Intermittent.
- When moving IMAP messages to folders or deleting them, receive "IMAP command failed: Failed to save copy of message". Intermittent.
- Viewing new emails sometimes fails. The preview list will show the sender, receiver, etc., but when the message is selected the view pane shows nothing. No headers, nothing. If the message is viewed in a different client it will usually show up fine. Checking / viewing the physical .eml file shows everything there correctly. Intermittent.

The reason I moved away from hmail's local disk is because the ntfs drive is fragged to hell & back, refuses to de-frag, and is slow as all get-out. The samba share is used by two Windows web servers and we've never had i/o issues with it in the past year and a half. It can get over 6MB/s throughput over the network ( I've watched it ), so the idea that it's the source of the issues seems unlikely to me.

I'm wondering if I need to back out of this configuration and go back to suffering through the local hard drive, or if anyone has experienced similar and has any ideas? I'm using the UNC path to the share, all the AD domain stuff checks out fine. There's nothing related in the Windows event viewer. hmail's error log is chock full of file i/o warnings / errors. Server 5.2.1-B360

Either way I've sat on it for a week and need to make a decision / find a fix soon.

Cheers.

^DooM^




### Re: External File Store, multiple issues

This would suggest a network problem, would it not? Have you tried upgrading the NIC drivers on the Windows server? Perhaps running Wireshark on the network card will reveal any irregularities.

yinrunning




### Re: External File Store, multiple issues

Trying that now. A network issue would sound reasonable if the box were exhibiting anything else along those lines, but it isn't. Just has a slow HDD. I'm just stuck for ideas.

yinrunning




### Re: External File Store, multiple issues

Might sound silly, but what am I looking for? Everything on the LAN NIC ( which is where the SMB traffic is running ) looks fine in the capture I took, as far as I can tell. I'm not a packet wizard. There's a lot of noise on the WAN NIC, but that's expected seeing as how it's a mail server. Malformed packets, usual haxorz noise.

yinrunning




### Re: External File Store, multiple issues

K, found an error message that led to this. http://support.microsoft.com/kb/948496/ Trying.

yinrunning




### Re: External File Store, multiple issues

I think we may have a winner. Just picked up 463 messages and flopped them back and forth between two imap folders a couple times, no glitches. Good call on Wireshark, wasn't familiar with that one. Someone cross your fingers and hope that that hotfix does the trick for good.

^DooM^




### Re: External File Store, multiple issues

\o/ I tell ya, Wireshark is an awesome bit of kit, and for free... Can't beat it, hats off to their devs. I have my fingers crossed for you.

yinrunning




### Re: External File Store, multiple issues

Nope, that didn't do it. I've had a number of issue reports all afternoon...

^DooM^




### Re: External File Store, multiple issues

Doh! Well, I'm out of ideas. I don't have an SMB box to test with either.

martin






### Re: External File Store, multiple issues

Can you post the actual error message from the hMailServer error log? These typically contain the error code from Windows...
Have you tried googling for the error code and Samba?

yinrunning




### Re: External File Store, multiple issues

"ERROR"	2564	"2009-08-25 16:05:09.536"	"Severity: 3 (Medium), Code: HM5050, Source: File::CreateDirectory, Description: Could not create the directory \\fs\store\hmailserver. Tried 5 times without success. Windows error code: 1351 (Unknown)"
"ERROR"	2104	"2009-08-25 16:05:09.536"	"Severity: 2 (High), Code: HM5047, Source: File::DeleteFile, Description: Could not delete the file \\fs\store\hmailserver\{E3E72B0A-1085-4975-9605-2D6CDBB75329}.eml. Tried 5 times without success. Windows error code: 1351 (Unknown)"
"ERROR"	2904	"2009-08-25 16:05:09.536"	"Severity: 3 (Medium), Code: HM5050, Source: File::CreateDirectory, Description: Could not create the directory \\fs\store\hmailserver. Tried 5 times without success. Windows error code: 1351 (Unknown)"
"ERROR"	2564	"2009-08-25 16:05:09.536"	"Severity: 3 (Medium), Code: HM5026, Source: PersistentMessage::_WriteDataToMessageFile, Description: Message retrieval failed because message file \\fs\store\hmailserver\domain.com\nico\99\{99AB8C68-BACE-4C83-8283-0765479D411F}.eml did not exist."
"ERROR"	2904	"2009-08-25 16:05:09.536"	"Severity: 3 (Medium), Code: HM5026, Source: PersistentMessage::_WriteDataToMessageFile, Description: Message retrieval failed because message file \\fs\store\hmailserver\domain.com\robert\8F\{8F0E0F43-B7C4-4691-9C17-4FFC1963C917}.eml did not exist."
"ERROR"	2904	"2009-08-25 16:05:09.536"	"Severity: 3 (Medium), Code: HM4403, Source: Message::GetHeader, Description: Could not read the message header, since the file was not available. File: \\fs\store\hmailserver\domain.com\robert\8F\{8F0E0F43-B7C4-4691-9C17-4FFC1963C917}.eml"
"ERROR"	2564	"2009-08-25 16:05:18.896"	"Severity: 3 (Medium), Code: HM5026, Source: PersistentMessage::_WriteDataToMessageFile, Description: Message retrieval failed because message file \\fs\store\hmailserver\domain.com\nico\9A\{9A6EFA81-6D35-4B7C-82F6-4DC1C1919732}.eml did not exist."
That's what I have at the moment as a brief sample. Have to work on some money jobs for a bit but I'll be back to poking at it later.
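
For anyone wading through a log like the sample above, a quick tally by error code can show which failures dominate. What follows is only a minimal Python sketch: it assumes every entry carries the `Code: ..., Source: ..., Description: ...` layout seen in these lines, and the function name `summarize` is made up for illustration, not part of hMailServer.

```python
import re
from collections import Counter

# Layout assumed from the sample entries: one quoted message per line holding
# "Code: HMxxxx, Source: <function>, Description: <free text>".
LINE_RE = re.compile(
    r'Code: (?P<code>HM\d+), Source: (?P<source>[^,]+), Description: (?P<desc>.*)"\s*$'
)
WINERR_RE = re.compile(r"Windows error code: (\d+)")


def summarize(lines):
    """Count entries per (hMailServer code, source, Windows error code)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an entry in the assumed format
        winerr = WINERR_RE.search(match.group("desc"))
        key = (
            match.group("code"),
            match.group("source"),
            int(winerr.group(1)) if winerr else None,  # None: no Windows code logged
        )
        counts[key] += 1
    return counts
```

Feeding it the open log file (`summarize(open(path))`) and printing `counts.most_common()` should show at a glance whether the failures cluster around one Windows error code or several.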

martin






### Re: External File Store, multiple issues

You found a reference to a Microsoft KB item. Was it Wireshark which gave you that link, or did it give you a specific error message?

The error message in your hMailServer log from Windows, 1351, seems to mean: "ERROR_CANT_ACCESS_DOMAIN_INFO - Indicates a Windows NT Server could not be contacted or that objects within the domain are protected such that necessary information could not be retrieved."

From Microsoft KB:
Indicates a domain controller could not be contacted or that objects within the domain are protected and necessary information could not be retrieved.

Was it the same error code (1351) that Wireshark gave you?

Not really sure what to do about it. If it were my network, I would guess that there's either some glitch with the network, or some configuration issue with the AD, or some DNS issue which makes the server unable to locate the domain controller at certain times. But I really don't know what may cause it. Google turns up a few hits from people having the same problem (not related to hMailServer) when accessing the network, but no real solutions.
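
One way to narrow down guesses like these might be to probe name resolution and the share from the mail server at a fixed interval, and note when either one fails. The Python sketch below is an assumption-laden illustration, not anything hMailServer ships: `probe_once` and `watch` are hypothetical names, and the host `fs` and path `\\fs\store\hmailserver` are simply taken from the log sample earlier in the thread.

```python
import os
import socket
import time
from datetime import datetime


def probe_once(host, share_path, resolve=socket.gethostbyname, is_dir=os.path.isdir):
    """Return a list of problems seen in one probe (empty list means healthy)."""
    problems = []
    try:
        resolve(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"DNS lookup for {host} failed: {exc}")
    try:
        if not is_dir(share_path):
            problems.append(f"{share_path} is not reachable as a directory")
    except OSError as exc:
        problems.append(f"checking {share_path} raised: {exc}")
    return problems


def watch(host, share_path, interval_s=5.0, rounds=None, sleep=time.sleep, **probe_kwargs):
    """Probe repeatedly and print a timestamped line for every problem found.

    Runs forever when rounds is None; returns the number of problems otherwise.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        for problem in probe_once(host, share_path, **probe_kwargs):
            failures += 1
            print(f"{datetime.now().isoformat(timespec='milliseconds')}  {problem}")
        done += 1
        sleep(interval_s)
    return failures
```

Something like `watch("fs", r"\\fs\store\hmailserver")`, run under the same account as the service, could then be lined up against the timestamps in the hMailServer error log. It only tests reachability, so a clean run would not rule out trouble with domain lookups.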

yinrunning




### Re: External File Store, multiple issues

Wireshark gave me a smb TCP/UDP error that I can't remember offhand that led me to the KB article through a blog post with the same setup ( Dell box, Broadcom card, Win2K3 ) and same behaviors. I'm getting a bit desperate so I gave that a shot. I don't think the error is hMailServer's at this point, but not sure what to do at all. Like I said I ran down all the AD domain stuff, can't find any issues in that regard. It's definitely in the domain, and it definitely gets to the file store 99% of the time. It's that 1% that's causing all the issues.

This just occurred to me, and I'm not sure how helpful it might be: with an external DB and file store, what would it take to port the service to another machine, assuming I installed the same version? Just the server ini file?

^DooM^




### Re: External File Store, multiple issues

If you install the same version, you can pretty much just copy the whole directory over to the other server, since your DB and data storage are on different machines, as long as the new server is allowed to connect to them. Make sure you stop the service before copying the data over.

yinrunning




### Re: External File Store, multiple issues

Here's the error that Wireshark is catching:

4180	91.817083	172.16.112.1	172.16.112.200	SMB	NT Create AndX Request, Path: \fs\store\hmailserver\{C30CBD01-3C38-4284-A7D2-CD289D0D4B55}.eml

4181	91.821148	172.16.112.200	172.16.112.1	SMB	NT Create AndX Response, FID: 0x0000, Error: STATUS_OBJECT_NAME_NOT_FOUND
I know this isn't the smb share forum, anyone got any ideas on where one is? Oh, and the first Google result for "STATUS_OBJECT_NAME_NOT_FOUND" matches my hardware more or less. That's what got me to the hotfix which seemed to work for a bit... and then not so much.

yinrunning




### Re: External File Store, multiple issues

Ok, so it looks like I _might_ have finally got it. Called Dell, went through all the driver options, upgraded drivers, etc. No effect. I upgraded Samba on the Linux machine from 3.0.28 to 3.0.30. That ended up actually blocking the mail server from the share completely for an hour ( big suck ), until I promo'd him to a DC and downgraded Samba and that fixed that. However, NONE of this fixed the STATUS_OBJECT_NAME_NOT_FOUND errors ( that I can still only see through Wireshark on the Windows machine ). After all of that, I re-upgraded Samba on the Linux machine, and now I can't find any of these messages in Wireshark.

I found them, for posterity's reference, by running a Wireshark capture on the interface in question ( I have two and windows/rpc/smb traffic only flows on one ), then filtering for SMB and scanning the results. I couldn't find a way to get more granular than that, although I'm sure that's just an RTFM issue on my part, but that at least isolated it to the relevant traffic for me.
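
For scanning the results without eyeballing every row, a rough Python sketch along these lines might help. It assumes the packet list was exported as text with the same columns as the two sample rows above (No., Time, Source, Destination, Protocol, Info), and `failed_requests` is a made-up name. Pairing each error with the most recent request between the same two hosts is a heuristic; it can mismatch when several requests are in flight at once.

```python
import re

# Columns assumed from the sample rows: No., Time, Source, Destination, "SMB", Info.
ROW_RE = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<time>[\d.]+)\s+(?P<src>\S+)\s+(?P<dst>\S+)\s+SMB\s+(?P<info>.*)$"
)
PATH_RE = re.compile(r"Request, Path: (?P<path>.+)$")
ERROR_RE = re.compile(r"Error: (?P<status>STATUS_\w+)")


def failed_requests(lines):
    """Yield (frame_no, status, path) for each SMB response that carries an error.

    The path is taken from the latest request sent in the opposite direction
    between the same two hosts, or None if no such request was seen.
    """
    last_path = {}  # (client, server) -> path of the latest request
    for line in lines:
        row = ROW_RE.match(line)
        if row is None:
            continue  # not an SMB row in the assumed layout
        info = row.group("info").strip()
        request = PATH_RE.search(info)
        if request:
            last_path[(row.group("src"), row.group("dst"))] = request.group("path")
            continue
        error = ERROR_RE.search(info)
        if error:
            path = last_path.get((row.group("dst"), row.group("src")))
            yield int(row.group("no")), error.group("status"), path
```

Inside Wireshark itself, the `smb.nt_status` display-filter field (for example `smb.nt_status != 0`) should narrow the view to failed responses; treat that as a pointer to check against the display filter reference rather than a tested recipe.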

Crossing fingers ( again ), and keeping an ear out for user griping. WEEEEE.

Still very curious why this happens on hmailserver only and never has on any of the other 10 or so ways that that fileshare gets accessed over the various networks and programs that can get to it.

Don't know why they gripe. They should thank me for not getting emails.

^DooM^




### Re: External File Store, multiple issues

yinrunning wrote: Don't know why they gripe. They should thank me for not getting emails.
I always wonder that as well... meh.

yinrunning




### Re: External File Store, multiple issues

Ok guys, sorry to report, but as far as I can tell at this point this behavior is being caused by hMail's interaction with the file store. Reasoning:

I went ahead and set up hmail on one of the aforementioned web servers and moved the ip address of the mail server over to that machine. This was about 6pm yesterday local time. I kept an eye on him all through the evening. Fell asleep around 12:30, woke up at 7:30 to a whole bucketfull of phone calls. The server couldn't access the share at all, which took down mail as well as all of the websites on that machine. This machine has never had any issues getting files off of that share before. The only things I did to the machine were:

- Put a new copy of hmailserver on it ( latest stable as of yesterday, direct download off the site ).
- Gave it a new IP Address
- Opened up the firewall ports for hmail and dns.

I officially have to go back to having mail on the local drive of the mail machine. Really wish this worked.

^DooM^




### Re: External File Store, multiple issues

Do you have the resources to try setting up a NAS on a Windows machine and see if that causes the same issues?

yinrunning




### Re: External File Store, multiple issues

I do, but I don't have time to move the 35g mail store more than one more time and keep my job. :/ I'm reformatting and re-installing Windows on the original mail server today and doing the move tonight. Go go gadget rsync!

martin






### Re: External File Store, multiple issues

hMailServer uses the standard Windows API for all file access. I haven't written my own file-writing modules or anything like that, but rely on Windows to do all the file access. Because of that, I don't really see how it could be a problem in hMailServer. For instance, to create directories (one of the operations which fail on your system), hMailServer uses the CreateDirectory Windows API:
http://msdn.microsoft.com/en-us/library ... 85%29.aspx
hMailServer specifies the directory name to create, and tells the API to use the default security descriptors (to inherit from the parent directory). In this example, there aren't many changes I could make in hMailServer which I believe would change what happens on your system. I only tell Windows what directory to create. hMailServer doesn't care whether this is a local drive, a network share or anything else.

Also, a single program shouldn't be able to block the access to a network drive just by accessing it. If hMailServer manages to do this, I feel that that would indicate a problem with either the drivers or the hardware.

I understand that you have other programs accessing this drive with no fault, but maybe their usage patterns differ from hMailServer's. For example, maybe hMailServer creates a lot more files than your web system. A web system is often mostly read-only access, while hMailServer is very much a combination of both read and write.

yinrunning




### Re: External File Store, multiple issues

I understand what you're saying. All we know at this point is: hmailserver on local HDD: everyone happy. hmailserver on network share: not happy. Regardless at this point which machine we put him on.

I've done a clean install on the mail server and this completely changed the dynamic of the error messages being tossed on the other two servers. Before they were STATUS_OBJECT_NAME_NOT_FOUND. Now they're STATUS_PATH_NOT_FOUND. These coincide with an hmail error log entry giving Windows Error 123 ( "ERROR_INVALID_NAME 123 (0x7B) The filename, directory name, or volume label syntax is incorrect." < from MSDN ). To which I reply "Yes, I know it's not there. So why not create it, eh?"

The problem seems to be occurring when hmail calls the function to move file \a\b\x to \a\b\c\d\f\x where f does not exist yet. In these cases I have randomly selected files that he's tossing the 123 error on and gone and searched for the directories. Sure enough, they're not there.

And, he still tries to create \a and \a\b ( those being the base share drives folders ) every once in a while, even though they already exist and are readable. He then errors out in his logs because he can't create them ( because they exist ).

Another oddity is that on the original mail server, all file addressing was \\fs\store\a\b\c\, whereas when watching the same Wireshark capture on the current "mail server", all file addressing is just \a\b\c\. It leaves off the rest of the UNC. I have no idea if that's normal / not.

It's unclear to me how the existence or behavior of one domain member can affect the interactions of two other domain members, but ever since he went offline the other two ( the current live "mail server" and the file share ) have been having different conversations. I'm assuming that something was drastically wrong with the mail server's ability to deal with AD replication and authentication, and that that was causing auth errors and connect aborts on the file share box, which may be more touchy about these things.

I'm still planning on going back to local file store as soon as I get him back up to full speed software wise. I will have him up on the file share for a little bit while I get the bulk of the files moved over to his HDD, so we'll see how he does with that. If he doesn't have any issues whatsoever then I may just leave it and consider re-installing windows on the web servers as well.

The battle continues! We ( well, I at any rate ) shall overcome! *sigh*

yinrunning




### Re: External File Store, multiple issues

Also, a side note on the current environment: it's not as read-only Web as you might think. There's actually quite a lot of i/o among the ~200 sites on there. A lot of dynamic applications doing file writes, and we have 6 developers pushing changes out there all day long. But hmail's usage is definitely different. Mainly a lot of "create this and then read it back and then move it right away". It seems to generally be ok doing straight reads, until the error frequency catches up to it and makes it puke.
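
That "create, read back, move right away" pattern could be exercised against the share without hMailServer in the picture at all, which might help show whether the share or the application is at fault. The Python sketch below is a guess at the pattern, not hMailServer's actual code: `exercise` is a hypothetical name, and the two-character subdirectory only loosely imitates the layout visible in the error log.

```python
import os
import shutil
import uuid


def exercise(root, rounds=100):
    """Create a file, read it back, then move it into a subdirectory.

    Returns a list of (round, step, error) tuples; an empty list means
    every round succeeded.
    """
    failures = []
    payload = b"Subject: probe\r\n\r\nbody\r\n"
    for i in range(rounds):
        name = "{" + str(uuid.uuid4()).upper() + "}.eml"
        src = os.path.join(root, name)
        # Subdirectory named after the first two characters of the GUID.
        dest_dir = os.path.join(root, "probe", name[1:3])
        step = "create"
        try:
            with open(src, "wb") as handle:
                handle.write(payload)
            step = "read"
            with open(src, "rb") as handle:
                if handle.read() != payload:
                    raise OSError("read back different bytes")
            step = "mkdir"
            os.makedirs(dest_dir, exist_ok=True)
            step = "move"
            shutil.move(src, os.path.join(dest_dir, name))
        except OSError as exc:
            failures.append((i, step, exc))
    return failures
```

Pointing `root` at a scratch folder on the share and running it in a loop during busy hours would show which step fails first, if any does. A clean run would not prove the share is healthy, only that this particular pattern did not trip it.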

martin






### Re: External File Store, multiple issues

I don't see why reinstalling the mail server would change anything, since it's the interaction with the network drive which is the problem. I don't think I've seen any case where reinstalling hMailServer on top of an already existing installation was worth the work.
yinrunning wrote: And, he still tries to create \a and \a\b ( those being the base share drives folders ) every once in a while, even though they already exist and are readable.
Readable using Windows Explorer, you mean? hMailServer clearly has problems accessing the folder structure. If hMailServer has problems creating folders and moving files, one can expect it to have problems detecting whether a folder already exists as well.
yinrunning wrote: Another oddity is that on the original mail server, all file addressing was \\fs\store\a\b\c\, whereas when watching the same Wireshark capture on the current "mail server", all file addressing is just \a\b\c\. It leaves off the rest of the UNC. I have no idea if that's normal / not.
Does it always do this?

yinrunning




### Re: External File Store, multiple issues

It doesn't always do any of this. Only under load, and after it's been up for at least an hour. On a restart it does just fine for at least 60 minutes ( much longer at night when usage isn't peak ), but eventually it will start eating itself on these errors.

I'm doing the final syncs and prepping to move back to local HDD. Trying to figure out if the day is quiet enough to just do it around lunch. The re-install was actually completely worth it in this case. The machine was dogged down to the point of being nearly unusable, as well as aforementioned Domain issues. Afterwards, snappy, peppy, and the set of errors that I had been seeing across all of the domain members have completely vanished. Unfortunately there's some new errors in their place. It's definitely not inconceivable that there have been improvements on AD functionality/code since I originally installed any of the pieces involved, and since I have 2 web servers I can take one of them down for a while to figure that out. However, there's no real way to take the file share offline long enough to do any serious troubleshooting / reconfig on that. Upgrading Samba was the most I could do to it. That's why that box and his warm backup are Linux: they just keep going, and going, and going...

Going to an external MySQL instance has still been the best thing that's happened through all of this, and was the huge performance boost 2 weeks ago. So I'm leaving that alone, taking the files back, and ending this little experiment. Perhaps someday I'll get a chance to try again.

yinrunning




### Re: External File Store, multiple issues

btw, martin, while I haven't found any root cause anywhere, and agree with your assessment that it isn't hmail's fault that this is happening: This is the first time in 3 years that I can't say "The mail server is doing exactly what it's supposed to be doing, whether you like it or not." ( read: really poorly set up third-party mail domains / clients ). Other than that it's been taking a lickin and keepin on ticking. In this case, unfortunately, "it" as a whole entity isn't doing what it's supposed to be doing for whatever reason. Thus my high level of concern.

Cheers.

martin






### Re: External File Store, multiple issues

Well, I understand your concern; consumers of a service don't care which part of the service isn't working. :-\

yinrunning




### Re: External File Store, multiple issues

Well, my point was that other than this it always behaves correctly.

maggiore81






### Re: External File Store, multiple issues

Hello
Instead of pointing the user data to a UNC network share, have you tried mapping a network drive? They are a lot faster than simple UNC shares.

pepsi





### Re: External File Store, multiple issues

Make sure that HMS is running under a user account and not Local System, and that the account HMS is running under has read/write access to the share.

maggiore81






### Re: External File Store, multiple issues

I assume that the user account is correct, otherwise it wouldn't have written a single byte!

martin






### Re: External File Store, multiple issues

maggiore81 wrote: Hello
Instead of pointing the user data to a UNC network share, have you tried mapping a network drive? They are a lot faster than simple UNC shares.
Assuming you're referring to the mapping done in Windows Explorer, there's a Microsoft KB item somewhere stating that this may be a bad idea. Problem is something like this: Network drives are mapped per-session. The service doesn't have access to your user session. Not sure, but wouldn't be surprised if the mapping is initiated by Windows Explorer, which isn't run in the service session. Hence, Windows services will have problems accessing network drives you've mapped using your own account.
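
If a service's data directory has to live on a share, a small check like the Python sketch below could flag a configured path that relies on a drive letter rather than a UNC path. It is only an illustration: `is_unc` is a made-up helper, and a drive-letter path such as `Z:\hmailserver\Data` is a hypothetical example. Note that the check cannot tell a mapped network drive from a local disk; it only looks at the form of the path.

```python
from pathlib import PureWindowsPath


def is_unc(path):
    r"""True if the path starts with \\server\share rather than a drive letter."""
    # PureWindowsPath parses Windows paths on any OS; for a UNC path its
    # .drive is "\\server\share", for a drive-letter path it is e.g. "Z:".
    return PureWindowsPath(path).drive.startswith("\\\\")
```

With the share from this thread, `is_unc(r"\\fs\store\hmailserver")` is true, while a drive-letter path comes back false and would deserve a second look before a service is pointed at it.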

^DooM^




### Re: External File Store, multiple issues

I just tried using a local service app to write to a mapped drive and it failed. I had to use a UNC path with correct logon details and permissions.

mattg





### Re: External File Store, multiple issues

I have used a local service to write to a mapped network drive, but only after I log onto a GUI once as the user who the service will be running under, and map the network drive with that user's credentials.

Matt
