Mail Cluster Service Interruption

One of the four heads in the redundant NFS filer clusters that handle all of our email storage crashed this morning because of FCAL bus instability following a disk failure. Its partner attempted to take over its operations but was unable to do so because of the specific nature of the failure. This is one of the few edge cases where the redundancy built into the system can’t help: the only way to reestablish service is to power-cycle all of the disk shelves to completely reset the FCAL buses. No email or data was lost, and the systems otherwise performed as expected. During the 15 minutes the filer was down, approximately one quarter of our users would have been unable to check their email. All services have now been restored.

Update Mon Jun 22 10:41:48 PDT: IMAP users whose message stores were on the affected filer may have remained unable to check their mail until a few minutes ago because of clock skew between the filer and the servers.
