One of the two pairs of storage clusters we use for email storage had a cluster takeover event at 11:09 this morning. One of the heads lost communication with all of its disks, and its partner successfully took over all of its services without any interruption. However, while we were tracing the failure of the first head to a failed FCAL optics package (which was replaced), the cluster interconnect adapter in the second head locked up and triggered a panic. This panic led to a brief interruption of POP/IMAP services at approximately 12:00. The second filer rebooted successfully and is still in partner takeover mode.
Unfortunately, since the second filer is still in takeover mode and can’t see the cluster interconnect adapter, the most conservative resolution requires that we halt the second filer and replace the failed adapter. We’ve tentatively scheduled this for tomorrow after midnight, provided that we receive the replacement adapter from our vendor in time. POP/IMAP services will be offline for the duration of the maintenance, which should take less than an hour to complete.
-Kelsey and William
Update 01:00: The cluster interconnect adapter has been replaced and all services have been fully restored.
-Kelsey and William