IMAP/POP3 outage

Due to the ongoing logging issues we have seen today, there was a brief outage of our IMAP/POP3 services. Both are now up and running. Total downtime was 25 minutes.

SOC

Internal Log Server Issues

One of our central syslog servers had a series of RAID failures that led to filesystem corruption late last night. Normally, this would only be an issue for our backend systems, but at approximately 8:30AM this morning local logging blocked on our MX servers, which forward their logs to that syslog server. This caused inbound mail handling to stop, along with other services, including the Redis Sentinel cluster that is used to store configuration information. That, in turn, caused issues for the membertools, since they depend on some of the configuration stored in Redis. The MX servers were all back up and running by 9:15AM. We’re continuing to work on restoring the failed syslog server, but there is no customer impact at this time. – Sonic Ops
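For readers curious about the failure mode, here is a minimal Python sketch of one common way to keep an application from stalling when its log destination stops accepting data. It is an illustration only, not a description of how our MX servers are configured: `DroppingQueueHandler`, `build_logger`, and the `capacity` parameter are hypothetical names invented for this example. The idea is that log records go into a bounded in-memory queue and are dropped (and counted) when the queue is full, instead of making the caller wait.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Hypothetical handler: drop records instead of blocking when the queue is full."""

    def __init__(self, q):
        super().__init__(q)
        self.dropped = 0  # number of records discarded because the queue was full

    def enqueue(self, record):
        try:
            # put_nowait never waits; it raises queue.Full if there is no room.
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1


def build_logger(name, slow_handler, capacity=1000):
    """Return (logger, queue_handler, listener) for a bounded, non-blocking log path.

    slow_handler stands in for whatever forwards to the remote log server.
    The caller is expected to call listener.start() and, on exit, listener.stop().
    """
    q = queue.Queue(maxsize=capacity)
    handler = DroppingQueueHandler(q)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep records away from the root logger's handlers
    # The listener drains the queue on its own thread, so a stalled
    # slow_handler only delays that thread, not the code calling logger.info().
    listener = logging.handlers.QueueListener(q, slow_handler)
    return logger, handler, listener
```

The trade-off in this sketch is deliberate: when the downstream stalls for long enough, log lines are lost rather than mail handling being held up, and the `dropped` counter records how many.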

Fusion/FlexLink Outage

Update (4:52PM): All affected customers should be back online at this point.

Today, June 16, starting at 3:35PM, configuration changes on equipment serving Fusion/FlexLink subscribers in northern California caused connectivity issues for a subset of subscribers.

We are working to restore service to all affected subscribers as soon as possible and expect all connectivity to be restored within 15 minutes.

-Tomoc and NOC

Core Router Maintenance – Palo Alto

Update (2:00AM): This maintenance is now complete.

Update: This maintenance has been delayed until THURSDAY (6/18/2015) night.

Beginning tonight at midnight we will be performing maintenance on a core router in the Palo Alto area. No downtime is expected from this operation but users may notice some routing changes.

– Robbie

Non-Intrusive Maintenance – LA Area

2:37AM PDT – This maintenance is now complete.

Tonight starting at 11:59PM PDT we will be performing software upgrades on routers in the Los Angeles area. No customer impact is expected as traffic will be migrated from these routers prior to the software upgrade.

-Tim J.

Mail Connectivity Issues

Email connectivity for IMAP, POP3, and Webmail is currently down. We believe this was caused by a power event affecting an isolated set of servers in our datacenter. We will post more information when the systems come back online.

— Joe and the SOC

Update:  Our IMAP/POP3 cluster is having a thundering herd problem and we’re working hard to encourage it to come up smoothly.  -Kelsey and William

Update: All services have been restored and stabilized. It appears that the root of the problem lies in our central RADIUS authentication servers. While the fix may be as simple as upgrading the existing cluster of 6 servers with faster CPUs, we suspect we may have hit an internal performance limitation in the RADIUS server software that prevented the servers from answering requests fast enough. We strive for complete redundancy and should not have had any lasting stability problems after a few servers lost power. We’ll investigate this failure over the next few days and resolve it as needed to ensure this problem doesn’t occur again. -Kelsey and William
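As background on the thundering herd mentioned in the earlier update: when many clients reconnect at the same moment after an outage, their combined retries can keep a recovering service overloaded. A common general mitigation is exponential backoff with random jitter on the client side. The Python sketch below illustrates that technique only; it is not a description of what our mail cluster or any mail client actually does, and `backoff_delay`, `base`, and `cap` are hypothetical names and values chosen for the example.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a randomized wait in seconds before retry number `attempt` (0-based).

    The upper bound doubles with each attempt (base * 2**attempt) up to `cap`,
    and the actual delay is drawn uniformly between 0 and that bound, so
    clients that failed at the same instant spread their retries out.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

A client would sleep for `backoff_delay(n)` seconds before its n-th reconnect attempt; the randomness is what keeps a large population of clients from retrying in lockstep.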