
Call Center Inbound Calls Down

The two PRIs feeding the customer support call center are currently down.  We are working with our PRI vendor to restore service.  We have narrowed it down to a physical issue, but do not yet have an ETA for restoration of service.  We will update this post as more information becomes available.

Currently, our turnaround time for email support (support@sonic.net) is remarkably quick.

Update: Service has been restored.  The remote terminal serving our location lost AC power.  It is now running on generator backup.

Update: As of 5:40pm, the two PRIs serving our call center appear to be down again. While we work to expedite a second repair, our customer service representatives are still available at support@sonic.net.

Update: At 6:08pm, service was once again restored and we are taking incoming calls.  We will continue to monitor the situation and post any changes in status.

-Sonic.net Senior Support

Mail Server Upgrades

Thursday morning we’re going to replace the mail server cluster that handles delivery to customer mail spools.  For all but a handful of users who use procmail to filter their mail on our servers, this upgrade is expected to be completely seamless.  It will eliminate the queue delay problems that have shown up a few times over the last couple of months.  For more information, please see this forum post.

Update: The upgrades have been completed and everything appears to be working correctly at this time.  Mail delivery latency is back under our target of 1 second.

Inbound Email Delays and Pending Upgrades

This morning, a portion of our inbound email ended up queuing on our MX edge servers due to excessive load on the internal servers that handle email delivery and filtering to local destinations.  All queued email was delivered when the issue resolved itself by 9AM.  We are currently tracking down why our monitoring failed to alert us to the problem.  Coincidentally, we’ve been working on several upgrades to our email server clusters.  We’ve recently replaced all of our spam filtering servers and already have hardware on order to replace the cluster that was responsible for the delays this morning.

Invasive Router Maintenance

Tonight, Wednesday, August 1, at 12:01AM, we will be performing invasive maintenance on equipment serving Fusion and FlexLink products in San Bruno. Customer downtime is expected to be less than 20 minutes.

-Sonic.net NOC

Sonic Telecom Emergency Backbone Maintenance

Due to a hardware failure in the Sonic Telecom backbone, we will need to perform emergency router maintenance over the next hour. We do not believe this work will have any impact on customer traffic, but there is the potential for a brief service interruption. Further updates will follow shortly.

-Tim and Nathan

Update: Maintenance work has been completed. There should have been no impact to customer traffic. This evening, from 12:00AM until 1:00AM, we will be performing further maintenance of a similar nature. Again, we do not expect any impact to customer traffic, but there is a possibility of brief service interruptions during this time.

Webcluster Service Interruption

A number of hosted websites (including www.sonic.net) were offline for approximately seven minutes this afternoon.  We’re investigating what initially caused the servers to drop out, but we believe a known bug in our load balancer configuration was also triggered, which prevented the sites from being automatically brought back online.

Update: Thu Jun 28 10:16:44 PDT 2012.  We’ve seen a couple more events since this happened yesterday.  We believe we have identified the DoS attack and are working to mitigate its effects now.

Update: Thu Jun 28 13:13:41 PDT 2012.  We’ve installed a custom Apache module that should eliminate, or at least reduce, the effects of the DoS attack on our web cluster, and we are hopeful that this will prevent the issue from causing problems again.

-Kelsey, William and Kevan

DSL Aggregation Router Reload

This Friday, June 22, at 12:01 AM, we will be performing maintenance reloads of two Redbacks that terminate traditional DSL service. This will affect some of our Bay Area and LA DSL subscribers. Expected downtime is 5 minutes.

Update: Reloads complete. Estimated downtime was under 5 minutes.

-Tomoc

Emergency Router Reload

At 7:33am, a protocol issue between two redundant routers caused network instability, which required an emergency reboot. This caused intermittent connectivity issues with our DNS servers as well as a subset of customer connections. Service was restored as of 8:14am, and we will be working with our vendors to ensure this does not happen again.

-Sonic.net NOC

Sonic.net Company Picnic

Support will be closed from 11am to 6pm on 6/9/2012.  Sonic.net will be celebrating its 18th birthday on Saturday, June 9th.  In order to allow all employees to attend the celebratory company picnic, support will be closed from 11am until 6pm.  We apologize for any inconvenience this may cause.

Emergency Legacy DSL Maintenance

Tonight, at 12:01AM, we will be performing invasive maintenance on an ATM switching device that serves a subset of legacy DSL customers in the Bay Area as well as Chico. Downtime for affected customers should be less than 10 minutes.

-Tim, Nathan, and the NOC

Update: The ATM switch in question suffered a redundancy fail-over before the maintenance work could be performed this evening. Affected customers should have experienced about 5 minutes of downtime while the fail-over occurred. We are currently investigating the cause of the issue in order to prevent further recurrences.