Author: admin

DSL Aggregation Router Lockup

Tonight at approximately 1:20 AM, the network uplink interface on one of our DSL aggregation routers serving the greater Bay Area locked up. We resolved the problem by reloading the affected router. At this time, the router is back up and fully operational, and all affected customers should be back online. Total downtime was approximately 15 minutes for affected customers.

-Jared

Infrastructure maintenance

On Wednesday (Dec. 9) we will be performing maintenance on some of the internal networking and NetApp storage filers that many of our servers depend on (including e-mail and web). From 11pm to 1am the following morning, you may see a slowdown in performance when retrieving or sending e-mail or accessing your web content.

-Augie, Aaron

Los Angeles Area Outage

We are currently experiencing an outage affecting our Los Angeles area aggregation equipment. This is affecting all DSL and ATM-based Business-class services. We are working on restoring service and will provide an update shortly.

-Jared and Tim

Update: The outage was caused by a crashed management card in our core ATM switch in Los Angeles. The switch has a redundant management card, but that card did not load its configuration properly, so the switch had to be fully powered down and restarted to restore service. At this time the switch is up and functioning properly, and all connectivity should be restored. We have also isolated and fixed the problem that kept the backup management card from loading its configuration.

Sonic.net Office Phone System Outage

This morning the phone lines into the Sonic.net office are reporting all circuits busy. As a result, our support, sales, and billing departments are currently unreachable by telephone. We have reported this outage to our vendor, who has found a problem between their network and AT&T. They have dispatched a technician to fix the problem, but have not yet provided us with an estimated time of repair.

This outage affects inbound and outbound calls through our office phone system. No other services are affected by this outage.

Thank you for your patience while we work with our vendor to resolve this issue.

Update: Our phone provider is working on the issue, and that work has resulted in intermittent availability. Some customers who got through to support have had their calls dropped while the work is being performed. We apologize for the inconvenience, and appreciate your patience while we get this issue resolved.

Update: The problem was diagnosed as an overheating fiber mux card at the carrier’s central office. The card has been replaced and service has been restored.

Telephone outage

This morning our phone lines are reporting all circuits busy. We are working with our vendors to troubleshoot the problem, but we do not currently have an estimated time of repair (ETR). Telephone support, sales, and billing are currently unavailable. Thank you for your patience during this time.

Update: Our vendor has found a problem between their network and AT&T. They have dispatched a technician to fix the problem, but have not provided us with an ETR. As always, we are available for troubleshooting at support@sonic.net. This outage affects only inbound and outbound calls through our office phone system. Dial-up services are not affected by this outage.

Update: The problem, which was localized to the fiber-optic cable at AT&T’s central office, has been resolved. Service to our phone system has been fully restored.

DSL Aggregation Equipment Failure

One of our DSL aggregation routers handling customers in California’s LATA 1 (largely the San Francisco area) started having problems at approximately 10:41pm tonight. Customers experienced a complete service outage until 10:47pm, at which point we were able to restore partial service. We’re presently working to move the affected customers’ service to alternate equipment, which will involve 10-15 minutes of additional downtime in the near future. At present, customers on this particular piece of aggregation gear are experiencing approximately 7% packet loss. We’ll update this entry as our work progresses.

-Nathan, Matt and Jared

Update 11:14pm:

The problem turned out to be a failing ATM port inside our network. We’ve migrated all traffic to a hot-spare port and service has returned to normal.

Infrastructure maintenance

Update (01:23, Nov. 20): Maintenance complete; everything is fully operational again.

Tonight (Nov. 19) we will be performing maintenance on some of the internal networking and NetApp storage filers that many of our servers depend on (including e-mail and web). From 11pm to 1am the following morning, you may see a slowdown in performance when retrieving or sending e-mail or accessing your web content.

-Augie, Aaron

Large DDoS Attack

This morning at 11:00 AM, a large DDoS attack was aimed at a server in our colocation facility in Santa Rosa. The attack was large enough (1.5 million packets per second) to disrupt connectivity to our Santa Rosa datacenter, which would have affected access to mail and web services hosted by Sonic. The attack was blocked by 11:07 AM, and no further ill effects should have been felt after that point.

-Jared

POP3 problems

We experienced a brief issue with our POP3 (email) performance today due to a problem with our load balancing servers. This issue has been resolved and all POP3 services should be working normally at this time.

DSL Performance Problems in LATA 1

This morning at about 9:30 AM we began tracking a problem that was causing some DSL customers in the Bay Area to experience poor speeds and performance. We traced the cause to a failing ATM card in one of our core routers and have removed that card from service. At this time, all DSL service should be functioning normally, but we continue to monitor the situation.

-Jared and Nathan