A hardware failure in one of our DSL gateway…

Tue May 8 10:57:02 PDT 2007 — A hardware failure in one of our DSL gateway routers has caused packet loss and slow performance for a percentage of our DSL customers. We have identified the problem and expect service to be fully restored in 30 minutes. -Eli, Operations

Update Tue May 8 11:21:40 PDT 2007 — We have replaced the failed equipment, and service has been restored.

DSL migration.

Mon May 7 14:16:09 PDT 2007 — DSL migration. At 12:01am on Wednesday, May 9th we will be upgrading our DSL aggregation circuit in the Modesto/Stockton area. Customers in that region will experience an outage of approximately 15 minutes as their circuits are re-routed to the new link. -Nathan and Kelsey

Update Tue May 8 11:34:58 PDT 2007 — This migration has been re-scheduled to 12:01am on Thursday, May 10th. -Nathan

Update Wed May 9 16:04:10 PDT 2007 — Continued problems with some SONET equipment have caused this migration to be re-scheduled yet again. It will likely happen early next week. -Nathan

DSL interruption.

Sat May 5 17:13:08 PDT 2007 — DSL interruption. At approximately 5:00 PM today, some of our DSL customers experienced a 5-minute interruption of service. The interruption was found to be caused by a brief traffic flood, possibly a DoS attack. The flood stopped shortly after it began, and service was restored. We are currently investigating the source of the traffic flood. -Jared and Nathan
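As an aside for those curious how a flood like this gets noticed: the usual approach is to watch per-interface packet rates and flag sudden jumps. Below is a minimal illustrative sketch (not the monitoring we actually run) that samples Linux interface counters from /proc/net/dev; the interface name and threshold are arbitrary, hypothetical values.

    #!/usr/bin/env python
    """Illustrative sketch only: flag a sudden spike in inbound packet rate.

    Reads per-interface counters from Linux's /proc/net/dev. The interface
    name and packets-per-second threshold are arbitrary assumptions, not
    values from any production monitoring system.
    """
    import time

    INTERFACE = "eth0"        # hypothetical uplink interface
    THRESHOLD_PPS = 200000    # hypothetical packets-per-second alarm level
    INTERVAL = 5              # seconds between samples


    def rx_packets(interface):
        """Return the received-packet counter for one interface."""
        with open("/proc/net/dev") as stats:
            for line in stats:
                if line.strip().startswith(interface + ":"):
                    fields = line.split(":", 1)[1].split()
                    return int(fields[1])  # second field is packets received
        raise ValueError("interface %s not found" % interface)


    def watch():
        previous = rx_packets(INTERFACE)
        while True:
            time.sleep(INTERVAL)
            current = rx_packets(INTERFACE)
            pps = (current - previous) / float(INTERVAL)
            if pps > THRESHOLD_PPS:
                print("possible traffic flood on %s: %.0f packets/sec" % (INTERFACE, pps))
            previous = current


    if __name__ == "__main__":
        watch()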

Server Upgrades.

Fri Apr 20 11:42:51 PDT 2007 — Server Upgrades. We have been quietly upgrading many of our servers and clusters over the past few months to improve the overall quality of our ISP services. The upgrades include the complete replacement of our SpamAssassin cluster with six new quad-core Xeon servers, a new member for our web cluster, four new internal DNS servers, the recent deployment of a pair of new FTP servers, and the addition of two new inbound MX servers this morning. These upgrades, in addition to others not mentioned, allow us to continue to provide the high-quality, always-on services that our customers have come to expect. It is gratifying to see all of our hard work and preparation pay off. For instance, our careful selection of power sources for individual machines and networking hardware kept all of our core services available during the recent UPS event, even though roughly half of our systems lost power. -Kelsey, Nathan, Augie and Dan
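A brief aside on the inbound MX servers mentioned above: redundant inbound mail delivery relies on a domain publishing multiple MX records, so sending servers can fall back to another exchanger if one is unavailable. The short sketch below is purely illustrative; it uses the third-party dnspython library and a placeholder domain (not our actual zone data) to list the MX hosts a domain advertises.

    #!/usr/bin/env python
    """Illustrative sketch only: list the MX records a domain publishes.

    Requires the third-party dnspython package (pip install dnspython);
    the domain below is a placeholder, not a description of our zone.
    """
    import dns.resolver


    def list_mx(domain):
        """Print each MX exchange and its preference, lowest preference first."""
        answers = dns.resolver.resolve(domain, "MX")  # dnspython 2.x API
        for record in sorted(answers, key=lambda r: r.preference):
            print("%3d  %s" % (record.preference, record.exchange))


    if __name__ == "__main__":
        list_mx("example.com")  # placeholder domain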

Power issue update.

Thu Apr 19 16:54:52 PDT 2007 — Power issue update. The UPS technicians expect to finish the repair within the next couple of hours. Provided that it completes its tests and inspection, we will transition live load back to the UPS at 10:00PM tonight. We do not anticipate any service interruption; however, we will be fully staffed to handle the unexpected. This UPS serves a small number of customers in our colo facility and redundant load from our own server clusters. -Nathan, Kelsey, Russ

Update Thu Apr 19 22:55:20 PDT 2007 — The UPS is back in service and supporting critical load. Many thanks to the UPS technicians from JT Packard. -Nathan, Kelsey, Russ, Dane, and Matt

Power issue at our Santa Rosa datacenter…

Wed Apr 18 11:37:52 PDT 2007 — Power issue at our Santa Rosa datacenter update. We confirmed that the UPS experienced a catastrophic internal failure. At this time we are continuing to run the UPS' load on external bypass to PG&E service power, with our generator running in the event that PG&E service is interrupted.

www.sonic.net/ups-failure/IMG_3332.jpg
www.sonic.net/ups-failure/IMG_3321.JPG
www.sonic.net/ups-failure/IMG_3326.JPG
www.sonic.net/ups-failure/IMG_3318.JPG

-Nathan, Kelsey, Dane and Russ.

Power issue at our Santa Rosa datacenter.

Wed Apr 18 07:46:24 PDT 2007 — Power issue at our Santa Rosa datacenter. This morning at approximately 6:49am one of the UPSes that feeds our Santa Rosa colocation facility dropped its critical load, causing the 15 customers connected to the UPS to lose power. When we arrived on-site to investigate, the power room where the UPS is located smelled of burnt plastic and the input circuit breaker to the UPS was tripped. We placed the load into external bypass around 7:10am, at which time power was restored.

At this time, we surmise that a massive internal failure inside the UPS caused the fault. A UPS technician is currently en route to perform further diagnostics, make any required repairs, and start the unit back up. Until the unit is fixed, we will be operating with the UPS' critical load supported by our Automatic Transfer Switch. Our generator is running in the event that our building PG&E feed fails.

On the plus side, Sonic.net services such as mail, web, and ftp, as well as our core networking equipment, were unaffected by the outage. All of this critical infrastructure is fed by both of our datacenter UPSes to handle circumstances such as this.

-Nathan, Kelsey, Dane, Jen, Clay and everyone in Support

Power maintenance.

Thu Apr 12 17:29:24 PDT 2007 — Power maintenance. At 8:00am on Saturday, April 14th one of the San Francisco colocation facilities we use for customer termination will experience a planned power outage. A number of circuit breakers inside power distribution units located on the colocation floor have failed infrared testing and are being proactively replaced before they can cause unplanned downtime. While we have been given a 4-hour outage window, we expect the total power loss to be under 2 hours.

All of our critical equipment is dual-fed from redundant power feeds, and as such we do not expect any customer impact. However, there is the possibility of a cascading power failure, as well as failures of our transit and transport providers at that facility. We will have staff on-site to facilitate restoration in the event of a power failure, and our Network Operations Center will be fully manned to deal with any networking problems that arise.

We’ll keep the MOTD up to date in the event of any problems, but expect things to go smoothly.

-Nathan, the Network Operations Center and Tech Support

Update Tue Apr 17 09:44:20 PDT 2007 — The power outage was a non-event. Despite half of the colo facility going dark, Sonic's gear remained up and fully operational as the PDU breakers were replaced. It's a joy to see our carefully planned redundancy work as expected, and quite a thrill to watch large circuit breakers being tripped on live loads! -Nathan and Matt

Emergency SQL Maintenance.

Mon Mar 26 23:52:40 PDT 2007 — Emergency SQL Maintenance. Our customer MySQL and Postgres database server is undergoing emergency maintenance to replace a failing disk and a suspect controller in its local RAID. Due to some complications, we may incur a few minutes of downtime as the hardware is swapped out. -Kelsey and Nathan

Update Tue Mar 27 01:22:57 PDT 2007 — The customer SQL server emergency maintenance went off without a hitch. Actual downtime was less than 8 minutes in total. -Kelsey and Nathan
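For customers who keep their own copies of their databases, maintenance like this is a good reminder to take a fresh dump ahead of time. Here is a minimal illustrative sketch that shells out to mysqldump; it assumes mysqldump is on the PATH, that credentials come from the invoking user's ~/.my.cnf, and the output directory shown is just a placeholder.

    #!/usr/bin/env python
    """Illustrative sketch only: write a timestamped MySQL dump before maintenance.

    Assumes mysqldump is on the PATH and that login credentials are read
    from the invoking user's ~/.my.cnf; the output directory is a placeholder.
    """
    import datetime
    import subprocess


    def dump_all_databases(output_dir="/var/backups"):
        """Dump every database to a timestamped file and return its path."""
        stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        outfile = "%s/mysql-all-%s.sql" % (output_dir, stamp)
        with open(outfile, "w") as out:
            subprocess.check_call(
                ["mysqldump", "--all-databases", "--single-transaction"],
                stdout=out,
            )
        return outfile


    if __name__ == "__main__":
        print("dump written to %s" % dump_all_databases())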

Dial move.

Tue Mar 20 11:01:25 PDT 2007 — Dial move. At approximately 10:00am on March 23rd we will be migrating a portion of our dial-up modem capacity to a new provider. While we expect little to no customer impact from this swap, it is possible that the moved numbers will briefly fail to function. No in-progress calls will be dropped, and we will have additional technical support staff on hand to handle any problems as they come up.

All dial-up numbers ending in a “3” will be affected by this migration. Redundant, unaffected numbers are available in all areas. To find them, browse to:

www.sonic.net/popf/

-Nathan and Eli