Power issue at our Santa Rosa datacenter…

Wed Apr 18 11:37:52 PDT 2007 — Power issue at our Santa Rosa datacenter update. We have confirmed that the UPS experienced a catastrophic internal failure. We are continuing to run the UPS's load on external bypass to PG&E service power, with our generator running in case PG&E service is interrupted.

www.sonic.net/ups-failure/IMG_3332.jpg www.sonic.net/ups-failure/IMG_3321.JPG www.sonic.net/ups-failure/IMG_3326.JPG www.sonic.net/ups-failure/IMG_3318.JPG

-Nathan, Kelsey, Dane and Russ.

Power issue at our Santa Rosa datacenter.

Wed Apr 18 07:46:24 PDT 2007 — Power issue at our Santa Rosa datacenter. This morning at approximately 6:49am one of the UPSes that feeds our Santa Rosa colocation facility dropped its critical load, causing the 15 customers connected to the UPS to lose power. When we arrived on-site to investigate, the power room where the UPS is located smelled of burnt plastic and the input circuit breaker to the UPS was tripped. We placed the load into external bypass around 7:10am, at which time power was restored.

At this time, we surmise that a massive internal failure inside the UPS caused the fault. A UPS technician is currently en route to perform further diagnostics, make any required repairs, and restart the unit. Until the unit is repaired, the UPS's critical load will be supported by our Automatic Transfer Switch. Our generator is running in the event that our building's PG&E feed fails.

On the plus side, Sonic.net services such as mail, web, and FTP, as well as our core networking equipment, were unaffected by the outage. All of this critical infrastructure is fed by both of our datacenter UPSes to handle circumstances such as this.

-Nathan, Kelsey, Dane, Jen, Clay and everyone in Support

Power maintenance.

Thu Apr 12 17:29:24 PDT 2007 — Power maintenance. At 8:00am on Saturday, April 14th one of the San Francisco colocation facilities we use for customer termination will experience a planned power outage. A number of circuit breakers inside power distribution units located on the colocation floor have failed infrared testing and are being proactively replaced before they can cause unplanned downtime. While we have been given a 4 hour outage window, we expect the total power loss to be under 2 hours.

All of our critical equipment is dual-fed from redundant power feeds, and as such we do not expect any customer impact. However, there is the possibility of a cascading power failure, as well as failures of our transit and transport providers at that facility. We will have staff on-site to facilitate restoration in the event of a power failure, and our Network Operations Center will be fully manned to deal with any networking problems that arise.

We’ll keep the MOTD up to date in the event of any problems, but expect things to go smoothly.

-Nathan, the Network Operations Center and Tech Support

Update Tue Apr 17 09:44:20 PDT 2007 — The power outage was a non-event. Despite half of the colo facility going dark, Sonic's gear remained up and fully operational while the PDU breakers were replaced. It's a joy to see our carefully planned redundancy work as expected, and quite a thrill to watch large circuit breakers being tripped on live loads! -Nathan and Matt

Emergency SQL Maintenance.

Mon Mar 26 23:52:40 PDT 2007 — Emergency SQL Maintenance. Our customer MySQL and Postgres database server is undergoing emergency maintenance to replace a failing disk and suspect controller in it’s local RAID. Due to some complications we may incur a few minutes of downtime as the hardware is swapped out. -Kelsey and Nathan

Update Tue Mar 27 01:22:57 PDT 2007 — The customer SQL server emergency maintenance went off without a hitch. Actual downtime was less than 8 minutes in total. -Kelsey and Nathan

Dial move.

Tue Mar 20 11:01:25 PDT 2007 — Dial move. At approximately 10:00am on March 23rd we will be migrating a portion of our dial-up modem capacity to a new provider. While we expect little to no customer impact from this swap, it is possible that the moved numbers will briefly fail to function. No in-progress calls will be dropped, and we'll have additional technical support staff on hand to handle any problems as they come up.

All dial-up numbers ending in a “3” will be affected by this migration. Redundant, unaffected numbers are available in all areas. To find them, browse to:

www.sonic.net/popf/

-Nathan and Eli

Santa Rosa colocation stays up.

Mon Mar 19 12:44:54 PDT 2007 — Santa Rosa colocation stays up. Early Friday morning we experienced an outage on one of the two AT&T-provided Gigabit links leaving our Santa Rosa datacenter. Our Network Operations Center worked to isolate the trouble, which turned out to be a failed regenerator card in a Fremont central office. The card was replaced around 4:00pm, bringing the link back into service. The end result was a total of 2 seconds of packet loss for colocated customers, plus a very happy NOC staff. It’s always fun to see our carefully planned redundancy work in the real world! -Nathan and the NOC (Jared, Zeke, and Matt)

PG&E Service Outage at our Santa Rosa…

Thu Mar 15 13:34:37 PDT 2007 — PG&E Service Outage at our Santa Rosa datacenter and colo facility. We briefly lost PG&E electrical service this morning; our backup power systems functioned perfectly. By the time we'd made it to the electrical room, our genset had already started automatically and the ATS had just switched over to it. There was no interruption of any services in or provided by our datacenter. It was exciting and reassuring to see everything work exactly as it is supposed to! -Sonic.net Operations Staff

Partial DSL outage.

Wed Mar 14 19:04:41 PDT 2007 — Partial DSL outage. A software problem in our provisioning tools caused a 'sync no surf' condition (modems held line sync but could not pass traffic) for about 8% of DSL subscribers. The duration was about 15 minutes. We are working to identify and correct the bug as we speak. -Eli, Nathan, Ross

Sacramento Area DSL Circuit Upgrade.

Mon Mar 12 12:47:39 PDT 2007 — Sacramento Area DSL Circuit Upgrade. Early Wednesday morning, at approximately 12:01am, the circuit that handles all DSL traffic for the Sacramento area will be migrated to a new, larger circuit. Customers may experience up to 30 minutes of downtime as they are migrated in batches from the old circuit to the new one. -Nathan and Kelsey

MOTD Correction.

Thu Mar 8 19:56:02 PST 2007 — MOTD Correction. The DSL maintenance the MOTD reported as being scheduled for early Saturday morning is actually taking place in several hours, not tomorrow night. We apologize for any confusion. -Kelsey