Sonic Status May

System Maintenance Tonight

May 28, 2015

Update: Maintenance complete

Tonight, starting at 11:59pm, SOC will be updating software on some of our core systems. The following services may experience brief interruptions:

Website hosting
IPv6 tunnels
Incoming and outgoing mail

We will also be upgrading the SSL certificate for imap.sonic.net from SHA1 to SHA256. This is the last of our SSL certificates that needs upgrading, so we don’t expect most clients to have problems, but very old mail clients may not support the new certificate.

-Grant, Joe, and SOC

Fusion VDSL2 Intrusive Maintenance – Forestville

May 28, 2015

Update: This maintenance is complete.

Beginning tonight at midnight, I will be performing intrusive maintenance on equipment serving a small portion of Fusion customers in the Forestville area. Expected downtime is around 15 minutes.

– Robbie

Credit card processor down

May 27, 2015

UPDATE: Our vendor got back to us, and the problem is now resolved.

Currently our credit card processor is down and we are unable to process new payments. We have already contacted our vendor but unfortunately we do not expect to have a resolution until early tomorrow morning.

-William

Intrusive Network Maintenance – Brentwood

May 27, 2015

Tonight, beginning at 11:59PM PDT, we will be performing a software upgrade of equipment serving the Brentwood/Pittsburg/Antioch/Concord areas. This maintenance is expected to last 30-45 minutes and may be service-impacting for its duration.

-Tim J.

Network Maintenance – Legacy DSL

May 27, 2015

Update (2:22AM): This maintenance is now complete.

Beginning tonight at midnight, I will be performing maintenance on equipment that serves legacy DSL customers in northern California. Although the majority of the equipment I will be working with is redundant, a small portion of customers may experience some downtime.

– Robbie

Non-Intrusive Network Maintenance – Palo Alto – CANCELLED

May 27, 2015

This maintenance has been cancelled.

Tonight, beginning at 11:59PM PDT, we will be performing a software upgrade of 2 devices in Palo Alto. Traffic will be moved to other links prior to the maintenance, and no customer impact is expected.

-Tim J.

UPS Failure Redux

May 22, 2015

First, we’d like to clarify the extent of the problems caused by the UPS failure and the subsequent dropping of load in the datacenter. This had no impact on any residential or enterprise connectivity services, including Legacy DSL, Fusion, and Fusion FTTN. The UPS that failed was the smallest of the three UPSes in Santa Rosa, and we had been working to migrate load from it. As such, fewer than 20 customers in total lost some or all of their power circuits, some of which may have been part of redundant A/B circuits. Some colo customers lost connectivity because several distribution switches lost power. Most Sonic services, including POP, IMAP, and webmail, were not affected or saw only a brief outage as single-PSU equipment rebooted and/or clusters converged while load shifted to unaffected systems. The only public service with lingering issues was our web hosting cluster, which required some manual attention to come back online.

The outage was ultimately caused by a physical failure of the maintenance bypass switch in the bypass cabinet for the PDU we were moving: one of the phases in the switch stuck and/or didn’t close correctly. In hindsight, it is unfortunate that we chose to operate the switch in the first place, as it wasn’t strictly the simplest way to migrate the load. The last power failure in the datacenter was in October 2004, when the same UPS failed.

In the coming weeks, we will schedule the migration off the temporary feeds put in place during this work. This final move is significantly easier to execute and has an exceedingly low likelihood of causing any service interruptions.

-Kelsey, Russ, and the rest of System and Network Operations.

Intermittent DNS failure

May 22, 2015

Between 5:00 pm yesterday and 9:00 am today, customers may have experienced intermittent DNS failures or slower than normal name resolution. At 9:00 am this morning, we noticed a configuration failure on one of our name server clusters. We immediately disabled that cluster, which allowed traffic to flow over to our other, redundant cluster. We have since addressed the issue and returned the cluster to service. We are reviewing our monitoring procedures to determine why this issue wasn’t detected earlier and to make sure it doesn’t happen again. We apologize for any inconvenience this may have caused.

– William & Kelsey

UPS Failure in Santa Rosa Datacenter

May 22, 2015

One of the three UPSes that handle load in our Santa Rosa datacenter failed early this morning and tripped into bypass. Unfortunately, the internal failure is significant and involves at least the primary IGBTs. We are exploring our repair options, but the most likely outcome is that we will accelerate the planned decommissioning of this UPS and the migration of its associated PDU to one of our other two UPSes. We had planned to complete this at some point in the next six to twelve months but had not yet scheduled or scripted it. It is a relatively straightforward procedure, but it must be executed with great care to ensure both the safety of our workers and that live load in the datacenter is not dropped. Updates will be posted as needed.

Current status: Our standby generator is running so that the ATS can transfer load without interruption in the event that our primary PG&E power feed drops.

Update: Friday 14:00, we have electricians on site placing the cable to move the PDU from the failed UPS to one of our other UPSes. We plan to complete the migration as soon as the cable is staged and ready to go. Once the cable is placed, the new target UPS will be put into maintenance bypass. This allows us to transition the PDU from the old, bypassed UPS to the new UPS without dropping its load. Once the cable is terminated and the breaker on the target UPS is closed, the old breaker can be opened, completing the transition. At this point, the target UPS will be restarted.

Update: Friday 15:05, we’re beginning the bypass procedure now.

Update: Friday 15:15, unfortunately, load on the PDU was dropped momentarily, but we are continuing to complete the migration. Power was lost to several of our single-PSU systems, but most affected services have already been restored. More information forthcoming.

-Kelsey and Russ

Service Impacting Network Maintenance – Business Park Customers in Santa Rosa – 5/20/15

May 19, 2015

This maintenance is now complete.

Tomorrow night (5/20/2015), beginning at 11:59PM PDT, we will be performing a software upgrade of routers serving business park customers in Santa Rosa. The customer outage is expected to last 15-20 minutes.

-Tim J.