
CLEC Intrusive Maintenance

This evening, between 12:01AM and 6:00AM, we will be performing intrusive maintenance on equipment serving Fusion and FlexLink ADSL2+ customers in the Pacific Heights, Mission Terrace, Excelsior, and Bayview areas of San Francisco. Affected customers should experience less than 15 minutes of downtime.

-Tim and Juston

CLEC Intrusive Maintenance

This evening, between 12:01AM and 6:00AM, we will be performing intrusive maintenance on equipment serving Fusion and FlexLink ADSL2+ customers in the Financial District, Castro, SoMa, and Mission areas of San Francisco. Affected customers should experience less than 15 minutes of downtime.

-Tim and Juston

Further Non-Intrusive Backbone Maintenance

Tonight at midnight, we will be performing card insertions and replacements at our San Francisco POP. These operations are being performed on our redundantly built backbone systems, and traffic will be routed around each device as the work is done. This should cause no interruption of service to customers served by this POP. This work supports further DSL expansion and our new core backbone.
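
For the curious, here is a minimal sketch in Python of the idea behind routing around a device under maintenance. The device names and link metrics are hypothetical, not our actual router configuration: raise the cost of the links through the device being worked on, and shortest-path routing picks another way.

import heapq

def shortest_path(graph, src, dst):
    # Standard Dijkstra over a dict of {node: {neighbor: cost}}.
    dist, prev, seen = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in seen:
            continue
        seen.add(node)
        if node == dst:
            break
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return list(reversed(path))

# Hypothetical topology: normal metrics favor the path through "core-a".
topology = {
    "sf-edge": {"core-a": 10, "core-b": 20},
    "core-a":  {"sf-edge": 10, "sj-edge": 10},
    "core-b":  {"sf-edge": 20, "sj-edge": 20},
    "sj-edge": {"core-a": 10, "core-b": 20},
}
print(shortest_path(topology, "sf-edge", "sj-edge"))  # via core-a

# "Cost out" core-a before maintenance by raising its link metrics.
for nbr in topology["core-a"]:
    topology["core-a"][nbr] = 10_000
    topology[nbr]["core-a"] = 10_000
print(shortest_path(topology, "sf-edge", "sj-edge"))  # now via core-b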

-Jared, Nathan, Matt, and Jacob

The backbone maintenance for tonight has been completed. There were no major incidents, but customers may have noticed 15-30 seconds of poor connectivity as links were removed from service for the maintenance.

CLEC Intrusive Maintenance

This evening, between 12:01AM and 6:00AM, we will be performing intrusive maintenance on equipment serving Fusion and FlexLink ADSL2+ customers in the Pacific Heights, Richmond District, and Inner Sunset areas of San Francisco. Affected customers should experience less than 15 minutes of downtime.

-Tim and Juston

Non-Intrusive Backbone Maintenance

Tonight between 10 and 11 PM, we will be replacing a core switch at our San Jose POP. We will move all traffic off that switch via redundant paths before the maintenance begins, so there should be no customer impact. This switch replacement is one of the first steps in our network backbone upgrade.
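
For the curious, a minimal sketch of the kind of pre-maintenance check this relies on, with a hypothetical topology and device names rather than our actual San Jose layout: verify that the network stays fully connected with the switch removed, i.e. that redundant paths exist for everything it carries.

from collections import deque

def still_connected(adjacency, removed):
    # Breadth-first search over the topology with `removed` taken out.
    nodes = [n for n in adjacency if n != removed]
    if not nodes:
        return True
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr != removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)

# Hypothetical POP neighborhood with a redundant pair of core switches.
adjacency = {
    "sj-core-1": ["sj-core-2", "sj-agg-1", "sj-agg-2", "sf-core"],
    "sj-core-2": ["sj-core-1", "sj-agg-1", "sj-agg-2", "sf-core"],
    "sj-agg-1":  ["sj-core-1", "sj-core-2"],
    "sj-agg-2":  ["sj-core-1", "sj-core-2"],
    "sf-core":   ["sj-core-1", "sj-core-2"],
}
print(still_connected(adjacency, removed="sj-core-1"))  # True: safe to replace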

-Jared and Nathan

Update: This maintenance was completed at approximately 1AM. The switch replacement was completed without incident.

ATM Customer Aggregation Router Reload

This Tuesday, March 23 at 12:01 AM, we will be performing a maintenance reload on our ATM customer aggregation routers. This will result in 5-10 minutes of downtime for Business-T and FRATM customers.

-Jared

Update: The maintenance reload has been completed without incident. All affected customers are back online at this time.

San Francisco ATM Switch Failure

At approximately 10:40AM we had a hardware failure on an ATM switch in San Francisco. We are presently rebooting it; expected downtime is approximately 5-7 minutes.

-Sonic NOC

Update 11:10AM: The ATM switch reload is complete and traffic appears to be returning to normal. If you continue to have DSL sync-no-surf connectivity issues, please contact our tech support.

San Francisco Datacenter Running on Generator

Currently PG&E utility is offline at the 200 Paul San Francisco facility, and it is running normally on generator.  Automatic transfer switching worked as designed, and all is well.

After last week’s failure of a transfer switch in this San Francisco datacenter, it’s good to see that the repairs to that redundant power system worked and that the repaired transfer switch did its job.

-Dane & Nathan

2N CRAC Redundancy Pays Dividends

At 7:11PM tonight one of our two Core4 CRAC (Computer Room Air Conditioning) units unexpectedly shut itself down. Nothing instills fear more than receiving pages titled “Sys A Enable Switch Turned Off / Service Now” and “High Discharge Air Temperature” in rapid succession. After the initial panic passed, it was clear that all redundant systems were operating correctly and that the second system had responded as designed, ramping up to handle the total cooling load. Once on site, there was no outward indication of why System A had shut down, as the system enable switch was correctly in the “On” position and it had supply power. Upon further investigation, however, it was apparent that the enable switch was water-logged, oxidized, and shorted out, signaling the system to shut down. The switch has been removed from service and both systems are 100% operational again.

Although it is disappointing to see a system failure caused by something as simple as an improperly weatherproofed mechanical room and control panel, it is rewarding to see our commitment to and investment in redundancy pay off, and, ultimately, to see that the prototype Core4 CRAC system behaved as expected.

Special thanks go out to Jimmy and Kent of Bell Products, who interrupted their dinners to come out and verify that all systems were functioning correctly.

-Kelsey, Nathan and Russ

Non-Impacting Transport Issue

One of our backbone network transport links began experiencing intermittent problems this morning. We have removed that link from service and are routing traffic internally around the problem while we work with the transport circuit provider to diagnose it. These problems did not cause any customer impact, but as we route around the problem, customers may notice sub-optimal paths inside Sonic’s network (e.g., from San Francisco to San Jose via Santa Rosa).
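
As a rough illustration of what “sub-optimal” means here, a minimal sketch with made-up link latencies rather than measurements from our network: the detour adds a few milliseconds each way, nothing more.

# Hypothetical per-hop latencies in milliseconds (assumed, not measured).
normal_path = [("San Francisco", "San Jose", 2.0)]
detour_path = [("San Francisco", "Santa Rosa", 3.0),
               ("Santa Rosa", "San Jose", 4.5)]

def one_way_latency(path):
    # Sum the per-hop latencies along a path of (a, b, milliseconds) links.
    return sum(ms for _, _, ms in path)

print(f"normal: {one_way_latency(normal_path):.1f} ms one way")  # 2.0 ms
print(f"detour: {one_way_latency(detour_path):.1f} ms one way")  # 7.5 ms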

We are keeping a close eye on the situation and will restore normal routing once we are certain that the problem with the transport circuit is fully resolved.

-Jared and Nathan