Month: April 2010

CLEC DSLAM Maintenance

This evening, at 10:00PM, we will be performing a software upgrade on the DSLAM that serves Fusion customers in the Rincon Valley area of Santa Rosa. Expected downtime is less than 15 minutes while the DSLAM is rebooted onto the new software release.

-Tim and Nathan

Update:

Things did not go as planned. The upgrade failed to import a wide variety of important customer settings, causing us to attempt our pre-scripted roll-back procedure to undo the software upgrade. That process went even worse, and caused our DSLAM to forget a large chunk of even more important stuff. What was left was severely corrupted.

We keep a full library of historical device configurations, so the logical course of action was to re-program the device from one of those saved copies. This operation didn’t work. We thought it was a version mismatch problem between the saved copies (they’re in binary — PLEASE, device vendors, don’t keep your configurations in binary!) and the exact software load we were attempting to restore on. We tried 4 or 5 different combinations. Nothing worked.

Typically, our devices are provisioned by automated systems. Due to changes wedged into this code, our automated systems don’t quite know how to talk to the new version properly, so the automation was next to useless. In the end, we re-configured the whole device by hand on the code we were attempting to upgrade to.

Despite the saga above, this particular issue affected fewer than 20 of our customers. Our sincere apologies to those folks, who experienced an outage from around 10:30pm until 1:30am or so. We’ll be hammering out these issues with our equipment vendor to ensure this doesn’t happen again.

-Nathan + Matt and Jared for moral support

CLEC Intrusive Maintenance

This evening, beginning at 12:01AM, we will be performing maintenance on equipment serving FlexLink Ethernet customers in Forestville and Sebastopol. Affected customers should experience less than 10 minutes of downtime.

-Tim

DSL Service Disruption

We are currently tracking a network event that is disrupting network connectivity for several of our DSL aggregation routers. We are working to identify and resolve this event as quickly as possible.

-Jared and Nathan

Update:  We believe the issue is a backplane problem on the ATM switch serving many of our DSL customers.  A reload of the affected device should resolve the trouble — we’ll let you folks know how it goes. -Nathan and Jared

Update: Things are looking much better after a reload of the ATM switch. We’re still working to ensure that all services are up and functional, and will be working with the equipment vendor to diagnose the trouble that we’re having. Sorry to all affected! -Nathan and Jared

ATM OC-12 outage

One of our ATM OC-12s suffered a five-minute outage. We are investigating, and a further update will follow.

Update: The ATM OC-12 has remained stable, and we have tickets open with the circuit provider to ensure that the circuit will not have further problems. The outage from 16:55 to 17:00 would have affected some of our DSL, Business-T and FRATM customers in the Bay Area.

-Jared

Internal database failure

Update: As of 5:20pm, all services have been restored.

We suffered an internal database failure around 4:30pm today; the data is currently being restored, and the repair time is estimated to be 20 minutes.

You may see errors when trying to use some of our Member Tools because of this.

Updates will be published as they happen.

Generator Maintenance in Santa Rosa

On Monday, April 27th, our backup generator for our Santa Rosa datacenter is getting its routine annual maintenance, inspection and load testing. For the duration of the work, a rental generator will be on site, wired into the standby terminals on our automatic transfer switch, so at no time will any of our redundant power systems be compromised.

Directory listing and permission changes

We will be making changes to directory permissions tomorrow which will limit the ability of users to list the contents of some directories.  You will be able to change directory through them, but you will not be able to list the contents.

For example, this will affect users connecting via FTP to upload files: you can go to the directory you own, but you will not be able to list all the directories one level up. This is a cosmetic change; you can still get to the directory, but you must know its name.
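
For anyone curious how "change directory through, but not list" works on a Unix filesystem, here is a minimal sketch. It assumes the change amounts to clearing the read bit while leaving the execute (search) bit set for other users. The modes 0o755 and 0o711 and the function name `other_dir_access` are illustrative assumptions only, not the actual settings we are applying.

```python
import stat


def other_dir_access(mode: int) -> dict:
    """Report what users outside the owner and group may do with a directory.

    On a directory, the read bit allows listing the names inside it, and the
    execute (search) bit allows passing through it to a name already known.
    """
    return {
        "list": bool(mode & stat.S_IROTH),
        "traverse": bool(mode & stat.S_IXOTH),
    }


# Hypothetical modes for illustration; the real values may differ.
print("before:", other_dir_access(0o755))
print("after: ", other_dir_access(0o711))
```

With the read bit cleared, a path such as a parent directory can still be traversed to reach your own directory by name, while a listing of that parent is refused.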

If you have any questions after this change is made, please post to news://news.sonic.net/sonic.os.unix

Power Maintenance in San Jose

This Sunday, April 4, at 11PM we will begin power maintenance at our San Jose POP. This maintenance involves moving the A and B sides of our redundant power to a new physical configuration. We will be taking every precaution to ensure that nothing loses power; however, outages can occur when doing work of this nature. We expect the work to be completed by midnight.

-Jared and Nathan