DNS Resolution Problems

This evening at about 5:20 PM, a network reconfiguration shifted our DNS resolver load onto just one of our load-balanced DNS servers. That server could not handle the full load alone and responded slowly to DNS queries, which made Internet connections appear down or sluggish. We resolved the issue at about 6:45 PM by re-balancing the DNS traffic. We apologize for the disruption and will be revisiting our DNS server architecture soon.

-Jared, Kelsey and Nathan
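
For readers curious how an imbalance like the one in the DNS post above might be spotted early, here is a minimal sketch. It assumes per-server query rates are already available from some monitoring source. The function name, the server names, and the 60% threshold are all hypothetical and are not taken from the resolvers described above.

```python
from typing import Dict, List


def overloaded_servers(qps_by_server: Dict[str, float],
                       max_share: float = 0.6) -> List[str]:
    """Return the servers carrying more than max_share of total queries.

    qps_by_server maps a (hypothetical) server name to its observed
    queries per second. max_share is an assumed alert threshold.
    """
    total = sum(qps_by_server.values())
    if total <= 0:
        # No traffic observed, so there is nothing to flag.
        return []
    return sorted(
        name for name, qps in qps_by_server.items()
        if qps / total > max_share
    )


# Example: nearly all traffic has landed on one resolver.
print(overloaded_servers({"dns1": 9000, "dns2": 50, "dns3": 50}))
```

A check along these lines could run after any network reconfiguration, so that a skewed traffic split is noticed before the remaining server slows down.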

CLEC Intrusive Maintenance

Tonight, beginning at 12:01 AM, we will be performing maintenance on equipment serving FlexLink Ethernet customers in Berkeley, Los Altos, and the SoMa area of San Francisco. Affected customers should experience less than 10 minutes of downtime.

-Tim

Backbone Maintenance

This Tuesday, May 11 at 11:01 PM, we will be performing backbone maintenance at our San Francisco POP. This maintenance will move our existing transit and transport load to our new T320 backbone routers. We should be able to perform this work without disruption to customer traffic, but work of this magnitude always carries a risk of impact.

-Jared and Nathan

Update: The backbone maintenance was completed at about 2 AM. There should have been no major outages as we reconfigured our backbone. We are now completely live on our new 10 Gigabit backbone ring!

DSL Aggregation Router Reload

This Saturday, May 8 at 12:01 AM, we will be performing maintenance reloads of two Redbacks that terminate traditional DSL service. This will affect some of our Los Angeles and Sacramento DSL subscribers. Expected downtime is 5 minutes.

-Jared

Update: The scheduled reloads have been completed without incident. All affected customers are back online after an outage of about 5 minutes.

CLEC Intrusive Maintenance

Tonight, beginning at 12:01 AM, we will be performing maintenance on equipment serving FlexLink Ethernet customers in Napa, Oakland, and Windsor. Affected customers should experience less than 10 minutes of downtime.

-Tim

CLEC Intrusive Maintenance

Tonight, beginning at 12:01 AM, we will be performing maintenance on equipment serving FlexLink Ethernet customers in Rohnert Park and Petaluma. Affected customers should experience less than 10 minutes of downtime.

-Tim

Backbone Maintenance

Tonight at 12:01 AM, we will be performing backbone maintenance at our San Jose POP. This maintenance will move our existing transit and transport load to our new T320 backbone router. We should be able to perform this work without disruption to customer traffic, but work of this magnitude always carries a risk of impact.

-Jared and Nathan

Update: All appears well. We’re calling it a night, without any observable customer impact.

-Nathan and Jared

CLEC Intrusive Maintenance

Tonight, beginning at 12:01 AM, we will be performing maintenance on equipment serving FlexLink Ethernet customers in Healdsburg. Affected customers should experience less than 10 minutes of downtime.

-Tim

CLEC DSLAM Maintenance

This evening at 10:00 PM, we will be performing a software upgrade on the DSLAM that serves Fusion customers in the Rincon Valley area of Santa Rosa. Expected downtime is less than 15 minutes while the DSLAM is rebooted onto the new software release.

-Tim and Nathan

Update:

Things did not go as planned. The upgrade failed to import a wide variety of important customer settings, so we attempted our pre-scripted roll-back procedure to undo the software upgrade. That process went even worse, and caused our DSLAM to forget a large chunk of even more important stuff. What was left was severely corrupted.

We keep a full library of historical device configurations, so the logical course of action was to re-program the device from one of those saved copies. This operation didn’t work. We suspected a version mismatch between the saved copies (they’re in binary. PLEASE, device vendors, don’t keep your configurations in binary!) and the exact software load we were attempting to restore onto. We tried 4 or 5 different combinations. Nothing worked.

Typically, our devices are provisioned by automated systems. Due to changes wedged into the new software release, our automated systems don’t quite know how to talk to it properly, so the automation was next to useless. In the end, we re-configured the whole device by hand on the release we were attempting to upgrade to.

Despite the saga above, this particular issue affected fewer than 20 of our customers. Our sincere apologies to those folks, who experienced an outage from around 10:30 PM until 1:30 AM or so. We’ll be hammering out these issues with our equipment vendor to ensure this doesn’t happen again.

-Nathan + Matt and Jared for moral support

CLEC Intrusive Maintenance

Tonight, beginning at 12:01 AM, we will be performing maintenance on equipment serving FlexLink Ethernet customers in Forestville and Sebastopol. Affected customers should experience less than 10 minutes of downtime.

-Tim