Month: April 2009

CLEC Intrusive Maintenance

This evening, April 29th at 12:01AM, we will be performing intrusive maintenance on equipment serving Fusion and FlexLink customers in the downtown Santa Rosa area. This is a service-affecting maintenance window, and we expect customer downtime to be less than 15 minutes.

-Tim and Clay

Update: All work has been completed as planned. Customer downtime was less than 15 minutes.

DSL Uplink Failure

Earlier today, our support department diagnosed a problem with one of our DSL aggregation routers that left customers served by that router having difficulty reaching some sites on the Internet. After the issue was escalated, we determined that one of the router's redundant ATM uplinks had failed. We have disabled that uplink and will be replacing it soon. Support logged the first reports of the problem at approximately 3:45 PM, and the faulty uplink was disabled at 4:30 PM.

-Jared and John

Brief FTP outage.

FTP access to ftp.sonic.net was unavailable for about 20 minutes today when routine maintenance had unexpected consequences that led to a failure. All systems are normal now. -Augie

DNS hardware upgrades.

Yesterday we completed an upgrade of our recursive DNS clusters (what you see as ns1.sonic.net and ns2.sonic.net).

This upgrade gives those DNS servers a substantial performance boost and enables them to handle the ever-increasing number of requests that they see.

DNS is core to the Internet experience: without it, e-mail ceases to function and surfing your favorite websites is impossible. Because of this, we have always made DNS a priority in our infrastructure, and we are very proud of our ability to stay available even during times of high traffic and maintenance.

— Augie and Don.

Spam Filtering Glitch

One of our eight inbound mail servers has been intermittently failing to apply some of our front-line spam filters for the past couple of days. This led to additional spam getting through to SpamAssassin and Graymail, as well as into all of our inboxes. We’ve added monitoring to catch and alert us to this kind of failure in general, and we have also identified the specific cause and are taking steps to correct it. -Kelsey

DSL Aggregation Router Maintenance Reload

Tonight, April 21 at 12:01 AM, we will be performing a maintenance reload of one of our DSL aggregation routers. This router serves DSL to customers in the Bay Area. Expected impact is less than 5 minutes of service interruption for DSL customers served by this router.

-Jared

Update: The SMS reboot did not go as smoothly as planned. Due to a configuration error, on reboot, the router loaded an archived configuration that was out of date. The proper configuration has been restored and all affected customers should now be back online. We apologize for this additional downtime.

Denial of Service Attack

At approximately 3PM today a DoS attack was aimed at one of our Business-T customers. The attack saturated the uplinks of one of our customer aggregation routers, causing 5-10 minutes of service interruption to customers served by that router. The DoS attack has been blocked at our network edge.

-Jared

Customer Router Hardware Maintenance

This evening, April 15th, at 10:15PM we will be inserting a new interface card into one of our aggregation routers. This router provides FlexLink coverage for the Healdsburg area. This work is not expected to cause an interruption of service, but a router reload may be required to complete this operation. If a reload is required, customers served by the affected router will experience 5-10 minutes of service interruption.

-Tim and Clay

Backup power systems online

Update: Power was restored yesterday (14 April) around 5:55pm and all systems are normal. -Augie

Sonic.net’s Santa Rosa datacenter and offices are currently running on our power backup system due to a PG&E utility outage.  All backup systems are operating as designed, and there is no customer impact.

It’s rare that we have an actual utility power outage here, so most of our use of the backup systems is weekly, semi-annual and annual testing and maintenance.  It is very interesting to see the office, darker than usual (only limited lighting is on), but otherwise functioning as usual.  Technical support PCs are online, our phone system is online, and we are providing customer service as usual.

I’d like to offer thanks and congratulations to the team here who put together our backup systems: Russ Irving, Kelsey Cummings, Nathan Patrick, Juston Pierce, and former employees Matt Kirk and John Harkin. Thanks team! -Dane

AT&T ATM Outage

At approximately 2:30 AM today, AT&T lost ATM connectivity to the Santa Cruz area, causing all DSL in that area to be non-functional. AT&T has no ETR at this time, but we are in contact with them and will update as soon as they have any new information.

-Jared and Nathan

Update: AT&T has an official ETR on this outage of 8 PM today. We will continue to update as we get more information from AT&T on this outage. This news article provides additional information on the nature and cause of the AT&T outage: http://www.digitalnewsreport.com/2009/04/phone-internet-outage-in-santa-clara-santa-cruz/1302

Update: As of 2:40 PM we started seeing customers affected by the AT&T fiber cut come back online, and the live customer count is steadily increasing.

Update: At this time, AT&T reports that the fiber cut that caused this outage has been repaired, and all affected customers should be back online.