DNS Hardware Upgrades

Yesterday we completed an upgrade of our recursive DNS clusters (what you see as ns1.sonic.net and ns2.sonic.net).

This upgrade gives those DNS servers a substantial performance boost and enables them to handle the ever-increasing number of requests that they see.

DNS is core to the Internet experience. Without it, e-mail ceases to function and surfing your favorite web sites is impossible. Because of this, we have always made DNS a priority in our infrastructure, and we are very proud of our ability to stay available even during times of high traffic and maintenance.

— Augie and Don.

Spam Filtering Glitch

One of our eight inbound mail servers has been intermittently failing to apply some of our front-line spam filters for the past couple of days.  This led to additional spam getting through to SpamAssassin and Graymail as well as into all of our inboxes.  We've added monitoring to catch this kind of failure and alert us to it, and we have also identified the specific cause and are taking steps to correct it.  -Kelsey

DSL Aggregation Router Maintenance Reload

Tonight, April 21 at 12:01 AM, we will be performing a maintenance reload of one of our DSL aggregation routers. This router serves DSL to customers in the Bay Area. Expected impact is less than 5 minutes of service interruption for DSL customers served by this router.

-Jared

Update: The SMS reboot did not go as smoothly as planned. Due to a configuration error, the router loaded an out-of-date archived configuration when it rebooted. The proper configuration has been restored and all affected customers should now be back online. We apologize for this additional downtime.

Denial of Service Attack

At approximately 3PM today a DoS attack was aimed at one of our Business-T customers. The attack saturated the uplinks of one of our customer aggregation routers, causing 5-10 minutes of service interruption to customers served by that router. The DoS attack has been blocked at our network edge.

-Jared

Customer Router Hardware Maintenance

This evening, April 15th, at 10:15PM we will be inserting a new interface card into one of our aggregation routers. This router provides FlexLink coverage for the Healdsburg area. This work is not expected to cause an interruption of service, but a router reload may be required to complete this operation. If a reload is required, customers served by the affected router will experience 5-10 minutes of service interruption.

-Tim and Clay

Backup power systems online

Update: Power was restored yesterday (14 April) around 5:55pm and all systems are normal.  -Augie

Sonic.net’s Santa Rosa datacenter and offices are currently running on our power backup system due to a PG&E utility outage.  All backup systems are operating as designed, and there is no customer impact.

It’s rare that we have an actual utility power outage here, so most of our use of the backup systems is weekly, semi-annual and annual testing and maintenance.  It is very interesting to see the office, darker than usual (only limited lighting is on), but otherwise functioning as usual.  Technical support PCs are online, our phone system is online, and we are providing customer service as usual.

I’d like to offer thanks and congratulations to the team here who put together our backup systems: Russ Irving, Kelsey Cummings, Nathan Patrick, Juston Pierce and former employees Matt Kirk and John Harkin.  Thanks team!  -Dane

AT&T ATM Outage

At approximately 2:30 AM today, AT&T lost ATM connectivity to the Santa Cruz area, causing all DSL in that area to be non-functional. AT&T has no ETR at this time, but we are in contact with them and will update as soon as they have any new information.

-Jared and Nathan

Update: AT&T has an official ETR on this outage of 8 PM today. We will continue to update as we get more information from AT&T on this outage. This news article provides additional information on the nature and cause of the AT&T outage: http://www.digitalnewsreport.com/2009/04/phone-internet-outage-in-santa-clara-santa-cruz/1302

Update: As of 2:40 PM we started seeing customers affected by the AT&T fiber cut come back online, and currently the live customer count is steadily increasing.

Update: At this time, AT&T reports that the fiber cut that caused this outage has been repaired, and all affected customers should be back online.

ATM Customer Aggregation Router Reload

This upcoming Friday, April 10 at 12:01 AM, we will be performing a maintenance reload on our ATM customer aggregation routers. This will result in 5-10 minutes of downtime for Business-T and FRATM customers.

-Jared

Update: The maintenance reloads were completed without issue. Total downtime for affected customers was under 10 minutes.

T1 Cable Maintenance

This Wednesday, April 8 at 12:01 AM, we will be doing brief cable maintenance on the DS3 that backhauls some of our T1 capacity in our San Francisco POP. This will result in approximately 15 minutes of downtime for T1 customers on the affected DS3.

-Jared and Nathan

Edit: Fixed incorrect month.

Update: This maintenance is complete. Actual downtime for affected customers was approximately 221 seconds.  -Nathan and Clay

Intermittent DHCP Issue

At 11 AM today, due to a RAID disk failure, we proactively took out of service a DHCP server that serves some DSL customers in the Bay Area and Sacramento area. The DHCP traffic fell back to our backup DHCP server. Unbeknownst to us, the backup DHCP server had a hardware issue that was causing it to respond to DHCP lease requests slowly, which made DHCP service intermittent for the affected customers. We have restored the primary DHCP server while we diagnose and repair the backup server. We apologize for any interruption of service that this DHCP issue caused.

-Jared, Nathan, Jasper, Kavan and Kim