Month: March 2013

DDoS Disruption

This morning, at around 9:15AM, a DDoS attack against Sonic.net-owned equipment destabilized part of our hosting network, making the www.sonic.net website and some other sites inaccessible. The issue has been handled and access to the sites was restored as of 9:45AM.

– Noc and Soc

Emergency Legacy DSL Maintenance

At 4:15pm, a small subset of legacy DSL subscriber connections became unstable. We are investigating the cause of this issue and are working to stabilize the affected circuits as quickly as possible.

Update: All affected subscribers have been migrated to new equipment. Affected customers may have noticed several service interruptions lasting no more than 3 minutes each. Please note that under some circumstances, rebooting the DSL equipment on site may be required.

-Tomoc and Robbie

Ongoing DNS Server DoS Attack

Over the past few days we’ve seen a massive increase in both the number and volume of DNS amplification attacks using our recursive name servers. This is likely because our new name servers provide more verbose answers and therefore amplify traffic more effectively than our old servers. We unfortunately had to roll back blocking off-net use of our recursive servers, and blocking these requests entirely is not an option at this time. To mitigate the effects of the attacks both on our systems and on their targets, we’ve instituted rate limits on the total number of queries per second any given IP address can send to our servers. The rate limits are high enough that they should not interfere with any normal (and acceptable) use. However, it is possible that a customer doing bulk DNS lookups (such as log processing or running a busy mail server) may run into the limits and experience intermittent delays resolving host names.
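For readers curious how a per-source query limit of this kind can work, here is a minimal token-bucket sketch in Python. It is an illustration only, not the configuration running on our name servers: the class name, the `rate_qps` and `burst` parameters, and the example addresses are all hypothetical.

```python
import time


class PerSourceRateLimiter:
    """Token-bucket limiter keyed by source IP address (illustrative sketch)."""

    def __init__(self, rate_qps, burst, clock=time.monotonic):
        self.rate_qps = rate_qps  # tokens refilled per second (hypothetical value)
        self.burst = burst        # maximum tokens a single source can accumulate
        self.clock = clock        # injectable clock, handy for testing
        self._buckets = {}        # source IP -> (tokens remaining, last update time)

    def allow(self, ip):
        """Return True if a query from `ip` fits within its current budget."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate_qps)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

Each source address gets its own bucket, so a host sending a flood of spoofed queries exhausts only its own budget, while a bulk-lookup client that briefly exceeds the limit sees delays rather than a permanent block.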

-Kelsey, Augie and Nathan

DNS Service interruption.

DNS service was interrupted to our ns1.sonic.net name server cluster at 5pm PST for approximately 10 minutes.

All service has been restored and we are investigating the root cause of the problem.

–Augie

Update: Another brief service interruption occurred today (21 March) at 4:31pm PST, lasting approximately 7 minutes, while we were updating our name servers. Service was restored at 4:38pm PST.

Emergency ATM maintenance

We are performing an emergency reload on equipment serving a subset of legacy DSL, Business-T, and FRATM customers in the Bay Area and Modesto/Stockton area. We will send another update once the maintenance has completed.

-Tim and the NOC

Update: Service for affected customers should now be restored. At around 3:03PM, a controller card in one of our aggregation switches performed an automatic failover to the standby card. For unknown reasons, the standby card did not operate correctly once it took over, causing an outage for a portion of subscribers. We were forced to perform a full reset of the device to restore service. We are investigating both the cause of the failover and the additional issues experienced afterwards, and we apologize for the duration of this outage.

ATM Maintenance

This evening, beginning at 11:00PM, we will be performing invasive maintenance on equipment serving a subset of ATM subscribers in the Bay Area and Sacramento. This will affect Business-T, FRATM, and legacy DSL services specifically. We expect downtime for the affected customers to be less than 15 minutes.

-Tim, Robbie, and Tomoc

Update: The maintenance work has been completed as planned. All affected customers should be back online.

Legacy DSL Maintenance

Tonight, March 7, starting at 11:30pm, we will be performing maintenance on equipment serving a small subset of legacy DSL customers in the Bay Area. Expected customer downtime is less than 30 minutes.

Update: Maintenance complete. Affected customers may need to restart their DSL equipment.

-Tomoc

DNS Policy Changes Deployed

We’ve completed the deployment of our new DNS policies as covered in the Upcoming DNS changes MOTD. To reiterate, we’ve disabled access to our recursive servers from outside our network, are enforcing DNSSEC validation, and have enabled two commercial RPZ lists to help protect our customers from phishing, viruses and malware. For more information please see this forum post.

Update: We disabled one of the commercial RPZ lists earlier this afternoon. Despite substantial testing and review prior to deploying this to our customers, we found that the service, at least as it stands now, is overly aggressive in its listing policy.
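As a rough picture of what restricting recursion to on-net clients means, the check amounts to testing whether the querying address falls inside one of our own prefixes. The Python sketch below is illustrative only: the prefixes are reserved documentation ranges standing in for real address space, and `recursion_allowed` is a hypothetical helper, not part of our name server configuration.

```python
import ipaddress

# Hypothetical on-net prefixes. These are reserved documentation ranges,
# used here as stand-ins for a provider's real address space.
ON_NET_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def recursion_allowed(client_ip, prefixes=ON_NET_PREFIXES):
    """Return True if the client address belongs to any on-net prefix."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists of prefixes are safe to scan.
    return any(addr in net for net in prefixes)
```

Queries from addresses outside the listed prefixes would be refused recursion, which is what keeps an open resolver from being used as a reflector by third parties.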

-Kelsey and Augie

DNS resolution problem.

Today at 3:45pm PST we suffered a DNS outage on one of our two DNS clusters. Service was restored to the cluster at 3:51pm.

All services are functioning as expected at this time, and we are investigating the cause of the outage.

–Augie

Legacy DSL DHCP Maintenance

Tonight, at midnight, we will reboot one of our DHCP servers, which handles legacy DSL customers terminated in San Francisco, onto a new kernel. The server should be down for only a few minutes, and the 2,700 affected customers should see little to no interruption of service. This update will resolve a problem with an Ethernet transceiver lockup affecting one of the server’s redundant uplinks.

-Kelsey

Update Wed Mar  6 00:14:32 PST 2013: Maintenance complete. Total downtime for the server was less than 5 minutes.