Category: Network

DSL Subscriber Rebalancing

On Tuesday, September 30, at 12:01 AM we will be moving several hundred DSL subscribers from one of our DSL aggregation routers to another to balance network load. This move should take less than 5 minutes and will cause only a brief service interruption.

-Jared

Update: Rebalancing completed. Downtime for affected customers was under 5 minutes.

System Authentication Failure

This afternoon, around 4:30 PM, we experienced a problem in our main authentication system. This problem affected a handful of user accounts and associated mailboxes. Account and mailbox owners would have seen authentication failures and possible timeouts when attempting to connect to services such as shell, POP/IMAP, and webmail. The underlying problem has been fixed. Customers may have seen approximately 10 minutes of authentication failures. If you are still experiencing problems with authentication, please call Sonic.net Support at (707) 522-1000. We apologize for the inconvenience this has caused.

-Don and Jared

VPN Concentrator Certificate Expiry

Today the identity certificate for our VPN concentrator expired, which would have prevented new VPN sessions from being established. We have renewed the certificate, and the VPN concentrator is now accepting connections normally.

-Jared and Nathan

LATA1 AT&T ATM Outage

At 5:03 PM today, all of our DSL and Business-T subscribers in LATA1 went offline. The problem appeared to be internal to AT&T’s ATM network. As of 5:12 PM, the problem in AT&T’s network appears to have been resolved, and DSL and Business-T customers are back online. We are continuing to monitor the situation and work with AT&T to find out what happened in their network. More details will be forthcoming as we get them.

-Jared, and everyone in the NOC and Support

Update: Word from AT&T and ASI (AT&T's ATM division) is that they suffered a major fiber issue in one of their hub Central Offices in San Francisco. Traffic has been routed around the problem, and they are currently working on resolving the fiber issue. There is a small chance that traffic may be interrupted as AT&T works on the issue, but they will be doing everything they can to prevent that. We will be monitoring our ATM links very closely for the next 24 hours to mitigate any potential problems.

Update: AT&T has determined that the outage was caused by a failing ATM fabric card. Service was restored when traffic was moved to the spare fabric card. The failed card has been replaced, with no further service interruption.

ATM Switch Maintenance

Tonight at midnight we will be performing potentially invasive maintenance on one of our ATM switches. We will be reloading the primary route processor in the ATM switch, and it should automatically fail over to the backup. Impact should be only a few seconds as the failover occurs, though it may be up to 10 minutes if problems are encountered. The ATM switch in question serves DSL and Business-T customers in the Bay Area, Sacramento, and Fresno.

-Jared and Nathan

Update: The failover to the backup route processor did not go smoothly, so we were forced to reboot the ATM switch chassis. This resulted in approximately 10 minutes of connectivity interruption for customers connected through this ATM switch. We apologize for the interruption. We are monitoring traffic levels across the ATM switch and they appear to be moving back to normal levels. We will continue to monitor the switch closely.

BroadLink Wireless Outage

BroadLink is a wireless carrier partner serving a group of customers in Santa Rosa and some surrounding areas. BroadLink has experienced a failure of a critical component in the core of their network, and they are currently attempting a repair.

We have scheduled the next status update meeting for an hour and a half from now. At this point, we have no realistic estimated time to repair to offer.

We remain guardedly hopeful that BroadLink will find a way to work around the failed component. That said, customers who have DSL available at their location and wish to switch to DSL can begin that process by contacting Sonic.net Sales.

Update: A workaround has been devised, and if all goes well, we expect to see BroadLink back online within the hour.

Massive DDoS Attack

Approximately 30 minutes ago, a Distributed Denial of Service (DDoS) attack was aimed at one of our colo customers. The NOC quickly identified and blocked the attack, but it ramped up to over 700 Mbps for several minutes. This traffic congested one of our transit links for 5-10 minutes, possibly causing poor performance or loss of connectivity to some sites on the Internet. As we were contacting the congested transit provider to have them put a block in place, the attack ceased.

Our internal network remains sound, and we are continuing to monitor the network for any further disruptions.

-Jared and Nathan

DSL Subscriber Rebalancing

On Wednesday August 13 at 12:01 AM we will be moving several hundred DSL subscribers from one of our DSL aggregation routers to another, to balance network load. This move should take less than 5 minutes and will only cause a brief service interruption.

-Jared

Update: The migration has been completed. Downtime for most affected customers was less than 5 minutes. However, a minor mix-up during the migration caused approximately 75 customers to lose connectivity for 15-20 minutes. All problems have been resolved, and all customers should have proper connectivity again. We apologize for any inconvenience caused by this maintenance.

Inbound DDoS

We recently experienced an inbound DDoS attack directed at one of our customers. The attack was promptly blocked at our edge, and any connectivity issues related to it should be resolved. Customers may have experienced slightly degraded performance during this time.

-Dusty, Nathan, and Jared

Customer MySQL Maintenance Tonight

Tonight at 1:00 AM PDT, I will be performing maintenance on custsql as well as documenting some performance statistics. A few customers may receive a "too many connections" message if their client attempts to connect to the system during this short period.

Update: All maintenance performed with no customer service interruption.

–Don