Early this afternoon, we experienced a failure in one of our NetApps, which caused some content to be unavailable and a few key systems to present timeout errors. Downtime is estimated to be between 5 and 7 minutes. We have identified the problem and restored all services. –William, Augie, and Don
LATA1 AT&T ATM Outage
At 5:03 PM today, all of our DSL and Business-T subscribers in LATA1 went offline. The problem appeared to be internal to AT&T’s ATM network. As of 5:12 PM, the problem in AT&T’s network appears to have been resolved, and DSL and Business-T customers are back online. We are continuing to monitor the situation and work with AT&T to find out what happened in their network. More details will be forthcoming as we get them.
-Jared, and everyone in the NOC and Support
Update: Word from AT&T and ASI (AT&T’s ATM division) is that they suffered a major fiber issue in one of their hub Central Offices in San Francisco. Traffic has been routed around the problem, and they are currently working on resolving the fiber issue. There is a small chance that traffic may be interrupted while AT&T works on the issue, but they will be doing everything they can to prevent that. We will be monitoring our ATM links very closely for the next 24 hours to mitigate any potential problems.
Update: AT&T has determined that the outage was caused by a failing ATM fabric card. Service was restored when traffic was moved to the spare fabric card. The failed card has been replaced, with no further service interruption.
ATM Switch Maintenance
Tonight at midnight we will be performing potentially invasive maintenance on one of our ATM switches. We will be reloading the primary route processor in the ATM switch and it should automatically fail over to the backup. Impact should be only a few seconds as the failover occurs, though it may be up to 10 minutes if problems are encountered. The ATM switch in question serves DSL and Business-T customers in the Bay Area, Sacramento and Fresno areas.
-Jared and Nathan
Update: The failover to the backup route processor did not go smoothly, so we were forced to reboot the ATM switch chassis. This resulted in approximately 10 minutes of connectivity interruption for customers connected through this ATM switch. We apologize for the interruption. We are monitoring traffic levels across the ATM switch and they appear to be moving back to normal levels. We will continue to monitor the switch closely.
New email Member Tools released today
We deployed the email portion of our New Member Tool suite this morning. Significant improvements have been made, particularly in the following tools:
- Graymail now features improved searching and sorting, and best of all, we’ve FINALLY added a message preview feature! Check it out at https://members.sonic.net/email/graymail.
- Our Spam Filtering Configuration tools have been rewritten from the ground up to be easier and more intuitive to use. They’re available at https://members.sonic.net/email/spam.
Please check them out and let us know what you think! Feedback can be sent to members@sonic.net.
– Dianne & Kelsey
Web cluster outage
At 12:55 PM today, we experienced an outage of our main web cluster. Customer downtime was under 5 minutes. All services have been restored. –Don
Update: Upon investigation, we have taken steps to increase the robustness of this cluster to ensure this particular problem does not occur again. We apologize for the inconvenience.
BroadLink wireless outage
BroadLink is a wireless carrier partner for a group of customers in Santa Rosa and some surrounding areas. BroadLink has had a failure of a critical component in the core of their network, and they are currently attempting repair.
We have scheduled a meeting with them an hour and a half from now for the next status update. At this point, they have no realistic estimated time to repair to offer.
We remain guardedly hopeful that BroadLink will find a way to work around the failed component. That said, customers for whom DSL is available and who wish to transition to DSL can begin that process by contacting Sonic.net sales.
Update: A workaround has been devised, and if all goes well, we expect to see BroadLink back online within the hour.
New features added to our DNS Editor tool
We have added the ability for our customers to edit their Reverse DNS in our DNS Editor Member Tool:
https://members.sonic.net/websites/nameservers/host_records
This tool works for anyone with IP address services with us, be it Colocation, Leased Server, DSL, or anything else with an IP address tied to it. You can now use this tool to assign the corresponding DNS records to those addresses.
This feature is most useful to those running their own mail servers, and can also be of use to anyone who wants the forward and reverse DNS records for their domain name to match.
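For anyone who wants to confirm that forward and reverse records match after editing them, here is a minimal sketch in Python using only the standard `socket` module. It is an illustration, not part of the Member Tools: the function names (`reverse_lookup`, `forward_lookup`, `forward_confirmed`) are made up for this example, and `192.0.2.10` is a documentation placeholder address, not a real assignment. It assumes IPv4 addresses and compares them as plain strings.

```python
import socket


def reverse_lookup(ip):
    """Return the hostname from the reverse (PTR) lookup of ip, or None."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


def forward_lookup(hostname):
    """Return the set of addresses hostname resolves to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return {info[4][0] for info in infos}


def forward_confirmed(ip, reverse=reverse_lookup, forward=forward_lookup):
    """True if ip's reverse name resolves forward to the same ip.

    The reverse/forward parameters exist only so the check can be
    exercised without a live resolver (hypothetical helper design).
    """
    hostname = reverse(ip)
    if hostname is None:
        return False
    return ip in forward(hostname)


if __name__ == "__main__":
    # Placeholder address; substitute the IP address you edited.
    print(forward_confirmed("192.0.2.10"))
```

Many receiving mail servers perform a similar check on connecting hosts, which is why matching records matter most when you run your own mail server. DNS changes can take some time to propagate, so a mismatch right after an edit is not necessarily an error.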
Sonic.net Backup Service
Sonic.net would like to invite you to try out our new online backup service. It is available for OS X and Windows 2K or greater, and comes with a 50GB quota to back up all of your important documents and pictures.
For helping us test this new service, you will receive an extra 90 days of backup service in addition to the 90 day trial period. If you would like to sign up for this service, please email support@sonic.net so we can sign you up!
Massive DDoS Attack
Approximately 30 minutes ago, a Distributed Denial of Service attack was aimed at one of our colo customers. The NOC quickly identified and blocked the DDoS, but the attack ramped up to over 700 Mbps for several minutes. This traffic congested one of our transit links for 5-10 minutes, possibly causing poor performance or loss of connectivity to some sites on the Internet. As we were contacting the congested transit provider to have them put in a block, the DDoS ceased.
Our internal network remains sound, and we are continuing to monitor the network for any further disruptions.
-Jared and Nathan
DSL Subscriber Rebalancing
On Wednesday August 13 at 12:01 AM we will be moving several hundred DSL subscribers from one of our DSL aggregation routers to another, to balance network load. This move should take less than 5 minutes and will only cause a brief service interruption.
-Jared
Update: The migration has been completed. Downtime for most affected customers was less than 5 minutes. However, there was a minor mix-up during the migration that caused approximately 75 customers to lose connectivity for 15-20 minutes. All problems have been resolved and all customers should have proper connectivity again. We apologize for any inconvenience caused by this maintenance.