Sonic Status January

Mail Server Backlog: Our mail servers are…

January 28, 2003

Tue Jan 28 12:13:22 PST 2003 — Mail Server Backlog: Our mail servers are currently experiencing a high load and are deep in queue. No mail has been or will be lost; however, it may be delayed. We have taken some stop-gap measures to try to improve performance and they appear to be helping the situation. We expect to have it cleared up shortly.

We are in the process of testing a new mail server architecture that will resolve these periodic loading issues. We have six blazing-fast dual-Xeon servers with plenty of RAM, local RAID and dual gigabit Ethernet NICs, plus two new shelves for our NetApp cluster, currently running tests of the new architecture. The new architecture will include a new locally customized message store format that should provide better end-user performance while also putting less load on the back-end. There is a thread in news://news.sonic.net/sonic.general titled ‘Upcoming changes to mail handling…’ that details the changes and how they should improve the situation. -Kelsey

BroadLink has made an upgrade to the backhaul

January 27, 2003

Mon Jan 27 18:09:18 PST 2003 — BroadLink has made an upgrade to the backhaul radio that serves some Santa Rosa customers – more speed, less latency. Nice work, guys! Please direct discussions to:

news://news.sonic.net/sonic.dsl

-Dane

Sonic.net CEO Dane Jasper will be on KSRO AM…

January 27, 2003

Mon Jan 27 17:03:11 PST 2003 — Sonic.net CEO Dane Jasper will be on KSRO AM 1350 shortly after 5:30 PM today on the Pat Thurston show to talk about the SQL Slammer worm. Tune in!

Update on the worm.

January 25, 2003

Sat Jan 25 11:06:01 PST 2003 — Update on the worm. Two of the five colocated customers who were disabled last night because they had been infected by the worm were brought back online at approximately 10:00 AM this morning. Neither of these customers had properly secured their servers and they promptly began flooding 100 Mbit/s of outbound traffic again. Approximately 40 minutes later, Nathan and I had the ports locked down and Eli was en route to assist at the data center. While these customers were up again, reachability through our network was minimal. Technical support is experiencing long hold times and a high call volume, largely due to the effects of the worm. -Kelsey, Nathan and Eli.

More information on the Microsoft security…

January 25, 2003

Sat Jan 25 09:32:31 PST 2003 — More information on the Microsoft security problem that caused so many network issues across the Internet last night:

Internet Security Systems Security Brief
January 25, 2003

Microsoft SQL Slammer Worm Propagation

Synopsis:

ISS X-Force has learned of a worm that is spreading via Microsoft SQL servers. The worm is responsible for large amounts of Internet traffic as well as millions of UDP/IP probes at the time of this alert’s publication. This worm attempts to exploit MS/SQL servers vulnerable to the SQL Server Resolution service buffer overflow (CVE CAN-2002-0649). Once a vulnerable computer is compromised, the worm will infect that target, randomly select a new target, and resend the exploit and propagation code to that host.

Impact:

Although the Slammer worm is not destructive to the infected host, it does generate a damaging level of network traffic when it scans for additional targets. A large amount of network traffic is created by the worm, which scans random IP addresses for vulnerable servers.

For the complete ISS X-Force Security Advisory, please visit: bvlive01.iss.net/issEn/delivery/xforce/alertdetail.jsp?oid=21824

Patching and disinfection information can be found at the URL above.

-Dane

Update on the worm #2.

January 25, 2003

Sat Jan 25 18:19:08 PST 2003 — Update on the worm #2. We have been filtering the spread of the worm on ingress and egress from our network for most of the day and appear to have stopped the spread of the infection inside our network. All infected users have been contacted. Over the course of the day we have had a few brief outages related to the worm. None of them lasted for more than a few minutes.

This event was entirely avoidable. The patches that fix the bug in the MS SQL service have been available for some time. The entire Sonic staff urges our users to keep their workstations and servers properly patched according to their vendor’s recommendations. In doing so, we can all help prevent something like this from happening again. -Kelsey
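
For readers curious what filtering the worm on ingress and egress amounts to, here is a minimal sketch of the matching rule. It is an illustration only, not the configuration used on our routers: the Packet type and the should_drop function are hypothetical names, and real filtering is done with router access lists rather than Python. The one outside fact it relies on is that the SQL Server Resolution Service the worm attacks listens on UDP port 1434.

```python
from dataclasses import dataclass

# UDP port of the SQL Server Resolution Service targeted by the worm.
SQL_RESOLUTION_PORT = 1434


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a packet header."""
    protocol: str  # "udp" or "tcp"
    dst_port: int


def should_drop(packet: Packet) -> bool:
    """Return True for traffic matching the worm's probe pattern.

    The same rule is applied in both directions: inbound to keep new
    probes out, outbound to stop infected hosts from flooding others.
    """
    return packet.protocol == "udp" and packet.dst_port == SQL_RESOLUTION_PORT


if __name__ == "__main__":
    print(should_drop(Packet("udp", 1434)))  # worm probe
    print(should_drop(Packet("udp", 53)))    # ordinary DNS query
```

A rule this coarse also blocks legitimate use of that port across the network border, which is usually an acceptable trade during an outbreak.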

Night Operations Complete: Both the SMS and…

January 25, 2003

Sat Jan 25 03:04:47 PST 2003 — Night Operations Complete: Both the SMS and NetApp maintenance proceeded as planned. It’s been a wild night. -Kelsey and Nathan

The Internet has broken.

January 25, 2003

Sat Jan 25 00:09:11 PST 2003 — The Internet has broken. We do not yet know the source of the DoS but we believe that it may be another worm that attacks Microsoft servers. At this time, large portions of the Internet are still unreachable. There have been widespread reports that this DoS also took the lives of many Cisco routers, including one of our 7500 border routers. Our internal network is functioning at 100%, but many sites have yet to restore service and, I anticipate, may not return for some time.

A very interesting graph of this event and its effect on the Internet as a whole is available at the following link: average.matrix.net/ -Kelsey and Nathan

A configuration error (operator error) on our

January 24, 2003

Fri Jan 24 10:47:59 PST 2003 — A configuration error (operator error) on our part caused a problem for customers who have us listed as a secondary mail exchanger for their domains. This is a very small percentage of our customers, and the symptom manifested itself as bounced mail only when the customers’ primary mail server was unavailable. _I_ apologize for the misconfiguration. -Eli
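
Why the bounces appeared only when a primary server was down comes from how MX records work: a sending server tries the exchanger with the lowest preference value first and falls back to higher values only when that host cannot be reached. The sketch below illustrates that ordering under stated assumptions. The hostnames, the MxRecord type and both functions are hypothetical, and it models only the selection logic, not our actual mail configuration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class MxRecord:
    preference: int  # lower value is tried first
    host: str


def delivery_order(records: Iterable[MxRecord]) -> List[str]:
    """Hosts in the order a sending server would attempt them."""
    return [r.host for r in sorted(records, key=lambda r: r.preference)]


def first_reachable(records: Iterable[MxRecord], reachable: Set[str]) -> Optional[str]:
    """Return the first exchanger that answers, or None if none do."""
    for host in delivery_order(records):
        if host in reachable:
            return host
    return None


# Hypothetical customer domain with a primary and a secondary exchanger.
records = [
    MxRecord(20, "backup-mx.example.net"),  # secondary
    MxRecord(10, "mail.example.com"),       # customer's primary
]

if __name__ == "__main__":
    # Primary up: the secondary is never contacted.
    print(first_reachable(records, {"mail.example.com", "backup-mx.example.net"}))
    # Primary down: mail goes to the secondary instead.
    print(first_reachable(records, {"backup-mx.example.net"}))
```

With the primary reachable, the secondary never sees the message, so a misconfigured secondary goes unnoticed until the primary is unavailable.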

High latency and packet loss.

January 24, 2003

Fri Jan 24 22:29:43 PST 2003 — High latency and packet loss. We are currently experiencing high latency and packet loss in portions of our network, including part of our core. We are investigating the situation, and will hopefully have things back to normal soon. Kelsey is on-site. -Scott, Kelsey, and Nathan

Update Fri Jan 24 23:11:01 PST 2003… This turns out to be a massive worm which is causing denial of service (DoS) across the Internet. UUNet has characterized this as “the DoS of the year”. The vector is Microsoft SQL servers. So far, we have found 7 servers pumping 100 megabit/second into our core, which is the cause of the high latency and packet loss within portions of our network. We continue to work the problem. -Scott, Kelsey, and Nathan