All offsite remote access equipment has been…

Thu Dec 31 11:10:11 PST 1998 — All offsite remote access equipment has been upgraded to ComOS 3.8.2. This does not affect equipment located in Santa Rosa. This version (just released) fixes many connectivity problems. For a full list of fixes, you can check out the release notes at ftp.livingston.com/pub/le/upgrades/release382.txt.

Sonic would also like to wish all of you a very happy New Year and best wishes for a prosperous year ahead! -Brian

Above.net had a systems crash at…

Wed Dec 23 15:25:34 PST 1998 — Above.net had a systems crash at approximately 2:30 PM today that affected dial up access to some areas in the South Bay. The outage lasted for about 1 hour. They have resolved the problem and our Above.net POP is now back online. -Brian

Between about 1:50pm and 2:11pm, one of our…

Sat Dec 19 15:36:32 PST 1998 — Between about 1:50pm and 2:11pm, one of our web servers (‘Storm,’ www.sonic.net) stopped serving web pages. Investigation reveals an almost certain explanation: a CGI program caused the system to become ‘swap-bound’ (that is, doing a lot of work using disk-based virtual memory).

Our web servers run CGIs within specific resource limits, but as it turns out, one resource limit remained unset on Storm: RLIMIT_AS, the ‘address space’ (virtual memory) limit. That limit is now set, so we shouldn’t see this problem anymore. (Nevertheless, we’ll keep a weather eye out for any more problems with Storm. 🙂) -Scott
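For the curious, here is a minimal sketch of how an address-space limit like RLIMIT_AS can be applied to a child process before it runs. This is not the code our web servers use. The helper name `run_with_address_space_limit` and the 1 GiB figure are assumptions chosen for illustration, and the sketch relies on Python's Unix-only `resource` module.

```python
import resource
import subprocess
import sys


def run_with_address_space_limit(argv, limit_bytes):
    """Run argv as a child process with RLIMIT_AS capped at limit_bytes.

    Hypothetical helper for illustration; the limit value is an assumption.
    """
    def apply_limit():
        # Runs in the child after fork() and before exec(), so only the
        # child (for example, a CGI program) is constrained.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    one_gib = 1 << 30
    # The child tries to allocate 2 GiB, which exceeds its 1 GiB limit.
    result = run_with_address_space_limit(
        [sys.executable, "-c", "bytearray(2 << 30)"], one_gib)
    print("child exit status:", result.returncode)
```

With the limit in place, a runaway allocation fails inside the offending program instead of pushing the whole machine into swap.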

Network connections for nas21 and nas22 have…

Fri Dec 4 13:45:14 PST 1998 — Network connections for nas21 and nas22 have been moved. We also took the opportunity to upgrade nas21’s power system so that its capacity can be augmented, and to give it a much-needed firmware upgrade. Multilink PPP across chassis (MPIP) seems to be working much better now.

Bolt’s move went well, and it’s been upgraded to a Pentium II/450 with 256MB of RAM (the old system was a PPro-200/128), and has also been moved from 10mbit to 100mbit switched ethernet. There should be a noticeable performance improvement, especially during periods of heavy usage. The new system also includes redundant power and cooling, SMP capability, and a spiffy deathstar-black rackmount chassis. 🙂

In other news, while we were moving bolt, sonic.sonic.net (the primary authentication/DNS server) started having some swap corruption problems and needed a reboot. There should have been no noticeable reduction in service, since the secondary auth/DNS server (boom.sonic.net) was functioning properly. Also, the news server (ultra.sonic.net) had filesystem troubles, and some local posts may have been delayed while it was repaired. -Scott, Brian, Devin, Eric, Ian, Logan, Eli, Dane, Asa, Kat, and Arak (who doesn’t really work here)

Some folks have noticed a difference in…

Thu Dec 3 13:01:46 PST 1998 — Some folks have noticed a difference in performance between dialup connections on nas21 and nas22. After some investigation (including some special monitoring by Devin and Ian; thanks, guys), we have identified congestion in our switched infrastructure, as well as a way to prevent it. To avoid similar congestion this evening, we are currently moving network connections to some of our gear; unfortunately, this may result in intermittent loss of connectivity (for a few seconds) while our switch tables adjust themselves to the new topology. We apologize for the inconvenience, and we’ll try to keep the impact to a minimum.

Additionally, we will be moving the shell server (Shell.sonic.net, aka Bolt.sonic.net) tonight at 10:30pm. That will be a physical move into our new facility, which should take less than 30 minutes. Again, we apologize for the inconvenience. -Scott, Brian, Ian, Devin, Logan, Eli, Asa

Starting around 8pm or so, our mail server…

Thu Nov 19 21:42:15 PST 1998 — Starting around 8pm or so, our mail server was severely impacted by a tremendous influx of email. Someone spammed AOL from a forged domain name, and unfortunately, we host that domain. This means that the tens of thousands of bounce messages were returned to the domain owner’s mailbox. [As of Fri Nov 20 00:31:16, I’ve counted 23,000 messages so far. /sd]

That wouldn’t have been a problem, except that AOL’s mail servers were all cooperating to deliver these messages. In other words: thundering herds of AOL mail servers stampeded Sub, our mail server. We resolved that by turning off connectivity between the networks of AOL’s server farm and us, then slowly bringing connectivity back up, allowing traffic to clear one network at a time. I’ll post more about that on news:sonic.general. -Scott
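The "one network at a time" recovery can be pictured with a small sketch. This is only a model of the idea, not the mechanism we actually used. The class name `StagedUnblocker` and the network ranges (reserved documentation prefixes) are assumptions for illustration.

```python
import ipaddress
from collections import deque


class StagedUnblocker:
    """Hypothetical model: block a set of networks, then re-admit them in order."""

    def __init__(self, networks):
        # Networks still blocked, in the order they will be re-admitted.
        self._blocked = deque(ipaddress.ip_network(n) for n in networks)

    def is_allowed(self, address):
        """Return True if the address is not inside any still-blocked network."""
        ip = ipaddress.ip_address(address)
        return not any(ip in net for net in self._blocked)

    def unblock_next(self):
        """Re-admit the next blocked network; return it, or None if none remain."""
        return self._blocked.popleft() if self._blocked else None


if __name__ == "__main__":
    # Placeholder ranges, not real sender networks.
    gate = StagedUnblocker(["192.0.2.0/24", "198.51.100.0/24"])
    print(gate.is_allowed("192.0.2.10"))   # blocked at first
    gate.unblock_next()                    # let the first network's queue drain
    print(gate.is_allowed("192.0.2.10"))   # now admitted
```

In practice, each call to `unblock_next` would wait until the mail server had caught up with the traffic from the previously re-admitted network.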

UUNet experienced a major outage across their…

Sun Nov 8 17:45:57 PST 1998 — UUNet experienced a major outage across their backbone from ~9am until ~3pm today. Discussion on a network operators’ mailing list intimates that this may have been due to software bugs in two vendors’ routers, bugs which reinforced each other destructively, causing massive instability on the Internet’s global routing mesh. There are reports of instability on other backbones, but from our neck of the Internet woods, UUNet seemed hardest hit. We were able to shut down our connection with UUNet, and balance traffic between our Sprint link and CW (formerly MCI) link. If memory serves, the last Internet outage of this magnitude occurred in May of 1997, when a new exchange point on the East Coast created a routing loop in the global routing mesh. (They had passed BGP routes between borders using RIP, with exciting results: their router announced that it was easiest to reach a large chunk of the Internet through their exchange point.) We here at Sonic hope that the effects of today’s outage didn’t cause too much frustration; we will be evaluating the outage to determine how we can prevent those effects in future. -Scott & Dane