We apologize that this update didn’t make it…

Wed Sep 12 13:33:35 PDT 2000 — We apologize that this update didn’t make it into the MOTD sooner. The Broadlink problem was solved by swapping in a new primary ATM router, and we were all impressed with the speed with which Shane synced the new router’s config with the old and brought it online. -Shane (BL), Dane, John (BL), and Scott

Tonight at midnight, we will be doing…

Mon Sep 11 18:33:46 PDT 2000 — Tonight at midnight, we will be doing upgrades on our core router, mega.sonic.net and on the machine serving additional email boxes, mailbox.sonic.net, aka tsunami.sonic.net.

Mega’s upgrade will include a new operating system, and it will be rebooted to load this OS. Downtime is expected to be around three minutes, during which Internet access will not be available. Local email, news, shell and other services should not be affected.

Tsunami’s upgrades include RAM and a faster network card, and downtime will be around 15 minutes, during which time add-on email boxes will be unavailable. We will also be migrating user data from the local disk over to the redundant Network Appliance equipment. During this migration, some email delivery may be delayed for a few minutes. -Dane (for Scott and Kelsey)

UUNet problem solved.

Sun Sep 10 16:35:00 PDT 2000 — UUNet problem solved. UUNet engineers reset some ATM interfaces in San Francisco, and the problem went away. We noticed the problem because we constantly monitor the path over UUNet between here at Sonic.net and ora.com in Sebastopol. Members probably didn’t notice, as this is a path that we only use to talk to a few sites — traffic to the Internet in general takes a different path. -Scott and Dane
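A sketch of the kind of summary such path monitoring might compute from periodic probes. This is illustrative only: the thresholds and function names are hypothetical, not Sonic.net’s actual tool.

```python
# Hypothetical path-health summary for one window of probe results.
# A sample of None represents a lost packet; numbers are RTTs in ms.
# loss_limit and latency_limit_ms are illustrative thresholds.

def summarize_path(samples_ms, loss_limit=0.05, latency_limit_ms=150.0):
    """Return (loss_fraction, avg_latency_ms, healthy) for one probe window."""
    lost = sum(1 for s in samples_ms if s is None)
    loss = lost / len(samples_ms)
    received = [s for s in samples_ms if s is not None]
    avg = sum(received) / len(received) if received else float("inf")
    healthy = loss <= loss_limit and avg <= latency_limit_ms
    return loss, avg, healthy
```

A per-path check like this is what lets a monitor flag trouble on one route (here, the UUNet path to a handful of sites) while general Internet reachability stays fine.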

Intermittent, brief periods of high latency…

Sun Sep 10 15:28:33 PDT 2000 — Intermittent, brief periods of high latency and packet loss on UUNet’s San Francisco network. We are seeing fleeting amounts of high latency and packet loss on UUNet’s San Francisco network, and we are investigating the problem with UUNet. -Scott

(707) 929-9816 and 17 “not in service”.

Sun Sep 10 15:22:28 PDT 2000 — (707) 929-9816 and 17 “not in service”. Our Cobb Mountain tech support and sales lines — 929-9817 and 929-9816 — are giving a “not in service” recording. We are investigating the problem with the phone company serving those numbers. Note that both Cobb Mountain modem lines (929-9811 and 993-0174) are fine. -Scott, Chris, and Kevan

Excessive disk usage on /home.

Sun Sep 10 13:08:36 PDT 2000 — Excessive disk usage on /home. Between 10am this morning and 12:59 this afternoon, the “/home” filesystem was filled up — shell users saw this as a “no space left on device” error. Impact: shell users couldn’t write data to the filesystem. This includes procmail processes running on a user’s behalf that filter mail into individual files under their /home directory. (If you use procmail in this manner, your mail may have been delayed.)
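The reason mail was delayed rather than lost: delivery agents conventionally treat a full filesystem as a temporary failure. A hedged sketch in Python (hypothetical names, not procmail itself) of that convention — exit with EX_TEMPFAIL (75, from sysexits.h) on ENOSPC so the mail system queues the message and retries later:

```python
# Illustrative sketch of the defer-on-full-disk convention; not procmail.
import errno
import sys

EX_TEMPFAIL = 75  # sysexits.h: tell the mail system to queue and retry


def deliver(message: bytes, mailbox_path: str) -> None:
    """Append a message to a mailbox file, deferring if the disk is full."""
    try:
        with open(mailbox_path, "ab") as mbox:
            mbox.write(message)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            sys.exit(EX_TEMPFAIL)  # filesystem full: defer, don't bounce
        raise
```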

After executing a “quota resize” on our Network Appliance (NetApp) to clear the condition, we found the reason for the excessive disk usage on /home:

-rw------- 1 culprit user 1402852697 Sep 10 12:59 ErrorLog

While 1.4 GB files are not normally allowed for members, investigation of the culprit’s shell environment revealed that he had selected options that, as a side effect, skipped the standard file size checks for user files. This will be corrected shortly.
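The “standard file size checks” likely refer to a per-process file size limit (RLIMIT_FSIZE) that a login shell normally sets, and which a user’s shell options can fail to apply. A minimal sketch, assuming Linux/BSD and an illustrative limit — this is not Sonic.net’s actual configuration:

```python
# Demonstrates RLIMIT_FSIZE, the per-process cap on file size.
# SIGXFSZ is ignored so an oversized write fails with EFBIG instead of
# killing the process. The limit value here is illustrative only.
import errno
import resource
import signal
import tempfile


def write_with_limit(data: bytes, limit_bytes: int) -> bool:
    """Try to write data under an RLIMIT_FSIZE cap; True if it fit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    old_handler = signal.signal(signal.SIGXFSZ, signal.SIG_IGN)
    resource.setrlimit(resource.RLIMIT_FSIZE, (limit_bytes, hard))
    try:
        with tempfile.TemporaryFile() as f:
            f.write(data)
            f.flush()
        return True
    except OSError as exc:
        if exc.errno == errno.EFBIG:
            return False  # the write hit the limit, as intended
        raise
    finally:
        resource.setrlimit(resource.RLIMIT_FSIZE, (soft, hard))
        signal.signal(signal.SIGXFSZ, old_handler)
```

With the limit in place, a runaway 1.4 GB ErrorLog would have been cut off long before filling the filesystem; without it, nothing stops the file from growing until /home is full.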

Additionally (and much to our chagrin), the monitoring tool that pages us when free space on NetApp filesystems drops too low didn’t detect the problem. The tool detects physical space problems, but the part that checks free space for NetApp quota trees isn’t working properly, and we didn’t know that because we had never had a quota tree fill up while using the tool. This, too, will be corrected shortly.
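For context, a free-space check built on os.statvfs (a hedged sketch, not Sonic.net’s actual tool) only sees the physical filesystem, which illustrates the gap: a quota tree can hit its quota and reject writes while the filesystem underneath still reports plenty of free blocks, so quota-tree usage has to be queried from the filer itself.

```python
# Physical free-space check; function names and the threshold are
# illustrative. This is exactly the kind of check that passes while a
# quota tree on the same filesystem is already full.
import os


def free_space_fraction(path: str) -> float:
    """Fraction of the underlying filesystem's blocks still free at path."""
    st = os.statvfs(path)
    return st.f_bavail / st.f_blocks


def page_if_low(path: str, threshold: float = 0.10) -> bool:
    """True if the on-call pager should fire for the filesystem behind path."""
    return free_space_fraction(path) < threshold
```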

Finally, we apologize to anyone who noticed the problem, most especially those who contacted tech support and got a wrong answer. We will review our processes — both computer and human — to ensure this doesn’t happen again. -Scott, Eli, and Dane

Our 1003 dial group was returning a reorder…

Sat Sep 9 18:36:31 PDT 2000 — Our 1003 dial group was returning a reorder tone for about 15 minutes; it seems that one of the T1 PRI cards lost its tiny mind. I swapped it with the very last card in the group, and a reboot seems to have fixed it. -Dane

Fiber Cut in Bay Area.

Fri Sep 8 11:34:57 PDT 2000 — Fiber Cut in Bay Area. Global Crossing (gblx.net) has experienced what has been termed a “catastrophic” fiber cut. Multiple circuits are affected, and multiple ISPs may be affected as well. Sonic.net members may experience high latency and packet loss to gblx-connected sites while gblx’s network is repaired. According to our “Internet-weather” monitoring, neither UUNet nor Cable & Wireless is affected, and neither is Sonic.net. -Scott

BroadLink will be doing radio work that may…

Fri Sep 8 16:50:42 PDT 2000 — BroadLink will be doing radio work that may affect customer links this evening. Customers may experience a brief interruption or degraded service levels at 11 p.m. tonight while equipment is exchanged. Downtime is expected to be approximately fifteen minutes. If you are experiencing degraded service levels outside this time frame, please contact support@broadlink.com. -Dane and Anna

BroadLink found that a network switch had…

Thu Sep 7 14:19:32 PDT 2000 — BroadLink found that a network switch had locked up, and a reboot fixed the trouble. Total downtime for BroadLink customers was 15 minutes. They will be working with the manufacturer to see if a cause can be pinned down. There has been no previous trouble with this equipment, and it’s connected to a remote power management unit, so it can be power-cycled from BroadLink’s offices. -Scott, Dane, Kelsey, Eli and Shane (the doughnut guy)