There was a widespread Internet outage this…

Mon Jun 17 09:01:34 PDT 2002 — There was a widespread Internet outage this morning. We are still investigating the cause and will post an update as soon as we have more information. The outage appears to have cleared, and everything is working normally now. -Steve

Update: The trouble affected some T1- and T3-connected customers, plus dialup customers calling our remote POP numbers. Basically, customers and sites connected to our network via mega.sonic.net were impacted because Mega lost its ability to peer at the BGP level with its upstream routers.

The problem was caused by changes to the edge router structure in preparation for our moves in the coming weeks. Mega had an incorrect upstream route that depended upon one of the two edge routers for BGP peering adjacency. As a result, customers downstream from Mega could not communicate with the Internet, but were able to reach local systems without any trouble.

From the perspective of our internal monitoring systems, which page operations staff in case of trouble, everything looked fine. This delayed our response: from inside the network, everything registered as normal, both toward the remote sites via Mega and toward the Internet itself.

As we complete our datacenter move over the next few weeks, we’ll exercise care and planning to ensure that problems like this do not affect our customers, and that our monitoring will catch any unexpected trouble. To this end, Eli is working with a remote partner to deploy end-to-end testing of equipment in our network, so that a problem like this won’t catch us unprepared.
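The failure mode described above, where internal checks pass while customers behind Mega cannot reach the Internet, is the kind of thing an outside vantage point is meant to catch. Here is a minimal Python sketch of that idea. The vantage names, the target `customer.example.net`, and all helper functions are hypothetical illustrations, not a description of our actual monitoring setup.

```python
import socket
from typing import Callable, Dict, List, Tuple

Target = Tuple[str, int]  # (host, port); all targets below are hypothetical


def tcp_probe(target: Target, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the target opens within the timeout."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:
        return False


def run_probes(targets: List[Target],
               probe: Callable[[Target], bool] = tcp_probe) -> Dict[Target, bool]:
    """Probe every target from the host this runs on."""
    return {target: probe(target) for target in targets}


def failures_by_vantage(
        results: Dict[str, Dict[Target, bool]]) -> Dict[str, List[Target]]:
    """Map each vantage point to the targets it could not reach."""
    return {
        vantage: sorted(t for t, ok in probes.items() if not ok)
        for vantage, probes in results.items()
    }


def should_page(results: Dict[str, Dict[Target, bool]]) -> bool:
    """Page staff if any vantage point, internal or external, sees a failure."""
    return any(failures_by_vantage(results).values())


# Simulated results resembling the outage: fine from inside, broken from outside.
customer = ("customer.example.net", 80)
simulated = {
    "internal": {customer: True},
    "external": {customer: False},
}
print(failures_by_vantage(simulated))
print(should_page(simulated))
```

With only the "internal" results, `should_page` would stay quiet. Adding the "external" results is what triggers the page. In practice each vantage point would call `run_probes` on its own host and report back to a central collector.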

The MOTD is likely to be pretty busy in the next few weeks, as we’ll be posting information about all moves and network changes here in advance, though many will be transparent to most customers. -Dane, Kelsey, Nathan, Scott, Eli

Possible isolated DSL maintenance.

Mon Jun 17 20:34:18 PDT 2002 — Possible isolated DSL maintenance. We’ve noticed some unacceptable packet loss on one of the numerous DSLAMs in the Santa Rosa area, and SBC/ASI has just informed us that they may have to take the DSLAM down for maintenance in the next few hours. This affects a small portion of our Santa Rosa DSL customers, about 100 in all. ASI expects any outages to be short. — Eli, Aaron.

News server reboot.

Sat Jun 15 16:41:30 PDT 2002 — News server reboot. A few minutes ago, our reader news server stopped emitting banners for its services, causing attempts to access the news service to time out. A reboot of the server solved the problem. -Scott
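A port that accepts connections but never sends its greeting looks "up" to a simple connect test, so a check for this kind of failure has to read the banner itself. The Python sketch below assumes a standard NNTP service on port 119 that greets with code 200 or 201. The function names and the host `news.example.net` are hypothetical.

```python
import socket
from typing import Optional

# NNTP greeting codes: 200 = posting allowed, 201 = posting prohibited.
HEALTHY_CODES = (b"200", b"201")


def read_banner(host: str, port: int = 119, timeout: float = 5.0) -> Optional[bytes]:
    """Return the first bytes the server sends, or None on timeout or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(512)  # raises a timeout error if no banner arrives
            return data or None
    except OSError:
        return None


def banner_ok(banner: Optional[bytes]) -> bool:
    """True only if a greeting arrived and starts with a healthy NNTP code."""
    return banner is not None and banner.startswith(HEALTHY_CODES)


# Example use (hypothetical host): banner_ok(read_banner("news.example.net"))
```

A server in the state described above would make `read_banner` time out and return `None`, which `banner_ok` treats as a failure.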

ICMP packets delayed.

Wed Jun 12 14:12:25 PDT 2002 — ICMP packets delayed. A problem with our Black Diamond switch is delaying ICMP packets to parts of the Sonic.net network. This does not impact the delivery of actual data. We are currently watching to see if the problem will go away on its own, or whether or not a reboot of the switch will be necessary. -Scott and Kelsey

Update: Inspection of Netflow statistics for one of our routers revealed that it was locked in a deadly embrace of continuous ICMP packets with the Black Diamond switch. Problem solved. -Nathan, Scott, Kelsey

Mail Server Trouble: Starting at about 3:00PM

Tue Jun 11 09:35:27 PDT 2002 — Mail Server Trouble: Starting at about 3:00PM yesterday we observed a sharp increase in the number of POP3 sessions that were ‘lost’ by our servers. After many hours of troubleshooting last night, we believed the problem had cleared itself up. However, this was incorrect: we are still losing about 5% of our POP3 connections for no apparent reason. If you experience any difficulty receiving mail, please be patient and try again. We hope to have this resolved shortly. -Kelsey and Nathan

UPDATE: We found a faulty network connection that we had overlooked last night, and have resolved the problems with dropped POP3 sessions.
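An intermittent loss of around 5% is easy to miss with a single test, since most attempts succeed. One way to measure it is to repeat a probe many times and count failures. The Python sketch below assumes that a "lost" session shows up as a failed connection or missing greeting, which may not match every way a session can be dropped. The function names and the host `pop.example.net` are hypothetical.

```python
import poplib
from typing import Callable


def pop3_greets(host: str, port: int = 110, timeout: float = 10.0) -> bool:
    """Return True if the server accepts a connection and sends its +OK greeting."""
    try:
        conn = poplib.POP3(host, port, timeout=timeout)
    except (OSError, poplib.error_proto):
        return False
    try:
        conn.quit()
    except (OSError, poplib.error_proto):
        pass  # the greeting already arrived; a failed QUIT is not counted
    return True


def loss_rate(attempt: Callable[[], bool], tries: int) -> float:
    """Fraction of attempts that failed, between 0.0 and 1.0."""
    if tries <= 0:
        raise ValueError("tries must be positive")
    failures = sum(1 for _ in range(tries) if not attempt())
    return failures / tries


# Example use (hypothetical host):
# rate = loss_rate(lambda: pop3_greets("pop.example.net"), tries=100)
```

At a true loss rate near 5%, a run of 100 attempts would be expected to show a handful of failures, so a larger number of tries gives a steadier estimate.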

First diesel generator test run.

Sun Jun 9 23:33:48 PDT 2002 — First diesel generator test run. In a simulated PG&E outage, the Liebert UPS and Detroit Diesel generator at our new 5000 sq. ft. datacenter facility delivered smooth and uninterrupted power. We ran on diesel for a 10-minute test interval, then switched back to utility power. The automatic startup, transfer, return to utility and shutdown of the generator were all handled smoothly by our transfer switch, UPS system and generator.

The generator is a 23.9-liter V-12 twin-turbocharged Detroit Diesel, which produces 1024 horsepower and 750 kilowatts of power. We can effectively run indefinitely, just by topping up the fuel tanks as they are used. This ensures availability of your server, website or Internet access, regardless of summer blackouts or emergencies. -Dane

Planned Supernews Maintenance: Supernews has…

Fri Jun 7 12:32:09 PDT 2002 — Planned Supernews Maintenance: Supernews has just informed us that they began migrating from Above.net to Equinix today at noon. They do not expect any outage during this migration, but anticipate degraded performance until it is complete, which should take all day. Coincidentally, we will also be installing some facilities at Equinix in July in order to participate in their public peering fabric. The peering fabric enables us to peer with other networks that have facilities at Equinix, allowing for superior overall performance to those networks. -Kelsey

Redback SMS reboot.

Wed Jun 5 08:33:59 PDT 2002 — Redback SMS reboot. Our Redback SMS rebooted for some unknown reason, causing downtime for FRATM, DSL, and Broadlink customers. The system is back up and authenticating interface bindings — we expect all active interfaces to be bound within the next 10 minutes.

We will be emailing the Redback’s crash dump to Redback engineers to determine the cause of the reboot. -Scott and Dane

Update: We had the SMS pre-loaded with a new software release, which had been prescribed to fix the rebooting problem. – Eli

On Friday evening and Saturday, Sonic.net…

Wed Jun 5 16:14:16 PDT 2002 — On Friday evening and Saturday, Sonic.net moved our staff and office network to our new headquarters in Santa Rosa. The move went very smoothly, and we were back up and providing technical support on Sunday morning as expected.

Credit for the smooth move and great new location goes to the site prep and move teams managed by Eli and Jen, and also to JLC Construction, Simons & Brecht Architectural, CloudBuzz Cabling, Honeywell Security, Source One Communications, McClure Electric, CalAir HVAC, Stockham and Parker construction, Secure Access Portals, Marin Flooring and Woodforms.

Thanks, everyone! Extra thanks to helpers Sean Franklin, Dustin Mollo, Jen Musil and Chris Schoenfeld. Thanks to Carl Schneider and Valley Roofing for the use of their amazing truck.

Please visit us at our new location at 2260 Apollo Way in Santa Rosa!

Datacenter moves start next week, and we’ll make notes here in the MOTD as things shift over.

-Dane