Sonic Status admin

We’re seeing continued packet loss and…

December 6, 1999

Mon Dec 6 17:56:30 PST 1999 — We’re seeing continued packet loss and latency on our T3 circuit to UUNet, which is our primary connection to the Internet. We’ve shifted some traffic toward our backup circuits, but this ongoing trouble is certainly impacting interactive performance. UUNet and PacBell are continuing their work on finding and eliminating this trouble, and we all appreciate your patience. -Dane

During our night operations, we moved most of

December 4, 1999

Sat Dec 4 10:19:04 PST 1999 — During our night operations, we moved most of our core systems from our old Extreme Networks Summit 48 to the huge new Black Diamond switch. All of the systems except one correctly auto-negotiated their full or half duplex 100 megabit Fast Ethernet settings. mega.sonic.net, our core Cisco router, did not properly negotiate its link speed with the new switch. Because these settings were out of sync, CRC errors began occurring on that connection as traffic increased into the AM hours, causing packet loss. I’ve manually overridden the auto-negotiated setting of half duplex in the switch, and will pursue this with the switch vendor to figure out why the systems are failing to negotiate. The impact here has been slower-than-usual Internet connections this AM, and this has been resolved. Sorry about any inconvenience this may have caused!

The new Black Diamond switch is huge, with capacity to support up to 256 connections to systems running at 100 megabit at full speed. Its non-blocking architecture allows for up to 64 gigabits of traffic in total, or full wire speed on all circuits all at once. Gigabit Ethernet on fiber is also supported for up to 48 ports, which could be connected to other large switches or directly to huge servers with high bandwidth needs. This is the largest enterprise network switch that we’re aware of on the market, and it provides Sonic.net with a great new core structure. -Dane
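As a rough sanity check on the capacity figures above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the 64 gigabit total counts both directions of each full-duplex port, which the post does not state, and the function and constant names are made up for this example.

```python
# Rough capacity check for the figures quoted in the post.
# Assumption (not stated in the post): the 64 Gbit/s total counts both
# directions of every full-duplex port.

FAST_ETHERNET_PORTS = 256    # connections quoted in the post
PORT_SPEED_MBIT = 100        # 100 megabit Fast Ethernet
FABRIC_CAPACITY_GBIT = 64    # total capacity quoted in the post


def aggregate_demand_gbit(ports: int, speed_mbit: int, full_duplex: bool = True) -> float:
    """Return the worst-case aggregate traffic in Gbit/s with every port at wire speed."""
    directions = 2 if full_duplex else 1
    return ports * speed_mbit * directions / 1000


demand = aggregate_demand_gbit(FAST_ETHERNET_PORTS, PORT_SPEED_MBIT)
print(f"Worst-case demand: {demand} Gbit/s of {FABRIC_CAPACITY_GBIT} Gbit/s")
print("Fits within the quoted capacity:", demand <= FABRIC_CAPACITY_GBIT)
```

Under that assumption, 256 ports at 100 megabit in both directions come to 51.2 gigabits, which fits within the quoted 64 gigabit total.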

We completed our night operations and systems

December 4, 1999

Sat Dec 4 05:29:59 PST 1999 — We completed our night operations and systems maintenance here at around 5am. There were a few brief and isolated service outages of individual features between 1am and 4am as we moved machines and connections in our data center. These changes included a memory upgrade for thunder.sonic.net, a new power plant for the 1001 dial group and a migration of most core systems to our new Ethernet switch. G’night! -Dane, Scott, Dave, Kelsey, Tony and Evan

During an alarm system run, fridge.sonic.net…

December 3, 1999

Fri Dec 3 09:44:44 PST 1999 — During an alarm system run, fridge.sonic.net rebooted itself, causing downtime of email and web space for three minutes and 28 seconds. fridge.sonic.net is internally highly redundant, and we’ve never had an issue like this with it. We’re working with Network Appliance and our alarm vendor to see what might have caused this. We suspect some sort of power surge or RF trouble. -Dane

Covad communications is having a massive…

December 3, 1999

Fri Dec 3 19:07:17 PST 1999 — Covad Communications is having a massive outage which is affecting most of their DSL customers in the state. They say that they’re working fast and hard to resolve this, and I’m sure the stress over there is pretty extreme. They had no ETR available when we spoke with them. -Dane, Jen and Eli

Now that the web logs for multihomes are…

December 3, 1999

Fri Dec 3 15:15:46 PST 1999 — Now that the web logs for multihomes are centralized, you can tail your logfiles. We have provided a simple command for you to do this, ‘twl’. To use it, just enter ‘twl yourdomain’ at the shell, then press ‘CTRL-C’ to stop tailing the logs. This command will continue to work once we have the Alteon switch up and running with load-balanced web servers. -Kelsey
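For anyone curious what tailing a log involves, here is a minimal Python sketch of following a file as it grows. It is not how ‘twl’ itself is implemented, which the post does not describe. The function name follow, the polling interval, and the max_idle_polls cutoff are all assumptions made for illustration.

```python
import os
import time
from typing import Iterator, Optional, TextIO


def follow(logfile: TextIO, poll_seconds: float = 0.5,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Skip what is already in logfile, then yield lines as they are appended.

    max_idle_polls is a hypothetical convenience: stop after that many
    consecutive empty reads. Leave it as None to follow until interrupted.
    """
    # Jump to the current end of the file so only new entries are shown.
    logfile.seek(0, os.SEEK_END)

    def new_lines() -> Iterator[str]:
        idle = 0
        while max_idle_polls is None or idle < max_idle_polls:
            line = logfile.readline()
            if line:
                idle = 0
                yield line
            else:
                # Nothing new yet: wait briefly before checking again.
                idle += 1
                time.sleep(poll_seconds)

    return new_lines()


# Example use (the path is a placeholder, not a real Sonic log location):
#
#     with open("/path/to/yourdomain.log") as f:
#         for line in follow(f):
#             print(line, end="")
```

Pressing CTRL-C while the loop runs raises KeyboardInterrupt and ends the follow, much like stopping the shell command. This simple version may yield a partial line if the writer is caught mid-write.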

Snapshot data filled up the Network Appliance

December 2, 1999

Thu Dec 2 21:36:21 PST 1999 — Snapshot data filled up the Network Appliance just now. We are adding another disk to the array tonight, which should prevent further problems. -Scott and Dane

At around 10am this morning, snapshot data on

December 2, 1999

Thu Dec 2 17:45:51 PST 1999 — At around 10am this morning, snapshot data on our Network Appliance file server filled up the disk array, causing disk operations to report ‘disk full’ for about 20 minutes. Normal operation was restored by removing snapshot data.

Pacific Bell will be performing maintenance…

December 2, 1999

Thu Dec 2 00:04:32 PST 1999 — Pacific Bell will be performing maintenance on our primary T3 between 3:00am and 3:15am this morning. Expected impact is a slight slowdown of overall Internet performance during this time as our backup circuits take up the load. PacBell will be swapping in a new card in the SMDS switch in an attempt to clear our performance and latency concerns. In the worst case, if they have issues with the swap and have to backpedal, the circuit will be back online by 3:30am. -Dane

During network reconfiguration that should…

December 1, 1999

Wed Dec 1 12:01:54 PST 1999 — During network reconfiguration that should have been non-intrusive, our core gateway router, mega.sonic.net, crashed and reloaded. We were changing the pathing to re-balance load on our circuit to Cable and Wireless as we work on ongoing latency troubles on our UUNet T3, and this isn’t an operation that should cause any problems in the router.

We’ve opened a support ticket with Cisco, and have asked their tech group to come up with some sort of reason for what happened. Downtime was four minutes and 42 seconds, during which Internet connectivity was not available. We’re very sorry for the interruption, and we’ll work with Cisco to find an explanation and to look for some way to achieve more redundancy than we already have in our core structure. -Dane, Scott and Co.