During an alarm system run, fridge.sonic.net…

Fri Dec 3 09:44:44 PST 1999 — During an alarm system run, fridge.sonic.net rebooted itself, causing three minutes and 28 seconds of downtime for email and web space. fridge.sonic.net is internally highly redundant, and we’ve never had an issue like this with it. We’re working with Network Appliance and our alarm vendor to see what might have caused this. We suspect some sort of power surge or RF trouble. -Dane

Covad communications is having a massive…

Fri Dec 3 19:07:17 PST 1999 — Covad Communications is having a massive outage which is affecting most of their DSL customers in the state. They say that they’re working fast and hard to resolve this, and I’m sure the stress over there is pretty extreme. They had no ETR available when we spoke with them. -Dane, Jen and Eli

Now that the web logs for multihomes are…

Fri Dec 3 15:15:46 PST 1999 — Now that the web logs for multihomes are centralized, you can tail your logfiles. We have provided a simple command for this, ‘twl’. To use it, enter ‘twl yourdomain’ at the shell, then press ‘CTRL-C’ to stop tailing the logs. This command will continue to work once we have the Alteon switch up and running with load-balanced web servers. -Kelsey
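
For readers curious what "tailing" a log involves, here is a minimal Python sketch of the general technique. It is not the source of ‘twl’, whose implementation is not described here; the function names `follow` and `tail_log`, and the `max_idle_polls` parameter, are made up for illustration.

```python
import os
import time


def follow(log, poll_interval=1.0, max_idle_polls=None):
    """Yield lines appended to the open file `log`, in the style of `tail -f`.

    max_idle_polls is a hypothetical knob: stop after that many empty polls
    in a row. None means keep polling until interrupted.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        line = log.readline()
        if line:
            idle = 0
            yield line.rstrip("\n")
        else:
            # Nothing new yet: wait a little, then check again.
            idle += 1
            time.sleep(poll_interval)


def tail_log(path):
    """Print new entries written to `path` until CTRL-C is pressed."""
    with open(path, "r") as log:
        log.seek(0, os.SEEK_END)  # skip what is already in the file
        try:
            for line in follow(log):
                print(line)
        except KeyboardInterrupt:
            pass  # CTRL-C ends the tail quietly
```

Calling `tail_log` with the path to a logfile prints each new line as it arrives and stops on CTRL-C, which mirrors how ‘twl’ is used from the shell.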

At around 10am this morning, snapshot data on

Thu Dec 2 17:45:51 PST 1999 — At around 10am this morning, snapshot data on our Network Appliance file server filled up the disk array, causing disk operations to report ‘disk full’ for about 20 minutes. Normal operation was restored by removing snapshot data.

Pacific Bell will be performing maintenance…

Thu Dec 2 00:04:32 PST 1999 — Pacific Bell will be performing maintenance on our primary T3 between 3:00am and 3:15am this morning. Expected impact is a slight slowdown of overall Internet performance during this time as our backup circuits take up the load. PacBell will be swapping in a new card in the SMDS switch in an attempt to clear our performance and latency concerns. In the worst case, if they have issues with the swap and have to backpedal, the circuit will be back online by 3:30am. -Dane

During network reconfiguration that should…

Wed Dec 1 12:01:54 PST 1999 — During network reconfiguration that should have been non-intrusive, our core gateway router, mega.sonic.net, crashed and reloaded. We were changing the pathing to re-balance load on our circuit to Cable and Wireless as we work on ongoing latency troubles on our UUNet T3, and this isn’t an operation that should cause any problems in the router.

We’ve opened a support ticket with Cisco and have asked their tech group to explain what happened. Downtime was four minutes and 42 seconds, during which Internet connectivity was not available. We’re very sorry for the interruption. We’ll work with Cisco on an explanation and on ways to achieve more redundancy than we already have in our core structure. -Dane, Scott and Co.

Urchin, the new web log analysis tool, is up…

Wed Dec 1 19:52:11 PST 1999 — Urchin, the new web log analysis tool, is up and running. You may take a look at your multihome’s urchin report by going to www.sonic.net/stats and filling out the form. You will be prompted for the username and password associated with the multihome. The raw web logs have also been updated to include November 30th. Both Urchin and the raw logs are updated once a day around 2:00am. -Kelsey

In the ongoing troubleshooting of the latency

Tue Nov 30 12:00:51 PST 1999 — In the ongoing troubleshooting of the latency on our primary T3, PacBell and UUNet have found a buffer setting on the UUNet end of the circuit which needed to be increased. As we’ve been experiencing less than ideal performance for some time, we elected to make this quick change during the daytime. The change is complete, and during the shift we’ve seen slow performance as the network converges. While convergence is still in process, I am seeing quite good performance, so this is encouraging.

While the network is converging, performance is less than ideal. Thank you for your patience. -Dane

We will be performing maintenance on a number

Tue Nov 30 08:40:39 PST 1999 — We will be performing maintenance on a number of systems on Friday night, beginning at midnight. A number of machines are being upgraded, but most user services will be unaffected. However, serving of websites will be interrupted briefly as we upgrade these machines and migrate them to a new layer 4 switch. This new switch allows us to balance load between many web servers, and means that we can take a server down for maintenance (or have one crash due to hardware or software failure) without any perceived impact for the browsing end-user. This is a great enhancement to our hosting offering, and provides our customers with a really nice degree of redundancy. The L4 switch is Alteon’s AceDirector-3, a fast hardware-based application layer switch. This ‘behind the scenes’ investment in our infrastructure will pay off for our customers as we continue to scale our offerings without excessive downtime or performance bottlenecks.
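
To illustrate the idea of balancing load across several web servers while one is down, here is a toy Python sketch. It assumes simple round-robin selection, which may not be how the AceDirector-3 actually distributes traffic; the class `RoundRobinPool` and the server names are hypothetical.

```python
class RoundRobinPool:
    """Toy model of load balancing: rotate requests across healthy servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()   # servers currently out of rotation
        self._next = 0      # index of the next server to try

    def mark_down(self, server):
        """Take a server out of rotation (maintenance or failure)."""
        self.down.add(server)

    def mark_up(self, server):
        """Return a server to rotation."""
        self.down.discard(server)

    def pick(self):
        """Return the next healthy server, skipping any marked down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.down:
                return server
        raise RuntimeError("no healthy servers available")


# Hypothetical server names, for illustration only.
pool = RoundRobinPool(["web1", "web2", "web3"])
pool.mark_down("web2")                   # e.g. down for an upgrade
print([pool.pick() for _ in range(4)])   # ['web1', 'web3', 'web1', 'web3']
```

With one server marked down, requests keep flowing to the remaining servers, which is why a browsing end-user would not notice the maintenance.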