Month: December 2000

We’re having ongoing problems this morning…

Fri Dec 29 11:18:55 PST 2000 — We’re having ongoing problems this morning with our primary authentication server. About 20% of the time, it’s failing to authenticate customer logins. This affects dialup, mail and shell access, as well as web-based member tools. We are working to reduce the workload on the primary server in an attempt to resolve this problem, but haven’t had much luck. This issue has been slowly building for a couple of weeks, and is quite a bit worse today.

Kelsey, Eli, Steve and Russ are working to bring online two new authentication servers that have been in the works for some time. The two new machines will be much, much faster than the current configuration, and will be load balanced by the Alteon L4 switches for full redundancy. We’re hoping to wrap this up late this afternoon, but the final deployment may end up happening this weekend due to testing overhead.

If you have authentication failures, please simply try again. We’re sorry for the inconvenience this causes! If you find that after multiple attempts you still cannot access the service, please contact support ASAP at 707-547-3400 and let them know. -Dane
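For a rough sense of why retrying helps, here is a back-of-the-envelope sketch. It assumes each login attempt fails independently with the roughly 20% rate quoted above; that independence is our simplification for illustration, not something we have measured, and the function name is made up for this example.

```python
def success_within(attempts: int, failure_rate: float = 0.20) -> float:
    """Chance that at least one of `attempts` logins succeeds.

    Assumes every attempt fails independently with probability
    `failure_rate` (an illustrative assumption, not a measurement).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # All attempts fail with probability failure_rate ** attempts.
    return 1.0 - failure_rate ** attempts


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} attempt(s): {success_within(n):.1%}")
```

Under that assumption, two tries succeed about 96% of the time and three tries about 99%. If failures come in bursts rather than independently, the real numbers will be worse, which is why we ask you to call support if repeated attempts keep failing.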

PacBell reports a switch module has gone down

Fri Dec 29 09:46:16 PST 2000 — PacBell reports a switch module has gone down in the Napa CO. This is causing our main Napa dialup number to return busy signals. PacBell has assured me that this is getting top priority and should be fixed shortly. Meanwhile, we do have redundant dialup access for Napa. You can find an alternate dialup number by checking our POP finder tool at www.sonic.net/cgi-bin/pops.pl or by calling tech support at 707-547-3400. – Steve

This afternoon, we’ve isolated the issue with

Fri Dec 29 17:51:23 PST 2000 — This afternoon, we’ve isolated the issue with the RADIUS code and fixed it. The last few hours of monitoring show that it is working well. We will deploy the two new authentication servers next week after more development, but the existing configuration should serve well until the servers are ready. – Russ, Kelsey, Eli

We had a carrier transition on our T3 to…

Wed Dec 27 15:42:10 PST 2000 — We had a carrier transition on our T3 to UUNet, causing a few minutes of network instability while the Cable & Wireless T3 took up the load. The redundant configuration of our network prevented this from being a serious issue. – Eli

Night Ops Complete: Our NetApps, freezer and…

Thu Dec 21 03:20:29 PST 2000 — Night Ops Complete: Our NetApps, freezer and icebox, are now configured in a cluster such that if a head unit fails, the other will seamlessly take over its NFS duties. Like the Alteons’ load balancing of all email, FTP and web traffic, we now have redundancy on the NFS back-end. Thanks to Dan from NetApp, we have also thoroughly tested and proven that the cluster fail-over works properly.

It should be noted that both NetApps had to be halted during the installation of the additional hardware needed to support clustering. During this time, inbound email queued locally on each mail server, but POP services were offline, along with web, FTP and shell. The total service outage extended from 12:15 AM to about 1:00 AM.

Our two new Cisco routers are on site and we’re preparing to migrate to an active-active dual router configuration using Cisco’s HSRP (similar to VSRP) protocol. Once we have finished the migration to the new Ciscos we will have full end-to-end redundancy in our core network and for all of our core services.

We also replaced our Redback SMS 1000 loaner with our new SMS 1800, which has a much greater capacity for expansion than our old SMS 500. The SMS terminates all PacBell and Broadlink DSL service on our network.

Lily, the T3 MUX, had its primary controller restored (from the last night ops), so once again Lily is internally redundant. We also completed some routine maintenance and reorganization of our NOC and some of our core servers.

-Kelsey, Steve, Nathan, Russ, Matt, Jared, Jeff, and the guys from NetApp.

We will be installing our new netapp in a…

Wed Dec 20 13:46:06 PST 2000 — We will be installing our new NetApp in a redundant configuration, and we have the NetApp engineers here to help us get this up with minimal downtime. There may be a short interruption of local services between 12:00am and 1:30am tonight. We will also be upgrading our SMS 1000 to a new SMS 1800. This is our Redback DSL router, so DSL service will be interrupted for about 15 minutes at 1:30am tonight. Thanks -Kelsey, Steve, Jeff and the NetApp crew.

Our 1003 dialup group started to return…

Mon Dec 18 22:17:16 PST 2000 — Our 1003 dialup group started to return intermittent ‘All circuits are busy’ messages. Clearing this required a reboot of one of our NAS servers, which caused a small number of people to be bumped off. After the reboot the error message went away. -Steve and Eli