Sun Apr 13 02:02:35 PDT 2003 — Night Ops Complete: All work went as planned. We have replaced the AD3 with one of the units from the new AD3 cluster that was earmarked for the new mail cluster. So far it appears to be functioning perfectly. Apparently the old AD3 was suffering from a hardware failure, one that none of the Nortel support reps we’ve talked to has ever seen or heard of. We will be keeping a close eye on the new box; we were able to ‘fix’ the old AD3 a number of times, only to have it fail again a week or even a month later. Recent experience aside, the Alteon AD3 has been a very reliable and trusted component of our network. -Kelsey and Nathan
Sat Apr 12 13:16:03 PDT 2003 — Night Ops: Sunday, at 12:00AM, we are going to replace the troubled AD3 with another AD3 that is currently deployed, together with a partner, in an Active-Standby cluster for the new mail server cluster. We’ve been reluctant to swap hardware until we had discussed the situation with Nortel, which now owns the Alteon product line. Any downtime during the swap to the new AD3 should be shorter than the periodic outages we are currently seeing.
We are also taking this opportunity to reboot our NetApp NFS cluster to clear a minor problem that only a reboot will fix. The filers should be down for no more than a few minutes. While the NetApps are being rebooted, we will also move the shell server’s disk back into its own chassis to free up the spare server for other tasks. -Kelsey and Nathan
Fri Apr 11 03:42:22 PDT 2003 — Trouble with the Alteon: The AD3 switch that currently load balances our web and mail services has stopped co-operating with the Cisco switches on either side of it. As before, the result is intermittent packet loss, which may cause poor performance when sending or receiving mail or accessing a locally hosted web page. We’ve already replaced all of the hardware on every side of the AD3 in an attempt to resolve this problem, but we’ve only succeeded in ‘fixing’ it for a week or two at a time before the problem returns. We are considering all available options; resolving this chronic problem in both the short and long term is our highest priority. -Kelsey and Nathan.
Fri Apr 11 14:30:05 PDT 2003 — Authentication problems, xxx-9811 pool. Shortly after 1:00pm today, clients dialing our xxx-xxx-9811 numbers stopped authenticating successfully. Support was closed for our weekly meeting, and a (soon to be corrected) fault in our monitoring system kept us from fixing it as quickly as we otherwise could have. -Eli, Scott, Russ.
Thu Apr 10 19:59:38 PDT 2003 — Our SpamAssassin cluster experienced some load-related failures this afternoon while one of its three servers was offline in preparation for adding a fourth dedicated SpamAssassin server tomorrow morning. The fourth server will restore redundancy to the cluster and should reduce the number of messages that occasionally pass through the cluster unchecked. -Kelsey and Kevan.
Wed Apr 9 08:00:00 PDT 2003 — At 8am this morning, our LATA-9 (Stockton) DSL concentrator lost connectivity to the Internet for approximately 20 minutes. The Stockton facility had deployed a new Layer 3 switch, which caused problems for our router. Only LATA-9 DSL customers were affected. -Scott, Nathan, and Dane
Mon Mar 31 13:05:55 PST 2003 — Site Wide Sendmail Upgrades: We are rebuilding sendmail on all of our servers to patch the second severe sendmail exploit released this month. Most of our internal servers have been updated, as has bolt. Some users may have had trouble sending mail from bolt while the new binary was being installed and its configuration was being tweaked. The updates to the rest of our servers will be completed within the hour. -Kelsey and Russ
Sun Mar 30 21:56:52 PST 2003 — PacBell DSL services restored. The SMS-1800 DSL termination hardware failed, and then failed to reboot on its own. In fact, it took quite a bit of coaxing to get it to boot properly: its FE (Forwarding Engine) failed to initialize until we removed one of its redundant power supplies, which had been quietly complaining for the past few days. We replaced the power supply with a spare and all appears to be well. Please note: our LATA9 DSL subscribers were not affected by this outage, as they are terminated on an SMS 500 colocated in Stockton. -Kelsey and Nathan
Sun Mar 30 20:30:02 PST 2003 — PacBell DSL outage – The router that supports our Pacific Bell DSL customers rebooted about 25 minutes ago and has not come back up properly. We are currently looking into the situation and will restore service and update this space ASAP. – Eli, Kelsey, Scott
Thu Mar 27 21:25:32 PST 2003 — We are down a PRI (23 modems) on one of our access servers that handles 522-1001, 522-1002, and some other currently unpublished numbers. We’ve received some reports of busy signals and are working to bring the apparently failed line card back online. At the moment, we have plenty of free capacity at our other POPs. We expect the busy signals to clear shortly, as we are heading into off-peak hours and should have the failed card back up soon. -Kelsey and Russ.