Month: September 2001

Night Operations Complete: All scheduled…

Wed Sep 5 03:33:33 PDT 2001 — Night Operations Complete: All scheduled upgrades and maintenance have been completed with minimal customer impact. We appear to have restored FC-AL loop integrity on the NetApp filers and have restored full clustering support. Tsunami has been upgraded and is working well on its new hardware. -Nathan and Kelsey

Night Operations: Tonight starting at…

Tue Sep 4 16:33:44 PDT 2001 — Night Operations: Tonight starting at midnight we will be performing some maintenance and upgrades to icebox, one of our NetApp filers, and tsunami, the server which handles mailbox accounts. We will also be upgrading and rebuilding the RAID in gale, our news feeder server. This is not a customer-impacting event. Tsunami is getting a new motherboard and more RAM to resolve its ongoing performance issues. After the upgrade is complete, mailbox users should see a significant increase in performance. The upgrade should take approximately 15 minutes to complete. Icebox is getting a new FC-AL adapter, which will hopefully resolve the Fibre Channel loop problems that we began seeing last week. The new LRC, which we installed on one of the disk shelves on Saturday, did not resolve the trouble. While we are replacing the FC-AL adapter, icebox will be shut down and all web data which it serves will be unavailable. Additionally, our administrative SQL server, which uses icebox for its database storage, will be offline for the duration of the maintenance. We expect that it should take approximately 20 minutes to replace and test the new FC-AL adapter. We will start the maintenance on icebox at 1:00AM. -Kelsey and Nathan

Routing hiccup.

Tue Sep 4 16:07:16 PDT 2001 — Routing hiccup. We were resolving a minor BGP configuration error in our three core routers, mega, gamma, and delta, and unexpectedly prevented customers connected via mega from being able to access anything outside of our network. The problem existed for approximately ten minutes before it was brought to our attention. This affected T3, T1, Frame Relay, and remote POP users. We’ve identified the problem that led to this and are working to fix it so we can go back to fixing BGP. We apologize for this brief service interruption. -Nathan, Scott, Eli and Kelsey.

News Server downtime.

Mon Sep 3 18:33:35 PDT 2001 — News Server downtime. News.sonic.net was offline between 5:25p and 6:25p this afternoon. The NNTP software package has been failing intermittently over the last 2-3 days, and in this case a delay in our notification package led to excessive downtime. We’ve tightened up monitoring, and will continue to work out the issues with the Typhoon software. – Eli, Kelsey

UPDATE: Typhoon, the news server software, has decided that its best action is to exit and dump core. We are in contact with the software vendor and waiting for a call back. Gale, our news feeder server, is capable of back-logging about 6 hours of news for typhoon, so as long as we are able to get it up within that time frame, no news posts should be lost. – Kelsey and Eli

UPDATE: We are seeing what appears to be some overview database corruption on typhoon. This manifests itself as garbled subject lines and/or the wrong post being returned for a given message ID. In the two years that we’ve been running typhoon we’ve never seen anything like this before; in fact, the server had an uptime of 282 days until tonight. We are going to continue to work with the software vendor for a prompt resolution to this problem, which will hopefully not involve the loss of our historical news spools. In the meantime, if a group that you are reading exhibits the corruption, use our backup news server, “supernews.sonic.net”, in place of “news.sonic.net”. -Kelsey, Dane, Nathan and Eli.