Month: July 2002

DNS server issues.

Tue Jul 16 23:51:50 PDT 2002 — DNS server issues. We experienced hardware problems with our secondary DNS server, which caused mail servers to refuse connections for about 5 minutes. Our mail servers use this secondary as their primary resolver because it is more available, and they fail over to the primary nameserver when the secondary fails. However, the health check that the mail servers use to determine whether DNS is available didn’t detect this failure. We are reworking the health check algorithm to avoid this problem in the future. The server has been taken offline and the nature of the problem is being investigated.
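The post does not say how the existing health check works or why it missed the failure, so the following is only a rough sketch of one possible approach, not the check Sonic.net runs. It assumes that a check which sends a real query and inspects the reply is harder to fool than one that merely confirms a server is reachable. The function names, the probe hostname example.com, and the 192.0.2.x addresses are all hypothetical placeholders.

```python
import socket
import struct
from typing import Callable, Optional, Sequence

QUERY_ID = 0x1234  # arbitrary fixed ID for the sketch; a real check would randomize it


def build_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query for the A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def reply_is_healthy(reply: bytes, query_id: int = QUERY_ID) -> bool:
    """Accept only a matching response with RCODE 0 and at least one answer."""
    if len(reply) < 12:
        return False
    reply_id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == query_id and is_response and rcode == 0 and answers > 0


def udp_probe(server: str, name: str = "example.com", timeout: float = 2.0) -> bool:
    """Send one query over UDP; a timeout or socket error counts as unhealthy."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return False
    return reply_is_healthy(reply)


def pick_resolver(
    servers: Sequence[str],
    probe: Callable[[str], bool] = udp_probe,
) -> Optional[str]:
    """Return the first server, in preference order, that passes the probe."""
    for server in servers:
        if probe(server):
            return server
    return None
```

A mail server using this sketch would call something like pick_resolver(["192.0.2.2", "192.0.2.1"]), listing the preferred resolver first, and would treat a result of None as "no working DNS".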

We also experienced a DNS configuration error this morning which caused some mail delivery to be delayed. The problem was resolved as soon as it was discovered. -Matt, Nathan, Dane and Kelsey

On Tuesday during the day, we’ll be turning…

Mon Jul 15 22:54:14 PDT 2002 — On Tuesday during the day, we’ll be turning up our new 100Mbps fiber link to Equinix, and deploying a Cisco 7206 and other equipment there. If all goes well, we’ll be moving our Cable & Wireless link from downtown Santa Rosa to San Jose sometime shortly afterwards.

On Wednesday morning at around 3am, we will be moving Focal (9811) dialup lines from Santa Rosa to San Francisco. Expected downtime is ten to fifteen minutes. No, we don’t drive that fast; we’ll have extra equipment in place in San Francisco to allow for a quick transition. Matt, Steve and Russ will be doing this migration for us. -Dane

Update: Wed Jul 17 07:31:10 PDT 2002 — Night Operations complete. The Focal dial-up lines were successfully moved to our San Francisco POP. There was less than 5 minutes of downtime during the move.

Update: Wed Jul 17 16:22:53 PDT 2002 — As part of the move, we renumbered the xxx-9811 dial pool’s NAS servers. A minority of ISDN routers may need their gateway address changed in order to function properly. -Matt, Nathan, Russ and Steve

ssl.sonic.net, our shared secure web server…

Mon Jul 15 00:18:40 PDT 2002 — ssl.sonic.net, our shared secure web server, has had a system disk failure. We are working on the system now and should have services fully restored in a few hours. Only the core OS is affected by this disk failure; all customer data is stored on our redundant NetApp cluster. – Kelsey and Nathan

Update: ssl.sonic.net has been completely restored on a new disk and its data has been verified by our backup software. It’s been up on the new hardware for over an hour and appears to be working properly.

There seems to have been a power outage in…

Sun Jul 14 02:53:44 PDT 2002 — There seems to have been a power outage in Northern Santa Rosa, and we’ve noticed a number of T1 and DSL customers went offline. The T1 and ATG DSL customers now seem to be back online, but we have not been able to verify that the PacBell DSL customers are back up. Please let us know ASAP if you have any problems with your connection due to this event. -Dane, Matt and Russell (Xponentia)

Our move of primary core storage and servers…

Sun Jul 14 02:47:37 PDT 2002 — Our move of primary core storage and servers has been delayed due to problems getting the tape backup completed, indexed and verified. The full backup started at 5am Saturday, and as of 2am Sunday, is complete but not yet fully indexed, so it cannot be tested. For this reason, we’ve postponed the move of the storage array and all associated servers until the same time next weekend.

Instead, we’ll be doing a number of clean-up activities this evening, many of which are slightly service-affecting. Steve will be doing a bit of tidying and rearranging of the 1003 dial pool, and customers will be disconnected as he moves equipment around. Nathan is planning to swap a T3 port adapter in one of the Cisco 7507s here for a HSSI, and expects about 30 seconds of inability to reach sites connected via Cable & Wireless. -Dane, Matt, Steve, Nathan and Kelsey

News Server Updates: We’ve just completed a…

Sat Jul 13 04:09:42 PDT 2002 — News Server Updates: We’ve just completed a software upgrade to bring us up to the latest stable version of Typhoon, the news server software we use on our news reader box. We’ll be keeping a close eye on the news server over the next few days to see if this resolves the problems that it has been having this past week. -Kelsey

Update: The problems appear to have been resolved by the new version of Typhoon. It has run problem-free for more than 12 hours. -Kelsey

Night Op – We’re planning to move our core…

Fri Jul 12 01:24:53 PDT 2002 — Night Op – We’re planning to move our core disk storage architecture on Sunday morning, with downtime beginning at about 1am.

Our Network Appliance F740 network file system cluster is the basis of Sonic.net’s storage solution. All user data resides on the two NetApp filers, which are configured for complete redundancy. The drives sit on dual-channel fiber arbitrated loops and are served by redundant processor heads. The units run RAID level 4 with WAFL filesystems, and have redundant power and redundant cabling, both to all disks and to the network itself at Gigabit speeds.
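For readers unfamiliar with RAID level 4: it keeps a dedicated parity disk whose blocks are the XOR of the corresponding data blocks, so the contents of any single failed disk can be recomputed from the survivors. The toy sketch below illustrates only that general principle. It says nothing about how the filers implement it, and the block contents and function names are made up for the example.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_data: Sequence[bytes], parity: bytes) -> bytes:
    """Recompute the one missing data block from the survivors plus parity."""
    return xor_blocks([*surviving_data, parity])


# Three hypothetical data disks, each holding one 4-byte block of a stripe.
data = [b"mail", b"web!", b"ftp."]
parity = xor_blocks(data)  # what the dedicated parity disk would hold

# Pretend the second disk failed: rebuild its block from the other two and parity.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Here recovered equals the lost block because XOR-ing a value in twice cancels it out, leaving only the missing block.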

Moving the disk shelves themselves will be service-affecting: while they are in transit, local web and ftp hosting, email and shell will be unavailable. Dialup, DSL and web browsing will be unaffected. We expect the downtime to be between an hour and an hour and a half, beginning just after 1am on Sunday morning. Actual server moves of redundant systems will begin at 11pm Saturday night, but these changes should be transparent.

The move to the new datacenter is nearing completion for Sonic.net equipment, and it’s been a good opportunity to redesign a number of network elements. Downtime has been very brief, and we appreciate your patience with any interruptions noted in the MOTD. -Dane

Night Op – Redback SMS and DSL customer move.

Wed Jul 10 18:58:53 PDT 2002 — Night Op – Redback SMS and DSL customer move. Tonight we will be moving the ATM DS3 which terminates Pac Bell DSL and FRATM customers. It is scheduled to take place at 3am and should last about 30-40 minutes as we relocate the equipment to the new data center.

Update: The move of the RedBack SMS DSL router and PacBell DSL customers is complete. Downtime was about twenty minutes. Pacific Bell did a great job of moving this circuit quickly and efficiently, and we had a very smooth transition of the equipment. -Matt, Dane, Eli, Kelsey, Nathan and Mike(2) from PacBell

News Server Issues: news.sonic.net, our NNTP…

Wed Jul 10 16:41:12 PDT 2002 — News Server Issues: news.sonic.net, our NNTP reader server, has been experiencing stability problems for the past few days. This instability results in periodic refused connections as the server process reinitializes. We have been unable to find the cause and are working with the software vendor to resolve this as soon as possible. -Kelsey

Update: We are still experiencing trouble with the news server. The vendor has recommended a version upgrade and is currently analyzing our core dumps. We will attempt the upgrade as soon as we reach an appropriate maintenance window. -Kelsey

Web performance impacted.

Tue Jul 9 11:11:43 PDT 2002 — Web performance impacted. A denial-of-service attack is affecting the performance of one of our web servers; this may result in slower response times when loading pages hosted at Sonic.net. Our operations crew is resolving the issue and web performance should return to normal shortly. We maintain a pool of redundant, load-balanced web servers, which greatly reduces the severity of problems of this type. -Eli and Russ