Fri Jul 2 15:12:16 PDT 2004 — Due to the power failure, our Graton wireless network is currently offline, and we’re working to isolate the cause and get it resolved. Update: The wireless network was brought back online about 15 minutes after this notice was posted. -Dane and Bryan
Brief power event in Santa Rosa.
Fri Jul 2 14:28:52 PDT 2004 — Brief power event in Santa Rosa. During a UPS system test, our Liebert power system failed, causing a power outage of about one minute in our Santa Rosa datacenter. The planned test involved cutting utility input power to one UPS system, which should have switched to its batteries; instead, it tripped the master battery cabinet's output circuit breaker, forcing the system into bypass.
This caused a brief network outage while routers and servers in Santa Rosa rebooted, and affected the availability of many services as they came back online. Email POP and SMTP are currently unavailable and are expected to be back online in a few minutes. Some DSL customers (those homed out of Santa Rosa) lost their connections briefly and are now coming back online. Customers in San Francisco may have had problems with domain name resolution. Hosted websites were offline briefly while the web cluster and its storage arrays rebooted.
We are sorry for the interruption, and are working with Liebert to determine the cause of the failure. An additional Liebert power system is now in the planning phase, and we'll use it to provide redundant, diverse-input power to the systems and equipment that support it, including key routers, servers and storage. -Dane
Update: After thorough testing, the technician found that one battery in the Liebert UPS cabinet had failed completely, and another was not holding its voltage at the full level. The system was reconfigured without these two batteries, taken out of bypass and put back into online mode. New batteries are on their way, and we've implemented periodic testing procedures that check voltage levels for signs of failure, to ensure that this doesn't happen again.
Missing weblogs.
Thu Jul 1 14:59:22 PDT 2004 — Missing Weblogs: Some users' webstats may be missing for the past few days. We've corrected the error, and all new logs should be processed correctly. Unfortunately, there is no way to recreate the missing logs. Incidentally, we also fixed a couple of small bugs in Hedgehog and Webalizer that were causing missing logs for a few users with particular configurations. The new web cluster uses a simpler, more robust network-based logging system that will also let us move toward real-time log processing for all of our hosted customers. On a completely unrelated note, earlier this morning we shut down our UUNet DS3 because their upstream router was exhibiting packet buffer exhaustion problems. UUNet resolved the problem and we re-enabled the DS3. No one noticed. -Kelsey, Nathan
New test web server up.
Thu Jul 1 13:59:51 PDT 2004 — New test web server up. We have a web server up for testing at testweb.sonic.net. This server is running with a copy of our web files and is not using the production data used by www.sonic.net. It is not part of our new web cluster; it is up only so that we can verify that programs operate correctly.
Please direct questions or comments regarding your experiences with the test server to the newsgroup news://news.sonic.net/sonic.help.www , in the thread announcing the new test server.
In parallel, we are also developing and testing the new web cluster: three very fast machines, load-balanced for performance and high availability. Preliminary tests show the cluster sustaining over 14,000 hits/second. -Scott writing; almost all of Operations has a hand in this, especially Kelsey and Nathan.
Router maintenance Tuesday morning.
Fri Jun 25 14:06:29 PDT 2004 — Router maintenance Tuesday morning. On Tuesday, 6/29/04, between 5 and 6 am, we will be upgrading our router at our second Santa Rosa POP. The actual outage should be less than 15 minutes. Dialup modems using the numbers 707-522-1001 and 707-522-1002 will have service interrupted during the outage.
This work is being done as part of Sonic.net’s network upgrade program. It will allow us to install a 45 Mbps connection from our main POP to this one and improve service for the dialup modem pool. It will also allow us to increase our peering connection with ATI, eliminating congestion and latency between our two networks. -Network Operations
Update Tue Jun 29 06:11:14 PDT 2004: Work was completed successfully. Actual downtime was approximately 5 minutes. -Nathan
TWIG webmail turned down.
Mon Jun 21 16:13:51 PDT 2004 — Our old webmail system, TWIG, has finally been turned down. Links have been updated to point to Squirrelmail at webmail.sonic.net/. If you were a user of TWIG and notice an issue with Squirrelmail, please contact support. – Kevan
Dial-up service outage.
Fri Jun 18 14:27:20 PDT 2004 — Dial-up service outage. Equipment failure caused an interruption of dial-up access to some customers who use access numbers ending with 0174. The problem has been corrected and service has been restored. No other access numbers were affected. -Russ
Kernel Upgrades.
Thu Jun 17 11:35:20 PDT 2004 — Kernel Upgrades. We've upgraded all of our systems to protect against the latest Linux kernel exploit, announced a few days ago. Since the exploit could only crash a system, we took our time and also resolved a few other problems we'd been having with our kernels. During the reboots, we triggered a misconfiguration on one of the internal IMAP servers that provides webmail services, which led to some intermittent webmail problems. -Kelsey, Geoff and Kevan
Impending web cluster upgrade.
Tue Jun 15 15:17:57 PDT 2004 — Impending web cluster upgrade. Within the next couple of weeks, the web cluster serving www.sonic.net and all Sonic.net web hosting will be replaced by a new, follow-on web cluster. This entails an important change for FrontPage users: all FrontPage server extensions installations will be upgraded to the FrontPage 2002 extensions, the latest released version. These server extensions are compatible with the Windows FrontPage 2003 client.
We think hosting customers will appreciate having a new web cluster running up-to-date software. We are putting the final touches on the cluster, including a new method of load-balancing. We will make another announcement when development and testing are complete and we have a firm date. -Scott, Nathan, Kelsey and Geoff
DSL configuration problem.
Fri Jun 11 13:31:47 PDT 2004 — DSL configuration problem. At 12:45 the configuration source file for the San Francisco SMS was corrupted. This caused an outage on that SMS. We had the config restored by 1:00 and customer circuits were back up by 1:30. -John