Month: July 2004

New web cluster’s test data changed slightly.

Fri Jul 16 16:17:19 PDT 2004 — New web cluster’s test data changed slightly. As announced, we have mass-updated all Berkeley DB files in the new web cluster’s test volume. We then tested the results with a previously-installed “Movable Type” blog installation, which worked as expected.

We then emailed all detected users of Berkeley DB files on our web servers. Web sites can be tested at testweb.sonic.net — please respond with your experiences at news://news.sonic.net/sonic.help.www . Thank you. -Scott and the Sonic.net Operations crew

Packet Loss from Santa Rosa to San Jose.

Tue Jul 13 18:02:03 PDT 2004 — Packet Loss from Santa Rosa to San Jose. Our 100 Mbit circuit from Santa Rosa to San Jose is experiencing substantial packet loss. We are working with the provider of the circuit and hope to have the situation resolved shortly. -Kelsey and Nathan

Update Tue Jul 13 20:20:25 PDT 2004: Service is fully restored. The carrier reports a piece of equipment in Mountain View failed. Technicians were dispatched and the gear was repaired shortly after 8:00pm.

To prevent issues like this in the future, we began construction approximately four months ago on a new gigabit-speed fiber optic ring connecting our Santa Rosa headquarters, San Francisco, and San Jose. This ring, scheduled for completion by the end of September, will enable us to withstand single circuit outages between these POPs without any service degradation. Also on order is a new link connecting both of our San Francisco POPs. Once these upgrades are in place, all of our major points of presence will be fully redundant. -Nathan and the rest of Sonic.net Operations

New web cluster online for testing.

Tue Jul 13 14:10:56 PDT 2004 — New web cluster online for testing. Since Friday afternoon, we have been pointing testweb.sonic.net at the new web cluster — so far, so good.

We are currently preparing emails to be sent out to various classes of web server users, including those who use cgi-bin, Microsoft FrontPage, and Berkeley DB. As we draw nearer to deployment, we would encourage web server users to test their sites at testweb.sonic.net and let us know how it goes in news://news.sonic.net/sonic.help.www . Thank you. -Scott, Kelsey, Nathan, and the rest of Operations

Web server testing continues apace.

Fri Jul 9 13:45:22 PDT 2004 — Web server testing continues apace. We are continuing to test our new web servers, which may result in occasional outages to “testweb.sonic.net”.

We hope to have the “testweb” name swung over to the new cluster later today, so that folks can test the cluster this weekend. -Scott, Kelsey, Nathan, and the rest of Operations

New web server test continues.

Tue Jul 6 12:05:17 PDT 2004 — New web server test continues. We are continuing to test our new web server environment at testweb.sonic.net . Already, we have found and addressed some differences in the PHP environment between the two servers, and have installed a MySQL support module for Perl.

So far, we have had very little comment about the new web server environment. If you are concerned about how the upcoming web cluster will work with your data, we would encourage you to try the new server at testweb.sonic.net and tell us what you think in news://news.sonic.net/sonic.help.www (in the thread announcing the new web server). We would especially appreciate reports about the FrontPage 2002 extensions. Thanks! -Scott, Kelsey, Nathan, and the rest of Operations
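For anyone who would rather script the comparison than click through pages by hand, here is a minimal sketch in Python. It is not a Sonic.net tool; the function names are made up for illustration, and the only detail taken from the notice above is the test host name. It rewrites a page's URL to point at testweb.sonic.net, fetches both copies, and reports whether the status code or the body differs.

```python
import hashlib
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit

TEST_HOST = "testweb.sonic.net"  # test host named in the notice above


def swap_host(url, new_host=TEST_HOST):
    """Return the same URL pointed at a different host."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


def fetch(url, timeout=10):
    """Fetch a URL and return (status code, body bytes), even on HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status and body worth comparing.
        return err.code, err.read()


def compare(prod, test):
    """Compare two (status, body) pairs and return a list of differences."""
    diffs = []
    if prod[0] != test[0]:
        diffs.append(f"status differs: {prod[0]} vs {test[0]}")
    if hashlib.sha256(prod[1]).digest() != hashlib.sha256(test[1]).digest():
        diffs.append(f"body differs: {len(prod[1])} vs {len(test[1])} bytes")
    return diffs
```

With a page URL of your own in a variable `page` (a hypothetical name), `compare(fetch(page), fetch(swap_host(page)))` returns an empty list when both servers answer identically. Dynamic pages such as CGI or PHP output can differ between two requests for harmless reasons, and the test server runs on a copy of the web files, so a reported body difference is a prompt to look closer rather than proof of a problem.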

Due to the power failure, our Graton wireless

Fri Jul 2 15:12:16 PDT 2004 — Due to the power failure, our Graton wireless network is currently offline, and we’re working to isolate the cause and get it resolved. Update: The wireless network was brought back online about 15 minutes after this notice was posted. -Dane and Bryan

Brief power event in Santa Rosa.

Fri Jul 2 14:28:52 PDT 2004 — Brief power event in Santa Rosa. During a UPS system test, our Liebert power system had a failure, causing a power outage of about one minute in our Santa Rosa datacenter. The planned test involved cutting utility input power to one UPS system, which should have switched to its batteries. Instead, it tripped the master battery cabinet output circuit breaker, forcing the system to bypass.

This caused a brief network outage while routers and servers in Santa Rosa rebooted, and impacted availability of many services as they came back online. Email POP and SMTP are currently unavailable, and are expected to be back online in a few minutes. Some DSL customers (those homed out of Santa Rosa) lost connection briefly and are now coming back online. Customers in San Francisco may have had problems with domain name resolution. Hosted websites were offline briefly while the web cluster and its storage arrays rebooted.

We are sorry for the interruption, and are working with Liebert to determine why there was a failure. We have an additional Liebert power system in the planning phases now, and in the future we’ll ensure that we use it to provide redundant diverse-input power to systems and equipment which support this, including key routers, servers and storage. -Dane

Update: After complete testing, the technician found that one battery in the Liebert UPS cabinet had failed completely, and one other was not holding its voltage at the full level. The system was reconfigured without these two batteries, taken out of bypass, and put back into online mode. New batteries are on their way, and we’ve implemented periodic testing procedures to check voltage levels for signs of failure to ensure that this doesn’t happen again.
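As an illustration of what a periodic voltage check of this kind might look like, here is a minimal sketch in Python. It is not the procedure actually put in place; the function name, the battery labels, and every threshold are assumptions chosen for the example. It sorts readings into the two failure cases described above (a battery that has failed completely, and one not holding its full voltage) plus healthy batteries.

```python
def classify_batteries(readings, full_volts, weak_fraction=0.95, dead_fraction=0.5):
    """Classify each battery reading as "ok", "weak", or "failed".

    readings: dict mapping a battery label to its measured voltage.
    full_volts: the voltage a healthy battery is expected to hold.
    weak_fraction, dead_fraction: hypothetical cutoffs, expressed as
    fractions of full_volts; real limits would come from the vendor.
    """
    statuses = {}
    for name, volts in readings.items():
        if volts < full_volts * dead_fraction:
            statuses[name] = "failed"  # effectively dead
        elif volts < full_volts * weak_fraction:
            statuses[name] = "weak"    # not holding full voltage
        else:
            statuses[name] = "ok"
    return statuses
```

For example, `classify_batteries({"B1": 13.6, "B2": 0.0, "B3": 12.1}, full_volts=13.5)` marks B1 as ok, B2 as failed, and B3 as weak under these assumed cutoffs.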

Missing Weblogs: Some users webstats may be…

Thu Jul 1 14:59:22 PDT 2004 — Missing Weblogs: Some users’ webstats may be missing for the past few days. We’ve corrected the error and all new logs should be processed correctly. Unfortunately there is no way to recreate the missing logs. Incidentally, we also resolved a couple of small bugs in Hedgehog and Webalizer that were resulting in missing logs for a couple of users with particular configurations. The new web cluster uses a simplified and robust network-based logging system that also allows us to move towards a real-time log processing system for all of our hosted customers. On a completely unrelated note, earlier this morning we shut down our UUNet DS3 because their upstream router was exhibiting packet buffer exhaustion problems. UUNet resolved the problem and we re-enabled the DS3. No one noticed. -Kelsey, Nathan

New test web server up.

Thu Jul 1 13:59:51 PDT 2004 — New test web server up. We have a web server up for test, testweb.sonic.net . This server is running with a copy of our web files, and is not using the production data used by www.sonic.net. The server is not part of our new web cluster, but is only up to ensure that programs operate correctly.

Please direct questions or comments regarding your experiences with the test server to the newsgroup news://news.sonic.net/sonic.help.www , in the thread announcing the new test server.

In conjunction, we are also developing and testing the new web cluster, which is a cluster of three very fast machines, load-balanced for performance and high availability. Preliminary tests reveal that the cluster is capable of sustained performance of over 14,000 hits/second. -Scott writing; almost all of Operations has had a hand in this, especially Kelsey and Nathan.