One of our authoritative name servers, b.auth-ns.sonic.net, had a hardware failure last Friday, and services were restored on temporary hardware shortly thereafter. (We don’t think these kinds of non-service-impacting events warrant an MOTD.) Rather than fix the original hardware, we decided it was a good opportunity to further increase the geographic and network diversity of our authoritative name servers and have moved it to a facility in Texas. Our authoritative name servers are now located in three different networks, all with IPv6, in California, Texas, and New York. -Kelsey
News Service Interruption
The article numbering server in our news cluster suffered a catastrophic failure a couple of hours ago. All services have been restored on the backup server, but there may be some delay in new articles as the servers catch up. It is possible that some articles were lost and cannot be re-fed into the system.
Update: Due to an error promoting a reader slave to the article numbering master server (this is something we’ve only had to do once before, many years ago), the article numbers being assigned to new articles were grossly incorrect. As such, all articles received between the time services were brought back online and the time the article numbering server was fixed are lost to our readers. I’m sorry we didn’t catch this sooner.
Update: The news cluster continues to have some issues that we are working to iron out. The replacement article numbering server is having trouble keeping up with the feed, and we’ve had to force it to catch up by flushing out its backlog (losing the articles) several times. Please note that this only affects our overviews (article lists) and not our ability to retrieve articles by message-id from our spools. We hope to have the situation stabilized soon.
-Kelsey
ATM Switch Reboot
Tonight at about 6:20 PM the ATM switch serving our newest ATM OC12 suffered a software failure and had to be rebooted to restore service. The reboot caused 5-10 minutes of downtime for affected customers. The switch is back up and operating normally at this time, and we are continuing investigation into the software error.
-Jared and Nathan
Santa Rosa Wireless Outage
We are currently experiencing an outage affecting a subset of wireless customers served off one of our antenna sites in Santa Rosa. We are working as quickly as possible to repair the problem but do not have an estimated time to repair at this point. Further updates will be provided as the situation develops.
-Tim, Jared, and the NOC
Update: The downed tower has been returned to service and all affected customers should be online. If you are still experiencing problems with your wireless Internet service, please contact support at 707-547-3400.
PHP 5.3 deployed
We have deployed PHP version 5.3 to our customer web cluster. More information on how to enable PHP 5.3 for your website can be found here: https://wiki.sonic.net/wiki/Php5
-William
News server issues
Our NNRP server, news.sonic.net, is currently only partially functional. We’ve temporarily lost network connectivity to the provider we use for binary spools. While some articles may be available locally on our servers, most are likely available only on the remote spools. We’re working with our providers to restore service as soon as possible.
Update: As of 9:20PM the problem has been resolved. It may take a little while for our servers to catch up but news is flowing normally again.
-Kelsey
FlexLink Long Range Emergency Maintenance
Within the next 60 minutes, we will be performing emergency maintenance on equipment serving a subset of FlexLink Long Range customers in the Santa Rosa area. Affected customers can expect up to 10 minutes of downtime while the work is performed.
-Tim and the NOC
Non-Invasive Backbone Maintenance
Tonight, beginning at 9PM, we will be performing non-intrusive maintenance on our core infrastructure in San Francisco. No downtime or impact is expected for customers.
-Jared, Nathan and Matt
Update: All maintenance being performed tonight has been completed. There should have been no impact to customers.
Brief Fusion/FlexLink Outage
Today at 5:02PM, a configuration error made during a capacity increase took our Petaluma POP offline for approximately 3 minutes. All customers served out of this POP would have lost all Internet and voice connectivity. We have restored service to this POP and apologize for any inconvenience caused.
-Jared
CLEC Intrusive Maintenance
This evening, beginning at 12:01AM, we will be performing maintenance on equipment serving FlexLink and Fusion customers in San Jose, Oakland, San Leandro, Santa Clara, San Carlos, and Milpitas. Affected customers may experience up to 30 minutes of downtime while the work is performed.
-Tim