Author: admin

ssl.sonic.net stopped responding to web…

Sun Jul 21 10:08:30 PDT 2002 — ssl.sonic.net stopped responding to web requests a few minutes ago. A brief investigation revealed that the Apache web server was wedged on NFS, most likely as a result of the mass migration. A reboot resolved the problem, but we’ll be keeping a close eye on it in case it’s something else. -Kelsey

Night Operations Complete.

Sun Jul 21 07:24:36 PDT 2002 — Night Operations Complete. We’ve made a massive move of our core storage architecture and associated servers. The migration went very well, and downtime of most services was quite a bit shorter than planned. Servers were taken offline at 1:30am. Web hosting was down for a bit over an hour. Mail took longer due to the backlog of inbound and outbound mail, and took about three hours to complete. Peak load average observed on one of the mail servers during this time was 992.99 (compared to a typical value of under 5.00). FTP was down for about an hour and a half, later in the morning.

The clock on our primary administrative box came up with the incorrect time, and it executed some scheduled tasks far off schedule. This resulted in some invoices for colocation, disk usage and bandwidth usage being run when they shouldn’t have been. If you have an email in your inbox this morning regarding billing for one of these types of services, please disregard it! No actual charges were made.

We had some unexpected problems with one of our nameservers, and a few other minor challenges, but things look quite healthy now and we’re very pleased with how smoothly this transition went. This completes the majority of our move into our new datacenter facility, and we’re very excited.

If you should observe any odd behavior, please post to news:sonic.net, or email support@sonic.net, or call support at 707-547-3400 and explain what you’re observing. Ask support not to wake us if it’s not a critical item. =)

Thanks to the team here for all of their help! While I might have to ask forgiveness for missing someone, here’s a list of the folks who worked on this tonight. Ops: Nathan, Kelsey, Scott, Russ, Steve, Matt and myself. Techs: ChrisM, Jeff, Aaron, MattS, ScottB, ChrisB, Dan, Kavan, Bryan. Guest helpers: JenM, DustinM

-Dane (very happy to be almost completely moved, and looking forward to my vacation which begins on Wednesday.)

Night Operations.

Sat Jul 20 21:54:50 PDT 2002 — Night Operations. Tonight we will be moving our NetApp filers, web, mail and support servers. We will begin by moving half of our load-balanced web and mail servers. This pre-move of systems will not create a service outage. The NetApps are scheduled to go offline at 1am and should be down for no longer than 4 hours. During this time, mail and POP service will be offline, as will locally hosted web sites. This operation will not affect connectivity to the Internet. -Matt, Nathan and Kelsey

“R” Section Rooftop restored.

Fri Jul 19 16:17:34 PDT 2002 — “R” Section Rooftop restored. The Rohnert Park “R” section Rooftop circuit had an outage earlier today. We opened a ticket with PacBell, and service was restored at approx. 3:15pm. PacBell reports that a backhoe “bumped” some of their equipment, which needed some TLC to come up again. -Scott and Steve

“R” Section Rooftop down.

Fri Jul 19 13:23:21 PDT 2002 — “R” Section Rooftop down. The Rohnert Park “R” section Rooftop circuit is down. We have a ticket open with PacBell, and they are currently testing the circuit. -Scott, John, and Jen

Update: PacBell can’t see the Rohnert Park end of the circuit, so they have rolled a truck to see what’s going on. -Scott and Steve

In the past 10 minutes, we’ve had a number of

Thu Jul 18 19:08:38 PDT 2002 — In the past 10 minutes, we’ve had a number of reports from DSL customers in Sebastopol that they are offline. We have opened trouble tickets with ASI, and will follow up to work out what the situation is (likely a down DSLAM) and what the estimated time to resolution is. -Dane, Zeke and Jared

Update: It appears that the DSLAM is back up and operational. -Kavan and Scott

Problems on our 522-1003 dial group.

Thu Jul 18 17:37:35 PDT 2002 — Problems on our 522-1003 dial group. In order to reduce downtime in our xxx-9811 Focal dial group, we shuffled equipment out of our 522-1003 dial group so it could be relocated to San Francisco. For this migration to work properly, the gear had to be reconfigured on both ends. The new xxx-9811 gear was properly transitioned to its new configuration, but the 522-1003 gear wasn’t. Its MPIP server address wasn’t changed, which prevented MLPPP (Multilink PPP) from negotiating properly. As a result, all attempted MLPPP sessions to 522-1003 failed to negotiate multiple channels until this was resolved this afternoon. Overall, this affected only a small number of customers, primarily those with ISDN. -Steve

Brief busy signals on dial-up numbers ending…

Wed Jul 17 08:20:45 PDT 2002 — Brief busy signals on dial-up numbers ending with 9811. We discovered a misconfigured card in the middle of our dial group which caused the 9811 numbers to give false busy signals for about 5 minutes. The problem has been corrected and we will monitor this POP closely all day. -Matt and Steve