Tue Mar 20 14:40:30 PST 2001 — Last night one PRI in the 9811 dialup group went down due to an OC3 circuit failure at Focal Communications. Focal was able to work with Pacific Bell and Sprint to get the PRI back online by 8:00pm. This may have caused a short period during which the 9811 modem pool returned busy signals. We have added more capacity to this and the Sebastopol dialup groups to increase reliability in those groups. — Steve
Author: admin
PG&E rolling outages.
Tue Mar 20 14:25:06 PST 2001 — PG&E rolling outages. Sonic.net’s facility has been reclassified into a zone eligible for rolling blackouts. While we were previously in a “50” zone, we’ve been moved into zone 2 as part of PG&E’s efforts to address their energy shortfalls. In response, we deployed additional battery capacity today at a cost of around $12,000, and we have a diesel generator on site to keep us running beyond the battery runtime. -Dane, Eli, Kelsey, Russ and Scott
Unexpected Network Outage.
Fri Mar 16 16:56:35 PST 2001 — Unexpected Network Outage. We have been working with Extreme Networks to troubleshoot and identify some problems that we’ve had with VLANs in their switches not behaving as expected under certain circumstances. We were gathering a packet dump to help their engineers resolve the problem when we unexpectedly caused a switch loop between giga and ape (our two core switches). We were able to resolve the loop in a matter of seconds, but our Alteon load balancing switch decided to down its interfaces into ape, making all web and email servers unavailable. It took a few minutes to coax the Alteon to bring its interfaces back up and then a few minutes more for the server network to converge.
All locally served web sites and email services were offline for five minutes. No other services were affected. -Russ, Nathan and Kelsey
DSLAM1 in Santa Rosa has had intermittent…
Fri Mar 16 16:12:25 PST 2001 — DSLAM1 in Santa Rosa has had intermittent issues over the last few days, but has been operational for the most part. This afternoon, we’ve noticed more significant problems, and we will be working with Pacific Bell’s Repair to determine when the DSLAM will be fully operational. Our DSL Product manager will be tracking the issue all weekend, and coordinating with Support to get news out in a timely fashion. – Sonic Support
DSLAM 1 in the main Santa Rosa CO is down.
Mon Mar 12 17:55:55 PST 2001 — DSLAM 1 in the main Santa Rosa CO is down. This DSLAM serves approximately 200 of our ADSL customers in Santa Rosa. PacBell (ASI) has not provided us with an ETR. Don’t forget that you may still dial up using a modem if your DSL is offline. -Dave & Eli
We identified the causes of the poor inbound…
Wed Mar 7 18:01:23 PST 2001 — We identified the causes of the poor inbound performance and routing instability on our network and resolved them at 5:00pm. We had been aware of the performance problems since this morning and had been working hard to try to identify the source. As it turns out, there was trouble with two of our routers’ internal Ethernet interfaces which, once identified, was easy to fix. We also ordered an additional 3Mbps for UUNet today which we expect to turn up tomorrow morning. This is in addition to the 3Mbps that we brought online this morning and brings our UUNet T3 to approximately 24Mbps. -Kelsey, Eli, Scott and Russ.
We have successfully upgraded our UUNet DS3…
Wed Mar 7 13:34:41 PST 2001 — We have successfully upgraded our UUNet DS3 circuit to the next access class, in response to a need for more bandwidth that we identified last week. It took longer than expected to work with UUNet to get this upgrade accomplished, but we were able to make the change today with no negative impact whatsoever on the network, and utilization stats on our circuits now show optimal performance with no bottlenecks. — Eli, Kelsey
Broadlink Maintenance.
Tue Mar 6 10:08:01 PST 2001 — Broadlink Maintenance. Broadlink will be performing some maintenance at two of their tower sites. The Barham tower, which serves Rincon Valley, will be down on Wednesday, 3/07 from 11:59 pm to 12:15 am. The Wmoore tower, which serves Southeast Santa Rosa, will be down from 12:30 am to 12:45 am. If you experience any service outages outside of this maintenance window, please let us know. -Broadlink
Router coup.
Mon Mar 5 13:35:21 PST 2001 — Router coup. As part of our move to a redundant configuration, a simple operation turned into a serious problem for about 3 minutes. Specifically, one of the USR Total Control dialup routers took control of our internal routing protocol, even though it was configured not to. The problem was detected within about 2 minutes and corrected within about a minute. Even more specifically, two routers with OSPF priorities of “127” and “96” were couped by a router with a priority of “5”. We’re still trying to figure out how that could have happened. -Scott and Dane
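The post says the cause was still unknown, so the following is not a diagnosis of this incident. It is only a simplified sketch of one standard OSPF behavior that can produce the same symptom: designated-router election is non-preemptive, so a router that already holds the role keeps it when higher-priority routers appear. The router names and router IDs below are hypothetical; only the priorities 127, 96 and 5 come from the entry above, and the real election procedure (which also involves a backup designated router) is more involved than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Router:
    name: str        # hypothetical label, not taken from the post
    priority: int    # OSPF router priority; 0 means never eligible
    router_id: int   # hypothetical ID, used as tie-breaker (higher wins)


def elect_dr(current_dr: Optional[Router], live: List[Router]) -> Optional[Router]:
    """Simplified, non-preemptive designated-router choice.

    A live sitting DR keeps the role regardless of its priority.
    Priorities only matter when there is no live DR.
    """
    if current_dr is not None and current_dr in live:
        return current_dr
    eligible = [r for r in live if r.priority > 0]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r.priority, r.router_id))


dialup = Router("dialup", 5, 1)
core_a = Router("core-a", 127, 2)
core_b = Router("core-b", 96, 3)

# With every router present and no sitting DR, the highest priority wins.
fresh = elect_dr(None, [dialup, core_a, core_b])

# If only the low-priority router is up at election time, it becomes DR...
sitting = elect_dr(None, [dialup])
# ...and it is not displaced when the higher-priority routers return.
after_return = elect_dr(sitting, [dialup, core_a, core_b])

print(fresh.name, after_return.name)
```

In this model a low priority only guards against winning a fresh election. A priority of 0 is the setting that keeps a router out of the role entirely.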
Night Operations.
Sun Mar 4 10:45:51 PST 2001 — Night Operations. Sunday morning at 12:30 am Sonic.net performed a number of maintenance upgrades designed to increase NetApp filer, news server and core switching performance as well as border router redundancy. All upgrades were preventative and further increase the redundancy and responsiveness of Sonic.net.
The news server, Typhoon, underwent a performance upgrade — unfortunately, the license key provided by the vendor did not work for our new configuration.
An upgrade of our core switch OS caused a “network freeze” that lasted about 90 seconds while the switch rebooted. The new OS has bug and performance fixes.
We commissioned into service another Cisco 7200 router as part of our ongoing efforts toward more redundant Internet connectivity. Sonic.net now uses two Cisco 7200 edge routers — “gamma” and “delta” — in addition to the Cisco 7507 that used to be our edge router, “mega”. Each edge router handles a T3 to one of the two largest Internet backbones: UUNet and Cable & Wireless. While we had reported that Internet connectivity would be lost for three minutes during this upgrade, actual impact was less than a minute.
The NetApp NFS filer underwent an OS upgrade and disk firmware upgrade, disturbing mail and web storage for about an hour.
A number of Port Masters servicing 707-522-1002 and some Oakland numbers were rebooted. There was no noticeable downtime; however, dial-up connections on the 1002 equipment were terminated. -Matt, Kelsey, Scott, Steve, Russ, Scott R., and Jeff