Month: December 2002

Mail server trouble.

Wed Dec 18 18:21:33 PST 2002 — Mail server trouble. Two or more of our mail servers refused mail for short periods of time today, returning ‘User Unknown’ errors for at least some of the email they received. Although we were unable to make the new DB passwd lookups fail during testing, we believe that this failure is related to our recent upgrades. We’ve re-enabled NIS as a fallback user lookup method in case DB lookups fail again. In the meantime, we will further analyze the failures so we can prevent them from occurring again. -Kelsey, Eli, Russ and Scott.
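For readers curious what a fallback lookup looks like in principle, here is a minimal sketch. It is not our actual mail server configuration: the function names, the in-memory tables, and the sample user are all hypothetical, and the real lookups are handled by the system's name service rather than by application code. The idea is simply to try the primary source first and consult the fallback only when the primary fails or has no answer.

```python
from typing import Callable, Optional, Sequence

# A lookup source takes a username and returns an account record or None.
Lookup = Callable[[str], Optional[dict]]


def lookup_user(username: str, sources: Sequence[Lookup]) -> Optional[dict]:
    """Try each source in order and return the first record found.

    Assumption for this sketch: a source that raises an error is treated
    the same as a source that has no record, so the next one is consulted.
    """
    for source in sources:
        try:
            entry = source(username)
        except Exception:
            continue  # source unavailable: fall through to the next one
        if entry is not None:
            return entry
    return None  # no source knows this user ("User Unknown")


def failing_db_lookup(username: str) -> Optional[dict]:
    # Hypothetical stand-in for a DB lookup that is misbehaving.
    raise RuntimeError("db lookup failed")


# Hypothetical stand-in for the NIS map, holding one made-up account.
NIS_TABLE = {"alice": {"uid": 1001, "home": "/home/alice"}}


def nis_lookup(username: str) -> Optional[dict]:
    return NIS_TABLE.get(username)


found = lookup_user("alice", [failing_db_lookup, nis_lookup])
missing = lookup_user("nobody-here", [failing_db_lookup, nis_lookup])
print(found)
print(missing)
```

With the fallback in the list, a failure in the first source no longer turns a valid user into an unknown one; a user absent from every source is still reported as unknown.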

Power restored.

Mon Dec 16 12:21:18 PST 2002 — Power restored. At approximately 8:30am, power was restored to Sonic.net’s Apollo headquarters. -Scott

Sonic.net on generator power.

Mon Dec 16 03:40:11 PST 2002 — Sonic.net on generator power. For the first time ever, Sonic.net is experiencing an extended power event at our headquarters. Everything is online and performing as expected, and no manual intervention is needed.

I’ve visited the office to make sure that all is well, and checked the diesel fuel level – we’re still between 90% and 100% fuel after running on generator for more than two hours. Our fuel and load projections indicate that we’ve got enough diesel to run for days – and of course, we’ll be topping up the tank in the AM Monday. Generator load is currently less than 10% of the max load of 750 kilowatts (3/4 of a megawatt, enough power for a small town of about 750 homes); this generator is sized for the ten year plan.
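As a rough worked version of the load figures above, the arithmetic can be sketched as follows. The only inputs are the numbers quoted in this post (750 kilowatts maximum, load under 10%); the variable names are made up for illustration, and "less than 10%" is treated as an upper bound rather than a measured value.

```python
# Figures quoted in the post above.
MAX_LOAD_KW = 750          # generator maximum load, in kilowatts
LOAD_PERCENT_CEILING = 10  # load is reported as "less than 10%"


def load_ceiling_kw(max_kw: int, percent: int) -> float:
    """Upper bound on the current load implied by a percentage of maximum."""
    return max_kw * percent / 100


current_ceiling_kw = load_ceiling_kw(MAX_LOAD_KW, LOAD_PERCENT_CEILING)
headroom_kw = MAX_LOAD_KW - current_ceiling_kw

print(f"Current load is below {current_ceiling_kw:.0f} kW")
print(f"Unused capacity is at least {headroom_kw:.0f} kW")
```

In other words, the building is drawing under 75 kW, leaving at least 675 kW of capacity unused, which is the room the ten year plan is meant to grow into.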

No services have been interrupted. Kudos to McClure Electric, JLC Construction, Stuart & Stevens gensets and Honeywell monitoring; and of course, Detroit Diesel.

-Dane (proudly)

Miscellaneous Upgrades.

Mon Dec 16 14:51:19 PST 2002 — Miscellaneous Upgrades. We have completed the migration away from YP (NIS) to db files for passwd and group lookups on all of the mail and SpamAssassin servers. This eliminates a number of problems and should also improve performance on these servers, especially when under high load.

We have upgraded our domain email alias membertool to correct a problem wherein users were able to remove their ‘catch-all’ alias, causing mail to be incorrectly delivered to other sonic.net mailboxes. Over the next two weeks we will work to correct any domains that do not currently have catch-all aliases set up. This should prevent the delivery of mail to the wrong recipients, as well as help reduce the amount of spam that some users receive. -Kelsey, Nathan, Scott, John, Russ, and Chris B.

BroadLink Tower Five outage due to power…

Sun Dec 15 12:45:25 PST 2002 — BroadLink Tower Five outage due to power failure. From BroadLink:

Ok, it _is_ a power problem. It looks like power went down around 2:00pm (Saturday); our battery system and generator kept it up until around 11:00pm. We’re headed up the mountain to pull the system back online. Tim will be onsite in the next few hours and we should be able to get the generator running again (he’s bringing another generator just in case). The tower itself is our most remote site; when it’s bone dry it takes a 4×4 to reach it. In this weather we may have to hike a portion of the way. PG&E doesn’t have any information on the outage. There is no word as to the cause, but lightning is most likely. They also don’t have any information regarding power restoration. As soon as more information is available I will relay it to sb-ops (Sonic.net). -Jason (BroadLink)

Update: Sun Dec 15 12:49:23 PST 2002, The repair team’s vehicle is stuck in the mud; they are now hiking. More news as it unfolds. -Jason

Update: Sun Dec 15 14:53:20 PST 2002, It looks like BL’s generator isn’t working and we can’t move a replacement on foot. We’re in contact with the property owner to find another option (there are working generators on-site owned by the property, but our equipment isn’t hooked up to them). No useful information from PG&E yet. -Jason

Update: Sunday, December 15, 2002 4:23 PM, PG&E has finally confirmed the outage, but they have no time frame for restoration. -Jason

Update: Sun, 15 Dec 2002 19:35:53 -0800, Tower 003 has recently lost power. The tower is currently operating on battery reserves. If the outage is long enough we may have a new problem.

With regard to tower 005, we still have one vehicle stuck midway up the hill. The rescue 4×4 with winch was not able to pull it free. The rain and wind are expected to continue until Tuesday and we’ve no useful data from PG&E regarding the restoration of land-line power to either tower. We’re adding a generator to 003 right now (the ‘real’ generator for tower 003 is halfway up Barham mountain in the mud-locked vehicle). Surprise, surprise: you can’t buy a generator in S.R. tonight for love or money. The generator should allow us to avoid knocking any additional customers offline. We’re actively looking for a solution to the tower 005 problem. A visual inspection did not reveal any damage, which is probably a good sign. When power is restored we expect full functionality to return.

So, in summary: It’s possible tower 003 may have trouble in the next 4-12 hours. We’re doing what we can to avert it.

Tower 005 is still down. Unless PG&E repairs that portion of the power grid, the tower will stay down until we can get a working generator online. We’re trying to find a way to accomplish it, but regardless of what we work out ‘down here’, nothing can actually be done until someone can get up the darn hill. With rain continuing, the road conditions have steadily degraded from pretty bad to practically insane. Finding a solution that doesn’t require us to wait for the weather to improve (which would probably mean Wednesday or Thursday) is clearly our top priority. -Jason

Update: Sun, 15 Dec 2002 21:14:06 -0800, PG&E power back up to 003 so UPS is re-charging and ready to go.

I will continue to work on the 005 (Mt. Barham) issue in the AM. That tower serves all of the Rincon and Bennett Valley customers as well as some connected in western Santa (both north and south). Once I get my vehicle out of the way (it is presently blocking the entire muddy one-lane road), I am seeking to rent an ATV to pull the generator the rest of the way up the hill to the tower to restore the power.

PS – We took advantage of Dane’s generous offer to use the hand-cart generator. I will use that on the 003 site if the power dumps again. -Tim (BroadLink)

Update Mon Dec 16 09:53:00 – Tim at BroadLink has sourced a generator that can be connected onsite, with the help of KXFX Radio! They estimate that they will have the Barham tower site back online later today. – Eli

Update Mon Dec 16 16:32:55 – KXFX’s onsite generator has been connected to the Barham tower site, and the tower radios have associated with the customer radios. However, the backhaul into the rest of BroadLink’s network has not come up. BL’s engineers fear that the backhaul equipment may have been damaged, and work continues. – Eli and BroadLink

Final Update Thu Dec 19 08:37:47 – After much hard work, Barham is fully operational, and service has been restored! – Eli and BroadLink

Sebastopol POP trouble.

Sun Dec 15 05:55:00 PST 2002 — Sebastopol POP trouble. We are experiencing problems with the Sebastopol POP affecting customers who dial 707-823-8812. We have a ticket open with Pac Bell and are working to get this resolved. Please remember that we have alternate dial-up numbers available at www.sonic.net/cgi-bin/pops.pl -Matt

Update: Sun Dec 15 11:12:26 PST 2002, Power outage in Sebastopol resolved. A long power outage in Sebastopol caused our POP to go off-line for a few hours this morning. Power has been restored and the POP is answering modem calls again.

Intermittent “connection refused” from…

Sat Dec 14 10:29:05 PST 2002 — Intermittent “connection refused” from mail.sonic.net. For a few minutes this morning, connections to mail.sonic.net were sometimes refused. By the time we had logged onto the boxes to correct the problem, the problem had gone away. Five servers in our mail cluster serve mail.sonic.net.

We are soon to deploy a new mail server architecture, which will ensure that mail.sonic.net always answers connections. -Scott and Matt

Sonic.net absorbs SBC-ASI DSL customers of…

Sat Dec 14 17:38:47 PST 2002 — Sonic.net absorbs SBC-ASI DSL customers of DSL Designs. On Saturday December 7th, under an agreement between DSL Designs and Sonic.net, we took over responsibility for serving the majority of DSL Designs’ SBC-ASI-connected DSL customer base. DSL Designs will continue operations, serving Verizon-connected customers.

Customers who migrate from DSL Designs to Sonic.net can have their email forwarded to avoid any loss of correspondence. Sonic.net staff will coordinate the email forwarding with the DSL Designs management as subscribers migrate. Many of the new customers are now online with us, and others are coming online in the first few days of next week. -Dane

Nokia Rooftop in Rohnert Park ‘R’ Section…

Sat Dec 14 14:28:03 PST 2002 — Nokia Rooftop in Rohnert Park ‘R’ Section down. The T1 that serves Nokia Rooftop customers in Rohnert Park ‘R’ Section is down. A ticket has been opened with Pac Bell and we expect a reasonable repair in spite of the weather. -Matt

Update: Sat Dec 14 14:56:50 PST 2002, Power outage in Rohnert Park ‘M’ and ‘R’ Sections. After calling down to Rohnert Park, it was determined that the power is out in much of this area. We expect service to be restored when PG&E resolves the issue.

Our unix shell server, bolt.sonic.net is…

Fri Dec 13 15:47:09 PST 2002 — Our Unix shell server, bolt.sonic.net, is currently having problems, and the ops team is working on it. -Ops

Update: We’ve restored shell services on a standby server and will be investigating bolt’s failure. -Kelsey, Eli and John.