Month: January 2017

Network Connectivity Outage

Update: 4:15pm – Today at 12:28pm Sonic experienced a network instability event whose origin was unknown at the time. Through troubleshooting we were able to narrow the issue down to two transport links running from one of our datacenter locations to core network equipment. These links were receiving replicated traffic from the transport provider's network equipment, sent back at us. The replicated traffic overloaded the CPU on our core routers. We have taken steps to prevent the replicated traffic from affecting our network, and we have contacted our provider for further diagnosis. Apologies for the delay in getting this issue resolved; it was a very difficult problem to troubleshoot, and we have never seen anything like it before.
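
One way to spot this kind of looped-back traffic is that packets sourced from your own address space should never arrive inbound on a transport link. The following is a minimal, purely illustrative sketch of that check in Python using Scapy; the interface name and prefixes are hypothetical placeholders, and this is not a description of the tooling Sonic actually used.

# Illustrative sketch only: flag packets arriving inbound on a transport link
# whose source address belongs to our own prefixes, which would indicate the
# provider's equipment is reflecting our traffic back at us.
# The interface name and prefixes below are hypothetical examples.
from ipaddress import ip_address, ip_network
from scapy.all import IP, sniff

OUR_PREFIXES = [ip_network("198.51.100.0/24"), ip_network("203.0.113.0/24")]  # example prefixes
TRANSPORT_IFACE = "eth1"  # hypothetical transport-link interface

def check_for_reflection(pkt):
    if IP not in pkt:
        return
    src = ip_address(pkt[IP].src)
    # Traffic sourced from our own space should never come back in from the provider.
    if any(src in prefix for prefix in OUR_PREFIXES):
        print(f"Possible reflected packet: {pkt[IP].src} -> {pkt[IP].dst}")

# Watch inbound traffic on the transport link and flag anything sourced from our own space.
sniff(iface=TRANSPORT_IFACE, prn=check_for_reflection, store=0)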

Update: 3:42pm – We believe everything is restored. We will release an RFO soon with more specific information.

Update: 2:46pm – We are still working to mitigate the DoS. All of our engineering staff are currently engaged in this issue. We will post more details as they become available.

As of 12:30pm, the Sonic network is experiencing reachability issues to the outside world. A large DoS attack is the suspected cause, but we are still working to identify and mitigate the problem ASAP. More information will follow.

-Sonic NOC

Phone Switch Maintenance

Update: 3:10 – Maintenance is complete.


We will be performing scheduled maintenance on our phone switch tonight at midnight. We do not expect any service impact from this maintenance.


Network Engineering

System Maintenance

UPDATE: Maintenance complete.

Tonight at 11:59pm, System Operations will be running maintenance updates on several customer-facing systems. The following services may experience brief interruptions:

  • IMAP/POP3/SMTP
  • Customer hosted websites
  • VPN
  • IPv6 Tunnels

The maintenance period is expected to last 1 hour.

-SOC

Enterprise SSE Fiber Circuit

Update: 1/23/2017 – 4:05pm – A bug has been identified in the vendor's network involving the CPE installed on site and the router in the vendor's core network. Our vendor has been issuing upgrades to the CPE, which have the potential to expose this bug on the core router, causing traffic to loop. When this occurs, the traffic is perceived as BUM traffic and is policed, capping speeds at 2 Mbps. We are working with our vendor on ways to identify this bug and act quickly when it occurs.
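
As an illustrative aside, a circuit hitting this bug shows a distinctive signature: measured throughput pinned near the 2 Mbps policer. Below is a minimal monitoring sketch of that check in Python; the test URL and thresholds are hypothetical examples, and this is not the vendor's or Sonic's actual solution.

# Illustrative sketch only: probe a circuit's download throughput and flag it
# when the result sits near the 2 Mbps BUM-policing cap described above.
# The test URL and thresholds are hypothetical examples.
import time
import urllib.request

TEST_URL = "http://speedtest.example.com/10MB.bin"  # hypothetical test object
POLICED_MBPS = 2.0     # cap seen when the bug is triggered
TOLERANCE_MBPS = 0.5   # how close to the cap counts as "policed"

def measure_throughput_mbps(url: str) -> float:
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        total_bytes = len(resp.read())
    elapsed = time.monotonic() - start
    return (total_bytes * 8) / (elapsed * 1_000_000)

mbps = measure_throughput_mbps(TEST_URL)
if abs(mbps - POLICED_MBPS) <= TOLERANCE_MBPS:
    print(f"Throughput {mbps:.2f} Mbps is pinned near the 2 Mbps policer; circuit may be hitting the loop bug.")
else:
    print(f"Throughput {mbps:.2f} Mbps looks normal for this probe.")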

Update: 1/23/2017 – 9:58am – We are continuing to actively work with our vendor's advanced Engineering and Management teams to resolve the remaining affected customers. Please report any outstanding SSE service-impacting issues to the NOC for us to investigate.

Update: 1/22/2017 – 3:30pm – We received additional reports this afternoon that this issue remains unresolved for a portion of our customers, and we have escalated with our vendor.

Update: 1/21/2017 – 8:30pm – We believe this issue has been located and fixed at this point. Please contact the NOC if you are experiencing any further trouble.

Update: 1/21/2017 – 5:40pm – We are continuing to escalate and work with our vendor on this issue.

Update: 1/21/2017 – 11:22am – We are still working a high-level escalation with our vendor. They are currently tracing aggressively through their network in an attempt to locate any issues or anomalies.

Update: 1/21/2017 – 6:46am – We are working directly with upper management and two different teams. We will update again as soon as they get more data to us.

Update: 10:38pm – Escalation has moved to a higher-level department, and our issue is currently being worked. We are awaiting the results of their testing. No ETR.

Update: 8:11pm – We are still working with our vendor and have escalated the issue again. There is currently no ETR.

Update: 4:45pm – Our issue is currently being worked by our vendor's Engineering staff; they suspect an issue in the core of their network. No ETR yet.

Update: 2:09pm – We are still working with our vendor and have assisted in narrowing down the issue. There is currently no ETR.

Update: 10:39am – We are working with our fiber vendor to isolate and resolve this issue. There is currently no ETR. As soon as we have one or know more, this post will be updated.

~ Network Operations

We are investigating reported packet loss and high latency spikes on our enterprise SSE fiber circuits.  We will update this once we know more.

~ Network Operations

Santa Cruz – Fusion/Flexlink Intrusive Maintenance

Maintenance was completed at 4:22am.

UPDATE(3:38pm) – This maintenance will also extend to a subset of customers in the Palo Alto area.

Tonight, 1/19/2017, starting at midnight, we will be replacing cards serving a subset of customers in the Santa Cruz area. Expected downtime is around 20-30 minutes while the cards upgrade themselves and reboot. The maintenance window is from 12am – 5am.

Network Engineering

Santa Rosa Data Center – Outage

Beginning at around 4:50PM tonight, we experienced a large DDoS attack directed at a colocation customer of ours. During this attack you may have noticed little or no connectivity to Sonic member services such as email or member tools, and an inability to phone in to our support department. The attack is still ongoing, but we believe services have been fully restored as of about 7:00PM.


Network Engineering

Mendocino, Fort Bragg, Palo Alto, and Mountain View – Fusion/Flexlink Intrusive Maintenance

Tonight, 1/6/2017, starting at midnight, we will be replacing cards serving a subset of customers in the Mendocino, Fort Bragg, Palo Alto, and Mountain View areas. Expected downtime is around 20-30 minutes while the cards upgrade themselves and reboot. The maintenance window is from 12am – 5am.

Update(9:13am): A small subset of customers in the Mountain View area are still down due to an issue with one card after the upgrade. We are working to restore service as quickly as possible.

Update(11:28am): Service for the last of the affected customers has been restored. Maintenance complete.

-Network Engineering

Santa Rosa Data Center – Outage

Around 8:04pm Sonic experienced an NTP attack against our data center, taking down connectivity to services hosted within. We have put measures in place to prevent the attack from continuing, and service stability has been restored.
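
NTP-based attacks of this kind are commonly amplification floods, in which spoofed requests cause NTP servers to swamp the victim with large responses from UDP port 123. Purely as an illustrative sketch (not the mitigation Sonic deployed), one way to spot that pattern is to watch the inbound packet rate from source port 123; the interface and threshold below are hypothetical examples.

# Illustrative sketch only: count inbound UDP packets sourced from port 123
# (NTP) over a sliding window and flag an unusually high rate, the signature
# of NTP amplification traffic. Interface and threshold are hypothetical.
import time
from collections import deque
from scapy.all import UDP, sniff

IFACE = "eth0"        # hypothetical edge interface
WINDOW_SECONDS = 10
ALERT_PPS = 5000      # packets per second considered suspicious

timestamps = deque()

def track_ntp(pkt):
    if UDP in pkt and pkt[UDP].sport == 123:
        now = time.monotonic()
        timestamps.append(now)
        # Drop samples that have fallen out of the sliding window.
        while timestamps and now - timestamps[0] > WINDOW_SECONDS:
            timestamps.popleft()
        pps = len(timestamps) / WINDOW_SECONDS
        if pps > ALERT_PPS:
            print(f"Possible NTP amplification flood: ~{pps:.0f} pps from UDP/123")

sniff(iface=IFACE, filter="udp and src port 123", prn=track_ntp, store=0)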

-Network Engineering