Author: grant.keller@sonic.com

DNS Problems – Resolved

We are seeing an increase in DNSSEC validation failures on our recursive DNS servers. The cause has been identified as a recently applied security patch that enforces a stricter validation policy on domains with DNSSEC enabled. We are currently looking for ways to mitigate the problem.

Update: The cause of these problems has been positively identified as a behavior change in a new version of ISC’s BIND, which was released two days ago in response to a group of CVEs covering potential security exploits. As always, we strive to deploy security fixes in our network as quickly as possible, and we deployed this new version to all of our recursive DNS cluster backend servers over the course of the day starting Thursday morning. Specifically, the problem is the removal of SIG(0) combined with a change in how “invalid” DNSSEC keys are handled: they are now treated as failures instead of being skipped. We’re currently stuck between a rock and a hard place: run a version with a known potential cache poisoning vulnerability, or run a version that breaks an unknown number of domains still relying on SIG(0). More updates are forthcoming; we hope to choose a path forward soon that mitigates customer impact.

Update: We are in the process of rolling back the affected version across our name server clusters. Our assessment is that the additional complexity we believe one of these potential cache poisoning attacks would need to succeed in our network justifies rolling back to the previous version, rather than taking the other option of disabling DNSSEC entirely until the issues with the new version are resolved. For additional clarity, this was originally brought to our attention by students and staff at usfca.edu who found they were unable to resolve usfca.edu domains this morning. We are not sure how many other domains are affected, or whether the issue can rightly be blamed on the upstream DNS server administrators. The rollback should be completed shortly, and we’re sorry for any confusion or trouble this may have caused you today. -Kelsey, Kevan and William

Update 2025-10-31: As it turns out, this problem was a combination of several issues and was actually related to zones that contained a deprecated DNSSEC key type (RSASHA1), even if they also had a valid key. This was further complicated by RHEL’s security policy framework, which triggered the new undesired behavior in BIND. We are investigating several workarounds, but we also expect that BIND will release an update that corrects this behavior relatively soon.
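
For anyone who wants to check whether their own zone falls into the affected category, here is a rough sketch of the condition described above: a zone whose DNSKEY set mixes a deprecated algorithm with a currently supported one. The algorithm numbers come from the IANA DNSSEC registry. The function name, the category labels, and the choice to also count algorithm 7 (RSASHA1-NSEC3-SHA1) as deprecated are our own assumptions for illustration; this is not how BIND or the RHEL policy framework makes the decision internally.

```python
# Subset of IANA DNSSEC algorithm numbers.
# Assumption: algorithm 7 is grouped with RSASHA1 because it is also SHA-1 based;
# the incident notes above only name RSASHA1 (algorithm 5).
DEPRECATED = {5: "RSASHA1", 7: "RSASHA1-NSEC3-SHA1"}
SUPPORTED = {
    8: "RSASHA256",
    10: "RSASHA512",
    13: "ECDSAP256SHA256",
    14: "ECDSAP384SHA384",
    15: "ED25519",
    16: "ED448",
}


def classify_zone(dnskey_algorithms):
    """Classify a zone by the algorithm numbers found in its DNSKEY records.

    Hypothetical helper: takes an iterable of integers and returns one of
    "mixed", "deprecated-only", "supported-only", or "unknown".
    """
    algs = set(dnskey_algorithms)
    has_deprecated = bool(algs & set(DEPRECATED))
    has_supported = bool(algs & set(SUPPORTED))
    if has_deprecated and has_supported:
        # The case described above: a valid key sits next to a deprecated one.
        return "mixed"
    if has_deprecated:
        return "deprecated-only"
    if has_supported:
        return "supported-only"
    return "unknown"


if __name__ == "__main__":
    # Example: a zone publishing both an RSASHA1 key and an ECDSAP256SHA256 key.
    print(classify_zone([5, 13]))
```

The algorithm number is the third field of a DNSKEY record's data (after the flags and protocol fields), so the input list can be collected by hand from a DNSKEY lookup of the zone.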

-SOC

Intermittent IMAP login issue.

Routine maintenance this evening caused unexpected load on some of our IMAP/POP mail servers that lasted from 11:15pm to 12:12am. During this time some users may have experienced intermittent problems logging in. The situation is believed to be resolved; our operations team will be reviewing the incident to reduce the impact of the same maintenance in the future.

-SOC

System Maintenance

UPDATE: Maintenance complete

Tonight starting at 10pm we will be performing system updates to some customer-facing systems, including:

  • IMAP/POP3
  • Webmail
  • Membertools

We don’t expect any noticeable interruptions to those services.

-SOC

Membertools outage

UPDATE: Membertools is back up and running. We are currently reviewing the incident.

Membertool.sonic.net is currently experiencing an outage that started around 6:25pm. Our Operations team is working on getting the service up and running.

-SOC