We’ve been working on several improvements to our recursive DNS cluster configs to improve performance across the board and to better support network growth in new regions beyond our existing service footprint in Northern and Southern California. Over the past week we have rolled out several config changes to the DNS proxies that handle ns1 and ns2.sonic.net. What we believed to be the last of those changes was pushed out to the entire fleet this afternoon at 3:15PM, after having cooked properly on a few systems. After that change was pushed, a significant portion of IPv6 DNS requests appeared to be black-holed by some of the servers. The issues continued until about 3:46PM. We are still unclear on the root cause, but all services are currently stable and running as expected. We will continue to investigate in the hope that we can identify the cause; it seems possible that it is a bug in the DNS-specific load balancing software itself.
It is worth noting that we expected most clients to have both v6 and v4 servers configured, but it is evident that this is not the case: it is likely that the majority of v6-enabled clients on our network have no failover to v4 requests. If you have statically configured name servers, we’d suggest you list both the v6 and v4 addresses below.
2001:5a8::11
2001:5a8::33
50.0.1.1
50.0.2.2
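
If you want to confirm that each of these addresses answers from your own connection, something like the following may help. It is only a rough sketch in Python using the standard library. The helper names (build_query, resolver_answers, check_all), the test name sonic.net, and the 2 second timeout are choices made for illustration and are not part of any Sonic tooling.

```python
import socket
import struct

# Resolver addresses copied from the list above.
RESOLVERS = ["2001:5a8::11", "2001:5a8::33", "50.0.1.1", "50.0.2.2"]


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 1 = A record)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, and a zero byte ends the name.
    labels = [label.encode("ascii") for label in name.rstrip(".").split(".")]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE followed by QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def resolver_answers(server, name="sonic.net", timeout=2.0):
    """Return True if the server sends back a UDP reply with a matching ID."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    query = build_query(name)
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
    except OSError:
        # Covers timeouts as well as "network unreachable" (e.g. no IPv6 route).
        return False
    return reply[:2] == query[:2]


def check_all():
    """Print one status line per resolver."""
    for server in RESOLVERS:
        status = "ok" if resolver_answers(server) else "NO REPLY"
        print(f"{server:<16} {status}")
```

Calling check_all() prints one line per address. If the two v6 addresses show NO REPLY while the v4 ones answer, a v6-only client configuration would have no working resolver to fall back on.
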
-Kelsey, William and the rest of Systems.
I had a service interruption at about that time, but I don’t think I use IPv6. I couldn’t connect to NYTimes.com, user.well.com, or to a crossword puzzle page. It resolved after I restarted my router.
Location: SF – 16th Ave (3 blocks from Taraval)
On-Prem products: Fusion Gigabit Fiber, SR-515AC
Issue: ALL recursive DNS queries fail on the Sonic/ADTRAN SR-515AC; the issue also occurs with locally configured “Static DNS” entries on the same unit.
Anyone else noticing this? Looks like I should file a Tix but since issue is open thought I would try here first and decide what to do based upon response….
It looks like the current firmware, v2.6.2.8 (at least what is pushed by Sonic), for the SR-515AC that Sonic provided has at some point stopped handling recursive DNS requests (i.e. anything upstream of my local domain). Given that I also have ns1.sonic.net set in the DHCP server, there has been no LOSS of functionality. HOWEVER it DOES explain the decreased overall performance (latency), as now there is a 50% chance that EVERY DNS lookup that is not cached by each host has to first fail recursion at the SR-515 before the host moves on to query ns1.sonic.net, thus more than doubling lookup times (given, and assuming, that the ADTRAN box’s DNS server would have had the A record locally SOME percent of the time and should have responded to the query locally rather than having to recursively submit upstream).
In addition and possibly related, the embedded DNS server also seems to have an issue with statically defined (local) hosts. This WAS working previously; unfortunately I don’t have a specific timeframe for this regression.
Power cycling the unit does not fix the issue.
I can of course change the SR-515 DHCP server to hand out ns2.sonic.net (in addition to ns1.sonic.net) and delete the SR-515’s own address, but that won’t address the failure of locally defined hosts. Alternatively I can do what I should have done ages ago, which is get rid of the charge for this ancient box, stand up my own pfSense box, and be done with the whole issue of key network devices on my network that are not under my full control. Fortunately it does help that I was a Network Engineer for nearly 20 yrs before moving on to Product Mgt.
Laptop:~ weyer_l$ nslookup Printer
;; Got recursion not available from 192.168.42.1, trying next server
Server: 50.0.1.1
Address: 50.0.1.1#53
** server can’t find Printer: NXDOMAIN
Laptop:~ weyer_l$ nslookup Printer.
;; Got recursion not available from 192.168.42.1, trying next server
Server: 50.0.1.1
Address: 50.0.1.1#53
** server can’t find Printer: NXDOMAIN
Laptop:~ weyer_l$ nslookup Printer.local
;; Got recursion not available from 192.168.42.1, trying next server
Server: 50.0.1.1
Address: 50.0.1.1#53
** server can’t find Printer: NXDOMAIN
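
For anyone who wants to check the same thing without nslookup: as far as I know, the "recursion not available" message generally corresponds to a reply whose RA (recursion available) bit is cleared in the DNS header. Below is a minimal Python sketch that reads that bit and the response code from a raw reply. The function names are mine, and the sample headers in the usage note are hand-built for illustration, not captured from the SR-515.

```python
import struct


def _flags(reply: bytes) -> int:
    """Return the 16-bit flags field from a raw DNS message."""
    if len(reply) < 12:
        raise ValueError("a DNS message header is 12 bytes")
    # The header starts with the message ID, then the flags field.
    _msg_id, flags = struct.unpack("!HH", reply[:4])
    return flags


def recursion_available(reply: bytes) -> bool:
    """Return True if the RA bit (0x0080) is set in the reply."""
    return bool(_flags(reply) & 0x0080)


def response_code(reply: bytes) -> int:
    """Return the RCODE (low 4 bits): 0 = NOERROR, 3 = NXDOMAIN, 5 = REFUSED."""
    return _flags(reply) & 0x000F
```

With a hand-built header such as struct.pack("!HHHHHH", 1, 0x8105, 1, 0, 0, 0), recursion_available() returns False and response_code() returns 5. With flags 0x8180 it returns True and 0. Feeding it the bytes received from 192.168.42.1 on UDP port 53 would show whether the router is really declining recursion.
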