[dns-operations] Observing DNS Propagation Inconsistencies Across Global Resolvers

Mon Mar 9 23:41:16 UTC 2026

Hi Vahid -
   Thanks for the tool; always good to see more people building things 
for measurement.  I'm with Quad9.  A few notes/questions/clarifications:

1) Quad9's "primary" should probably be 9.9.9.9 & 149.112.112.112 & 
2620:fe::fe - you have us as 149.112.112.13, which is not even listed in 
our production set of resolvers.  See 
https://quad9.net/service/service-addresses-and-features/ for details.

2) Are you using IPv6 to connect to resolvers in your tests?  That would 
probably give faster results in some cases.

3) Given the latency figures at the bottom of the page, I'm guessing 
you're testing public resolvers from within a very specific testpoint. 
If you're only testing to public resolvers from one test point, the data 
is going to be very skewed and may give people quite non-intuitive 
(false) impressions. For instance: I'm fairly confident that Quad9 and 
Cloudflare are not >21ms away for the vast majority of the world's 
population, yet that is what the page seems to show across my repeated 
tests.

4) The question mark icon beside each testing point does not work.  What 
was it supposed to show?

5) Are you performing DNSSEC validation on results? If DNSSEC fails, is 
there a special error type shown?  Hopefully "yes".

6) Your test seems to be focusing on recursive resolvers, but that is 
only half the problem set when you are trying to determine why a zone is 
not synced.  The first half of the problem is not covered by your 
tool, and that is "does the authoritative anycast array have the correct 
answers to give?"  If you're just measuring what the recursive resolver 
says, you can't actually identify where the problem is.  The only way to 
identify the actual problem (or at least get a closer view, which will 
still be imperfect) is to query both the recursive resolver AND the 
authoritative servers directly from the same location. A significant 
problem with zone synchronization is propagation between anycast 
instances on the authoritative side of the equation. This will be more 
visible to you with more testing across zones that have anycast 
authoritative servers.
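
A rough sketch of the comparison described in this point, in Python. 
The function name, the result labels, and the sample addresses are made 
up for illustration, and the DNS queries themselves are left out so 
that only the decision logic is shown. It assumes a static answer whose 
intended value is known to you:

```python
def locate_fault(expected, authoritative, recursive):
    """Guess where a stale answer comes from, as seen from ONE vantage point.

    expected      -- set of rdata strings the zone owner just published
    authoritative -- dict: authoritative server/instance name -> set of rdata
    recursive     -- set of rdata the recursive resolver returned
    """
    expected = set(expected)
    # Any authoritative instance that disagrees with the published data
    # means the problem starts before the recursive resolver is involved.
    lagging = sorted(name for name, answer in authoritative.items()
                     if set(answer) != expected)
    if lagging:
        return "authoritative-lag", lagging
    # Authoritatives agree, so a mismatch here points at the resolver cache.
    if set(recursive) != expected:
        return "recursive-cache", []
    return "in-sync", []


new = {"192.0.2.10"}   # documentation addresses, purely hypothetical
old = {"192.0.2.1"}
print(locate_fault(new, {"ns1": new, "ns2": old}, old))
# ('authoritative-lag', ['ns2'])
```

Both sets of answers would need to be collected from the same location 
for the result to mean anything, and the view will still be imperfect.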

7) I would expect that clicking on a city name would bring up the list 
for ALL recursive resolvers, and their answers from that perspective.  
Quad9, Google, Cloudflare, OpenDNS, and many others have enough anycast 
nodes that you would need to perform a many-to-many test for all queries 
in order to determine if results are lagging.  Right now, a single test 
in each geography is not very meaningful. The point of having servers in 
remote geographies is to test both the authoritative AND recursive 
behaviors of large arrays (though of course you should still test more 
local resolver caches as well.)

8) I hope you have some math in your system that accounts for TTL when 
triggering warnings for "out of sync".

9) (probably the most important question) How do you even validate 
correctness of answers?  CDNs may have 1000 possible different answers 
on a specific query, and which one you get varies depending on 
geography/time/cost/phase of moon/routing/etc.  You may only see an A 
record once across your entire fleet of queries.  How do you know this 
particular IP address is correct for that query, at that time, from that 
geography?  In other words: what is the baseline against which you are 
testing "correctness"?  You can't compare resolvers to each other, and 
unless you know the ground truth state of the zone from the 
authoritative's perspective (which may be programmatically generated) 
then you can't determine whether a particular answer is valid.  So I'm 
not sure what the tool provides except in the simplest cases 
where there is a non-dynamic answer for a qname/qtype. This may still be 
enough for some people to find useful, but I suspect the people most 
concerned with zone update latency may not be using static zone data.
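
To make the baseline question concrete, here is a minimal Python 
sketch. The function name and labels are hypothetical, and it assumes 
you can somehow enumerate every answer the authoritative side may 
legitimately give, which is exactly what is often not possible for 
dynamic zones:

```python
def judge(observed, ground_truth):
    """Classify one resolver's answer against a known baseline.

    observed     -- set of rdata strings returned by a resolver
    ground_truth -- set of ALL answers the authoritative side may give,
                    or None when that set is unknown (dynamic/CDN zones)
    """
    if ground_truth is None:
        # Without a baseline there is nothing to compare against;
        # agreement between resolvers proves nothing either way.
        return "indeterminate"
    if not observed:
        return "no-answer"
    # A resolver may legitimately return only part of a larger pool.
    return "valid" if set(observed) <= set(ground_truth) else "invalid"


pool = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}   # hypothetical pool
print(judge({"192.0.2.2"}, pool))    # valid
print(judge({"192.0.2.2"}, None))    # indeterminate
```

Anything that comes back as "indeterminate" is a case where the tool 
cannot honestly say whether the zone is in sync.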

To answer or clarify some background on your questions:

1) "serve-stale" is a behavior some recursive resolvers utilize (Quad9 
does.)  If authoritative servers are unreachable, the stale record may 
be served in lieu of failing. This behavior should be coupled with other 
indicators that something is wrong with the authoritative server, though 
those problems may not be visible from your testing vantage point.  I 
don't know if that is the root cause of what you describe, but it 
exists.

2) Can you provide some examples of this DS behavior?

3) Again, some examples of CNAME flattening would be useful.  Who does 
this?  Do we?  Is the answer correct in the A/AAAA result at the end, or 
is this "breaking" in some way that prevents expected client connection 
behavior?

4) I'd say that the expectation for "fully propagated" would be that 
records are updated within ${last-ttl-value}+2s of the change in normal 
conditions. Any resolver may have fetched and cached the old record just 
before the change, so you have to assume that the full TTL needs to 
expire before you can trust that an answer has been refreshed. I put the 
caveat of "in normal conditions" due to serve-stale behavior.  You may 
also not know "last-ttl-value" unless you're testing prior to change 
events and are certain that the TTL is fixed and not generated 
per-query, so that is another edge case that may catch you.
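
As a sketch of that arithmetic in Python (the function names and the 
sample timestamps are hypothetical; the 2 second slack is the figure 
from the paragraph above, and serve-stale is deliberately not modelled):

```python
from datetime import datetime, timedelta, timezone

SLACK = timedelta(seconds=2)


def trust_after(change_time, last_ttl_seconds, slack=SLACK):
    """Earliest moment an old answer can no longer be explained by caching.

    A resolver may have cached the old record just before the change,
    so it can legitimately hold it for one full (old) TTL.
    """
    return change_time + timedelta(seconds=last_ttl_seconds) + slack


def should_warn(now, change_time, last_ttl_seconds, answer_is_old):
    """Flag "out of sync" only once the old TTL (plus slack) has expired."""
    return bool(answer_is_old and now > trust_after(change_time, last_ttl_seconds))


change = datetime(2026, 3, 9, 12, 0, 0, tzinfo=timezone.utc)
print(trust_after(change, 300))
# 2026-03-09 12:05:02+00:00
print(should_warn(change + timedelta(seconds=200), change, 300, True))
# False
```

This only holds if last_ttl_seconds is really the TTL that was being 
served before the change; a per-query generated TTL breaks the 
assumption.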

[note: I wrote this when I had a few unexpected minutes of time; replies 
will be very very slow if at all]

JT

On 2 Mar 2026, at 9:12, Vahid Shaik wrote:

> Hello,
>
> I have been building and maintaining an open DNS propagation 
> monitoring platform at DNS Robot 
> (https://dnsrobot.net) that queries 30+ 
> globally distributed DNS resolvers simultaneously to track record 
> propagation in real time.
>
> During development and ongoing monitoring, I have observed some 
> interesting inconsistencies in how different resolver implementations 
> handle TTL expiry and cache refresh behavior. Specifically:
>
> 1. Some major public resolvers (particularly in Asia-Pacific regions) 
> appear to serve stale records well beyond the configured TTL, 
> sometimes by 2-3x the expected duration.
>
> 2. DNSSEC-signed zones occasionally show inconsistent DS record 
> propagation between parent and child zones during key rollovers, 
> visible when checking multiple resolvers within a short window.
>
> 3. CNAME flattening behavior varies significantly — some resolvers 
> return the flattened A record while others return the CNAME chain, 
> which can cause confusion when debugging propagation.
>
> These observations come from real-time queries across resolvers in 20+ 
> countries. I am curious whether others on this list have documented 
> similar patterns, or if there are known resolver-specific behaviours 
> that explain these discrepancies.
>
> I would also welcome any feedback on best practices for measuring 
> propagation completeness — specifically, what threshold of global 
> resolver agreement should be considered "fully propagated" for 
> operational purposes.
>
> Best regards,
>
> Shaik Vahid
>
> DNS Robot - https://dnsrobot.net
>
> Free DNS Propagation Checker & Network Tools