[dns-operations] Observing DNS Propagation Inconsistencies Across Global Resolvers
John Todd
jtodd at loligo.com
Mon Mar 9 23:41:16 UTC 2026
Hi Vahid -
Thanks for the tool; always good to see more people building things
for measurement. I'm with Quad9. A few notes/questions/clarifications:
1) Quad9's "primary" should probably be 9.9.9.9 & 149.112.112.112 &
2620:fe::fe - you have us as 149.112.112.13, which is not even listed in
our production set of resolvers. See
https://quad9.net/service/service-addresses-and-features/ for details.
2) Are you using IPv6 to connect to resolvers in your tests? That would
probably give faster results in some cases.
3) Given the latency figures at the bottom of the page, I'm guessing
you're testing public resolvers from within a very specific testpoint.
If you're only testing to public resolvers from one test point, the data
is going to be very skewed and may give people quite non-intuitive
(false) impressions. For instance: I'm fairly confident that Quad9 and
Cloudflare are not >21ms for the vast majority of the world's
population, which is what the page seemed to show across my repeated tests.
4) The question mark icon beside each testing point does not work. What
was it supposed to show?
5) Are you performing DNSSEC validation on results? If DNSSEC fails, is
there a special error type shown? Hopefully "yes".
6) Your test seems to be focusing on recursive resolvers, but that is
only half the problem set when you are trying to determine why a zone is
not sync'ed. The first half of the problem is un-described by your
tool, and that is "does the authoritative anycast array have the correct
answers to give?" If you're just measuring what the recursive resolver
says, you can't actually identify where the problem is. The only way to
identify the actual problem (or at least a closer view, which will still
be imperfect) is to have both the recursive resolver AND a direct query
to the authoritative servers from the same location. A significant
problem with zone synchronization is propagation between anycast
instances on the authoritative side of the equation. This will be more
visible to you with more testing across zones that have anycast
authoritative servers.
7) I would expect that clicking on a city name would bring up the list
for ALL recursive resolvers, and their answers from that perspective.
Quad9, Google, Cloudflare, OpenDNS, and many others have enough anycast
nodes that you would need to perform a many-to-many test for all queries
in order to determine if results are lagging. Right now, a single test
in each geography is not very meaningful. The point of having servers in
remote geographies is to test both the authoritative AND recursive
behaviors of large arrays (though of course you should still test more
local resolver caches as well.)
8) I hope you have some math in your system that accounts for TTL when
triggering warnings for "out of sync".
9) (probably the most important question) How do you even validate
correctness of answers? CDNs may have 1000 possible different answers
on a specific query, and which one you get varies depending on
geography/time/cost/phase of moon/routing/etc. You may only see an A
record once across your entire fleet of queries. How do you know this
particular IP address is correct for that query, at that time, from that
geography? In other words: what is the baseline against which you are
testing "correctness"? You can't compare resolvers to each other, and
unless you know the ground truth state of the zone from the
authoritative's perspective (which may be programmatically generated)
then you can't derive that a particular answer is valid or not. So I'm
not sure what the tool provides except in the most simplistic cases
where there is a non-dynamic answer for a qname/qtype. This may still be
enough for some people to find useful but I suspect the people most
concerned with zone update latency may not be using static zone data.
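To make points 6 and 9 concrete, here is a minimal sketch of how answers
taken from a single vantage point might be classified. It assumes a
static rrset whose expected value is already known (the "ground truth"),
which is exactly the assumption that breaks for CDN-style dynamic
answers. The function name, arguments and return strings are
hypothetical, and the queries themselves are left out; only the
comparison logic is shown.

```python
from typing import Dict, Set


def locate_mismatch(expected: Set[str],
                    authoritative: Dict[str, Set[str]],
                    recursive: Set[str]) -> str:
    """Classify a mismatch using answers gathered from ONE vantage point.

    expected      -- rrset the zone owner believes is published
                     (assumed static and known in advance)
    authoritative -- answers from direct queries, keyed by nameserver name
    recursive     -- answer returned by the recursive resolver under test
    """
    # Check the authoritative side first: if any instance reached from
    # this location has old data, the recursive resolver cannot be blamed.
    lagging = sorted(ns for ns, rrset in authoritative.items()
                     if rrset != expected)
    if lagging:
        return "authoritative out of sync: " + ", ".join(lagging)
    # Authoritative servers agree with the expected data, so any
    # difference sits in the recursive layer (cache not yet expired,
    # or serve-stale in effect).
    if recursive != expected:
        return "recursive differs from authoritative"
    return "consistent"
```

Without the direct authoritative queries, the first branch can never be
taken, and every mismatch looks like a recursive resolver problem.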
To answer or clarify some background on your questions:
1) "serve-stale" is a behavior some recursive resolvers utilize (Quad9
does.) If authoritative servers are unreachable, the stale record may
be served in lieu of failing. This behavior should be coupled with other
indicators that something is wrong with the authoritative server, though
those problems may not be visible from your testing vantage point. I
don't know if that is the root cause of what you describe, but it
exists.
2) Can you provide some examples of this DS behavior?
3) Again, some examples of CNAME flattening would be useful. Who does
this? Do we? Is the answer correct in the A/AAAA result at the end, or
is this "breaking" in some way that prevents expected client connection
behavior?
4) I'd say that expectations of "fully propagated" would be that any
change made before ${now}-(${last-ttl-value}+2s) is visible everywhere
in normal conditions.
Each resolver may get a query that holds the record for the TTL just
before you ask, so you have to assume that the TTL needs to expire
before you can trust that an answer has been refreshed. I put the caveat
of "in normal conditions" due to serve-stale behavior. You also may not
know "last-ttl-value" unless you're testing prior to change events and
are certain that the TTL is fixed and not generated per-query, so that
is another edge case that may catch you.
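The TTL accounting in point 4 could be sketched roughly as follows. This
is an illustration only: the function names are hypothetical, it assumes
the change time and the pre-change TTL are both known, and it ignores
serve-stale, which can legitimately push an old answer past this window
when the authoritative servers are unreachable.

```python
from datetime import datetime, timedelta, timezone

# The "+2s" margin from point 4; treat the exact value as an assumption.
SLACK = timedelta(seconds=2)


def earliest_trustworthy(change_time: datetime,
                         last_ttl_seconds: int) -> datetime:
    """First moment at which an old answer can no longer be explained by
    a cache entry filled just before the change was published."""
    return change_time + timedelta(seconds=last_ttl_seconds) + SLACK


def is_out_of_sync(now: datetime,
                   change_time: datetime,
                   last_ttl_seconds: int,
                   answer_is_new: bool) -> bool:
    """Flag a resolver only when it still returns the old answer after
    the pre-change TTL (plus slack) has fully elapsed."""
    if answer_is_new:
        return False
    return now > earliest_trustworthy(change_time, last_ttl_seconds)


if __name__ == "__main__":
    changed = datetime(2026, 3, 9, 12, 0, 0, tzinfo=timezone.utc)
    # Old answer seen four minutes after a change with a 300s prior TTL:
    # still inside the window, so no warning should be raised.
    print(is_out_of_sync(changed + timedelta(minutes=4), changed, 300, False))
```

Any warning raised before that point is just reporting a cache doing
what the TTL told it to do.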
[note: I wrote this when I had a few unexpected minutes of time; replies
will be very very slow if at all]
JT
On 2 Mar 2026, at 9:12, Vahid Shaik wrote:
> Hello,
>
> I have been building and maintaining an open DNS propagation
> monitoring platform at DNS Robot
> (https://dnsrobot.net) that queries 30+
> globally distributed DNS resolvers simultaneously to track record
> propagation in real time.
>
> During development and ongoing monitoring, I have observed some
> interesting inconsistencies in how different resolver implementations
> handle TTL expiry and cache refresh behavior. Specifically:
>
> 1. Some major public resolvers (particularly in Asia-Pacific regions)
> appear to serve stale records well beyond the configured TTL,
> sometimes by 2-3x the expected duration.
>
> 2. DNSSEC-signed zones occasionally show inconsistent DS record
> propagation between parent and child zones during key rollovers,
> visible when checking multiple resolvers within a short window.
>
> 3. CNAME flattening behavior varies significantly — some resolvers
> return the flattened A record while others return the CNAME chain,
> which can cause confusion when debugging propagation.
>
> These observations come from real-time queries across resolvers in 20+
> countries. I am curious whether others on this list have documented
> similar patterns, or if there are known resolver-specific behaviours
> that explain these discrepancies.
>
> I would also welcome any feedback on best practices for measuring
> propagation completeness — specifically, what threshold of global
> resolver agreement should be considered "fully propagated" for
> operational purposes.
>
> Best regards,
>
> Shaik Vahid
>
> DNS Robot — https://dnsrobot.net
>
> Free DNS Propagation Checker & Network Tools
> _______________________________________________
> dns-operations mailing list
> dns-operations at lists.dns-oarc.net
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations