[dns-operations] Observing DNS Propagation Inconsistencies Across Global Resolvers
John Todd
jtodd at loligo.com
Tue Mar 10 16:52:15 UTC 2026
That's disappointing. Thanks for the heads-up.
That's the kind of thing that causes significant dis-engagement
tendencies, speaking for myself - I just wasted 10m on talking to a bot
that looked like a well-meaning but inexperienced newcomer to the
community. I have no immediate ideas on how to prevent it, though.
JT
On 9 Mar 2026, at 21:23, Ondřej Surý wrote:
> John, this is just a SEO spammer replying to emails with AI slop
> (copy&paste), see the similar exchange here:
> https://mailman.ripe.net/archives/list/dns-wg@ripe.net/message/E4SENMXPYAF77X4WY3QOR3SRFWPIQGZB/
> --
> Ondřej Surý (He/Him)
>
> A gentle nudge is always appreciated if I take a little longer to
> reply.
>
>> On 10. 3. 2026, at 1:01, John Todd <jtodd at loligo.com> wrote:
>
>>
>>
>> Hi Vahid -
>> Thanks for the tool; always good to see more people building things
>> for measurement. I'm with Quad9. A few
>> notes/questions/clarifications:
>>
>> Quad9's "primary" should probably be 9.9.9.9 & 149.112.112.112 &
>> 2620:fe::fe - you have us as 149.112.112.13, which is not even listed
>> in our production set of resolvers. See
>> https://quad9.net/service/service-addresses-and-features/ for
>> details.
>>
>> Are you using IPv6 to connect to resolvers in your tests? That would
>> probably give faster results in some cases.
>>
>> Given the latency figures at the bottom of the page, I'm guessing
>> you're testing public resolvers from within a very specific
>> testpoint. If you're only testing to public resolvers from one test
>> point, the data is going to be very skewed and may give people quite
>> non-intuitive (false) impressions. For instance: I'm fairly confident
>> that Quad9 and Cloudflare are not >21ms away for the vast majority of
>> the world's population, yet that is what the page showed over my
>> repeated tests.
>>
>> The question mark icon beside each testing point does not work. What
>> was it supposed to show?
>>
>> Are you performing DNSSEC validation on results? If DNSSEC fails, is
>> there a special error type shown? Hopefully "yes".
>>
>> Your test seems to be focusing on recursive resolvers, but that is
>> only half the problem set when you are trying to determine why a zone
>> is not sync'ed. The first half of the problem is un-described by your
>> tool, and that is "does the authoritative anycast array have the
>> correct answers to give?" If you're just measuring what the recursive
>> resolver says, you can't actually identify where the problem is. The
>> only way to identify the actual problem (or at least a closer view,
>> which will still be imperfect) is to have both the recursive resolver
>> AND a direct query to the authoritative servers from the same
>> location. A significant problem with zone synchronization is
>> propagation between anycast instances on the authoritative side of
>> the equation. This will be more visible to you with more testing
>> across zones that have anycast authoritative servers.
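
The comparison described above (direct authoritative queries plus a recursive query from the same location) can be sketched roughly as follows. This is a minimal illustration, not part of any existing tool: the function name `locate_discrepancy`, the server names, and the addresses are all hypothetical. It also assumes the zone operator knows the intended static answer set, which does not hold for dynamically generated answers.

```python
from typing import AbstractSet, Mapping


def locate_discrepancy(
    expected: AbstractSet[str],
    authoritative: Mapping[str, AbstractSet[str]],
    recursive: AbstractSet[str],
) -> str:
    """Guess which side of the resolution path disagrees with the intended data.

    expected      -- rdata the operator intends to publish (assumed known and static)
    authoritative -- answers from direct queries, keyed by authoritative server name
    recursive     -- answer from the recursive resolver, same vantage point
    """
    # Check the authoritative side first: if any instance still serves old
    # data, the recursive resolver cannot be blamed for repeating it.
    lagging = sorted(
        name for name, answer in authoritative.items() if set(answer) != set(expected)
    )
    if lagging:
        return "authoritative: " + ", ".join(lagging)
    # All authoritative servers agree with the intended data, so a mismatch
    # now points at the recursive side (possibly just a cache within TTL).
    if set(recursive) != set(expected):
        return "recursive"
    return "consistent"


if __name__ == "__main__":
    # Hypothetical observations from one vantage point.
    print(
        locate_discrepancy(
            expected={"192.0.2.10"},
            authoritative={
                "ns1.example.": {"192.0.2.10"},
                "ns2.example.": {"192.0.2.1"},
            },
            recursive={"192.0.2.1"},
        )
    )
```

A "recursive" result here is not proof of a fault; it only narrows where to look, and it says nothing about anycast instances that this vantage point does not reach.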
>>
>> I would expect that clicking on a city name would bring up the list
>> for ALL recursive resolvers, and their answers from that perspective.
>> Quad9, Google, Cloudflare, OpenDNS, and many others have enough
>> anycast nodes that you would need to perform a many-to-many test for
>> all queries in order to determine if results are lagging. Right now,
>> a single test in each geography is not very meaningful. The point of
>> having servers in remote geographies is to test both the
>> authoritative AND recursive behaviors of large arrays (though of
>> course you should still test more local resolver caches as well.)
>>
>> I hope you have some math in your system that accounts for TTL when
>> triggering warnings for "out of sync".
>>
>> (probably the most important question) How do you even validate
>> correctness of answers? CDNs may have 1000 possible different answers
>> on a specific query, and which one you get varies depending on
>> geography/time/cost/phase of moon/routing/etc. You may only see an A
>> record once across your entire fleet of queries. How do you know this
>> particular IP address is correct for that query, at that time, from
>> that geography? In other words: what is the baseline against which
>> you are testing "correctness"? You can't compare resolvers to each
>> other, and unless you know the ground truth state of the zone from
>> the authoritative's perspective (which may be programmatically
>> generated) then you can't derive that a particular answer is valid or
>> not. So I'm not sure what the tool provides except in the most
>> simplistic cases where there is a non-dynamic answer for a
>> qname/qtype. This may still be enough for some people to find useful
>> but I suspect the people most concerned with zone update latency may
>> not be using static zone data.
>>
>> To answer or clarify some background on your questions:
>>
>> "serve-stale" is a behavior some recursive resolvers utilize (Quad9
>> does.) If authoritative servers are unreachable, the stale record may
>> be served in lieu of failing. This behavior should be coupled with
>> other indicators that something is wrong with the authoritative
>> server, though those problems may not be visible from your testing
>> vantage point. I don't know if that is the root cause of what you
>> describe, but it exists.
>>
>> Can you provide some examples of this DS behavior?
>>
>> Again, some examples of CNAME flattening would be useful. Who does
>> this? Do we? Is the answer correct in the A/AAAA result at the end,
>> or is this "breaking" in some way that prevents expected client
>> connection behavior?
>>
>> I'd say that expectations of "fully propagated" would be having
>> records updated within ${now}-${last-ttl-value+2s} in normal
>> conditions. Each resolver may get a query that holds the record for
>> the TTL just before you ask, so you have to assume that the TTL needs
>> to expire before you can trust that an answer has been refreshed. I
>> put the caveat of "in normal conditions" due to serve-stale behavior.
>> You also may not know "last-ttl-value" unless you're testing prior
>> to change events and are certain that the TTL is fixed and not
>> generated per-query, so that is another edge case that may catch you.
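
One possible reading of the TTL rule above, sketched in code: an old answer is unremarkable until the change time plus the previous TTL plus a 2 second margin has passed. The names `propagation_deadline` and `classify_answer` are hypothetical, and the sketch assumes the change time and the previous TTL are actually known and that the TTL is fixed rather than generated per query.

```python
from datetime import datetime, timedelta, timezone

# The 2 second allowance comes from the rule of thumb in the message above.
MARGIN = timedelta(seconds=2)


def propagation_deadline(change_time: datetime, last_ttl_seconds: int) -> datetime:
    """Earliest moment after which every cache should have dropped the old record."""
    return change_time + timedelta(seconds=last_ttl_seconds) + MARGIN


def classify_answer(
    matches_new_data: bool,
    query_time: datetime,
    change_time: datetime,
    last_ttl_seconds: int,
) -> str:
    """Label one resolver observation relative to the TTL window."""
    if matches_new_data:
        return "updated"
    # A resolver may have cached the old record just before the change,
    # so an old answer inside the window is expected behaviour.
    if query_time <= propagation_deadline(change_time, last_ttl_seconds):
        return "within-ttl"
    # Past the window: could be serve-stale or a real problem; this
    # check alone cannot tell the two apart.
    return "overdue"


if __name__ == "__main__":
    # Hypothetical change event and previous TTL.
    changed = datetime(2026, 3, 2, 9, 0, 0, tzinfo=timezone.utc)
    for offset in (60, 302, 303):
        seen = changed + timedelta(seconds=offset)
        print(offset, classify_answer(False, seen, changed, 300))
```

An "overdue" label is only a prompt to look further, since serve-stale can legitimately extend the life of an old record when the authoritative servers are unreachable.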
>>
>> [note: I wrote this when I had a few unexpected minutes of time;
>> replies will be very very slow if at all]
>>
>> JT
>>
>> On 2 Mar 2026, at 9:12, Vahid Shaik wrote:
>>
>>> Hello,
>>>
>>> I have been building and maintaining an open DNS propagation
>>> monitoring platform at DNS Robot
>>> (https://dnsrobot.net/) that queries 30+
>>> globally distributed DNS resolvers simultaneously to track record
>>> propagation in real time.
>>>
>>> During development and ongoing monitoring, I have observed some
>>> interesting inconsistencies in how different resolver
>>> implementations handle TTL expiry and cache refresh behavior.
>>> Specifically:
>>>
>>> Some major public resolvers (particularly in Asia-Pacific regions)
>>> appear to serve stale records well beyond the configured TTL,
>>> sometimes by 2-3x the expected duration.
>>>
>>> DNSSEC-signed zones occasionally show inconsistent DS record
>>> propagation between parent and child zones during key rollovers,
>>> visible when checking multiple resolvers within a short window.
>>>
>>> CNAME flattening behavior varies significantly — some resolvers
>>> return the flattened A record while others return the CNAME chain,
>>> which can cause confusion when debugging propagation.
>>>
>>> These observations come from real-time queries across resolvers in
>>> 20+ countries. I am curious whether others on this list have
>>> documented similar patterns, or if there are known resolver-specific
>>> behaviours that explain these discrepancies.
>>>
>>> I would also welcome any feedback on best practices for measuring
>>> propagation completeness — specifically, what threshold of global
>>> resolver agreement should be considered "fully propagated" for
>>> operational purposes.
>>>
>>> Best regards,
>>>
>>> Shaik Vahid
>>>
>>> DNS Robot - https://dnsrobot.net/
>>>
>>> Free DNS Propagation Checker & Network Tools
>>>
>>> dns-operations mailing list
>>> dns-operations at lists.dns-oarc.net
>>> https://lists.dns-oarc.net/mailman/listinfo/dns-operations
>>