[dns-operations] Cache efficiency (was: Re: DNS .com/.net resolution problems in the Asia/Pacific region)
edmonds at mycre.ws
Thu Jul 20 21:50:19 UTC 2023
Mark Andrews wrote:
> Lookups take enormous numbers of queries these days. A support customer
> was asking why a lookup wasn’t completing within 3 seconds. The resolution
> process took 48 queries with a cold cache. Involved several CDNs and required
> fetching nameserver addresses in several different TLDs. There were no retries
> in that count.
> CNAME chains are expensive but we have a whole industry that has fallen in love
> with them.
> Yes, we do have query limits but they need to be large to handle this sort of lookup.
Yes, there are lookups that can take a long time to perform with a cold
cache. By putting lots of users behind large, centralized caches we can
insulate users from a lot of cold cache lookups, but these centralized
resolvers then become concentrated points of failure, convenient
monitoring points, etc. Personally, I'd like to see the "full resolver"
role be re-distributed and move out as close as possible to the
endpoints, given that the original justification for the stub
resolver/full resolver split was a lack of resources at the endpoints --
in the 1980s. But if you have full resolvers running on individual
endpoints, or on network elements that serve individual households, etc.
you're much more likely to run into cold cache lookups and it would be
nice to be able to accelerate or avoid those cold lookups.
Here are some random ideas for improving the efficiency of cold or
mostly-cold cache lookups:
1. Cache occlusion rather than replacement of outranked data. The DNS
protocol reuses the same record type (NS) for both the non-authoritative
delegation nameserver record set served by the parent zone as well as
the authoritative nameserver record set served by the child zone. RFC
2181 § 5.4.1 says that resolvers "should replace" the data from the
parent zone when they receive authoritative data from the child zone,
but the parent zone often has a much longer TTL on the records that it
serves. (E.g., the twitter.com data from the .com zone has a 2-day TTL,
while the twitter.com NS record set from the twitter.com zone has a <4
hour TTL.)
If resolver caches were able to retain the longer-lived NS records from
the parent and "occlude" them when a shorter-lived NS record from the
child is cached, then utilize them again when they become unoccluded
upon the expiration of the child NS record, it would avoid sending
unnecessary queries to the parent. It would also be arguably more
compatible with the lower case "should replace" text in RFC 2181 § 5.4.1
than a "parent-centric" resolver implementation.
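The occlusion idea can be sketched in a few lines. This is a toy
illustration only; the NSCache class and its two-slot layout are
invented for the example, not taken from any real resolver:

```python
import time

class NSCache:
    """Toy NS cache sketching "occlusion": the parent's delegation
    NS RRset is retained (not replaced) when the child's authoritative
    NS RRset arrives, and becomes visible again once the child's
    shorter TTL expires."""

    def __init__(self):
        # zone -> {"parent": (rrset, expiry), "child": (rrset, expiry)}
        self._store = {}

    def put(self, zone, rrset, ttl, authoritative, now=None):
        now = time.time() if now is None else now
        slot = "child" if authoritative else "parent"
        self._store.setdefault(zone, {})[slot] = (rrset, now + ttl)

    def get(self, zone, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(zone, {})
        # Prefer the authoritative (child) RRset while it is fresh;
        # if it has expired, fall through to the still-fresh parent
        # RRset instead of re-querying the parent zone's servers.
        for slot in ("child", "parent"):
            if slot in entry:
                rrset, expiry = entry[slot]
                if expiry > now:
                    return rrset
        return None
```

With the twitter.com numbers above, the 2-day parent RRset would keep
answering delegation lookups each time the short-lived child RRset
lapses, instead of forcing a fresh query to the .com servers.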
2. Persist some or all of the resolver's cached NS and nameserver
address records to disk. These are typically long-lived records and I'd
gladly trade a few tens of MB of disk space in exchange for better P99+
resolution latency after a restart. Perhaps this could also include the
RTTs, EDNS capabilities, etc. that are sometimes collectively called the
"infrastructure cache".
Compare to modern web browsers which allow websites to store an enormous
amount of data on every user's disk. (If you use Chrome, check
chrome://settings/content/all and sort by "Data stored". Chrome
apparently believes that individual web origins are entitled to use "up
to 60%" of your disk space.) A tiny fraction of that disk space could
store a very large amount of the most frequently used DNS records.
I believe some resolver implementations, e.g. Knot Resolver, already
store their entire cache on disk in an LMDB database.
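A minimal sketch of what such persistence might look like, assuming a
plain JSON snapshot (the function names and on-disk format here are
invented for illustration; a real resolver would likely use a proper
database, as Knot Resolver does with LMDB):

```python
import json
import time

def save_infra_cache(path, entries, now=None):
    """Persist records as (name, rtype, rdata, absolute_expiry) tuples.
    Storing absolute expiry rather than remaining TTL means a restart
    can simply drop anything that lapsed while the resolver was down."""
    now = time.time() if now is None else now
    live = [e for e in entries if e[3] > now]
    with open(path, "w") as f:
        json.dump(live, f)

def load_infra_cache(path, now=None):
    """Reload the snapshot, discarding expired records rather than
    serving them stale."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            entries = json.load(f)
    except FileNotFoundError:
        return []
    return [tuple(e) for e in entries if e[3] > now]
```

Since NS and glue records are long-lived, even a snapshot written only
at shutdown would warm most of the delegation paths after a restart.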
3. You mention CNAME chains, but NS delegations are another source of
indirection that may require additional upstream lookups, especially if
the nameserver names are in several different TLDs (as a reliability
hedge?). There are a couple of things that could be done here:
a) Delegations within the same organization often reflect internal
organizational boundaries. One team may want to give control over part
of the namespace to another team, without handing over write permissions
for the whole zone, so the typical solution is to carve out a child zone
for the other team, and host that zone on the same provider as the
parent zone. If the cloud-based DNS providers that many organizations
use offered a more granular, less-than-whole-zone permissions model, it
would cut down on the number of child zones that are created solely to
reflect intra-organizational boundaries.
b) Make nameserver address indirection *optional* without requiring a
backwards-incompatible protocol change.
One could stand up "stunt" nameservers that return A or AAAA records for
an IP address embedded in the QNAME, e.g.:
;; QUESTION SECTION:
;198.51.100.1.ipv4-literal.example. IN A
;; ANSWER SECTION:
198.51.100.1.ipv4-literal.example. 86400 IN A 198.51.100.1
;; QUESTION SECTION:
;2001:db8::1.ipv6-literal.example. IN AAAA
;; ANSWER SECTION:
2001:db8::1.ipv6-literal.example. 86400 IN AAAA 2001:db8::1
Then, zones could be delegated "directly" to a nameserver IP address by
embedding the literal nameserver IP addresses into these special
domains. Perhaps you could use a mixture of "direct" and "indirect"
nameserver address records, e.g.:
example.com. NS 198.51.100.1.ipv4-literal.example.
example.com. NS 198.51.100.2.ipv4-literal.example.
example.com. NS 2001:db8::1.ipv6-literal.example.
example.com. NS 2001:db8::2.ipv6-literal.example.
example.com. NS ns1.example.net.
example.com. NS ns2.example.net.
Of course, that's still indirection, but the next step would be to put
those zones (ipv4-literal.example and ipv6-literal.example) through the
RFC 6761 special-use domain name process and reserve/define them in such
a way that resolver implementations could be updated to directly
synthesize the IPv4 or IPv6 literal implied by the QNAME *without*
needing to actually contact the nameservers for the ipv4-literal.example
or ipv6-literal.example zones.
(If the SUDN process is used I'm thinking those zones should be called
something like ipv4-literal.arpa and ipv6-literal.arpa.)