[dns-operations] Cloudflare DNS resolver (1.1.1.1): Weird DNSSEC race condition
Michael Sinatra
michael at brokendns.net
Fri Aug 3 19:18:46 UTC 2018
Hi,
I have some DNS domains that are (actually *were*) unsigned that I have
begun to sign. I let the zones be signed (using BIND inline-signing)
for several days, or, in the case of one zone, months, before inserting
a DS record in the parent.
I decided to insert the DS record in the parent zone (in this case
.org), but first I wanted to check the various open resolver services to
make sure they were seeing signatures. The idea, of course, is to avoid
cache timing issues and make sure that signatures are in place for some
time before inserting the DS record.
Here's what's weird: For the zone I am trying to sign, when querying
Google, Quad9, and other services that support DNSSEC, RRSIGs are
returned if I query the service with the DO bit set, just like we'd
expect. But when I query 1.1.1.1, I get no RRSIGs back for the same
query, even with the DO bit set. Keep in mind that this zone has had
RRSIGs for *months*, just no DS record.
In looking at the query logs on my authoritative servers, I see that
Cloudflare is querying my authoritative servers based on my queries to
1.1.1.1, *except* that the Cloudflare recursive resolver is not setting
the DO bit, even though I am setting it in my query! All of the other services'
backend resolvers are properly setting the DO bit in their iterative
queries.
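For readers who want to see exactly what "setting the DO bit" means on the
wire, here is a minimal Python sketch that builds a raw query carrying an
EDNS0 OPT record. The helper names (build_query, has_do_bit) and the default
query ID and UDP size are made up for illustration; only the packet layout
comes from the DNS and EDNS0 specifications.

```python
import struct

DO_BIT = 0x8000  # "DNSSEC OK" flag in the 16-bit EDNS flags field


def build_query(name, qtype=1, do=True, udp_size=4096, query_id=0x1234, rd=False):
    """Build a raw DNS query for `name` with a single EDNS0 OPT record."""
    flags = 0x0100 if rd else 0x0000  # RD is the only header flag we touch
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT RR)
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 1)
    # QNAME: length-prefixed labels terminated by the zero-length root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    # OPT pseudo-RR: root name, TYPE=41, CLASS=UDP payload size,
    # TTL = extended RCODE (8 bits) | version (8 bits) | flags (16 bits), RDLEN=0
    edns_flags = DO_BIT if do else 0
    opt = b"\x00" + struct.pack("!HHBBHH", 41, udp_size, 0, 0, edns_flags, 0)
    return header + question + opt


def has_do_bit(message):
    """Check the DO bit in a message produced by build_query().

    Assumes the OPT record is the last 11 bytes and carries no options,
    which holds for build_query() but not for arbitrary DNS messages.
    """
    (edns_flags,) = struct.unpack("!H", message[-4:-2])
    return bool(edns_flags & DO_BIT)
```

A resolver that wants RRSIGs back from an authoritative server has to send
the equivalent of build_query(..., do=True); a query built with do=False
still uses EDNS0 but gives the server no reason to include signatures.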
So I figure, what the heck, I'll put the DS record into the parent
anyway. Maybe Cloudflare will see the DS record and magically go back
and fetch all of the RRSIGs and start validating.
Nope. In fact, Cloudflare still does not fetch the RRSIGs, even with the
DO bit set in my query and the DS record sitting in Cloudflare's own cache:
michael@manasquan:~ % dig +dnssec ds tinkerdork.org @1.1.1.1
; <<>> DiG 9.12.1-P2 <<>> +dnssec ds tinkerdork.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21223
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1452
;; QUESTION SECTION:
;tinkerdork.org. IN DS
;; ANSWER SECTION:
tinkerdork.org. 10734 IN DS 22640 14 2
BF1EF5CFD4429D096A14C7509F4B38B133AAE155D034F0003546EE0B D68E7EB6
tinkerdork.org. 10734 IN RRSIG DS 7 2 86400 20180824182548
20180803172548 1862 org.
VLvYbhoSAQ67CdBUD8azyD7k6ExbXpFc64/gkqkaaUP4XiEeik88Rf7C
DQVB8Yn3/Obpxcj2+4Oq5tgLcTnxhgvHzvySVZr1JSS4tQwmqxFNZgY1
p7AcUwlXQXS2agole7RSkzgVbEMJ6UqJ98FP1ppTc89xcZuYRyyIdiSn bT0=
;; Query time: 74 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Fri Aug 03 11:27:20 PDT 2018
;; MSG SIZE rcvd: 254
But it still doesn't include RRSIGs in subsequent queries, even though
the cached A record for tinkerdork.org (60-second TTL) has expired:
dig +dnssec tinkerdork.org @1.1.1.1
; <<>> DiG 9.12.1-P2 <<>> +dnssec tinkerdork.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24518
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1452
;; QUESTION SECTION:
;tinkerdork.org. IN A
;; ANSWER SECTION:
tinkerdork.org. 60 IN A 166.84.136.98
;; Query time: 77 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Fri Aug 03 11:48:37 PDT 2018
;; MSG SIZE rcvd: 59
And the query from Cloudflare's backend resolver isn't even asking for
the RRSIGs:
03-Aug-2018 11:48:38.388 client @0x28d5e000
2400:cb00:11:1024::a29e:3cd7#64192 (TInkeRDORK.oRG): query:
TInkeRDORK.oRG IN A -E(0)
[Note the absence of the "D" flag in the above BIND log.]
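If you want to scan your own query logs for resolvers that omit the DO bit,
something like the following Python sketch may help. It assumes the flag
field follows "query: NAME CLASS TYPE" as in the log line above, and that the
letters mean what BIND's query-log documentation describes (+/- for recursion
desired, E(n) for EDNS version n, D for DO, C for CD). The function name and
the returned dictionary keys are made up for illustration.

```python
import re

# Matches "query: <name> <class> <type> <flags>", where <flags> starts with + or -
QUERY_RE = re.compile(r"query:\s+(\S+)\s+(\S+)\s+(\S+)\s+([+-]\S*)")


def parse_query_flags(line):
    """Pull the query name, type and flag field out of one BIND query-log line."""
    match = QUERY_RE.search(line)
    if match is None:
        return None
    qname, _qclass, qtype, flags = match.groups()
    edns = re.search(r"E\((\d+)\)", flags)
    # Drop "E(n)" before looking for single-letter flags
    letters = re.sub(r"E\(\d+\)", "", flags)
    return {
        "qname": qname,
        "qtype": qtype,
        "recursion_desired": flags.startswith("+"),
        "edns_version": int(edns.group(1)) if edns else None,
        "do_bit": "D" in letters,
        "checking_disabled": "C" in letters,
    }
```

Run against the Cloudflare line above, this reports EDNS version 0 with
do_bit False. A line ending in "-E(0)DC" would report do_bit True.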
Meanwhile, Google is having no trouble validating the zone:
michael@manasquan:~ % dig +dnssec tinkerdork.org @8.8.8.8
; <<>> DiG 9.12.1-P2 <<>> +dnssec tinkerdork.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24765
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 512
;; QUESTION SECTION:
;tinkerdork.org. IN A
;; ANSWER SECTION:
tinkerdork.org. 59 IN A 166.84.136.98
tinkerdork.org. 59 IN RRSIG A 14 2 60 20180830202654 20180731194526
49377 tinkerdork.org.
VsyOXsqrT/n64mZ+C4mPlTQqRVYjZ4qt3odaCSGLT5GM83vMG64h8fn6
a5y1vF3YpwiMQDvgWi4i2/2Q9qyoqjnhpqTTbvHpRsrB111tzCC+n+3E OgCgkLKNJdB5JT6l
;; Query time: 132 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Aug 03 11:27:36 PDT 2018
;; MSG SIZE rcvd: 201
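The difference between the 1.1.1.1 and 8.8.8.8 responses comes down to two
things: whether the "ad" flag is in the header, and whether any RRSIG records
are in the answer. For comparing many resolvers at once, a rough Python
sketch like this can summarize saved dig output. It only handles the default
dig text format shown above, and the function name and result keys are made
up for illustration.

```python
def summarize_dig(output):
    """Summarize dig text output: status, AD flag, and number of RRSIG records."""
    status = None
    header_flags = []
    rrsig_count = 0
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith(";; ->>HEADER<<-"):
            # e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24518"
            status = line.split("status:")[1].split(",")[0].strip()
        elif line.startswith(";; flags:"):
            # e.g. ";; flags: qr rd ra ad; QUERY: 1, ..."
            header_flags = line[len(";; flags:"):].split(";")[0].split()
        elif line and not line.startswith(";"):
            # Resource record lines look like "<name> <ttl> IN <type> ..."
            fields = line.split()
            if len(fields) >= 4 and fields[2] == "IN" and fields[3] == "RRSIG":
                rrsig_count += 1
    return {"status": status, "ad": "ad" in header_flags, "rrsigs": rrsig_count}
```

Wrapped continuation lines of long records are ignored because they do not
have "IN" in the third field, so each RRSIG is counted once.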
Sounds like a good recipe for validation failures. "Combine the above
ingredients into a validating resolver that forwards to Cloudflare. The
presence of the DS record will blend nicely with the refusal of
Cloudflare to return RRSIGs (even though the zone has been signed for
months) and will cause your forwarding resolver to SERVFAIL. Bon appetit!"
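The logic behind that recipe can be written down as a heavily simplified
decision function. This is only an illustrative Python sketch, not how
Unbound or any real validator is implemented: it ignores chain-of-trust
construction, denial-of-existence proofs, retries, and caching, and the
function name and return strings are made up.

```python
def validating_forwarder_outcome(ds_in_parent, rrsigs_in_answer, signatures_verify=True):
    """Very rough model of what a validating forwarder does with an upstream answer."""
    if not ds_in_parent:
        # No DS in the parent: the zone is treated as unsigned, so the
        # answer is passed along whether or not RRSIGs came with it.
        return "insecure: NOERROR without the AD flag"
    if not rrsigs_in_answer:
        # The DS says the zone must be signed, but the upstream resolver
        # returned no signatures, so the answer cannot be validated.
        return "bogus: SERVFAIL"
    if not signatures_verify:
        return "bogus: SERVFAIL"
    return "secure: NOERROR with the AD flag"
```

Before the DS record existed, the missing RRSIGs from 1.1.1.1 were harmless
(first branch). Once the DS is visible, the same missing RRSIGs land in the
second branch.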
So I set up a simple Unbound validating resolver and told it to forward
to 1.1.1.1 and had it listen on 10.52.52.5. Bingo:
dig +dnssec tinkerdork.org @10.52.52.5
; <<>> DiG 9.12.1-P2 <<>> +dnssec tinkerdork.org @10.52.52.5
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10962
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;tinkerdork.org. IN A
;; Query time: 1076 msec
;; SERVER: 10.52.52.5#53(10.52.52.5)
;; WHEN: Fri Aug 03 11:55:13 PDT 2018
;; MSG SIZE rcvd: 43
It has been about 30 minutes since I added the DS record and the
behavior is still the same. Anyone using 1.1.1.1 as a forwarder on
their validating resolver will not be able to resolve my zone. Note
also that Cloudflare *does* correctly validate zones signed with the
same algorithm that have had DS records for some time.
And even weirder, some Cloudflare instances on the US east coast are
correctly validating my domain now, but some in the western US still
aren't. The query logs include sets of Cloudflare backends, some asking
for RRSIGs and some (still) not.
I'll update this if/when all Cloudflare instances figure out that
they're supposed to validate my zone and return signatures when the DO
bit is set, but in the meantime, I'd say that using Cloudflare as part
of a forwarding, validating resolver configuration is "considered
dangerous."
michael