[dns-operations] More Aggressive prefetch for popular names

bert hubert bert.hubert at powerdns.com
Sat Apr 6 19:25:06 UTC 2019


On Sat, Apr 06, 2019 at 11:58:23AM -0700, Doug Barton wrote:
> We have to get out of the mindset that it's our job to fix someone else's
> mistakes. We keep adding kludges to the DNS which increase our attack
> surface, and the more we increase code complexity the more we open ourselves
> up to bugs, both serious and not.

+1000000

There is a lot to be said for taking the stance that as long as we coddle
the world, the world will continue to degrade and do an ever more terrible
job. 

If authoritative servers are slow to respond or send bad responses, it is
actually GOOD if the domains they host suffer. It provides an incentive for
people to clean up their act.

Similarly, browsers have at times attempted to resend failed DNS queries on
an already loaded page. I can understand the intent, but it covers for a lot
of broken resolvers. It should hurt!

Prefetching, serve-stale and other things increase complexity and hurt the
good actors.  "There is no such thing as a free feature".  We may keep
Twitter alive for a few hours without DNS, at the cost of one day taking
down another domain because serve-stale wrongly kicked in and served old
data.

I realize it is a lot of fun to whiteboard how resolvers could become even
smarter (or 'complicated' as I like to call it), but it is typically not the
whiteboard people picking up the phone at 3AM if the clever idea fails.

And sadly, that is my job. 

And the problem is, once one resolver has a clever feature, it ends up on
feature lists and the 'keep it simple please' implementations start losing
out.

So everyone, please think twice before complicating things further. And
specifically, run the numbers.

As a case in point, together with a huge European telco we ran prefetching
in production.  We tried real hard but could not show any end-user benefit.
It was not visible on the graphs.  The decrease in response latency did
not stand out from the noise.  Only if you focus on very specific domains
and watch every query will you notice a 99th percentile effect.
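
To make "run the numbers" concrete, here is a minimal sketch of the kind of
comparison I mean: take per-query latencies with and without the feature and
compare the median against the 99th percentile.  The function names and the
sample values are made up for illustration; they are not the telco
measurements, and a real comparison would use logged production latencies.

```python
import statistics


def latency_summary(samples_ms):
    """Return median (p50) and 99th percentile (p99) of latencies in ms."""
    # quantiles(n=100) yields 99 cut points; index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


def improvement(baseline_ms, feature_ms):
    """Latency saved by the feature at each percentile (positive = faster)."""
    base = latency_summary(baseline_ms)
    feat = latency_summary(feature_ms)
    return {name: base[name] - feat[name] for name in base}


if __name__ == "__main__":
    # SYNTHETIC data, assumed for illustration only: mostly cache hits at
    # 1 ms, plus a few cache misses at 50 ms.  The hypothetical prefetch run
    # turns one of the two misses into a hit.
    baseline = [1.0] * 98 + [50.0] * 2
    with_prefetch = [1.0] * 99 + [50.0] * 1

    for name, saved in improvement(baseline, with_prefetch).items():
        print(f"{name}: {saved:.2f} ms saved")
```

With data shaped like this the median does not move at all, and only the
tail shows a difference.  Whether that tail difference is worth the extra
code is exactly the question to answer before shipping.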

So before putting that marker to the whiteboard, remind yourself to also
measure the advantage you think those thousands of lines of code will
generate.

	Bert



