[dns-operations] How should work name resolution on a modern system?

Wed Jun 15 12:31:37 UTC 2022

On 11. 06. 22 5:56, Viktor Dukhovni wrote:
> On Fri, Jun 10, 2022 at 09:16:11PM +0200, Petr Menšík wrote:
>
>> - first is libc interface getaddrinfo() provided by nss plugins. Names
>> can be resolved also by different protocols than just DNS. Good
>> examples might be MDNS (RFC 6762), LLMNR (RFC 4795) or Samba
>> (nmblookup). Standardized calls provide only blocking resolution interface.
> The nsswitch (originally from SunOS IIRC) indeed has limitations, but is
> unlikely to fade away quickly, especially because it also handles
> non-DNS queries (passwd, group, services, ...).
>
> That said, best practice for "hosts" has been essentially:
>
>      hosts: files dns
>
> with a very short /etc/hosts with just the local system address and
> name, perhaps only a loopback address at that.  But this was before, as
> mentioned below, the introduction of "mdns" et al., and in some
> "enterprise" environments "samba" or similar.

Enterprise Samba uses DNS. I think the legacy lookup (nmblookup) would
be used only by older Windows systems or NAS devices, so maybe it is
not that important.

I am quite sure MDNS is used often on Apple systems.

>
>> * Asynchronous interface does not exist in useful form. It is easy to
>> handle multiple connections in single thread, but multiple resolutions
>> in single thread are not supported. nss plugins are simple to write, but
>> hard to use in responsive program. Should that be changed?
> Indeed asynchronous interfaces for some of these would be quite useful,
> and some dedicated libraries (alternatives to good-*old* libresolv)
> provide these for DNS specifically, but they are not ubiquitous, and
> would introduce completely new APIs undreamed of in SvID and POSIX.
>
>> * MDNS usually uses names under .local domain. What should be preferred
>> order of single label names, like 'router.'? Should LLMNR be tried
>> first, samba first or DNS search applied first? Should it avoid reaching
>> DNS when search domain is not set?
> I rather expect there is no one-size-fits-all answer, and so
> nsswitch.conf or equivalent is here to stay.  Sometimes one wants no
> "mdns" or similar at all.  The right answer for a laptop trying to
> locate nearby printers is rather different than the answer for a server
> racked in a datacentre.
Sure, mDNS would rarely be used on server releases. dns-sd.org links
to a DNS-only implementation; I expect that would be a good way to
locate printers from datacenters if required.
>> - primary interest for us is DNS protocol. On Unix systems it specifies
>> nameservers to use in /etc/resolv.conf also with some options. We would
>> like to offer DNS cache installed on local machine, which should
>> increase speed of repeatedly resolved names.
> Definitely, with DNSSEC validation, and (on laptops) perhaps support for
> probing of DNSSEC support when switching between WiFi networks, or
> opting in to a captive portal, so that DNSSEC is used when available,
> once the portal T&Cs etc have been dealt with, and real DNS servers
> (ideally) become reachable.
Sure. I have done a bit of research into how NetworkManager handles
this. It tries resolution via systemd-resolved and, if that is not
possible, falls back to normal resolution in the curl library. I have
not found any trace of DNSSEC support in it. I think it should disable
validation during the connection test phase and, if a captive portal is
detected, keep it disabled. It should also set the maximum TTL to 1s,
or flush the cache once the captive portal has been passed. I think a
cache flush would be required only if some name failed validation and a
captive portal was detected.
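
A minimal sketch of that policy, in Python. The state names, the
Policy fields and dns_policy() are made up for illustration; this is
not how NetworkManager or systemd-resolved behave today.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Link(Enum):
    PROBING = auto()         # connectivity check still running
    CAPTIVE_PORTAL = auto()  # portal detected, not passed yet
    ONLINE = auto()          # portal passed, or never present


@dataclass(frozen=True)
class Policy:
    validate_dnssec: bool
    flush_cache: bool


def dns_policy(previous: Link, current: Link, validation_failed: bool) -> Policy:
    """Hypothetical policy for a local validating cache."""
    if current is not Link.ONLINE:
        # Still probing or behind a portal: keep validation disabled.
        return Policy(validate_dnssec=False, flush_cache=False)
    # Online: validate again. Flush only when a portal was just passed
    # and some name failed validation while it was active.
    left_portal = previous is Link.CAPTIVE_PORTAL
    return Policy(validate_dnssec=True,
                  flush_cache=left_portal and validation_failed)
```

The alternative mentioned above, capping the TTL at 1s while the portal
is active, would replace the flush_cache flag with a TTL limit.
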
>
>> * I would like to have support for multiple interfaces and redirection
>> of names subtree to local network interfaces servers. For example
>> 'home.arpa' redirected to local router at home, but example.com
>> redirected to VPN connection.
> This is largely a laptop problem, but indeed the local caching
> nameserver could have appropriate stub zones defined, so that
> queries for "special" zones are sent to non-default servers.
I think servers can also have some use cases, for instance when some
service provides a link to an internal company network. A network
maintained by libvirt or podman might also have its own dnsmasq, which
maintains names for started containers. They might want to map
pods.example.com to podman or vms.example.net to the VMs. If the host
runs several such networks, it should connect them and make them
reachable. But this case is different, because those networks are not
received from different devices. Orchestration should be able to
configure it appropriately.
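
The subtree mapping itself can be sketched as a longest-suffix match.
This is only an illustration in Python: pick_servers(), the ROUTES
table and all addresses in it are assumptions, not the configuration of
any existing resolver.

```python
def pick_servers(name, routes, default):
    """Return the servers of the longest routed suffix matching name."""
    labels = name.rstrip(".").lower().split(".")
    # Try the full name first, then strip labels from the left, so the
    # most specific subtree wins.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    return default


# Hypothetical routing table; every address here is an assumption.
ROUTES = {
    "pods.example.com": ["10.88.0.1"],     # dnsmasq of a podman network
    "vms.example.net": ["192.168.122.1"],  # dnsmasq of a libvirt network
    "home.arpa": ["192.168.1.1"],          # router at home
}
DEFAULT = ["192.0.2.53"]                   # default resolver
```

With this table, web.pods.example.com goes to the podman dnsmasq, while
www.example.com falls through to the default servers.
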
>
>> I think RFC 8801 and RFC 7556 specify standardized way to list
>> interface specific domains. Existing implementations misuse RFC 3397
>> for a source of such list now. Something like this is implemented by
>> systemd-resolved on Ubuntu and Fedora systems. But it introduced
>> couple of new issues. Is something similar implemented on end user
>> machines? I think laptop and phones are typical devices with multiple
>> interfaces, where it would make sense.
> This is a complex question, that can't be answered in a brief email
> thread.  Designs need to be thought through, written up, debated,
> zealously supported, ignored, dismissed out of hand, ... :-) :-(
>
>> - how should single label names be handled?
> Local policy.
I have made some tests. The nss plugins in glibc do not allow the
following order: first try the single label with the search domain
applied in DNS, then try other plugins such as LLMNR, and only as a
last resort try resolving the bare name in DNS without anything
appended. I think this cannot be configured on today's Linux
implementations.
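
To make that order concrete, here is a small Python sketch. It only
builds the list of lookups to attempt; lookup_plan() and the mechanism
labels "dns" and "llmnr" are names invented for this example, and
nothing here can be configured in glibc today.

```python
def lookup_plan(name, search_domains):
    """Return (mechanism, query name) steps in the order described above."""
    bare = name.rstrip(".")
    if "." in bare:
        # Multi-label names are out of scope here: plain DNS only.
        return [("dns", bare + ".")]
    # 1. single label with each search domain applied, asked over DNS
    plan = [("dns", f"{bare}.{domain.strip('.')}.") for domain in search_domains]
    # 2. other plugins, LLMNR as the example
    plan.append(("llmnr", bare))
    # 3. last resort: the bare single label over DNS
    plan.append(("dns", bare + "."))
    return plan
```

For the name router with search domain example.com, the plan is
router.example.com. over DNS, then router over LLMNR, then router. over
DNS.
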
>> -- is domain (opt. 15) and search (opt. 119) from DHCP already dead?
>> Should they be completely avoided even in trusted networks?
> They had some merit for moving laptops between corporate offices, but
> are problematic on shared public WiFi networks, so perhaps by now best
> ignored.  The hot new trend is to always leave mobile devices on
> "external" networks, even on prem.  And so you're always on the public
> Internet, where search lists from DHCP are not trustworthy.
That trend has not reached my company yet. We have an internal trusted
network and a guest network. Because the guest network is still
maintained by people I trust, I would trust the DNS search list from
DHCP. I admit, however, that I am not sure whether clients are able to
announce their own DHCP server.
>
>> -- in which order should resolution be tried? Should machine cache block
>> queries to single label hostnames not expanded to FQDN on DNS protocol?
> For getaddrinfo(3), indeed there is no reason to ask a TLD for its IP
> address, worst case is you actually get an answer!
Well, I am not sure whether this is sarcasm or not. Current
systemd-resolved redirects single-label names to an LLMNR lookup if
that is enabled. Should DNS be asked before LLMNR even for single
labels? Should the local cache try to protect root servers from
unnecessary queries? It seems Windows 11 applies the search list first,
then LLMNR. I am not sure whether it also tries the single-label query
as a last step.
>
>> -- I have seen usage of search domains on cloud technologies. Is there
>> common example what they are used for? Do we need ndots option with
>> value different from 1?
> For managed servers in datacentres there is some plausible value in
> search lists, but anything with more than one element used to qualify
> names with more than zero dots has dubious semantics in the face of
> timeouts, or other transient failures.  Should these abort the query,
> or occasionally seek the answer in what could be the wrong place?
>
>> - should we expect DNSSEC capabilities on every machine?
> This is probably not the right way to ask the question.  Some
> datacentre machines will want DNSSEC all the time, some never.
> As for laptops, it rather depends on whether they start "trusting"
> HTTPS and SVCB indirection without using DoH or DoT to tunnel to a
> "trusted" resolver, and how much impact forgery of these and future
> security-relevant DNS data will have.
>
>> -- should we even enable DNSSEC validation on every machine by default?
>> When it would be good idea and when it wouldn't?
> Ideally yes, if well designed to downgrade gracefully in captive
> portals, when switching wireless networks, in ways that are clear to the
> experienced user (perhaps a menubar icon) and not intrusive to the
> novice.  And of course does not silently and unexpectedly degrade to
> insecure (without the user involved in selecting a new network).
I would display DNSSEC support only somewhere in the connection status
window. I do not think DNSSEC is as important as a captive portal or
limited connectivity indication. I would like to implement captive
portal support in NM that also allows DNSSEC. But the dnssec-trigger
implementation, at least, reported an annoying error every time I got
disconnected. I am sure that is not the correct way. It has to be able
to tell when DNSSEC is failing and when the whole connection does not
work.
>
> Of course for DNSSEC on end-user (rather than datacentre) devices to
> matter, support would need to be broader than just Linux laptops.  It
> would need to ultimately encompass MacOS, Windows, Android, iOS, ...
> But some vendor would need to be the first mover.
Understood. But I am not able to influence those.
>> - should asynchronous API be prepared for common name to addresses and
>> vice versa? One which would support both local network resolution and
>> unicast DNS in easy to use way? Usable even in GUI applications without
>> common hassle with worker threads?
> Would be nice.  OpenBSD may IIRC have code in that space, don't know
> whether it is a good model to emulate, or a lesson to avoid.
Interesting, would you be able to find documentation for it?
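
For comparison, this is roughly what is available today from a single
thread in Python's asyncio. It is a sketch of the calling style only:
resolve_all() is a name made up here, and the default event loop
implements loop.getaddrinfo() by running the blocking getaddrinfo() in
a thread pool, so it hides the worker threads rather than removing
them.

```python
import asyncio
import socket


async def resolve_all(hosts):
    """Resolve several hosts concurrently from one event-loop thread."""
    loop = asyncio.get_running_loop()

    async def one(host):
        try:
            infos = await loop.getaddrinfo(host, None, type=socket.SOCK_STREAM)
        except socket.gaierror as exc:
            # Keep the error so one failing name does not abort the rest.
            return host, exc
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address as text.
        return host, sorted({info[4][0] for info in infos})

    return dict(await asyncio.gather(*(one(host) for host in hosts)))
```

A caller would run it as asyncio.run(resolve_all(["localhost",
"example.com"])) and get a dict mapping each name to either a list of
addresses or the error raised for it.
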
>
>> If there is documentation for name subtree mapping to interface servers
>> on different systems, I would be glad if you could share links to it. If
>> we should improve current situation, I would like to first gather
>> expected requirements for such system. Is there some summary already?
> I guess this can be about whether a VPN is active or not, and whether
> then to route some DNS queries to some server over the VPN.  Some
> enterprises build this sort of thing in managed OS images for employee
> laptops, with each company developing or buying custom software for the
> managed user experience.  Perhaps some part of this is ripe for inclusion
> in the base vendor OS.

If the administrators configure the whole device and every part of it,
then it should be easy to do. But at least in my company we can also
use the device privately. I would like to be connected to my workplace
through one VPN connection and to my home through another. This should
be reasonably simple for advanced users to configure, without a
complicated Ansible script or similar.

It becomes even more demanding when I am on a train's Wi-Fi, which
provides some services to its passengers under some known name. Or
when my employer wants to scan all my traffic for security
vulnerabilities over the VPN, but I would like to keep access to a
local printer or file storage at home. Then all name queries would go
to the VPN, and only locally served resources would use local servers.

-- 
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB