[dns-operations] creeping poorness of judgement

Sun Mar 15 00:34:54 UTC 2020

On Sat, Mar 14, 2020 at 11:35:51PM +0000, Paul Vixie wrote:

> > I know of no specifications that concatenate TXT RR strings with spaces.
> 
> And yet, RFC 1035 says this:
> 
> "<character-string> is expressed in one or two ways: as a contiguous set
> of characters without interior spaces, or as a string beginning with a "
> and ending with a ".  Inside a " delimited string any character can
> occur, except for a " itself, which must be quoted using \ (back slash)."

This text merely explains a limitation on the unquoted syntax of a
*single* character string in a TXT RR.  It says *nothing* about the
interpretation of multiple substrings.

> Ask yourself, if you were parsing a zone file, and you found a TXT RR,
> and it had spaces in it, what would you do?

Spaces in the presentation form of TXT RDATA that are not inside quotes
are *separators* between multiple character strings:

    IN TXT foo "bar baz" xyz == IN TXT "foo" "bar baz" "xyz"
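
To make the separator rule concrete, here is a minimal Python sketch of
a tokenizer for the presentation form of TXT RDATA.  The function name
split_txt_rdata is hypothetical, and the sketch assumes single-line
input with no parentheses or comments; it is not a full zone file
parser.

```python
def split_txt_rdata(rdata: str) -> list[str]:
    """Split TXT RDATA presentation text into its character-strings."""
    strings, current = [], []
    in_quotes = False   # inside a "..." delimited string
    started = False     # a character-string is in progress
    i = 0
    while i < len(rdata):
        ch = rdata[i]
        if ch == "\\" and i + 1 < len(rdata):
            ddd = rdata[i + 1:i + 4]
            if len(ddd) == 3 and ddd.isascii() and ddd.isdigit():
                current.append(chr(int(ddd)))   # \DDD decimal escape
                i += 4
            else:
                current.append(rdata[i + 1])    # \X quotes one character
                i += 2
            started = True
            continue
        if ch == '"':
            in_quotes = not in_quotes
            started = True                      # "" is a valid empty string
        elif ch in " \t" and not in_quotes:
            # Bare whitespace is syntax: it ends the current string.
            if started:
                strings.append("".join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
        i += 1
    if started:
        strings.append("".join(current))
    return strings
```

Under these assumptions both spellings in the example above produce the
same three strings, and none of the separating spaces survive into the
data.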

> since the <character-string> cannot include interior spaces unless
> "quoted like this", what does <space> mean?

Bare spaces separate multiple character strings.

> It means each word is a character string, or else, spaces are a syntax
> error unless they are quoted.

We are in violent agreement on the semantics of bare spaces in
the presentation form of TXT RDATA.

> once that's decided, the ignorant wrongheadedness of the SPF
> interpretation becomes pretty egregious.

Here we part ways.  SPF assigns a meaning to a TXT RR containing
multiple character strings by saying that, for its purposes, all the
strings are to be concatenated into a single string.  I think this is
entirely sensible.
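
The SPF rule amounts to a join with an empty separator.  A minimal
Python sketch, with spf_record_text as a hypothetical name and the
strings taken as the raw bytes of each character-string in order:

```python
def spf_record_text(strings: list[bytes]) -> bytes:
    """Join a TXT RR's character-strings the way SPF does: no separator."""
    return b"".join(strings)


# Any space the publisher wants must be inside one of the strings.
record = spf_record_text([b"v=spf1", b" ip4:149.20.56.0/24", b" ~all"])
```

With this rule the two strings "foo" and "bar" yield "foobar", never
"foo bar".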

> If it went in as "TXT foo bar" then it should be treated as "foo bar"
> even if the wire encoding of "TXT foo bar" is in fact ( "foo" "bar" ).

I don't see anything to support that conclusion.  It went in as two
separate character strings; there is no implied non-empty joiner.  The
fact that the presentation form happens to use spaces between the
elements of a compound RDATA structure is irrelevant.  Some other syntax
could have been used:

    example.org|IN|TXT|foo|"bar baz"|xyz

Would you now argue that applications that want a single combined
string should concatenate with "|"?  The spaces in the presentation
form are syntax, not content.
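
One way to see this is to look at what goes on the wire.  The sketch
below (Python, hypothetical function name txt_rdata_wire) encodes a
list of character-strings as TXT RDATA: each string is a single length
octet followed by that many octets of data, and nothing else.

```python
def txt_rdata_wire(strings: list[bytes]) -> bytes:
    """Encode character-strings as TXT RDATA: length octet, then data."""
    out = bytearray()
    for s in strings:
        if len(s) > 255:
            raise ValueError("a character-string holds at most 255 octets")
        out.append(len(s))  # one-octet length prefix
        out += s            # the string's octets, verbatim
    return bytes(out)


wire = txt_rdata_wire([b"foo", b"bar baz", b"xyz"])
```

The only space in the result is the one inside "bar baz"; the
separators from the zone file leave no trace.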

> I realize that SPF has a massive installed base, and that an installed
> base gives one great market power. but it's still wrong, and should
> still be fixed. (BIND 4 had a 100% market share, but got this wrong,
> and so, got fixed.)

There is nothing wrong with the SPF interpretation; it is indeed
natural, because it allows the input string to be chopped up
into fragments without needing to care about where the breaks
land, and then to be faithfully reassembled.

Indeed this is by far the simplest and most natural way to represent a
long string as a list of bounded-length shorter substrings.
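
A rough Python sketch of that round trip, assuming the 255-octet limit
on each character-string; the function names are hypothetical:

```python
MAX_STRING_LEN = 255  # the length of a character-string fits in one octet


def split_into_strings(data: bytes, max_len: int = MAX_STRING_LEN) -> list[bytes]:
    """Chop data into fragments of at most max_len octets, breaking anywhere."""
    parts = [data[i:i + max_len] for i in range(0, len(data), max_len)]
    return parts or [b""]  # an empty value is still one (empty) string


def reassemble(strings: list[bytes]) -> bytes:
    """Undo split_into_strings by joining with no separator."""
    return b"".join(strings)


value = b"v=spf1 " + b"ip4:192.0.2.1 " * 40 + b"~all"
parts = split_into_strings(value)
```

The encoder never inspects the content to choose a break point, and
the decoder recovers the input exactly.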

> > So my take is that applications that expect a single string per TXT
> > record should just join without inserting spaces, while applications
> > that expect multiple values can use the verbatim substrings without
> > concatenation.
> 
> If we're going to write an RFC to clarify this, it could say almost
> that. but I think the robustness principle calls for something
> slightly more subtle, and it doesn't have to do with inserting spaces.

Well I think that robustness calls for the simplest available encoding,
and the SPF encoding has the requisite simplicity.  Otherwise, all sorts
of fragile ad-hoc processing comes into play...

> If a multi-segment string is encountered by a TXT-consuming
> application, and if that multi-segment string can be unambiguously
> interpreted by the application as some machine-form instruction by
> concatenating the segments, then this should be done.  However, if
> such concatenation renders an ambiguous result (possibly meaningful
> but possibly erroneous) then the application should try to interpret
> each text segment as a separate word, that is, as if separated by
> whitespace characters. if the segments-as-words interpretation is less
> ambiguous or less erroneous, then this interpretation should prevail.

This is precisely the sort of ad-hoc complexity we must avoid.
What happens if there are 8 inter-string breaks (9 strings)?
Do we now consider 256 possible ways to join the strings?
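
The count comes from treating each break as an independent choice
between "no separator" and "space", which gives 2^8 candidates.  A
small Python sketch that enumerates them (candidate_joins is a
hypothetical helper and expects at least one string):

```python
from itertools import product


def candidate_joins(strings: list[str]):
    """Yield every joining of strings with '' or ' ' chosen at each break."""
    gaps = len(strings) - 1
    for seps in product(("", " "), repeat=gaps):
        out = strings[0]
        for sep, s in zip(seps, strings[1:]):
            out += sep + s
        yield out


nine = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
count = sum(1 for _ in candidate_joins(nine))
```

For nine strings the generator yields 256 candidates, each of which the
decoder would have to score somehow for ambiguity.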

Do encoders need to look for spaces at which to break the strings and,
failing to find spaces in a long run of non-spaces, hope that the
decoder on the other end will get it right?

Regrettably, despite the many other topics on which we agree, and the
fantastic generous help you're giving me with the DANE survey, I'm going
to side with the SPF approach on its merits.

> here's what i'm going with, by the way:
> 
> _spf                    TXT     ( 	v=spf1\032
> /				/2001:4f8::/32\032
> 				2001:559:8000::/48\032
> 				149.20.56.0/24\032
> 				24.104.150.0/24\032
> 				~all )

Well, you'd be much better off with the more readable, and
equally maintainable:

    @ TXT ( "v=spf1"
            " ip6:2001:4f8::/32"
            " ip6:2001:559:8000::/48"
            " ip4:149.20.56.0/24"
            " ip4:24.104.150.0/24"
            " ~all" )

With the qname changed to "@", since SPF clients do not prepend "_spf.",
and with "ip4:" and "ip6:" prefixes added, since AFAIK they're required.

> but i'd like to be able to remove those \032 workarounds in ~10 years.

I am compelled to disagree with the sentiment.

-- 
    Viktor.