[dns-operations] Robert Graham - A Quick Review of the BIND9 Code.

Thu Jul 30 21:58:31 UTC 2015

Mukund Sivaraman wrote:
> On Thu, Jul 30, 2015 at 10:43:02AM -0700, Paul Vixie wrote:
> > he's completely wrong about turning off assertions in
> > production-compiled code, and in his comments about the performance
> > requirements.
> 
> The recent CVEs have been about assertions we've been hitting in
> production uses. Though we have decent test coverage and are
> continuously adding more tests, even code that is covered isn't
> exercised in every way it can be in production because of the numerous
> behavioral states that can exist reacting to network conditions.

Hi, Mukund:

CVE-2015-5477 is credited to Jonathan Foote, who published this article
on July 21:

    https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop

    [...]

    Footnote: So where are the bugs?

    The Knot team was already using AFL to fuzz a subset of their server
    logic (and I've sent a persistent mode patch upstream to them), but
    it’s worth noting that I was able to find vulnerabilities in another
    popular DNS server in a matter of hours on a low-end PC using this
    technique (we're coordinating a fix with the upstream vendor so I'm
    holding off on discussing that for the time being).

    [...]

Can you confirm that American Fuzzy Lop was the tool that was used to
discover this vulnerability?

Is the ISC team working on adding AFL fuzzing to the BIND development
process?  My limited understanding of the afl-fuzz results that have
been published so far when run against various software is that finding
a vulnerability in a matter of hours on a low-end PC counts as low
hanging fruit.  Some vulnerabilities have been found in other software
with afl-fuzz only after long periods of fuzzing on clusters of
computers.

The fact that the Knot DNS team is proactively using AFL against their
implementation is a big plus for Knot, and I hope to see it adopted by
other DNS vendors.

> Such assertion failures show that they are required, but some are just
> programming bugs which can exist in any codebase that changes over time.

It seems to be exactly Rob Graham's point that mere firing of an
assertion is not proof that the assertion was required.

Here is the assertion that was fired:

    https://source.isc.org/cgi-bin/gitweb.cgi?p=bind9.git;a=blob;f=lib/dns/message.c;h=b752da61b104c9065935cd85e4f365ea9b40130f;hb=dbb064aa7972ef918d9a235b713108a4846cbb62#l2352

    2351         if (name != NULL)
    2352                 REQUIRE(*name == NULL);

However, the correctness of the dns_message_findname() function doesn't
depend on the value of *name.  The function only writes to *name.
Requiring that callers always set *name to NULL is thus just busywork.

It looks like this bug was fixed by satisfying the
dns_message_findname()'s "REQUIRE" in the caller:

    https://source.isc.org/cgi-bin/gitweb.cgi?p=bind9.git;a=commitdiff;h=dbb064aa7972ef918d9a235b713108a4846cbb62;hp=faa3b61828dc2c6b92b68cd6e603fe2b9a7d5fdc

It could just as easily have been fixed by removing the "REQUIRE" in the
callee.

> Paul likely knows this, but for the sake of others: The assertions in
> BIND 9 are mostly of two kinds, REQUIRE() and INSIST(). REQUIRE()s are a
> design-by-contract style assertion that require some preconditions to
> exist when calling into a function.

Well, they're both ultimately just wrappers for abort(), so I'm not sure
the fine points of the D-b-C stuff matters that much.

In a different programming language with a stronger type system where
these constraints could be enforced at compile time rather than run
time, they would be a lot more useful.

> When an assertion fails, the process dies, but this design-by-contract
> method has saved our skin from worse conditions in all these recent
> vulnerabilities. named does an abort that it controls, and there is no
> chance of privilege escalation or remote code execution. This is vastly
> better than not having the assertions.

I agree that a controlled abort() is a lot better than granting control
to a remote attacker.  But it's not an either-or.  There's a big
difference between interfaces that are "hard to use" versus "hard to
misuse".  See, e.g., Rusty Russell's "How Do I Make This Hard to
Misuse?":

    http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html

> There is some discussion internally of changing _some_ things from
> assertions to code that recovers, such as out of memory conditions so
> that named continues in the face of trouble. But this is being argued
> both ways, so it's unlikely to be changed.

Due to VM overcommit, processes running on modern systems are more
likely to be unable to recover from out-of-memory conditions.  That is,
instead of malloc() returning NULL, you're much more likely to have the
kernel simply kill your process when it's unable to satisfy a memory
write.  Trying to recover from OOM is likely a lot of wasted effort;
abort()'ing if malloc() fails is very reasonable, IMO.

> C code is like a controlled nuclear reactor. We know how to control it
> well, and it's necessary to have it in C for many reasons (let's not
> even begin that flamewar), but rarely, things fall over and get fixed
> very quickly.

This is an amazing metaphor, and it can be twisted around to show the
opposite point: there is a huge cost to performing an unexpected shut
down of a nuclear reactor, so reactors are engineered with backup
systems and elaborate fallback procedures, monitored by highly trained
engineers, scrutinized by government regulators, etc., in order to
maximize availability/safety.  They certainly don't scram the reactor
core if someone burns popcorn in the breakroom microwave...

> We have a managed security process, there's a decent programming team
> behind BIND and it's continually improved and maintained.

-- 
Robert Edmonds