[dns-operations] Interesting messages in our logs

Sun Nov 2 00:37:14 UTC 2014

On 11/01/14 18:10, Paul Vixie wrote:
>
>
>> Lyle Giese <mailto:lyle at lcrcomputer.net>
>> Saturday, November 01, 2014 1:41 PM
>>
>> On 11/01/14 12:21, Paul Vixie wrote:
>>>
>>>
>>>> Stephane Bortzmeyer <mailto:bortzmeyer at nic.fr>
>>>> Saturday, November 01, 2014 8:49 AM
>>>> On Sat, Nov 01, 2014 at 10:10:07AM -0500,
>>>>   Lyle Giese<lyle at lcrcomputer.net>  wrote
>>>>   a message of 23 lines which said:
>>>>> Oct 31 04:10:52 linux1 named[2899]: client
>>>>> 2607:f8b0:4001:c07::151#61651: no more TCP clients: quota reached
>>>>
>>>> If you wish to handle this amount of requests, you can raise
>>>> the tcp-clients parameter.
>>>>
>>>> options { tcp-clients 300; };
>>>
>>> there is no number you can insert here, including the largest number 
>>> your OS can support, such as 2^16, which will make your tcp listener 
>>> robust in the face of attacks. even if both sides of a non-attack 
>>> flow (so, client and server) fully implemented the recommendations 
>>> of <https://tools.ietf.org/html/draft-dickinson-dnsop-5966-bis-00>, 
>>> intentional tcp state exhaustion will remain a viable attack vector.
>>>
>> While interesting and I learn from discussions like this, it doesn't 
>> answer my original question.  When Named goes into SLIP via UDP 
>> queries, the other party should(and did) retry using TCP.  What 
>> happens when we throttle via TCP, like above?  Does NAMED just drop 
>> the connection? Or does it send back a meaningful error message or 
>> status of some sort?
>
> first, SLIP isn't a mode, it's a ratio. "SLIP X;" means every X'th 
> drop will be turned into a TC=1 response. so, higher values for X 
> result in fewer TC=1 responses (since 1/N > 1/M for all M > N). from 
> your syslogs, you would benefit from more drops and fewer slips, which 
> is why i advised you to increase your slips to at least 5, and be 
> willing to consider 8 or 10 if the problem persists.
>
> second, you're not throttling via TCP, you're doing two related 
> things, neither of which is that. first, you are encouraging TCP by 
> deliberately mixing slip frames (TC=1) in with your rate-related 
> intentional drops. second, you're seeing more TCP traffic than you 
> have state quota for. 
> these are probably related, but need not be related. if i were 
> attacking you i would make sure you had lots of TCP sessions open at 
> the same time that i blasted you with UDP in a way that forced you to 
> start sending TC=1 responses. all i have to do is ask google dns over 
> and over for answers it won't have in cache and that won't fit in UDP 
> responses. (you can only stop me by ensuring that IP fragmentation and 
> therefore EDNS are working to your authority servers, which is often 
> not under your control. BWA HA HA HA!) i can also just open a lot of 
> TCP sessions from other parts of the internet, since it's unlikely 
> that you have per-client-ip flow quotas, and i can just ask lots of 
> questions, returning ACK's very slowly, tar-pit style. i'll never be 
> idle, and if you RST or FIN me i'll just call back.
>
> to precisely answer your question, in the log snippet you showed (no 
> more TCP clients: quota reached) there is no meaningful error message 
> or status to send. you could send SERVFAIL but it's more likely you're 
> just RST'ing. neither one is meaningful to the far end. you have no 
> choices available to you which will be meaningful to the far end. (as 
> i said, BWA HA HA HA).
>
> happy belated all-hallows eve, everybody.
>
> -- 
> Paul Vixie

I don't think I showed the named logs showing that RRL had kicked in 
(rate limit slip NXDOMAIN response to 2607:f8b0:4001:c00::/56), and 
likewise for a Google IPv4 address during the same time frame.

The response to the rate limit slip, I understand.  TC=1 is telling the 
other side to try again using TCP.
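
To check my understanding of the slip ratio, here is a small Python
sketch of "every X'th drop becomes a TC=1 response". It is only a model
of the description quoted above, not named's actual code; the function
names are made up, and both the position in each cycle that gets slipped
and the handling of 0 as "never slip" are my assumptions.

```python
def classify_over_limit(count, slip):
    """Label each of `count` over-limit responses as 'drop' or 'slip'.

    Assumption: the slip-th, 2*slip-th, ... response in the run is the one
    turned into a TC=1 reply; slip == 0 is treated as "never slip".
    """
    actions = []
    for i in range(1, count + 1):
        if slip > 0 and i % slip == 0:
            actions.append("slip")   # send a truncated (TC=1) response
        else:
            actions.append("drop")   # send nothing at all
    return actions


def slip_count(count, slip):
    """Number of TC=1 responses produced for `count` over-limit responses."""
    return classify_over_limit(count, slip).count("slip")
```

In this model, 10 over-limit responses produce 5 TC=1 replies with
slip 2, 2 with slip 5, and 1 with slip 10, which matches the point that
a higher value means fewer invitations to retry over TCP.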

Now on the TCP side, I am seeing 'no more TCP clients: quota reached'.  
I am still not clear on what the other side is told or what it thinks is 
going on at this point.  My reasoning is that a TCP reset (RST) is 
handled below the application layer, in the TCP stack, and that an 
application could not generate one itself.  Of course, I can be wrong 
on this point and would welcome corrections.

Now that I have written this, I suppose it's within reason that an 
application can tell the layers below it, "I am too busy to handle that 
request, drop the TCP session", and those layers use the RST function 
to accomplish that.
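
For what it's worth, on POSIX systems an application can ask for
exactly that. The Python sketch below sets SO_LINGER with a zero timeout
before closing, which makes the stack abort the connection with an RST
instead of a normal FIN. I am not claiming this is what named does when
the quota is reached; the helper names are mine, and the struct layout
("ii") assumes a POSIX struct linger.

```python
import socket
import struct


def abortive_close(conn):
    """Close `conn` so the TCP stack sends RST rather than FIN.

    SO_LINGER with l_onoff=1 and l_linger=0 tells the stack to discard
    any unsent data and reset the connection on close().
    """
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    conn.close()


def demo_reset():
    """Return True if a loopback client observes a reset from the server."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    client.settimeout(5)
    conn, _ = listener.accept()
    abortive_close(conn)                 # the application asks for the RST
    try:
        client.recv(1)
        return False                     # an orderly close would land here
    except ConnectionResetError:
        return True                      # the client saw the RST
    finally:
        client.close()
        listener.close()
```

So the application never builds the RST segment itself, but it can
certainly cause the stack to send one.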

But it sounds more and more like a DoS attempt against my authoritative 
servers by exhausting TCP resources.  And they were utilizing Google's 
servers to assist in that process.  Not at all sure what the heck they 
would be trying to accomplish hitting up my servers.  And I would assume 
that it would take a lot more to accomplish a DoS attack against 
Google's servers this way!  But it's an excellent way for the attackers 
to hide their identity from me as I see only Google's IP addresses in 
the requests.
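
To see how concentrated the sources are, something like the following
Python sketch could tally the quota messages per client network. The
regular expression assumes the log line format shown earlier in this
thread (unwrapped onto one line); the function name and the default
prefix lengths (/56 for IPv6, as in the RRL message, and /24 for IPv4)
are my own choices.

```python
import ipaddress
import re
from collections import Counter

# Matches lines such as:
# "... named[2899]: client 2607:f8b0:4001:c07::151#61651: no more TCP clients: quota reached"
QUOTA_RE = re.compile(
    r"client (?P<addr>[0-9A-Fa-f:.]+)#\d+: no more TCP clients: quota reached"
)


def tally_quota_hits(lines, v6_prefix=56, v4_prefix=24):
    """Count 'quota reached' messages per client network."""
    hits = Counter()
    for line in lines:
        match = QUOTA_RE.search(line)
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group("addr"))
        except ValueError:
            continue                      # not a parsable address; skip it
        prefix = v6_prefix if addr.version == 6 else v4_prefix
        # strict=False masks off the host bits to get the enclosing network
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        hits[str(net)] += 1
    return hits
```

Feeding it the syslog file (for example `tally_quota_hits(open(path))`,
with `path` pointing at wherever named logs on your system) gives a
Counter whose `most_common()` lists the busiest networks first.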

Lyle Giese
LCR Computer Services, Inc.
