[dns-operations] Open source release of the DNS-STATS Compactor

Jim Hague jim at sinodun.com
Thu Jun 22 17:49:05 UTC 2017


On 22/06/2017 14:06, Jerry Lundström wrote:
> On 06/22/17 11:08, Jim Hague wrote:
>> and you won't see the expected number of packets. It looks to me as if,
>> when there's a gap of about 50 milliseconds or more between packets,
>> you'll lose the first packet after the gap.
>>
>> Judging by the timing of the printed output, a waiting select() on the
>> PCAP handle exits on receipt of a new packet, but a subsequent call to
>> pcap_next_ex() or pcap_dispatch() does not find the packet.
> 
> I ran this on a Debian 8.7 VM and your test code does not capture all
> packets when using dispatch().

Good. So we're seeing the same thing, then.

> I then downloaded pcap-thread and ran hexdump (small example program
> included in the repository) without threads which means it does
> select()/dispatch() and it saw all packets.

OK. I grabbed it, built it, and observed apparently the same as you.

> So my guess is that you've missed something and hopefully this is not a
> major issue.

Well, I'd be somewhat surprised if I'd managed to make a silly mistake
using libpcap that was then fixed by changing the Linux kernel version.
My example program is all of 137 lines, a lot of which is argument
wrangling and setup, so it should be easy to see something dumb happening.

Anyway, I did a bit of digging. And, thanks largely to the magic of
strace (ploughing through your 3000+ line, single-source-file library
wasn't going to help much), I've found the difference between the two
programs that causes my test to drop ping packets and yours not.

Both programs have a central loop that is basically:

    pcap_dispatch()
    select(pcapfd, timeout)

For the select() timeout, you happen to (usually) use the same timeout
given to pcap_set_timeout(). I set an arbitrary 2s timeout for the
select(). Result: I drop, you don't.

If, on the other hand, I set the select() timeout to 1ms and also set a
pcap timeout of 1ms, my program behaves like yours. And if I change the
timeout on your select() to 2s and specify a pcap timeout of 1ms, your
program drops just like mine.

It also turns out that if we leave the select() timeout at 2s, and set
the pcap timeout to 2s as well, dropping doesn't (seem to) happen.

I can see no reason why the select() timeout should matter here. The
fact that it doesn't matter with later kernels suggests strongly to me
that it shouldn't, and that something bad is happening at the kernel
level. I certainly can't find any docs suggesting that if your loop
includes a select(), it should use the same timeout as any pcap timeout.

So, in summary: on old Trusty kernels and the current Jessie kernel, if
you are using a pcap loop with select(), set a pcap timeout and set the
select() timeout to the same period. That appears to make things better,
though I can't be sure it cures the problem.
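A minimal C sketch of that workaround, assuming a POSIX system, in which a single millisecond value drives both timeouts. ms_to_timeval() and wait_readable() are hypothetical helper names, not part of libpcap. In a real capture program the same value would also be passed to pcap_set_timeout() before pcap_activate(), the descriptor would come from pcap_get_selectable_fd(), and pcap_dispatch() would be called each time round the loop. This mirrors the mitigation described above; it is not a confirmed fix for the kernel behaviour.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: convert one millisecond timeout (assumed >= 0)
 * into the struct timeval that select() expects, so the select() timeout
 * can be derived from the same value given to pcap_set_timeout(). */
struct timeval ms_to_timeval(int timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return tv;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * In a capture loop, fd would be the result of pcap_get_selectable_fd()
 * and timeout_ms the value previously passed to pcap_set_timeout(). */
int wait_readable(int fd, int timeout_ms)
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* Linux select() may modify the timeval, so rebuild it each call. */
        struct timeval tv = ms_to_timeval(timeout_ms);
        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);

        if (rc < 0 && errno == EINTR)
            continue; /* interrupted by a signal: retry with a fresh timeout */
        if (rc < 0)
            return -1;
        return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
    }
}
```

Deriving both timeouts from one value keeps them from drifting apart, which is the mismatch (2s select() against a 1ms pcap timeout) that produced drops in the tests above.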
-- 
Jim Hague - jim at sinodun.com          Never trust a computer you can't lift.