[dsc] Upcoming DSC 2.4.0

Jerry Lundström jerry at dns-oarc.net
Tue Jan 17 08:43:32 UTC 2017


Hi all,

Here is an update on the stability work being done for version 2.4.0.

Our build platforms now run dsc from the develop branch, and dsc is
automatically updated and restarted whenever that branch changes.
Several tests that run frequently have been set up to monitor that dsc
is working; the Jenkins jobs, their status, and the config and scripts
can be found here:

  https://dev.dns-oarc.net/jenkins/view/dsctest/
  https://github.com/DNS-OARC/dsctest

Each platform runs an instance of NSD that answers for an example.com
zone, and another VM generates 10 QPS against all platforms. The
platforms are currently Ubuntu 16.04, Debian 8, CentOS 7, FreeBSD 11
and OpenBSD 6.

I will try to outline all the changes below. The plan is to keep
monitoring dsc this week and make a release next week if all goes well.

On 01/05/17 14:28, Jerry Lundström wrote:
> Note, -T can be used as a workaround for the current release.

Threading is now disabled by default; to use it, it must be enabled
at configure time:

  ./configure --enable-threads

Thread-safe versions of libc functions (the _r variants) are now also
used.
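For anyone wondering why the _r variants matter: functions such as
gmtime() and localtime() return a pointer to static storage shared by
the whole process, so two threads formatting timestamps at once can
corrupt each other's result. Below is a minimal C sketch of the
reentrant pattern; format_utc() is a hypothetical helper for
illustration, not a function from the dsc sources.

```c
#define _POSIX_C_SOURCE 200809L  /* expose gmtime_r() in strict C modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from dsc): format `t` as "YYYY-MM-DD HH:MM:SS"
 * in UTC. gmtime_r() fills a struct tm owned by the caller, whereas gmtime()
 * returns a pointer to static storage that another thread can overwrite
 * before the caller has finished reading it.
 * Returns the number of characters written, or 0 on failure. */
size_t format_utc(time_t t, char *buf, size_t len)
{
    struct tm tm;

    assert(buf != NULL && len > 0);
    memset(&tm, 0, sizeof tm);
    if (gmtime_r(&t, &tm) == NULL) {
        buf[0] = '\0';
        return 0;
    }
    size_t n = strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
    if (n == 0)
        buf[0] = '\0';  /* buffer too small; strftime() leaves it unspecified */
    return n;
}
```

Called as format_utc(0, buf, sizeof buf), it writes "1970-01-01 00:00:00"
into buf and returns 19.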

> Reported Issues
> 
> - On rare occasions the forked process that writes the output file got
> stuck waiting for a mutex within the NSS library.  The effect would be a
> loss of data at that interval period and a stuck process.

Since threads were disabled, this has not occurred on the platforms.

> - DSC 2.3.0 on Debian Jessie would stop writing output files after days
> of running. I have not yet been able to replicate this problem, but it
> was clear that the reporter had moved to the threaded version and that
> the non-threaded version worked before.

The Debian Jessie platform has been running dsc for over 1.5 weeks now
without stopping.
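Since the symptom was output files silently stopping, a simple external
check is to compare the newest output file's modification time with the
current time. This C sketch assumes dsc writes one output file per
interval; output_is_stale() is a hypothetical helper and not what the
dsctest scripts actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical monitoring check (not taken from dsctest): if one output
 * file is written per interval, a newest file older than one interval plus
 * a grace period suggests the writer has stopped. */
bool output_is_stale(time_t newest_mtime, time_t now,
                     unsigned interval_sec, unsigned grace_sec)
{
    assert(interval_sec > 0);
    /* difftime() avoids assuming time_t is an integer type. */
    return difftime(now, newest_mtime) >
           (double)interval_sec + (double)grace_sec;
}
```

The mtime would come from stat() on the most recent file in the output
directory; with a 60 s interval and a 30 s grace period, a file last
written 91 s ago is flagged.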

> - Inconsistency with the number of packets captured if running with
> threads vs running without (-T), this has been noticed while debugging
> the first issue.

Many timing tweaks have been made, and since a couple of patches went in
yesterday, all platforms now report the correct number of packets
captured.

- An incorrect packet count at startup has been resolved. It was related
to initializing pcap and the interval sync that happens afterwards:
packets could be buffered by the system in between, so you would see a
spike of packets in the first output.

- "Kernel dropped" packets, noticed on some platforms and meaning that
the system was unable to process packets in time, have been resolved
with the new option "pcap_buffer_size", which sets the size of pcap's
internal buffers.

It will be recommended that people with high QPS set this relatively
high so that packets are not missed during the window in which dsc
writes its output; on the build platforms this takes about 0.2-0.5 ms.
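To put rough numbers on "relatively high": the buffer has to hold every
packet that arrives while dsc is not reading. The C sketch below is a
back-of-the-envelope estimate under stated assumptions;
min_pcap_buffer_bytes() and its per-packet cost are hypothetical, the
real per-packet overhead depends on the platform's capture mechanism,
and this is not how dsc itself picks a value. For reference, libpcap's
own knob for this is pcap_set_buffer_size(), which has to be called
before pcap_activate().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizing helper (not part of dsc): estimate how many bytes the
 * pcap buffer must hold to absorb every packet arriving while the capture
 * loop is stalled, e.g. while output is being written.
 *   pps              packets per second seen on the interface
 *   stall_us         length of the stall in microseconds
 *   bytes_per_packet assumed buffer cost per packet (captured bytes plus
 *                    platform-specific per-packet header overhead)
 *   safety_factor    multiplier for bursts above the average rate */
uint64_t min_pcap_buffer_bytes(uint64_t pps, uint64_t stall_us,
                               uint64_t bytes_per_packet,
                               uint64_t safety_factor)
{
    assert(safety_factor >= 1);
    /* Packets arriving during the stall, rounded up to a whole packet. */
    uint64_t packets = (pps * stall_us + 999999) / 1000000;
    return packets * bytes_per_packet * safety_factor;
}
```

For example, 100,000 packets/s over a 0.5 ms window is 50 packets, or
75,000 bytes at an assumed 1,500 bytes each. The average case is small,
so the safety factor is where headroom for bursts and longer stalls
would go.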

- An incorrect number of packets captured and an off-by-one start_time
have been resolved; both were caused by timing issues.

First, the interval sync only used whole seconds when calculating how
long to sleep; this has been changed to microsecond precision.

Second, each interval period ran for x seconds, which could make it run
over into the next period. It now runs until an absolute time, and on
our build platforms periods stop within 1-50 milliseconds of that time.
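The difference between "run for x seconds" and "run until an absolute
time" can be sketched in C as below. This is a hypothetical loop, not
dsc's implementation; poll_once stands in for a bounded capture step
such as a pcap read with a short timeout, and deadline_reached() and
run_until() are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L  /* clock_gettime() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* True once `now` has reached or passed the absolute `deadline`. */
bool deadline_reached(const struct timespec *now,
                      const struct timespec *deadline)
{
    if (now->tv_sec != deadline->tv_sec)
        return now->tv_sec > deadline->tv_sec;
    return now->tv_nsec >= deadline->tv_nsec;
}

/* Hypothetical capture loop (not dsc's code): keep calling poll_once until
 * the wall clock reaches `deadline`. Because the deadline is absolute, time
 * spent before the loop starts (setup, writing the previous interval)
 * shortens this period instead of pushing its end into the next one. The
 * overshoot is roughly the duration of one poll_once call. */
void run_until(const struct timespec *deadline,
               void (*poll_once)(void *), void *ctx)
{
    struct timespec now;

    assert(deadline != NULL && poll_once != NULL);
    for (;;) {
        if (clock_gettime(CLOCK_REALTIME, &now) != 0)
            break;  /* clock unavailable: end this period */
        if (deadline_reached(&now, deadline))
            break;
        poll_once(ctx);
    }
}
```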

Will send another update at the end of this week.

Cheers,
Jerry
