[dns-operations] IP address encryption: pseudonymization

Paul Hoffman phoffman at proper.com
Mon Feb 26 15:48:03 UTC 2018


On 26 Feb 2018, at 3:34, Jim Hague wrote:

> On 25/02/2018 15:59, Paul Hoffman wrote:
>> If only it was that simple. As you can see from the thread on CFRG,
>> there are other methods that do not have the inherent limitations of
>> ipcrypt: instead, they have different ones.
>>
>> The easiest one to describe is
>> truncate_to_32_bits(aes_128(message=padded_ipv4, key=128_bit_random)).
>> You cannot determine the key even with a huge number of known pairs.
>> However, you get collisions in the output. So, if you have 4 million
>> unique input addresses, about 0.1% of the output addresses will look like
>> one source of input when in fact they are two sources mixed together.
>
> Going for the above scheme while retaining the straightforward AES-128
> for IPv6 (as used in ipcipher) would mean that you can reverse the IPv6
> pseudo-anonymisation given knowledge of the key, but you can't reverse
> IPv4.

That is true, but I believe it is irrelevant because only the system 
that is doing the anonymization should know the key. That is, an 
attacker cannot determine the key regardless of how many IPv6 
input/output pairs they know.

> This seems a little asymmetrical, and I imagine is the reason for
> the selection of ipcrypt in ipcipher. Or, in other words, how desirable
> a property is reversibility when considering pseudo-anonymisation
> schemes?

That is a good question. It would only apply to the system that is doing 
the anonymization: do they ever want to create the original data again?

> From an implementor's PoV, the above description would need some
> fleshing out - how exactly does one pad and truncate?

You can pad the IPv4 address to a full 128-bit AES block by concatenating
it to itself three more times, and you truncate by keeping the first 32
bits of the output.
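
To make the pad and truncate steps concrete, here is a minimal sketch in
Python. It is an illustration under stated assumptions, not a reference
implementation: the function names (aes128_encrypt_block,
pseudonymize_ipv4) are hypothetical, and the small pure-Python AES-128
routine is included only so the example runs with the standard library.
Real deployments would more likely call a vetted crypto library.

```python
import ipaddress
import os


def _xtime(a):
    """Multiply by x in GF(2^8) modulo the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def _gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = _xtime(a)
        b >>= 1
    return r


def _build_sbox():
    """Build the AES S-box: multiplicative inverse, then affine transform."""
    sbox = [0] * 256
    for x in range(256):
        inv = next(y for y in range(1, 256) if _gmul(x, y) == 1) if x else 0
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox


SBOX = _build_sbox()


def _expand_key(key):
    """Expand a 16-byte key into 11 round keys of 16 bytes each."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = _xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def aes128_encrypt_block(key, block):
    """Encrypt one 16-byte block with AES-128 (single block, no mode)."""
    rk = _expand_key(key)
    s = [b ^ k for b, k in zip(block, rk[0])]
    for rnd in range(1, 11):
        s = [SBOX[b] for b in s]                                           # SubBytes
        s = [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]  # ShiftRows
        if rnd != 10:                                                      # MixColumns
            s = [_gmul(s[4 * c + r], 2) ^ _gmul(s[4 * c + (r + 1) % 4], 3)
                 ^ s[4 * c + (r + 2) % 4] ^ s[4 * c + (r + 3) % 4]
                 for c in range(4) for r in range(4)]
        s = [b ^ k for b, k in zip(s, rk[rnd])]                            # AddRoundKey
    return bytes(s)


def pseudonymize_ipv4(key, address):
    """truncate_to_32_bits(aes_128(message=padded_ipv4, key=key))."""
    raw = ipaddress.IPv4Address(address).packed   # 4 bytes
    padded = raw * 4                              # the address, then three more copies
    ciphertext = aes128_encrypt_block(key, padded)
    return str(ipaddress.IPv4Address(ciphertext[:4]))  # keep the first 32 bits


if __name__ == "__main__":
    key = os.urandom(16)  # 128-bit random key, known only to the anonymizer
    print(pseudonymize_ipv4(key, "192.0.2.1"))
```

The same key always maps a given input address to the same output
address, which is what lets you correlate queries after anonymization.
Because only 32 of the 128 output bits are kept, the mapping cannot be
inverted directly, even with the key.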

> Does one pad with
> 0 values? Random values? Where is the padding inserted - more
> significant or less significant bytes? Similarly, which bytes are
> selected during truncation?

Padding with zeros causes problems in some uses, and padding with random
values is extra work for no benefit. Truncation can take any 32 bits of
the output.
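
Whichever 32 bits are kept, the cost of truncating is the collision rate
mentioned earlier in the thread. A rough way to estimate it, assuming the
truncated output behaves like a uniformly random 32-bit value (a modelling
assumption, and the function name is hypothetical), is:

```python
import math


def colliding_input_fraction(n_inputs, output_bits=32):
    """Chance that a given input shares its output with at least one other
    input, modelling the truncated output as uniformly random."""
    m = 2 ** output_bits
    # 1 - (1 - 1/m)^(n - 1), computed in a numerically stable way
    return -math.expm1((n_inputs - 1) * math.log1p(-1.0 / m))


if __name__ == "__main__":
    print(f"{colliding_input_fraction(4_000_000):.4%}")
```

For 4 million unique inputs this comes out a little under 0.1%, which is
the same order as the figure quoted above if that figure is read as the
share of addresses caught up in a collision.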

> Not being a crypto specialist, I am more concerned as to whether any
> variations in the methods of padding and truncation would affect the
> security properties, and whether there are hidden traps awaiting the
> naive.

There are some traps on padding that have been well-studied. By 
concatenating the input to itself, you don't introduce any new data to 
the message.

--Paul Hoffman


