On concatenating hashes

This post was prompted by a twit from Didier Stevens about a presentation from Bart Preneel at the OWASP Benelux conference. On slide 17, citing a 2004 paper from Joux, Dr Preneel writes that “the concatenation of 2 iterated hash functions is as most [sic] as strong as the strongest of the two (even if both are independent)“. This was widely understood as “concatenating hashes results in an equal or lower security”, which is not really correct as this post will show.

Hash basics
A hash function is an algorithm which takes variable size input message and returns a fixed size output. For a good hash, we want three properties:

• collision resistance: it is very difficult to find two input messages which hash to the same value.
• preimage resistance: given a hash value, it is very difficult to find a message which hashes to this value.
• 2nd preimage resistance: given a message, it is very difficult to find another message which hashes to the same value.

For an ideal n-bits hash, that is a hash against which no better method than exhaustive search is known, the complexity of finding a collision is $2^{\frac{n}{2}}$ (due to the birthday paradox), and the complexity of finding the first and second preimages is $2^n$. When concatenating two ideal hashes of n and m bits, the complexity of the combined hash becomes respectively $2^{\frac{n}{2}+\frac{m}{2}}$ for collisions and $2^{n+m}$ for preimages. Finally, a k-multicollision for a given a hash function is a set of k messages which hash to the same value. The complexity of finding a k-multicollision for an ideal n-bits hash is $2^{\frac{n(k-1)}{k}}$.

Iterative hashes
Now let’s consider an ideal iterative n-bits hash function (also known as a Merkle–Damgård construction), which has the following structure :

source: wikipedia

With such a function, the hash value is calculated by repeatedly iterating a compression function f over a combination of a message block and the previous output of the compression function. The first iteration of the compression function uses the first message block and a fixed initialization vector (IV). Nearly all hashes created before 2004 use this structure, including MD5, SHA-1, SHA-256, SHA-512, RIPEMD-160 and Tiger.

Joux’s multicollision attack
In his paper, Joux shows that the iterative structure of a hash actually helps an attacker to find multicollisions. In short, he proves that finding a $2^t$ multicollision (that is, $2^t$ messages which hash to the same value) only takes t times the efforts needed to find one collision. This result has a huge impact on the security of concatenated hashes, as Joux describes in the following two attacks.

Let’s consider two ideal iterative hash functions F and G of respective output lengths m and n bits (n >= m). One can find collisions in F||G in two steps :

1. create a $2^{\frac{n}{2}}$ multicollision in F, which has a complexity in the order of $\frac{n}{2}2^{\frac{m}{2}}$.
2. hash these $2^{\frac{n}{2}}$ messages with G and look for a collision. Due to the birthday paradox, there is a good chance one will occur.

For a preimage attack, given two hash values Hf and Hg, we look for a message M such that F(M)=Hf and G(M)=Hg. 3 steps :

1. create a $2^n$ multicollision in F, which has a complexity of $n2^{\frac{m}{2}}$
2. by exhaustive search, find a last block to append to one of the previous messages to obtain the desired F(M) hash value. With complexity $2^m$, this gives $2^n$ messages which hash to the desired F(M) value
3. by exhaustive search, hash these $2^n$ messages and look for the desired G(M) value. Due to the birthday paradox, there’s a good chance one will occur

In the end, the complexity of finding a collision in the combined hash is in the order of $\frac{n}{2}2^{\frac{m}{2}}+2^{\frac{n}{2}}$ and the complexity of finding preimages is $n2^{\frac{m}{2}}+2^m+2^n$, which is much lower than the theoretical values. Furthermore, it is important to note that only F was required to be iterative for these attacks, no assumption was made about G to build these attacks. For instance, let’s consider two 160 bits hashes which have respective strengths of $2^{80}$ against collisions and $2^{160}$ against preimages : by concatenating these two hashes (thus obtaining a 320 bits hash value), the respective strengths become about 87 bits against collisions and 161 bits against preimages.

Last words
Contrary to the general interpretation of this OWASP presentation, concatenating two good hash functions (with a least one of them being iterative) does not decrease the security, but brings a very small improvement. However, two important questions remain:

1. Joux’s attack uses the birthday paradox on ideal hashes, but what if another effective attack against one of the hashes exists ? Could concatenating a broken hash with a bad one yield a weaker security than the good hash alone ?
2. If concatenation is not effective, are there better ways to combine hashes so that the resulting combination is much stronger than any of the two ?

Any idea ? 🙂

1 Comment

1. Combining cryptographic hashes, part 2 « Sam280's Blog said,

January 14, 2010 at 10:05 am

[…] 14, 2010 at 10:05 am (cryptography) In a previous post I discussed Joux’s multicollision attack and how it can significantly decrease the efficiency […]