Combining cryptographic hashes, part 3

In previous posts I discussed how Joux’s multicollision attack significantly decreases the efficiency of concatenating hash functions and how it can be used to break the birthday limit of md5. Now the obvious remaining question is: if concatenation is not effective, are there better ways to combine two hash functions ? As often in cryptography, it is very easy to come up with a solution which is neat, simple, and wrong: for instance, would you have said that F(M) || G(F(M) \oplus M) is insecure ? Luckily the smart people at the Emmy Noether Research Group give us two examples of good combiners.

Hash-tree based concatenation
In a paper from 2007, Fischlin and Lehmann introduce a security-amplifying combiner based on the concatenation combiner. Their idea is quite simple : to build a 2^n multicollision on an n-bits iterative hash function, Joux’s attack requires n consecutive calls to the compression function. In other words, the colliding messages must have a size equal to or larger than n * B bits, where B is the message block size of the hash function.

Conversely, limiting the size of the input messages would also decrease the number of multicollisions found by Joux’s attack, and therefore the probability of finding a pair of messages which collide in both hash functions. For instance, for messages of size n/4 blocks, Joux’s attack can only build a 2^{\frac{n}{4}} multicollision, and the probability of finding a pair of colliding messages drops to 2^{-\frac{n}{2}} .

Input messages obviously have an arbitrary size, so to ensure that the concatenation combiner H = F || G only process a small number of message blocks, a hash-tree structure can be used. For instance, if we decide to break the message in groups of 4 blocks the hash-tree would look like this:

The size of the groups is a tradeoff between security and speed, but you should be safe as long as you keep the groups smaller than n/4 blocks.

The C_4p combiner
In 2009 the same Fischlin and Lehmann introduced a new family of secure combiners. The basic combiner of this family is called C_4p , and it has the following construction:

where:
H_0 and H_1 are the two hash functions to combine
\oplus is the XOR operator
H_\oplus = H_0 \oplus H_1
H_k^i(M) = H_k(i || M) , in other words the binary representation of i
is preprended to the message M before hashing with H_k (k=0 or 1)
H_\oplus^i(M) = H_0^i(M) \oplus H_1^i(M) = H_0(i||M) \oplus H_1(i||M)

With a combination of XOR operations, concatenations and permutations, the C_4p combiner outputs 2n bits hash values (same length as the classical concatenation combiner) and is robust against collisions. In their article the authors describe several modifications of C_4p which provide additional properties, such as pre-image robustness and indifferentiability from random oracles.

Last words
In march Fischlin and Lehmann will present at the CT-RSA 2010 conference a paper titled “Hash Function Combiners in TLS and SSL”, it should be fun 🙂

Advertisements

Combining cryptographic hashes, part 2

In a previous post I discussed Joux’s multicollision attack and how it can significantly decrease the efficiency of concatenating hash functions. Today I will discuss a recent paper (December 2009) by Mendel, Rechberger and Schlaffer, which shows how specific attacks against two hash functions can be used against the concatenation combiner. Using this attack, the authors can find collisions in md5||SHA1 with a complexity around 2^{59}, thus breaking the 2^{64} birthday limit.

First, let’s restate the old result : the cost to find collisions in F||G with Joux’s multicollision attack is \frac{n}{2}.2^{\frac{m}{2}} + 2^{\frac{n}{2}} where m and n are the respective output lengths of F and G. This total is calculated by adding the cost of finding a 2^{\frac{n}{2}} multicollision in F (\frac{n}{2}.2^{\frac{m}{2}}) and the cost of hashing these 2^{\frac{n}{2}} messages to find a collision in G. Now even if there were a better attack against F, this would decrease the first term of the total cost, but not the second one (the verifying step). As a result the total cost would always be higher than 2^{\frac{n}{2}}, so Joux’s attack is clearly not sufficient to break the birthday limit.

Let’s now see the new attack, and to simplify things let’s consider that m=n. The new attack is based on two assumptions:

  • F suffers from a type-1 collision attack, meaning that given a message block m0 we can find m1 and m1′ such that F(m_0||m_1)=F(m_0||m_1\prime) with a complexity lower than 2^{\frac{n}{2}}
  • G suffers from a type-3 collision attack, meaning that given two message blocks m2 and m2′ we can find m3 such that G(m_2||m_3)=G(m_2\prime||m_3) with a complexity lower than 2^{\frac{n}{2}}

The attack then goes as follows:

  1. Using the type-1 collision attack in F, find k pairs of message blocks (m_1,...,m_k) (m_1\prime,...,m_k\prime) such that for any i \leq k, F(m_1||...||m_i)=F(m_1\prime||...||m_i\prime). The value of k depends on the size and structure of G. Following Joux' attack, this effectively gives a 2^k multicollision on F, in other words 2^k messages which hash to the same value through F.
  2. Using these 2^k messages as prefix, use the type-3 collision attack on G to find a message block p such that G(m_1||...m_k||p) = G(m_1\prime||...m_k\prime||p). A large part of the paper describes how to perform this attack in an efficient way, using a combination of birthday techniques and differential shortcut techniques.
  3. Finding p also gives a collision on F, as we append p to a known pair of colliding messages for F.

This is the first documented attack where a weakness in two hash functions can be combined to break the birthday limit of the concatenation, and it will be interesting to see if it can be applied to other hash functions.

On concatenating hashes

This post was prompted by a twit from Didier Stevens about a presentation from Bart Preneel at the OWASP Benelux conference. On slide 17, citing a 2004 paper from Joux, Dr Preneel writes that “the concatenation of 2 iterated hash functions is as most [sic] as strong as the strongest of the two (even if both are independent)“. This was widely understood as “concatenating hashes results in an equal or lower security”, which is not really correct as this post will show.

Hash basics
A hash function is an algorithm which takes variable size input message and returns a fixed size output. For a good hash, we want three properties:

  • collision resistance: it is very difficult to find two input messages which hash to the same value.
  • preimage resistance: given a hash value, it is very difficult to find a message which hashes to this value.
  • 2nd preimage resistance: given a message, it is very difficult to find another message which hashes to the same value.

For an ideal n-bits hash, that is a hash against which no better method than exhaustive search is known, the complexity of finding a collision is 2^{\frac{n}{2}} (due to the birthday paradox), and the complexity of finding the first and second preimages is 2^n. When concatenating two ideal hashes of n and m bits, the complexity of the combined hash becomes respectively 2^{\frac{n}{2}+\frac{m}{2}} for collisions and 2^{n+m} for preimages. Finally, a k-multicollision for a given a hash function is a set of k messages which hash to the same value. The complexity of finding a k-multicollision for an ideal n-bits hash is 2^{\frac{n(k-1)}{k}}.

Iterative hashes
Now let’s consider an ideal iterative n-bits hash function (also known as a Merkle–Damgård construction), which has the following structure :

source: wikipedia

With such a function, the hash value is calculated by repeatedly iterating a compression function f over a combination of a message block and the previous output of the compression function. The first iteration of the compression function uses the first message block and a fixed initialization vector (IV). Nearly all hashes created before 2004 use this structure, including MD5, SHA-1, SHA-256, SHA-512, RIPEMD-160 and Tiger.

Joux’s multicollision attack
In his paper, Joux shows that the iterative structure of a hash actually helps an attacker to find multicollisions. In short, he proves that finding a 2^t multicollision (that is, 2^t messages which hash to the same value) only takes t times the efforts needed to find one collision. This result has a huge impact on the security of concatenated hashes, as Joux describes in the following two attacks.

Let’s consider two ideal iterative hash functions F and G of respective output lengths m and n bits (n >= m). One can find collisions in F||G in two steps :

  1. create a 2^{\frac{n}{2}} multicollision in F, which has a complexity in the order of \frac{n}{2}2^{\frac{m}{2}}.
  2. hash these 2^{\frac{n}{2}} messages with G and look for a collision. Due to the birthday paradox, there is a good chance one will occur.

For a preimage attack, given two hash values Hf and Hg, we look for a message M such that F(M)=Hf and G(M)=Hg. 3 steps :

  1. create a 2^n multicollision in F, which has a complexity of n2^{\frac{m}{2}}
  2. by exhaustive search, find a last block to append to one of the previous messages to obtain the desired F(M) hash value. With complexity 2^m, this gives 2^n messages which hash to the desired F(M) value
  3. by exhaustive search, hash these 2^n messages and look for the desired G(M) value. Due to the birthday paradox, there’s a good chance one will occur

In the end, the complexity of finding a collision in the combined hash is in the order of \frac{n}{2}2^{\frac{m}{2}}+2^{\frac{n}{2}} and the complexity of finding preimages is n2^{\frac{m}{2}}+2^m+2^n, which is much lower than the theoretical values. Furthermore, it is important to note that only F was required to be iterative for these attacks, no assumption was made about G to build these attacks. For instance, let’s consider two 160 bits hashes which have respective strengths of 2^{80} against collisions and 2^{160} against preimages : by concatenating these two hashes (thus obtaining a 320 bits hash value), the respective strengths become about 87 bits against collisions and 161 bits against preimages.

Last words
Contrary to the general interpretation of this OWASP presentation, concatenating two good hash functions (with a least one of them being iterative) does not decrease the security, but brings a very small improvement. However, two important questions remain:

  1. Joux’s attack uses the birthday paradox on ideal hashes, but what if another effective attack against one of the hashes exists ? Could concatenating a broken hash with a bad one yield a weaker security than the good hash alone ?
  2. If concatenation is not effective, are there better ways to combine hashes so that the resulting combination is much stronger than any of the two ?

Any idea ? 🙂