(OT) MD5 collisions (was Re: How to edit PDF?)

Mon Nov 28 21:51:24 UTC 2005

On Mon, Nov 28, 2005 at 04:09:05PM +0100, N Chosechu wrote:
> On 11/28/05, hometoast <hometoast at gmail.com> wrote:
> > Which is a good reason to use both md5 and sha1.  the two together will be
> > more than sufficient as the chances of both colliding are nil.
> 
> Unfortunately this does not seem to be the case. There is a famous paper
> by Joux about Multicollision, describing how to attack several
> hashes simultaneously if you know how to break them separately.
> You may want to have a look at the following discussion on Slashdot:

Indeed. To save you (hometoast) the trouble of working through the
maths, finding a simultaneous collision for MD5 and SHA-1 turns out to
be at most about 64 times as hard as finding a collision for SHA-1
alone; those extra six bits of collision-resistance are typically not
worth the additional runtime of verifying both hashes, and the illusion
of additional security may be very dangerous. You're far better off just
using SHA-256 or better.
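
As a rough check on that factor of 64, here is a back-of-the-envelope
sketch of the cost of Joux's construction. It assumes generic birthday
costs (about 2^(n/2) work per collision on an n-bit hash) rather than any
of the faster published attacks, and the function name is invented for
illustration only.

```python
def joux_concat_cost(iterated_bits: int, other_bits: int) -> int:
    """Approximate work to collide two concatenated hashes (assumed model).

    Step 1: chain other_bits/2 birthday collisions in the iterated hash,
            giving a 2^(other_bits/2)-way multicollision.
    Step 2: birthday-search those messages for a collision in the other hash.
    """
    multicollision = (other_bits // 2) * 2 ** (iterated_bits // 2)
    search = 2 ** (other_bits // 2)
    return multicollision + search


sha1_alone = 2 ** 80                      # generic birthday bound for SHA-1

# Variant A: build the multicollision in SHA-1, then search it for MD5.
via_sha1 = joux_concat_cost(160, 128)
# Variant B: build the multicollision in MD5, then search it for SHA-1.
via_md5 = joux_concat_cost(128, 160)

print("variant A / SHA-1 alone:", via_sha1 / sha1_alone)
print("variant B / SHA-1 alone:", via_md5 / sha1_alone)
```

Under these assumptions variant A comes out at roughly 64 times the cost
of SHA-1 alone, which is where the six bits come from; variant B comes
out only marginally above SHA-1 alone, so 64 is an upper bound.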

> > IMHO the md5 collisions are not really going to have an impact in
> > every day use; (offtopic) imagine the case where md5 is not "valid"
> > AND source includes some sort of payload.
> 
> MD5 offers lots of interesting features as a hash function (e.g. for
> checksums), it is easily implemented and available on most platforms,
> so will survive for a number of non-cryptographic applications.
> But as mentioned before, MD5 is dead for cryptographic signature. This
> means that package signing or document fingerprinting should not be
> based on MD5 any more.

There are still no second-preimage attacks on either MD5 or SHA-1 (i.e.,
given a text and its MD5 hash, it's not yet generally possible to find a
different text with that hash), although I think it would be unwise to
assume that they're very far off for MD5. SHA-1 is in a similar
situation, just a year or two behind.
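
To make the gap between the two kinds of attack concrete, here is a toy
sketch. It uses SHA-256 truncated to 16 bits as a deliberately weak
stand-in (my own assumption, purely so that brute force finishes quickly;
it says nothing about the real MD5 or SHA-1 attacks), and all the names
are hypothetical.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes, nbytes: int = 2) -> bytes:
    """SHA-256 truncated to nbytes: a weak hash for demonstration only."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision():
    """Collision attack: the attacker chooses BOTH messages."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1    # two messages, hashes computed
        seen[h] = msg


def find_second_preimage(target_msg: bytes):
    """Second-preimage attack: one message is fixed by somebody else."""
    target = toy_hash(target_msg)
    for i in count():
        msg = b"try-%d" % i
        if msg != target_msg and toy_hash(msg) == target:
            return msg, i + 1             # new message, hashes computed


a, b, collision_work = find_collision()
other, preimage_work = find_second_preimage(b"some signed package")
print("collision found after", collision_work, "hashes")
print("second preimage found after", preimage_work, "hashes")
```

For a 16-bit hash you would expect the first search to need on the order
of 2^8 hashes and the second on the order of 2^16; the same square-root
gap is why a practical collision attack does not imply a practical
second-preimage attack.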

That said, the two examples you bring up are qualitatively different. As
NIST noted in March in their statement on the SHA-1 collision attacks
(http://www.csrc.nist.gov/pki/HashWorkshop/NIST%20Statement/Burr_Mar2005.html):

  "However, many digital signature applications include contextual
  information that will make this attack difficult to carry out in
  practice."

In other words, to substitute a .deb in a Debian-format archive you need
to ensure that the substituted text is a valid .deb with malicious
contents, which will involve producing a substitute gzip stream. I
wouldn't like to claim that this is anywhere near invulnerable, but it
certainly raises the bar considerably.

Attacking package signatures also typically involves second-preimage
attacks, not collision attacks. To construct a collision attack on
package signatures, you either need to find a naïve sponsor who'll sign
a package without rebuilding the source locally (rebuilding always
changes the .diff.gz hash because of the timestamps in gzip's output) or
social-engineer yourself into a position where you can upload, in which
case you could almost certainly attack the package directly in other
ways that would give you plausible deniability.
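
The point about gzip timestamps is easy to check for yourself. This is a
minimal sketch using Python's gzip module (the mtime argument to
gzip.compress needs Python 3.8 or later); the payload and the two
timestamps are arbitrary placeholders, not real package data.

```python
import gzip
import hashlib

payload = b"--- a/file\n+++ b/file\n"     # placeholder for a source diff

# Compress identical input twice, one second apart.
first = gzip.compress(payload, mtime=1_000_000_000)
second = gzip.compress(payload, mtime=1_000_000_001)

# Both archives decompress to the same bytes...
assert gzip.decompress(first) == gzip.decompress(second) == payload

# ...but the gzip header stores the timestamp, so the archives differ,
# and so does any hash computed over them.
print(first == second)
print(hashlib.sha256(first).hexdigest())
print(hashlib.sha256(second).hexdigest())
```

So a colliding .diff.gz prepared in advance only survives if the sponsor
signs and uploads the exact bytes they were handed instead of
regenerating them.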

Attacking document fingerprinting probably involves far fewer
contextual problems than attacking package signatures, and furthermore
there are many more interesting attacks you can carry out from
relatively unprivileged positions using only collision attacks rather
than second-preimage attacks; as a general rule, people are more willing
to forward documents that somebody else wrote for them than developers
are willing to sign packages that somebody else built for them.

Cheers,

-- 
Colin Watson                                       [cjwatson at ubuntu.com]