What makes MD5 such a bad hashing algorithm?


Edit: Asking more in the sense of what makes the algorithms process worse than others, sorry for the ambiguity.

In: Technology

All I know is that MD5 algorithm is easy to be abused. Technically hashing a text with another different text will result in different hash, but with md5, it is said that different text can be made to have a same hash.

Why is it bad? Let say you as a king order the messanger to send a peace letter to another kingdom, your message is hashed. You send your message and the hash (for integrity check), then the messanger tampered your massage to a war letter but that letter has the same hash with the peace letter. That message still recognized as a letter written by you because the letter and the hash matchs.

A couple of things:

1) It’s very fast. “How is that a disadvantage?” you may ask. Don’t we want computers to be fast? Well, not in this case. For one, you don’t need the speed. If your login takes 500 ms instead of 0.5 ms because of a slow hash function, you don’t really mind. On the other hand, being able to calculate a billion MD5 hashes per second instead of ten thousand increases the speed of brute force attacks significantly, making your hashes less secure.

This is bad in it’s own right, but it doesn’t even matter with MD5, because

2) It’s susceptible to so called “collision attacks”. A collision attack operates on a simple principle: since a hash function produces a fixed length output from variable length input, there will be multiple inputs with the same hash. If I can craft my own input with the expected hash, I can pass of forged things as valid.

By their nature, all hash functions have collisions, but for *good* hash functions finding these collisions should be no easier than just guessing. For MD5, it is *significantly* easier, making it broken by today’s metrics.

One of the primary ways to measure the strength of a supposedly cryptographically secure hashing algorithm is *collision resistance.* If hash has a 128-bit output (like MD5 does), it should take on average 2^(128-1) guesses before you find two values that hash to the same result. Since it was first introduced in 1991, vulnerabilities have been found that reduced the number of guesses required, and at the same time computer hardware improved so more guesses could be made per second. Last I heard there are done around 2^(20) ~= 1 billion guesses, which takes a few seconds on a modern GPU.

While some algorithms are clearly better designed than others, a lot of it comes down to novelty. People have been banging on MD5 for nearly 30 years, so it is no surprise the vulnerabilities were eventually found. All hashing algorithms are going to have some kind of vulnerability, it all comes down to an arms race between algorithm authors and cryptoanalysts.

Several of the other comments already say it’s possible to generate a hash collision. There are some specifics about this that I think are important: it is possible to create two messages with the same hash. It is not possible to create a message with a specific hash. So you create two documents with the same hash, but it’s not possible to create a document that matches the hash of another, given, document. You need to control and change both documents to create a collision. In other words, MD5 is still [preimage resistant](https://en.wikipedia.org/wiki/Preimage_attack), but not [collision resistant](https://en.wikipedia.org/wiki/Collision_resistance).