Message Digest number 5 (MD5)
The MD5 (Message Digest number 5) algorithm generate a unique, 128-bit
cryptographic message digest value derived from the contents of input
stream. This value is considered to be a highly reliable fingerprint
that can be used to verify the integrity of the file's contents. If as
little as a single bit value in the file is modified, the MD5 checksum
for the file changes. Forgery of a file in a way that causes MD5 to
generate the same result as that for the original file is considered to
be extremely difficult.
A set of MD5 checksums for critical system, application, and data files
provides a compact way of storing information for use in periodic integrity
checks of those files.
Details for the MD5 cryptographic checksum algorithm and C source code are
provided in RFC 1321. The MD5 algorithm has been implemented in numerous
computer languages including C, Perl, and Java.
The Advanced CheckSum Verifier is an Windows GUI utility, which
generates and verifies message digests (digital signatures) using the MD5
algorithm. This program can be useful when necessary verifying of data
burned to CD-R(W), transmitted over network, or for file comparison,
and detection of file corruption and tampering.
History of MD5 as described in the RSA Laboratories Crypto FAQ.
MD2, MD4, and MD5 are message-digest algorithms developed by Rivest.
They are meant for digital signature applications where a large message has
to be "compressed" in a secure manner before being signed with the private
key. All three algorithms take a message of arbitrary length and produce a
128-bit message digest. While the structures of these algorithms are
somewhat similar, the design of MD2 is quite different from that of MD4 and
MD5. MD2 was optimized for 8-bit machines, whereas MD4 and MD5 were aimed at
32-bit machines. Description and source code for the three algorithms can be
found as Internet RFCs 1319-1321.
MD2 was developed by Rivest in 1989. The message is first padded so its
length in bytes is divisible by 16. A 16-byte checksum is then appended to
the message, and the hash value is computed on the resulting message. Rogier
and Chauvaud have found that collisions for MD2 can be constructed if the
calculation of the checksum is omitted. This is the only cryptanalytic
result known for MD2.
MD4 was developed by Rivest in 1990. The message is padded to ensure that its
length in bits plus 64 is divisible by 512. A 64-bit binary representation of
the original length of the message is then concatenated to the message.
The message is processed in 512-bit blocks in the Damgard/Merkle iterative
structure, and each block is processed in three distinct rounds. Attacks on
versions of MD4 with either the first or the last rounds missing were
developed very quickly by Den Boer, Bosselaers and others. Dobbertin has
shown how collisions for the full version of MD4 can be found in under a
minute on a typical PC. In recent work, Dobbertin (Fast Software Encryption,
1998) has shown that a reduced version of MD4 in which the third round of
the compression function is not executed but everything else remains the
same, is not one-way. Clearly, MD4 should now be considered broken.
MD5 was developed by Rivest in 1991. It is basically MD4 with "safety-belts"
and while it is slightly slower than MD4, it is more secure. The algorithm
consists of four distinct rounds, which has a slightly different design from
that of MD4. Message-digest size, as well as padding requirements, remains
the same. Den Boer and Bosselaers have found pseudo-collisions for MD5. More
recent work by Dobbertin has extended the techniques used so effectively in
the analysis of MD4 to find collisions for the compression function of MD5.
While stopping short of providing collisions for the hash function in its
entirety this is clearly a significant step. For a comparison of these
different techniques and their impact the reader is referred to.
Van Oorschot and Wiener have considered a brute-force search for collisions
in hash functions, and they estimate a collision search machine designed
specifically for MD5 (costing $10 million in 1994) could find a collision
for MD5 in 24 days on average. The general techniques can be applied to
other hash functions.