Basic concepts of Message Digest and Hash Function draft
Table of Contents
What is Hashing? Cryptography Hash functions
The hash value of the first message block becomes an input to the second hash operation, the
output of which alters the result of the third operation, and so on. This is known as the
avalanche effect of hashing.
The avalanche effect results in substantially different hash values for two messages that differ by
even a single bit of data.
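The avalanche effect can be observed directly. The following is a minimal sketch, assuming Python's standard hashlib module and MD5 as the hash function; the helper name bit_difference and the sample message are illustrative choices, not part of the original text.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg1 = b"hello world"
# Flip the lowest bit of the first byte, so the two messages differ by one bit.
msg2 = bytes([msg1[0] ^ 0x01]) + msg1[1:]

d1 = hashlib.md5(msg1).digest()
d2 = hashlib.md5(msg2).digest()

print(d1.hex())
print(d2.hex())
print(bit_difference(msg1, msg2), "input bit differs")
print(bit_difference(d1, d2), "of 128 digest bits differ")
```

For a well-behaved hash function, roughly half of the digest bits are expected to change.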
It is important to distinguish a hash function from a hashing algorithm. The hash function
generates a hash code by operating on two blocks of fixed-length binary data.
A hashing algorithm is a process for using the hash function; it specifies how the message is
broken up and how the results from previous message blocks are chained together.
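The distinction can be sketched in code. The example below is a toy construction, not a real or secure algorithm: compress stands in for a fixed-length hash function (it borrows SHA-256 purely for convenience), and chained_hash plays the role of the hashing algorithm that splits the message and chains the results. The names, the 64-byte block size, and the all-zero starting state are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 64  # bytes per message block (512 bits), chosen for this sketch

def compress(state: bytes, block: bytes) -> bytes:
    """Toy hash function: takes two fixed-length inputs, returns a new state."""
    return hashlib.sha256(state + block).digest()

def chained_hash(message: bytes) -> bytes:
    """Toy hashing algorithm: split the message and chain the block results."""
    state = bytes(32)  # hypothetical all-zero starting state
    # Zero-pad the last block; real algorithms also encode the message length.
    padded = message + bytes(-len(message) % BLOCK_SIZE)
    for i in range(0, len(padded), BLOCK_SIZE):
        # The result for the previous blocks feeds into the next operation.
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state

print(chained_hash(b"a" * 200).hex())
```

Because each state depends on the previous one, a change in the first block propagates through every later step.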
The integrity check helps the user detect any changes made to the original file. It does not,
however, provide any assurance about originality. Instead of modifying the file data, an attacker
can replace the entire file, compute an altogether new hash, and send both to the receiver. This
integrity check is therefore useful only if the user is sure about the originality of the file.
the data integrity threats and the use of hashing technique to detect if any modification attacks
have taken place on the data.
Another type of threat that exists for data is the lack of message authentication. In this threat,
the user is not sure about the originator of the message. Message authentication can be provided
using cryptographic techniques that rely on secret keys, as is done in the case of encryption.
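One widely used keyed technique is HMAC, which the text does not name; it is chosen here only as an illustration. The sketch below assumes Python's standard hmac and hashlib modules, with SHA-256 as the underlying hash. The helper names make_tag and verify_tag, the key, and the message are hypothetical.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that only holders of the key can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

shared_key = b"example shared secret"       # illustrative value only
message = b"transfer 100 to account 42"     # illustrative value only
tag = make_tag(shared_key, message)

print(verify_tag(shared_key, message, tag))                         # genuine message
print(verify_tag(shared_key, b"transfer 900 to account 42", tag))   # altered message
print(verify_tag(b"wrong key", message, tag))                       # wrong key
```

Unlike a plain hash, the tag cannot be recomputed by an attacker who replaces the message but does not know the key.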
Before diving headfirst into the main topic, it is best to go through the basic concept of hashing
first.
What is Hashing?
Hashing consists of converting a general string of information into an intricate piece of data. This is
done to scramble the data so that it completely transforms the original value, making the hashed
value utterly different from the original.
Hashing uses a hash function to convert standard data into an unrecognizable format. A hash
function is a set of mathematical calculations that transforms the original information into its
hashed value, known as the hash digest, or simply the digest. The digest size is always the same
for a particular hash function such as MD5 or SHA-1, irrespective of input size.
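A short check of the fixed digest size, assuming Python's standard hashlib module; the sample inputs are arbitrary.

```python
import hashlib

# Inputs of very different sizes: empty, one byte, one million bytes.
samples = [b"", b"a", b"a" * 1_000_000]

for data in samples:
    md5_digest = hashlib.md5(data).digest()
    sha1_digest = hashlib.sha1(data).digest()
    # Digest length in bits stays constant for each function.
    print(len(data), len(md5_digest) * 8, len(sha1_digest) * 8)

md5_sizes = {len(hashlib.md5(d).digest()) * 8 for d in samples}
sha1_sizes = {len(hashlib.sha1(d).digest()) * 8 for d in samples}
```

Each line prints the input length in bytes followed by 128 (MD5) and 160 (SHA-1), whatever the input size.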
Password Verification:
It is common to store user credentials of websites in a hashed format to prevent third parties from
reading the passwords. Since hash functions always provide the same output for the same input,
the server can verify a password by comparing digests without ever storing the password itself.
2. It passes the password through a hash function and stores the digest on the server
3. When a user tries to log in, they enter the password again
4. It passes the entered password through the hash function again to generate a digest
5. If the newly generated digest matches the one on the server, the login is verified
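The flow above can be sketched as follows. This is an illustration only: it swaps the plain hash described in the steps for a salted, iterated hash (PBKDF2 from Python's standard hashlib), because a single fast hash is a poor choice for real password storage. The in-memory users dictionary, the function names, and the iteration count are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation
users = {}            # hypothetical in-memory "server" storage

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a digest from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(username: str, password: str) -> None:
    """Store only the salt and the digest, never the password itself."""
    salt = os.urandom(16)
    users[username] = (salt, hash_password(password, salt))

def login(username: str, password: str) -> bool:
    """Hash the entered password again and compare it with the stored digest."""
    if username not in users:
        return False
    salt, stored = users[username]
    return hmac.compare_digest(hash_password(password, salt), stored)

register("alice", "correct horse")
ok = login("alice", "correct horse")
bad = login("alice", "wrong guess")
print(ok, bad)
```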
Integrity Verification:
Some files can be checked for data corruption using hash functions. As in the scenario above, a
hash function always gives the same output for the same input, no matter how many times it is run.
3. When a user downloads the file, they recalculate the hash digest
4. If the digest matches the original hash value, file integrity is maintained
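A minimal sketch of the check, assuming Python's standard hashlib and tempfile modules. SHA-256 is used here as the hash; the function names and the temporary file are illustrative.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_intact(path: str, published_digest: str) -> bool:
    """Recalculate the digest and compare it with the published value."""
    return sha256_of_file(path) == published_digest.lower()

# Demonstration with a throwaway file standing in for a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"original file contents")
    path = tmp.name

published = sha256_of_file(path)           # digest published by the uploader
before = is_intact(path, published)        # unmodified download

with open(path, "ab") as f:                # simulate corruption in transit
    f.write(b"!")
after = is_intact(path, published)

os.remove(path)
print(before, after)
```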
Now that you have a foundation in hashing, you can turn to the focus of this tutorial, the
MD5 algorithm.
Ronald Rivest designed this algorithm in 1991 to provide the means for digital signature
verification. It was later integrated into many other frameworks to strengthen their security.
The digest size is always 128 bits, and, as with any well-designed hash function, a minor change in
the input string generates a drastically different digest. This makes it unlikely that similar
inputs produce the same digest, an event known as a hash collision.
You will now learn the steps that constitute the working of the MD5 algorithm.
Steps in MD5 Algorithm
Padding Bits
When you receive the input string, you have to make sure its size is 64 bits short of a multiple of
512 bits. To pad the message, append a single 1 bit first, followed by as many 0 bits as needed to
reach that length.
Padding Length
You then append 64 more bits to make the final string a multiple of 512 bits. To do so, take
the length of the original input in bits and express it as a 64-bit value. Appending this value to
the padded message gives the final string, which is ready to be hashed.
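Both padding steps can be sketched together. The example assumes byte-oriented input and follows RFC 1321 in storing the 64-bit length with the low-order byte first, a detail the text above does not state. The function name md5_pad is hypothetical.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 512 bits (64 bytes) as MD5 requires."""
    bit_length = len(message) * 8
    # Step 1: a single 1 bit (0x80 = 1000 0000), then 0 bits until the
    # length is 64 bits (8 bytes) short of a multiple of 512 bits.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 2: the original length in bits as a 64-bit little-endian value.
    padded += struct.pack("<Q", bit_length % 2**64)
    return padded

for size in (0, 3, 55, 56, 64):
    print(size, "bytes ->", len(md5_pad(b"a" * size)), "bytes after padding")
```

Note that padding is always added, so a 56-byte message grows to two 64-byte blocks.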
Initialize MD Buffer
The entire string is converted into multiple blocks of 512 bits each. You also need to initialize four
different buffers, namely A, B, C, and D. These buffers are 32 bits each and are initialized as
follows:
A = 01 23 45 67
B = 89 ab cd ef
C = fe dc ba 98
D = 76 54 32 10
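The values above are listed byte by byte, low-order byte first, as in RFC 1321. Read as 32-bit words they look different, which is a common source of confusion. A small sketch, assuming Python's standard struct module; the names INITIAL_BYTES and as_word are illustrative.

```python
import struct

# Initial buffer contents as listed above, low-order byte first.
INITIAL_BYTES = {
    "A": "01 23 45 67",
    "B": "89 ab cd ef",
    "C": "fe dc ba 98",
    "D": "76 54 32 10",
}

def as_word(hex_bytes: str) -> int:
    """Interpret four bytes as one little-endian 32-bit word."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]

words = {name: as_word(value) for name, value in INITIAL_BYTES.items()}
for name, word in words.items():
    print(f"{name} = 0x{word:08x}")
```

This prints A = 0x67452301, B = 0xefcdab89, C = 0x98badcfe, and D = 0x10325476, the form in which the constants usually appear in source code.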
Each 512-bit block gets broken down further into 16 sub-blocks of 32 bits each. There are four
rounds of operations, with each round utilizing all the sub-blocks, the buffers, and a constant array
value.
According to the image above, you see the values being run for a single buffer A. The correct
order is as follows:
As a final step, the value of B is added to the result, which is then stored in buffer A.
The steps mentioned above are run for every buffer and every sub-block. When the last block’s
final buffer is complete, you will receive the MD5 digest.
The non-linear process above is different for each round of the sub-block.
Round 1: (b AND c) OR ((NOT b) AND (d))
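The Round 1 function can be written out in code. Only Round 1 is given above; the functions for Rounds 2 to 4 in this sketch are taken from RFC 1321 rather than from this text. The names F, G, H, I follow the RFC, and all values are treated as 32-bit words.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def F(b: int, c: int, d: int) -> int:
    """Round 1: (b AND c) OR ((NOT b) AND d). Selects c where b is 1, else d."""
    return (b & c) | (~b & MASK & d)

def G(b: int, c: int, d: int) -> int:
    """Round 2 (from RFC 1321): (b AND d) OR (c AND (NOT d))."""
    return (b & d) | (c & ~d & MASK)

def H(b: int, c: int, d: int) -> int:
    """Round 3 (from RFC 1321): b XOR c XOR d."""
    return b ^ c ^ d

def I(b: int, c: int, d: int) -> int:
    """Round 4 (from RFC 1321): c XOR (b OR (NOT d))."""
    return c ^ (b | (~d & MASK))

b, c, d = 0xEFCDAB89, 0x98BADCFE, 0x10325476
for fn in (F, G, H, I):
    print(fn.__name__, f"0x{fn(b, c, d):08x}")
```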
With this, you conclude the working of the MD5 algorithm. You will now see the advantages
procured when using this particular hash algorithm.
Easy to Compare: Compared with the longer digests of newer hash algorithm families, a digest of 32
hexadecimal digits is relatively easy to compare when verifying the digests.
Storing Passwords: Passwords need not be stored in plaintext format, where they would be accessible
to hackers and malicious actors. Storing digests also simplifies the database, since all hash
values have the same size.
Low Resource: MD5 has a relatively low memory footprint, which helps when integrating multiple
services into the same framework without significant CPU overhead.
Integrity Check: You can monitor file corruption by comparing hash values before and after transit.
If the hashes match, the file has passed the integrity check and was not corrupted in transit.
MD5 is the third message-digest algorithm Rivest created. MD2, MD4 and MD5 have similar
structures, but MD2 was optimized for 8-bit machines, in comparison with the two later
algorithms, which are designed for 32-bit machines. The MD5 algorithm is an extension of
MD4, which the critical review found to be fast but potentially insecure. In comparison, MD5 is
not quite as fast as the MD4 algorithm, but offered much more assurance of data security.
Computation of the MD5 digest value is performed in separate stages that process each 512-bit
block of data along with the value computed in the preceding stage. The first stage begins with
the message-digest values initialized using consecutive hexadecimal numerical values. Each
stage includes four message-digest passes, which manipulate values in the current data block
and values processed from the previous block. The final value computed from the last block
becomes the MD5 digest for the entire message.
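The staged computation can be sketched end to end. The following is an educational sketch based on RFC 1321, not a production implementation; the rotation amounts, the sine-derived constants, and the word-selection order come from the RFC rather than from this text, and the name md5_sketch is hypothetical. Its output is compared with Python's hashlib as a sanity check.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Per-step rotation amounts and sine-derived constants, as defined in RFC 1321.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
CONSTANTS = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

def rotl(x: int, s: int) -> int:
    """Rotate a 32-bit word left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK

def md5_sketch(message: bytes) -> str:
    # Padding: a 1 bit, 0 bits up to 448 mod 512, then the 64-bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", (len(message) * 8) % 2**64)

    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]  # buffers A, B, C, D
    for offset in range(0, len(padded), 64):
        words = struct.unpack("<16I", padded[offset:offset + 64])  # 16 sub-blocks
        a, b, c, d = state
        for i in range(64):  # four passes of 16 steps each
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | (~d & MASK)), (7 * i) % 16
            f = (f + a + CONSTANTS[i] + words[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotl(f, SHIFTS[i])) & MASK
        # Chain: add this block's result to the value from the preceding stage.
        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state).hex()

sample = b"The quick brown fox jumps over the lazy dog"
print(md5_sketch(sample))
print(md5_sketch(sample) == hashlib.md5(sample).hexdigest())
```

The state carried from one block to the next is what makes the digest depend on every block of the message.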
Is MD5 secure?
The goal of any message-digest function is to produce digests that appear to be random. To be
considered cryptographically secure, the hash function should meet two requirements:
2. It is impossible for an attacker to create two messages that produce the same hash value.
MD5 is no longer considered cryptographically secure and should not be used for cryptographic
authentication, according to the IETF.
In 2011, IETF published RFC 6151, "Updated Security Considerations for the MD5 Message-
Digest and the HMAC-MD5 Algorithms," which cited a number of recent attacks against MD5
hashes. It mentioned one that generated hash collisions in a minute or less on a standard
notebook and another that could generate a collision in as little as 10 seconds on a 2.6 gigahertz
Pentium 4 system. As a result, IETF suggested that new protocol designs should not use MD5 at
all and that the recent research attacks against the algorithm "have provided sufficient reason to
eliminate MD5 usage in applications where collision resistance is required such as digital
signatures."
Alternatives to MD5
A major concern with MD5 is its potential for message collisions, in which two different messages
produce the same hash code. MD5 hash codes are also limited to 128 bits. This makes them easier
to breach than the hash algorithms that followed.
Alternatives to MD5 include the following.
Secure Hash Algorithm 1 (SHA-1). Developed by the U.S. government in the 1990s, SHA-1
used techniques like those of MD5 in the design of message-digest algorithms. But SHA-1
generated more secure 160-bit values when compared to MD5's 128-bit hash value lengths.
Despite this, SHA-1 had some weaknesses and did not prove to be a lasting solution for
hashing, either. Security concerns began to emerge, prompting companies like Microsoft to
discontinue support for SHA-1 in their software.
The SHA-2 hash code family. The more secure successor to SHA-1 and one that is widely used
today is the SHA-2 family of hash codes. SHA-2 hash codes were created by the U.S. National
Security Agency in 2001. They represent a significant departure from SHA-1 in that the SHA-2
algorithms produce longer message digests and are harder to break. The SHA-2 family of algorithms
delivers hash values that are 224, 256, 384 and 512 bits in length. They are known by the names
of their message-digest lengths -- for example, SHA-224 and SHA-256.
Cyclic redundancy check (CRC) codes. CRC codes are often suggested as possible
substitutions for MD5 because both MD5 and CRC perform hashing functions, and both deliver
checksums. But the similarity ends there. A 32-bit CRC code is used to detect errors during data
transmissions so corrupted or lost data can be identified. Meanwhile, MD5 is a cryptographic
hash function that can also detect some data corruption but was primarily intended for security
applications such as the verification of digital certificates. Neither one encrypts data, and a
CRC offers no protection against deliberate tampering.
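The differences in output length, and why a 32-bit checksum cannot stand in for a cryptographic hash, can be illustrated with a short sketch. It assumes Python's standard hashlib, zlib, and random modules; the function name find_crc32_collision and the random 16-byte test messages are illustrative. With only 32 bits of output, two random messages with the same CRC-32 typically turn up after tens of thousands of tries.

```python
import hashlib
import random
import zlib

# Output length in bits for each function discussed above.
digest_bits = {
    name: hashlib.new(name, b"abc").digest_size * 8
    for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
}
digest_bits["crc32"] = 32  # zlib.crc32 returns an unsigned 32-bit integer
for name, bits in digest_bits.items():
    print(f"{name}: {bits} bits")

def find_crc32_collision(seed: int = 0):
    """Try random 16-byte messages until two different ones share a CRC-32."""
    rng = random.Random(seed)  # fixed seed so the search is repeatable
    seen = {}
    while True:
        message = rng.getrandbits(128).to_bytes(16, "big")
        checksum = zlib.crc32(message)
        if checksum in seen and seen[checksum] != message:
            return seen[checksum], message
        seen[checksum] = message

first, second = find_crc32_collision()
print(first.hex(), second.hex())
print("same CRC-32:", zlib.crc32(first) == zlib.crc32(second))
print("same MD5:   ", hashlib.md5(first).digest() == hashlib.md5(second).digest())
```

The same brute-force search is not practical against a 128-bit digest; the known MD5 collisions come from dedicated attacks on its structure.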