Hashing is a mechanism that is used for data integrity assurance. Hashing is based on one-way mathematical functions: functions that are relatively easy to compute but significantly difficult to reverse. Grinding coffee is a good example of a one-way function: it is easy to grind coffee beans, but it is almost impossible to put all of the tiny pieces back together to rebuild the original beans.

Hashing works by feeding data of arbitrary length into a hash function, which produces a fixed-length hash known as the *digest* or *fingerprint*. Hashing is similar to the calculation of cyclic redundancy check (CRC) checksums, but it is much stronger cryptographically. Given a CRC value, it is easy to generate different data with the same CRC. With a cryptographic hash function, however, it is computationally infeasible for an attacker to find two different sets of data that produce the same fingerprint.
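The contrast between fixed-length digests and CRCs can be illustrated with a minimal Python sketch using the standard `hashlib` and `zlib` modules. It assumes SHA-256 (a SHA-2 member discussed later) as the example hash. The first part shows that inputs of any length produce a digest of the same size and that a one-character change yields an unrelated-looking digest. The second part shows the algebraic structure that makes CRCs easy to manipulate: CRC-32 is affine over GF(2), so for equal-length messages the CRC of their XOR is predictable from the individual CRCs. The sample messages are hypothetical.

```python
import hashlib
import zlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different lengths all yield a 256-bit (64 hex character) digest.
for data in (b"", b"short", b"x" * 1_000_000):
    print(len(data), "bytes in ->", len(sha256_hex(data)), "hex characters out")

# Changing a single character produces an unrelated-looking digest.
print(sha256_hex(b"Transfer $100"))
print(sha256_hex(b"Transfer $900"))

def xor3(a: bytes, b: bytes, c: bytes) -> bytes:
    """XOR three equal-length byte strings together."""
    return bytes(x ^ y ^ z for x, y, z in zip(a, b, c))

# CRC-32 is affine over GF(2): for equal-length messages,
# crc(a XOR b XOR c) == crc(a) XOR crc(b) XOR crc(c).
# This structure lets an attacker predict and steer CRC values.
a, b, c = b"pay alice 100", b"pay bob   100", b"pay carol 100"
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print(zlib.crc32(xor3(a, b, c)) == predicted)  # True
```

No comparable shortcut is known for a sound cryptographic hash, which is why its output cannot be steered the same way.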

## Hashing in Action

The sender wants to ensure that the message is not altered on its way to the receiver. The sending device inputs the message into a hashing algorithm and computes its fixed-length digest, or fingerprint. This fingerprint is then attached to the message and sent to the receiver; both the message and the hash travel in plaintext. The receiving device separates the fingerprint from the message and inputs the message into the same hashing algorithm. If the hash computed by the receiving device equals the one attached to the message, the message has not been altered during transit.
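The exchange above can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 as the hash function; the `send` and `receive` helpers and the sample messages are hypothetical names chosen for the example, and "the network" is just a tuple passed between them.

```python
import hashlib

def send(message: bytes) -> tuple[bytes, bytes]:
    """Sender: compute the digest and attach it to the plaintext message."""
    return message, hashlib.sha256(message).digest()

def receive(message: bytes, digest: bytes) -> bool:
    """Receiver: recompute the digest and compare it with the attached one."""
    return hashlib.sha256(message).digest() == digest

msg, fp = send(b"Meet at 10:00")
print(receive(msg, fp))        # True: the message arrived unchanged

corrupted = b"Meet at 11:00"   # simulates an accidental error in transit
print(receive(corrupted, fp))  # False: the mismatch is detected
```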

Hashing alone does not protect the message against deliberate tampering. As the message traverses the network, an attacker could intercept it, change it, recalculate the hash, and append the new hash to the message. Hashing only detects accidental changes, such as those caused by a communication error. Nothing in the hashing procedure is unique to the sender, so anyone who knows which hash function is in use can compute a valid hash for any data.

Thus, hash functions help ensure that data does not change accidentally, but they cannot ensure that data is not deliberately changed.
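The attack described above needs no secret at all, as this minimal Python sketch shows. It assumes the same plain SHA-256 scheme as a sender/receiver example would use; `attach_digest`, `verify`, and the messages are hypothetical names for illustration.

```python
import hashlib

def attach_digest(message: bytes) -> tuple[bytes, bytes]:
    """Sender: pair the message with its plain (unkeyed) hash."""
    return message, hashlib.sha256(message).digest()

def verify(message: bytes, digest: bytes) -> bool:
    """Receiver: accept the message if the recomputed hash matches."""
    return hashlib.sha256(message).digest() == digest

# Legitimate sender
original, original_fp = attach_digest(b"Pay Alice $100")

# Attacker in the path: alter the message and simply recompute the hash.
# Anyone can run the same public hash function, so no key is required.
forged = b"Pay Mallory $100"
forged_fp = hashlib.sha256(forged).digest()

print(verify(original, original_fp))  # True
print(verify(forged, forged_fp))      # True: the change goes undetected
```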

These are two well-known hash functions:

- Message Digest 5 (MD5) with 128-bit digests
- Secure Hash Algorithm 1 (SHA-1) with 160-bit digests
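The digest lengths listed above can be checked with Python's `hashlib`, using the standard test input `"abc"` from the MD5 specification (RFC 1321) and the Secure Hash Standard. This is only an illustration; note that some FIPS-restricted Python builds may refuse to construct MD5.

```python
import hashlib

for name in ("md5", "sha1"):
    h = hashlib.new(name, b"abc")
    # digest_size is reported in bytes; multiply by 8 for bits
    print(name, h.digest_size * 8, "bits:", h.hexdigest())

# md5 128 bits: 900150983cd24fb0d6963f7d28e17f72
# sha1 160 bits: a9993e364706816aba3e25717850c26c9cd0d89d
```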

## Hashed Message Authentication Code

Hash functions are the basis of the protection mechanism of Hashed Message Authentication Codes (HMACs). HMACs use existing hash functions, with one significant difference: they add a secret key as input to the hash function. Only the sender and the receiver know the secret key, so the output of the hash function depends on both the input data and the secret key. Therefore, only parties who have access to that secret key can compute the correct HMAC digest. This defeats man-in-the-middle attacks in which an attacker who lacks the key modifies the message and recomputes the digest, and it provides authentication of the data origin. If two parties share a secret key and use HMAC functions for authentication, a correctly constructed HMAC digest on a received message indicates that the other party originated the message, because that party is the only other entity possessing the secret key.

Cisco technologies use two well-known HMAC functions:

- Keyed MD5, based on the MD5 hashing algorithm
- Keyed SHA-1, based on the SHA-1 hashing algorithm

An HMAC digest is created by inputting data of arbitrary length into the hash function together with a secret key. The result is a fixed-length hash that depends on both the data and the secret key.
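The description above is simplified: the standard HMAC construction (RFC 2104) does not just hash the key and data once, but nests two hash computations with the key mixed in through fixed inner and outer pads. The following Python sketch builds HMAC-SHA-1 by hand along those lines and checks it against the standard library's `hmac` module. It is for illustration only; the key and message values are hypothetical, and real code should use `hmac` directly.

```python
import hashlib
import hmac

def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA1 built by hand following the RFC 2104 construction."""
    block_size = 64  # SHA-1 processes 64-byte (512-bit) blocks
    if len(key) > block_size:
        key = hashlib.sha1(key).digest()  # keys longer than a block are hashed first
    key = key.ljust(block_size, b"\x00")  # then right-padded with zero bytes
    ipad = bytes(k ^ 0x36 for k in key)   # inner padded key
    opad = bytes(k ^ 0x5C for k in key)   # outer padded key
    inner = hashlib.sha1(ipad + message).digest()
    return hashlib.sha1(opad + inner).digest()

key = b"shared-secret"    # hypothetical shared key
msg = b"routing update"   # hypothetical message
manual = hmac_sha1(key, msg)
library = hmac.new(key, msg, hashlib.sha1).digest()
print(manual.hex())
print(manual == library)  # True
```

The nested structure is what lets HMAC rely on an underlying hash such as MD5 or SHA-1 even when that hash has known collision weaknesses.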

## HMAC in Action

The sender wants to ensure that the message is not altered in transit and wants to provide a way for the receiver to authenticate the origin of the message.

The sending device inputs the data and the secret key into the hashing algorithm and calculates the fixed-length HMAC digest, or fingerprint. This authenticated fingerprint is then attached to the message and sent to the receiver. The receiving device separates the fingerprint from the message and inputs the plaintext message, together with the secret key, into the same hashing function. If the fingerprint calculated by the receiving device equals the fingerprint that was sent, the message has not been altered. Additionally, the origin of the message is authenticated, because apart from the receiver itself, only the sender possesses a copy of the shared secret key. The HMAC function has thus ensured the authenticity of the message.
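The HMAC exchange can be sketched in Python as follows. This sketch swaps in HMAC-SHA-256 from the standard `hmac` module rather than the keyed MD5 or SHA-1 named above; the key values, messages, and the `sign`/`verify` helper names are hypothetical.

```python
import hashlib
import hmac

SHARED_KEY = b"example-shared-secret"  # hypothetical pre-shared key

def sign(message: bytes, key: bytes) -> bytes:
    """Sender: compute the HMAC digest over the message with the key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes, key: bytes) -> bool:
    """Receiver: recompute the digest with the same key and compare."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking timing information about the comparison
    return hmac.compare_digest(expected, tag)

msg = b"Pay Alice $100"
tag = sign(msg, SHARED_KEY)
print(verify(msg, tag, SHARED_KEY))  # True

# An attacker can alter the message, but without the shared key
# the recomputed digest will not verify.
forged = b"Pay Mallory $100"
forged_tag = sign(forged, b"attacker-guess")
print(verify(forged, forged_tag, SHARED_KEY))  # False
```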

**Note:** IPsec VPNs rely on HMAC functions to authenticate the origin of every packet and provide data integrity checking.

Cisco products use hashing for entity authentication, data integrity, and data authenticity purposes:

- IPsec gateways and clients use hashing algorithms, such as MD5 and SHA-1 in HMAC mode, to provide packet integrity and authenticity.
- Cisco IOS routers use hashing with secret keys in an HMAC-like manner to add authentication information to routing protocol updates.
- Cisco software images that you can download from Cisco.com are published with an MD5-based checksum, so that customers can check the integrity of downloaded images.
- Hashing can also be used in a feedback-like mode to encrypt data. For example, TACACS+ uses MD5 to encrypt the body of its session packets.
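The feedback-like mode mentioned in the last item can be sketched as follows, based on our reading of the TACACS+ obfuscation scheme in RFC 8907: MD5 is applied repeatedly to the session ID, shared key, version, sequence number, and the previous MD5 output, and the concatenated outputs form a pseudo-pad that is XORed with the packet body. The field encodings (4-byte big-endian session ID, 1-byte version and sequence number) and all sample values are assumptions for illustration; this is not a drop-in TACACS+ implementation.

```python
import hashlib
import struct

def pseudo_pad(session_id: int, key: bytes, version: int, seq_no: int, length: int) -> bytes:
    """Chain MD5 outputs into a keystream of the requested length (assumed field layout)."""
    header = struct.pack("!I", session_id) + key + bytes([version, seq_no])
    pad = b""
    previous = b""
    while len(pad) < length:
        # Each MD5 block is fed back into the next computation
        previous = hashlib.md5(header + previous).digest()
        pad += previous
    return pad[:length]

def obfuscate(body: bytes, session_id: int, key: bytes, version: int, seq_no: int) -> bytes:
    """XOR the body with the MD5 pseudo-pad; the same call also reverses it."""
    pad = pseudo_pad(session_id, key, version, seq_no, len(body))
    return bytes(b ^ p for b, p in zip(body, pad))

# Hypothetical values for illustration
body = b"username=admin password=example"
params = dict(session_id=0x12345678, key=b"tacacs-key", version=0xC0, seq_no=1)

hidden = obfuscate(body, **params)
restored = obfuscate(hidden, **params)
print(restored == body)  # True
```

Because XOR is its own inverse, the receiver regenerates the same pseudo-pad from the shared key and header fields and applies the identical operation to recover the body.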

## Comparing Hashing Algorithms

The two most commonly used cryptographic hash functions are MD5 and SHA-1.

## MD5 Algorithm

The MD5 algorithm is a ubiquitous hashing algorithm that was developed by Ron Rivest and is used in various Internet applications today.

MD5 is a one-way function: it is easy to compute a hash from given input data, but infeasible to recover input data given only its hash. MD5 was also designed to be collision resistant, meaning that finding two messages with the same hash should be computationally infeasible. However, practical collision attacks against MD5 have been published since 2004, so it should no longer be relied on where collision resistance matters. Internally, MD5 is a complex sequence of simple binary operations, such as XORs and rotations, performed on the input data to produce a 128-bit digest.

The main algorithm is based on a compression function that operates on 512-bit blocks. Each step takes one data block plus a chaining value fed forward from the previous blocks. Each 512-bit block is divided into sixteen 32-bit subblocks, which are then mixed with simple operations in a main loop consisting of four rounds. The output of the algorithm is a set of four 32-bit blocks, which are concatenated to form a single 128-bit hash value. Before processing, the message is padded and its length is appended, so the message length is also encoded into the digest.
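The structure described above (padding with an appended length, 512-bit blocks split into sixteen 32-bit words, four rounds of 16 steps, and a 128-bit result formed from four 32-bit words) can be seen in a compact, educational Python sketch of MD5 following RFC 1321. It is checked against `hashlib.md5` and is not intended for real use; MD5 itself should not be used where collision resistance matters.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Per-step left-rotation amounts: four rounds of 16 steps each
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Per-step additive constants derived from the sine function
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

def rotl(x: int, c: int) -> int:
    """Rotate a 32-bit value left by c bits."""
    return ((x << c) | (x >> (32 - c))) & MASK

def md5(message: bytes) -> bytes:
    # Padding: a single 1 bit, zeros up to 56 mod 64 bytes, then the
    # original length in bits as a 64-bit little-endian integer.
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += struct.pack("<Q", bit_len)

    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    for offset in range(0, len(message), 64):
        # Split the 512-bit block into sixteen 32-bit little-endian words
        m = struct.unpack("<16I", message[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:    # round 1
                f, g = (b & c) | (~b & d), i
            elif i < 32:  # round 2
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:  # round 3
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:         # round 4
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + m[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotl(f, S[i])) & MASK
        # Feed this block's result forward into the chaining value
        a0 = (a0 + a) & MASK
        b0 = (b0 + b) & MASK
        c0 = (c0 + c) & MASK
        d0 = (d0 + d) & MASK
    # Concatenate the four 32-bit words into the 128-bit digest
    return struct.pack("<4I", a0, b0, c0, d0)

print(md5(b"abc").hex() == hashlib.md5(b"abc").hexdigest())  # True
```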

## SHA-1 Algorithm

The U.S. National Institute of Standards and Technology (NIST) developed the Secure Hash Algorithm (SHA), the algorithm that is specified in the Secure Hash Standard (SHS). SHA-1 is a revision to the SHA that was published in 1994. The revision corrected an unpublished flaw in SHA. Its design is very similar to the Message Digest 4 (MD4) family of hash functions that Ron Rivest developed.

The SHA-1 algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. The algorithm is slightly slower than MD5, but its larger message digest makes it more secure against brute-force collision and inversion attacks.
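The benefit of a longer digest can be made concrete with the generic brute-force costs for an ideal hash with an $n$-bit output. These are textbook upper bounds that assume no structural weakness in the algorithm; published collision attacks on both MD5 and SHA-1 are far cheaper than these figures.

```latex
\begin{aligned}
\text{Collision search (birthday bound):}\quad & \approx 2^{n/2} \text{ hash evaluations} \\
\text{Preimage (inversion) search:}\quad & \approx 2^{n} \text{ hash evaluations} \\[4pt]
\text{MD5 } (n = 128):\quad & 2^{64} \text{ for a collision},\; 2^{128} \text{ for a preimage} \\
\text{SHA-1 } (n = 160):\quad & 2^{80} \text{ for a collision},\; 2^{160} \text{ for a preimage}
\end{aligned}
```

Each extra 32 bits of digest therefore multiplies the generic collision cost by $2^{16}$ and the generic preimage cost by $2^{32}$.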

## SHA-2 Algorithm

The Secure Hash Standard specifies five SHAs: SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512. The last four make up the Secure Hash Algorithm 2 (SHA-2) family, used for computing a condensed representation of electronic data. When a message of any length less than 2^64 bits (for SHA-224 and SHA-256) or less than 2^128 bits (for SHA-384 and SHA-512) is input to a SHA-2 algorithm, the result is a message digest ranging from 224 to 512 bits in length, depending on the algorithm.
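The SHA-2 digest sizes can be inspected with Python's `hashlib`, which also reports each algorithm's internal block size. This is a small illustration only: the 1024-bit blocks of SHA-384 and SHA-512 go together with their larger (2^128-bit) message length limit.

```python
import hashlib

for name in ("sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    # digest_size and block_size are reported in bytes
    print(f"{name}: digest {h.digest_size * 8} bits, block {h.block_size * 8} bits")

# sha224: digest 224 bits, block 512 bits
# sha256: digest 256 bits, block 512 bits
# sha384: digest 384 bits, block 1024 bits
# sha512: digest 512 bits, block 1024 bits
```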

NIST approved the SHA-2 family of hash functions in 2006 for use by federal agencies in all applications using SHAs. The publication encouraged all federal agencies to stop using SHA-1 for digital signatures, digital time stamping, and other applications that require collision resistance as soon as practical, and it mandated the use of the SHA-2 family for these applications after 2010. After 2010, federal agencies were to use SHA-1 only for HMACs, key derivation functions (KDFs), and random number generators (RNGs). The change was prompted by research published in 2005 that identified theoretical collision attacks against SHA-1.
