Hashing Explained: SHA-256, HMAC, and Why It Matters

toolsto.dev

You type your password into a login form. The server checks it against the database. But the database doesn't store your actual password — it stores something like $2b$12$LJ3m4ys3Lz0YH9Qdnx.NQeG3pKfE2v7Yw1FT9KLmvqR/NXhXCfS. What happened between your keystrokes and that string?

Hashing. It's one of those concepts that underpins half the systems you use daily — password storage, data integrity, caching, digital signatures, blockchain — but most developers only understand it surface-deep.

What is a Hash Function?

A hash function takes input of any size and produces a fixed-size output (the "hash" or "digest"). The same input always produces the same output. Different inputs produce (almost always) different outputs.

Input:  "hello"
SHA-256: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Input:  "Hello"  (one character changed)
SHA-256: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

Notice: changing one character completely changes the output. This is called the avalanche effect — a critical property that makes hash functions useful for security.
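
To make the avalanche effect concrete, here is a minimal Python sketch that counts how many of the 256 output bits differ between the two digests above. The helper names (sha256_bits, differing_bits) are made up for this illustration; only hashlib comes from the standard library.

```python
import hashlib

def sha256_bits(text: str) -> int:
    """Return the SHA-256 digest of text as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

print(hashlib.sha256(b"hello").hexdigest())
print(hashlib.sha256(b"Hello").hexdigest())
print(differing_bits("hello", "Hello"), "of 256 bits differ")
```

For a well-designed hash function you should see roughly half of the bits flip, even though the inputs differ by a single character.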

The Key Properties

A cryptographic hash function has three essential properties:

  1. Deterministic — same input always produces same output
  2. One-way — you can't reverse the hash to find the input (computationally infeasible)
  3. Collision-resistant — it's extremely hard to find two different inputs that produce the same hash

There's also:

  1. Avalanche effect — a tiny change in input causes a massive change in output
  2. Fixed output size — regardless of input size (a 1-byte file and a 1GB file produce the same size hash)
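
The determinism and fixed-size properties are easy to check yourself. The sketch below hashes inputs ranging from empty to about 1 MB (the sizes are arbitrary choices for illustration) and shows that every digest is 64 hex characters long.

```python
import hashlib

# Inputs from 0 bytes up to about 1 MB; the sizes are arbitrary examples
inputs = [b"", b"a", b"hello", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # Deterministic: hashing the same bytes again gives the same digest
    assert digest == hashlib.sha256(data).hexdigest()
    print(f"{len(data):>9} bytes -> {len(digest)} hex chars: {digest[:16]}...")
```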

The Hash Algorithms You Need to Know

MD5 (128-bit / 32 hex characters)

"hello" → 5d41402abc4b2a76b9719d911017c592

Status: Broken. Collisions can be generated in seconds on modern hardware. MD5 was designed by Ron Rivest in 1991, and practical collision attacks were demonstrated in 2004.

Still useful for: Checksums that guard against accidental corruption (not deliberate tampering), hash table keys, deduplication. Just not for anything security-related.

Not for: Password hashing, digital signatures, integrity verification in adversarial contexts.

SHA-1 (160-bit / 40 hex characters)

"hello" → aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d

Status: Broken. Google and CWI Amsterdam demonstrated a practical collision attack (SHAttered) in 2017. Git still uses SHA-1 by default for its object and commit hashes, but a transition to SHA-256 is under way.

Not for: Anything new. Use SHA-256 instead.

SHA-256 (256-bit / 64 hex characters)

"hello" → 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Status: Current standard. Part of the SHA-2 family designed by the NSA, published in 2001. No known practical attacks. Used in TLS certificates, Bitcoin, code signing, and almost every modern security protocol.

Use for: Data integrity, checksums, digital signatures, file verification.

SHA-512 (512-bit / 128 hex characters)

"hello" → 9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca7
         2323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043

Status: Same SHA-2 family as SHA-256 but with a larger output. It is often faster than SHA-256 on 64-bit CPUs because its internal operations use 64-bit words, although processors with dedicated SHA-256 instructions can reverse that.

Use for: When you want a larger hash (more collision resistance) or are running on 64-bit hardware.

Try computing all of these at once with the hash generator — paste any text and see MD5, SHA-1, SHA-256, and SHA-512 results simultaneously.

How Hash Functions Work (Conceptual)

You don't need to understand the bit-level math, but knowing the general approach helps:

  1. Padding: the input is padded so its length is a multiple of the block size (512 bits for SHA-256)
  2. Initialization: start with a set of fixed constants (the "initial hash values")
  3. Processing: for each block of input, run it through a series of bitwise operations (shifts, rotations, XORs) and modular additions, mixed with the current hash state
  4. Output: the final hash state after processing all blocks is the digest

The "compression function" at the core of step 3 is designed so that:

  • You can't work backwards from the output to find the input
  • Even a tiny change in input cascades through all the operations, producing a completely different output
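
The four steps above can be sketched as a toy hash in Python. This is only meant to show the shape of the pad / initialize / process / output loop: the padding follows the SHA-256 layout, but toy_compress is an invented stand-in that is in no way secure, and the 32-bit state is far smaller than SHA-256's real 256-bit state.

```python
import struct

BLOCK_SIZE = 64        # bytes (512 bits), the block size SHA-256 uses
MASK32 = 0xFFFFFFFF    # keep the toy state to 32 bits

def pad(message: bytes) -> bytes:
    """SHA-256-style padding: a 0x80 byte, zeros, then the 64-bit length in bits."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_SIZE)
    return padded + struct.pack(">Q", bit_length)

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def toy_compress(state: int, block: bytes) -> int:
    """Mix one block into the state. A stand-in, NOT SHA-256's compression function."""
    for (word,) in struct.iter_unpack(">I", block):
        state = (rotl(state ^ word, 5) + 0x9E3779B9) & MASK32
    return state

def toy_hash(message: bytes) -> str:
    state = 0x6A09E667                        # step 2: a fixed initial value
    padded = pad(message)                     # step 1: padding
    for i in range(0, len(padded), BLOCK_SIZE):
        state = toy_compress(state, padded[i:i + BLOCK_SIZE])  # step 3
    return f"{state:08x}"                     # step 4: final state is the digest

print(toy_hash(b"hello"), toy_hash(b"Hello"))
```

Real hash functions follow the same outline but use a much larger state and many carefully analyzed rounds per block.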

Real-World Uses

1. Password Storage

Never store passwords in plaintext. Never store them encrypted (because you'd need to store the decryption key). Hash them.

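import bcrypt

# When the user creates an account
password = "hunter2"
hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))
# $2b$12$LJ3m4ys3Lz0YH9Qdnx.NQeG3pKfE2v7Yw1FT9KLmvqR/NXhXCfS
# (example output; the salt is random, so the value differs on every run)

# When the user logs in
submitted_password = "hunter2"  # whatever the login form sent
if bcrypt.checkpw(submitted_password.encode(), hashed):
    print("Password is correct")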

Wait — didn't I just say to use SHA-256? Not for passwords. Regular hash functions are too fast. An attacker with a GPU can compute billions of SHA-256 hashes per second, cracking most passwords by brute force.

Password hashing algorithms (bcrypt, scrypt, Argon2) are intentionally slow. They add a configurable "work factor" that makes each hash computation take 100ms+ instead of nanoseconds:

Algorithm   Speed (per hash)   Recommendation
SHA-256     ~1 nanosecond      Never for passwords
bcrypt      ~100-300ms         Good, widely supported
scrypt      ~100-500ms         Good, memory-hard (resists GPU attacks)
Argon2id    ~100-500ms         Best, won the Password Hashing Competition
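
If you can't install a third-party package, Python's standard library ships scrypt. The sketch below is one possible way to hash and verify a password with it. The function names and the work-factor values in SCRYPT_PARAMS are assumptions chosen for illustration, not tuned recommendations, and hashlib.scrypt requires a Python build linked against a recent OpenSSL.

```python
import hashlib
import hmac
import os

# Assumed work-factor parameters for illustration; tune them for your hardware
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)
SALT_LEN = 16

def hash_password(password: str) -> bytes:
    """Return salt + derived key, ready to store as one value."""
    salt = os.urandom(SALT_LEN)
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt + key

def verify_password(password: str, stored: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    salt, key = stored[:SALT_LEN], stored[SALT_LEN:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, key)

stored = hash_password("hunter2")
print(verify_password("hunter2", stored), verify_password("hunter3", stored))
```

Raising n increases both the time and the memory each guess costs an attacker.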

2. Data Integrity Verification

When you download a file, the website often shows a SHA-256 checksum:

nginx-1.25.3.tar.gz
SHA-256: 2e7e...a3f4

After downloading, you compute the hash locally and compare:

shasum -a 256 nginx-1.25.3.tar.gz
# should match the published hash

If they match, the file wasn't corrupted during download and hasn't been tampered with (assuming you trust the source of the hash).
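
The same check can be done in a script. This is a minimal sketch with made-up helper names (sha256_file, matches_published); it reads the file in chunks so large downloads don't have to fit in memory.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local digest with the published one, ignoring case and whitespace."""
    return sha256_file(path) == published_hex.strip().lower()
```

You would call it as matches_published("nginx-1.25.3.tar.gz", published_value), where published_value is the full checksum copied from the download page.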

3. Git Commits

Every Git commit is identified by a SHA-1 hash of its contents:

git log --oneline
# a1b2c3d feat: add user authentication
# e4f5a6b fix: resolve race condition in login

That a1b2c3d is the first 7 characters of a full SHA-1 hash. Git hashes the commit message, author, timestamp, parent commit, and the tree (file snapshot) to produce a unique identifier. This is why Git can detect if any file in your repository's history has been modified.
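
Under the hood, Git hashes every object as a short header ("type size", then a NUL byte) followed by the object's bytes. Here is a small Python sketch of that scheme for SHA-1 repositories; git_object_id is a name invented for this example.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' + content, as Git does for SHA-1 repositories."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

empty_blob = git_object_id("blob", b"")
print(empty_blob)       # ID of the empty file, the same in every SHA-1 repository
print(empty_blob[:7])   # the abbreviated form that git log --oneline shows
```

You can cross-check any file against git hash-object <file>. Commits are hashed the same way, with type "commit" and the commit's text (tree, parents, author, message) as the content.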

4. Caching and Deduplication

Content-addressable storage uses hashes as keys:

// Cache API responses by hashing the request
const cacheKey = sha256(JSON.stringify({ url, params, headers }));
const cached = cache.get(cacheKey);
if (cached) return cached;

CDNs use content hashes in filenames for cache busting:

app.a1b2c3d4.js  — the hash changes when the content changes
style.e5f6a7b8.css
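
Here is a rough Python sketch of both ideas: a tiny in-memory content-addressable store that deduplicates identical blobs, and a helper that builds a cache-busting filename. ContentStore and hashed_filename are hypothetical names, and a real store would persist to disk or object storage.

```python
import hashlib

class ContentStore:
    """In-memory store keyed by the SHA-256 of the content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # identical content is stored only once
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)

def hashed_filename(name: str, content: bytes, length: int = 8) -> str:
    """Insert the first few hex digits of the content hash before the extension."""
    short = hashlib.sha256(content).hexdigest()[:length]
    stem, dot, ext = name.rpartition(".")
    return f"{stem}.{short}.{ext}" if dot else f"{name}.{short}"

store = ContentStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
print(first == second, len(store))
print(hashed_filename("app.js", b"console.log('hi')"))
```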

5. Digital Signatures

Digital signatures combine hashing with asymmetric cryptography:

  1. Hash the document (SHA-256)
  2. Sign the hash with the signer's private key (RSA or ECDSA)
  3. Anyone with the public key can check that the signature matches the hash of the document they received

This proves the document hasn't been modified and was signed by the key holder.
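
To show the hash-then-sign flow without any third-party library, here is a deliberately toy "textbook RSA" sketch in Python. Everything about the key is an assumption made for illustration: it is built from two small, publicly known Mersenne primes, there is no padding, and it must never be used for real signatures. Production code would use a vetted library with a scheme such as RSA-PSS or ECDSA.

```python
import hashlib

# Toy key from two publicly known Mersenne primes. Far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)

def digest_as_int(document: bytes) -> int:
    """Step 1: hash the document, reduced to fit under the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N

def sign(document: bytes) -> int:
    """Step 2: apply the private key to the hash."""
    return pow(digest_as_int(document), D, N)

def verify(document: bytes, signature: int) -> bool:
    """Step 3: apply the public key and compare with a fresh hash."""
    return pow(signature, E, N) == digest_as_int(document)

signature = sign(b"pay Alice 10 coins")
print(verify(b"pay Alice 10 coins", signature))
print(verify(b"pay Alice 99 coins", signature))
```

Because only the hash is signed, the cost of signing does not grow with the size of the document.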

6. Blockchain

Every block in a blockchain contains the hash of the previous block:

Block 1: hash = SHA-256(transactions + previous_hash)
Block 2: hash = SHA-256(transactions + block_1_hash)
Block 3: hash = SHA-256(transactions + block_2_hash)

If anyone modifies a transaction in Block 1, its hash changes, which changes Block 2's hash, which changes Block 3's hash — breaking the entire chain. This is how blockchains achieve immutability.
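
The chaining described above fits in a few lines of Python. This is a simplified sketch: the block layout (a dict with three fields), the all-zero starting hash, and the function names are assumptions for illustration, and real blockchains add timestamps, Merkle trees, and consensus rules on top.

```python
import hashlib

GENESIS_PREVIOUS = "0" * 64  # assumed placeholder for the first block's previous hash

def block_hash(transactions: str, previous_hash: str) -> str:
    return hashlib.sha256((transactions + previous_hash).encode()).hexdigest()

def build_chain(all_transactions):
    chain, previous = [], GENESIS_PREVIOUS
    for transactions in all_transactions:
        current = block_hash(transactions, previous)
        chain.append({"transactions": transactions,
                      "previous_hash": previous,
                      "hash": current})
        previous = current
    return chain

def is_valid(chain) -> bool:
    """Recompute every hash and check each block points at the one before it."""
    previous = GENESIS_PREVIOUS
    for block in chain:
        if block["previous_hash"] != previous:
            return False
        if block["hash"] != block_hash(block["transactions"], previous):
            return False
        previous = block["hash"]
    return True

chain = build_chain(["alice->bob:5", "bob->carol:2", "carol->dave:1"])
print(is_valid(chain))
chain[0]["transactions"] = "alice->bob:500"   # tamper with Block 1
print(is_valid(chain))
```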

HMAC: When Hashing Meets Authentication

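A regular hash verifies that data hasn't been corrupted, but anyone can recompute it, so it can't tell you who produced the data. HMAC (Hash-based Message Authentication Code) mixes a secret key into the hash, so it also verifies that the data hasn't been tampered with by someone who doesn't have that key.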

HMAC-SHA256("message", "secret_key") → authenticated hash

The difference:

  • Hash: Anyone can compute SHA-256("message") and get the same result
  • HMAC: Only someone with the secret key can compute HMAC-SHA256("message", "secret_key")
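
A short Python sketch of that difference, using only the standard library. The helper names hmac_tag and hmac_verify are invented for this example.

```python
import hashlib
import hmac

def hmac_tag(message: bytes, key: bytes) -> str:
    """Compute HMAC-SHA256 of message under key, as hex."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def hmac_verify(message: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(hmac_tag(message, key), tag)

tag = hmac_tag(b"message", b"secret_key")
print(hmac_verify(b"message", b"secret_key", tag))    # right key, untouched message
print(hmac_verify(b"message!", b"secret_key", tag))   # message was altered
print(hmac_verify(b"message", b"wrong_key", tag))     # verifier has the wrong key
```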

Where HMAC Is Used

  • JWT signatures — HS256 is HMAC-SHA256
  • Webhook verification — services like Stripe and GitHub sign payloads with HMAC so you can verify they're authentic
  • API authentication — AWS Signature v4 uses HMAC-SHA256

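Here's a simplified version of how webhook verification works with Stripe (the real Stripe-Signature header also carries a timestamp that is included in the signed payload):

const crypto = require('crypto');

function verifyStripeWebhook(payload, signature, secret) {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');

  const received = Buffer.from(signature);
  const computed = Buffer.from(expected);
  // timingSafeEqual throws if the lengths differ, so check that first
  return received.length === computed.length &&
    crypto.timingSafeEqual(received, computed);
}

Note the timingSafeEqual: it compares in constant time, which prevents timing attacks where an attacker measures response time to guess the correct signature character by character.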

Experiment with different HMAC algorithms using the HMAC generator.

Common Mistakes

Mistake 1: Using MD5 or SHA-1 for Security

# ❌ MD5 is broken for security purposes
import hashlib
password_hash = hashlib.md5(password.encode()).hexdigest()

# ✅ Use bcrypt, scrypt, or Argon2 for passwords
import bcrypt
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt())

Mistake 2: Hashing Passwords Without Salt

A salt is a random string added to the password before hashing. Without it, identical passwords produce identical hashes — making rainbow table attacks trivial.

# ❌ No salt — "password123" always hashes to the same value
sha256("password123")

# ✅ With salt — each user gets a unique hash even with the same password
sha256("a8f3k2j1" + "password123")  # user 1
sha256("m9x4p7b2" + "password123")  # user 2 — completely different hash

bcrypt, scrypt, and Argon2 handle salting automatically. If you're using them, you don't need to manage salts yourself.
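
To see what a salt changes, here is a small Python sketch. It uses plain SHA-256 purely to keep the demonstration short; as noted above, real password storage should use bcrypt, scrypt, or Argon2. The function names are hypothetical.

```python
import hashlib
import os

def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def salted_hash(password: str, salt: bytes = None):
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    return salt, hashlib.sha256(salt + password.encode()).hexdigest()

# Two users pick the same password
print(unsalted_hash("password123") == unsalted_hash("password123"))  # identical hashes

salt1, hash1 = salted_hash("password123")
salt2, hash2 = salted_hash("password123")
print(hash1 == hash2)   # different, because the salts differ

# At login, reuse the salt stored alongside the user's hash
_, recomputed = salted_hash("password123", salt1)
print(recomputed == hash1)
```

The salt is not secret. It is stored next to the hash, and its job is to force an attacker to attack each user's hash separately.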

Mistake 3: Using Regular Hashes for Passwords

SHA-256 computes in nanoseconds. A modern GPU can try billions of passwords per second. Password hashing algorithms are intentionally slow:

SHA-256:  ~3 billion hashes/sec (GPU)
bcrypt:   ~50,000 hashes/sec (GPU)
Argon2:   ~1,000 hashes/sec (GPU, with memory-hard settings)

That 3-million-fold slowdown from SHA-256 to Argon2 is the entire point.
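
A bit of arithmetic shows what those rates mean in practice. The sketch below takes the guess rates quoted above and an assumed password space (8 characters drawn from lowercase letters and digits, chosen only as an example) and estimates how long an exhaustive search would take.

```python
# Guess rates quoted above (hashes per second on a GPU)
RATES = {"SHA-256": 3_000_000_000, "bcrypt": 50_000, "Argon2": 1_000}

# Assumed example: 8 characters, each one of 26 lowercase letters or 10 digits
KEYSPACE = 36 ** 8
SECONDS_PER_YEAR = 365 * 24 * 3600

def seconds_to_exhaust(keyspace: int, rate: int) -> float:
    """Worst-case time to try every candidate at the given guess rate."""
    return keyspace / rate

for name, rate in RATES.items():
    seconds = seconds_to_exhaust(KEYSPACE, rate)
    print(f"{name:8} {seconds:>16,.0f} s  ({seconds / SECONDS_PER_YEAR:,.4f} years)")
```

With these numbers the full search drops from decades under Argon2 to minutes under plain SHA-256.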

Mistake 4: Comparing Hashes with ==

String comparison with == can leak information through timing:

// ❌ Vulnerable to timing attacks
if (computedHash === expectedHash) { ... }

// ✅ Constant-time comparison (both buffers must be the same length, or timingSafeEqual throws)
const crypto = require('crypto');
if (crypto.timingSafeEqual(Buffer.from(computedHash), Buffer.from(expectedHash))) { ... }

Mistake 5: Not Verifying Checksums

You download a library, a binary, a Docker image. Do you verify the checksum? You should:

# Download the file and its checksum
curl -O https://example.com/release.tar.gz
curl -O https://example.com/release.tar.gz.sha256

# Verify
shasum -a 256 -c release.tar.gz.sha256

Computing Hashes in Code

JavaScript (Web Crypto API)

async function sha256(message) {
  const encoder = new TextEncoder();
  const data = encoder.encode(message);
  const buffer = await crypto.subtle.digest('SHA-256', data);
  return Array.from(new Uint8Array(buffer))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

Node.js

const crypto = require('crypto');

// Hash
const hash = crypto.createHash('sha256').update('hello').digest('hex');

// HMAC
const hmac = crypto.createHmac('sha256', 'secret').update('message').digest('hex');

Python

import hashlib
import hmac

# Hash
digest = hashlib.sha256(b"hello").hexdigest()  # named digest to avoid shadowing the built-in hash()

# HMAC
sig = hmac.new(b"secret", b"message", hashlib.sha256).hexdigest()

Command Line

# SHA-256
echo -n "hello" | shasum -a 256

# MD5 (md5 is the macOS command; on Linux use md5sum)
echo -n "hello" | md5

# HMAC (OpenSSL)
echo -n "message" | openssl dgst -sha256 -hmac "secret"

Quick Reference

Need                    Algorithm                  Why
Password storage        Argon2id, bcrypt, scrypt   Intentionally slow (Argon2id and scrypt are also memory-hard)
Data integrity          SHA-256                    Fast, collision-resistant
File checksums          SHA-256                    Standard for downloads and releases
API signatures (HMAC)   HMAC-SHA256                Key-dependent, tamper-evident
Hash table keys         xxHash, FNV                Blazing fast, not cryptographic
Legacy compatibility    MD5, SHA-1                 Only if required by existing systems

Compute hashes instantly with the hash generator or generate HMAC signatures with the HMAC generator.


Sources: NIST FIPS 180-4 — Secure Hash Standard, RFC 2104 — HMAC (IETF), OWASP Password Storage Cheat Sheet, Have I Been Pwned (breach data referenced). Hash outputs verified with the toolsto.dev hash generator.
