Trying to Break MD5 Hash

Hunting for Bias in MD5 — RemedySec

FIELD NOTES — CRYPTANALYSIS

MD5 SERIES · PART 1

Hunting for bias in MD5's avalanche effect

We fixed the first character of a million passwords at a time and asked the hash to give something away. Here's what 27 million digests actually showed.

Author RemedySec Labs

Reading time 9 min

Samples 27,000,000

Status Reproducible

The itch that started this

Imagine a vending machine with no buttons, just a slot. Feed it anything: a coin, a ticket, a hundred-page contract. It always spits out exactly 32 tokens. Feed it the same thing twice and you get the identical 32 tokens back, so it's not random. But the machine swears that staring at those 32 tokens tells you nothing about what you fed it. Not the length, not the first character, nothing.

That's the promise of a cryptographic hash. MD5 makes it in 32 hex characters. I didn't believe the second half of that claim. So I tried to make the machine talk.

[Interactive demo: live diffusion. Type a password to see its 128-bit MD5 digest.]

Flip one character above. On average ~64 of 128 bits change with no fixed subset tied to the character you changed. That's the property we are trying to break.
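For readers without the live widget, here is a minimal sketch of the same diffusion check using Python's standard `hashlib`. The helper names (`md5_bits`, `bits_changed`, `mean_bits_changed`) and the sample password are illustrative assumptions, not the code behind the demo.

```python
import hashlib
import string


def md5_bits(text: str) -> int:
    """Return the 128-bit MD5 digest of text as a Python integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def bits_changed(a: str, b: str) -> int:
    """Count the output bits that differ between the digests of a and b."""
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")


def mean_bits_changed(base: str, position: int, alphabet: str) -> float:
    """Average bits changed when one character of base is swapped for each alternative."""
    counts = []
    for ch in alphabet:
        if ch == base[position]:
            continue  # skip the unchanged input
        variant = base[:position] + ch + base[position + 1:]
        counts.append(bits_changed(base, variant))
    return sum(counts) / len(counts)


if __name__ == "__main__":
    # Hypothetical example input; any string works.
    print(bits_changed("password", "passwore"))
    print(mean_bits_changed("password", 0, string.ascii_lowercase))
```

If the avalanche property holds, the average should land near 64, with individual swaps scattered a few bits either side.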

[Interactive demo: run the bucket test yourself. Choose a first character (A, B, C, E, K, O, V, or Z) and a sample size (1,000, 5,000, 20,000, or 50,000), then run the test. The demo reports the expected count per bucket, the maximum deviation, and the number of buckets flagged.]

Each bar is a hex digit (0–f), showing how often it appeared as the first character of the MD5 hash across N random passwords that all start with the letter you picked. The teal line is "expected if no correlation." Run it a few times, or bump the sample size, and watch the bars settle toward the line as N grows: the same pattern we found at 27 million samples.

TL;DR: we ran the test at scale, looked for shortcuts, and the machine kept its promise. Full writeup below.

01 The question

The textbook claim for MD5 is the avalanche effect: change one input bit, and roughly half the output bits flip, unpredictably. I didn't want to take that on faith. If you fix the first character of a password, so that every sample starts with A, does any trace of that survive into the hash? Just a whisper would do: a single bit, byte, or hex digit, anything we could detect.

That's a testable question, so that's what I tried to answer.

02 Method

Generate large batches of random passwords with one feature held constant (here the first character, at a fixed length of 7), hash each with MD5, then bucket the outputs and check whether any bucket deviates from what pure chance predicts.
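The procedure above can be sketched in a few lines of Python. This is a small-scale illustration under stated assumptions, not the code used for the full runs: the character set (`ALPHABET`), the function names, and the use of Python's `random` module are all my choices for the example.

```python
import hashlib
import random
import string
from collections import Counter

HEX_DIGITS = "0123456789abcdef"
# Assumed character set; the post does not say which one was used.
ALPHABET = string.ascii_letters + string.digits


def random_password(first_char: str, length: int, rng: random.Random) -> str:
    """Build a password of the given length whose first character is fixed."""
    tail = "".join(rng.choice(ALPHABET) for _ in range(length - 1))
    return first_char + tail


def first_digit_counts(first_char: str, n: int, seed: int, length: int = 7) -> Counter:
    """Hash n random passwords and count the first hex digit of each digest."""
    rng = random.Random(seed)
    counts = Counter({d: 0 for d in HEX_DIGITS})
    for _ in range(n):
        password = random_password(first_char, length, rng)
        digest = hashlib.md5(password.encode("ascii")).hexdigest()
        counts[digest[0]] += 1
    return counts


def max_deviation_pct(counts: Counter, n: int) -> float:
    """Largest relative gap between any bucket and the uniform expectation, in percent."""
    expected = n / len(HEX_DIGITS)
    return 100 * max(abs(c - expected) / expected for c in counts.values())


if __name__ == "__main__":
    counts = first_digit_counts("A", n=50_000, seed=42)
    print(dict(counts))
    print(f"max deviation: {max_deviation_pct(counts, 50_000):.2f}%")
```

Passing the seed explicitly keeps each run repeatable, which matters once results from different seeds are compared.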

We ran this at increasing scale on lightning.ai: first with 3,000 samples per letter, then 50,000, then 1,000,000 per letter across all 26 letters. That makes 26 million hashes in the final pass, bucketed by the first hex digit of the output.
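It helps to know how much a bucket wobbles by chance alone at each of those sample sizes. A rough sketch, assuming each bucket count is binomial with p = 1/16 under the no-bias hypothesis (the function name is illustrative):

```python
import math

BUCKETS = 16  # one bucket per possible first hex digit


def relative_sigma_pct(n: int, buckets: int = BUCKETS) -> float:
    """One standard deviation of a bucket count, as a percent of its expected value.

    Under the no-bias hypothesis each count is Binomial(n, p) with p = 1/buckets,
    so sigma / expected = sqrt((1 - p) / (n * p)).
    """
    p = 1 / buckets
    return 100 * math.sqrt((1 - p) / (n * p))


for n in (3_000, 50_000, 1_000_000):
    expected = n / BUCKETS
    print(f"n={n:>9,}  expected={expected:>9,.1f}  1 sigma = {relative_sigma_pct(n):.3f}%")
```

This works out to roughly 7.1%, 1.7%, and 0.39% of the expected count at the three scales. If the maximum deviations reported later in this post (1.21% and 1.05% at n = 1,000,000) are relative to the expected count, they sit at about 3 sigma, which is not unusual for the largest of 26×16 = 416 cells.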

Fig 1 Largest deviation seen per letter, n=1,000,000 per letter

We didn't stop at one run. A real correlation should survive a change of random seed. Every test was re-run with a fresh seed before anything got called a finding.
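One way to encode the reseed rule is to flag a letter only when it is significant under every seed. The sketch below uses a chi-square goodness-of-fit statistic with a 1% cutoff; that test, the cutoff, the character set, and all function names are assumptions for illustration, since the post does not state the exact threshold it used.

```python
import hashlib
import random
import string

ALPHABET = string.ascii_letters + string.digits  # assumed character set
# Upper 1% point of the chi-square distribution with 15 degrees of freedom,
# from standard tables. An assumed threshold, not the one used in the post.
CHI2_CRIT_DF15_P01 = 30.578


def first_digit_counts(first_char, n, seed, length=7):
    """Return a 16-element list counting the first hex digit of each digest."""
    rng = random.Random(seed)
    counts = [0] * 16
    for _ in range(n):
        pw = first_char + "".join(rng.choice(ALPHABET) for _ in range(length - 1))
        # The high nibble of the first digest byte is the first hex digit.
        counts[hashlib.md5(pw.encode("ascii")).digest()[0] >> 4] += 1
    return counts


def chi_square(counts):
    """Pearson chi-square statistic against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def flagged_letters(letters, n, seed):
    """Letters whose bucket counts exceed the chi-square cutoff for this seed."""
    return {ch for ch in letters
            if chi_square(first_digit_counts(ch, n, seed)) > CHI2_CRIT_DF15_P01}


def reproducible_flags(letters, n, seeds):
    """Letters flagged under every seed; only these would count as findings."""
    return set.intersection(*(flagged_letters(letters, n, s) for s in seeds))


if __name__ == "__main__":
    print(reproducible_flags(string.ascii_uppercase, n=20_000, seeds=[41, 42]))
```

With an unbiased hash, the intersection should usually come back empty, because a false positive has to repeat on the same letter under both seeds.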

03 The full sweep

One bucket isn't a thorough test, so we expanded to a 26×16 grid: every letter A–Z fixed as the first character, against every possible first hex digit of the resulting hash, visualized as a single heatmap.

Fig 2 — % deviation from expected, by first character (rows) and output hex digit (columns), n=1,000,000 per letter

If the bias were real, one whole row or column in the heatmap would keep the same color from one seed to the next.
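The grid behind a heatmap like this can be built as a plain 26×16 table of percent deviations. The following is a small-scale sketch with an assumed character set and hypothetical names, printing the table as text rather than drawing it:

```python
import hashlib
import random
import string

ALPHABET = string.ascii_letters + string.digits  # assumed character set
LETTERS = string.ascii_uppercase                 # rows: fixed first character A-Z


def deviation_grid(n_per_letter: int, seed: int, length: int = 7):
    """Return 26 rows of 16 values: percent deviation from the uniform expectation."""
    rng = random.Random(seed)
    expected = n_per_letter / 16
    grid = []
    for letter in LETTERS:
        counts = [0] * 16
        for _ in range(n_per_letter):
            pw = letter + "".join(rng.choice(ALPHABET) for _ in range(length - 1))
            # Column index is the first hex digit of the digest.
            counts[hashlib.md5(pw.encode("ascii")).digest()[0] >> 4] += 1
        grid.append([100 * (c - expected) / expected for c in counts])
    return grid


if __name__ == "__main__":
    grid = deviation_grid(n_per_letter=20_000, seed=42)
    print("   " + " ".join(f"{d:>6x}" for d in range(16)))
    for letter, row in zip(LETTERS, grid):
        print(f"{letter}  " + " ".join(f"{v:+6.2f}" for v in row))
```

Each row sums to zero by construction, since a surplus in one bucket has to come out of the others. Running it with two different seeds and comparing the sign patterns is a quick way to see whether anything holds its shape.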

At small sample sizes, individual letters crossed our significance threshold: first B and D, then a different set entirely once we reseeded. That instability is the tell:

Seed   n / letter   Letters flagged   Max deviation
42     1,000,000    A, E, K, O, V     1.21%
41     1,000,000    E, F, P           1.05%

Only E repeated. Across 26 independent letters and a ~1–2% per-letter false-positive rate at our threshold, that's exactly the overlap you'd expect from chance alone, not a pattern holding its shape.
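As a back-of-the-envelope check on that overlap, treat each run's flagged set as a random draw from the 26 letters (an assumption for illustration, not something measured in the runs). The number of shared letters then follows a hypergeometric distribution:

```python
from math import comb

TOTAL_LETTERS = 26
SEED_42_FLAGS = {"A", "E", "K", "O", "V"}
SEED_41_FLAGS = {"E", "F", "P"}


def overlap_pmf(total: int, first: int, second: int, k: int) -> float:
    """P(exactly k shared items) for random sets of size first and second (hypergeometric)."""
    return comb(first, k) * comb(total - first, second - k) / comb(total, second)


def expected_overlap(total: int, first: int, second: int) -> float:
    """Mean number of shared items between the two random sets."""
    return first * second / total


if __name__ == "__main__":
    a, b = len(SEED_42_FLAGS), len(SEED_41_FLAGS)
    print("observed overlap:", sorted(SEED_42_FLAGS & SEED_41_FLAGS))
    print(f"expected overlap: {expected_overlap(TOTAL_LETTERS, a, b):.2f}")
    print(f"P(at least one shared): {1 - overlap_pmf(TOTAL_LETTERS, a, b, 0):.2f}")
```

With sets of 5 and 3 letters, the expected overlap is about 0.58 letters, and at least one shared letter turns up in roughly 49% of random draws. A single repeat is close to a coin flip.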

04 Findings

Verdict

No statistically reproducible correlation was found between the first character of a fixed-length password and any single hex digit, byte, or sampled bit of its MD5 digest, across 27 million hashes and multiple independent random seeds. Every apparent "hit" failed to survive a reseed.

This matches what differential and linear cryptanalysis predict about MD5's compression function: 64 steps of nonlinear mixing, modular addition, and misaligned bit rotation are enough to make a single input character's influence statistically indistinguishable from noise by the time it reaches the output.

Open thread

So far I've only tested first-character vs. first-hex-digit. Last character, middle occurrences, repeated-character patterns, and bit-level (not hex-digit-level) mutual information are still untested at this scale. That's the next direction for this project.
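For the bit-level direction, one possible starting point is a plug-in estimate of the mutual information between the first character and a single output bit. This is a sketch of what such a test could look like, not something run for this post; the character set, the function names, and the choice of estimator are assumptions.

```python
import hashlib
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_letters + string.digits  # assumed character set


def mutual_information_bits(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def sample_pairs(letters, n_per_letter, bit_index, seed, length=7):
    """Collect (first character, chosen digest bit) pairs for random passwords."""
    rng = random.Random(seed)
    pairs = []
    for letter in letters:
        for _ in range(n_per_letter):
            pw = letter + "".join(rng.choice(ALPHABET) for _ in range(length - 1))
            value = int.from_bytes(hashlib.md5(pw.encode("ascii")).digest(), "big")
            pairs.append((letter, (value >> bit_index) & 1))
    return pairs


if __name__ == "__main__":
    pairs = sample_pairs("ABCD", n_per_letter=20_000, bit_index=0, seed=42)
    print(f"estimated MI: {mutual_information_bits(pairs):.6f} bits")
```

The plug-in estimator is biased upward by roughly (|X|-1)(|Y|-1) / (2N ln 2) bits, so a small positive value is expected even with no dependence. Any real signal would have to clear that floor and survive a reseed.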

05 What's next: this is Part 1

This is the first post in a series. We've only tested one narrow slice: first character vs. first hex digit. Part 2 widens the grid — last character vs. last digit, character position vs. specific output bytes, and a full 128-bit...
