Under the Hood: Building a Real-Time Chord Recognizer | WhatChord
The problem is not a lookup
The first intuition when building a chord recognizer is to build a<br>dictionary. There are only 12 pitch classes, which means there are<br>only 2^12 = 4096 possible pitch-class sets. Store a<br>name for each set, and when a user plays C-E-G, look up {C, E, G}<br>and return “C major.”
The problem is not memory. Four thousand entries is trivial. The<br>problem is meaning. A pitch-class set does not contain enough<br>information to decide what musicians will call it.
Piano players often leave out notes that a dictionary entry might<br>expect. Extended chords add notes that no fixed dictionary entry<br>anticipates. And the same set of pitch classes, as discussed in the<br>companion article, can legitimately<br>be described as multiple different chords depending on musical<br>context.
What you actually need is a scoring model. It has to evaluate how<br>well any given set of notes fits each chord type, rank all plausible<br>interpretations, and apply musical judgment when scores are close.
Overview: a four-stage pipeline
Before diving into each component, here is the overall shape of the<br>algorithm. A snapshot of sounding notes enters at the top; a ranked<br>list of chord interpretations comes out at the bottom.
Input: set of sounding pitch classes + lowest (bass) note
1. Pitch-class bitmask: a 12-bit integer, one bit per semitone in the octave
2. Candidate generation: each sounding note becomes a candidate root and is scored against every chord template; extensions are extracted
3. Score normalization: raw scores are normalized for fair comparison across chord complexities
4. Ranking: musical heuristics resolve ambiguous scores; hard structural rules override when the score alone would pick the wrong answer
Output: top-ranked chord candidates; the result is cached in an LRU cache
The rest of this article walks through each stage in detail, ending<br>with a discussion of known limitations.
Pitch classes and bitmasks
WhatChord models the common 12-tone equal temperament (12-TET)<br>pitch-class framework used by MIDI keyboards, which divides each<br>octave into 12 equal semitone steps. A pitch class is the<br>note’s position within that octave, ignoring which octave it’s in,<br>so middle C, the C above it, and the C three octaves below all share<br>pitch class 0. In this engine, pitch classes are numbered 0 (C)<br>through 11 (B).
For analysis, the engine collapses the sounding notes into a set of<br>pitch classes plus the lowest sounding note as bass. The pitch-class<br>set is represented as a 12-bit integer mask where bit n is<br>set if pitch class n is present. C major (C=0, E=4, G=7)<br>looks like this:
// Pitch classes: C=0, E=4, G=7<br>int pcMask = (1 << 0) | (1 << 4) | (1 << 7);<br>// pcMask == 0b000010010001 == 0x091
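The collapse from raw notes to a mask plus bass can be sketched as follows. This is a minimal illustration in Python, not the engine's own code: the function name `collapse_voicing`, the use of MIDI note numbers as input, and returning the bass as a pitch class are all assumptions made for the example.

```python
def collapse_voicing(midi_notes):
    """Collapse MIDI note numbers into (pitch-class mask, bass pitch class).

    Hypothetical helper: assumes a non-empty iterable of MIDI note numbers,
    where note % 12 gives the pitch class (C=0 ... B=11).
    """
    notes = list(midi_notes)
    if not notes:
        raise ValueError("at least one sounding note is required")
    pc_mask = 0
    for note in notes:
        pc_mask |= 1 << (note % 12)  # set bit n for pitch class n
    bass_pc = min(notes) % 12  # the lowest sounding note is the bass
    return pc_mask, bass_pc


# C major in first inversion with a doubled C: E3, G3, C4, C5
mask, bass = collapse_voicing([52, 55, 60, 72])
print(f"{mask:012b}", bass)  # 000010010001 4
```

Octave doublings disappear in the mask, so this voicing yields the same 0x091 as a root-position triad; only the bass value (4, for E) records the inversion.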
This representation is compact and fast. Checking whether a pitch<br>class is present is a single bitwise AND. Counting present pitch<br>classes is a popcount. Rotating the set relative to a candidate root<br>is a loop over bits with modular arithmetic. All of these operations<br>are cheap.
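The three operations above can be sketched in a few lines. This is an illustrative Python version, not the engine's own code: the helper names are hypothetical, and the rotation convention (bit i of the result means "the interval i semitones above the root is present") is an assumption about what rotating relative to a root means here.

```python
def has_pc(pc_mask, pc):
    """Membership test: a single bitwise AND after shifting."""
    return ((pc_mask >> pc) & 1) == 1


def count_pcs(pc_mask):
    """Popcount: the number of distinct pitch classes in the mask."""
    return bin(pc_mask).count("1")


def rotate_to_root(pc_mask, root_pc):
    """Re-express the mask as intervals above root_pc (assumed convention)."""
    rotated = 0
    for pc in range(12):
        if pc_mask & (1 << pc):
            interval = (pc - root_pc) % 12  # semitones above the root
            rotated |= 1 << interval
    return rotated


c_major = 0b000010010001  # C, E, G
print(has_pc(c_major, 4), has_pc(c_major, 5))  # True False
print(count_pcs(c_major))                      # 3
print(f"{rotate_to_root(c_major, 4):012b}")    # 000100001001
```

Rotating C major around E gives bits 0, 3, and 8: a root, a minor 3rd (G), and a note eight semitones up (C), which is how the same notes look when E is tried as the root.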
A key design decision: only pitch classes actually present in the<br>voicing are tested as candidate roots. There are no “ghost roots”:<br>the algorithm never proposes an interpretation where the chord is<br>rooted on a note that is not being played. This keeps the candidate<br>count small (bounded by the number of distinct pitch classes<br>sounding, typically 3–7) and avoids obviously wrong readings.
This is a deliberate “solo keyboard” assumption. The current engine<br>is optimized for the common case where the same MIDI stream contains<br>both the harmony and the bass note. A future ensemble mode could<br>relax that rule for settings where another instrument is carrying<br>the bass, allowing rootless voicings to imply roots that are not<br>literally present in the keyboard part.
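Under the solo-keyboard rule, enumerating candidate roots amounts to listing the set bits of the mask. A minimal Python sketch follows; the function name `candidate_roots` and the sharp-only note spelling are assumptions for illustration.

```python
# Sharp-only spelling, chosen purely for display in this example.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def candidate_roots(pc_mask):
    """Return the pitch classes present in the mask, lowest number first.

    Only sounding pitch classes are returned, so no "ghost roots" appear.
    """
    return [pc for pc in range(12) if pc_mask & (1 << pc)]


roots = candidate_roots(0b000010010001)  # C, E, G
print([NOTE_NAMES[pc] for pc in roots])  # ['C', 'E', 'G']
```

An ensemble mode of the kind described above would need a different generator, for example one that also yields pitch classes absent from the mask.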
Chord templates
Chord qualities are also defined as bitmask templates. Each one<br>describes three sets of intervals relative to the root:
Required: tones that must be present to identify<br>this quality. Missing more than one required tone causes the<br>template to be skipped entirely.
Optional: tones frequently omitted in real<br>voicings (almost always the perfect 5th). Present when played,<br>unremarkable when absent.
Penalty: tones that actively contradict this<br>quality. Having a major 3rd present when you are trying to<br>identify a minor chord hurts the score.
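The three interval sets map naturally onto three masks per template. The Python sketch below is an assumption-laden illustration, not the engine's code: the `Template` class and `check_template` function are hypothetical names, and the function only counts tones and applies the "more than one required tone missing" skip rule. It does not attempt the actual scoring weights, which are not given here.

```python
from dataclasses import dataclass

# Interval bits relative to the root (bit i = i semitones above the root).
R, M2, m3, M3, P4, b5, P5, s5, M6, m7, M7 = (
    1 << i for i in (0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
)


@dataclass(frozen=True)
class Template:
    name: str
    required: int  # tones that identify the quality
    optional: int  # tones often omitted in real voicings
    penalty: int   # tones that contradict the quality


MAJOR = Template("Major", R | M3, P5, m3 | m7 | M7)
MINOR = Template("Minor", R | m3, P5, M3 | m7 | M7)
DIMINISHED = Template("Diminished", R | m3 | b5, 0, M3 | P5)


def popcount(mask):
    return bin(mask).count("1")


def check_template(rotated_mask, template):
    """Count tone matches for a root-relative mask, or None if skipped."""
    missing = popcount(template.required & ~rotated_mask)
    if missing > 1:
        return None  # more than one required tone missing: skip template
    return {
        "missing_required": missing,
        "optional_present": popcount(template.optional & rotated_mask),
        "penalty_hits": popcount(template.penalty & rotated_mask),
    }


voicing = R | M3 | P5  # C-E-G, already expressed relative to root C
for t in (MAJOR, MINOR, DIMINISHED):
    print(t.name, check_template(voicing, t))
```

For this voicing, Major matches cleanly, Minor survives with one missing required tone and one penalty hit (the major 3rd), and Diminished is skipped because two of its required tones are absent.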
The 22 templates, organized by complexity:
Quality<br>Required intervals<br>Optional<br>Key penalty tones
Major<br>R, M3<br>P5<br>m3, m7, M7
Minor<br>R, m3<br>P5<br>M3, m7, M7
Diminished<br>R, m3, ♭5<br>(none)<br>M3, P5
Augmented<br>R, M3, ♯5<br>(none)<br>m3, P5
Sus2<br>R, M2, P5<br>(none)<br>m3, M3, m7, M7
Sus4<br>R, P4, P5<br>(none)<br>m3, M3, m7, M7
Major 6<br>R, M3, M6<br>P5<br>m3, m7, M7
Minor 6<br>R, m3, M6<br>P5<br>M3, m7, M7
Dominant 7<br>R, M3, m7<br>P5<br>M7, m3
7sus2<br>R, M2, m7<br>P5<br>m3, M3, P4, M7
7sus4<br>R, P4, m7<br>P5<br>m3, M3, M7
7♭5<br>R, M3, ♭5, m7<br>(none)<br>P5, M7, m3
7♯5<br>R, M3, ♯5, m7<br>(none)<br>P5, M7, m3
Major 7<br>R, M3, M7<br>P5<br>m7, m3
Major 7sus2<br>R, M2, M7<br>P5<br>m3, M3, P4, m7
Major 7sus4<br>R, P4, M7<br>P5<br>m3, M3, M2, m7
Major 7♭5<br>R, M3, ♭5,...