Building a Real-Time Chord Recognizer


Under the Hood: Building a Real-Time Chord Recognizer | WhatChord

The problem is not a lookup

The first intuition when building a chord recognizer is to build a dictionary. There are only 12 pitch classes, which means there are only 2^12 = 4096 possible pitch-class sets. Store a name for each set, and when a user plays C-E-G, look up {C, E, G} and return “C major.”

The problem is not memory. Four thousand entries is trivial. The problem is meaning. A pitch-class set does not contain enough information to decide what musicians will call it.

Piano players often leave out notes that a dictionary entry might expect. Extended chords add notes that no fixed dictionary entry anticipates. And the same set of pitch classes, as discussed in the companion article, can legitimately be described as multiple different chords depending on musical context.
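To make the last point concrete, here is a minimal sketch of the naive table, assuming a 4096-slot array indexed by the 12-bit pitch-class mask. The class `NaiveChordTable` and its methods are hypothetical illustrations, not WhatChord code. C6 (C-E-G-A) and Am7 (A-C-E-G) collapse to the same key, so whichever name is stored last silently wins.

```java
/**
 * Hypothetical sketch of the naive dictionary approach: a 4096-entry array
 * indexed by a 12-bit pitch-class mask (0 = C ... 11 = B).
 */
final class NaiveChordTable {
    private final String[] names = new String[1 << 12]; // 2^12 = 4096 slots

    /** Collapses pitch classes into a 12-bit mask (bit n set = pitch class n present). */
    static int mask(int... pitchClasses) {
        int m = 0;
        for (int pc : pitchClasses) {
            m |= 1 << Math.floorMod(pc, 12);
        }
        return m;
    }

    /** Stores a name and returns whatever name previously occupied the slot (or null). */
    String put(String name, int... pitchClasses) {
        int key = mask(pitchClasses);
        String previous = names[key];
        names[key] = name;
        return previous;
    }

    /** Looks up the single name stored for this pitch-class set (or null). */
    String lookup(int... pitchClasses) {
        return names[mask(pitchClasses)];
    }
}
```

Calling `put("C6", 0, 4, 7, 9)` and then `put("Am7", 9, 0, 4, 7)` returns `"C6"` from the second call: the two chords share one slot, and the table has no way to keep both readings.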

What you actually need is a scoring model. It has to evaluate how well any given set of notes fits each chord type, rank all plausible interpretations, and apply musical judgment when scores are close.
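A minimal sketch of the ranking half of such a model, assuming each interpretation already carries a numeric score. `Interpretation`, `Ranker`, and the `CLOSE_MARGIN` threshold are hypothetical; the sketch only sorts candidates and flags the near-ties that musical judgment would then have to resolve.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical interpretation: a root pitch class, a quality label, and a score. */
record Interpretation(int root, String quality, double score) {}

final class Ranker {
    // Hypothetical margin: candidates within this distance of the best count as "close".
    static final double CLOSE_MARGIN = 0.05;

    /** Returns a copy of the candidates sorted best-first by score. */
    static List<Interpretation> rank(List<Interpretation> candidates) {
        List<Interpretation> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Interpretation::score).reversed());
        return sorted;
    }

    /** Returns the candidates whose scores are close enough to the best to need tie-breaking. */
    static List<Interpretation> closeToBest(List<Interpretation> ranked) {
        List<Interpretation> close = new ArrayList<>();
        if (ranked.isEmpty()) {
            return close;
        }
        double best = ranked.get(0).score();
        for (Interpretation candidate : ranked) {
            if (best - candidate.score() <= CLOSE_MARGIN) {
                close.add(candidate);
            }
        }
        return close;
    }
}
```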

Overview: a four-stage pipeline

Before diving into each component, here is the overall shape of the algorithm. A snapshot of sounding notes enters at the top; a ranked list of chord interpretations comes out at the bottom.

Input: set of sounding pitch classes + lowest (bass) note

1. Pitch-class bitmask: a 12-bit integer, one bit per semitone in the octave.
2. Candidate generation: each sounding note becomes a candidate root and is scored against every chord template, with extensions extracted.
3. Score normalization: raw scores are normalized for fair comparison across chord complexities.
4. Ranking: musical heuristics resolve ambiguous scores; hard structural rules override when the score alone would pick the wrong answer.

Output: top-ranked chord candidates, with the result cached in an LRU cache
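The LRU cache at the end of the pipeline can be sketched with `java.util.LinkedHashMap`, whose access-order constructor and `removeEldestEntry` hook are standard. Keying on both the mask and the bass pitch class is an assumption here, chosen because the pipeline's input includes the bass note and the same pitch-class set over a different bass can produce a different answer. `ChordCache`, its capacity, and the key packing are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache keyed by (12-bit pitch-class mask, bass pitch class). */
final class ChordCache<V> {
    private final Map<Integer, V> map;

    ChordCache(int capacity) {
        // accessOrder = true orders entries from least to most recently accessed.
        this.map = new LinkedHashMap<Integer, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    /** Packs the 12-bit mask and the bass pitch class (0..11) into one int key. */
    static int key(int pcMask, int bassPc) {
        return (bassPc << 12) | (pcMask & 0xFFF);
    }

    V get(int pcMask, int bassPc) {
        return map.get(key(pcMask, bassPc));
    }

    void put(int pcMask, int bassPc, V result) {
        map.put(key(pcMask, bassPc), result);
    }

    int size() {
        return map.size();
    }
}
```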

The rest of this article walks through each stage in detail, ending with a discussion of known limitations.

Pitch classes and bitmasks

WhatChord models the common 12-tone equal temperament (12-TET) pitch-class framework used by MIDI keyboards, which divides each octave into 12 equal semitone steps. A pitch class is a note’s position within the octave, ignoring which octave it is in, so middle C, the C above it, and the C three octaves below all share pitch class 0. In this engine, pitch classes are numbered 0 (C) through 11 (B).
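Assuming the common convention that MIDI note 60 is middle C, reducing a MIDI note to its pitch class is a modulo by 12. The `PitchClass` helper is a hypothetical illustration and uses ASCII `#` for sharps.

```java
/** Hypothetical helper: MIDI note number to 12-TET pitch class (0 = C ... 11 = B). */
final class PitchClass {
    static final String[] NAMES =
        {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};

    /** Octave is discarded; floorMod keeps the result in 0..11 even for negative input. */
    static int of(int midiNote) {
        return Math.floorMod(midiNote, 12);
    }

    /** Spells a pitch class using sharps. */
    static String name(int pitchClass) {
        return NAMES[Math.floorMod(pitchClass, 12)];
    }
}
```

`PitchClass.of(60)`, `PitchClass.of(72)`, and `PitchClass.of(24)` (middle C, the C above it, and the C three octaves below) all return 0.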

For analysis, the engine collapses the sounding notes into a set of pitch classes plus the lowest sounding note as the bass. The pitch-class set is represented as a 12-bit integer mask where bit n is set if pitch class n is present. C major (C=0, E=4, G=7) looks like this:

Figure: the 12-bit mask for C major, read from bit 11 (B) down to bit 0 (C). Only bits 7 (G), 4 (E), and 0 (C) are set.

```
// Pitch classes: C=0, E=4, G=7
int pcMask = (1 << 0) | (1 << 4) | (1 << 7);
// pcMask == 0b000010010001 == 0x091
```
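The collapse from a snapshot of sounding notes into a mask plus bass could look like the following sketch, assuming the input is an array of MIDI note numbers. `Snapshot` and `fromMidiNotes` are hypothetical names. Notes doubled in different octaves set the same bit, so octave doublings do not change the mask.

```java
/** Hypothetical snapshot: the 12-bit pitch-class mask plus the bass pitch class. */
record Snapshot(int pcMask, int bassPc) {
    /** Collapses sounding MIDI notes; the lowest note supplies the bass. */
    static Snapshot fromMidiNotes(int... midiNotes) {
        if (midiNotes.length == 0) {
            throw new IllegalArgumentException("no sounding notes");
        }
        int mask = 0;
        int lowest = Integer.MAX_VALUE;
        for (int note : midiNotes) {
            mask |= 1 << Math.floorMod(note, 12); // set bit n for pitch class n
            lowest = Math.min(lowest, note);      // track the lowest sounding note
        }
        return new Snapshot(mask, Math.floorMod(lowest, 12));
    }
}
```

`Snapshot.fromMidiNotes(64, 67, 72)` (E4, G4, C5) yields mask `0x091` with bass pitch class 4 (E), a first-inversion C major.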

This representation is compact and fast. Checking whether a pitch class is present is a single bitwise AND. Counting present pitch classes is a popcount. Rotating the set relative to a candidate root is a loop over bits with modular arithmetic. All of these operations are cheap.
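The three operations just described, sketched in Java with hypothetical class and method names. `relativeTo` re-expresses the set as intervals above a candidate root, so bit i of the result means "i semitones above the root".

```java
/** Hypothetical helpers for a 12-bit pitch-class mask (0 = C ... 11 = B). */
final class PcSet {
    /** Presence test: a single bitwise AND. */
    static boolean contains(int pcMask, int pc) {
        return (pcMask & (1 << pc)) != 0;
    }

    /** Number of distinct pitch classes: a popcount. */
    static int count(int pcMask) {
        return Integer.bitCount(pcMask);
    }

    /**
     * Rotation relative to a candidate root, as a loop over the bits with
     * modular arithmetic: bit (pc - root) mod 12 is set for each present pc.
     */
    static int relativeTo(int pcMask, int root) {
        int rotated = 0;
        for (int pc = 0; pc < 12; pc++) {
            if (contains(pcMask, pc)) {
                rotated |= 1 << Math.floorMod(pc - root, 12);
            }
        }
        return rotated;
    }
}
```

For A minor (A=9, C=0, E=4, mask `0x211`), `relativeTo(0x211, 9)` returns `0x089`: bits 0, 3, and 7, i.e. root, minor 3rd, and perfect 5th.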

A key design decision: only pitch classes actually present in the voicing are tested as candidate roots. There are no “ghost roots”; the algorithm never proposes an interpretation where the chord is rooted on a note that is not being played. This keeps the candidate count small (bounded by the number of sounding notes, typically 3–7) and avoids obviously wrong readings.
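Because roots come only from sounding notes, candidate generation can walk the set bits of the mask directly. This sketch (hypothetical `CandidateRoots` class) uses the standard lowest-set-bit idiom with `Integer.numberOfTrailingZeros`, so the loop runs once per sounding pitch class rather than 12 times.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: candidate roots are exactly the pitch classes present in the mask. */
final class CandidateRoots {
    static List<Integer> of(int pcMask) {
        List<Integer> roots = new ArrayList<>();
        int remaining = pcMask & 0xFFF;
        while (remaining != 0) {
            int pc = Integer.numberOfTrailingZeros(remaining); // index of the lowest set bit
            roots.add(pc);
            remaining &= remaining - 1; // clear that bit
        }
        return roots;
    }
}
```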

This is a deliberate “solo keyboard” assumption. The current engine is optimized for the common case where the same MIDI stream contains both the harmony and the bass note. A future ensemble mode could relax that rule for settings where another instrument is carrying the bass, allowing rootless voicings to imply roots that are not literally present in the keyboard part.

Chord templates

Chord qualities are also defined as bitmask templates. Each one describes three sets of intervals relative to the root:

Required: tones that must be present to identify this quality. Missing more than one required tone causes the template to be skipped entirely.

Optional: tones frequently omitted in real voicings (almost always the perfect 5th). Present when played, unremarkable when absent.

Penalty: tones that actively contradict this quality. Having a major 3rd present when you are trying to identify a minor chord hurts the score.
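A sketch of how a template with three interval masks might gate a candidate, assuming root-relative masks like those produced by rotating the set to each candidate root. The interval bits for Major, Minor, and Dominant 7 are taken from the template table; `ChordTemplate`, the `isCandidate` gate, and the bare penalty count are hypothetical simplifications, not WhatChord's actual scoring or weights.

```java
/** Hypothetical template: required, optional, and penalty intervals as root-relative masks. */
record ChordTemplate(String name, int required, int optional, int penalty) {
    // Interval bits: R=0, M2=2, m3=3, M3=4, P4=5, b5=6, P5=7, #5=8, M6=9, m7=10, M7=11
    static int bits(int... intervals) {
        int m = 0;
        for (int i : intervals) {
            m |= 1 << i;
        }
        return m;
    }

    static final ChordTemplate MAJOR =
        new ChordTemplate("Major", bits(0, 4), bits(7), bits(3, 10, 11));
    static final ChordTemplate MINOR =
        new ChordTemplate("Minor", bits(0, 3), bits(7), bits(4, 10, 11));
    static final ChordTemplate DOMINANT_7 =
        new ChordTemplate("Dominant 7", bits(0, 4, 10), bits(7), bits(11, 3));

    /** Number of required intervals absent from a root-relative mask. */
    int missingRequired(int relMask) {
        return Integer.bitCount(required & ~relMask);
    }

    /** Missing more than one required tone skips the template entirely. */
    boolean isCandidate(int relMask) {
        return missingRequired(relMask) <= 1;
    }

    /** Number of contradicting intervals that are present. */
    int penaltiesPresent(int relMask) {
        return Integer.bitCount(penalty & relMask);
    }
}
```

For C-E-G relative to C (`0x091`), `MINOR.isCandidate` is still true because only one required tone (the m3) is missing, but `MINOR.penaltiesPresent` reports 1 for the contradicting M3; a real scorer would weigh that against the major reading.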

The 22 templates, organized by complexity:

| Quality | Required intervals | Optional | Key penalty tones |
|---|---|---|---|
| Major | R, M3 | P5 | m3, m7, M7 |
| Minor | R, m3 | P5 | M3, m7, M7 |
| Diminished | R, m3, ♭5 | | M3, P5 |
| Augmented | R, M3, ♯5 | | m3, P5 |
| Sus2 | R, M2, P5 | | m3, M3, m7, M7 |
| Sus4 | R, P4, P5 | | m3, M3, m7, M7 |
| Major 6 | R, M3, M6 | P5 | m3, m7, M7 |
| Minor 6 | R, m3, M6 | P5 | M3, m7, M7 |
| Dominant 7 | R, M3, m7 | P5 | M7, m3 |
| 7sus2 | R, M2, m7 | P5 | m3, M3, P4, M7 |
| 7sus4 | R, P4, m7 | P5 | m3, M3, M7 |
| 7♭5 | R, M3, ♭5, m7 | | P5, M7, m3 |
| 7♯5 | R, M3, ♯5, m7 | | P5, M7, m3 |
| Major 7 | R, M3, M7 | P5 | m7, m3 |
| Major 7sus2 | R, M2, M7 | P5 | m3, M3, P4, m7 |
| Major 7sus4 | R, P4, M7 | P5 | m3, M3, M2, m7 |
| Major 7♭5 | R, M3, ♭5,... | | |
