Suffix BWT vs. cyclic shift BWT, and fast computation

Suffix BWT vs cyclic shift BWT, and fast computation | purplesyringa's blog Suffix BWT vs cyclic shift BWT, and fast computation July 3, 2026Intended audience: data compression geeks. The Burrows-Wheeler transform takes a string as an input and rearranges its characters, grouping them by context. It is invertible with 𝒪(1) additional input, and together these properties give it a place in data compression and genome alignment.What Wikipedia doesn’t tell you is that there are actually two variants of BWT, with subtly different performance characteristics and simplicity. These differences seem to be largely undocumented, so this post is me trying to make sense of it.Cyclic shift BWT I’ll cover what I call “cyclic shift BWT” first.Let’s start with the string bcacaba. We write down all cyclic shifts, or rotations of this string, along with their offsets:bcacaba (0) cacabab (1) acababc (2) cababca (3) ababcac (4) babcaca (5) abcacab (6) We then sort them lexicographically:ababcac (4) abcacab (6) acababc (2) babcaca (5) bcacaba (0) cababca (3) cacabab (1) The output of the BWT is the sequence of characters just before the start of each cyclic shift. For example, the first cyclic shift in the list starts at 4, so we write down the character at offset 3 in the input, which is c. The final output is cbcaaab. (Note that this is the same as taking the last column of the table above.)Two strings sorted next to each other are likely to have a long common prefix, and thus the symbol before them is also likely to match in realistic data. For example, if in a given text the string ender is always preceded by g, b, or a space, there’s going to be a long subsequence in the output composed entirely of these three symbols.This algorithm is invertible: if you know the output of the BWT, you can recover the input up to a cyclic shift. Here’s how you can do this:Preparation step: take the output of the BWT (cbcaaab) and sort its characters lexicographically. This reveals the first column of the table above: aaabbcc.Now take the first character of the output, c. It was produced by taking the last character of some cyclic shift t‖"c". We will now find the character before this c, i.e. the last character of t.There are two cs in the text, so there are two cyclic shifts ending with c: t1‖"c" and t2‖"c". Since we took the first c of the output, not the second one, it much be part of the lexicographically smaller shift t‖"c"=t1‖"c"t2‖"c" (and t1t2).Now consider shifts starting with c. There are also two of them, "c"‖t1 and "c"‖t2, and since t1t2, they’re ordered in the same way: "c"‖t1"c"‖t2.The first column tells us these cyclic shifts are located at indices 5 and 6 in the sorted order. The smaller one is at index 5, and according to the BWT output that shift ("c"‖t1) ends with a. Hence the c at index 0 is preceded by the a at index 5.This process can now be repeated starting with this a: it terminates the third shift out of three shifts ending with a, so the character before it is last in the third shift out of three shifts starting with a, i.e. c at index 2, etc. This recovers the characters of the input in reverse order.In pseudocode, the decoder looks something like this:# For each character of the BWT, determine its index among equal characters with counting sort. pos_in_char = [] counts = [0] * 256 for c in bwt: pos_in_char.append(counts[c]) counts[c] += 1

# Recover characters one by one. s = "" i = 0 for _ in range(len(bwt)): s = chr(bwt[i]) + s # Jump to the `pos_in_char[i]`th cyclic shift among shifts with the same leading character. # This `sum(...)` call is typically implemented with prefix sums for efficiency. i = sum(counts[:bwt[i]]) + pos_in_char[i]

print(s) # correct up to a cyclic shift To recover the original string exactly, not just up to a cyclic shift, we have to start decoding from i representing the cyclic shift with offset 0 (so-called primary index). In this example, the encoder can communicate that it’s index 4 to the decoder.Suffix BWT Now that you hopefully understand how cyclic shift BWT works, let’s compare it to suffix BWT.We start with the same string bcacaba, but this time we write down and sort suffixes instead of cyclic shifts:empty (7) a (6) aba (4) acaba (2) ba (5) bcacaba (0) caba (3) cacaba (1) Note that the order is different from cyclic shifts: whereas before, offset 4 (ababcac) was located before 6 (abcacab), now offset 4 (aba) is located after offset 6 (a). Such a flip can occur when one suffix is a prefix of another, since lexicographic comparison considers shorter strings smaller, all other things being equal.Just like before, we construct the output out of characters preceding the suffixes: abcca^ab. ^ denotes the full suffix, which is not preceded by anything, and in practice we drop it: abccaab.Decoding suffix BWT is a little subtle. Due to the empty suffix at the beginning, we have to bump up every computed index by one. And since the full suffix is dropped without a trace, we have to...

Suffix BWT vs. cyclic shift BWT, and fast computation

Related Articles

(no title)

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

ZCode – Harness for GLM-5.2

Apertus – Open Foundation Model for Sovereign AI