GitHub - chkas/blqsort: Fast Branchless Quicksort with C++ Interface · GitHub
/" data-turbo-transient="true" />
Skip to content
Search or jump to...
Search code, repositories, users, issues, pull requests...
-->
Search
Clear
Search syntax tips
Provide feedback
--><br>We read every piece of feedback, and take your input very seriously.
Include my email address so I can be contacted
Cancel
Submit feedback
Saved searches
Use saved searches to filter your results more quickly
-->
Name
Query
To see all available qualifiers, see our documentation.
Cancel
Create saved search
Sign in
/;ref_cta:Sign up;ref_loc:header logged out"}"<br>Sign up
Appearance settings
Resetting focus
You signed in with another tab or window. Reload to refresh your session.<br>You signed out in another tab or window. Reload to refresh your session.<br>You switched accounts on another tab or window. Reload to refresh your session.
Dismiss alert
{{ message }}
chkas
blqsort
Public
Notifications<br>You must be signed in to change notification settings
Fork
Star
main
BranchesTags
Go to file
CodeOpen more actions menu
Folders and files<br>NameNameLast commit message<br>Last commit date<br>Latest commit
History<br>9 Commits<br>9 Commits
LICENSE
LICENSE
README.md
README.md
blqs.h
blqs.h
test_double.cpp
test_double.cpp
test_int128.cpp
test_int128.cpp
test_str.cpp
test_str.cpp
test_struct.cpp
test_struct.cpp
View all files
Repository files navigation
blqsort
blqsort is a fast branchless quicksort implementation for C++ that outperforms std::sort and pdqsort on random datasets.
On modern CPUs, avoiding branch misprediction is a key technique to speed up programs: When 'if' slows you down, avoid it.
Performance results naturally depend on the underlying hardware. The following benchmarks show the execution times for sorting 50 million doubles using different sorting implementations. The measurements were taken on an Apple M1 system using Clang and on an AMD Ryzen 3 system using GCC, both compiled with the -O3 option.
Implementation<br>Apple M1<br>AMD Ryzen
std::sort<br>1.33s<br>5.56s
pdqsort<br>1.33s<br>2.81s
blqsort<br>1.01s<br>2.06s
This paper by Edelkamp and A. Weiß shows how partitioning performance in Quicksort can be improved by avoiding conditional branches.
The strategy of using an auxiliary buffer for branchless partitioning is inspired by fluxsort. The “auxiliary buffer” here means a 512‑element stack array, not heap memory.
To avoid the O(n²) runtime caused by bad input data, the program can group identical elements together and switch to heapsort for that specific part if it detects a big imbalance during partitioning. The program also checks if a partition is already sorted.
For larger parts, it uses a median-of-medians strategy to find a good pivot. In addition, critical partitioning loops are explicitly unrolled.
For 2 to 12 elements, the algorithm uses custom sorting networks. This approach requires a separate code path for each size but sorts small subsets with very few swaps using a branchless sort‑2 primitive. Source for sorting networks
As a result, blqsort becomes faster than, for example, std::sort and pdqsort for random numbers.
For types with higher copy or move costs (such as strings), the buffer-based branchless approach becomes less efficient. In this case, a BlockQuicksort variant is used, where only element indices are processed branchlessly, and the actual data is moved using fewer swap operations.
Usage
You only need to include blqsort.h, and it can be used just as easily as std::sort.
#include "blqs.h"<br>double data[SIZE];<br>blqs::sort(data, data + SIZE);
Sorting Custom Data Structures
In practice, we often need to sort custom data structures. This is where SIMD libraries like Google Highway - while very fast for simple numbers - become difficult to use.
Using std::sort or blqs::sort gives you much more flexibility;
struct entry {<br>int32_t id;<br>int32_t value;<br>bool operator
Execution times for sorting 50 million of this structs.
Implementation<br>Apple M1<br>AMD Ryzen
std::sort<br>3.46s<br>4.75s
pdqsort<br>3.46s<br>4.72s
blqsort<br>0.97s<br>2.20s
About
Fast Branchless Quicksort with C++ Interface
Resources
Readme
License
MIT license
Uh oh!
There was an error while loading. Please reload this page.
Activity
Stars
stars
Watchers
watching
Forks
forks
Report repository
Releases
No releases published
Packages
Uh oh!
There was an error while loading. Please reload this page.
Contributors
Uh oh!
There was an error while loading. Please reload this page.
Languages
C++<br>100.0%
You can’t perform that action at this time.