Ten years of ClickHouse in open source
Alexey Milovidov · Jun 15, 2026 · 17 minutes read
ClickHouse was released in open source on June 15, 2016, ten years ago. Since then, it has become the most popular open source analytical database, with more than 2000 contributors.
Building in the open
There are different levels of open source.
Level 0: The minimum level is making the code open to the public for reading, but nothing more. This is the case for archival and museum releases, such as Doom or MS-DOS.
Level 1: The next level is when the software is developed through commits in a public repository, without necessarily accepting outside contributions. This still counts as open source; SQLite and Ladybird are examples.
Level 2: Accepting contributions, but without a transparent and open development process. Most active open-source projects are on this level.
Level 3: Open contribution guidelines, task tracker, code review system, development roadmap, testing and CI system, release cycle, user support, and documentation.
I always aim for the maximum. ClickHouse should be the best example of:
How to build a great database - if you want to build a new database, the ClickHouse source code and development practices will serve as the best example. I always write the code so everyone can learn from it - by keeping it modular, orthogonal, and well-documented. When the code requires a complex concept, I explain it in the comments from scratch, so that readers don't have to refer to textbooks, Wikipedia, or AI.
A place to learn C++ development. Many people are looking for repositories representing the frontier of software engineering, and today ClickHouse is one of the most popular open source repositories in C++, where everyone can learn both the exciting stuff (C++23) and the boring stuff (build systems, continuous integration and testing, code review practices, and AI).
A place for experiments on data structures and performance optimization. You can open a pull request as an experiment, without aiming for it to be merged - it will be tested with the same level of scrutiny as production releases. Found a new memory allocator, a new compression library, a new hash table, a data format, or a sorting algorithm? Bring it to ClickHouse, and the testing will expose it inside out. The roadmap also includes a section about experimental, weird, and even ridiculous things.
Where you can be proud of your work. ClickHouse credits every contributor in the changelog and even inside the database, in the system.contributors table! There are countless cases where a contributor sends an initial, incomplete implementation of a feature, and we help to finish it together. Even if the code has to be entirely rewritten, we do it proactively and take responsibility for that, and we always credit the initial author, because we care about your use case and the initial intent that made it happen. To put it simply, we love our contributors.
Before open source
Prototypes and first commits
The first commit in ClickHouse was made on May 29, 2009, and it was a performance optimization (a replacement for the libc functions localtime, mktime, and gmtime, which were extremely slow and annoyed me by showing up in the profiler). But that was before ClickHouse existed.
ClickHouse started as my experiment while I was working on data processing for a web analytics system. The system, similar to Google Analytics, received logs about pageviews sent from websites, and it was implemented with MySQL, data processing in C++, and custom data structures in C++ where MySQL didn't suffice. The MySQL databases stored pre-aggregated reports for customers, and the custom data structures were used for calculating user sessions, user history, and similar stuff.
My experience from that time was this: the data volume is growing, nothing works, and new data arrives in real time. If we can't process a five-minute chunk of logs in five minutes, there will be a delay. I would search for any creative solution while the delay accumulated and deploy it on the same work day.
That's how I was searching for any solution that works - any type of database, any libraries, etc. Can we use TokuDB? Colleagues use LMDB, maybe it will save us? Let's try Judy Arrays. Someone at lunch told us about Hadoop, should we use it? I heard briefly about LZO and QuickLZ in the corridor - let's try them. If we store HyperLogLogs in MySQL BLOBs, how will we sum them? On a weekend, I will read that data compression book or the documentation on event-loop servers...
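On the HyperLogLog question above, the standard answer is that two sketches built with the same hash function and the same number of registers are combined by taking the elementwise maximum of their registers; the merged sketch then estimates the cardinality of the union. A minimal sketch in Python, assuming each BLOB holds one byte per register. That layout and the name merge_hll_registers are illustrative assumptions, not the format used by the system described here:

```python
def merge_hll_registers(a: bytes, b: bytes) -> bytes:
    """Merge two HyperLogLog register arrays by elementwise maximum.

    Assumes both sketches use the same hash function and register count,
    and that each register is stored as a single byte (hypothetical layout).
    """
    if len(a) != len(b):
        raise ValueError("sketches must have the same number of registers")
    # Each register keeps the largest rank seen for its bucket, so the
    # union of two streams is represented by the larger of the two values.
    return bytes(max(x, y) for x, y in zip(a, b))


# Toy example with four registers per sketch (real sketches use thousands).
left = bytes([0, 3, 1, 5])
right = bytes([2, 1, 1, 7])
merged = merge_hll_registers(left, right)
```

Because the merge is just a maximum, it is commutative, associative, and idempotent, so sketches can be combined in any order and the same sketch can safely be merged twice.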
While stabilizing the data pipeline, I was also thinking about new features that I can bring to the product. If we...