Towards approachable observability with wide events


marending, 17 May 2026

A tragedy in four acts

I like to think about how to get operational insights into the services I host. It’s become kind of a hobby, to be honest. Maybe this whole thing is just bikeshedding. What better way to avoid the crushing reality that an actual project you had in mind doesn’t live up to your expectations than by not even attempting it, and instead building infrastructure around your existing applications? Maybe that’s a topic for therapy.

Anyhow, I realized during my latest manic observability episode that I barely even remember how I got here. So for posterity, I’m outlining the history and motivations for all the different phases in this note.

Act 1: Metrics and Prometheus

A couple of years ago I got the itch to play around with embedded devices. I got myself an ESP32 and a CO2 sensor and got to work. With a little elbow grease, I had the current CO2 concentration displayed on a small seven-segment display. Soon, I graduated to wanting to store and chart this data.

At this point I had little knowledge of how to approach this, so I went the well-trodden path: Prometheus to store time series data and Grafana to visualize it. Since my microcontroller was not internet accessible, Prometheus could not scrape the metrics off the device on a regular schedule. I had to put a push gateway in front of Prometheus so the device could push data into the system instead.
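
The push model described above can be sketched roughly as follows. This is a minimal illustration in Python, not the firmware that ran on the ESP32. The metric name `co2_ppm`, the job and instance labels, and the host `pushgateway.example` are all made up for the example. It relies on two things the Pushgateway documents: metrics are sent as a body in the Prometheus text exposition format, and the grouping key is encoded in the URL path under `/metrics/job/<job>`.

```python
import urllib.request


def render_gauge(name: str, value: float, help_text: str) -> str:
    """Render one gauge in the Prometheus text exposition format.

    The body must end with a newline, otherwise the push is rejected.
    """
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


def pushgateway_url(base: str, job: str, instance: str) -> str:
    """Build the push URL; job and instance form the grouping key."""
    return f"{base.rstrip('/')}/metrics/job/{job}/instance/{instance}"


def push(base: str, job: str, instance: str, body: str) -> int:
    """PUT replaces every metric previously pushed under this grouping key."""
    request = urllib.request.Request(
        pushgateway_url(base, job, instance),
        data=body.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical usage (needs a reachable Pushgateway, so it is left commented out):
# body = render_gauge("co2_ppm", 612.0, "CO2 concentration in ppm")
# push("http://pushgateway.example:9091", "co2_sensor", "esp32", body)
```

Prometheus then scrapes the gateway like any other target, which is what lets a device behind NAT take part in a pull-based system.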

Around this time I also built and deployed this website. I had seen a cool statistics section on another blog, where one could view how many page views the blog was getting and which pages were popular. Wanting the same for this site, I initially transformed the static site into one with a Node.js backend just for this feature. After a short while I realized I wasn’t happy with that and ripped it out again, planning instead to track visitors with an external service. You can watch this unfold in this old note.

I could have used my existing Prometheus setup to scrape Caddy for access metrics, but that wouldn’t have given me all the data I wanted. I’d need a dedicated service serving the site and exposing metrics. So this kicked off the realization that I needed a broader observability system than what Prometheus could give me. What was always clear to me, by the way, is that I wanted one unified system for metrics and service monitoring.

In hindsight, I also realize there was another push factor: PromQL. I got annoyed with learning a language with such a narrow use case.

Act 2: Metrics with DuckDB

With all this in place, the time was ripe to venture out. And boy did I. The requirements for my next system were clear: it should accept arbitrary JSON payloads (like sensor readings or access logs), store them efficiently, and allow querying them in an ergonomic fashion.
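
To give those three requirements a concrete shape, here is a small sketch. It deliberately uses Python’s built-in `sqlite3` module as a stand-in and is not the DuckDB-based design this act is about. The table layout, the `source` column, and the function names are assumptions made for illustration, and it assumes the bundled SQLite was built with its JSON functions (the default in current builds).

```python
import json
import sqlite3
import time


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a single table that holds any JSON document as text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " received_at REAL NOT NULL,"
        " source TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def ingest(conn: sqlite3.Connection, source: str, payload: dict) -> None:
    """Accept an arbitrary JSON-serializable payload; no schema is enforced."""
    conn.execute(
        "INSERT INTO events (received_at, source, payload) VALUES (?, ?, ?)",
        (time.time(), source, json.dumps(payload)),
    )


def average(conn: sqlite3.Connection, source: str, json_path: str):
    """Average a numeric field addressed by a JSON path such as '$.co2_ppm'.

    Returns None when no row of that source contains the field.
    """
    row = conn.execute(
        "SELECT AVG(json_extract(payload, ?)) FROM events WHERE source = ?",
        (json_path, source),
    ).fetchone()
    return row[0]
```

Sensor readings and access logs land in the same table with different shapes, and the query side picks fields out by path. Whether that counts as ergonomic or efficient enough is exactly the question the database benchmarking was meant to answer.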

Notice the shift away from time series data to something more general, something that could cover multiple use cases. You can follow the journey from musing on how to store that data to benchmarking suitable databases in my notes.

Coming away from PromQL, I was craving more expressive power for querying and transforming data. So I started looking into building a WebAssembly plugin system to safely execute efficient transformations server-side. For visualization, I wasn’t satisfied with Grafana anymore either; I wanted to go custom there as well. So I built a SolidJS frontend with responsive plots using Observable Plot. All this started taking shape under the name observatory.

You might already be guessing that this would not go so well. Indeed, the sensor readings and GPS location data from my phone were handled fine, but I was missing a puzzle piece on how access logs fit in. How would I instrument my services in such a way as to get the data into observatory in a queryable form? I found the answer in the form of traces. At first I wanted to build a tracing collector into observatory, but that proved to be difficult. Instead, I turned towards a more off-the-shelf stack to ingest tracing data.

Act 3: Tracing with ClickHouse

When I learned about tracing, not just in the context of distributed systems but as the general concept of spans and events, I felt like I had hit the jackpot. Events you emit within spans mimic log lines, except they are naturally associated with a particular request. No correlation id needed (it’s the span id, and the tooling handles it for you). Metrics are just hardcoded aggregations done in the application, which lends itself to very space-efficient storage in exchange for limited flexibility. With traces, you can generate any metrics you might care about after the fact. Tracing is kind of a one-stop shop for observability.
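
The claim that metrics can be generated from traces after the fact can be illustrated with a toy model. The sketch below is plain Python and does not use any real tracing library or storage backend. The `Span` and `Event` classes, the `route` attribute, and the sample data are all invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    """A log-line-like record emitted while a span is open."""
    name: str
    timestamp_ms: int


@dataclass
class Span:
    """One unit of work, for example handling a single HTTP request."""
    span_id: str
    name: str
    start_ms: int
    end_ms: int
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def requests_per_route(spans):
    """Counter-style metric, computed at query time from stored spans."""
    counts = defaultdict(int)
    for span in spans:
        counts[span.attributes.get("route", "unknown")] += 1
    return dict(counts)


def mean_duration_ms(spans):
    """Latency metric per route, also derived only when someone asks for it."""
    durations = defaultdict(list)
    for span in spans:
        durations[span.attributes.get("route", "unknown")].append(span.duration_ms)
    return {route: sum(d) / len(d) for route, d in durations.items()}


# Hypothetical stored spans for three requests. Events hang off their span,
# so the span id doubles as the correlation id.
SPANS = [
    Span("a1", "GET", 0, 12, {"route": "/", "status": 200},
         [Event("cache miss", 3)]),
    Span("b2", "GET", 20, 28, {"route": "/", "status": 200}),
    Span("c3", "GET", 30, 75, {"route": "/notes", "status": 500},
         [Event("db timeout", 70)]),
]
```

With this data, `requests_per_route(SPANS)` counts two requests for `/` and one for `/notes`. Neither aggregation had to be decided when the service was instrumented. The price is storing every span instead of a handful of pre-aggregated numbers.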

So I set up an OpenTelemetry collector that would receive traces and store them in a ClickHouse database. That part made a lot of sense, and instrumenting my services with the excellent tracing...

