Towards approachable observability with wide events

marending · 17 May 2026

A tragedy in four acts

I like to think about how to get operational insights into the services I host. It’s become kind of a hobby, to be honest. Maybe this whole thing is just bikeshedding. What better way to avoid the crushing reality that an actual project you had in mind doesn’t live up to your expectations than by not even attempting it, and instead building infrastructure around your existing applications? Maybe that’s a topic for therapy.

Anyhow, I realized during my latest manic observability episode that I barely even remember how I got here. So for posterity, I’m outlining the history and motivations for all the different phases in this note.

Act 1: Metrics and Prometheus

A couple of years ago I got the itch to play around with embedded devices. I got myself an ESP32 and a CO2 sensor and got to work. With a little elbow grease, I had the current CO2 concentration displayed on a small seven-segment display. Soon, I graduated to wanting to store and chart this data.

At this point I had little knowledge of how to approach this, so I took the well-trodden path: Prometheus to store the time series data and Grafana to visualize it. Since my microcontroller was not reachable from the internet, Prometheus couldn’t scrape metrics off it on a regular schedule. Instead, I put a Pushgateway in front of Prometheus so the device could push its readings into the system.
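To make the push-instead-of-scrape setup concrete, here is a minimal sketch of what a push to the Pushgateway looks like on the wire. It is written in Python only to show the request shape; the actual ESP32 firmware would do the equivalent HTTP request in C. The Pushgateway accepts samples in the Prometheus text exposition format at `/metrics/job/<job>/...`, where the path segments form the grouping key. The metric name `co2_ppm`, the job name `co2_sensor`, the instance label `esp32` and the helper names are assumptions for illustration.

```python
import urllib.request


def exposition_body(name: str, value: float, help_text: str) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"  # every line, including the last, ends in a newline
    )


def push_url(gateway: str, job: str, instance: str) -> str:
    """The grouping key lives in the URL path: /metrics/job/<job>/instance/<instance>."""
    return f"{gateway.rstrip('/')}/metrics/job/{job}/instance/{instance}"


def push_gauge(gateway: str, job: str, instance: str,
               name: str, value: float, help_text: str) -> int:
    """Send the sample with PUT, which replaces all metrics in this group."""
    req = urllib.request.Request(
        push_url(gateway, job, instance),
        data=exposition_body(name, value, help_text).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Called as `push_gauge("http://localhost:9091", "co2_sensor", "esp32", "co2_ppm", 612, "CO2 concentration in ppm")` (9091 being the Pushgateway's default port), the sample sits on the gateway until Prometheus scrapes it like any other target. The Pushgateway docs recommend `honor_labels: true` on that scrape job so the pushed `job` and `instance` labels are kept as-is.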

Around this time I also built and deployed this website. I had seen a cool statistics section on another blog, where one could view how many page views the blog was getting and which pages were popular. Wanting the same for this site, I initially transformed the static site into one with a NodeJS backend just for this feature. After a short while I realized I wasn’t happy with that and ripped it out again, planning instead to track visitors with an external service. You can watch this unfold in this old note.

I could have used my existing Prometheus setup to scrape Caddy for access metrics, but that wouldn’t have given me all the data I wanted. I’d need a dedicated service serving the site and exposing metrics. So this kicked off the realization that I needed a broader observability system than what Prometheus could give me. What was always clear to me, by the way, is that I wanted one unified system for metrics and service monitoring.

In hindsight, I also realize there was another push factor: PromQL. I got annoyed with learning a language with such a narrow use case.

Act 2: Metrics with DuckDB

With all this in place, the time was ripe to venture out. And boy did I. The requirements for my next system were clear: it should accept arbitrary JSON payloads (like sensor readings or access logs), store them efficiently, and allow querying them in an ergonomic fashion.

Notice the shift away from time series data to something more general, something that could cover multiple use cases. You can follow the journey from musing on how to store that data to benchmarking suitable databases in my notes.
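The "accept arbitrary JSON, query it later" requirement can be sketched in a few lines. This minimal sketch uses Python's built-in `sqlite3` module as a stand-in for DuckDB so it runs without extra dependencies, and it assumes the SQLite build ships the JSON functions (built in since SQLite 3.38). The single-table layout and the helper names `open_store`, `ingest`, `avg_field` and `count_by` are hypothetical, not observatory's actual schema.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """One table for every kind of event: a source tag, a timestamp and the raw JSON."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " source TEXT NOT NULL, ts TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def ingest(conn: sqlite3.Connection, source: str, payload: dict,
           ts: Optional[datetime] = None) -> None:
    """Accept any JSON-serializable payload; new shapes need no schema change."""
    if ts is None:
        ts = datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO events (source, ts, payload) VALUES (?, ?, ?)",
        (source, ts.isoformat(), json.dumps(payload)),
    )


def avg_field(conn: sqlite3.Connection, source: str, field: str) -> Optional[float]:
    """Average a numeric field, extracted from the stored JSON at query time."""
    (value,) = conn.execute(
        "SELECT avg(json_extract(payload, ?)) FROM events WHERE source = ?",
        (f"$.{field}", source),
    ).fetchone()
    return value  # None if no payload of this source has the field


def count_by(conn: sqlite3.Connection, source: str, field: str) -> list:
    """Count events per value of a field, most frequent first."""
    return conn.execute(
        "SELECT v, count(*) AS n FROM"
        " (SELECT json_extract(payload, ?) AS v FROM events WHERE source = ?)"
        " GROUP BY v ORDER BY n DESC, v",
        (f"$.{field}", source),
    ).fetchall()
```

Sensor readings and access logs then share one table: `avg_field(conn, "co2", "ppm")` averages the readings, and `count_by(conn, "access", "path")` answers "which pages are popular". The trade-off is that fields are parsed out of the JSON on every query, in exchange for never migrating a schema when a new payload shape shows up.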

Coming away from PromQL, I was craving more expressive power for querying and transforming data. So I started looking into building a WebAssembly plugin system to safely execute efficient transformations server-side. For visualization, I wasn’t satisfied with Grafana anymore either — I wanted to go custom there as well. So I built a SolidJS frontend with responsive plots using Observable Plot. All this started taking shape under the name observatory.

You might already be guessing that this would not go so well. The sensor readings and GPS location data from my phone were handled fine, but I was still missing a piece of the puzzle: access logs. How would I instrument my services in such a way as to get the data into observatory in a queryable form? I found the answer in traces. At first I wanted to build a tracing collector into observatory, but that proved difficult. Instead, I turned to a more off-the-shelf stack to ingest tracing data.

Act 3: Tracing with ClickHouse

When I learned about tracing (not just in the context of distributed systems, but the general concept of spans and events), I felt like I had hit the jackpot. Events you emit within spans mimic log lines, except they are naturally associated with a particular request. No correlation ID needed: it’s the span ID, and the tooling handles it for you. Metrics, by contrast, are just aggregations hardcoded into the application, which allows very space-efficient storage at the cost of flexibility. With traces, you can generate any metrics you might care about after the fact. Tracing is kind of a one-stop shop for observability.
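The contrast between pre-aggregated metrics and traces can be shown with a small in-memory model. This is a minimal sketch, not the OpenTelemetry SDK or its data model: `Span`, `Event` and the metric helpers are hypothetical names, and the `route` and `status` attributes are assumptions. Events hang off the span they happened in, and request counts, error rates and latencies are computed from the stored spans afterwards rather than fixed in the application up front.

```python
import statistics
import uuid
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Event:
    """Roughly a log line, but always attached to the span it happened in."""
    name: str
    attrs: dict


@dataclass
class Span:
    """One unit of work, e.g. handling a single HTTP request (hypothetical model)."""
    name: str
    attrs: dict
    duration_ms: float
    # 16 hex characters, the same width as an OpenTelemetry span ID.
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    events: list = field(default_factory=list)

    def add_event(self, name: str, **attrs) -> None:
        # No correlation ID to pass around: the event lives inside the span.
        self.events.append(Event(name, attrs))


# Metrics derived after the fact from stored spans, instead of being
# aggregated inside the application at write time.
def requests_per_route(spans: list) -> dict:
    return dict(Counter(s.attrs["route"] for s in spans))


def error_rate(spans: list, route: str) -> float:
    # Assumes at least one span exists for the route.
    hits = [s for s in spans if s.attrs["route"] == route]
    return sum(s.attrs["status"] >= 500 for s in hits) / len(hits)


def mean_latency_ms(spans: list, route: str) -> float:
    return statistics.mean(s.duration_ms for s in spans if s.attrs["route"] == route)


def count_events(spans: list, name: str) -> int:
    return sum(e.name == name for s in spans for e in s.events)
```

Because the raw spans are kept, a question nobody anticipated (say, the latency of only the failed requests) is just another function over the same data. With pre-aggregated metrics, it would need a code change and would only cover data collected from then on.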

So I set up an OpenTelemetry collector that would receive traces and store them in a ClickHouse database. That part made a lot of sense, and instrumenting my services with the excellent tracing...
