Datadog costs 7× more than self-hosting for the same telemetry


The Price of Observability: Why Your Monitoring Bill Exceeds Your Infrastructure Bill (2026) — CoreTechX

Executive Summary

There is a number that doesn't appear on any vendor pricing page, but which every CTO at scale eventually discovers: observability now consumes 15–25% of total cloud infrastructure spend at the median mid-market SaaS company. For some, it's higher. I've spoken with teams where Datadog alone exceeds their AWS compute bill. Not their cloud bill — their compute bill. The part that actually runs the product.

This isn't an accident. The SaaS observability pricing model (per-host fees stacked on per-GB ingestion, stacked on per-event indexing, stacked on per-metric cardinality charges) is structurally designed to grow faster than the infrastructure it monitors. Every new microservice, every autoscaling event, every developer who adds a customer_id tag to a metric: the bill ticks up. The infrastructure scales with revenue. The monitoring bill scales with both revenue and architectural complexity, and in modern distributed systems complexity grows faster than revenue.

This briefing runs the numbers. We model five observability solutions — Datadog, Grafana Cloud, Splunk Observability, New Relic, and a self-built LGTM stack (Loki + Grafana + Tempo + Mimir) — across three realistic workload scales. We include infrastructure costs, licensing, and the fully-burdened engineering overhead of self-hosting. The gap between the cheapest and most expensive option is not a percentage. It's an order of magnitude.

The Pricing Models: How We Got Here

To understand why observability bills spiral, you have to understand the pricing models. Each vendor charges for a different combination of dimensions. The bill isn't one number — it's a matrix multiplication.

Vendor Pricing Anatomy (Public Rates, June 2026)

| Vendor | Host-Based | Ingest-Based | The Trap |
| --- | --- | --- | --- |
| Datadog | Infra $15–23/host/mo; APM $31–40/host/mo; DB Monitoring $70/host/mo | Logs: $0.10/GB ingested + $1.70/million events indexed; APM: $1.70/million spans | Custom metrics: OTel data billed as custom. High-cardinality tags multiply metric count silently. |
| Splunk Observability | Infra $15/host/mo; APM+Infra $60/host/mo; Full-Stack $75/host/mo | Traces: $50–100/million + Splunk Cloud logging $130–180/GB/day | Splunk Cloud + Observability are separate products. Combined, the per-GB-day model is brutal at scale. |
| New Relic | Per-user pricing (free for 1 full user) | Data: $0.30–0.50/GB ingested; Free tier: 100 GB/mo | 100 GB cap is easily breached. Above it, per-GB pricing converges with Datadog. |
| Grafana Cloud | $15/active user/mo + $19/mo platform fee | Metrics: $8/1K series; Logs: $0.50/GB; Traces: $0.50/GB | Adaptive Metrics/Logs reduce bill but require configuration. Free tier: 10K series, 50 GB logs, 50 GB traces. |
| Self-Built LGTM | $0 (software is free) | $0 (you own the pipe) | Engineering time: 10–20 hrs/mo SRE at $75–150/hr. ClickHouse/TSDB tuning is not a side project. |
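The self-built row's engineering-time estimate can be turned into a monthly dollar range by multiplying the two ranges end to end. The sketch below uses only the hours and hourly rates quoted in that row; the function name is illustrative, and the result covers labor only, not the infrastructure a self-hosted stack runs on.

```python
def monthly_labor_cost(hours_range, rate_range):
    """Return (low, high) monthly labor cost from (min, max) hours and (min, max) hourly rates."""
    low = hours_range[0] * rate_range[0]
    high = hours_range[1] * rate_range[1]
    return low, high


# 10-20 SRE hours per month at $75-150 per hour, as quoted in the table.
low, high = monthly_labor_cost((10, 20), (75, 150))

print(f"monthly: ${low:,} to ${high:,}")
print(f"annual:  ${low * 12:,} to ${high * 12:,}")
```

Under those inputs the labor line runs from $750 to $3,000 per month, or $9,000 to $36,000 per year.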

Sources: Vendor public pricing pages accessed June 2026. Self-built estimates based on community benchmarks and our own infrastructure modeling. Enterprise discounts can reduce list prices by 20–40% at scale, but the relative ordering between vendors is stable.

The structural problem across all SaaS vendors is the same: the cost function is multiplicative, while the value function is sub-linear. Every new dimension you pay for (hosts, gigabytes, events, metrics, traces, users) multiplies against the others. A single poorly-tagged metric with a user_id dimension on Datadog can generate millions of unique time series and add five figures to an annual bill overnight. I've seen it happen. The engineer who added the tag had no idea. The billing system noticed immediately.
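To see why one tag can do that, note that the number of distinct time series for a metric is bounded by the product of the unique values of each tag. The sketch below is illustrative only: the tag names and cardinalities are hypothetical, and the product is an upper bound, since not every combination necessarily occurs in practice.

```python
from math import prod


def series_count(tag_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time series for one metric:
    the product of the number of unique values per tag."""
    return prod(tag_cardinalities.values())


# Hypothetical cardinalities, chosen only for illustration.
before = {"service": 15, "endpoint": 20, "status_code": 5}
after = {**before, "user_id": 10_000}

print(series_count(before))  # 1500
print(series_count(after))   # 15000000
```

Adding a single tag with 10,000 distinct values takes this metric from 1,500 possible series to 15 million, with no change to the code that emits it beyond the extra tag.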

Workload A: The 20-Person Seed-Stage Startup

15 services · 30 hosts (K8s) · 50 GB logs/day · 1M spans/day · 14-day retention

Scenario

A seed-stage SaaS. Fifteen Node.js microservices running on a 30-node Kubernetes cluster. Log volume is moderate — around 50 GB/day uncompressed (structured JSON, ~500 events/second at 1 KB each). They're generating 1 million spans per day from their service mesh. They need 14-day retention for logs and traces, 30-day for metrics. Team of 20 engineers, no dedicated SRE — the CTO is on call.
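The scenario's volumes can be sanity-checked with a few lines of arithmetic. This sketch assumes 1 KB means 1,000 bytes and a 30-day month; neither is stated in the scenario, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30          # assumption: 30-day billing month

events_per_second = 500      # from the scenario
event_size_bytes = 1_000     # assumption: 1 KB = 1,000 bytes

# Daily volume implied by the quoted event rate and event size.
events_per_day = events_per_second * SECONDS_PER_DAY
log_gb_per_day = events_per_day * event_size_bytes / 1e9

# Monthly quantities, using the scenario's rounded 50 GB/day for logs.
monthly_log_gb = 50 * DAYS_PER_MONTH
monthly_events_millions = events_per_day * DAYS_PER_MONTH / 1e6
monthly_spans_millions = 1 * DAYS_PER_MONTH   # 1M spans/day

print(f"{log_gb_per_day:.1f} GB/day implied by the event rate")
print(f"{monthly_log_gb} GB logs/month, {monthly_events_millions:.0f}M events/month, "
      f"{monthly_spans_millions}M spans/month")
```

The event rate works out to roughly 43 GB/day, which the scenario rounds up to about 50 GB/day. At that rounded figure the month comes to 1,500 GB of logs and 30 million spans, alongside roughly 1.3 billion log events.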

At this scale, the cost differences between vendors are real but not yet existential. The operational overhead of self-hosting is the dominant variable, not the SaaS premium.

| Solution | Monthly Cost | Annual | Notes |
| --- | --- | --- | --- |
| Datadog | $4,200 | $50,400 | Infra Pro ($15/host × 30) + APM ($31/host × 30) + logs ($0.10/GB × 1,500 GB/mo) + spans ($1.70/M × 30M). No custom metric overages yet. |
| Splunk Observability | $1,800 | $21,600 | App & Infra tier ($60/host × 30). Traces included at this volume. Logs via separate Splunk Cloud would add substantially. |
| New Relic | $750 | $9,000 | Free tier (100 GB) covers first ~2 days. Overages at $0.30/GB for remaining 1,400 GB = $420. Plus 3 paid users at $49/mo each. |
| Grafana Cloud | $850 | $10,200 | Logs: $0.50/GB × 1,500 GB = $750. Traces: $0.50/GB × ~50 GB estimated =... |
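The line items named in the notes can be summed directly. The sketch below covers only the rates and quantities the Datadog and Splunk notes spell out, so it gives the itemized portion of each bill and is not a reconstruction of the full monthly figure; the dictionary keys are illustrative labels.

```python
HOSTS = 30

# Only the components itemized in the Datadog note.
datadog_items = {
    "infra_pro": 15 * HOSTS,       # $15/host x 30 hosts
    "apm": 31 * HOSTS,             # $31/host x 30 hosts
    "log_ingest": 0.10 * 1_500,    # $0.10/GB x 1,500 GB/mo
    "spans": 1.70 * 30,            # $1.70/M x 30M spans
}
datadog_itemized = round(sum(datadog_items.values()), 2)

# The Splunk note lists a single component.
splunk_itemized = 60 * HOSTS       # $60/host x 30 hosts

for name, cost in datadog_items.items():
    print(f"{name:<12} ${cost:>8,.2f}")
print(f"Datadog itemized total: ${datadog_itemized:,.2f}")
print(f"Splunk itemized total:  ${splunk_itemized:,.2f}")
```

The Splunk line reproduces the table's $1,800. The four Datadog components sum to $1,581; the difference from the table's $4,200 is not itemized in the note, though the pricing anatomy table also lists a $1.70/million fee for indexed log events.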
