Apache Pulsar 5.0.0-M1: Introducing Self-Managing Scalable Topics



✨ Apache Pulsar 5.0.0-M1 is here! ✨

The Apache Pulsar community is pleased to announce Apache Pulsar 5.0.0-M1, the first milestone on the road to Pulsar 5.0. It is a preview: an early build that puts the major new features of 5.0 in your hands so you can try them against real workloads and send feedback well ahead of the general-availability (GA) release. It is not meant for production.

5.0 is a major release, and two changes stand out: Scalable Topics — a new kind of topic that grows and shrinks on its own — and the promotion of Oxia to Pulsar's recommended metadata store. This post walks through what's in M1 and how to try it.

A preview, built for feedback

M1 is a milestone build, not a final release. We're publishing it early for one reason: the changes in 5.0 are far-reaching, and we want real-world feedback before they're finalized for GA.

So please run it on non-production clusters, exercise the new APIs, and tell us what works and what doesn't — open issues on GitHub or start a thread on the dev@pulsar.apache.org mailing list. The feedback you give now directly shapes what 5.0 becomes at GA.

Scalable Topics: topics that size themselves

A topic should be a logical concept — a named stream you publish to and consume from. As an application developer, you shouldn't have to think about the infrastructure that makes that stream fast. Yet for its entire history, Pulsar has asked you to answer an infrastructure question up front that has nothing to do with your application: how many partitions?

That single number is a guess that's easy to get wrong and hard to undo. Choose too few and you cap your throughput; choose too many and you pay for overhead a quiet topic never needed. You can raise the count later but never lower it, and changing it breaks per-key ordering. The decision pulls infrastructure concerns into application design — and demands them before you even know how the topic will be used.
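To see why changing the partition count disturbs per-key ordering, here is a minimal sketch of hash-modulo routing. It is an illustration under assumptions, not Pulsar's actual router: the class and method names are hypothetical, and `String.hashCode()` stands in for whatever hash function a real client uses. The point is only the modulo step: when the divisor changes, many keys land on a different partition, so newer messages for a key can be consumed in parallel with older ones still sitting in the previous partition.

```java
import java.util.List;

public class PartitionRemapDemo {

    // Simplified stand-in for hash-based partition routing.
    // Assumption: String.hashCode() replaces the client's real hash function.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts the keys that map to a different partition once the count changes.
    static long movedKeys(List<String> keys, int before, int after) {
        return keys.stream()
                .filter(k -> partitionFor(k, before) != partitionFor(k, after))
                .count();
    }

    public static void main(String[] args) {
        List<String> keys = List.of("user-1", "user-2", "user-3", "user-4", "user-5");
        for (String key : keys) {
            System.out.printf("%s: partition %d of 4, partition %d of 6%n",
                    key, partitionFor(key, 4), partitionFor(key, 6));
        }
        System.out.println("keys that moved: " + movedKeys(keys, 4, 6));
    }
}
```

Any key reported as moved has its history split across two partitions, which is the ordering hazard described above.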

Scalable topics take that decision away. A scalable topic — addressed with the new topic:// scheme — is a single logical stream that Pulsar sizes to its actual load. Internally, it's a set of key-range segments that the broker splits when part of the keyspace gets hot and merges when it goes cold: at runtime, with no downtime, and without ever breaking the ordering of an individual key.
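The key-range idea can be sketched with a small in-memory model. This is an assumption-laden illustration, not the broker's implementation: the class name, the 16-bit hash space, and the use of `String.hashCode()` are all hypothetical, and the sketch tracks only which range owns a hash, not the stored data or the segment DAG. What it does show is that a split or merge touches one range only, so keys elsewhere in the keyspace keep their segment, and every hash always has exactly one owner.

```java
import java.util.TreeMap;

public class SegmentMapSketch {

    // Assumed size of the key-hash space; chosen for illustration only.
    static final int HASH_SPACE = 1 << 16;

    // Active segments keyed by inclusive start; the value is the exclusive end.
    private final TreeMap<Integer, Integer> segments = new TreeMap<>();

    SegmentMapSketch() {
        segments.put(0, HASH_SPACE); // a single segment covers the whole keyspace
    }

    static int hashOf(String key) {
        return Math.floorMod(key.hashCode(), HASH_SPACE);
    }

    // Returns the start of the segment whose range contains this hash.
    int segmentFor(int hash) {
        return segments.floorKey(hash);
    }

    // Splits the segment starting at `start` into two halves of its range.
    void split(int start) {
        Integer end = segments.get(start);
        if (end == null || end - start < 2) {
            throw new IllegalArgumentException("no splittable segment at " + start);
        }
        int mid = start + (end - start) / 2;
        segments.put(start, mid);
        segments.put(mid, end);
    }

    // Merges the segment starting at `start` with its right-hand neighbor.
    void merge(int start) {
        Integer end = segments.get(start);
        Integer neighborEnd = (end == null) ? null : segments.get(end);
        if (neighborEnd == null) {
            throw new IllegalArgumentException("no adjacent pair at " + start);
        }
        segments.remove(end);
        segments.put(start, neighborEnd);
    }

    int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        SegmentMapSketch topic = new SegmentMapSketch();
        topic.split(0);      // upper half of the keyspace becomes its own segment
        topic.split(32768);  // that upper half gets hot and is split again
        System.out.println("segments after splits: " + topic.segmentCount());
        System.out.println("owner of hash 100: " + topic.segmentFor(100));
        topic.merge(32768);  // the hot range cools down and is merged back
        System.out.println("segments after merge: " + topic.segmentCount());
    }
}
```

Because routing is a range lookup rather than a modulo over a count, resizing one range leaves the mapping of every other key untouched.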

The goal is for one topic type to be the right choice in every situation, transparently and out of the box — from a single firehose pushing tens of gigabytes per second to millions of tiny topics that each carry a trickle of traffic. You model your application around the topics that fit your domain, and the system adapts continuously to how they're actually used. No capacity planning, no re-sharding, no hard decisions forced on developers.

Scalable topics are delivered by a family of proposals in 5.0:

PIP-460: Scalable Topics (Topics v5) — the overall model: the segment DAG, range-based key routing, and design principles.

PIP-468: Scalable Topic Controller — the broker-side controller that runs split/merge, assigns consumers, and pushes the live topology to clients.

PIP-483: Auto Split/Merge — splits hot segments and merges cold ones automatically, tunable per broker, namespace, and topic.

PIP-466: New Java Client API (V5) — a modern, sync-first Java client built for scalable topics.

PIP-473: Metadata-Driven Transactions — transactions that survive segment splits and merges.

PIP-475: Regular-to-Scalable Migration — convert an existing topic to a scalable topic in place, with no data copy.

A new client API

The V5 client is groundbreaking work in its own right. Over more than a decade, Pulsar's client API grew one feature at a time, accumulating options, overloads, and subtle inconsistencies along the way. The V5 client is a clean-slate redesign that distills those years of lessons — learned from users running Pulsar in production — into a focused API: it keeps the capabilities that matter and sheds the noise and rough edges that built up over time.

Consumption is the clearest example. The classic client offers a single Consumer shaped by one of four subscription types — Exclusive, Failover, Shared, Key_Shared — plus a separate Reader, with behavior that shifts subtly as you combine options. The V5 client replaces all of that with three purpose-built consumers, each exposing exactly the operations that make sense for it:

Stream consumer — ordered consumption with cumulative acknowledgment.

Queue consumer — parallel, individually-acknowledged work-queue consumption with dead-letter support.

Checkpoint consumer — for stream processors such as Flink and Spark that track their own position.
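The difference between cumulative and individual acknowledgment can be illustrated with two tiny trackers. This is a conceptual sketch only: it does not use the V5 client API, and every class and method name here is hypothetical. It assumes messages are identified by increasing numbers.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class AckSemanticsSketch {

    // Cumulative acknowledgment: a single position covers everything at or before it.
    static final class CumulativeTracker {
        private long ackedUpTo = -1;

        void ack(long messageId) {
            ackedUpTo = Math.max(ackedUpTo, messageId);
        }

        boolean isAcked(long messageId) {
            return messageId <= ackedUpTo;
        }
    }

    // Individual acknowledgment: every delivered message is tracked on its own.
    static final class IndividualTracker {
        private final TreeSet<Long> pending = new TreeSet<>();

        void delivered(long messageId) {
            pending.add(messageId);
        }

        void ack(long messageId) {
            pending.remove(messageId);
        }

        // Messages delivered but not yet acknowledged.
        SortedSet<Long> unacked() {
            return new TreeSet<>(pending);
        }
    }

    public static void main(String[] args) {
        CumulativeTracker stream = new CumulativeTracker();
        stream.ack(5); // acknowledges messages 0 through 5 in one step
        System.out.println("message 3 acked: " + stream.isAcked(3));

        IndividualTracker queue = new IndividualTracker();
        for (long id = 1; id <= 3; id++) {
            queue.delivered(id);
        }
        queue.ack(2); // only message 2 is done; 1 and 3 stay pending
        System.out.println("still unacked: " + queue.unacked());
    }
}
```

Cumulative tracking suits ordered consumption because progress is a single position, while individual tracking lets parallel workers finish messages out of order.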

For now, the classic client API remains fully supported, and existing applications keep working unchanged; scalable topics, however, are available only through the V5 API. Longer term, scalable topics are designed to...
