BigDATAwire - Data Science • AI • Advanced Analytics
LinkedIn Introduces Northguard, Its Replacement for Kafka
Financial Services
by Alex Woodie | June 25, 2025
Facing scalability limitations with Apache Kafka for log file management, LinkedIn developed a new publish-and-subscribe (pub/sub) system designed to avoid them. The replacement system is called Northguard, and LinkedIn is now actively migrating its Kafka-based data to it through a virtualized pub/sub layer dubbed Xinfra, the company announced today.
When Jay Kreps and his LinkedIn engineer colleagues Jun Rao and Neha Narkhede created Apache Kafka back in 2010, the social media site had 90 million members. At that time, the company struggled with major latency issues as it tried to load about 1 billion files per day into its Hadoop-based data infrastructure. To address this challenge, Kreps and company developed Kafka as a distributed, fault-tolerant, high-throughput, and scalable platform for building real-time data pipelines.
Kafka was a big hit at LinkedIn, as it provided a virtualization layer between the creators (or publishers) of data and the consumers (or subscribers) of data. It was used extensively internally, and was donated to the Apache Software Foundation the following year. Kreps, Rao, and Narkhede left LinkedIn and in 2014 co-founded Confluent, which last year generated nearly $1 billion in revenue.
Over the years, LinkedIn’s business expanded, and Kafka remained a central component of its internal and user-facing systems and applications. However, at some point, the volume of data being generated within LinkedIn surpassed Kafka’s capabilities. Today, with 1.2 billion users, its pub/sub systems are asked to ingest more than 32 trillion records per day, accounting for 17 PB across 400,000 topics, which run on more than 150 clusters accounting for more than 10,000 individual nodes.
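To put those figures in perspective, the short Python sketch below turns the quoted daily totals into average rates. It is a back-of-the-envelope illustration only. It assumes that the 17 PB figure is a daily volume, that a petabyte is 10**15 bytes, and that load is spread evenly across nodes and topics. None of these assumptions is stated by LinkedIn, and the function and variable names are hypothetical.

```python
# Back-of-the-envelope rates derived from the figures quoted in the article.
# Assumptions (not stated in the article): the 17 PB figure is a daily
# volume, a petabyte is 10**15 bytes, and load is spread evenly.

SECONDS_PER_DAY = 24 * 60 * 60

records_per_day = 32 * 10**12   # "more than 32 trillion records per day"
bytes_per_day = 17 * 10**15     # "17 PB", assumed per day, decimal units
topics = 400_000                # "400,000 topics"
nodes = 10_000                  # "more than 10,000 individual nodes"


def daily_scale_summary(records, total_bytes, topic_count, node_count):
    """Return the average rates implied by the daily totals."""
    return {
        "records_per_second": records / SECONDS_PER_DAY,
        "bytes_per_record": total_bytes / records,
        "gigabytes_per_second": total_bytes / SECONDS_PER_DAY / 10**9,
        "records_per_node_per_day": records / node_count,
        "records_per_topic_per_day": records / topic_count,
    }


summary = daily_scale_summary(records_per_day, bytes_per_day, topics, nodes)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions, the totals work out to roughly 370 million records per second, with an average record size of a little over 500 bytes. Because the article gives the record, cluster, and node counts as lower bounds ("more than"), the record-rate figures are floors rather than precise measurements.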
This scale of data has surpassed Kafka’s capabilities, according to LinkedIn engineers Onur Karaman and Xiongqi Wu. “...[A]s LinkedIn grew and our use cases became more demanding, it became increasingly difficult to scale and operate Kafka,” the engineers wrote in a post on the LinkedIn Engineering Blog today. “That’s why we’re moving to the next step on our journey with Northguard, a log storage system with improved scalability and operability.”
The Kafka challenges centered on five main areas, according to Karaman and Wu. Scaling the Kafka clusters became increasingly difficult as LinkedIn added more use cases, which resulted in more data and more metadata. With 150 Kafka clusters to manage, load balancing was also an issue.
The availability of data was also a challenge, particularly since data replication was handled at the individual partition level. Consistency also became a problem, particularly when LinkedIn traded off consistency in favor of availability (due to the aforementioned partition replication issue). Lastly, durability of data suffered from weak guarantees.
“We needed a system that scales well not just in terms of data, but also in terms of its metadata and cluster size, all while supporting lights-out operations with even load distribution by design and fast cluster deployments, regardless of scale,” Karaman and Wu wrote. “Additionally, we required strong consistency in both our data and metadata, along with high throughput, low latency, highly available, high durability, low cost, compatibility with various types of hardware, pluggability, and testability.”
Northguard is a new pub/sub system that will replace Kafka at LinkedIn (Image courtesy LinkedIn)
The solution that Karaman and Wu came up with is a log storage system called Northguard. The engineers describe the core characteristics of the new system:
“To achieve high scalability, Northguard shards its data and metadata, maintains minimal global state,...