
LinkedIn Debuts Northguard and Xinfra to Replace Kafka for Enterprise-Scale Log Handling


LinkedIn has unveiled Northguard, an internal log storage system, and Xinfra, a new virtualization layer. Both are designed to replace Apache Kafka inside LinkedIn and to address the platform’s growing scalability and operability challenges.

After relying on Kafka for over 15 years, LinkedIn ran into limitations as its operational scale expanded: the platform now serves more than 1.2 billion members and processes tens of petabytes of data daily. These pressures motivated the development of Northguard, which introduces a new architecture aimed at high throughput and ease of management.

At the core of Northguard is a decentralized metadata management system that shards both data and metadata. By minimizing global state and balancing load through log striping and segment-level replication, the design avoids bottlenecks common in Kafka’s partition-based design. It also uses Raft consensus for fault tolerance and supports storage policies that govern data placement, replication, and retention.
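The idea of log striping with segment-level replication can be illustrated with a minimal sketch. This is not Northguard's placement logic, which the article does not describe; it assumes rendezvous hashing as a stand-in, and the names `pick_replicas` and `stripe_log` are hypothetical. The point is only that each segment gets its own replica set, so one log's load spreads across many brokers instead of pinning to a fixed set.

```python
import hashlib


def pick_replicas(segment_id: str, brokers: list[str], replication_factor: int) -> list[str]:
    """Choose brokers for one segment via rendezvous hashing.

    Assumption: this hashing scheme is illustrative only, not Northguard's policy.
    """
    def score(broker: str) -> str:
        # Each (segment, broker) pair gets a deterministic pseudo-random score.
        return hashlib.sha256(f"{segment_id}:{broker}".encode()).hexdigest()

    # The brokers with the lowest scores host this segment's replicas.
    return sorted(brokers, key=score)[:replication_factor]


def stripe_log(log_name: str, num_segments: int, brokers: list[str],
               replication_factor: int = 3) -> dict[str, list[str]]:
    """Map every segment of a log to its own independently chosen replica set."""
    placement = {}
    for i in range(num_segments):
        segment_id = f"{log_name}/seg-{i}"
        placement[segment_id] = pick_replicas(segment_id, brokers, replication_factor)
    return placement


if __name__ == "__main__":
    brokers = [f"broker-{n}" for n in range(6)]
    for segment, replicas in stripe_log("page-views", 4, brokers).items():
        print(segment, "->", replicas)
```

Because placement is decided per segment rather than per partition, adding a broker only affects the segments that hash to it, which is one way to reduce the need for a global rebalancing step.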

Complementing Northguard is Xinfra, a virtualization system for publish/subscribe infrastructure. Xinfra abstracts the underlying log systems, enabling applications to interact with unified Pub/Sub interfaces regardless of whether the data is stored in Kafka or Northguard clusters. Key features include transparent topic migration, dual writes during rollouts, epoch-based ordering, and unified metadata services. It also manages consumer group functionality using MySQL, Vitess, and ZooKeeper with low-latency caching from Couchbase.
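A toy model can make the dual-write and epoch ideas more concrete. Everything here is an assumption made for illustration: the classes `InMemoryLog` and `VirtualTopic`, their methods, and the rule that the epoch increments on every backend change are hypothetical and do not reflect Xinfra's actual API or protocol.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryLog:
    """Stand-in for a physical cluster (for example Kafka or Northguard)."""
    name: str
    records: list = field(default_factory=list)

    def append(self, record) -> None:
        self.records.append(record)


@dataclass
class VirtualTopic:
    """Hypothetical virtual topic that hides which physical log holds the data."""
    name: str
    backends: list   # physical logs currently receiving writes
    epoch: int = 0   # assumed to increase on every backend change

    def begin_migration(self, target: InMemoryLog) -> None:
        # Dual-write phase: the old and new backends both receive records.
        self.backends.append(target)
        self.epoch += 1

    def finish_migration(self) -> None:
        # Cut over: only the newest backend keeps receiving writes.
        self.backends = self.backends[-1:]
        self.epoch += 1

    def publish(self, payload: str) -> tuple:
        # Tag each record with the current epoch so readers can order
        # records that span more than one physical log.
        record = (self.epoch, payload)
        for backend in self.backends:
            backend.append(record)
        return record


if __name__ == "__main__":
    kafka, northguard = InMemoryLog("kafka"), InMemoryLog("northguard")
    topic = VirtualTopic("orders", [kafka])
    topic.publish("a")
    topic.begin_migration(northguard)
    topic.publish("b")              # written to both backends
    topic.finish_migration()
    topic.publish("c")              # written to Northguard only
    print(sorted(set(kafka.records + northguard.records)))
```

In this sketch the application only ever calls `publish` on the virtual topic, so the migration is invisible to it; a reader that merges both logs can drop the duplicates produced during dual writes and order the rest by epoch.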

Together, these innovations allow LinkedIn to transition thousands of mission-critical topics to Northguard without service interruptions. Xinfra currently supports over 90% of LinkedIn’s internal applications, ensuring a smooth migration path for existing Kafka deployments. The shift signifies a broader trend within LinkedIn’s infrastructure: prioritizing decentralized control, operational simplicity, and system resilience. Future enhancements include support for auto-scaling topics and even greater fault tolerance for virtualized workloads.

Reactions from the engineering community have been mixed. While Northguard addresses significant scalability issues, some developers have raised integration concerns because it diverges from Kafka’s existing ecosystem. LinkedIn has mitigated these challenges through Xinfra’s compatibility layer, though some translation complexity remains.

Introduced during an internal meetup in April, the announcement featured talks led by senior engineers Onur Karaman (Northguard) and Wesley Wu (Xinfra), both formerly core contributors to Kafka’s scalability features within LinkedIn.

Northguard and Xinfra mark LinkedIn’s move beyond Kafka, its own open-source contribution, toward an in-house streaming infrastructure built to sustain future growth. Although the platform is not open source, its design may influence next-generation Pub/Sub solutions across the industry. As large-scale event streaming becomes a critical enterprise need, LinkedIn’s experience may offer valuable lessons in designing scalable, resilient, and manageable log systems.
