Kafka

The following notes are all taken from reading HelloInterview:

Overview

When an event happens, the producer creates a message (also called a record) and sends it to a Kafka topic. Each message includes a required value field and three optional fields:

  1. Key: Determines which partition the message goes to.

  2. Timestamp: Records when the message was created or appended to the log. Ordering within a partition comes from each message's offset, not its timestamp.

  3. Headers: Key-value pairs, similar to HTTP headers, used to store metadata about the message.
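To make the four fields concrete, here is a minimal sketch of a record as a plain Python dataclass. The `Message` class and its field names are hypothetical and used only for illustration; this is not the class any real Kafka client exposes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Message:
    """Illustrative stand-in for a Kafka record; not a real client class."""
    value: bytes                                   # required payload
    key: Optional[bytes] = None                    # optional: drives partition choice
    timestamp_ms: Optional[int] = None             # optional: creation or append time
    headers: Tuple[Tuple[str, bytes], ...] = ()    # optional: metadata pairs


# A keyed message with one header, and a message carrying only a value.
order_event = Message(
    value=b'{"order_id": 42, "status": "created"}',
    key=b"customer-7",
    headers=(("content-type", b"application/json"),),
)
keyless_event = Message(value=b"heartbeat")
```

Only `value` has no default, which mirrors the "one required, three optional" shape described above.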

Kafka Message Structure

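Partition Assignment: The producer assigns each message to a partition based on its key, typically by hashing the key. If a message has no key, the partitioner spreads messages across partitions using round-robin or another configured strategy. Messages with the same key go to the same partition as long as the partition count stays the same, which keeps them in order.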
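The assignment rule can be sketched in a few lines. This is a simplified illustration, not Kafka's actual partitioner: `NUM_PARTITIONS` and `choose_partition` are hypothetical names, and `zlib.crc32` stands in for the hash (the Java client's default partitioner uses murmur2).

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 4  # hypothetical partition count for the topic
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key: Optional[bytes]) -> int:
    """Pick a partition index for a message.

    Keyed messages: stable hash of the key, modulo the partition count.
    Keyless messages: rotate through the partitions in turn.
    """
    if key is None:
        return next(_round_robin)
    # crc32 is a stand-in hash chosen for this sketch only.
    return zlib.crc32(key) % NUM_PARTITIONS


same_a = choose_partition(b"customer-7")
same_b = choose_partition(b"customer-7")
keyless = [choose_partition(None) for _ in range(NUM_PARTITIONS)]
```

Because the result depends on `NUM_PARTITIONS`, changing the partition count can move a key to a different partition, which is why the same-key guarantee only holds while the count is fixed.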

Broker Selection: The producer uses cluster metadata to identify the broker that currently leads the partition, then sends the message directly to that broker.
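Broker selection amounts to a metadata lookup. The sketch below assumes a hard-coded table; the topic name, broker addresses, and `leader_for` helper are all hypothetical, and a real client would fetch and refresh this mapping from the cluster.

```python
# Hypothetical cluster metadata: (topic, partition) -> leading broker address.
PARTITION_LEADERS = {
    ("orders", 0): "broker-1:9092",
    ("orders", 1): "broker-2:9092",
    ("orders", 2): "broker-3:9092",
}


def leader_for(topic: str, partition: int) -> str:
    """Return the broker that currently handles the given partition."""
    try:
        return PARTITION_LEADERS[(topic, partition)]
    except KeyError:
        raise LookupError(f"no known leader for {topic}-{partition}") from None


# The producer would open a connection to this address and send the message there.
target = leader_for("orders", 1)
```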

Kafka Architecture

Terminology

Kafka Cluster

  • Composed of multiple brokers

  • More brokers = higher scalability for storage and client handling.

Broker

  • The servers (physical/virtual) that hold the "queue".

  • Stores data and manages client requests.

Partition

  • Ordered, immutable sequence of messages, like a log file.

  • Key for scaling, as partitions enable parallel message consumption.
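The "log file" analogy can be pictured as an append-only list where each record's position is its offset. `PartitionLog` is a toy in-memory class made up for this note, with none of Kafka's persistence or replication.

```python
from typing import List


class PartitionLog:
    """Toy in-memory append-only log; illustrative only."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, value: bytes) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[bytes]:
        """Return up to max_records starting at offset; stored records never change."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
first = log.append(b"a")    # offset 0
second = log.append(b"b")   # offset 1
tail = log.read(offset=1)   # everything from offset 1 onward
```

Since each partition is an independent log, different consumers can read different partitions at the same time, which is where the parallelism comes from.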

Topic

  • Logical grouping of partitions.

  • Used for publishing and subscribing to data.

  • Supports multiple producers writing data simultaneously.

Topic vs Partition

  • Topic: Logical organization of messages.

  • Partition: Physical organization of messages (a topic's partitions can be spread across multiple brokers).

Producers and Consumers

  • Producers: Write data to topics.

  • Consumers: Read data from topics.

  • Kafka provides APIs for both but leaves message creation/processing to developers.

Message Queue vs Stream

  • Message Queue: Consumers acknowledge messages after processing.

  • Stream: Messages stay in the log after being read, so consumers process them without per-message acknowledgments, enabling more complex processing.
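One way to picture the difference, as a simplified in-memory sketch rather than how Kafka implements either mode. The consumer names and the `consume_and_ack` and `poll` helpers are hypothetical.

```python
from collections import deque

# Queue style: a message is dropped once the consumer acknowledges it.
queue = deque([b"m0", b"m1", b"m2"])


def consume_and_ack(q: deque) -> bytes:
    message = q[0]   # receive the next message without removing it yet
    # ... process the message here ...
    q.popleft()      # acknowledge: the queue discards it
    return message


# Stream style: the data is retained; each consumer tracks its own read position.
stream = [b"m0", b"m1", b"m2"]
positions = {"analytics": 0, "billing": 0}  # hypothetical consumer names


def poll(consumer: str) -> bytes:
    message = stream[positions[consumer]]
    positions[consumer] += 1   # advance only this consumer's position
    return message


acked = consume_and_ack(queue)
seen_by_analytics = [poll("analytics"), poll("analytics")]
seen_by_billing = poll("billing")
```

After this runs, the queue has lost its first message, while the stream still holds all three and the two consumers sit at different positions in the same data.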
