Kafka
The following notes are all taken from reading HelloInterview:
Overview
When an event happens, the producer creates a message (also called a record) and sends it to a Kafka topic. Each message includes a required value field and three optional fields:
Key: Determines which partition the message goes to.
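Timestamp: Records when the message was created or appended to the log. Order within a partition is defined by each message's offset, so the timestamp is only a hint about ordering.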
Headers: Key-value pairs, similar to HTTP headers, used to store metadata about the message.
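The four fields above can be pictured as a small data class. This is only an illustrative sketch: the Record class and its field names are hypothetical and do not come from any Kafka client library, and filling in a missing timestamp with the current time is an assumption made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a Kafka message; not a real client class."""

    value: bytes                          # required payload
    key: Optional[bytes] = None           # optional: used to pick the partition
    timestamp_ms: Optional[int] = None    # optional: defaults to "now" below
    headers: List[Tuple[str, bytes]] = field(default_factory=list)  # optional metadata

    def __post_init__(self) -> None:
        # Assumption for this sketch: stamp the record at creation time if
        # the caller did not supply a timestamp.
        if self.timestamp_ms is None:
            self.timestamp_ms = int(time.time() * 1000)


# A message with every field set, and one with only the required value.
order_event = Record(
    value=b'{"order_id": 42, "status": "created"}',
    key=b"user-17",
    headers=[("trace-id", b"abc123")],
)
minimal = Record(value=b"ping")
```

Only value has no default, which mirrors the "one required, three optional" shape described above.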

Partition Assignment: The producer assigns each message to a partition based on its key, by hashing the key and taking the result modulo the number of partitions. If a message has no key, the producer falls back to round-robin or another configured partitioning strategy. Messages with the same key go to the same partition as long as the partition count does not change, which keeps them in order.
Broker Selection: The producer uses cluster metadata to find the broker that leads the chosen partition, then sends the message directly to that broker.
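The two steps above can be sketched in a few lines. This is a simplified model under stated assumptions, not the real client logic: zlib.crc32 stands in for the hash the Kafka client actually uses, and NUM_PARTITIONS, PARTITION_LEADERS, choose_partition, and route are hypothetical names with made-up broker addresses.

```python
import itertools
import zlib
from typing import Dict, Optional, Tuple

NUM_PARTITIONS = 3

# Hypothetical cluster metadata: partition number -> broker that leads it.
PARTITION_LEADERS: Dict[int, str] = {
    0: "broker-1:9092",
    1: "broker-2:9092",
    2: "broker-3:9092",
}

_round_robin = itertools.count()


def choose_partition(key: Optional[bytes], num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition: round-robin without a key, stable hash with one."""
    if key is None:
        return next(_round_robin) % num_partitions
    # crc32 is a stand-in hash; any deterministic hash shows the same idea.
    return zlib.crc32(key) % num_partitions


def route(key: Optional[bytes]) -> Tuple[int, str]:
    """Return the chosen partition and the broker to send the message to."""
    partition = choose_partition(key)
    return partition, PARTITION_LEADERS[partition]
```

Calling route(b"user-17") repeatedly returns the same partition and broker every time, while successive route(None) calls cycle through the partitions. Changing NUM_PARTITIONS changes where keyed messages land, which is why the ordering guarantee depends on a stable partition count.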

Terminology
Kafka Cluster
Composed of multiple brokers
More brokers means more capacity for storing data and serving clients.
Broker
An individual server (physical or virtual) that holds the "queue".
Stores data and serves client requests.
Partition
Ordered, immutable sequence of messages, like a log file.
Key for scaling, as partitions enable parallel message consumption.
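A partition's log-like behavior can be modeled with a toy in-memory class. This is a minimal sketch for intuition only: PartitionLog and its methods are hypothetical names, and a real partition is persisted to disk and replicated rather than held in a Python list.

```python
from typing import List


class PartitionLog:
    """Toy model of one partition: append-only, read by offset."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        """Add a message at the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[bytes]:
        """Return up to max_messages starting at offset; nothing is removed."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._messages[offset:offset + max_messages]


log = PartitionLog()
first = log.append(b"a")    # offset 0
second = log.append(b"b")   # offset 1
```

Because reading never deletes anything, several consumers can read the same partition at different offsets, and separate partitions can be read in parallel.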
Topic
Logical grouping of partitions.
Used for publishing and subscribing to data.
Supports multiple producers writing data simultaneously.
Topic vs Partition
Topic: Logical organization of messages.
Partition: Physical organization of messages. Each partition is stored on a broker, and a topic's partitions can be spread across multiple brokers.
Producers and Consumers
Producers: Write data to topics.
Consumers: Read data from topics.
Kafka provides APIs for both but leaves message creation/processing to developers.
Message Queue vs Stream
Message Queue: Consumers acknowledge each message after processing it (in Kafka, by committing its offset).
Stream: Consumers read and process messages continuously without acknowledging each one, which enables more complex processing.
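The difference between the two styles can be sketched against a plain list that stands in for one partition. Everything here is an assumption made for illustration: consume_as_queue and consume_as_stream are hypothetical helpers, the committed offset plays the role of the acknowledgment, and the sliding window is just one example of "more complex processing".

```python
from typing import Callable, List

messages: List[str] = ["m0", "m1", "m2", "m3"]   # one partition, indexed by offset


def consume_as_queue(log: List[str], committed: int,
                     handler: Callable[[str], None]) -> int:
    """Process messages one by one, acknowledging each after it succeeds."""
    for offset in range(committed, len(log)):
        try:
            handler(log[offset])
        except Exception:
            break                     # not acknowledged, so it will be retried
        committed = offset + 1        # acknowledge: advance the committed offset
    return committed


def consume_as_stream(log: List[str], start: int, window: int) -> List[List[str]]:
    """Read without per-message acknowledgment, grouping into sliding windows."""
    return [log[i:i + window] for i in range(start, len(log) - window + 1)]
```

In the queue style, a handler failure leaves the committed offset pointing at the failed message, so a later call resumes there. In the stream style the same messages can be read again from any starting offset, which is what makes windowed or aggregate computations possible.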