# Kafka

The following notes are taken from reading [HelloInterview](https://www.hellointerview.com/learn/system-design/deep-dives/kafka).

## Overview

When an event happens, the producer creates a message (also called a record) and sends it to a Kafka topic. Each message includes a required **value** field and three optional fields:

1. **Key**: Determines which partition the message goes to.
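2. **Timestamp**: Records when the message was created or appended, which helps order messages by time. The position of a message within a partition is given by its offset.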
3. **Headers**: Key-value pairs, similar to HTTP headers, used to store metadata about the message.

<div align="center"><figure><img src="/files/YPiJCnpu7cbw5TOxUVgw" alt="" width="338"><figcaption><p>Kafka Message Structure</p></figcaption></figure></div>
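The message structure above can be sketched as a small data class. This is only an illustrative model, not a class from any Kafka client library: the name `Message`, its field names, and the use of a plain `dict` for headers are assumptions made for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Message:
    """Illustrative model of a Kafka record (hypothetical, not a client API)."""
    value: bytes                                 # required payload
    key: Optional[bytes] = None                  # optional: drives partition choice
    timestamp_ms: Optional[int] = None           # optional: epoch milliseconds
    headers: dict = field(default_factory=dict)  # optional: metadata pairs


# A message that sets every field.
order_event = Message(
    value=b'{"order_id": 42, "status": "created"}',
    key=b"customer-7",
    timestamp_ms=int(time.time() * 1000),
    headers={"source": "checkout-service"},
)

# Only the value is required; the other fields fall back to their defaults.
bare_event = Message(value=b"ping")
```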

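**Partition Assignment:** Kafka assigns each message to a partition based on its key, typically by hashing the key and taking the result modulo the number of partitions. If a message has no key, the partition is chosen by round-robin or another configured rule. Messages with the same key always go to the same partition (as long as the partition count does not change), which keeps them in order.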
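A minimal sketch of this rule, assuming a topic with three partitions. The function name `choose_partition` and the constant `NUM_PARTITIONS` are hypothetical, and CRC32 is used only as a stand-in hash: Kafka's Java client hashes keys with murmur2.

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 3                # hypothetical partition count for the topic
_round_robin = itertools.count()  # shared counter for keyless messages


def choose_partition(key: Optional[bytes], num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition: hash the key if present, otherwise rotate round-robin."""
    if key is not None:
        # Stand-in hash for illustration; real clients use a different function.
        return zlib.crc32(key) % num_partitions
    return next(_round_robin) % num_partitions


# The same key always maps to the same partition.
first = choose_partition(b"customer-7")
second = choose_partition(b"customer-7")
```

Because the partition depends on `num_partitions`, changing the partition count of a topic can move a key to a different partition, which is why per-key ordering only holds while the count stays fixed.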

**Broker Selection:** Using cluster metadata, the producer identifies which broker is responsible for the chosen partition, then sends the message directly to that broker.
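Broker selection amounts to a lookup from partition to broker. The sketch below assumes a cached snapshot of cluster metadata; the table `PARTITION_LEADERS`, the topic name `orders`, the broker addresses, and the function `broker_for` are all made up for illustration.

```python
# Hypothetical metadata snapshot: (topic, partition) -> broker address.
PARTITION_LEADERS = {
    ("orders", 0): "broker-1:9092",
    ("orders", 1): "broker-2:9092",
    ("orders", 2): "broker-3:9092",
}


def broker_for(topic: str, partition: int) -> str:
    """Return the address of the broker that handles the given partition."""
    try:
        return PARTITION_LEADERS[(topic, partition)]
    except KeyError:
        # A real client would refresh its metadata here; this sketch just fails.
        raise LookupError(f"no broker known for {topic}-{partition}") from None


target = broker_for("orders", 1)
```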

<figure><img src="/files/dhAZKxcFFOzxo2kR1N2h" alt=""><figcaption><p>Kafka Architecture</p></figcaption></figure>

## Terminology

### **Kafka Cluster**

* Composed of multiple **brokers**
* More brokers = higher scalability for storage and client handling.

### **Broker**

* The servers (physical/virtual) that hold the "queue".
* Stores data and manages client requests.

### **Partition**

* Ordered, immutable sequence of messages, like a log file.
* Key for scaling, as partitions enable parallel message consumption.
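The log-like behavior of a partition can be sketched as an append-only list in which each message receives the next offset. The class `PartitionLog` and its methods are hypothetical names for this example and ignore storage, replication, and retention.

```python
class PartitionLog:
    """Toy append-only log: messages are only added at the end, never changed."""

    def __init__(self) -> None:
        self._records: list[bytes] = []

    def append(self, value: bytes) -> int:
        """Store a message and return its offset (its position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list[bytes]:
        """Return up to max_records messages starting at the given offset."""
        return self._records[offset:offset + max_records]


# Two partitions can be appended to and read independently, which is what
# lets separate consumers work through a topic in parallel.
partitions = [PartitionLog(), PartitionLog()]
offset_a = partitions[0].append(b"a1")
offset_b = partitions[0].append(b"a2")
partitions[1].append(b"b1")
```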

### **Topic**

* Logical grouping of **partitions**.
* Used for publishing and subscribing to data.
* Supports multiple producers writing data simultaneously.

### **Topic vs Partition**

* **Topic**: Logical organization of messages.
* **Partition**: Physical organization of messages (a topic's partitions can be spread across multiple brokers).

### **Producers and Consumers**

* **Producers**: Write data to topics.
* **Consumers**: Read data from topics.
* Kafka provides APIs for both but leaves message creation/processing to developers.

### **Message Queue vs Stream**

* **Message Queue**: Consumers acknowledge messages after processing.
* **Stream**: Consumers process messages without acknowledgments, enabling complex processing.
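The difference can be sketched with two toy classes. `SimpleQueue` and `SimpleStream` are hypothetical names, not Kafka APIs: the queue drops a message once it is acknowledged, while the stream keeps every message and only records how far each consumer has read. In Kafka itself, consumers track their position by committing offsets rather than acknowledging individual messages.

```python
from collections import deque


class SimpleQueue:
    """Queue semantics: a message is removed once the consumer acknowledges it."""

    def __init__(self, messages):
        self._pending = deque(messages)

    def receive(self):
        """Peek at the next unacknowledged message, or None if the queue is empty."""
        return self._pending[0] if self._pending else None

    def ack(self):
        """Acknowledge the current message, removing it for good."""
        self._pending.popleft()

    def __len__(self):
        return len(self._pending)


class SimpleStream:
    """Stream semantics: messages stay in the log; each consumer tracks an offset."""

    def __init__(self, messages):
        self._log = list(messages)
        self._offsets = {}  # consumer name -> next offset to read

    def poll(self, consumer):
        """Return the next message for this consumer, or None when caught up."""
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._log):
            return None
        self._offsets[consumer] = offset + 1
        return self._log[offset]

    def seek(self, consumer, offset):
        """Move a consumer's position, for example to replay from the start."""
        self._offsets[consumer] = offset


queue = SimpleQueue(["m1", "m2"])
queue.receive()
queue.ack()                        # "m1" is gone after the acknowledgment

stream = SimpleStream(["m1", "m2"])
stream.poll("analytics")           # reads "m1"
stream.poll("billing")             # also reads "m1": consumers are independent
stream.seek("analytics", 0)        # rewind so "analytics" can reprocess "m1"
```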

