
# Advanced Topics

To scale Kafka, focus on partitioning (key choice and partition count) and on adding brokers. For fault tolerance, rely on replication and committed consumer offsets. To improve performance, batch and compress messages and keep load spread evenly across partitions.

## Kafka Broker Constraints

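* As a rough rule of thumb, a single broker can store **\~1TB** of data and handle **\~10,000 messages/sec**; the real limits depend on hardware and configuration.
* Keep Kafka messages small **(<1MB)** for good performance; Kafka is not meant for storing large files, and the broker's default `message.max.bytes` limit is also about 1MB.
* Send small messages such as pointers instead of large payloads (e.g., store a large video in S3 and publish its location to Kafka).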
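The pointer pattern above can be sketched without any Kafka client. This is a minimal illustration only: the field names, the bucket and object key, and the `MAX_MESSAGE_BYTES` constant are hypothetical and chosen for the example.

```python
import json

# Illustrative ceiling taken from the ~1MB guideline above (an assumption, not a Kafka constant).
MAX_MESSAGE_BYTES = 1_000_000


def build_pointer_message(video_id: str, bucket: str, object_key: str) -> bytes:
    """Serialize a small reference to a blob that lives outside Kafka."""
    payload = {
        "video_id": video_id,
        "location": f"s3://{bucket}/{object_key}",
    }
    encoded = json.dumps(payload).encode("utf-8")
    if len(encoded) >= MAX_MESSAGE_BYTES:
        raise ValueError("message too large; store the body externally and send a pointer")
    return encoded


# Hypothetical bucket and key: the message stays tiny no matter how large the video is.
message = build_pointer_message("v123", "example-videos", "raw/v123.mp4")
print(len(message), "bytes")
```

A consumer would read the `location` field and fetch the object from storage itself, so broker disk and network only ever carry the reference.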

## Scalability

* **Horizontal Scaling**: Add more brokers to distribute load. Ensure enough partitions to utilize all brokers.
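* **Partitioning Strategy**: Choose a key that spreads load evenly (e.g., ad ID). A skewed key can create "hot partitions" that receive a disproportionate share of the traffic.
* **Handling Hot Partitions**: Use random partitioning (no key), salting (appending a random or rotating suffix to the key), or compound keys. These options generally give up strict per-key ordering, because one logical key no longer maps to a single partition.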
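To make the effect of salting concrete, here is a small simulation. It is a sketch under stated assumptions: MD5 stands in for the partitioner's hash (Kafka's default partitioner uses murmur2), and `NUM_PARTITIONS`, `partition_for`, and `salted_key` are hypothetical names, not Kafka APIs.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 6  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (MD5 as a stand-in for murmur2)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def salted_key(key: str, sequence: int, buckets: int) -> str:
    """Append a rotating salt so one hot key becomes up to `buckets` distinct keys."""
    return f"{key}#{sequence % buckets}"


hot_ad = "ad-42"

# Without salting, every event for the hot ad lands on the same partition.
plain = Counter(partition_for(hot_ad) for _ in range(1000))

# With salting, the same traffic is spread across several partitions.
salted = Counter(partition_for(salted_key(hot_ad, i, buckets=8)) for i in range(1000))

print("unsalted partitions used:", len(plain))
print("salted partitions used:", len(salted))
```

The cost is that events for `ad-42` are no longer totally ordered, and any consumer that aggregates per ad has to combine results across all salted variants of the key.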

## Fault Tolerance & Durability

* **Replication**: Each partition is replicated across brokers for durability. The replication factor (e.g., 3) defines how many copies of each partition exist.
  * **Rule of Thumb**: The replication factor cannot exceed the number of brokers in the cluster, because each replica of a partition must live on a different broker.
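* **Producer Acknowledgments (acks)**: Set `acks=all` for maximum durability: the leader waits until all in-sync replicas have confirmed the message before acknowledging it. Pair this with the `min.insync.replicas` setting, which controls how many replicas must be in sync for a write to succeed.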
* **Consumer Recovery**: Consumers commit their offsets back to Kafka, so a consumer that crashes and restarts resumes from its last committed offset. If a consumer in a group fails, its partitions are automatically rebalanced to the remaining consumers.
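The relationship between broker count, replication factor, and `acks=all` can be checked with a little arithmetic. The helper below is a hypothetical sketch (the function `check_replication` is not a Kafka API); it assumes the usual behavior that, with `acks=all`, a write succeeds only while at least `min.insync.replicas` replicas are in sync.

```python
def check_replication(brokers: int, replication_factor: int, min_insync_replicas: int) -> int:
    """Validate the settings and return how many replicas can be lost
    while acks=all writes are still accepted."""
    if replication_factor > brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas


# Producer setting named in the text above; "acks" is a real producer config key.
producer_config = {"acks": "all"}

# A commonly used combination: 3 replicas, 2 of which must be in sync.
tolerated = check_replication(brokers=3, replication_factor=3, min_insync_replicas=2)
print("replicas that can fail while writes continue:", tolerated)
```

With these example numbers, one replica can be offline and producers using `acks=all` still get acknowledgments; losing a second replica blocks writes rather than risking data loss.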

## Errors & Retries

* **Producer Retries**: Kafka producers automatically retry failed sends; the number of attempts and the overall time budget are configurable (e.g., `retries` and `delivery.timeout.ms`).
* **Consumer Retries**: Kafka consumers have no built-in retry mechanism. A common pattern is to retry in the application and publish messages that keep failing to a separate "dead letter queue" (DLQ) topic for later inspection or reprocessing.
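The consumer-side retry and DLQ pattern can be sketched in plain Python. This is an in-memory illustration, not a Kafka client: the message list stands in for a topic, the `dead_letters` list stands in for a DLQ topic, and `consume_with_retries` and `handler` are hypothetical names.

```python
from collections import Counter
from typing import Callable, List, Tuple


def consume_with_retries(
    messages: List[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Process each message, retrying on failure; park exhausted messages in a DLQ list."""
    processed: List[str] = []
    dead_letters: List[str] = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                processed.append(message)
                break
            except Exception:
                # Give up after the last attempt and route the message to the DLQ.
                if attempt == max_attempts:
                    dead_letters.append(message)
    return processed, dead_letters


calls: Counter = Counter()


def handler(message: str) -> None:
    """Toy handler that always fails for messages starting with 'bad'."""
    calls[message] += 1
    if message.startswith("bad"):
        raise ValueError(f"cannot process {message}")


processed, dead_letters = consume_with_retries(["ok-1", "bad-2", "ok-3"], handler)
print(processed, dead_letters)
```

Routing the poisoned message aside lets the consumer keep advancing through the partition instead of blocking on one bad record; a real implementation would also add backoff between attempts.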

## Performance Optimizations

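* **Batching**: Send messages in batches to reduce per-request overhead. Tune the maximum batch size and the maximum time a producer waits to fill a batch (`batch.size` and `linger.ms` in the Java producer) to trade a little latency for throughput.
* **Compression**: Compress messages (e.g., GZIP) to cut network and disk usage at the cost of some CPU. Compression is applied per batch, so larger batches usually compress better.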
* **Partitioning Strategy**: Ensure even distribution across partitions for better parallelism and throughput.
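Why batching and compression work well together can be seen with the standard library alone. The sketch below uses made-up click events and compares compressing each message separately against compressing one batch; it illustrates the principle and does not reproduce Kafka's actual record-batch format.

```python
import gzip
import json

# Hypothetical click events with a repetitive structure, as event streams often have.
messages = [
    json.dumps({"ad_id": f"ad-{i % 10}", "event": "click", "user_id": i}).encode("utf-8")
    for i in range(200)
]

raw = sum(len(m) for m in messages)

# Each message compressed alone pays gzip's fixed overhead and shares no context.
individually = sum(len(gzip.compress(m)) for m in messages)

# One batch compressed together can exploit redundancy across messages.
batched = len(gzip.compress(b"\n".join(messages)))

print(f"raw={raw} individually={individually} batched={batched}")
```

For tiny messages, compressing one at a time can even make them larger than the raw bytes, which is why the two settings are usually tuned together.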

## Retention Policies

* Kafka lets you limit how long messages are kept via `retention.ms` (time-based) or `retention.bytes` (size-based, per partition). By default, messages are retained for 7 days and size-based retention is disabled (`retention.bytes=-1`).
* If you need longer storage, raise the retention settings, but be mindful of storage costs and performance trade-offs.
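A back-of-the-envelope estimate helps when weighing retention against storage cost. The function below is a hypothetical helper, and the inputs (10,000 messages/sec, 1 KB average message, 7-day retention, replication factor 3) are illustrative assumptions that reuse numbers from earlier in these notes; compression and index overhead are ignored.

```python
def retained_bytes(
    messages_per_sec: int,
    avg_message_bytes: int,
    retention_days: int,
    replication_factor: int,
) -> int:
    """Estimate cluster-wide bytes on disk for one topic at steady state."""
    seconds = retention_days * 24 * 60 * 60
    return messages_per_sec * avg_message_bytes * seconds * replication_factor


total = retained_bytes(
    messages_per_sec=10_000,
    avg_message_bytes=1_000,
    retention_days=7,
    replication_factor=3,
)
print(f"{total / 1e12:.2f} TB")  # prints 18.14 TB
```

At the rough ~1TB-per-broker figure quoted earlier, this workload would need to be spread over many brokers, so doubling the retention period directly doubles that footprint.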

