Apache Kafka Cheatsheet

What is Apache Kafka?

Apache Kafka is a distributed event streaming platform built around an immutable, append-only commit log. Producers write records to topics; consumers read at their own pace, each tracking its own position (offset) in the log. The log retains data on disk for a configurable period: hours, days, or indefinitely. Unlike a traditional message queue, Kafka does not delete messages on consumption, so many independent consumers can read and replay the same stream.
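
The append-only log and independent consumer positions can be illustrated with a small in-memory model. This is only a sketch of the idea, not the Kafka client API: the `AppendOnlyLog` class, the consumer names, and the event strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Toy in-memory stand-in for a single partition (not a real Kafka API)."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return records starting at `offset`; reading never removes anything."""
        return self.records[offset:offset + max_records]


log = AppendOnlyLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

# Each consumer keeps its own offset, so consumers progress independently.
offsets = {"billing": 0, "analytics": 0}

billing_batch = log.read(offsets["billing"], max_records=2)
offsets["billing"] += len(billing_batch)      # billing is now at offset 2

analytics_batch = log.read(offsets["analytics"])
offsets["analytics"] += len(analytics_batch)  # analytics is now at offset 3

# Replay: a consumer can rewind to offset 0 and read the same records again.
replayed = log.read(0)
```

Because reading is just a slice over retained records, a slow consumer (`billing`) does not hold back a fast one (`analytics`), and a replay returns exactly what was read the first time.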

Kafka scales horizontally by partitioning each topic across brokers and replicating each partition for durability. It is widely used as the backbone for event-driven architectures: connecting microservices, feeding data lakes, powering change data capture (CDC) pipelines, and serving as the source-of-truth log for stream processors like Flink.
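
Partitioning and replication can be sketched as two small functions. This is a simplified illustration under stated assumptions: `partition_for` and `assign_replicas` are hypothetical helpers, CRC32 stands in for the murmur2 hash that Kafka's Java client applies to record keys, and the round-robin replica placement ignores details such as rack awareness.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition; the same key always maps to the same one.

    CRC32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(key) % num_partitions


def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers, round-robin.

    Returns {partition: [broker, ...]} with the first broker acting as leader
    in this toy model.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


# Records with the same key land in the same partition.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)

# Three partitions, three brokers, two copies of each partition.
assignment = assign_replicas(3, ["b1", "b2", "b3"], 2)
# assignment == {0: ["b1", "b2"], 1: ["b2", "b3"], 2: ["b3", "b1"]}
```

Spreading partitions across brokers is what lets throughput grow with the cluster, while keeping every replica of a partition on a different broker is what lets the data survive a broker failure.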

When Kafka is the right fit

Event backbone - one durable log many services can read independently without point-to-point coupling
CDC pipelines - capture database changes and fan them out to lakes, search indexes, caches
Log aggregation - collect logs and metrics from thousands of hosts into a single replayable stream
Stream processing source - durable input for Flink, Kafka Streams, and Spark, with replay on failure
Decoupled microservices - asynchronous communication where consumers pull at their own pace and can re-read history
High-throughput ingestion - millions of events/sec with horizontal scale-out
Event sourcing - the log itself is the system of record; rebuild state by replay
Audit & compliance - immutable record of every event for retention windows of months or years
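
The "rebuild state by replay" idea behind the event sourcing entry above can be sketched as a fold over the event log. The event shape (`type`, `account`, `amount`) and the `rebuild_balances` function are hypothetical, chosen only for illustration; in a real deployment the events would be read from a topic starting at the earliest offset.

```python
def rebuild_balances(events):
    """Derive current account balances by applying every event in log order."""
    balances = {}
    for event in events:
        account = event["account"]
        if event["type"] == "deposited":
            balances[account] = balances.get(account, 0) + event["amount"]
        elif event["type"] == "withdrawn":
            balances[account] = balances.get(account, 0) - event["amount"]
        # Unknown event types are ignored in this sketch.
    return balances


# The log is the system of record; state is never stored, only derived.
event_log = [
    {"type": "deposited", "account": "a", "amount": 100},
    {"type": "deposited", "account": "b", "amount": 20},
    {"type": "withdrawn", "account": "a", "amount": 30},
]

state = rebuild_balances(event_log)        # {"a": 70, "b": 20}
state_again = rebuild_balances(event_log)  # replaying yields the same state
```

Since the function is deterministic and the log is immutable, a service that loses its local state can recover it by replaying the log from the beginning.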

Less ideal for: request/response RPC (use gRPC), small low-throughput queues with complex routing (use RabbitMQ), or per-message TTL/priority semantics (Kafka preserves order only within a partition and applies retention per topic, not per message).

Often replaces: RabbitMQ, ActiveMQ, IBM MQ, and other traditional message brokers at scale; ad-hoc HTTP webhook fan-out; in-house log shippers feeding a data lake.

Project resources

Releases & stats

Latest release (Apr 2026) - 4.1.0
Recent release lines - 4.1.0, 4.0.0, 3.9.x, 3.8.x, 3.7.x, 3.6.x
GitHub stars - ~30k
Contributors - ~1,400
ASF top-level project since - 2012