Apache Flink® Cheatsheet
Key concepts
What is Apache Flink?

Apache Flink is a distributed stream-processing engine built for stateful computations over unbounded and bounded data streams. Unlike batch systems, which wait for a complete dataset before processing, Flink handles each event as it arrives, typically with latencies in the millisecond range, while maintaining persistent state across millions of keys and guaranteeing exactly-once state consistency even after failures.

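Its unified API handles both streaming (continuous, never-ending) and batch (finite, historical) workloads with the same SQL or DataStream code. State is first-class: Flink periodically checkpoints the state of the entire pipeline to durable storage, so after a failure the job restarts from the last completed checkpoint with no loss or duplication in its state. End-to-end exactly-once results additionally require replayable sources (such as Kafka) and transactional or idempotent sinks.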
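
The checkpoint-and-replay idea described above can be illustrated with a toy model. The sketch below is plain Java and does not use Flink's API; the class `CheckpointSketch`, its methods, and the in-memory list standing in for a replayable source are all hypothetical. It assumes a snapshot captures the keyed state and the source offset together, which is the property that lets a restart replay events without double counting.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain Java, NOT Flink's API. All names here are hypothetical.
final class CheckpointSketch {

    /** A consistent snapshot: per-key state plus the source offset it corresponds to. */
    static final class Snapshot {
        private final Map<String, Long> counts;
        private final int offset;

        Snapshot(Map<String, Long> counts, int offset) {
            this.counts = counts;
            this.offset = offset;
        }
    }

    private Map<String, Long> counts = new HashMap<>(); // keyed state: running count per key
    private int offset = 0;                              // next position to read from the source

    /** Process events one at a time, from the current offset up to (not including) end. */
    void processUntil(List<String> source, int end) {
        while (offset < end) {
            counts.merge(source.get(offset), 1L, Long::sum);
            offset++;
        }
    }

    /** Copy state and offset together so they describe the same point in the stream. */
    Snapshot checkpoint() {
        return new Snapshot(new HashMap<>(counts), offset);
    }

    /** Discard in-memory progress and rewind to the snapshot. */
    void restore(Snapshot s) {
        counts = new HashMap<>(s.counts);
        offset = s.offset;
    }

    Map<String, Long> counts() {
        return counts;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "a", "c", "a", "b");
        CheckpointSketch job = new CheckpointSketch();

        job.processUntil(source, 3);
        Snapshot snap = job.checkpoint();          // snapshot taken after 3 events
        job.processUntil(source, 5);               // progress made after the snapshot...
        job.restore(snap);                         // ...is discarded by a simulated crash
        job.processUntil(source, source.size());   // replay from the snapshot's offset

        System.out.println(job.counts());
    }
}
```

After the simulated crash and replay, the counts are a=3, b=2, c=1: the two events processed after the snapshot are replayed once rather than counted twice. Flink's actual mechanism is distributed and asynchronous; this sketch only shows why state and source position need to be snapshotted together.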

When Flink is the right fit

Less ideal for: pure ad-hoc SQL queries (use Trino/Athena), simple periodic batch reports with no streaming requirement (use Spark), or sub-millisecond latency (use in-memory DBs).

Often replaces: Apache Spark Streaming / Structured Streaming, Apache Storm, Apache Samza, or hand-rolled Kafka consumer apps with bespoke state management.

Project resources
Releases & stats
Recent versions: 2.1.0, 2.0.0, 1.20.x, 1.19.x, 1.18.x, 1.17.x
Latest (Apr 2026): 2.1.0
GitHub stars: ~24k
Contributors: ~1,800
ASF top-level project since: 2014