Apache Iceberg Cheatsheet
Key concepts
What is Apache Iceberg?

Apache Iceberg is an open table format for very large analytic datasets stored on object storage (S3, GCS, ADLS) or HDFS. It sits above the data files (Parquet, ORC, Avro) and adds a transactional metadata layer, so that a collection of immutable files behaves like a single table with ACID semantics, schema evolution, and time travel.

Unlike Hive tables, which track partitions in a metastore and rely on directory listings to find files, Iceberg keeps an explicit, versioned tree of snapshots, manifests, and data files. Any compliant engine (Spark, Flink, Trino, Snowflake, DuckDB) can therefore work against the same table safely, with the level of write support depending on the engine, and can plan queries from metadata with predictable performance instead of running expensive directory scans.
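The tree of snapshots, manifests, and data files can be pictured with a small in-memory model. This is a toy sketch, not the Iceberg API: `ToyTable`, `append`, and `plan_files` are hypothetical names, the file paths are made up, and the real format adds a manifest-list level, column statistics, and an atomic metadata-pointer swap that are all omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str           # hypothetical location of an immutable data file
    record_count: int


@dataclass(frozen=True)
class Manifest:
    data_files: Tuple[DataFile, ...]   # files tracked by this manifest


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifests: Tuple[Manifest, ...]    # full table state at this version


class ToyTable:
    """Toy model: every commit creates a new immutable snapshot."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Snapshot] = {}
        self.current_snapshot_id: Optional[int] = None

    def append(self, files: List[DataFile]) -> int:
        # Reuse the parent's manifests and add one new manifest for the
        # appended files; older snapshots are never modified.
        parent = self.snapshots.get(self.current_snapshot_id)
        inherited = parent.manifests if parent else ()
        new_id = len(self.snapshots) + 1
        self.snapshots[new_id] = Snapshot(
            new_id, inherited + (Manifest(tuple(files)),)
        )
        self.current_snapshot_id = new_id
        return new_id

    def plan_files(self, snapshot_id: Optional[int] = None) -> List[str]:
        # Walk snapshot -> manifests -> data files. No directory listing
        # is needed because the metadata names every file explicitly.
        sid = self.current_snapshot_id if snapshot_id is None else snapshot_id
        if sid is None:
            return []
        snap = self.snapshots[sid]
        return [f.path for m in snap.manifests for f in m.data_files]


table = ToyTable()
first = table.append([DataFile("s3://example-bucket/t/a.parquet", 100)])
second = table.append([DataFile("s3://example-bucket/t/b.parquet", 50)])
current_files = table.plan_files()        # state after both commits
older_files = table.plan_files(first)     # "time travel" to the first commit
```

Reading the current state and reading an older state use the same lookup, which is why time travel comes almost for free once snapshots are explicit and immutable.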

When Iceberg is the right fit

Less ideal for: OLTP / point-row lookups (use a database), sub-second query latency on small data (use a warehouse or in-memory engine), or tiny datasets where the metadata overhead is bigger than the data itself.

Often replaces: Hive-managed Parquet/ORC tables, directory-layout-based Parquet datasets on S3/GCS/ADLS, and proprietary warehouse-only formats when an open, multi-engine table layer is wanted.

Project resources
Releases & stats
Release lines: 1.9.0, 1.8.x, 1.7.x, 1.6.x, 1.5.x, 1.4.x
Latest release: 1.9.0 (Mar 2026)
GitHub stars: ~7k
Contributors: ~580
ASF top-level project since: 2020