Apache Iceberg Cheatsheet

What is Apache Iceberg?

Apache Iceberg is an open table format for huge analytic datasets stored on object storage (S3, GCS, ADLS) or HDFS. It sits above the data files (Parquet, ORC, Avro) and provides a transactional metadata layer that turns a pile of immutable files into a real table with ACID semantics, schema evolution, and time travel.

Unlike Hive tables - which track partitions in a metastore and rely on directory listings - Iceberg keeps an explicit, versioned tree of snapshots, manifests, and data files. That lets any compliant engine (Spark, Flink, Trino, Snowflake, DuckDB) read and write the same table safely, with predictable performance and no expensive directory scans.
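The snapshot → manifest → data file tree described above can be sketched with a toy in-memory model. This is an illustration only, not the Iceberg library API: `ToyTable`, `Snapshot`, `Manifest`, `DataFile`, and `plan_scan` are hypothetical names, and the per-column min/max bounds stand in for the column statistics that real manifests record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    # Per-column (min, max) bounds, standing in for manifest column stats.
    bounds: Dict[str, Tuple[int, int]]


@dataclass(frozen=True)
class Manifest:
    files: Tuple[DataFile, ...]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    manifests: Tuple[Manifest, ...]


class ToyTable:
    """Hypothetical toy model: every commit adds a new immutable snapshot."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Snapshot] = {}
        self.current_id: Optional[int] = None

    def append(self, files: List[DataFile]) -> int:
        # A new snapshot reuses the parent's manifests and adds one more.
        parent = self.snapshots.get(self.current_id)
        inherited = parent.manifests if parent else ()
        new_id = len(self.snapshots) + 1
        self.snapshots[new_id] = Snapshot(
            new_id, self.current_id, inherited + (Manifest(tuple(files)),)
        )
        self.current_id = new_id
        return new_id

    def plan_scan(self, column: str, lo: int, hi: int,
                  snapshot_id: Optional[int] = None) -> List[str]:
        # Planning reads metadata only: no directory listing is involved.
        sid = snapshot_id if snapshot_id is not None else self.current_id
        snap = self.snapshots[sid]
        selected = []
        for manifest in snap.manifests:
            for f in manifest.files:
                b = f.bounds.get(column)
                # Keep files without stats; prune files whose range cannot match.
                if b is None or (b[0] <= hi and b[1] >= lo):
                    selected.append(f.path)
        return selected


table = ToyTable()
first = table.append([DataFile("a.parquet", {"day": (1, 10)})])
table.append([DataFile("b.parquet", {"day": (11, 20)})])
print(table.plan_scan("day", 12, 15))                     # current snapshot
print(table.plan_scan("day", 1, 20, snapshot_id=first))   # older snapshot
```

Because old snapshots are never mutated, reading an earlier `snapshot_id` is just a lookup, which is the idea behind time travel in this model.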

When Iceberg is the right fit

Open lakehouse - one table queryable from Spark, Flink, Trino, Snowflake, BigQuery without lock-in
Multi-engine writes - safe concurrent ingestion from streaming and batch jobs into the same table
Schema evolution - add, drop, or rename columns as metadata-only changes, without rewriting data files
Time travel & audit - query any past snapshot for debugging, compliance, or reproducible ML training
Row-level mutations - UPDATE, DELETE, MERGE in a data-lake setting (not just append-only)
Streaming sinks - Flink and Spark Structured Streaming commit many small files continuously; periodic compaction merges them into larger ones
Partition evolution - change how a table is partitioned without rewriting historical data
Petabyte tables - efficient planning without listing directories; manifests prune at file granularity
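As an illustration of why the schema evolution item above needs no data rewrite: Iceberg identifies columns by stable field IDs, not by name. The sketch below is a toy model of that idea, not Iceberg's API. `read_rows`, the dict-based schemas, and the sample rows are all hypothetical.

```python
from typing import Dict, List, Optional

# Toy "data file": values are keyed by field ID, never by column name.
DATA_FILE_ROWS: List[Dict[int, object]] = [
    {1: 101, 2: "click"},
    {1: 102, 2: "view"},
]


def read_rows(schema: Dict[str, int],
              rows: List[Dict[int, object]]) -> List[Dict[str, Optional[object]]]:
    """Project stored rows through a schema that maps column name -> field ID.

    A field ID missing from the file (a column added later) reads as None.
    """
    return [{name: row.get(fid) for name, fid in schema.items()} for row in rows]


# Original schema.
schema_v1 = {"id": 1, "event": 2}
# Evolved schema: "event" renamed to "event_type", "country" added as field 3.
schema_v2 = {"id": 1, "event_type": 2, "country": 3}
# Evolved again: "event_type" dropped; the stored bytes are untouched.
schema_v3 = {"id": 1, "country": 3}

for schema in (schema_v1, schema_v2, schema_v3):
    print(read_rows(schema, DATA_FILE_ROWS))
```

The same rows are read under all three schemas; only the name-to-ID mapping changes, which is why such changes stay in metadata in this model.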

Less ideal for: OLTP / point-row lookups (use a database), sub-second query latency on small data (use a warehouse or in-memory engine), or tiny datasets where the metadata overhead is bigger than the data itself.

Often replaces: Hive-managed Parquet/ORC tables, directory-layout-based Parquet datasets on S3/GCS/ADLS, and proprietary warehouse-only formats when an open, multi-engine table layer is wanted.

Project resources

Releases & stats

Latest (Mar 2026) - 1.9.0
GitHub stars - ~7k
Contributors - ~580
ASF top-level since - 2020