This page introduces the Apache Iceberg C++ library (iceberg-cpp): its purpose, key capabilities, major subsystems, library variants, and how it maps to the Apache Iceberg table format specification. For build instructions, see page 2; for detailed architecture, see page 3.1.
iceberg-cpp is a C++ implementation of the Apache Iceberg table format. It provides the data structures, algorithms, and catalog integrations required to read, write, and manage Iceberg tables from C++ applications or engines.
The library is written in C++23, licensed under Apache License 2.0, and is part of the Apache Software Foundation.
Minimum requirements:
| Requirement | Version |
|---|---|
| CMake | 3.25+ |
| GCC | 14+ |
| Clang | 16+ |
| MSVC | 2022+ |
Sources: README.md:29-33
Apache Iceberg defines a table format specification covering schemas, partition specs, sort orders, snapshots, manifest files, and catalog contracts. iceberg-cpp implements this specification in C++:
| Spec Concept | C++ Implementation |
|---|---|
| Table metadata | TableMetadata struct, TableMetadataBuilder class |
| Schema | Schema, SchemaField, Type hierarchy |
| Partition spec | PartitionSpec, PartitionField, Transform |
| Sort order | SortOrder, SortField |
| Snapshot | Snapshot, SnapshotRef, SnapshotSummaryBuilder |
| Manifest file / list | ManifestFile, ManifestEntry, ManifestListWriter, ManifestReader |
| Catalog | Catalog abstract interface |
| Table requirements | TableRequirement hierarchy |
| Table updates | TableUpdate, PendingUpdate hierarchy |
Sources: src/iceberg/type_fwd.h:1-213
The build system produces up to three distinct library targets depending on CMake flags:
ICEBERG_BUILD_STATIC / ICEBERG_BUILD_SHARED → iceberg
ICEBERG_BUILD_REST → iceberg_rest
ICEBERG_BUILD_BUNDLE → iceberg_bundle
Library variant diagram:
Sources: src/iceberg/CMakeLists.txt:147-244
| Library | CMake Flag | Extra Dependencies | Description |
|---|---|---|---|
| iceberg | (default) | nanoarrow, nlohmann_json, CRoaring, zlib | Core types, metadata, catalog interface, expressions, manifests |
| iceberg_rest | ICEBERG_BUILD_REST=ON | cpr, OpenSSL, CURL | REST catalog client |
| iceberg_bundle | ICEBERG_BUILD_BUNDLE=ON | Arrow, Parquet, Avro | Adds Avro/Parquet readers/writers and Arrow FileIO |
Sources: src/iceberg/CMakeLists.txt:173-244, cmake_modules/IcebergThirdpartyToolchain.cmake:512-525
The library is organized into seven functional areas. The diagram below maps subsystem names to their primary source directories and key classes.
Component map:
Sources: src/iceberg/CMakeLists.txt:20-117, src/iceberg/CMakeLists.txt:173-244, src/iceberg/type_fwd.h:1-213
Defined in src/iceberg/type.h and src/iceberg/type_fwd.h. Provides the Iceberg type hierarchy:
- Primitive types: BooleanType, IntType, LongType, FloatType, DoubleType, DecimalType, DateType, TimeType, TimestampType, TimestampTzType, StringType, UuidType, FixedType, BinaryType
- Nested types: StructType, ListType, MapType
- Base classes: Type; primitives derive from PrimitiveType; nested types derive from NestedType

The TypeId enum identifies each type. For complete documentation, see page 3.3.
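To make the shape of such a hierarchy concrete, here is a minimal sketch of a TypeId-tagged class tree with a primitive branch and a nested branch. Everything in the `sketch` namespace is a simplified stand-in written for illustration; the member names and signatures are assumptions and are not copied from src/iceberg/type.h.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical subset of type ids; the real enum covers every Iceberg type.
enum class TypeId { kInt, kString, kList };

// Common base: every type reports its id and whether it is primitive.
class Type {
 public:
  virtual ~Type() = default;
  virtual TypeId type_id() const = 0;
  virtual bool is_primitive() const = 0;
  virtual std::string ToString() const = 0;
};

class PrimitiveType : public Type {
 public:
  bool is_primitive() const override { return true; }
};

class NestedType : public Type {
 public:
  bool is_primitive() const override { return false; }
};

class IntType : public PrimitiveType {
 public:
  TypeId type_id() const override { return TypeId::kInt; }
  std::string ToString() const override { return "int"; }
};

class StringType : public PrimitiveType {
 public:
  TypeId type_id() const override { return TypeId::kString; }
  std::string ToString() const override { return "string"; }
};

// A nested type owns its element type, so types compose recursively.
class ListType : public NestedType {
 public:
  explicit ListType(std::shared_ptr<Type> element) : element_(std::move(element)) {
    assert(element_ != nullptr);
  }
  TypeId type_id() const override { return TypeId::kList; }
  std::string ToString() const override {
    return "list<" + element_->ToString() + ">";
  }

 private:
  std::shared_ptr<Type> element_;
};

}  // namespace sketch
```

Code that needs to branch on a type can switch on `type_id()` instead of using `dynamic_cast`, which is the usual motivation for carrying an id enum alongside the class hierarchy.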
TableMetadata (in src/iceberg/table_metadata.h) is the central data structure holding all table state: schema list, partition specs, sort orders, snapshots, and references. TableMetadataBuilder provides a fluent API for constructing or mutating metadata. For details see page 4.1
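The fluent style mentioned above can be sketched as follows. The `Metadata` struct and `MetadataBuilder` class are hypothetical, heavily reduced stand-ins; the method names are assumptions chosen for the example and do not reflect the actual TableMetadataBuilder signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical, reduced stand-in for table metadata.
struct Metadata {
  std::string location;
  std::map<std::string, std::string> properties;
  std::vector<int64_t> snapshot_ids;
  std::optional<int64_t> current_snapshot_id;
};

class MetadataBuilder {
 public:
  // Build new metadata from scratch.
  MetadataBuilder() = default;
  // Start from existing metadata to express a mutation; the base is copied.
  explicit MetadataBuilder(Metadata base) : metadata_(std::move(base)) {}

  // Each setter returns *this so that calls can be chained.
  MetadataBuilder& SetLocation(std::string location) {
    metadata_.location = std::move(location);
    return *this;
  }
  MetadataBuilder& SetProperty(std::string key, std::string value) {
    metadata_.properties[std::move(key)] = std::move(value);
    return *this;
  }
  MetadataBuilder& AddSnapshot(int64_t snapshot_id) {
    metadata_.snapshot_ids.push_back(snapshot_id);
    metadata_.current_snapshot_id = snapshot_id;
    return *this;
  }

  // Returns a copy, so the builder's input metadata is never modified.
  Metadata Build() const {
    assert(!metadata_.location.empty() && "this sketch requires a location");
    return metadata_;
  }

 private:
  Metadata metadata_;
};

}  // namespace sketch
```

A chained call such as `sketch::MetadataBuilder().SetLocation("s3://bucket/tbl").AddSnapshot(1).Build()` yields a new `Metadata` value; passing that value to a second builder and adding another snapshot leaves the first value unchanged.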
The Table class (src/iceberg/table.h) is the entry point for all table interactions. It exposes factory methods for update operations:
- Table::NewTransaction() → Transaction
- Table::NewUpdateSchema() → UpdateSchema
- Table::NewUpdatePartitionSpec() → UpdatePartitionSpec
- Table::NewFastAppend() → FastAppend
- Table::NewScan() → TableScanBuilder

Transaction (src/iceberg/transaction.h) batches multiple PendingUpdate instances and commits them atomically through Catalog::UpdateTable. For details, see pages 5.1 and 5.2.
Table operation flow:
Sources: src/iceberg/table.h:38-185, src/iceberg/transaction.h:33-154, src/iceberg/transaction.cc:302-343
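The batching behaviour described for Transaction can be illustrated with a small all-or-nothing sketch. The classes below are hypothetical stand-ins: table state is reduced to a property map, and the commit swaps that map directly instead of going through a catalog, so this shows the general pattern only and not the library's commit path.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical table state: just a property map.
using Properties = std::map<std::string, std::string>;

class PendingUpdate {
 public:
  virtual ~PendingUpdate() = default;
  // Applies the change to `state`; returns false if the change is invalid.
  virtual bool Apply(Properties& state) const = 0;
};

class SetProperty : public PendingUpdate {
 public:
  SetProperty(std::string key, std::string value)
      : key_(std::move(key)), value_(std::move(value)) {}
  bool Apply(Properties& state) const override {
    state[key_] = value_;
    return true;
  }

 private:
  std::string key_;
  std::string value_;
};

class RemoveProperty : public PendingUpdate {
 public:
  explicit RemoveProperty(std::string key) : key_(std::move(key)) {}
  // Fails when the key does not exist.
  bool Apply(Properties& state) const override { return state.erase(key_) == 1; }

 private:
  std::string key_;
};

class Transaction {
 public:
  explicit Transaction(Properties& table) : table_(table) {}

  Transaction& Add(std::unique_ptr<PendingUpdate> update) {
    assert(update != nullptr);
    updates_.push_back(std::move(update));
    return *this;
  }

  // Applies every update to a staged copy; the table changes only if all succeed.
  bool Commit() {
    Properties staged = table_;
    for (const auto& update : updates_) {
      if (!update->Apply(staged)) return false;
    }
    table_ = std::move(staged);
    updates_.clear();
    return true;
  }

 private:
  Properties& table_;
  std::vector<std::unique_ptr<PendingUpdate>> updates_;
};

}  // namespace sketch
```

Working on a staged copy means a failing update in the middle of the batch leaves the visible table state untouched, which is the property the word "atomically" refers to.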
Located in src/iceberg/expression/. Provides predicate push-down and filtering:
- Expression: base type for And, Or, Not, Predicate
- UnboundPredicate / BoundPredicate: predicates before and after binding to a Schema
- Evaluators: InclusiveMetricsEvaluator, StrictMetricsEvaluator, ManifestEvaluator, ResidualEvaluator

For details, see pages 7.3 and 7.4.
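The difference between an unbound and a bound predicate can be sketched in a few lines. In this illustration a schema is just a map from column name to row position and rows hold only 64-bit integers; the struct names echo the classes above, but their fields and the `Bind` and `Test` methods are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

namespace sketch {

// Hypothetical schema: column name -> position in a row of int64 values.
using Schema = std::unordered_map<std::string, std::size_t>;
using Row = std::vector<int64_t>;

enum class Op { kLessThan, kEqual, kGreaterThan };

// After binding: refers to a resolved position and can be evaluated.
struct BoundPredicate {
  std::size_t position;
  Op op;
  int64_t literal;

  bool Test(const Row& row) const {
    assert(position < row.size());
    const int64_t value = row[position];
    switch (op) {
      case Op::kLessThan: return value < literal;
      case Op::kEqual: return value == literal;
      case Op::kGreaterThan: return value > literal;
    }
    return false;
  }
};

// Before binding: refers to a column by name only.
struct UnboundPredicate {
  std::string column;
  Op op;
  int64_t literal;

  // Resolves the column name; returns std::nullopt if the schema lacks it.
  std::optional<BoundPredicate> Bind(const Schema& schema) const {
    auto it = schema.find(column);
    if (it == schema.end()) return std::nullopt;
    return BoundPredicate{it->second, op, literal};
  }
};

}  // namespace sketch
```

Binding once and evaluating many times keeps name lookups out of the per-row path, and it surfaces references to unknown columns before any data is read.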
Located in src/iceberg/manifest/. Implements Iceberg manifest file format (V1, V2, V3):
- Data structures: ManifestEntry, DataFile, ManifestFile, ManifestList
- Readers: ManifestReader, ManifestListReader
- Writers: ManifestWriter, ManifestListWriter, RollingManifestWriter

For details, see page 4.8.
FileIO (src/iceberg/file_io.h) is an abstract interface for reading and writing files. ArrowFileSystemFileIO (in src/iceberg/arrow/arrow_fs_file_io.h) is the concrete implementation backed by Apache Arrow's filesystem layer. For details see page 8.5
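As an illustration of why file access sits behind an abstract interface, the sketch below defines a reduced, hypothetical `FileIO` with an in-memory implementation and a helper written only against the interface. The method names and return types are assumptions for this example; the actual signatures in src/iceberg/file_io.h may differ.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical, reduced file I/O interface.
class FileIO {
 public:
  virtual ~FileIO() = default;
  // Returns the file contents, or std::nullopt if the file does not exist.
  virtual std::optional<std::string> ReadFile(const std::string& location) = 0;
  virtual bool WriteFile(const std::string& location, std::string contents) = 0;
  virtual bool DeleteFile(const std::string& location) = 0;
};

// In-memory implementation, convenient for tests.
class InMemoryFileIO : public FileIO {
 public:
  std::optional<std::string> ReadFile(const std::string& location) override {
    auto it = files_.find(location);
    if (it == files_.end()) return std::nullopt;
    return it->second;
  }
  bool WriteFile(const std::string& location, std::string contents) override {
    assert(!location.empty());
    files_[location] = std::move(contents);
    return true;
  }
  bool DeleteFile(const std::string& location) override {
    return files_.erase(location) == 1;
  }

 private:
  std::map<std::string, std::string> files_;
};

// Written against the interface only, so any FileIO implementation works.
bool CopyFile(FileIO& io, const std::string& from, const std::string& to) {
  std::optional<std::string> contents = io.ReadFile(from);
  if (!contents) return false;
  return io.WriteFile(to, std::move(*contents));
}

}  // namespace sketch
```

Because `CopyFile` depends only on the abstract class, the same code would run unchanged against an implementation backed by a local disk or an object store.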
Catalog (src/iceberg/catalog.h) is the abstract interface for namespace and table management. Two implementations are provided:
- InMemoryCatalog (src/iceberg/catalog/memory/in_memory_catalog.cc): in-process, for testing
- RestCatalog (in iceberg_rest): HTTP-based catalog following the Iceberg REST Catalog specification

For details, see pages 6.1 and 6.2.
All fallible operations return Result<T> or Status (defined in src/iceberg/result.h):
Result<T> = std::expected<T, Error>
Status = Result<void>
Error carries an ErrorKind enum and a message string. Common error kinds include kNotFound, kInvalidSchema, kCommitFailed, kIOError, and kValidationFailed. Utility macros ICEBERG_RETURN_UNEXPECTED and ICEBERG_ASSIGN_OR_RAISE simplify error propagation. For details see page 3.2
Sources: src/iceberg/result.h:30-129
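The value-or-error style and macro-based propagation described above can be sketched as follows. So that it also compiles as C++17, the sketch uses a small `std::variant`-based `Result` as a stand-in for `std::expected`. The macro `SKETCH_ASSIGN_OR_RETURN` and the functions are hypothetical names invented for this example and are not the library's macros; only two of the error kinds named above are reproduced.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

namespace sketch {

enum class ErrorKind { kNotFound, kValidationFailed };

struct Error {
  ErrorKind kind;
  std::string message;
};

// Stand-in for std::expected<T, Error>: holds either a value or an error.
template <typename T>
class Result {
 public:
  Result(T value) : storage_(std::move(value)) {}
  Result(Error error) : storage_(std::move(error)) {}

  bool has_value() const { return std::holds_alternative<T>(storage_); }
  const T& value() const {
    assert(has_value());
    return std::get<T>(storage_);
  }
  const Error& error() const {
    assert(!has_value());
    return std::get<Error>(storage_);
  }

 private:
  std::variant<T, Error> storage_;
};

// Hypothetical propagation macro: on failure, return the error to the caller;
// otherwise bind the value to `lhs`. Expands to several statements, so it must
// not be used as the sole body of an unbraced if/for.
#define SKETCH_ASSIGN_OR_RETURN(lhs, expr)                    \
  auto lhs##_result = (expr);                                 \
  if (!lhs##_result.has_value()) return lhs##_result.error(); \
  auto lhs = lhs##_result.value()

// Parses a string of decimal digits (no overflow check; sketch only).
Result<int> ParseNonNegative(const std::string& text) {
  if (text.empty()) return Error{ErrorKind::kValidationFailed, "empty input"};
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') {
      return Error{ErrorKind::kValidationFailed, "not a number: " + text};
    }
    value = value * 10 + (c - '0');
  }
  return value;
}

// The first failing parse short-circuits; no exceptions are involved.
Result<int> Sum(const std::string& a, const std::string& b) {
  SKETCH_ASSIGN_OR_RETURN(x, ParseNonNegative(a));
  SKETCH_ASSIGN_OR_RETURN(y, ParseNonNegative(b));
  return x + y;
}

}  // namespace sketch
```

With these definitions, `sketch::Sum("2", "40")` holds the value 42, while `sketch::Sum("2", "x")` holds an error of kind `kValidationFailed` that originated in the second parse and was passed up unchanged.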
src/iceberg/
├── arrow/ # ArrowFileSystemFileIO, Arrow metadata utilities (bundle only)
├── avro/ # AvroReader, AvroWriter, schema utilities (bundle only)
├── catalog/ # Catalog interface; memory/ and rest/ subdirs
├── expression/ # Expression, Predicate, Literal, evaluators
├── manifest/ # ManifestReader, ManifestWriter, ManifestEntry
├── parquet/ # ParquetReader, ParquetWriter (bundle only)
├── row/ # StructLike, ArrowArrayWrapper, PartitionValues
├── update/ # PendingUpdate subclasses (UpdateSchema, FastAppend, etc.)
└── util/ # Bucket hashing, decimal, UUID, temporal, URL encoding, etc.
Sources: src/iceberg/CMakeLists.txt:165-171, src/iceberg/CMakeLists.txt:241-243
The table below maps Iceberg concepts to the primary C++ classes a consumer will interact with.
| Concept | Primary Class(es) | Header |
|---|---|---|
| Table entry point | Table, StagedTable, StaticTable | iceberg/table.h |
| Atomic changes | Transaction | iceberg/transaction.h |
| Update operations | UpdateSchema, UpdatePartitionSpec, UpdateSortOrder, UpdateProperties, FastAppend, ExpireSnapshots | iceberg/update/ |
| Schema | Schema, SchemaField | iceberg/schema.h |
| Types | Type, StructType, ListType, MapType, primitive types | iceberg/type.h |
| Table metadata | TableMetadata, TableMetadataBuilder | iceberg/table_metadata.h |
| Partitioning | PartitionSpec, PartitionField, Transform | iceberg/partition_spec.h |
| Snapshots | Snapshot, SnapshotRef | iceberg/snapshot.h |
| Catalog | Catalog, InMemoryCatalog, RestCatalog | iceberg/catalog.h |
| Scan planning | TableScanBuilder, DataTableScan, FileScanTask | iceberg/table_scan.h |
| Predicates | Expression, UnboundPredicate, BoundPredicate, Literal | iceberg/expression/ |
| File access | FileIO, Reader, Writer | iceberg/file_io.h |
| Error handling | Result<T>, Status, Error, ErrorKind | iceberg/result.h |
Sources: src/iceberg/type_fwd.h:27-213