This page documents how schemas evolve over time in the Iceberg C++ library. It covers the data structures that track schema history in TableMetadata, the three update operations (AddSchema, SetCurrentSchema, RemoveSchemas), how schema IDs and field IDs are managed, and the side effects that schema changes have on partition specs and sort orders.
For general schema structure and field lookup, see Schema Definition and Fields. For how update operations are batched into commits, see Transaction System. For the TableMetadataBuilder API in full, see TableMetadata and TableMetadataBuilder.
A table accumulates schemas over its lifetime. The TableMetadata struct stores all historical schemas and tracks which one is currently active.
Schema-related fields in TableMetadata
| Field | Type | Purpose |
|---|---|---|
| schemas | vector<shared_ptr<Schema>> | All schemas ever added to the table |
| current_schema_id | int32_t | ID of the schema currently in use |
| last_column_id | int32_t | Highest field ID ever assigned across all schemas |
Sources: src/iceberg/table_metadata.h93-97
The schemas vector retains old schemas so that older snapshots — which may reference previous schema versions — remain readable. The last_column_id is a monotonically increasing counter; it ensures that no two fields across the entire history of the table ever share the same ID.
TableMetadata schema evolution data model
Sources: src/iceberg/table_metadata.h93-97 src/iceberg/schema.h49-198
Each Schema object carries an integer schema_id. The initial schema of a new table receives Schema::kInitialSchemaId = 0. Subsequent schemas receive IDs assigned by TableMetadataBuilder::Impl::ReuseOrCreateNewSchemaId, which:
1. Uses Schema::SameSchema() to find a structural match ignoring the schema ID.
2. If a match is found, returns the existing ID (making AddSchema idempotent).
3. Otherwise, returns max_existing_id + 1.
Sources: src/iceberg/schema.h51 src/iceberg/schema.h176
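The ID-selection rule can be sketched as follows. This is a minimal illustration, not the library's code: MiniField, MiniSchema, and the free function are hypothetical stand-ins, and the structural comparison here only looks at a flat field list, whereas the real Schema::SameSchema() compares full nested types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the library's field and schema types.
struct MiniField {
  int32_t id;
  std::string name;
};

inline bool operator==(const MiniField& a, const MiniField& b) {
  return a.id == b.id && a.name == b.name;
}

struct MiniSchema {
  int32_t schema_id;
  std::vector<MiniField> fields;

  // Structural comparison that deliberately ignores schema_id.
  bool SameSchema(const MiniSchema& other) const { return fields == other.fields; }
};

constexpr int32_t kInitialSchemaId = 0;

// Returns the ID of a structurally identical stored schema if one exists,
// otherwise one more than the largest stored ID (or the initial ID when
// nothing is stored yet; that fallback is an assumption of this sketch).
int32_t ReuseOrCreateNewSchemaId(const std::vector<MiniSchema>& existing,
                                 const MiniSchema& candidate) {
  int32_t new_id = kInitialSchemaId;
  for (const auto& stored : existing) {
    if (stored.SameSchema(candidate)) {
      return stored.schema_id;  // reuse: adding the same schema twice is a no-op
    }
    if (stored.schema_id >= new_id) {
      new_id = stored.schema_id + 1;
    }
  }
  return new_id;
}
```

Because the candidate's own schema_id is never consulted, a caller can pass any placeholder value there and still get a stable answer.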
Field IDs are stable integers attached to every SchemaField. They are the primary key for matching fields across schema versions — field names may be renamed, but IDs persist.
The TableMetadata::last_column_id field records the highest field ID ever assigned. AddSchema updates this value whenever a schema with higher field IDs is added. No new field may be given an ID ≤ last_column_id.
When TableMetadata::Make creates a new table, it calls AssignFreshIds to normalize all incoming field IDs into a dense sequential range starting at 1.
```cpp
AssignFreshIds(Schema::kInitialSchemaId, schema, next_id)
```
The next_id lambda increments last_column_id and returns the new value. AssignFreshIds traverses the full nested type tree (struct fields, list elements, map keys and values) in depth-first order. After this step, FreshPartitionSpec and FreshSortOrder rebuild the partition spec and sort order using the remapped source field IDs.
Sources: src/iceberg/table_metadata.cc204-233 src/iceberg/test/assign_id_visitor_test.cc91-183
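The counter-style next_id callback and the resulting ID remapping can be illustrated with a flat schema. The sketch below is an assumption-laden simplification: FlatField and AssignFreshIdsFlat are hypothetical names, nested types are left out entirely, and the returned old-to-new map only hints at how a later step could rewrite partition and sort source IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical flat field; the real SchemaField also carries a type and nullability.
struct FlatField {
  int32_t id;
  std::string name;
};

// Replaces every field ID with the next value produced by `next_id` and
// returns the old-to-new mapping so dependent structures can be rewritten.
std::unordered_map<int32_t, int32_t> AssignFreshIdsFlat(
    std::vector<FlatField>& fields, const std::function<int32_t()>& next_id) {
  assert(next_id && "an ID generator is required");
  std::unordered_map<int32_t, int32_t> old_to_new;
  for (auto& field : fields) {
    const int32_t fresh = next_id();
    old_to_new[field.id] = fresh;
    field.id = fresh;
  }
  return old_to_new;
}
```

A caller would typically pass a lambda such as `[&last_column_id]() { return ++last_column_id; }`, so that the counter ends up holding the highest ID handed out.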
Fresh ID assignment traversal order (nested schema)
Sources: src/iceberg/test/assign_id_visitor_test.cc111-170
Schema evolution is expressed through three TableUpdate subclasses. All three implement ApplyTo(TableMetadataBuilder&) and GenerateRequirements(TableUpdateContext&).
Schema-related TableUpdate subclasses
Sources: src/iceberg/table_update.h154-284 src/iceberg/table_update.cc81-225
AddSchema
table::AddSchema::ApplyTo calls TableMetadataBuilder::AddSchema(schema_), which delegates to TableMetadataBuilder::Impl::AddSchema. The internal method:
1. Calls ReuseOrCreateNewSchemaId to find or assign an ID.
2. If the schema is new, appends it to metadata_.schemas and schemas_by_id_, updates last_column_id if the new schema's highest field ID is greater, and records the change in changes_.
3. Sets last_added_schema_id_ to the new ID.
GenerateRequirements calls context.RequireLastAssignedFieldIdUnchanged(), which emits an AssertLastAssignedFieldId requirement. This prevents two concurrent writers from assigning overlapping field IDs.
Sources: src/iceberg/table_update.cc83-107 src/iceberg/table_update.h154-176
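A compact sketch of the bookkeeping AddSchema performs is shown below. SketchSchema and SketchBuilder are hypothetical types invented for illustration; schemas are reduced to a list of field IDs, the change log is reduced to a counter, and setting last_added_schema_id even when an ID is reused is an assumption of this sketch rather than confirmed library behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical schema reduced to its field IDs.
struct SketchSchema {
  int32_t schema_id;
  std::vector<int32_t> field_ids;
};

// Hypothetical builder state mirroring the fields named in the text.
struct SketchBuilder {
  std::vector<SketchSchema> schemas;
  int32_t last_column_id = 0;
  std::optional<int32_t> last_added_schema_id;
  int recorded_changes = 0;  // stands in for the changes_ list

  int32_t AddSchema(SketchSchema schema) {
    // Step 1: reuse the ID of an identical schema, or pick the next free ID.
    int32_t id = 0;
    bool reused = false;
    for (const auto& existing : schemas) {
      if (existing.field_ids == schema.field_ids) {
        id = existing.schema_id;
        reused = true;
        break;
      }
      id = std::max<int32_t>(id, existing.schema_id + 1);
    }
    // Step 2: store a genuinely new schema and raise last_column_id if needed.
    if (!reused) {
      schema.schema_id = id;
      for (int32_t field_id : schema.field_ids) {
        last_column_id = std::max<int32_t>(last_column_id, field_id);
      }
      schemas.push_back(std::move(schema));
      ++recorded_changes;
    }
    // Step 3: remember which schema a later "last added" request refers to.
    last_added_schema_id = id;
    return id;
  }
};
```

Adding the same schema twice leaves the stored list and the change count untouched, which is the idempotence the steps above describe.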
SetCurrentSchema
table::SetCurrentSchema::ApplyTo calls TableMetadataBuilder::SetCurrentSchema(schema_id_), which delegates to TableMetadataBuilder::Impl::SetCurrentSchema. The value schema_id_ may be the literal sentinel kLastAdded = -1, which the builder resolves to last_added_schema_id_.
Side effects: switching the current schema requires rebinding all partition specs and sort orders to the new schema. The builder iterates metadata_.partition_specs and metadata_.sort_orders, rebuilds each one with the same field IDs but validated against the new schema, and replaces the in-memory vectors.
GenerateRequirements calls context.RequireCurrentSchemaIdUnchanged(), emitting an AssertCurrentSchemaID requirement so concurrent modifications do not silently override a schema change.
Sources: src/iceberg/table_metadata.cc913-964 src/iceberg/table_update.cc111-128
RemoveSchemas
table::RemoveSchemas::ApplyTo calls TableMetadataBuilder::RemoveSchemas(schema_ids_). The implementation filters metadata_.schemas to exclude the listed IDs. The current schema cannot be removed; attempting to do so is an error.
GenerateRequirements emits both RequireCurrentSchemaIdUnchanged and RequireNoBranchesChanged. The latter prevents removal of schemas that may still be referenced by active snapshots on branches.
Sources: src/iceberg/table_update.cc206-225 src/iceberg/table_update.h265-284
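The filtering behavior can be sketched with the erase-remove idiom. SchemaEntry and RemoveSchemasSketch are hypothetical names, and reporting the failure as a bool is a simplification chosen for this example; the library's actual error type is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical schema record; only the ID matters for removal.
struct SchemaEntry {
  int32_t schema_id;
};

// Drops every schema whose ID is listed. Returns false and leaves `schemas`
// untouched when the request would remove the current schema.
bool RemoveSchemasSketch(std::vector<SchemaEntry>& schemas, int32_t current_schema_id,
                         const std::unordered_set<int32_t>& ids_to_remove) {
  if (ids_to_remove.count(current_schema_id) > 0) {
    return false;  // the current schema must survive
  }
  schemas.erase(std::remove_if(schemas.begin(), schemas.end(),
                               [&ids_to_remove](const SchemaEntry& entry) {
                                 return ids_to_remove.count(entry.schema_id) > 0;
                               }),
                schemas.end());
  return true;
}
```

IDs in the removal set that match no stored schema are simply ignored in this sketch.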
kLastAdded Sentinel
The internal constant kLastAdded = -1 is used as a placeholder schema ID in SetCurrentSchema to mean "the schema most recently added in this builder session." This avoids the need for the caller to track IDs manually when AddSchema and SetCurrentSchema are chained.
```cpp
// Conceptual usage inside TableMetadataBuilder
builder.AddSchema(new_schema);         // assigns ID, stores as last_added_schema_id_
builder.SetCurrentSchema(kLastAdded);  // resolves to last_added_schema_id_
```
The builder's Impl class maintains std::optional<int32_t> last_added_schema_id_ to track this. If SetCurrentSchema(kLastAdded) is called without a prior AddSchema, the builder records an error that surfaces at Build() time.
Sources: src/iceberg/table_metadata.cc64 src/iceberg/table_metadata.cc913-918
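The sentinel resolution itself is small enough to sketch in isolation. ResolveSchemaId is a hypothetical helper written for this example; it signals the "no prior AddSchema" case with an empty optional, whereas the text above says the real builder records an error that surfaces at Build() time.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr int32_t kLastAdded = -1;

// Maps the sentinel to the most recently added schema ID. Any other value is
// passed through unchanged. An empty result means the sentinel was used
// before any schema had been added in this session.
std::optional<int32_t> ResolveSchemaId(int32_t requested,
                                       const std::optional<int32_t>& last_added) {
  if (requested != kLastAdded) {
    return requested;
  }
  return last_added;  // empty when nothing was added yet
}
```

Keeping last_added as an optional rather than a plain integer makes the "nothing added yet" state impossible to confuse with a real schema ID such as 0.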
The diagram below shows how a schema change flows from a TableUpdate object through the builder to the final metadata, and which requirements it generates for optimistic concurrency control.
Schema evolution operation flow
Sources: src/iceberg/table_update.cc83-128 src/iceberg/table_metadata.cc913-975
SetCurrentSchema Side Effects on Partition Specs and Sort Orders
Changing the current schema requires that all existing partition specs and sort orders remain valid against the new schema. The builder rebuilds them in place: it does not remove them, only re-validates them. If a partition spec or sort order references a source field ID that no longer exists in the new schema, SetCurrentSchema fails.
This validation calls PartitionSpec::ValidatePartitionName for each partition spec against the new schema pointer.
Sources: src/iceberg/table_metadata.cc930-964
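The failure condition described in this section, a source field ID that is missing from the new schema, can be sketched as a simple membership check. SpecSketch and FirstUnboundSpec are hypothetical, the error wording is invented, and the check covers only source IDs; it does not reproduce what PartitionSpec::ValidatePartitionName does.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical partition spec reduced to the source field IDs it references.
struct SpecSketch {
  int32_t spec_id;
  std::vector<int32_t> source_ids;
};

// Returns a description of the first spec that references a field ID absent
// from the new schema, or an empty optional when every spec still binds.
std::optional<std::string> FirstUnboundSpec(
    const std::unordered_set<int32_t>& new_schema_field_ids,
    const std::vector<SpecSketch>& specs) {
  for (const auto& spec : specs) {
    for (int32_t source_id : spec.source_ids) {
      if (new_schema_field_ids.count(source_id) == 0) {
        return "partition spec " + std::to_string(spec.spec_id) +
               " references missing field ID " + std::to_string(source_id);
      }
    }
  }
  return std::nullopt;
}
```

The same shape of check would apply to sort orders, since they also refer to columns by source field ID.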
TableMetadata
TableMetadata provides two accessor methods for schema retrieval:
| Method | Description |
|---|---|
| Schema() | Returns the schema with current_schema_id |
| SchemaById(int32_t schema_id) | Returns the schema with the given ID |
Both return Result<shared_ptr<Schema>> and produce a NotFound error if the ID is absent. The TableMetadataCache class wraps the metadata and lazily builds a SchemasMap (unordered_map<int32_t, shared_ptr<Schema>>) for O(1) lookup by ID via GetSchemasById().
Sources: src/iceberg/table_metadata.h135-140 src/iceberg/table_metadata.h168-199 src/iceberg/table_metadata.cc235-247
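The lazy map construction can be sketched as below. SchemaStub and SchemaCacheSketch are hypothetical; the sketch returns nullptr for a missing ID instead of a Result carrying NotFound, exposes a build counter purely for illustration, and makes no attempt at thread safety.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical schema stand-in that carries only its ID.
struct SchemaStub {
  int32_t schema_id;
};

class SchemaCacheSketch {
 public:
  using SchemasMap = std::unordered_map<int32_t, std::shared_ptr<SchemaStub>>;

  explicit SchemaCacheSketch(std::vector<std::shared_ptr<SchemaStub>> schemas)
      : schemas_(std::move(schemas)) {}

  // Builds the ID index on first use and reuses it afterwards.
  const SchemasMap& GetSchemasById() {
    if (!by_id_.has_value()) {
      SchemasMap index;
      for (const auto& schema : schemas_) {
        index.emplace(schema->schema_id, schema);
      }
      by_id_ = std::move(index);
      ++build_count_;
    }
    return *by_id_;
  }

  // Average O(1) lookup; nullptr stands in for a NotFound error.
  std::shared_ptr<SchemaStub> SchemaById(int32_t schema_id) {
    const auto& index = GetSchemasById();
    auto it = index.find(schema_id);
    return it == index.end() ? nullptr : it->second;
  }

  int build_count() const { return build_count_; }

 private:
  std::vector<std::shared_ptr<SchemaStub>> schemas_;
  std::optional<SchemasMap> by_id_;
  int build_count_ = 0;
};
```

Deferring the index until the first lookup keeps metadata loading cheap for callers that only ever need the current schema.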
The following TableMetadataBuilder methods are relevant to schema evolution:
| Builder Method | Underlying Impl Method | Effect |
|---|---|---|
| AddSchema(shared_ptr<Schema>) | AddSchema(schema, last_column_id) | Adds a schema; reuses ID if equivalent schema exists |
| SetCurrentSchema(shared_ptr<Schema>, int32_t) | AddSchema + SetCurrentSchema(kLastAdded) | Adds schema and immediately sets it as current |
| SetCurrentSchema(int32_t schema_id) | SetCurrentSchema(schema_id) | Sets existing schema as current by ID |
| RemoveSchemas(unordered_set<int32_t>) | RemoveSchemas(schema_ids) | Removes listed schemas (cannot remove current) |
Sources: src/iceberg/table_metadata.h278-321