This page explains the architecture and workflow for scanning Iceberg tables and reading data files in the C++ implementation. It covers the complete read path from configuring a table scan through planning file tasks to reading Arrow batches.
The query planning system is organized into several subsystems, each documented in detail in child pages:
- TableScanBuilder configuration API, TableScan and DataTableScan implementations
- FileScanTask structure, ArrowArrayStream integration for reading data

For related topics, see page 4 (Table Metadata System) for snapshot and manifest structure, and page 8 (File Format Support) for Parquet and Avro reader implementations.
The query planning pipeline transforms a scan configuration into executable file read tasks through several stages:
Pipeline Stages
| Stage | Components | Purpose |
|---|---|---|
| Configuration | TableScanBuilder | Accumulates scan parameters: filters, projections, snapshot selection, options |
| Planning | TableScan, ManifestGroup | Loads manifests, applies multi-level filtering, produces FileScanTask objects |
| Execution | FileScanTask, Reader, ArrowArrayStream | Opens data files and streams Arrow batches to the user |
The architecture emphasizes:
- TableScan is immutable once built, enabling safe concurrent use

Sources: src/iceberg/table_scan.h129-332 src/iceberg/table_scan.cc213-502
Query Planning Class Hierarchy
Sources: src/iceberg/table_scan.h104-332 src/iceberg/table_scan.cc213-502
Key Components
| Component | Class/Struct | Location | Purpose |
|---|---|---|---|
| Scan builder | TableScanBuilder | src/iceberg/table_scan.h129-268 | Fluent API for configuring scans |
| Scan base | TableScan | src/iceberg/table_scan.h271-314 | Abstract scan with common functionality |
| Data scan | DataTableScan | src/iceberg/table_scan.h317-331 | Concrete implementation using ManifestGroup |
| Scan task | FileScanTask | src/iceberg/table_scan.h60-101 | Executable unit: one data file + deletes + filter |
| Scan context | internal::TableScanContext | src/iceberg/table_scan.h106-125 | Holds all scan parameters |
| Manifest processor | ManifestGroup | manifest/manifest_group.h | Coordinates manifest reading and filtering |
Sources: src/iceberg/table_scan.h60-332 src/iceberg/table_scan.cc213-502
End-to-End Scan Flow
Sources: src/iceberg/table_scan.cc213-372 src/iceberg/table_scan.cc475-502
Pipeline Stages
The scan execution progresses through distinct stages:
1. Configuration Stage (TableScanBuilder)
2. Planning Stage (DataTableScan::PlanFiles()) — builds a ManifestGroup with the scan configuration and produces FileScanTask objects
3. Execution Stage (FileScanTask::ToArrow()) — opens the data file and returns an ArrowArrayStream

Sources: src/iceberg/table_scan.h129-332 src/iceberg/table_scan.cc213-502
The TableScanBuilder provides a fluent API for configuring table scans. It accumulates parameters in a TableScanContext and validates them before producing an immutable TableScan.
Builder Pattern Overview
Common Configuration Patterns
| Configuration | Builder Calls | Result |
|---|---|---|
| Full table scan | Make(metadata, io).Build() | All columns, current snapshot, no filter |
| Filtered scan | Filter(expr).Build() | Apply predicate during planning and reading |
| Column subset | Select({"col1", "col2"}).Build() | Only selected columns plus filter columns |
| Time travel | UseSnapshot(id).Build() | Scan specific snapshot by ID |
| Case-insensitive | CaseSensitive(false).Build() | Case-insensitive column name matching |
For detailed documentation of all builder methods, validation rules, and schema resolution logic, see page 7.1 (Table Scan).
Sources: src/iceberg/table_scan.h129-268 src/iceberg/table_scan.cc213-372 src/iceberg/test/table_scan_test.cc267-325
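The configuration patterns above can be sketched with a minimal toy builder. This is illustrative only: the class and member names (ToyScanBuilder, snapshot_id_, and so on) are stand-ins, not the actual iceberg-cpp API, but the shape — chained setters accumulating state, then Build() freezing it — mirrors the pattern described.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Illustrative stand-in for TableScanBuilder: each setter returns *this so
// calls can be chained, and Build() freezes the accumulated state.
class ToyScanBuilder {
 public:
  ToyScanBuilder& Select(std::vector<std::string> columns) {
    columns_ = std::move(columns);
    return *this;
  }
  ToyScanBuilder& Filter(std::string expr) {
    filter_ = std::move(expr);
    return *this;
  }
  ToyScanBuilder& UseSnapshot(int64_t id) {
    snapshot_id_ = id;
    return *this;
  }
  ToyScanBuilder& CaseSensitive(bool v) {
    case_sensitive_ = v;
    return *this;
  }

  struct ToyScan {  // immutable result of Build()
    std::vector<std::string> columns;
    std::string filter;
    std::optional<int64_t> snapshot_id;
    bool case_sensitive;
  };
  ToyScan Build() const { return {columns_, filter_, snapshot_id_, case_sensitive_}; }

 private:
  std::vector<std::string> columns_;    // empty = all columns
  std::string filter_;                  // empty = no filter
  std::optional<int64_t> snapshot_id_;  // unset = current snapshot
  bool case_sensitive_ = true;
};
```

A "time travel" configuration from the table would then read `ToyScanBuilder{}.UseSnapshot(42).Build()`.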
The DataTableScan class implements PlanFiles() to produce FileScanTask objects. It delegates to ManifestGroup for the actual work.
PlanFiles() Execution Flow
Sources: src/iceberg/table_scan.cc475-502
Schema Projection
The scan computes a projected schema that includes:
- Columns explicitly selected via Select()
- Columns referenced by the filter expression

The projection logic is in TableScan::ResolveProjectedSchema() at src/iceberg/table_scan.cc413-453.
Example
```
Schema: { id: int, data: string, value: long }
Select: ["data"]
Filter: Equal("id", 42)

→ Projected schema: { id: int, data: string }
  (includes both selected and filter-referenced columns)
```
For details on projection computation and the GetProjectedIdsVisitor, see page 7.4 (Schema Projection and Type Utilities).
Sources: src/iceberg/table_scan.cc413-502 src/iceberg/test/table_scan_test.cc669-770
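The union of selected and filter-referenced columns can be sketched as follows. The helper is illustrative, not the actual ResolveProjectedSchema() code, and column names stand in for field IDs.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative projection: keep columns that are either selected or
// referenced by the filter, preserving the table schema's field order.
std::vector<std::string> ProjectColumns(
    const std::vector<std::string>& schema_order,
    const std::vector<std::string>& selected,
    const std::vector<std::string>& filter_refs) {
  auto wanted = [&](const std::string& name) {
    return std::find(selected.begin(), selected.end(), name) != selected.end() ||
           std::find(filter_refs.begin(), filter_refs.end(), name) != filter_refs.end();
  };
  std::vector<std::string> out;
  for (const auto& name : schema_order) {
    if (wanted(name)) out.push_back(name);
  }
  return out;
}
```

Applied to the example above, selecting `data` with a filter on `id` yields `{id, data}`.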
A FileScanTask represents an executable unit of work: reading one data file with associated delete files and a residual filter.
FileScanTask Structure
Sources: src/iceberg/table_scan.h60-101 src/iceberg/table_scan.cc178-211
Key Concepts
| Component | Description |
|---|---|
| data_file_ | The primary data file to read (Parquet or Avro) |
| delete_files_ | Position and equality delete files that apply to this data file |
| residual_filter_ | Filter expression that must be evaluated during file reading |
| ToArrow() | Opens the file and returns an ArrowArrayStream for reading batches |
The task encapsulates everything needed to read a file.
For detailed documentation of FileScanTask, ArrowArrayStream integration, and reading patterns, see page 7.2 (File Scan Tasks).
Sources: src/iceberg/table_scan.h60-101 src/iceberg/table_scan.cc178-211 src/iceberg/test/file_scan_task_test.cc43-186
The ManifestGroup class coordinates reading manifests, applying filters, and producing FileScanTask objects. It implements multi-level filtering to minimize I/O.
ManifestGroup Role in Planning
Sources: src/iceberg/table_scan.cc475-502
Multi-Level Filtering
The filtering happens at progressively finer granularity:
| Level | Component | Operates On | Optimization |
|---|---|---|---|
| 1 | ManifestEvaluator | Entire manifest files | Skip reading manifests that can't match partition filters |
| 2 | ManifestReader | Manifest entries | Filter entries based on partition values and file statistics |
| 3 | DeleteFileIndex | Data files | Match applicable delete files to each data file |
| 4 | ResidualEvaluator | Row data | Compute filter remaining after partition/stats pruning |
For detailed documentation of the expression framework, predicate binding, and multi-level filtering, see page 7.3 (Predicate Pushdown and Filtering).
Sources: src/iceberg/table_scan.cc475-502
Column Selection for Manifests
The scan requests different columns from manifest files based on configuration:
Sources: src/iceberg/table_scan.cc43-58 src/iceberg/table_scan.cc455-502
Once planning produces FileScanTask objects, reading data involves opening files and streaming Arrow batches.
Data Read Flow
Sources: src/iceberg/table_scan.cc195-211 src/iceberg/table_scan.cc60-156
ArrowArrayStream C ABI
The ArrowArrayStream struct provides a C ABI for streaming Arrow data:
| Callback | Purpose | Implementation |
|---|---|---|
| get_schema | Returns the Arrow schema | Calls reader->Schema() |
| get_next | Returns next batch or null if done | Calls reader->Next() |
| get_last_error | Returns error message string | Returns stored error from private data |
| release | Cleans up resources | Deletes ReaderStreamPrivateData |
The callbacks operate on ReaderStreamPrivateData at src/iceberg/table_scan.cc61-73, which stores the Reader and the last error message.
Reading Example
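A minimal, self-contained consumption sketch is shown below. The ArrowArrayStream, ArrowArray, and ArrowSchema layouts follow the Arrow C data/stream interface specification; the toy producer is only a stand-in for the stream a FileScanTask::ToArrow() call would return.

```cpp
#include <cassert>
#include <cstdint>

// Struct layouts from the Arrow C data/stream interface specification.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  ArrowSchema** children;
  ArrowSchema* dictionary;
  void (*release)(ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

struct ArrowArrayStream {
  int (*get_schema)(ArrowArrayStream*, ArrowSchema* out);
  int (*get_next)(ArrowArrayStream*, ArrowArray* out);
  const char* (*get_last_error)(ArrowArrayStream*);
  void (*release)(ArrowArrayStream*);
  void* private_data;
};

// Toy producer: emits a fixed number of empty batches, then end-of-stream.
// A real stream fills the batch buffers with column data.
namespace toy {
int get_next(ArrowArrayStream* stream, ArrowArray* out) {
  int* remaining = static_cast<int*>(stream->private_data);
  if (*remaining == 0) {
    out->release = nullptr;  // a released array marks end of stream
    return 0;
  }
  --*remaining;
  *out = ArrowArray{};
  out->release = [](ArrowArray* a) { a->release = nullptr; };
  return 0;
}
}  // namespace toy

// Standard consumption loop: call get_next until out.release == nullptr.
int CountBatches(ArrowArrayStream* stream) {
  int batches = 0;
  ArrowArray batch;
  while (true) {
    if (stream->get_next(stream, &batch) != 0) return -1;  // error
    if (batch.release == nullptr) break;                   // end of stream
    ++batches;
    batch.release(&batch);  // the consumer must release each batch
  }
  return batches;
}
```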
For details on FileScanTask structure and the ArrowArrayStream integration, see page 7.2 (File Scan Tasks). For reader implementations, see page 8 (File Format Support).
Sources: src/iceberg/table_scan.cc60-211 src/iceberg/test/file_scan_task_test.cc129-146
A Snapshot represents a point-in-time view of table data. Each snapshot references a manifest list file containing metadata about all data and delete files.
Sources: src/iceberg/snapshot.h383-454
The SnapshotCache class provides lazy-loading of manifests from the manifest list:
The cache partitions manifests into data and delete manifests during initialization, storing them in a single vector with data manifests at the head and delete manifests at the tail.
Key Methods:
- Manifests(): Returns all manifest files (data + delete)
- DataManifests(): Returns only data manifests
- DeleteManifests(): Returns only delete manifests

Sources: src/iceberg/snapshot.h459-504 src/iceberg/snapshot.cc223-277
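The single-vector layout described above (data manifests at the head, delete manifests at the tail) can be sketched with a partition at construction time. This is an illustrative stand-in, not the SnapshotCache implementation.

```cpp
#include <algorithm>
#include <vector>

// Illustrative manifest record; the real ManifestFile carries paths,
// partition summaries, and so on. Here only the content kind matters.
enum class ManifestContent { kData, kDeletes };

struct ToyManifest {
  ManifestContent content;
};

// Mimics the single-vector layout: data manifests at the head, delete
// manifests at the tail, with the split point remembered once.
struct ToyManifestCache {
  std::vector<ToyManifest> all;
  size_t num_data = 0;

  explicit ToyManifestCache(std::vector<ToyManifest> manifests)
      : all(std::move(manifests)) {
    auto mid = std::stable_partition(
        all.begin(), all.end(),
        [](const ToyManifest& m) { return m.content == ManifestContent::kData; });
    num_data = static_cast<size_t>(mid - all.begin());
  }

  size_t DataManifestCount() const { return num_data; }
  size_t DeleteManifestCount() const { return all.size() - num_data; }
};
```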
ManifestGroup is the central coordinator for reading manifests and producing scan tasks. It applies filters at multiple levels and matches delete files to data files.
Sources: src/iceberg/manifest/manifest_group.h56-110 src/iceberg/manifest/manifest_group.cc40-85
The ManifestGroup applies filters at three progressively finer levels:
| Level | Component | Filter Type | Applies To | Pruning Effect |
|---|---|---|---|---|
| 1 | ManifestEvaluator | Partition filter | Entire manifest files | Skip reading manifests that don't match partition constraints |
| 2 | ManifestReader | Row filter + file filter | Individual manifest entries | Skip entries based on partition values and file-level statistics |
| 3 | ResidualEvaluator | Remaining row filters | Data within files | Compute filters that must be applied during file reads |
Sources: src/iceberg/manifest/manifest_group.cc160-233 src/iceberg/expression/manifest_evaluator.h
Before reading a manifest file, ManifestEvaluator checks if the manifest's partition summaries can satisfy the partition filter:
Example from tests:
```cpp
// A manifest with partition summaries for data_bucket in [0, 5]
// can be skipped if the filter requires data_bucket = 10.
ManifestEvaluator evaluator(spec, partition_filter);
if (!evaluator.Eval(manifest)) {
  // Skip reading this manifest entirely
}
```
Sources: src/iceberg/manifest/manifest_group.cc162-172 src/iceberg/test/manifest_group_test.cc432-475
ManifestReader applies row-level and file-level filters to manifest entries:
Row Filter: Evaluated against partition values and file-level statistics (bounds, null counts, NaN counts)
File Filter: Evaluated against file metadata fields (record_count, file_size_in_bytes, etc.)
Custom Predicate: User-defined function for arbitrary filtering logic
Sources: src/iceberg/manifest/manifest_reader.h src/iceberg/test/manifest_group_test.cc369-428
The DeleteFileIndex efficiently matches delete files to data files based on partition values and sequence numbers. It supports three types of deletes:
Key Design Points:
Global vs Partitioned: Equality deletes in unpartitioned tables are "global" and apply to all data files. Position deletes are never global (except in unpartitioned tables).
Sequence Number Ordering: Position deletes are sorted by sequence_number to enable binary search when filtering by data file sequence.
Partition Key Hashing: Uses PartitionKey (derived from PartitionValues) as map key for O(1) partition lookup.
Sources: src/iceberg/delete_file_index.h103-316 src/iceberg/delete_file_index.cc
Sources: src/iceberg/delete_file_index.cc src/iceberg/test/delete_file_index_test.cc190-200
The ForDataFile() method returns all delete files applicable to a given data file:
Sequence Number Filtering Logic:
A delete file applies to a data file if:
- delete.sequence_number > data.sequence_number (delete was committed after data)
- Position deletes that name a specific referenced_data_file are matched by file path and are always applicable

Sources: src/iceberg/delete_file_index.cc src/iceberg/test/delete_file_index_test.cc263-359
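The sequence-number rule, combined with the sorted ordering noted in Key Design Points above, can be sketched with a binary search over delete sequence numbers. This is an illustrative helper, not the DeleteFileIndex API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Deletes kept sorted by sequence number, as described above, so the
// applicable suffix can be found by binary search.
std::vector<int64_t> ApplicableDeleteSeqs(
    const std::vector<int64_t>& sorted_delete_seqs,  // ascending
    int64_t data_seq) {
  // First delete with sequence_number > data_seq; it and everything after
  // it were committed after the data file and therefore apply.
  auto first = std::upper_bound(sorted_delete_seqs.begin(),
                                sorted_delete_seqs.end(), data_seq);
  return {first, sorted_delete_seqs.end()};
}
```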
A FileScanTask encapsulates everything needed to read a data file:
Sources: src/iceberg/table_scan.h src/iceberg/manifest/manifest_group.cc187-232
The ResidualEvaluator determines what portion of a filter must be applied during file reading:
Example:
```
Original filter:  id = 5 AND data = 'test'
Partition:        data_bucket = hash(data) % 16

After partition pruning: id = 5 AND data = 'test'
  (data = 'test' cannot be fully evaluated from the partition, so it remains)

Residual filter: id = 5 AND data = 'test'
```
If a filter can be fully satisfied by partition metadata or file statistics, the residual may simplify to True, meaning no filtering is needed during file reading.
Sources: src/iceberg/expression/residual_evaluator.h src/iceberg/test/manifest_group_test.cc476-537
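The idea that a conjunct fully satisfied by partition metadata drops out of the residual can be sketched as a toy computation over string conjuncts. This is illustrative only, not the ResidualEvaluator implementation.

```cpp
#include <string>
#include <vector>

// Toy residual: drop every conjunct that partition metadata already proves
// true; whatever is left must run during the file read.
std::vector<std::string> Residual(
    const std::vector<std::string>& conjuncts,
    const std::vector<std::string>& proven_by_partition) {
  std::vector<std::string> residual;
  for (const auto& c : conjuncts) {
    bool proven = false;
    for (const auto& p : proven_by_partition) {
      if (c == p) { proven = true; break; }
    }
    if (!proven) residual.push_back(c);  // still needs row-level evaluation
  }
  return residual;  // empty result means "True": no filtering during read
}
```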
ManifestGroup supports flags to skip certain entry types:
| Method | Effect | Use Case |
|---|---|---|
| IgnoreDeleted() | Skip ManifestStatus::kDeleted entries | Standard reads (ignore removal markers) |
| IgnoreExisting() | Skip ManifestStatus::kExisting entries | Incremental processing (only new files) |
| IgnoreResiduals() | Don't compute residual filters | When downstream will re-apply full filter |
Sources: src/iceberg/manifest/manifest_group.h101-108 src/iceberg/test/manifest_group_test.cc293-367
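The status flags in the table above amount to a per-entry skip check, which can be sketched like this. The ManifestStatus values mirror the names used above; the flag struct and filter function are illustrative, not the ManifestGroup API.

```cpp
// Status values as named in the table above.
enum class ManifestStatus { kAdded, kExisting, kDeleted };

struct ScanFlags {
  bool ignore_deleted = false;   // set by IgnoreDeleted()
  bool ignore_existing = false;  // set by IgnoreExisting()
};

// Returns true if a manifest entry should be skipped under the flags.
bool SkipEntry(ManifestStatus status, const ScanFlags& flags) {
  if (flags.ignore_deleted && status == ManifestStatus::kDeleted) return true;
  if (flags.ignore_existing && status == ManifestStatus::kExisting) return true;
  return false;
}
```

An incremental scan would set both flags, so only kAdded entries survive.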
ManifestReader can drop statistics (bounds, value counts, etc.) when they're not needed:
Logic:
- If the requested columns include the Schema::kAllColumns wildcard ("*"), keep all stats

Sources: src/iceberg/manifest/manifest_reader.h src/iceberg/test/manifest_reader_test.cc356-419
Delete files must have compatible partition specs with data files:
Compatible if:
1. Same partition spec ID, OR
2. Delete file is unpartitioned (spec_id = 0) AND content is equality deletes
Global equality deletes (unpartitioned) apply across all partitions. Position deletes are never global.
Sources: src/iceberg/delete_file_index.cc src/iceberg/test/delete_file_index_test.cc570-599
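The compatibility rule can be written down directly as a predicate. This is a sketch of the rule as stated above, not the DeleteFileIndex implementation.

```cpp
#include <cstdint>

enum class DeleteContent { kPositionDeletes, kEqualityDeletes };

// Compatible if the spec IDs match, or the delete file is an unpartitioned
// (spec_id = 0) equality delete, which applies globally.
bool SpecsCompatible(int32_t data_spec_id, int32_t delete_spec_id,
                     DeleteContent content) {
  if (data_spec_id == delete_spec_id) return true;
  return delete_spec_id == 0 && content == DeleteContent::kEqualityDeletes;
}
```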
Sources: src/iceberg/test/table_scan_test.cc50-245
This pattern is useful for change data capture or incremental ETL where you only want to process files added since the last snapshot.
Sources: src/iceberg/test/manifest_group_test.cc331-367
The AfterSequenceNumber() method filters out delete files committed before a certain sequence, useful for incremental scans that start from a known sequence number.
Sources: src/iceberg/test/delete_file_index_test.cc232-261
| Component | Header File | Implementation | Purpose |
|---|---|---|---|
| Snapshot | src/iceberg/snapshot.h383-506 | src/iceberg/snapshot.cc156-277 | Snapshot metadata and lazy manifest loading |
| ManifestGroup | src/iceberg/manifest/manifest_group.h52-140 | src/iceberg/manifest/manifest_group.cc40-316 | Manifest processing coordination |
| DeleteFileIndex | src/iceberg/delete_file_index.h103-316 | src/iceberg/delete_file_index.cc | Delete file indexing and matching |
| ManifestReader | src/iceberg/manifest/manifest_reader.h | src/iceberg/manifest/manifest_reader.cc | Reading and filtering manifest entries |
| FileScanTask | src/iceberg/table_scan.h | src/iceberg/table_scan.cc | Scan task abstraction |
Sources: Listed in table