
Conversation


@chenzl25 chenzl25 commented Sep 9, 2024

No description provided.

Xuanwo and others added 30 commits December 26, 2023 08:45
* feat: Add website layout

Signed-off-by: Xuanwo <github@xuanwo.io>

* publish to rust.i.a.o

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix license

Signed-off-by: Xuanwo <github@xuanwo.io>

* Let's try mdbook action

Signed-off-by: Xuanwo <github@xuanwo.io>

* use cargo install

Signed-off-by: Xuanwo <github@xuanwo.io>

* disable section

Signed-off-by: Xuanwo <github@xuanwo.io>

* Add docs for website

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix license

Signed-off-by: Xuanwo <github@xuanwo.io>

* action approved

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>
* feat: Expressions

* Fix comments

* Refactor expression to be more similar to iceberg model

* Fix typo
Signed-off-by: Xuanwo <github@xuanwo.io>
* feat: Add roadmap and features status in README.md

* Fix

* Fix

* Add more details according to comments

* Revert unnecessary new line break

* Nits

---------

Co-authored-by: Fokko Driesprong <fokko@apache.org>
Bumps [peaceiris/actions-gh-pages](https://github.com/peaceiris/actions-gh-pages) from 3.9.2 to 3.9.3.
- [Release notes](https://github.com/peaceiris/actions-gh-pages/releases)
- [Changelog](https://github.com/peaceiris/actions-gh-pages/blob/main/CHANGELOG.md)
- [Commits](peaceiris/actions-gh-pages@v3.9.2...v3.9.3)

---
updated-dependencies:
- dependency-name: peaceiris/actions-gh-pages
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [opendal](https://github.com/apache/incubator-opendal) to permit the latest version.
- [Release notes](https://github.com/apache/incubator-opendal/releases)
- [Changelog](https://github.com/apache/incubator-opendal/blob/main/CHANGELOG.md)
- [Commits](apache/opendal@v0.43.0...v0.43.0)

---
updated-dependencies:
- dependency-name: opendal
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Code complete

* Resolve

* Done

* Fix comments

* Fix comments

* Fix comments

* Fix

* Fix comment
* chore: Update reader api status

* Restore unnecessary change
* Add formatting for toml files

* Update call to taplo

* Add command to format and a command to check
Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version.
- [Release notes](https://github.com/rust-cli/env_logger/releases)
- [Changelog](https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md)
- [Commits](rust-cli/env_logger@v0.10.0...v0.10.2)

---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* init file writer interface

* refine

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>
* fix: Manifest parsing should consider schema evolution.

* Fix ut
* Add

* Fix format

* Add license header
…175)

Updates the requirements on [derive_builder](https://github.com/colin-kiegel/rust-derive-builder) to permit the latest version.
- [Release notes](https://github.com/colin-kiegel/rust-derive-builder/releases)
- [Commits](colin-kiegel/rust-derive-builder@v0.12.0...v0.12.0)

---
updated-dependencies:
- dependency-name: derive_builder
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add handwritten serialize

* revert expect

* remove expect
Co-authored-by: Fokko Driesprong <fokko@apache.org>
* feat: Bump version 0.2.0 to prepare for release.

* Update dependencies
Updates the requirements on [opendal](https://github.com/apache/opendal) to permit the latest version.
- [Release notes](https://github.com/apache/opendal/releases)
- [Changelog](https://github.com/apache/opendal/blob/main/CHANGELOG.md)
- [Commits](apache/opendal@v0.44.0...v0.44.2)

---
updated-dependencies:
- dependency-name: opendal
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
FANNG1 and others added 26 commits August 17, 2024 00:42
* add memory catalog

* fix style

* fix style
…ce (#512)

* feat: adds ObjectCache, to cache Manifests and ManifestLists

* refactor: change obj cache method names and use more readable default usize value

* chore: improve error message

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* fix: change object cache retrieval method visibility

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* feat: improved error message in object cache get_manifest

* test(object-cache): add unit tests for object cache manifest and manifest list retrieval

* fix: ensure that object cache insertions are weighted by size

* test: fix test typo

* fix: ensure object cache weight is that of the wrapped item, not the Arc

---------

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>
Signed-off-by: Xuanwo <github@xuanwo.io>
* fix: ensure that RestCatalog passes user config to FileIO

* docs: added some doc comments to clarify override order for config
Both licenses can be moved to the `allowed` section:

- **adler32** [ships](https://github.com/remram44/adler32-rs/blob/master/LICENSE) with a **zlib** license and is a category A-license
- **unicode-ident** ships with a **UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE** which is also a category A-license

The **ring** license is a bit [more involved](https://github.com/briansmith/ring/blob/main/LICENSE) and carries a lot of history, I think it is best to keep that as an exception for now, since the OpenSSL license is also not explicitly listed on the ASF page. I don't see anything alarming in the `LICENSE` file.

ASF page on the subject: https://www.apache.org/legal/resolved.html#category-a
Signed-off-by: Xuanwo <github@xuanwo.io>
)

* feat(timestamp_ns): first commit

* feat(timestamp_ns): Add mappings for timestamp_ns/timestamptz_ns

* feat(timestamp_ns): Remove unused dep

* feat(timestamp_ns): Fix unit test

* feat(timestamp_ns): Fix test_all_type_for_write()

* feat(timestamp_ns): fix test_transform_days_literal

* feat(timestamp_ns): fix math for timestamptz_nanos

* chore: formatting

* chore: formatting

* chore: Appease clippy

---------

Co-authored-by: Timothy Maloney <tmaloney@influxdata.com>
* correct partition-id to field id in PartitionSpec

* correct partition-id to field id in PartitionSpec

* correct partition-id to field id in PartitionSpec

* xx
---
updated-dependencies:
- dependency-name: typed-builder
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* bucket transform rust binding

* format

* poetry x maturin

* ignore poetry.lock in license check

* update bindings_python_ci to use makefile

* newline

* python-poetry/poetry#9135

* use hatch instead of poetry

* refactor

* revert licenserc change

* adopt review feedback

* comments

* unused dependency

* adopt review comment

* newline

* I like this approach a lot better

* more tests
* feat(scan): add row group and page index row selection filtering

* fix(row selection): off-by-one error

* feat: remove row selection to defer to a second PR

* feat: better min/max val conversion in RowGroupMetricsEvaluator

* test(row_group_filtering): first three tests

* test(row_group_filtering): next few tests

* test: add more tests for RowGroupMetricsEvaluator

* chore: refactor test assertions to silence clippy lints

* refactor: consolidate parquet stat min/max parsing in one place
* feat: SQL Catalog - namespaces

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* feat: use transaction for updates and creates

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: pull out query param builder to fn

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* feat: add drop and tests

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: String to str, remove pub and optimise query builder

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: nested match, remove ok()

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: remove pub, add set, add comments

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: refactor list_namespaces slightly

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: add default properties to all new namespaces

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: remove check for nested namespace

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* chore: add more comments to the CatalogConfig to explain bind styles

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: edit test for nested namespaces

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

---------

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>
Signed-off-by: Xuanwo <github@xuanwo.io>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.24.3 to 1.24.5.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.24.3...v1.24.5)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Partition compatibility

* Partition compatibility

* Rename compatible_with -> is_compatible_with
* SortOrder methods should take schema ref if possible

* Fix test type

* with_order_id should not take reference
@chenzl25 chenzl25 closed this Sep 9, 2024
chenzl25 pushed a commit that referenced this pull request Jan 6, 2026
…chTransformer (apache#1821)

## Which issue does this PR close?

Partially address apache#1749.

## What changes are included in this PR?

This PR adds partition spec handling to `FileScanTask` and
`RecordBatchTransformer` to correctly implement the Iceberg spec's
"Column Projection" rules for fields "not present" in data files.

### Problem Statement

Prior to this PR, `iceberg-rust`'s `FileScanTask` had no mechanism to
pass partition information to `RecordBatchTransformer`, causing two
issues:

1. **Incorrect handling of bucket partitioning**: The reader couldn't
distinguish identity transforms (which should use partition metadata
constants) from non-identity transforms such as bucket/truncate/year/month
(whose source columns must be read from the data file). For example,
`bucket(4, id)` stores `id_bucket = 2` (the bucket number) in partition
metadata, but the actual `id` values (100, 200, 300) exist only in the
data file. iceberg-rust was incorrectly treating bucket-partitioned source
columns as constants, breaking runtime filtering and returning incorrect
query results.

2. **Field ID conflicts in add_files scenarios**: When importing Hive
tables via `add_files`, partition columns could have field IDs
conflicting with Parquet data columns. Example: Parquet has
field_id=1→"name", but Iceberg expects field_id=1→"id" (partition). Per
spec, the
correct field is "not present" and requires name mapping fallback.

### Iceberg Specification Requirements

Per the Iceberg spec
(https://iceberg.apache.org/spec/#column-projection), when a field ID is
"not present" in a data file, it must be resolved using these rules:

1. Return the value from partition metadata if an **Identity Transform**
exists
2. Use `schema.name-mapping.default` metadata to map field id to columns
without field id
3. Return the default value if it has a defined `initial-default`
4. Return null in all other cases

**Why this matters:**
- **Identity transforms** (e.g., `identity(dept)`) store actual column
values in partition metadata that can be used as constants without
reading the data file
- **Non-identity transforms** (e.g., `bucket(4, id)`, `day(timestamp)`)
store transformed values in partition metadata (e.g., bucket number 2,
not the actual `id` values 100, 200, 300) and must read source columns
from the data file
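The four resolution rules above can be sketched as a single fall-through function. This is a hypothetical, simplified illustration: the `Value`, `Transform`, and `PartitionField` types and the `resolve_missing_field` name are invented for this sketch and are not the actual iceberg-rust API.

```rust
// Illustrative types; not the real iceberg-rust definitions.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    Null,
}

#[derive(Clone, Copy, PartialEq)]
enum Transform {
    Identity,
    Bucket(u32),
}

struct PartitionField {
    source_field_id: i32,
    transform: Transform,
    value: Value, // value stored in partition metadata for this field
}

/// Resolve a field that is "not present" in the data file, following the
/// four column-projection rules in order.
fn resolve_missing_field(
    field_id: i32,
    partition_fields: &[PartitionField],
    name_mapping: Option<&str>,     // mapped column name, if any (rule 2)
    initial_default: Option<Value>, // rule 3
) -> Value {
    // Rule 1: only an identity-transformed partition value is a usable constant.
    if let Some(pf) = partition_fields
        .iter()
        .find(|pf| pf.source_field_id == field_id && pf.transform == Transform::Identity)
    {
        return pf.value.clone();
    }
    // Rule 2: fall back to name mapping (signalled here by returning a marker).
    if let Some(name) = name_mapping {
        return Value::Str(format!("read column `{name}` from file"));
    }
    // Rule 3: a defined initial-default.
    if let Some(default) = initial_default {
        return default;
    }
    // Rule 4: null in all other cases.
    Value::Null
}

fn main() {
    let parts = vec![
        PartitionField { source_field_id: 1, transform: Transform::Identity, value: Value::Str("hr".into()) },
        PartitionField { source_field_id: 2, transform: Transform::Bucket(4), value: Value::Int(2) },
    ];
    // identity(dept): constant comes from partition metadata.
    assert_eq!(resolve_missing_field(1, &parts, None, None), Value::Str("hr".into()));
    // bucket(4, id): not a constant; with no mapping or default, falls to null.
    assert_eq!(resolve_missing_field(2, &parts, None, None), Value::Null);
}
```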

### Changes Made

1. **Added partition fields to `FileScanTask`** (`scan/task.rs`):
- `partition: Option<Struct>` - Partition data from manifest entry
- `partition_spec: Option<Arc<PartitionSpec>>` - For transform-aware
constant detection
- `name_mapping: Option<Arc<NameMapping>>` - Name mapping from table
metadata

2. **Implemented `constants_map()` function**
(`arrow/record_batch_transformer.rs`):
- Replicates Java's `PartitionUtil.constantsMap()` behavior
- Only includes fields where transform is `Transform::Identity`
- Used to determine which fields use partition metadata constants vs.
reading from data files
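The identity-only filtering described above can be sketched roughly as follows. The types here are simplified stand-ins (partition values reduced to `i64`), not the real `PartitionUtil.constantsMap()` port:

```rust
use std::collections::HashMap;

// Illustrative types; the real code works on Iceberg partition specs.
#[derive(Clone, Copy, PartialEq)]
enum Transform {
    Identity,
    Bucket(u32),
}

struct PartitionField {
    source_field_id: i32,
    transform: Transform,
}

/// Build the map of source-field-id -> constant value, keeping ONLY
/// identity-transformed fields; everything else must be read from the file.
fn constants_map(
    fields: &[PartitionField],
    partition_values: &[i64], // one metadata value per partition field
) -> HashMap<i32, i64> {
    fields
        .iter()
        .zip(partition_values)
        .filter(|(f, _)| f.transform == Transform::Identity)
        .map(|(f, v)| (f.source_field_id, *v))
        .collect()
}

fn main() {
    let fields = vec![
        PartitionField { source_field_id: 1, transform: Transform::Identity },
        PartitionField { source_field_id: 2, transform: Transform::Bucket(4) },
    ];
    let constants = constants_map(&fields, &[42, 2]);
    assert_eq!(constants.get(&1), Some(&42)); // identity -> usable constant
    assert_eq!(constants.get(&2), None);      // bucket -> read from data file
}
```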

3. **Enhanced `RecordBatchTransformer`**
(`arrow/record_batch_transformer.rs`):
- Added `build_with_partition_data()` method to accept partition spec,
partition data, and name mapping
- Implements all 4 spec rules for column resolution with
identity-transform awareness
- Detects field ID conflicts by verifying both field ID AND name match
- Falls back to name mapping when field IDs are missing/conflicting
(spec rule #2)
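The conflict-detection idea (a field only counts as present when both its ID and its name match) can be sketched as below. `FileField` and `is_present` are hypothetical names for illustration, not the PR's actual identifiers:

```rust
// Illustrative stand-in for a Parquet file column.
struct FileField {
    id: i32,
    name: String,
}

/// A file column is "present" for an expected schema field only when both
/// the field ID AND the name match; a mismatch means the reader must fall
/// back to name mapping (spec rule 2).
fn is_present(file_fields: &[FileField], expected_id: i32, expected_name: &str) -> bool {
    file_fields
        .iter()
        .any(|f| f.id == expected_id && f.name == expected_name)
}

fn main() {
    // add_files conflict: Parquet has field_id=1 -> "name",
    // but Iceberg expects field_id=1 -> "id" (a partition column).
    let file = vec![FileField { id: 1, name: "name".into() }];
    assert!(!is_present(&file, 1, "id"));  // conflict: treated as not present
    assert!(is_present(&file, 1, "name")); // genuine match
}
```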

4. **Updated `ArrowReader`** (`arrow/reader.rs`):
- Uses `build_with_partition_data()` when partition information is
available
- Falls back to `build()` when not available

5. **Updated manifest entry processing** (`scan/context.rs`):
- Populates partition fields in `FileScanTask` from manifest entry data

### Tests Added

1. **`bucket_partitioning_reads_source_column_from_file`** - Verifies
that bucket-partitioned source columns are read from data files (not
treated as constants from partition metadata)

2. **`identity_partition_uses_constant_from_metadata`** - Verifies that
identity-transformed fields correctly use partition metadata constants

3. **`test_bucket_partitioning_with_renamed_source_column`** - Verifies
field-ID-based mapping works despite column rename

4. **`add_files_partition_columns_without_field_ids`** - Verifies name
mapping resolution for Hive table imports without field IDs (spec rule
#2)

5. **`add_files_with_true_field_id_conflict`** - Verifies correct field
ID conflict detection with name mapping fallback (spec rule #2)

6. **`test_all_four_spec_rules`** - Integration test verifying all 4
spec rules work together

## Are these changes tested?

Yes, there are 6 new unit tests covering all 4 Iceberg spec rules. The
change also fixed approximately 50 previously failing Iceberg Java tests
when run with DataFusion Comet's experimental
apache/datafusion-comet#2528 PR.

---------

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>
nagraham pushed a commit to nagraham/iceberg-rust that referenced this pull request Jan 6, 2026
…chTransformer (apache#1821)

chenzl25 pushed a commit that referenced this pull request Jan 7, 2026
…chTransformer (apache#1821) (#107)

Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>