Use single file write when an extension is present in the path. #13079
Merged
alamb merged 3 commits into apache:main on Nov 1, 2024
Contributor
Thanks @dhegberg -- I plan to review this later today
alamb approved these changes on Nov 1, 2024
Contributor
alamb left a comment
Thank you @dhegberg -- this is a really nice PR -- I think the code and tests are well written.
Thank you 🙏
cc @progval
I also tried it locally with datafusion-cli and it works as expected 👌
> copy (values (1), (2)) to '/tmp/foo' STORED AS parquet;
+-------+
| count |
+-------+
| 2 |
+-------+
1 row(s) fetched.
Elapsed 0.030 seconds.
>
\q
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion/datafusion-cli$ ls -ltr /tmp/foo
total 8
-rw-r--r--@ 1 andrewlamb wheel 342B Nov 1 12:31 MrzgxU8HT1fn3wTB_0.parquet
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion/datafusion-cli$
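The session above shows the new behaviour: `/tmp/foo` has no extension, so it is treated as a directory and the data lands in a generated file inside it. The decision can be sketched roughly as follows. This is a hypothetical, standalone illustration, not DataFusion's code: `OutputLayout`, `layout_before`, `layout_after` and `output_file` are invented names, the "before" rule assumes the old check was essentially "does the path end in `/`", and the `{write_id}_{part}.{default_extension}` naming is only inferred from the `MrzgxU8HT1fn3wTB_0.parquet` file listed above.

```rust
use std::path::Path;

/// Hypothetical: how a write target is laid out on disk.
#[derive(Debug, PartialEq)]
pub enum OutputLayout {
    /// Write exactly one file at the given path.
    SingleFile,
    /// Treat the path as a directory and write generated files inside it.
    Directory,
}

/// Assumed rule before this PR: any path not ending in '/' was a single file.
pub fn layout_before(path: &str) -> OutputLayout {
    if path.ends_with('/') {
        OutputLayout::Directory
    } else {
        OutputLayout::SingleFile
    }
}

/// Rule described by this PR: a single file is written only when the path
/// does not end in '/' AND its last component has a non-empty extension.
pub fn layout_after(path: &str) -> OutputLayout {
    let has_extension = Path::new(path)
        .extension()
        .map_or(false, |ext| !ext.is_empty());
    if !path.ends_with('/') && has_extension {
        OutputLayout::SingleFile
    } else {
        OutputLayout::Directory
    }
}

/// Hypothetical: the file that receives partition `part` of write `write_id`.
/// `default_extension` is only used in directory mode; in single-file mode
/// the path already names the file, so the extension argument is ignored.
pub fn output_file(path: &str, write_id: &str, part: usize, default_extension: &str) -> String {
    match layout_after(path) {
        OutputLayout::SingleFile => path.to_string(),
        OutputLayout::Directory => format!(
            "{}/{}_{}.{}",
            path.trim_end_matches('/'),
            write_id,
            part,
            default_extension
        ),
    }
}
```

Under these assumptions, `output_file("/tmp/foo", "MrzgxU8HT1fn3wTB", 0, "parquet")` yields `/tmp/foo/MrzgxU8HT1fn3wTB_0.parquet`, while a path such as `/tmp/out.parquet` is written as-is and `default_extension` is never consulted.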
}

#[test]
fn test_file_extension() {
    Ok(())
}

#[tokio::test]
Contributor
Here is a small follow-on PR to add some more docs: #13216 (to get the great write-up you did on this PR into the code)
Contributor
I suspect this introduced a regression; I would appreciate your opinion on #13323
Which issue does this PR close?
Closes #9684.
Rationale for this change
DataFrame's `write_parquet()` was found to incorrectly treat paths without an extension as single-file outputs. This change updates `start_demuxer_task` to respect the suggested behaviour.
What changes are included in this PR?
- Added `file_extension()` to `ListingTableUrl` to return an optional extension.
- Updated `start_demuxer_task()` to require the presence of an extension in the `ListingTableUrl` before setting `single_file_output` to true.
- Renamed `file_extension` to `default_extension` to indicate that it is ignored if `single_file_output` is triggered.
Are these changes tested?
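A minimal standalone sketch of an `Option`-returning extension helper like the `file_extension()` described above, exercised with the kind of with/without-extension paths the tests below cover. It is hypothetical: it works on a plain URL path string rather than a `ListingTableUrl`, and its edge-case handling (trailing `/`, trailing `.`, dot-files) is an assumption of this sketch, not taken from the PR.

```rust
/// Hypothetical stand-in for `ListingTableUrl::file_extension()`: returns the
/// extension of the last '/'-separated segment of a URL path, if any.
pub fn file_extension(url_path: &str) -> Option<&str> {
    // A trailing '/' leaves an empty last segment, i.e. no file name.
    let last_segment = url_path.rsplit('/').next()?;
    // Split on the final '.'; segments without a '.' have no extension.
    let (stem, ext) = last_segment.rsplit_once('.')?;
    // Assumed: "foo." (empty extension) and ".hidden" (empty stem) do not count.
    if stem.is_empty() || ext.is_empty() {
        None
    } else {
        Some(ext)
    }
}
```

Under these rules `file_extension("/tmp/foo.parquet")` returns `Some("parquet")`, while `/tmp/foo`, `/tmp/foo/` and `/tmp/foo.` all return `None`. Only the last path segment is inspected, so a dot in a directory name such as `/data.v1/part` does not count.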
- Added tests for `file_extension()`.
- Added tests for `DataFrame::write_parquet()` with paths both with and without extensions.
- `start_demuxer_task` is tested indirectly through these tests, since it had no direct tests originally. I can revise this and test it directly if preferred.
Testing via `cargo test -- --test-threads=1`
Are there any user-facing changes?