feat: implement content-md5 for s3 #6508
Signed-off-by: Ruihang Xia <[email protected]>
core/src/services/s3/config.rs
Outdated
/// When enabled, OpenDAL will calculate and include the Content-MD5 header
/// for PUT operations (single uploads, multipart parts, and append operations).
/// This header provides an additional layer of data integrity verification.
pub enable_content_md5: bool,
I don't think this is the correct approach to implementing data integrity verification.
Maybe we can take a look at #5549.
How about implementing content_md5 as one of the checksum_algorithm options?
Sounds good.
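The suggestion above could look roughly like the following sketch, in which MD5 is one variant of a checksum selector instead of a separate boolean option. The enum, the strings it accepts, and `header_name` are illustrative assumptions, not OpenDAL's actual definitions; `Content-MD5` and `x-amz-checksum-crc32c` are the standard S3 request headers.

```rust
use std::str::FromStr;

/// Hypothetical checksum selector for this sketch; not OpenDAL's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChecksumAlgorithm {
    Crc32c,
    Md5,
}

impl ChecksumAlgorithm {
    /// HTTP header that carries the checksum value on an S3 PUT request.
    fn header_name(self) -> &'static str {
        match self {
            ChecksumAlgorithm::Crc32c => "x-amz-checksum-crc32c",
            ChecksumAlgorithm::Md5 => "Content-MD5",
        }
    }
}

impl FromStr for ChecksumAlgorithm {
    type Err = String;

    // The accepted option strings are assumptions made for this sketch.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "crc32c" => Ok(ChecksumAlgorithm::Crc32c),
            "md5" => Ok(ChecksumAlgorithm::Md5),
            other => Err(format!("unsupported checksum algorithm: {}", other)),
        }
    }
}

fn main() {
    // Parse the option value once, then derive the header from the variant.
    let algo: ChecksumAlgorithm = "md5".parse().unwrap();
    println!("{:?} -> {}", algo, algo.header_name());
}
```

With this shape, a single option selects the integrity mechanism, so a separate flag cannot conflict with it.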
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
core/src/services/s3/core.rs
Outdated
        .for_each(|b| crc = crc32c::crc32c_append(crc, &b));
    Some(BASE64_STANDARD.encode(crc.to_be_bytes()))
}
Some(ChecksumAlgorithm::Md5) => Some(format_content_md5(body.to_bytes().as_ref())),
Do we have a better way to handle this, such as computing the checksum over a stream instead of calling body.to_bytes() first? Our users might send up to 5 GiB of data in a single request.
👍 Made a non-copying version, format_content_md5_iter, that works on an iterator.
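For readers unfamiliar with the idea, here is a minimal std-only sketch of hashing a body chunk by chunk so that it never has to be flattened into one buffer. It is not the PR's `format_content_md5_iter`: the `Md5` struct, `compress`, `base64`, and `content_md5_from_chunks` are hypothetical names, and MD5 (RFC 1321) is hand-written here only to keep the example free of external crates.

```rust
/// Hypothetical streaming MD5 state (RFC 1321), written out for illustration.
struct Md5 {
    state: [u32; 4],
    buf: Vec<u8>, // pending partial block, always shorter than 64 bytes
    len: u64,     // total number of bytes fed so far
}

const S: [usize; 16] = [7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21];

/// Mix one 64-byte block into the running state.
fn compress(state: &mut [u32; 4], block: &[u8]) {
    let mut m = [0u32; 16];
    for (i, w) in block.chunks_exact(4).enumerate() {
        m[i] = u32::from_le_bytes([w[0], w[1], w[2], w[3]]);
    }
    let [mut a, mut b, mut c, mut d] = *state;
    for i in 0..64usize {
        let (f, g) = match i / 16 {
            0 => ((b & c) | (!b & d), i),
            1 => ((d & b) | (!d & c), (5 * i + 1) % 16),
            2 => (b ^ c ^ d, (3 * i + 5) % 16),
            _ => (c ^ (b | !d), (7 * i) % 16),
        };
        // Round constant: floor(2^32 * |sin(i + 1)|).
        let k = ((i as f64 + 1.0).sin().abs() * 4294967296.0) as u32;
        let t = f.wrapping_add(a).wrapping_add(k).wrapping_add(m[g]);
        a = d;
        d = c;
        c = b;
        b = b.wrapping_add(t.rotate_left(S[(i / 16) * 4 + i % 4] as u32));
    }
    for (s, v) in state.iter_mut().zip([a, b, c, d]) {
        *s = s.wrapping_add(v);
    }
}

impl Md5 {
    fn new() -> Self {
        Md5 { state: [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476], buf: Vec::new(), len: 0 }
    }

    fn update(&mut self, mut data: &[u8]) {
        self.len += data.len() as u64;
        if !self.buf.is_empty() {
            // Top up the pending partial block first.
            let take = (64 - self.buf.len()).min(data.len());
            self.buf.extend_from_slice(&data[..take]);
            data = &data[take..];
            if self.buf.len() < 64 {
                return;
            }
            compress(&mut self.state, &self.buf);
            self.buf.clear();
        }
        // Full blocks are hashed straight from the caller's slice.
        let mut blocks = data.chunks_exact(64);
        for block in &mut blocks {
            compress(&mut self.state, block);
        }
        self.buf.extend_from_slice(blocks.remainder());
    }

    fn finalize(mut self) -> [u8; 16] {
        // Padding: 0x80, zeros up to 56 mod 64, then the bit length (little-endian).
        let mut tail = std::mem::take(&mut self.buf);
        tail.push(0x80);
        while tail.len() % 64 != 56 {
            tail.push(0);
        }
        tail.extend_from_slice(&self.len.wrapping_mul(8).to_le_bytes());
        for block in tail.chunks_exact(64) {
            compress(&mut self.state, block);
        }
        let mut out = [0u8; 16];
        for (i, w) in self.state.iter().enumerate() {
            out[i * 4..i * 4 + 4].copy_from_slice(&w.to_le_bytes());
        }
        out
    }
}

/// Standard base64 with padding, as used for the Content-MD5 header value.
fn base64(bytes: &[u8]) -> String {
    const T: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for c in bytes.chunks(3) {
        let n = ((c[0] as u32) << 16)
            | ((*c.get(1).unwrap_or(&0) as u32) << 8)
            | (*c.get(2).unwrap_or(&0) as u32);
        out.push(T[(n >> 18) as usize & 63] as char);
        out.push(T[(n >> 12) as usize & 63] as char);
        out.push(if c.len() > 1 { T[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if c.len() > 2 { T[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Fold an iterator of chunks into a base64 Content-MD5 value.
fn content_md5_from_chunks<'a>(chunks: impl IntoIterator<Item = &'a [u8]>) -> String {
    let mut hasher = Md5::new();
    for chunk in chunks {
        hasher.update(chunk);
    }
    base64(&hasher.finalize())
}

fn main() {
    let parts: [&[u8]; 3] = [b"The quick brown ", b"fox jumps over ", b"the lazy dog"];
    println!("Content-MD5: {}", content_md5_from_chunks(parts));
}
```

Extra memory stays bounded by one 64-byte block regardless of body size, which is the property the review comment asks for; production code would normally rely on a maintained MD5 crate instead of a hand-rolled digest.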
Signed-off-by: Ruihang Xia <[email protected]>
Xuanwo left a comment
Thank you!
Which issue does this PR close?
Part of #5550.
Rationale for this change
Content-MD5 is required in some S3-compatible services.
What changes are included in this PR?
Add Content-MD5 support for S3 backend. Test with oli.
Are there any user-facing changes?
New option for S3 backend.