Skip to content

Conversation

@waynexia
Copy link
Member

Which issue does this PR close?

Part of #5550.

Rationale for this change

Content-MD5 is required in some S3-compatible services

What changes are included in this PR?

Add Content-MD5 support for S3 backend.

Test with oli

image

Are there any user-facing changes?

New option for S3 backend

@waynexia waynexia requested a review from Xuanwo as a code owner August 15, 2025 01:09
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. releases-note/feat The PR implements a new feature or has a title that begins with "feat" labels Aug 15, 2025
/// When enabled, OpenDAL will calculate and include the Content-MD5 header
/// for PUT operations (single uploads, multipart parts, and append operations).
/// This header provides an additional layer of data integrity verification.
pub enable_content_md5: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's the correct approch to implement data integrity verification.

Maybe we can take a look over #5549

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about implementing content_md5 as one of checksum_algorithm?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

@waynexia waynexia changed the title feat: implement content-md5 for s3 [WIP] feat: implement content-md5 for s3 Aug 19, 2025
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Oct 30, 2025
@waynexia waynexia changed the title [WIP] feat: implement content-md5 for s3 feat: implement content-md5 for s3 Oct 30, 2025
@waynexia waynexia requested a review from Xuanwo October 30, 2025 03:28
.for_each(|b| crc = crc32c::crc32c_append(crc, &b));
Some(BASE64_STANDARD.encode(crc.to_be_bytes()))
}
Some(ChecksumAlgorithm::Md5) => Some(format_content_md5(body.to_bytes().as_ref())),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a better way to handle this, like calculating in a stream instead of calling body.to_bytes() first? Our users might send data up to 5GiB in a single request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 made a non-copy version format_content_md5_iter that works on iterator

Signed-off-by: Ruihang Xia <[email protected]>
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Oct 30, 2025
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 30, 2025
@Xuanwo Xuanwo merged commit 7f9bef2 into apache:main Oct 30, 2025
331 checks passed
@waynexia waynexia deleted the content-md5 branch October 30, 2025 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm This PR has been approved by a maintainer releases-note/feat The PR implements a new feature or has a title that begins with "feat" size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants