Skip to content

Conversation

@antoin-m
Copy link
Contributor

@antoin-m antoin-m commented Apr 23, 2020

Hi 👋

This is a fix for #49
According to the docs, here is how the hash functions should be used: https://golang.org/pkg/crypto/sha256/#New
Here is the same test (hashing a 1GB file) with the fix applied:

root@ef78500374de:/workspaces/script# go test
PASS
ok      github.com/bitfield/script      3.748s

IMHO there should be a sinks.CheckSum(hash.Hash) method instead. This way you could hash files with any algorithm you like. A checksum doesn't have to be cryptographically secure as mentioned #39 (comment), it's just a way to verify a file's integrity. People often share MD5 hashes because they're less CPU intensive to compute.

@bitfield
Copy link
Owner

Nice catch, @antoin-m! Thanks for the fix.

@thomaspoignant, as the author of this code in #40, did you have a specific reason for doing it this way in SHA256Sums(), but not in SHA256Sum()? How do you feel about this change? It seems sensible to me, especially since it makes the two methods identical.

Copy link
Owner

@bitfield bitfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done!

@bitfield
Copy link
Owner

IMHO there should be a sinks.CheckSum(hash.Hash) method

This is a good suggestion, but I think it fails the 'one-liner' test. It should be possible to do most of what you want to do with just a single-line chain of script methods. This would require you to create the hash separately, and then use it in your pipeline, which is a little awkward.

I agree that there are other valid uses for checksums which don't require strong hashing, but I'm sorry to say that in my experience, if an MD5Sum() method (for example) were available, people would end up using it for something they really shouldn't. I'm fine with making people take a little extra trouble to do something which is potentially a little bit dangerous.

@thomaspoignant
Copy link
Contributor

@thomaspoignant, as the author of this code in #40, did you have a specific reason for doing it this way in SHA256Sums(), but not in SHA256Sum()? How do you feel about this change? It seems sensible to me, especially since it makes the two methods identical.

It was different in the sink cause you can have something else than a file, you can do something like Echo("my text").SHA256Sum() it will return the hash of the content of the pipe.
SHA256Sums() is another hand is more specific to files.

But @antoin-m is right putting the whole file in memory can be a problem, and I have check the doc and what I said was not clear.

@bitfield do you think we should keep the 2 functions?

@antoin-m
Copy link
Contributor Author

This is a good suggestion, but I think it fails the 'one-liner' test. It should be possible to do most of what you want to do with just a single-line chain of script methods. This would require you to create the hash separately, and then use it in your pipeline, which is a little awkward.

Echo("my text").CheckSum(md5.New()) is still a one liner to me. But yeah I get your point.

@bitfield
Copy link
Owner

bitfield commented May 1, 2020

I think it's worth adding that 90% of all programmers are beginners (source: I made this figure up) and so a well-designed library or language will make the simple and obvious thing the safe and reliable thing. If you want to do something potentially unsafe, it's fine to have to do a little more work to enable it.

@bitfield
Copy link
Owner

bitfield commented May 1, 2020

Great job @antoin-m, thanks again!

@bitfield bitfield merged commit 3469354 into bitfield:master May 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants