-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
s2: Improve small block compression speed w/o asm #1048
Conversation
Test with `-tags=noasm` Uses uint16 hash table and shorter hash. ``` benchmark old MB/s new MB/s speedup BenchmarkEncodeSnappyBlockParallel/12-txt1_128b/s2-snappy-16 2518.04 4181.41 1.66x BenchmarkEncodeSnappyBlockParallel/12-txt1_128b/s2-snappy-better-16 4.59 1187.18 258.64x BenchmarkEncodeSnappyBlockParallel/12-txt1_128b/snappy-noasm-16 3362.43 3487.90 1.04x BenchmarkEncodeSnappyBlockParallel/13-txt1_1000b/s2-snappy-16 8053.34 8670.50 1.08x BenchmarkEncodeSnappyBlockParallel/13-txt1_1000b/s2-snappy-better-16 34.60 3270.45 94.52x BenchmarkEncodeSnappyBlockParallel/13-txt1_1000b/snappy-noasm-16 4901.85 4969.43 1.01x BenchmarkEncodeSnappyBlockParallel/14-txt1_10000b/s2-snappy-16 9349.04 8486.48 0.91x BenchmarkEncodeSnappyBlockParallel/14-txt1_10000b/s2-snappy-better-16 359.32 3851.49 10.72x BenchmarkEncodeSnappyBlockParallel/14-txt1_10000b/snappy-noasm-16 5361.99 5511.97 1.03x ```
📝 WalkthroughWalkthroughThe pull request introduces specialized encoding functions for handling smaller data blocks (up to 64KB) in the Snappy compression library. The changes modify the encoding logic across multiple files in the Changes
Possibly related PRs
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
s2/encode_all.go (2)
838-845
: Consider using the debug constant for validation.The validation code uses
if false
instead of thedebug
constant used elsewhere. Consider maintaining consistency with the debug approach used in other functions.- if false { + if debug {
Line range hint
286-880
: Excellent optimization strategy for small block compression!The implementation introduces specialized encoders for small blocks (≤64KB) with several well-thought-out optimizations:
- Memory efficiency: Using
uint16
instead ofuint32
for hash tables- Performance tuning: Adjusted step sizes for match finding
- Maintainability: Consistent implementation across both standard and Snappy-compatible encoders
- Safety: Preserved all validation and safety checks
The changes align perfectly with the PR objective of improving small block compression speed without assembly optimizations.
s2/encode_better.go (3)
523-523
: Consider documenting the skip factor change.The skip factor has been changed from
>>7
to>>6
, which means more aggressive scanning. This might affect the compression ratio vs speed trade-off.
553-605
: Remove unused repeat matching code.The repeat matching code is effectively disabled with
if false
. Either remove the dead code or enable it if it's intended for future use.- if false && repeat > 0 && cv&repeatMask == load64(src, s-repeat)&repeatMask { - base := s + checkRep - // ... (remove entire block) - }
481-888
: Excellent architectural approach for small block optimization!The introduction of specialized 64K functions with the following characteristics is well-designed:
- Memory efficiency through right-sized uint16 tables
- Clear separation between S2 and Snappy-compatible implementations
- Maintains format compatibility while improving performance
- Optimized for the common case of small blocks (≤64KB)
This aligns perfectly with the PR's objective of improving small block compression speed without assembly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
s2/encode_all.go
(3 hunks)s2/encode_better.go
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: build (1.23.x, windows-latest)
- GitHub Check: build (1.23.x, macos-latest)
- GitHub Check: build (1.23.x, ubuntu-latest)
- GitHub Check: build (1.22.x, windows-latest)
- GitHub Check: build (1.22.x, macos-latest)
- GitHub Check: build (1.22.x, ubuntu-latest)
- GitHub Check: fuzz-zstd ("noasm,nounsafe")
- GitHub Check: build (1.21.x, windows-latest)
- GitHub Check: fuzz-zstd (nounsafe)
- GitHub Check: build (1.21.x, macos-latest)
- GitHub Check: fuzz-s2 ("noasm,nounsafe")
- GitHub Check: build (1.21.x, ubuntu-latest)
- GitHub Check: fuzz-other ("noasm,nounsafe")
- GitHub Check: build-special
- GitHub Check: fuzz-s2 (nounsafe)
- GitHub Check: fuzz-other (nounsafe)
- GitHub Check: Analyze (go)
🔇 Additional comments (8)
s2/encode_all.go (4)
286-296
: Excellent memory optimization using uint16 for small blocks!The specialized encoder uses
uint16
for the hash table since offsets in a 64KB block can't exceed 65535, reducing memory usage while maintaining the same compression ratio.
691-699
: LGTM! Consistent optimization strategy across both encoders.The Snappy-compatible encoder follows the same optimization pattern, using
uint16
for the hash table.
49-54
: LGTM! Size-based encoding selection looks good.The conditional logic cleanly routes small blocks (< 64KB) to the optimized encoder while maintaining the original implementation for larger blocks.
Let's verify that all callers can handle both encoding paths:
✅ Verification successful
Size-based encoding selection is properly integrated across the codebase
All callers either use the correct size-based selection logic or properly handle both variants in test scenarios. No integration issues found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for direct calls to encodeBlockGo and encodeBlockGo64K rg -A 2 'encodeBlockGo(?:64K)?\(' --type goLength of output: 1128
321-321
: Verify the impact of modified step size.The step size for finding the next match position has been optimized from
>>6
to>>5
, potentially affecting the compression speed/ratio tradeoff.Let's verify the performance impact:
s2/encode_better.go (4)
351-351
: LGTM! Good optimization to prevent excessive searching.Adding
maxSkip
with a value of 100 effectively prevents pathological cases where the encoder might search too far ahead, improving worst-case performance without significantly impacting compression ratio.
490-502
: Excellent memory optimization for small blocks!The use of smaller hash tables is well-suited for blocks up to 64KB:
- Reduced
lTableBits
from 17 to 16- Reduced
sTableBits
from 14 to 13- Changed table types from
uint32
touint16
This reduces memory usage by approximately 50% without compromising compression for small blocks.
741-748
: Good table size optimization for Snappy compatibility.The table sizes are well-optimized for Snappy-compatible encoding:
- Further reduced
lTableBits
to 15 (vs 16 in non-Snappy version)- Kept
sTableBits
at 13- Using
uint16
tables maintains the memory optimization
840-841
: Verify the impact of removing repeat optimization.The function uses
emitCopyNoRepeat
instead of the repeat optimization. While this maintains Snappy compatibility, we should verify the performance impact.Run the following script to compare performance between repeat and no-repeat versions:
✅ Verification successful
Removing repeat optimization is an intentional design choice for Snappy compatibility
The change to use
emitCopyNoRepeat
is part of S2's design to maintain Snappy compatibility while still delivering better performance. According to the documentation and benchmarks, S2 consistently outperforms Snappy in both speed and compression ratio, even without the repeat optimization. The codebase shows this is a deliberate trade-off that doesn't significantly impact S2's primary goals of high throughput and better compression.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find and analyze performance tests comparing repeat vs no-repeat versions # Search for benchmarks comparing the two versions rg -A 5 'BenchmarkEncode.*Snappy.*64K' # Search for any existing performance comparisons rg -i 'repeat.*vs.*no.*repeat'Length of output: 73
Script:
#!/bin/bash # Find benchmark files fd "_test.go$" # Look for encode implementation details rg -B 2 -A 2 "func.*[Ee]mit[Cc]opy" # Search for documentation about encoding fd "README|DOC" --extension md # Look for performance related comments rg -i "performance|benchmark|optimization" --type goLength of output: 35140
Script:
#!/bin/bash # Look for documentation about repeat optimization rg -B 2 -A 2 -i "repeat.*optimization" s2/ # Check for comments near the implementation cat s2/encode_go.go | grep -B 5 -A 5 "emitCopyNoRepeat" # Look for performance notes in README cat s2/README.mdLength of output: 59346
Test with
-tags=noasm
Uses uint16 hash table, which reduced the zero cost by a lot.
(yes, benchmarks are a bit noisy)
Summary by CodeRabbit
Release Notes
New Features
Improvements
Performance