flate: Simplify matchlen (remove asm) #1045
Conversation
With unsafe, there is no benefit from matchlen assembly. Remove it.
📝 Walkthrough

The pull request modifies the compression and matching logic in the flate package, replacing the assembly `matchlen` implementation with a pure Go version.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Encoder as Fast Encoder
    participant MatchLen as Match Length Calculator
    Encoder->>MatchLen: Calculate match length
    MatchLen-->>MatchLen: Compare 8-byte chunks
    MatchLen-->>MatchLen: Compare remaining bytes
    MatchLen-->>Encoder: Return match length
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
flate/fast_encoder.go (1)

Lines 154-179: Optimize `matchlen` with 64-bit comparisons

The `matchlen` function has been rewritten to compare 8 bytes at a time using `le.Load64` and to detect differences with `bits.TrailingZeros64`. This improves performance by reducing the number of iterations required for matching sequences.

Consider adding comments to explain the use of `bits.TrailingZeros64(diff) >> 3` for clarity, as this bit manipulation may not be immediately evident to all readers.
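The technique the review describes can be illustrated with a sketch. Note this is a hypothetical reimplementation, not the code from fast_encoder.go: it uses the portable `encoding/binary` in place of the package-internal `le.Load64`, and it assumes `b` is at least as long as `a` (the real callers guarantee this). The key step is `bits.TrailingZeros64(diff) >> 3`, which converts the bit position of the first difference into a byte offset (divide by 8):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// matchLen returns the number of leading bytes a and b have in common.
// Sketch only: assumes len(b) >= len(a).
func matchLen(a, b []byte) int {
	n := 0
	for len(a) >= 8 {
		// XOR the two 8-byte chunks; any differing byte sets bits in diff.
		diff := binary.LittleEndian.Uint64(a) ^ binary.LittleEndian.Uint64(b)
		if diff != 0 {
			// TrailingZeros64 finds the first differing bit; >>3 turns
			// that bit index into a byte index within the chunk.
			return n + bits.TrailingZeros64(diff)>>3
		}
		n += 8
		a = a[8:]
		b = b[8:]
	}
	// Compare the remaining tail (fewer than 8 bytes) one byte at a time.
	for i := range a {
		if a[i] != b[i] {
			break
		}
		n++
	}
	return n
}

func main() {
	a := []byte("hello, gopher! match here")
	b := []byte("hello, gopher! match gone")
	fmt.Println(matchLen(a, b)) // → 21 (bytes equal before the first difference)
}
```

The 8-byte XOR-and-count approach replaces up to eight per-byte comparisons with one load, one XOR, and one `TrailingZeros64`, which is why a separate assembly routine no longer pays off.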
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- flate/fast_encoder.go (3 hunks)
- flate/level1.go (1 hunks)
- flate/matchlen_amd64.go (0 hunks)
- flate/matchlen_amd64.s (0 hunks)
- flate/matchlen_generic.go (0 hunks)
💤 Files with no reviewable changes (3)
- flate/matchlen_generic.go
- flate/matchlen_amd64.s
- flate/matchlen_amd64.go
⏰ Context from checks skipped due to timeout of 90000ms (16)
- GitHub Check: fuzz-other ("noasm,nounsafe")
- GitHub Check: fuzz-other (nounsafe)
- GitHub Check: build (1.23.x, windows-latest)
- GitHub Check: build (1.23.x, macos-latest)
- GitHub Check: fuzz-zstd ("noasm,nounsafe")
- GitHub Check: build (1.23.x, ubuntu-latest)
- GitHub Check: fuzz-zstd (nounsafe)
- GitHub Check: build (1.22.x, windows-latest)
- GitHub Check: build (1.22.x, macos-latest)
- GitHub Check: fuzz-s2 ("noasm,nounsafe")
- GitHub Check: build (1.22.x, ubuntu-latest)
- GitHub Check: fuzz-s2 (nounsafe)
- GitHub Check: build (1.21.x, windows-latest)
- GitHub Check: build (1.21.x, macos-latest)
- GitHub Check: build (1.21.x, ubuntu-latest)
- GitHub Check: build-special
🔇 Additional comments (4)
flate/fast_encoder.go (3)
Line 10: Import `math/bits` for optimized bit operations

The addition of `"math/bits"` is appropriate for utilizing bit manipulation functions like `bits.TrailingZeros64`, which enhances performance in match length calculations.
Lines 162-164: Ensure safe unaligned memory access

When using `le.Load64` for loading 8-byte chunks, ensure that unaligned memory access is safe on all target architectures. While x86 architectures handle unaligned access gracefully, others like ARM may raise exceptions. Please confirm that `le.Load64` safely handles unaligned memory access across all supported architectures.
Lines 201-222: Optimize `matchlenLong` with bulk comparisons

The `matchlenLong` function now efficiently handles long matches by processing 8 bytes at a time, similar to `matchlen`. This change enhances performance for larger data sets.

flate/level1.go (1)

Line 122: Simplify match length calculation using `matchlenLong`

Replacing the custom inlined match extension logic with a call to `e.matchlenLong` streamlines the code and reduces duplication. This enhances maintainability and leverages the optimized function from fast_encoder.go.
* flate: Fix matchlen L5+L6 Regression from #1045
Summary by CodeRabbit

- Performance Improvements
- Code Optimization
- Architecture Changes