Add rewrite rule to unfuse (text . drop) #301
Merged
Problem
While profiling a project, the following function turned out to be the largest bottleneck:
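A minimal sketch of such a function, assuming the name `slice` and the `take len . drop offset` shape referred to later in this description (the exact signature and argument order are assumptions):

```haskell
import qualified Data.Text as T
import Data.Text (Text)

-- Assumed shape of the bottleneck: skip `offset` characters,
-- then keep at most `len` characters of what remains.
slice :: Int -> Int -> Text -> Text
slice offset len = T.take len . T.drop offset
```

For example, `slice 2 3 (T.pack "abcdefg")` would evaluate to the text `cde`.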
The direct cause of this bottleneck is fusion. The original `take` and `drop` functions work by calculating a new offset and length, so a new view of the `Text` can be returned without copying the underlying array. However, the `slice` function is rewritten by the fusion rules, and the resulting streaming implementation unnecessarily copies the data. This makes it much slower and more memory-hungry.

With `take` and `drop` being fast individually, putting them together should not lead to a substantial decrease in performance.

Proposed solution
Add a rewrite rule that specifically rewrites `take len . drop offset` back to an unfused version, just like the `TEXT take -> unfused` and `TEXT drop -> unfused` rules do for their respective functions individually.

I would argue that a rewrite rule for this very specific pair of functions is justified because it represents the very common substring operation. Since I don't think it would be wise to add `substring` to the interface of `Data.Text`, optimizing its de facto implementation would be the next best thing.

Benchmark
A project hosting a benchmark can be found at Channable/haskell-string-slicing-benchmarks. It benchmarks the original function, the rule added by this pull request and three other solutions. See `Bench.hs`.

Below are the results of the benchmark that showed the most significant difference between fused and unfused:
Benchmark results for above mentioned slice
Benchmark results when the rule of this PR is included
That's a huge difference, both in runtime and memory.
Open questions
(`last`, `tail`, `init`, `null`, to name a few). Any pipeline involving only such functions might be better off not being fused. Could a different approach to fusion improve performance here?

`take n . drop m` is a very common operation. Are there other common operations that would benefit from such rules?
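To make the shape of such a rule concrete, here is a small self-contained sketch of a GHC rewrite rule that redirects a copying pipeline to an unfused one. It is not the rule from this pull request, which operates on the library's internal stream representation. The names `sliceStreamed` and `sliceUnfused` are hypothetical, and the round trip through `String` is only a stand-in for a streaming implementation that copies.

```haskell
import qualified Data.Text as T
import Data.Text (Text)

-- Hypothetical stand-in for the fused pipeline: going through String
-- always allocates a fresh copy of the selected characters.
sliceStreamed :: Int -> Int -> Text -> Text
sliceStreamed offset len = T.pack . take len . drop offset . T.unpack
{-# NOINLINE sliceStreamed #-}  -- keep the name visible so the rule can match

-- Unfused version built from Data.Text's own take and drop.
sliceUnfused :: Int -> Int -> Text -> Text
sliceUnfused offset len = T.take len . T.drop offset

-- With optimization enabled (-O), GHC may replace calls to the
-- copying version with the unfused one.
{-# RULES
"sliceStreamed -> unfused" forall offset len t.
    sliceStreamed offset len t = sliceUnfused offset len t
  #-}
```

Both sides of the rule compute the same result, so the program behaves identically whether or not the rule fires; only the cost changes. Any similar rule for other operations would need the same property.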