Skip to content

Add rewrite rule to unfuse (text . drop) #301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 17, 2020

Conversation

FPtje
Copy link
Contributor

@FPtje FPtje commented Oct 14, 2020

Problem

While profiling some project, the following function turned out to be the largest bottleneck:

{-# INLINE slice #-}
-- | Substring from @offset@ to @offset + len@
slice :: Int -> Int -> Text -> Text
slice offset len = T.take len . T.drop offset

The direct cause of this bottleneck is fusion. The original take and drop functions work by calculating a new offset and length. A new view of Text can be returned without copying the underlying array. However, the slice function is rewritten, and the streaming implementation unnecessarily copies the data. This makes it much slower and more memory consuming.

With take and drop being fast individually, putting them together should not lead to a substantial decrease in performance.

Proposed solution

Add a rewrite rule that specifically rewrites take len . drop offset back to an unfused version, just like the TEXT take -> unfused and TEXT drop -> unfused do for their respective functions individually.

I would argue that a rewrite rule for this very specific pair of functions is justified because it represents the very common substring operation. Since I don't think it would be wise to add substring to the interface of Data.Text, optimizing its de facto implementation would be the next best thing.

Benchmark

A project hosting a benchmark can be found at Channable/haskell-string-slicing-benchmarks. It benchmarks the original function, the rule added by this pull request and three other solutions. See Bench.hs.

Below are the results of the benchmark that showed the most significant difference between fused and unfused:

Benchmark results for above mentioned slice
benchmarking long-5000-1000/naiveSlice
time                 258.8 μs   (257.7 μs .. 259.7 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 260.4 μs   (260.3 μs .. 260.5 μs)
std dev              340.3 ns   (284.6 ns .. 483.1 ns)
allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
  iters              1380479.830 (1380478.707 .. 1380481.163)
  y                  2550.560   (2209.409 .. 2956.324)
Benchmark results when the rule of this PR is included
benchmarking long-5000-1000/sliceWithRule
time                 4.941 μs   (4.937 μs .. 4.945 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.941 μs   (4.938 μs .. 4.944 μs)
std dev              9.959 ns   (8.231 ns .. 12.75 ns)
allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
  iters              64.000     (63.980 .. 64.025)
  y                  2556.215   (2301.293 .. 2812.799)

That's a huge difference, both in runtime and memory.

Open questions

  • There are many more subject-to-fusion functions that do not need to create a copy (last, tail, init, null to name a few). Any pipeline involving only such functions might be better off not being fused. Could a different approach to fusion improve performance here?
  • take n . drop m is very a common operation. Are there other common operations that would benefit from such rules?

@phadej
Copy link
Contributor

phadej commented Oct 15, 2020

No regression tests, no merge.

EDIT: also, please put the explanations why the change is made into the commit messages.

@MaxGabriel
Copy link

No regression tests, no merge.

EDIT: also, please put the explanations why the change is made into the commit messages.

Are these requirements documented somewhere? Could they be added to a DEVELOPMENT.md or pull_request_template.md so that potential contributors can learn this up-front?

@phadej
Copy link
Contributor

phadej commented Oct 15, 2020

Are these requirements documented somewhere? Could they be added to a DEVELOPMENT.md or pull_request_template.md so that potential contributors can learn this up-front?

Writing good commit messages and adding tests for your changes should be a common practice. But sure, we can be explicit.

The combination of take . drop is commonly used as a substring
operation. The original take and drop functions both work by calculating
just a new offset and length. When fused, the streaming implementation
unnecessarily copies the data. This makes it much slower and more memory
consuming.
FPtje added a commit to channable/text that referenced this pull request Oct 17, 2020
The combination of (take . drop) should not be fused. This test makes
sure they are not by comparing the underlying array before and after
applying those functions.

See haskell#301.
@FPtje FPtje force-pushed the feat-unfuse-take-drop branch from da3b8cd to f4dff92 Compare October 17, 2020 13:48
@FPtje
Copy link
Contributor Author

FPtje commented Oct 17, 2020

The original commit has been rewritten to contain a more thorough explanation, and a regression test was added to make sure fusion doesn't happen.

The combination of (take . drop) should not be fused. This test makes
sure they are not by comparing the underlying array before and after
applying those functions.

See haskell#301.
@FPtje FPtje force-pushed the feat-unfuse-take-drop branch from f4dff92 to 8c0a8da Compare October 17, 2020 14:50
@phadej phadej merged commit 7d3130b into haskell:master Oct 17, 2020
@phadej
Copy link
Contributor

phadej commented Oct 17, 2020

I wonder whether take n . drop m . drop p or drop n . take m would need own RULES too, but let someone first run into those problems.

emilypi pushed a commit to emilypi/text that referenced this pull request Dec 9, 2020
The combination of (take . drop) should not be fused. This test makes
sure they are not by comparing the underlying array before and after
applying those functions.

See haskell#301.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants