Speed up Dhall.Map.unorderedTraverseWithKey_ #1036
Conversation
(cherry picked from commit 5223b2c)
This reduces the runtime of the `deep-nested-large-record` benchmark by about 50%.
This benchmark actually reports a ~20% loss of performance for the `unorderedTraverseWithKey_` change. Since we ultimately care about dhall's performance, it's better not to be misled by a questionable micro-benchmark.
```diff
@@ -459,7 +458,8 @@ traverseWithKey f m =
 -}
 unorderedTraverseWithKey_
     :: Ord k => Applicative f => (k -> a -> f ()) -> Map k a -> f ()
-unorderedTraverseWithKey_ f = traverse_ (uncurry f) . toList
+unorderedTraverseWithKey_ f (Map m _) =
+    Data.Map.foldlWithKey' (\acc k v -> acc *> f k v) (pure ()) m
```
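For context, here is a self-contained sketch of the before/after shapes over a plain `Data.Map` (Dhall.Map wraps one internally; the names `oldTraverse_` and `newTraverse_` are made up for illustration). Note that this plain-`Data.Map` sketch does not reproduce the insertion-order behaviour of `Dhall.Map.toList`; it only shows the list-based versus fold-based structure:

```haskell
import Data.Foldable (traverse_)
import qualified Data.Map

-- Old shape: materialise all key/value pairs as a list, then traverse it.
oldTraverse_ :: Applicative f => (k -> a -> f ()) -> Data.Map.Map k a -> f ()
oldTraverse_ f = traverse_ (uncurry f) . Data.Map.toList

-- New shape: a strict left fold over the underlying tree, combining the
-- per-entry actions with (*>) and never building an intermediate list.
newTraverse_ :: Applicative f => (k -> a -> f ()) -> Data.Map.Map k a -> f ()
newTraverse_ f = Data.Map.foldlWithKey' (\acc k v -> acc *> f k v) (pure ())
```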
David Feuer points out in haskell/containers#422 (comment) that traversals should usually be lazy.
I suspect though that in the case of typechecking, strictness is beneficial – we want the complete traversal to be fast.
Maybe we should add a prime to the function name to mark it as strict?!
And I should also investigate whether some strictness in `traverseWithKey` would aid performance too.
I agree with David that this seems fishy. You're making the `f` computation strict, which is usually not a particularly interesting thing to do. Also, `foldl` seems wrong here; `foldr` would make more sense. Then you end up building
f k v *> (f k v *> (f k v *> ..))
which in an `Applicative` context means "run the first argument, then look at the remaining computation steps".
Can you give a right fold a try?
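Roughly, the suggested right fold would look like this; a sketch over plain `Data.Map`, with `leftStrict`/`rightLazy` as illustrative names, and the association shapes spelled out in comments:

```haskell
import qualified Data.Map

-- Strict left fold (the shape in this PR):
--   ((pure () *> f k1 v1) *> f k2 v2) *> f k3 v3
leftStrict :: Applicative f => (k -> a -> f ()) -> Data.Map.Map k a -> f ()
leftStrict f = Data.Map.foldlWithKey' (\acc k v -> acc *> f k v) (pure ())

-- Lazy right fold (the suggested alternative):
--   f k1 v1 *> (f k2 v2 *> (f k3 v3 *> pure ()))
-- The first action sits at the head of the chain, so a short-circuiting
-- Applicative can stop without forcing the rest of the fold.
rightLazy :: Applicative f => (k -> a -> f ()) -> Data.Map.Map k a -> f ()
rightLazy f = Data.Map.foldrWithKey (\k v r -> f k v *> r) (pure ())
```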
With `foldlWithKey' (\acc k v -> acc *> f k v) (pure ())`:
benchmarking issue 412
time 785.1 μs (780.1 μs .. 791.4 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 783.6 μs (780.7 μs .. 790.3 μs)
std dev 13.88 μs (8.613 μs .. 22.55 μs)
benchmarking union performance
time 164.4 μs (163.8 μs .. 165.0 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 162.0 μs (161.1 μs .. 162.7 μs)
std dev 2.592 μs (2.108 μs .. 3.282 μs)
With `foldrWithKey (\k v r -> f k v *> r) (pure ())`:
benchmarking issue 412
time 1.012 ms (1.005 ms .. 1.020 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 1.014 ms (1.009 ms .. 1.022 ms)
std dev 20.36 μs (12.62 μs .. 33.92 μs)
benchmarking union performance
time 174.7 μs (173.5 μs .. 176.0 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 174.6 μs (173.9 μs .. 175.7 μs)
std dev 2.686 μs (1.559 μs .. 4.326 μs)
I should find some more benchmarks I guess.
Could you throw in numbers for `foldrWithKey'` for good measure? And maybe `foldlWithKey` (not strict)
`foldlWithKey (\acc k v -> acc *> f k v) (pure ())` (lazy):
benchmarking issue 412
time 790.2 μs (786.8 μs .. 794.2 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 790.4 μs (788.0 μs .. 795.8 μs)
std dev 11.74 μs (6.787 μs .. 20.89 μs)
benchmarking union performance
time 164.4 μs (163.5 μs .. 165.3 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 161.9 μs (161.0 μs .. 162.9 μs)
std dev 3.137 μs (2.541 μs .. 4.593 μs)
variance introduced by outliers: 13% (moderately inflated)
`Map.foldrWithKey' (\k v r -> f k v *> r) (pure ())` (strict):
benchmarking issue 412
time 804.5 μs (801.5 μs .. 808.0 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 806.3 μs (803.9 μs .. 810.5 μs)
std dev 10.09 μs (5.878 μs .. 14.98 μs)
benchmarking union performance
time 164.4 μs (163.6 μs .. 165.2 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 162.1 μs (161.4 μs .. 163.1 μs)
std dev 3.031 μs (2.210 μs .. 5.024 μs)
variance introduced by outliers: 12% (moderately inflated)
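To isolate the fold choice from the rest of dhall, a minimal criterion harness along these lines could be used (the test map, the `check` action, and the bench names are all made up for illustration; this is not part of dhall's benchmark suite):

```haskell
import Criterion.Main
import qualified Data.Map

-- A synthetic map to fold over.
testMap :: Data.Map.Map Int Int
testMap = Data.Map.fromList [ (i, i) | i <- [1 .. 10000 :: Int] ]

-- A cheap per-entry action in a short-circuiting Applicative.
check :: Int -> Int -> Maybe ()
check _ v = if v < 0 then Nothing else Just ()

traverseLeft, traverseRight :: Data.Map.Map Int Int -> Maybe ()
traverseLeft  = Data.Map.foldlWithKey' (\acc k v -> acc *> check k v) (pure ())
traverseRight = Data.Map.foldrWithKey  (\k v r -> check k v *> r) (pure ())

main :: IO ()
main = defaultMain
    [ bench "foldlWithKey' traversal" (whnf traverseLeft  testMap)
    , bench "foldrWithKey traversal"  (whnf traverseRight testMap)
    ]
```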
Is there anything left to do here? I'd be fine with merging this as is
Maybe it turns out that the strict fold causes type errors to be reported with too much delay. If so we can still switch to
This reduces the runtime of the `deep-nested-large-record` benchmark by about 50%.

Note that previously, contrary to its name and documentation, this function traversed a `Dhall.Map` in insertion order due to its use of `Dhall.Map.toList`! With this change, the traversal happens in ascending key order.

Also remove the map-operations benchmark: it actually reports a ~20% loss of performance for the `unorderedTraverseWithKey_` change. Since we ultimately care about dhall's performance, it's better not to be misled by an overoptimizing micro-benchmark.
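A small sketch of the ordering change described above, assuming `Dhall.Map.fromList` preserves insertion order as stated (using `f = IO` and printing each entry):

```haskell
import qualified Dhall.Map

orderDemo :: IO ()
orderDemo = do
    let m = Dhall.Map.fromList [("b", 2 :: Int), ("a", 1)]
    -- Before this change (going through Dhall.Map.toList): visits "b" then
    -- "a", i.e. insertion order.
    -- After this change (folding the underlying Data.Map): visits "a" then
    -- "b", i.e. ascending key order.
    Dhall.Map.unorderedTraverseWithKey_ (\k v -> print (k, v)) m
```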