Add hashmaps/microbenchmarks/insertHeavy benchmark #338


Merged: 3 commits into master, Jul 24, 2021

Conversation

utdemir (Contributor) commented on Jul 23, 2021

This PR adds a benchmark for Data.HashMap.Mutable.Linear that inserts a bunch of keys in random order. Here are some details of the benchmark:

  • The keys are unique, so there are no updates.
  • The hashmap is initialized to be twice as big as the input size to avoid resizes.
  • The keys are newtype wrappers around Ints, for which the hash value is salted with a large number (sketched below).
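
For illustration, here is a minimal sketch of such a key, assuming the hashable package; the newtype name and the salt constant are made up for the example, not taken from the PR:

import Data.Hashable (Hashable (..))

-- A newtype key whose hash is deliberately salted, so that an Int's
-- hash is no longer (close to) the key itself. The constant below is
-- an arbitrary large number, purely illustrative.
newtype Key = Key Int
  deriving (Eq, Show)

instance Hashable Key where
  hashWithSalt s (Key i) = hashWithSalt (s + 0x5DEECE66D) i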

Here's the result of the benchmark on GHC HEAD (378c0bba7d):

$ cabal bench --benchmark-options 'hashmaps/microbenchmarks/insertHeavy'
...
Benchmark mutable-data: RUNNING...
benchmarked hashmaps/microbenchmarks/insertHeavy/Data.HashMap.Mutable.Linear
time                 45.20 ms   (44.88 ms .. 45.62 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 44.56 ms   (44.11 ms .. 44.90 ms)
std dev              778.4 μs   (502.6 μs .. 1.047 ms)

benchmarked hashmaps/microbenchmarks/insertHeavy/Data.HashMap.Strict
time                 34.17 ms   (33.29 ms .. 34.97 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 34.37 ms   (33.89 ms .. 35.07 ms)
std dev              1.218 ms   (857.9 μs .. 1.675 ms)
variance introduced by outliers: 11% (moderately inflated)

One thing I found while implementing this: when you remove the salt from the hash function (thereby making it the identity for our Int keys), we are a lot faster. That it gets faster makes sense, since then there are no hash collisions (the largest Int is smaller than our capacity in this benchmark). But that it gets almost twice as fast is curious, since I wouldn't expect that many collisions. I had a look at our probe sequence lengths and found that most keys indeed had no collisions; which means that even a few collisions slow down our implementation disproportionately.
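
(This is not the instrumentation used in the PR, just a toy sketch of how one can count colliding keys for a given hash function and capacity; hashFn, capacity and keys are whatever you want to test.)

import qualified Data.Map.Strict as Map

-- Count keys that land in an already-occupied bucket: a rough lower
-- bound on the extra probing an open-addressing table has to do.
collisions :: (Int -> Int) -> Int -> [Int] -> Int
collisions hashFn capacity keys =
    sum [ c - 1 | c <- Map.elems buckets, c > 1 ]
  where
    buckets :: Map.Map Int Int
    buckets = Map.fromListWith (+)
        [ (hashFn k `mod` capacity, 1) | k <- keys ]

-- With the identity hash and capacity greater than the largest key,
-- this is 0, matching the collision-free behaviour described above.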


This PR also includes two minor improvements to the existing benchmark suite (see the individual commits). One more note about the implementation: it looks a bit different from the existing HashMap benchmarks, for two reasons:

  • The benchmarks I'm planning to write are much smaller than the existing ones (i.e., each focuses on just one function), to reduce the surface area.
  • When writing benchmarks I find it useful to keep the functions I'm comparing close together. The existing benchmarks were grouped the opposite way (by hashmap implementation). Another solution would be to regroup the existing benchmarks the way I want, but I'm erring on the side of smaller diffs. A sketch of the new grouping follows this list.
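
A minimal sketch of that layout, assuming a gauge/criterion-style API (which the output above resembles); the input and the strict-hashmap bench are illustrative, and the Data.HashMap.Mutable.Linear variant would sit as a sibling bench in the same group:

import Gauge.Main (bench, bgroup, defaultMain, whnf)
import qualified Data.HashMap.Strict as Strict

-- Illustrative input; the real benchmark uses salted newtype keys
-- inserted in random order.
pairs :: [(Int, ())]
pairs = [ (k, ()) | k <- [1 .. 100000] ]

main :: IO ()
main = defaultMain
  [ bgroup "hashmaps/microbenchmarks/insertHeavy"
      [ -- compared implementations live next to each other here,
        -- rather than in one group per hashmap module
        bench "Data.HashMap.Strict" (whnf Strict.fromList pairs)
      ]
  ]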

aspiwack (Member) left a comment

The influence of the salt is rather unexpected indeed.

For reference, hashWithSalt salt i (for i an integer) is implemented as (salt * 16777619) xor i.
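
Spelled out as code, a small standalone check of the quoted formula (16777619 is the 32-bit FNV prime):

import Data.Bits (xor)

hashIntWithSalt :: Int -> Int -> Int
hashIntWithSalt salt i = (salt * 16777619) `xor` i

-- With salt 0 the product term is 0 and the hash is the identity,
-- which is the collision-free case observed in the PR description.
main :: IO ()
main = print (map (hashIntWithSalt 0) [0 .. 4])  -- prints [0,1,2,3,4]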

utdemir merged commit 5b755ac into master on Jul 24, 2021
utdemir deleted the ud/hashmap-insert-benchmark branch on Jul 24, 2021 at 01:23