Add hashmaps/microbenchmarks/insertHeavy benchmark #338


Merged: 3 commits into master, Jul 24, 2021

Conversation

utdemir (Contributor) commented on Jul 23, 2021

This PR adds a benchmark for Data.HashMap.Mutable.Linear that inserts a bunch of keys in random order. Here are some details of the benchmark:

  • The keys are unique, so there are no updates.
  • The hashmap is initialized to be twice as big as the input size to avoid resizes.
  • The keys are newtype wrappers around Ints, for which the hash value is salted with a large number (sketched below).
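
For illustration, here is a minimal sketch of such a key, assuming the hashable package; the newtype name and the salt constant are made up for the example, not taken from the PR:

import Data.Hashable (Hashable (..))

-- A newtype key whose hash is deliberately salted, so that an Int's
-- hash is no longer (close to) the key itself. The constant below is
-- an arbitrary large number, purely illustrative.
newtype Key = Key Int
  deriving (Eq, Show)

instance Hashable Key where
  hashWithSalt s (Key i) = hashWithSalt (s + 0x5DEECE66D) i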

Here's the result of the benchmark on GHC HEAD (378c0bba7d):

$ cabal bench --benchmark-options 'hashmaps/microbenchmarks/insertHeavy'
...
Benchmark mutable-data: RUNNING...
benchmarked hashmaps/microbenchmarks/insertHeavy/Data.HashMap.Mutable.Linear
time                 45.20 ms   (44.88 ms .. 45.62 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 44.56 ms   (44.11 ms .. 44.90 ms)
std dev              778.4 μs   (502.6 μs .. 1.047 ms)

benchmarked hashmaps/microbenchmarks/insertHeavy/Data.HashMap.Strict
time                 34.17 ms   (33.29 ms .. 34.97 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 34.37 ms   (33.89 ms .. 35.07 ms)
std dev              1.218 ms   (857.9 μs .. 1.675 ms)
variance introduced by outliers: 11% (moderately inflated)

One thing I found while implementing this: when you remove the salt from the hash function (thereby making it the identity for our Int keys), we are a lot faster. That it gets faster makes sense, since then there are no hash collisions (the largest Int is smaller than our capacity in this benchmark). But that it gets almost twice as fast is curious, since I wouldn't expect that many collisions. I had a look at our probe sequence lengths and found that most keys indeed had no collisions; which means that even a few collisions slow down our implementation disproportionately.
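
(This is not the instrumentation used in the PR, just a toy sketch of how one can count colliding keys for a given hash function and capacity; hashFn, capacity and keys are whatever you want to test.)

import qualified Data.Map.Strict as Map

-- Count keys that land in an already-occupied bucket: a rough lower
-- bound on the extra probing an open-addressing table has to do.
collisions :: (Int -> Int) -> Int -> [Int] -> Int
collisions hashFn capacity keys =
    sum [ c - 1 | c <- Map.elems buckets, c > 1 ]
  where
    buckets :: Map.Map Int Int
    buckets = Map.fromListWith (+)
        [ (hashFn k `mod` capacity, 1) | k <- keys ]

-- With the identity hash and capacity greater than the largest key,
-- this is 0, matching the collision-free behaviour described above.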


This PR also includes two minor improvements to the existing benchmark suite (see the individual commits). One more note about the implementation: it looks a bit different from the existing HashMap benchmarks, for two reasons:

  • The benchmarks I'm planning to write are much smaller than the existing ones (i.e., each focuses on just one function), to reduce the surface area.
  • When writing benchmarks I find it useful to keep the functions I'm comparing close together. The existing benchmarks were grouped the opposite way (by hashmap implementation). Another solution would be to regroup the existing benchmarks the way I want, but I'm erring on the side of smaller diffs. A sketch of the new grouping follows this list.
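
A minimal sketch of that layout, assuming a gauge/criterion-style API (which the output above resembles); the input and the strict-hashmap bench are illustrative, and the Data.HashMap.Mutable.Linear variant would sit as a sibling bench in the same group:

import Gauge.Main (bench, bgroup, defaultMain, whnf)
import qualified Data.HashMap.Strict as Strict

-- Illustrative input; the real benchmark uses salted newtype keys
-- inserted in random order.
pairs :: [(Int, ())]
pairs = [ (k, ()) | k <- [1 .. 100000] ]

main :: IO ()
main = defaultMain
  [ bgroup "hashmaps/microbenchmarks/insertHeavy"
      [ -- compared implementations live next to each other here,
        -- rather than in one group per hashmap module
        bench "Data.HashMap.Strict" (whnf Strict.fromList pairs)
      ]
  ]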

aspiwack (Member) left a comment

The influence of the salt is rather unexpected indeed.

For reference, hashWithSalt salt i (for i an integer) is implemented as (salt * 16777619) xor i.
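
Spelled out as code, a small standalone check of the quoted formula (16777619 is the 32-bit FNV prime):

import Data.Bits (xor)

hashIntWithSalt :: Int -> Int -> Int
hashIntWithSalt salt i = (salt * 16777619) `xor` i

-- With salt 0 the product term is 0 and the hash is the identity,
-- which is the collision-free case observed in the PR description.
main :: IO ()
main = print (map (hashIntWithSalt 0) [0 .. 4])  -- prints [0,1,2,3,4]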

utdemir merged commit 5b755ac into master on Jul 24, 2021
utdemir deleted the ud/hashmap-insert-benchmark branch on Jul 24, 2021 at 01:23