Some more tweaks for rehandshake queue#2576
Conversation
This cuts ~6.67MB from our reserved capacity. It's still larger than we could have if we stored IPv4 addresses packed together, but I think the new overhead is sufficient to remove the FIXME, we're ~3x worse than optimal for IPv4-only workloads now. On Linux most of the backing memory doesn't get allocated anyway.
adb1c6f to
f3f7183
Compare
This shrinks the p100 handshake time in practice (mostly fixing the comment) while still preserving random peer ordering between different hosts. In test, this seems to not *quite* accomplish our expectation of p100 time being exactly the handshake period (I see 47 minutes @ p100 with a 30 minute period) but that's still better than ~62 minutes we see otherwise and the average fluctuates much less (+/- roughly 1 minute).
| // the comparison key to ensure that we find duplicate entries correctly (just by hash | ||
| // might have different addresses interleave). | ||
| self.queue | ||
| .sort_unstable_by_key(|peer| (self.hasher.hash_one(peer), *peer)); |
There was a problem hiding this comment.
won't this hash the address on every comparison? I wonder if it would be better to use sort_unstable_by and check to see if the peers match before hashing? I guess that'll only reduce the hashing operations for duplicates which i'm not sure how often that'll happen.
There was a problem hiding this comment.
It will, but I think matching addresses are very unlikely in practice -- we should only have that happen transiently after a rehandshake (for ~10 minutes) -- so with 500k peers and a 24 hour period, (500_000/(60*24)*10)/500_000 * 100 = 0.69 = ~0.7% of peers (3472) should have duplicate entries at any time.
I think in practice this also runs ~once a handshake period (so once every 24 hours) so a bit of extra CPU is not a big deal for the hashing. We could cache them (sort_unstable_by_key_cached) but I think the extra memory spend isn't worth it.
|
|
||
| // Initializes the hasher with random keys, ensuring we handshake with peers in a | ||
| // different, random order from different hosts. | ||
| hasher: RandomState::new(), |
There was a problem hiding this comment.
do we need to rotate this state over time? the current behavior is the hosts that get the lowest hash values will always be prioritized over higher ones.
There was a problem hiding this comment.
I'm not sure what you mean by prioritized. We run through the full list before resuming, so while low (or high -- I never remember which order sorting goes in?) go first/last, I'm not sure that means much? The hash of a given peer address will be different on ~every endpoint (should be randomly mapped given a reasonably good hash, which SipHash seems to be).
I think rotating the state gets us to the random shuffling we were doing before, which is bad for p100 ages, and doesn't actually add a lot of value -- I can't think of why we'd expect rotating it to improve on something vs. not rotating, since they are randomly picked to start. Do you have a model for why shuffling each time would be better?
) * Store SocketAddressV6 in rehandshake queue This cuts ~6.67MB from our reserved capacity. It's still larger than we could have if we stored IPv4 addresses packed together, but I think the new overhead is sufficient to remove the FIXME, we're ~3x worse than optimal for IPv4-only workloads now. On Linux most of the backing memory doesn't get allocated anyway. * Avoid shuffling handshake queue on each run This shrinks the p100 handshake time in practice (mostly fixing the comment) while still preserving random peer ordering between different hosts. In test, this seems to not *quite* accomplish our expectation of p100 time being exactly the handshake period (I see 47 minutes @ p100 with a 30 minute period) but that's still better than ~62 minutes we see otherwise and the average fluctuates much less (+/- roughly 1 minute).
Release Summary:
Resolved issues:
n/a
Description of changes:
Call-outs:
n/a
Testing:
Just CI for the SocketAddr changes, the handshake queue reordering change is tested in our internal test bed and produces stable results (see commit message for details).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.