
Some more tweaks for rehandshake queue #2576

Merged
camshaft merged 2 commits into aws:main from Mark-Simulacrum:fix-rehandshake-nits
Mar 28, 2025

Conversation

Collaborator

@Mark-Simulacrum Mark-Simulacrum commented Mar 28, 2025

Release Summary:

  • refactor(s2n-quic-dc): shrink rehandshake queue memory footprint
  • fix(s2n-quic-dc): avoid new handshake queue order on every refill

Resolved issues:

n/a

Description of changes:

  • This cuts ~6.67MB from our reserved capacity. It's still larger than it could be if we stored IPv4 addresses packed together, but I think the new overhead is small enough to remove the FIXME; we're only ~3x worse than optimal for IPv4-only workloads now. On Linux, most of the backing memory doesn't get allocated anyway.
  • Handshake queue ordering is now based on a hash, not full randomness. This ensures the same peer hashes into the same slot over time, while nicely absorbing changes in the peer set. The hasher is randomly seeded.
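As a rough illustration of the first bullet (a hypothetical sketch, not the actual s2n-quic-dc code; `to_v6` is a made-up helper name), storing every peer as a `SocketAddrV6` and mapping IPv4 peers to IPv4-mapped IPv6 addresses avoids paying for the `SocketAddr` enum in every queue slot:

```rust
use std::mem::size_of;
use std::net::{SocketAddr, SocketAddrV6};

// Hypothetical helper: normalize any peer address to a SocketAddrV6,
// representing IPv4 peers as IPv4-mapped IPv6 addresses (::ffff:a.b.c.d).
fn to_v6(addr: SocketAddr) -> SocketAddrV6 {
    match addr {
        SocketAddr::V6(v6) => v6,
        SocketAddr::V4(v4) => {
            SocketAddrV6::new(v4.ip().to_ipv6_mapped(), v4.port(), 0, 0)
        }
    }
}

fn main() {
    // A SocketAddrV6 is never larger than the SocketAddr enum wrapping it.
    assert!(size_of::<SocketAddrV6>() <= size_of::<SocketAddr>());
}
```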

Call-outs:

n/a

Testing:

Just CI for the SocketAddr changes; the handshake queue reordering change is tested in our internal test bed and produces stable results (see the commit message for details).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

This cuts ~6.67MB from our reserved capacity. It's still larger than it
could be if we stored IPv4 addresses packed together, but I think the
new overhead is small enough to remove the FIXME; we're only ~3x worse
than optimal for IPv4-only workloads now. On Linux, most of the backing
memory doesn't get allocated anyway.
@Mark-Simulacrum Mark-Simulacrum marked this pull request as ready for review March 28, 2025 14:44
This shrinks the p100 handshake time in practice (mostly fixing the
comment) while still preserving random peer ordering between different
hosts. In testing, this doesn't *quite* meet our expectation of the
p100 time being exactly the handshake period (I see 47 minutes at p100
with a 30-minute period), but that's still better than the ~62 minutes
we see otherwise, and the average fluctuates much less (+/- roughly 1
minute).
@Mark-Simulacrum Mark-Simulacrum changed the title Store SocketAddressV6 in rehandshake queue Some more tweaks for rehandshake queue Mar 28, 2025
// The peer is part of the comparison key to ensure that we find duplicate
// entries correctly (sorting by hash alone might interleave different addresses).
self.queue
.sort_unstable_by_key(|peer| (self.hasher.hash_one(peer), *peer));
Contributor


Won't this hash the address on every comparison? I wonder if it would be better to use sort_unstable_by and check whether the peers match before hashing? I guess that would only reduce the hashing operations for duplicates, and I'm not sure how often that happens.

Collaborator Author


It will, but I think matching addresses are very unlikely in practice -- we should only see that transiently after a rehandshake (for ~10 minutes) -- so with 500k peers and a 24-hour period, (500_000 / (60 * 24)) * 10 / 500_000 * 100 ≈ 0.69%, i.e. ~0.7% of peers (~3,472) should have duplicate entries at any time.
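The back-of-envelope estimate above can be checked mechanically (a standalone sketch; the 500k-peer, 24-hour, 10-minute figures are just the ones assumed in this comment):

```rust
// Fraction of peers with a duplicate queue entry at any given time:
// (rehandshakes per minute) * (minutes a duplicate lives) / (total peers).
fn duplicate_fraction(peers: f64, period_minutes: f64, overlap_minutes: f64) -> f64 {
    (peers / period_minutes) * overlap_minutes / peers
}

fn main() {
    let pct = duplicate_fraction(500_000.0, 60.0 * 24.0, 10.0) * 100.0;
    println!("{pct:.2}% of peers duplicated at any time"); // ~0.69%
}
```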

I think in practice this also runs about once per handshake period (so once every 24 hours), so a bit of extra CPU for the hashing is not a big deal. We could cache the hashes (sort_by_cached_key), but I don't think the extra memory spend is worth it.
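For concreteness, the ordering scheme under discussion could look roughly like this (a hypothetical sketch, not the actual s2n-quic-dc type; `RehandshakeQueue` and `refill` are made-up names). The hasher is seeded once, so re-sorting after a refill keeps each peer in a stable, per-host-random position, and the peer itself is the tie-breaker so duplicate entries land adjacent to each other:

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;
use std::net::SocketAddrV6;

// Hypothetical sketch of the queue discussed in this PR.
struct RehandshakeQueue {
    // Seeded once per endpoint; never rotated, so the order is stable.
    hasher: RandomState,
    queue: Vec<SocketAddrV6>,
}

impl RehandshakeQueue {
    fn new() -> Self {
        Self { hasher: RandomState::new(), queue: Vec::new() }
    }

    fn refill(&mut self, peers: impl IntoIterator<Item = SocketAddrV6>) {
        self.queue.extend(peers);
        // Sort by (hash, peer): the hash gives a per-host random but stable
        // order; the peer breaks ties so duplicates become adjacent.
        self.queue
            .sort_unstable_by_key(|peer| (self.hasher.hash_one(peer), *peer));
        // Adjacent duplicates collapse here.
        self.queue.dedup();
    }
}

fn main() {
    let mut q = RehandshakeQueue::new();
    let a: SocketAddrV6 = "[2001:db8::1]:443".parse().unwrap();
    let b: SocketAddrV6 = "[2001:db8::2]:443".parse().unwrap();
    q.refill([a, b, a]); // the duplicate `a` collapses
    assert_eq!(q.queue.len(), 2);
    let order = q.queue.clone();
    q.refill([b, a]); // refilling preserves the established order
    assert_eq!(q.queue, order);
}
```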


// Initializes the hasher with random keys, ensuring we handshake with peers in a
// different, random order from different hosts.
hasher: RandomState::new(),
Contributor


Do we need to rotate this state over time? With the current behavior, the hosts that get the lowest hash values will always be prioritized over the higher ones.

Collaborator Author


I'm not sure what you mean by prioritized. We run through the full list before resuming, so while low-hash peers go first (or last -- I never remember which direction sorting goes), I'm not sure that means much. The hash of a given peer address will be different on ~every endpoint (it should be randomly mapped given a reasonably good hash, which SipHash seems to be).

I think rotating the state gets us back to the random shuffling we were doing before, which is bad for p100 ages, and it doesn't actually add much value -- I can't think of why we'd expect rotating to improve on not rotating, since the seeds are randomly picked to start with. Do you have a model for why shuffling each time would be better?

@camshaft camshaft merged commit 6a81117 into aws:main Mar 28, 2025
116 checks passed
@Mark-Simulacrum Mark-Simulacrum deleted the fix-rehandshake-nits branch March 28, 2025 19:43
dougch pushed a commit that referenced this pull request May 19, 2025

* Store SocketAddressV6 in rehandshake queue


* Avoid shuffling handshake queue on each run
