@psFried psFried commented Feb 6, 2024

Rolls up a few minor improvements to `gazctl shards prune`. Individual commit messages have more details, but the main goals are to add more detailed audit logging and to continue pruning after failing to remove a fragment.

I've been using this script to compare the audit logs that these changes produce, looking for any journals or fragments that were removed in an earlier prune operation but not in a subsequent one. So far, I've not found any.
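
For readers without access to the script, the comparison described above amounts to a set difference over two audit logs. The following is only a rough sketch of that idea, not the script referred to here: the `auditRecord` type and its `journal` and `name` fields are hypothetical, and the real log lines may be shaped differently.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// auditRecord is a hypothetical shape for one JSON log line written per
// removed fragment. The field names are assumptions made for this sketch.
type auditRecord struct {
	Journal string `json:"journal"`
	Name    string `json:"name"`
}

// removedSet parses newline-delimited JSON and returns the set of
// "journal/name" keys. Lines that do not parse, or that carry no journal,
// are skipped.
func removedSet(logs string) map[string]bool {
	set := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var rec auditRecord
		if err := json.Unmarshal(sc.Bytes(), &rec); err != nil || rec.Journal == "" {
			continue
		}
		set[rec.Journal+"/"+rec.Name] = true
	}
	return set
}

// missingFromLater returns, in sorted order, the keys that appear in the
// earlier log but not in the later one.
func missingFromLater(earlier, later string) []string {
	e, l := removedSet(earlier), removedSet(later)
	var out []string
	for k := range e {
		if !l[k] {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	earlier := `{"journal":"recovery/a","name":"0000-0010"}
{"journal":"recovery/a","name":"0010-0020"}`
	later := `{"journal":"recovery/a","name":"0000-0010"}`
	fmt.Println(missingFromLater(earlier, later))
}
```

An empty result corresponds to the "not found any" outcome reported above.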



If someone accidentally runs `shards prune` with a selector that doesn't
include all the shards that use a forked recovery log, it's possible for it to
try to prune the current fragment for a journal. This adds a check for that
condition so that we can provide visibility and stop the prune operation.

Attempting to prune an unpersisted fragment would fail anyway, but this
at least gives a clearer error message.

Adds a lot more information to the logs when fragments are deleted. The intent
is to run with JSON logs, to enable automated audits and analysis of prune
operations. Because the logs now carry so much more information, I also turned
down the log level on some of the other messages.

I also added in some additional warning logs for conditions that should never
be encountered.

This allows recovery log pruning to continue after encountering an error
removing a fragment.  We now operate Gazette clusters that use a variety of
different storage buckets, and it seems unavoidable that some of them might
have permissions misconfigured or return an error for some other reason.  In
that case, we'll now log a warning and continue the prune operation. Gazctl
will still exit non-zero if it has encountered any errors removing fragments,
to ensure it never fails silently.
@psFried psFried requested a review from jgraettinger February 6, 2024 12:49

@jgraettinger jgraettinger left a comment


LGTM!

@psFried psFried merged commit 338b339 into master Feb 6, 2024
@psFried psFried deleted the phil/pruning-paranoia branch February 6, 2024 21:43