Remove :failed category when receiving a non-success webhook log
requested to merge gitlab-community/gitlab-org/gitlab:396577_remove_failed_method_from_hook_log_execution into master
What does this MR do and why?
Previously, the auto-disabling webhook feature treated `4xx` and `5xx` errors differently:
- Webhooks experiencing 4 consecutive `4xx` errors would be permanently disabled.
- Webhooks experiencing 4 consecutive `5xx` errors would be temporarily disabled for a period of time before being automatically re-enabled. If their next attempt was also a `5xx` failure, they would be temporarily disabled again for a longer period of time, and so on, until they were being temporarily disabled for periods of 24 hours.
This MR treats all failures identically, with a progression from temporarily disabled to becoming permanently disabled:
- Webhooks experiencing 4 consecutive errors (`4xx` or `5xx`) will be temporarily disabled following the same logic as above.
- Webhooks experiencing 40 consecutive errors will be permanently disabled.
This allows all failures to "self-heal", and also all webhooks to become permanently disabled after a period of time.
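The progression described above can be sketched in plain Ruby. This is a minimal illustration and not the code in this MR: the class `SketchHook`, its constants, and the doubling backoff are assumptions made for the example. The description only states that the disabled period grows until it reaches 24 hours, and the validation steps mention 1 minute followed by 2 minutes.

```ruby
# Hypothetical stand-in for a webhook record; NOT GitLab's WebHook model.
class SketchHook
  FAILURE_THRESHOLD   = 3             # more than 3 consecutive failures => temporarily disabled
  PERMANENT_THRESHOLD = 40            # 40 consecutive failures => permanently disabled
  INITIAL_BACKOFF     = 60            # seconds (assumed from the "1 minute" validation step)
  MAX_BACKOFF         = 24 * 60 * 60  # seconds (24 hours)

  attr_reader :recent_failures, :backoff_count, :disabled_until

  def initialize
    @recent_failures = 0
    @backoff_count = 0
    @disabled_until = nil
  end

  # Record one failed delivery, whether the response was 4xx or 5xx.
  def failed!(now: Time.now)
    @recent_failures += 1
    return unless @recent_failures > FAILURE_THRESHOLD

    @backoff_count += 1
    @disabled_until = now + backoff_seconds
  end

  # A successful delivery resets the consecutive-failure state (assumption).
  def succeeded!
    @recent_failures = 0
    @backoff_count = 0
    @disabled_until = nil
  end

  # Assumed doubling backoff, capped at 24 hours.
  def backoff_seconds
    return 0 if @backoff_count.zero?

    [INITIAL_BACKOFF * (2**(@backoff_count - 1)), MAX_BACKOFF].min
  end

  def permanently_disabled?
    @recent_failures >= PERMANENT_THRESHOLD
  end

  def temporarily_disabled?(now: Time.now)
    !permanently_disabled? && !@disabled_until.nil? && now < @disabled_until
  end
end

# Usage: four consecutive failures put the hook into its first backoff period.
hook = SketchHook.new
4.times { hook.failed! }
hook.temporarily_disabled?
```

Under these assumptions, a hook that reaches 40 consecutive failures has a `backoff_count` of 37, which matches the values written by the migration's `UPDATE` statement.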
See
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Webhook experiencing `4xx` errors:

Scenario | master | This branch |
---|---|---|
4 consecutive errors | ![]() | ![]() |
39 consecutive errors | Not possible | ![]() |
40 consecutive errors | Not possible | ![]() |
Existing record being migrated:
Scenario | master | This branch, before migration has run | This branch, after migration has run |
---|---|---|---|
4 consecutive errors | ![]() | ![]() | ![]() |
Webhook experiencing `5xx` errors:

Scenario | master | This branch |
---|---|---|
4 consecutive errors | ![]() | ![]() |
39 consecutive errors | ![]() | ![]() |
40 consecutive errors | ![]() | ![]() |
Webhook disabled labels
master | This branch |
---|---|
![]() | ![]() |
![]() | ![]() |
How to set up and validate locally
- Enable the auto-disabling webhook feature from the Rails console: `Feature.enable(:auto_disabling_web_hooks)`
- Create a project webhook:
  - Choose a project, and go to Settings > Webhooks.
  - For URL, enter `https://httpstat.us/500`, which will return a `500` error response.
  - For Trigger, select Issues events.
  - Select Add webhook.
- Go to Issues > List and choose an issue.
- Close, reopen, and close the issue to generate 3 failures.
- View the project webhooks again at Settings > Webhooks. Your webhook should not be labelled as disabled. Select Edit. You should not see any warning banners.
- Reopen the issue to generate 1 more failure.
- View the project webhooks again at Settings > Webhooks. Your webhook should now be labelled as Temporarily disabled. Select Edit. You should see a warning banner saying your webhook is temporarily disabled for 1 minute.
- Closing or reopening the issue during this period should not execute the webhook again (no new Recent events should appear at the bottom of the webhook edit page).
- Once the period has elapsed, the webhook should be re-enabled.
- Generate a 5th failure event by opening or closing an issue. The webhook should be temporarily disabled again, this time for 2 minutes.
- In the Rails console, set the webhook to the failure threshold at which the next failure will cause it to be permanently disabled:
    webhook = WebHook.find(id) # Where `id` is the webhook ID, visible in the URI of its edit page
    webhook.update!(recent_failures: 39)
- Generate another failure event by opening or closing an issue and the webhook should become permanently disabled.
- View the project webhooks again at Settings > Webhooks. Your webhook should now be labelled as Disabled. Select Edit. You should see a warning banner saying your webhook is disabled.
Test the migration
- Roll back the migration on this branch if you have already migrated it up.
- Switch to the `master` branch. Create a webhook, using `https://httpstat.us/404` as the webhook URL to generate `404` status failures. Subscribe to issue events.
- Close and open an issue to generate 4 failures. The webhook will become permanently disabled. Confirm this in the UI.
- Switch to this branch
- Run the migrations
- View the webhook again. It will still be permanently disabled.
Migration
There are 118k records that would be updated on GitLab.com.
Raw SQL and query plans
-- app/models/concerns/each_batch.rb:65:in `each_batch'
SELECT "web_hooks"."id"
FROM "web_hooks"
WHERE (recent_failures > 3)
AND "web_hooks"."disabled_until" IS NULL
ORDER BY "web_hooks"."id" ASC
LIMIT 1
-- app/models/concerns/each_batch.rb:84:in `block in each_batch'
SELECT "web_hooks"."id"
FROM "web_hooks"
WHERE (recent_failures > 3)
AND "web_hooks"."disabled_until" IS NULL
AND "web_hooks"."id" >= 1
ORDER BY "web_hooks"."id" ASC
LIMIT 1
OFFSET 10000
-- The actual ids used for > and < would depend on the above result, but giving the widest possible range in this example:
-- db/post_migrate/20250317021451_migrate_old_disabled_web_hook_to_new_state.rb:21:in `block in up'
UPDATE "web_hooks"
SET "recent_failures" = 40,
"backoff_count" = 37,
"disabled_until" = '2025-03-18 02:55:06'
WHERE (recent_failures > 3)
AND "web_hooks"."disabled_until" IS NULL
AND "web_hooks"."id" >= 1
AND "web_hooks"."id" < 10000
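
As a rough illustration of what the `UPDATE` statement above does to each matching record, here is a plain-Ruby sketch over hashes. The helper `migrate_hook` and the hash layout are hypothetical and not part of the migration. The `disabled_until` value is passed in rather than computed, because the SQL only shows a literal timestamp.

```ruby
# Hypothetical helper for illustration only; the real migration issues the
# batched SQL shown above.
def migrate_hook(hook, disabled_until:)
  # Mirrors the WHERE clause: recent_failures > 3 AND disabled_until IS NULL,
  # i.e. hooks that were permanently disabled under the old rules.
  matches = hook[:recent_failures] > 3 && hook[:disabled_until].nil?
  return hook unless matches

  # Mirrors the SET clause: move the record to the new permanently disabled state.
  hook.merge(recent_failures: 40, backoff_count: 37, disabled_until: disabled_until)
end

# Usage: a hook that was permanently disabled on master after 4 failures.
legacy = { id: 1, recent_failures: 4, backoff_count: 0, disabled_until: nil }
migrated = migrate_hook(legacy, disabled_until: Time.utc(2025, 3, 18, 2, 55, 6))
```

Records that are below the failure threshold, or that already have a `disabled_until` value, are returned unchanged in this sketch, just as they fall outside the `WHERE` clause.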
Edited by Luke Duncalfe