Remove :failed category when receiving a non-success webhook log

What does this MR do and why?

Previously, the webhook auto-disabling feature treated 4xx and 5xx errors differently:

  • Webhooks experiencing 4 consecutive 4xx errors would be permanently disabled.
  • Webhooks experiencing 4 consecutive 5xx errors would be temporarily disabled for a period of time before being automatically re-enabled. If the next attempt was also a 5xx failure, the webhook would be temporarily disabled again for a longer period, and so on, until it was being temporarily disabled for periods of 24 hours.

This MR treats all failures identically, with a progression from temporarily disabled to becoming permanently disabled:

  • Webhooks experiencing 4 consecutive errors (4xx or 5xx) will be temporarily disabled following the same logic as above.
  • Webhooks experiencing 40 consecutive errors will be permanently disabled.

This allows a webhook to "self-heal" from any kind of failure, while still ensuring that any webhook that keeps failing eventually becomes permanently disabled.
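
The thresholds described above can be sketched in plain Ruby. This is only an illustration of the rules as stated in this description, not the code in this MR: the constant and method names are hypothetical, and the sketch reports the status immediately after the Nth consecutive failure, ignoring the re-enabling that happens once a temporary disable period elapses.

```ruby
# Hypothetical names; only the thresholds (4 and 40) come from the description.
TEMPORARY_DISABLE_AFTER = 4   # consecutive failures before a temporary disable
PERMANENT_DISABLE_AFTER = 40  # consecutive failures before a permanent disable

# Returns the status of a webhook right after its Nth consecutive failure.
# 4xx and 5xx responses are deliberately not distinguished.
def status_after_failures(consecutive_failures)
  if consecutive_failures >= PERMANENT_DISABLE_AFTER
    :permanently_disabled
  elsif consecutive_failures >= TEMPORARY_DISABLE_AFTER
    :temporarily_disabled
  else
    :enabled
  end
end

[3, 4, 39, 40].each do |count|
  puts "#{count} consecutive failures: #{status_after_failures(count)}"
end
```

The counts 3, 4, 39, and 40 are chosen to sit on either side of each threshold, matching the scenarios shown in the screenshots.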

See

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Webhook experiencing 4xx errors

Scenario | master | This branch
4 consecutive errors | image | image
39 consecutive errors | Not possible | image
40 consecutive errors | Not possible | image

Existing record being migrated:

Scenario | master | This branch, before migration has run | This branch, after migration has run
4 consecutive errors | image | image | image

Webhook experiencing 5xx errors

Scenario | master | This branch
4 consecutive errors | image | image
39 consecutive errors | image | image
40 consecutive errors | image | image

Webhook disabled labels

master | This branch
image | image
image | image

How to set up and validate locally

  1. Enable the auto-disabling webhook feature in the Rails console:
    Feature.enable(:auto_disabling_web_hooks)
  2. Create a project webhook:
    1. Choose a project, and go to Settings > Webhooks.
    2. For URL, enter https://httpstat.us/500, which will return a 500 error response.
    3. For Trigger, select Issues events.
    4. Select Add webhook.
  3. Go to Issues > List and choose an issue.
    1. Close, reopen, and close the issue to generate 3 failures.
  4. View the project webhooks again at Settings > Webhooks. Your webhook should not be labelled as disabled. Select Edit. You should not see any warning banners.
  5. Reopen the issue to generate 1 more failure.
  6. View the project webhooks again at Settings > Webhooks. Your webhook should now be labelled as Temporary disabled. Select Edit. You should see a warning banner saying your webhook is temporarily disabled for 1 minute.
  7. Closing or reopening the issue during this period should not execute the webhook again (no new Recent events should appear at the bottom of the webhook edit page).
  8. After the period has elapsed, the webhook should be re-enabled.
  9. Generate a 5th failure event by opening or closing an issue. The webhook should be temporarily disabled again, this time for 2 minutes.
  10. In the Rails console, update the webhook so that it is at the failure threshold where the next failure will cause it to be permanently disabled:
    webhook = WebHook.find(id) # Where `id` is the webhook ID - visible in the URI of its edit page
    webhook.update!(recent_failures: 39)
  11. Generate another failure event by opening or closing an issue. The webhook should become permanently disabled.
  12. View the project webhooks again at Settings > Webhooks. Your webhook should now be labelled as Disabled. Select Edit. You should see a warning banner saying your webhook is disabled.
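
The durations you should expect in steps 6 and 9 can be approximated with a small calculation. This sketch assumes the backoff doubles on each further consecutive failure and is capped at 24 hours; the description only confirms the first two values (1 minute, then 2 minutes) and the 24-hour ceiling, so the doubling rule and all names below are assumptions.

```ruby
# Hypothetical names and an assumed doubling rule; only 1 minute, 2 minutes,
# and the 24-hour cap are stated in the description.
TOLERATED_FAILURES = 3               # failures 1..3 do not disable the webhook
FIRST_BACKOFF_SECONDS = 60           # 4th consecutive failure: 1 minute
LONGEST_BACKOFF_SECONDS = 24 * 60 * 60

# Returns how long (in seconds) the webhook would stay temporarily disabled
# after the given number of consecutive failures, or 0 if it stays enabled.
def backoff_seconds(recent_failures)
  steps = recent_failures - TOLERATED_FAILURES
  return 0 if steps < 1

  [FIRST_BACKOFF_SECONDS * (2**(steps - 1)), LONGEST_BACKOFF_SECONDS].min
end

(3..6).each do |failures|
  puts "#{failures} failures: disabled for #{backoff_seconds(failures)} seconds"
end
```

If the real progression differs from doubling, only the values between the 2-minute step and the 24-hour cap would change.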

Test the migration

  1. Roll back the migration on this branch if you have already run it.
  2. Switch to the master branch. Create a webhook, using https://httpstat.us/404 as the webhook URL to generate 404 status failures. Subscribe to issue events.
  3. Close and reopen an issue to generate 4 failures. The webhook will become permanently disabled. Confirm this in the UI.
  4. Switch to this branch
  5. Run the migrations
  6. View the webhook again. It will still be permanently disabled.

Migration

There are 118k records that would be updated on GitLab.com.

Raw SQL and query plans

-- app/models/concerns/each_batch.rb:65:in `each_batch'
SELECT "web_hooks"."id"
FROM "web_hooks"
WHERE (recent_failures > 3)
  AND "web_hooks"."disabled_until" IS NULL
ORDER BY "web_hooks"."id" ASC
LIMIT 1

Query plan.


-- app/models/concerns/each_batch.rb:84:in `block in each_batch'
SELECT "web_hooks"."id"
FROM "web_hooks"
WHERE (recent_failures > 3)
  AND "web_hooks"."disabled_until" IS NULL
  AND "web_hooks"."id" >= 1
ORDER BY "web_hooks"."id" ASC
LIMIT 1
OFFSET 10000

Query plan


-- The actual ids used for >= and < would depend on the above result; this example uses the widest possible range:
-- db/post_migrate/20250317021451_migrate_old_disabled_web_hook_to_new_state.rb:21:in `block in up'
UPDATE "web_hooks"
SET "recent_failures" = 40,
    "backoff_count" = 37,
    "disabled_until" = '2025-03-18 02:55:06'
WHERE (recent_failures > 3)
  AND "web_hooks"."disabled_until" IS NULL
  AND "web_hooks"."id" >= 1
  AND "web_hooks"."id" < 10000

Query plan
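
The effect of the UPDATE above on a single record can be modelled in plain Ruby, without ActiveRecord. This is a sketch for reasoning about the migration, not the migration itself: `HookRow` and `migrate_row` are hypothetical names, and the `disabled_until` value is passed in rather than computed, because the description does not say how the timestamp is derived.

```ruby
# Hypothetical in-memory stand-in for a web_hooks row.
HookRow = Struct.new(:id, :recent_failures, :backoff_count, :disabled_until)

# Mirrors the WHERE and SET clauses of the UPDATE statement: rows that exceeded
# the old failure threshold and have no disabled_until are moved to the new
# permanently disabled state. All other rows are left untouched.
def migrate_row(row, disabled_until:)
  return row unless row.recent_failures > 3 && row.disabled_until.nil?

  row.recent_failures = 40
  row.backoff_count = 37
  row.disabled_until = disabled_until
  row
end

# Timestamp copied from the example query above.
timestamp = Time.utc(2025, 3, 18, 2, 55, 6)

old_style = migrate_row(HookRow.new(1, 4, 0, nil), disabled_until: timestamp)
healthy   = migrate_row(HookRow.new(2, 0, 0, nil), disabled_until: timestamp)

puts old_style.to_a.inspect
puts healthy.to_a.inspect
```

Setting `recent_failures` to 40 puts the record at the new permanent threshold, which is consistent with the webhook still showing as permanently disabled after the migration runs.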

Edited by Luke Duncalfe