ENG-2756: Skip watchdog cancellation for pending tasks awaiting upstream dependencies by JadeCara · Pull Request #7525 · ethyca/fides

JadeCara · 2026-02-27T22:56:39Z

Description Of Changes

Fixes a false-positive in the requeue_interrupted_tasks watchdog that was incorrectly canceling privacy requests during normal DSR execution.

Root cause: The watchdog queries for request tasks in pending or in_processing status and checks for a cached Celery task ID in Redis. Tasks only get a cache key when they're dispatched via queue_request_task(), which only happens once all upstream tasks are complete. Tasks that are legitimately pending and waiting for upstream dependencies have never been dispatched and therefore have no cache key — the watchdog was treating these as "stuck" and canceling the entire privacy request.

This was most visible on the erasure phase of multi-connector DSRs: the access terminator completes and re-queues the privacy request for erasure, but the new erasure-phase tasks start as pending with no cache keys, and the watchdog fires during that transition window.

Fix: Before canceling, check if the task is pending with incomplete upstream dependencies. If so, skip it — it's legitimately waiting. Only cancel if upstream tasks are complete but the task was never dispatched (truly stuck).
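
A minimal sketch of the decision described above, not the actual code in request_service.py. The `Decision` enum, the `decide` function, and the plain-string statuses are hypothetical names introduced only for illustration; the real watchdog works with RequestTask records and a Redis cache lookup.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    INSPECT = "inspect"  # a Celery task ID is cached; continue with the normal checks
    SKIP = "skip"        # legitimately waiting on upstream tasks; leave it alone
    CANCEL = "cancel"    # never dispatched although nothing blocks it; treat as stuck


def decide(status: str, cached_task_id: Optional[str], awaiting_upstream: bool) -> Decision:
    """Hypothetical helper mirroring the guard described in this PR."""
    if cached_task_id is not None:
        # The task was dispatched at some point, so the cache-key check applies.
        return Decision.INSPECT
    if status == "pending" and awaiting_upstream:
        # No cache key is expected yet: the task has never been queued.
        return Decision.SKIP
    # Upstream is complete (or there is none) but the task has no cache key.
    return Decision.CANCEL
```

With this shape, a pending erasure task created during the access-to-erasure transition maps to `Decision.SKIP` instead of canceling the whole privacy request.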

Code Changes

src/fides/api/service/privacy_request/request_service.py - Added upstream dependency check in requeue_interrupted_tasks before the cancel path for tasks with no cached subtask ID
tests/task/test_requeue_interrupted_tasks.py - Added two new tests covering the pending-awaiting-upstream (skip) and pending-upstream-complete-no-cache-key (cancel) cases

Steps to Confirm

Run the existing watchdog test suite. The primary verification for this fix is that tests/task/test_requeue_interrupted_tasks.py passes with no regressions.

pytest tests/task/test_requeue_interrupted_tasks.py

The two new tests added in this PR directly cover the fixed behavior:

test_pending_task_awaiting_upstream_is_not_canceled — verifies the watchdog skips legitimately waiting tasks
test_pending_task_with_complete_upstream_and_no_cache_key_is_canceled — verifies truly stuck tasks are still canceled

Pre-Merge Checklist

Made with Cursor

…encies

The requeue_interrupted_tasks monitor was raising false positives on pending request tasks that had not yet been dispatched to Celery because their upstream dependencies were not complete. These tasks legitimately have no cache key (cache_task_tracking_key is only called at dispatch time), so the monitor incorrectly treated them as stuck and canceled the entire privacy request.

Fix: before canceling, check whether the task is pending and its upstream tasks are incomplete; if so, skip it. Only cancel if upstream is done but the task was never dispatched (truly stuck).

Fixes ENG-2756.

Made-with: Cursor

vercel · 2026-02-27T22:56:44Z

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

Project	Deployment	Actions	Updated (UTC)
fides-plus-nightly	Ignored	Preview	Mar 2, 2026 9:24pm
fides-privacy-center	Ignored		Mar 2, 2026 9:24pm

Made-with: Cursor

greptile-apps · 2026-02-27T23:11:26Z

Greptile Summary

This PR fixes a false-positive in the requeue_interrupted_tasks watchdog that was incorrectly canceling privacy requests whose tasks were legitimately pending while waiting for upstream dependencies to finish. The fix refactors _get_request_task_ids_in_progress to yield (task_id, status, awaiting_upstream) tuples and adds an upstream-dependency guard before the cancel path, correctly mirroring the existing RequestTask.upstream_tasks_complete() model logic via an efficient pre-built status lookup dictionary.

Core logic (request_service.py): _get_request_task_ids_in_progress now loads all tasks for the privacy request in a single query, builds a (collection_address, action_type) → status lookup dict, and yields an awaiting_upstream flag derived from that dict — avoiding extra per-task DB round-trips while correctly handling missing upstream records (returns None, which fails the completion check, same safe default as the model method).
Guard condition (requeue_interrupted_tasks): A pending task without a cache key is now only canceled if awaiting_upstream is False; if it's still waiting on upstreams it is skipped via continue, preventing the false-positive.
Tests (test_requeue_interrupted_tasks.py): Two new integration tests directly cover the two key branches; existing unit-test mocks in test_request_service.py are updated to the new tuple return format.
Minor style issues: A single-character loop variable t is used in a dict comprehension in request_service.py, and both new tests manually delete database records in finally blocks despite the database being automatically cleared between runs.
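
The lookup-dict approach summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged helper: `TaskRow`, `tasks_in_progress`, and the plain-string statuses are hypothetical, `COMPLETED` is a stand-in for COMPLETED_EXECUTION_LOG_STATUSES whose exact members are not shown in this thread, and computing the flag only for pending tasks is likewise an assumption.

```python
from typing import Iterator, List, NamedTuple, Tuple

# Assumption: stand-in for COMPLETED_EXECUTION_LOG_STATUSES.
COMPLETED = {"complete", "skipped"}


class TaskRow(NamedTuple):
    """Hypothetical projection of the five columns the helper reads."""
    id: str
    status: str
    collection_address: str
    action_type: str
    upstream_tasks: List[str]


def tasks_in_progress(rows: List[TaskRow]) -> Iterator[Tuple[str, str, bool]]:
    """Yield (task_id, status, awaiting_upstream) without any per-task query."""
    # One pass builds the (collection_address, action_type) -> status lookup.
    status_by_address = {
        (row.collection_address, row.action_type): row.status for row in rows
    }
    for row in rows:
        if row.status not in ("pending", "in_processing"):
            continue
        # A missing upstream record yields None from .get(), which is never
        # in COMPLETED, so it counts as incomplete (the safe default).
        awaiting_upstream = row.status == "pending" and any(
            status_by_address.get((address, row.action_type)) not in COMPLETED
            for address in row.upstream_tasks
        )
        yield row.id, row.status, awaiting_upstream
```

Because the function is a generator, the caller can stop iterating early without the full result list ever being built.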

Confidence Score: 4/5

Safe to merge — the fix correctly addresses the false-positive cancel path with sound logic and good test coverage; only minor style concerns remain.
The upstream-completion logic correctly mirrors the existing model method, the single-query approach is efficient, existing tests are updated, and two new integration tests directly exercise the fixed code paths. Minor deductions for a single-character variable name and unnecessary manual DB teardown in the new tests.
No files require special attention beyond the minor style notes on request_service.py and test_requeue_interrupted_tasks.py.

Important Files Changed

Filename	Overview
src/fides/api/service/privacy_request/request_service.py	Core fix: refactors `_get_request_task_ids_in_progress` from a list of IDs to a generator of `(id, status, awaiting_upstream)` tuples; correctly mirrors `RequestTask.upstream_tasks_complete()` logic using a pre-built status lookup dict; adds the upstream-awaiting guard before the cancel path. Minor style issue with single-character variable `t`.
tests/task/test_requeue_interrupted_tasks.py	Adds two well-structured integration tests covering the pending-awaiting-upstream (skip) and pending-upstream-complete-no-cache-key (cancel) scenarios; manual record deletion in `finally` blocks is unnecessary and inconsistent with automated DB teardown.
tests/ops/service/privacy_request/test_request_service.py	Updates two existing unit-test mocks to return the new tuple format `(id, status, awaiting_upstream)` instead of plain strings; straightforward and correct.
changelog/7525-fix-watchdog-false-positive-pending-tasks.yaml	New changelog entry correctly typed as `Fixed` with a clear description of the bug addressed.

Last reviewed commit: 7f09aff

greptile-apps

4 files reviewed, no comments

JadeCara · 2026-02-27T23:16:26Z

@greptile please review

erosselli · 2026-03-02T17:48:34Z

changelog/7525-fix-watchdog-false-positive-pending-tasks.yaml

Can we name the file 7525-descriptive-slug?

Done — renamed to 7525-fix-watchdog-false-positive-pending-tasks.yaml.

erosselli · 2026-03-02T17:49:46Z

src/fides/api/service/privacy_request/request_service.py

                                should_requeue = False
                                break

+                            # A pending task that hasn't been dispatched to Celery yet will


nit: this method is already huge, should we maybe wrap the new logic in a method that we call here?

Addressed both this and the query optimization comment below — reworked _get_request_task_ids_in_progress to load full RequestTask objects and pre-compute an awaiting_upstream flag. This moves the pending-task logic out of the main method and eliminates the per-iteration re-query of RequestTask by ID. The upstream_tasks_complete() call still happens per pending task within the helper, but the extra object lookup is gone.

Follow-up in 51c48a6: went further and replaced the per-task upstream_tasks_complete() DB calls with a single column-projection query + in-memory lookup dict. Also switched to a generator to avoid building the full result list, and queries only the 5 columns needed (avoids loading large JSON blobs on RequestTask).

erosselli · 2026-03-02T17:51:58Z

src/fides/api/service/privacy_request/request_service.py

+                                request_task_obj = (
+                                    db.query(RequestTask)
+                                    .filter(RequestTask.id == request_task_id)
+                                    .first()


to avoid the extra query (or more than one if upstream_tasks_complete also runs its own query) in each iteration, don't we want to rework _get_request_task_ids_in_progress -- or write a separate method -- that returns something like task_id, status, has_incomplete_upstream_tasks instead?

Done — see reply above. _get_request_task_ids_in_progress now returns (task_id, status, awaiting_upstream) tuples with upstream completion pre-computed, eliminating the per-iteration RequestTask lookup.

Follow-up in 51c48a6: fully addressed — _get_request_task_ids_in_progress now does a single db.query() with column projection (id, status, collection_address, action_type, upstream_tasks), builds a (collection_address, action_type) → status lookup, and computes upstream completion in Python. Zero per-task DB queries regardless of how many pending tasks.
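
To illustrate the single column-projection query described in the replies above, here is a small sketch that uses the standard-library sqlite3 module in place of the SQLAlchemy db.query() call from the PR. The `request_task` table, the `large_payload` column, and `load_task_rows` are hypothetical and heavily simplified; only the five selected column names come from the thread.

```python
import json
import sqlite3

# Hypothetical, simplified table; the real RequestTask model has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE request_task (
        id TEXT PRIMARY KEY,
        privacy_request_id TEXT,
        status TEXT,
        collection_address TEXT,
        action_type TEXT,
        upstream_tasks TEXT,  -- JSON list of collection addresses
        large_payload TEXT    -- stands in for the large JSON blobs
    )
    """
)
conn.executemany(
    "INSERT INTO request_task VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("t1", "pri_1", "complete", "db:users", "erasure", "[]", "x" * 10_000),
        ("t2", "pri_1", "pending", "db:orders", "erasure", '["db:users"]', "x" * 10_000),
        ("t3", "pri_2", "pending", "db:users", "erasure", "[]", "x" * 10_000),
    ],
)


def load_task_rows(connection, privacy_request_id):
    """Fetch only the five columns the watchdog needs, in one query."""
    cursor = connection.execute(
        "SELECT id, status, collection_address, action_type, upstream_tasks "
        "FROM request_task WHERE privacy_request_id = ? ORDER BY id",
        (privacy_request_id,),
    )
    return [
        (task_id, status, address, action_type, json.loads(upstream))
        for task_id, status, address, action_type, upstream in cursor
    ]


rows = load_task_rows(conn, "pri_1")
```

The large column is never selected, so its contents are not transferred for any row, and the number of queries stays at one regardless of how many pending tasks the privacy request has.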

- Rename changelog to 7525-fix-watchdog-false-positive-pending-tasks.yaml
- Move pending-task upstream check into _get_request_task_ids_in_progress
- Return (task_id, status, awaiting_upstream) tuples to eliminate per-iteration DB query
- Update test mocks to match new 3-tuple signature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…-tasks-2

Replace per-task upstream_tasks_complete() DB calls with a single column-projection query and in-memory lookup. Uses a generator to avoid building the full result list and queries only the 5 columns needed (avoiding large JSON blobs on RequestTask). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…f github.com:ethyca/fides into ENG-2756-fix-watchdog-false-positive-pending-tasks-2

- Move pending+awaiting check before async task DB query to skip unnecessary hit for tasks already known to be legitimately waiting
- Parametrize stuck-task cancellation test to cover both scenarios: complete upstream (existing) and root task with no upstream (new)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Documents that the in-memory status_by_address lookup mirrors RequestTask.upstream_tasks_complete() — missing upstream records are treated as incomplete (same safe default). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

JadeCara · 2026-03-02T20:54:29Z

@greptile please review

…-tasks-2

greptile-apps · 2026-03-02T21:06:49Z

Additional Comments (2)

src/fides/api/service/privacy_request/request_service.py, line 488
Single-character variable name in dict comprehension

The loop variable t is a single character, which violates the project convention of using full names for variables. Consider a more descriptive name to improve readability.

    status_by_address: dict[tuple[str, str], ExecutionLogStatus] = {
        (task.collection_address, task.action_type): task.status for task in all_tasks
    }

Context Used: Rule from dashboard - Use full names for variables, not 1 to 2 characters (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

tests/task/test_requeue_interrupted_tasks.py, line 647
Manual record deletion in tests

Both new tests manually delete database records in finally blocks (downstream_task.delete(db), upstream_task.delete(db), privacy_request.delete(db)). The database is automatically cleared between test runs, so these explicit deletions are unnecessary. The same pattern also appears at tests/task/test_requeue_interrupted_tasks.py:736-741.

The standard approach in this test file is to create records in pytest fixtures (see the existing in_progress_privacy_request / in_progress_request_task fixtures). Consider extracting the setup into parametrized fixtures so the teardown is handled automatically.

Context Used: Rule from dashboard - Do not manually delete database records in test fixtures or at the end of tests, as the database is ... (source)

- Rename single-char `t` to `task` in status_by_address dict comprehension
- Remove manual try/finally cleanup in tests (db is cleared between runs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…-tasks-2

erosselli

approved with one more suggestion

erosselli · 2026-03-02T21:21:37Z

src/fides/api/service/privacy_request/request_service.py

+                awaiting_upstream = not all(
+                    status_by_address.get((addr, task.action_type))
+                    in COMPLETED_EXECUTION_LOG_STATUSES
+                    for addr in upstream_addrs
+                )


Can't we use any so we short-circuit when we find one, rather than iterating over all and then negating with not?

Suggested change

-                awaiting_upstream = not all(
-                    status_by_address.get((addr, task.action_type))
-                    in COMPLETED_EXECUTION_LOG_STATUSES
-                    for addr in upstream_addrs
-                )
+                awaiting_upstream = any(
+                    status_by_address.get((addr, task.action_type))
+                    not in COMPLETED_EXECUTION_LOG_STATUSES
+                    for addr in upstream_addrs
+                )

Clearer intent: "any upstream is incomplete" reads more directly than "not all upstreams are complete". Same short-circuit behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
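
A quick check of the claim that the two forms behave the same, including short-circuiting. This is a standalone sketch: `COMPLETED` is a stand-in for COMPLETED_EXECUTION_LOG_STATUSES (its exact members are an assumption), and `awaiting_not_all`, `awaiting_any`, and `items_consumed` are hypothetical helpers written only for this comparison.

```python
# Assumption: stand-in for COMPLETED_EXECUTION_LOG_STATUSES.
COMPLETED = {"complete", "skipped"}


def awaiting_not_all(statuses):
    """Original form: not all upstreams are complete."""
    return not all(status in COMPLETED for status in statuses)


def awaiting_any(statuses):
    """Suggested form: any upstream is incomplete."""
    return any(status not in COMPLETED for status in statuses)


def items_consumed(check, statuses):
    """Count how many statuses a check reads before it returns."""
    seen = []

    def tracked():
        for status in statuses:
            seen.append(status)
            yield status

    check(tracked())
    return len(seen)


cases = [
    [],                                 # root task with no upstream
    ["complete", "skipped"],            # every upstream finished
    ["pending", "complete"],            # first upstream still running
    ["complete", None, "complete"],     # missing upstream record
]
results = [(awaiting_not_all(case), awaiting_any(case)) for case in cases]
```

Both built-ins stop at the first decisive element, so the rewrite is about readability rather than performance. The empty case evaluates to False in both forms, which is why a root task with no upstream is never reported as awaiting upstream.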

Jade Wibbels added 2 commits February 27, 2026 15:56

add changelog for ENG-2756

7a47bb8

Made-with: Cursor

reduced extra db calls

6f04dfd

JadeCara marked this pull request as ready for review February 27, 2026 23:07

JadeCara requested a review from a team as a code owner February 27, 2026 23:07

JadeCara requested review from erosselli and removed request for a team February 27, 2026 23:07

greptile-apps bot reviewed Feb 27, 2026

fix tests

4c41245

erosselli reviewed Mar 2, 2026

Jade Wibbels and others added 3 commits March 2, 2026 11:52

linting

6b4836c

Merge branch 'main' into ENG-2756-fix-watchdog-false-positive-pending…

bee207c

…-tasks-2

vercel bot deployed to Preview – fides-plus-nightly March 2, 2026 19:28 View deployment

Jade Wibbels and others added 2 commits March 2, 2026 12:58

Merge branch 'ENG-2756-fix-watchdog-false-positive-pending-tasks-2' o…

3eea615

…f github.com:ethyca/fides into ENG-2756-fix-watchdog-false-positive-pending-tasks-2

vercel bot deployed to Preview – fides-plus-nightly March 2, 2026 20:04 View deployment

Jade Wibbels and others added 3 commits March 2, 2026 13:11

linting

1d86ed0

Merge branch 'main' into ENG-2756-fix-watchdog-false-positive-pending…

7f09aff

…-tasks-2

Jade Wibbels and others added 2 commits March 2, 2026 14:16

Style cleanup: rename loop variable and remove redundant test cleanup

75013c9

Merge branch 'main' into ENG-2756-fix-watchdog-false-positive-pending…

a1ec9a4

…-tasks-2

erosselli approved these changes Mar 2, 2026

vercel bot deployed to Preview – fides-plus-nightly March 2, 2026 21:22 View deployment

Use any() instead of not all() for upstream completion check

5af4ff2

JadeCara added this pull request to the merge queue Mar 2, 2026

Merged via the queue into main with commit 859b345 Mar 2, 2026
57 checks passed

JadeCara deleted the ENG-2756-fix-watchdog-false-positive-pending-tasks-2 branch March 2, 2026 22:08
