ENG-2737: RequestTask.status refactor #7680

Merged
nreyes-dev merged 9 commits into main from nreyes/eng-2737
Mar 24, 2026

Contributor

@nreyes-dev nreyes-dev commented Mar 17, 2026

Ticket ENG-2737

Description Of Changes

The frontend was determining task status (error/polling/awaiting processing) by scanning execution logs returned in the verbose privacy request response. With long-lived polling tasks that produce 50+ execution logs, the 50-log limit could cause a completed task to still appear as "Awaiting Polling" in the UI.

This adds a task_status_by_dataset_name field to the verbose response, populated directly from RequestTask.status. The frontend now reads task status from this field instead of inferring it from execution log history.
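To illustrate the failure mode described above, here is a minimal sketch of how a capped log list can leave a finished task looking like it is still polling. The status strings, the `crm` dataset, and the assumption that the cap drops the newest entries are all hypothetical; the source only states that a 50-log limit exists.

```python
from typing import Dict, List

LOG_LIMIT = 50  # the execution-log cap mentioned in the description


def infer_status_from_logs(logs: List[Dict[str, str]]) -> str:
    """Infer a dataset's status from the last log entry the client received."""
    return logs[-1]["status"] if logs else "pending"


# Hypothetical history: 60 polling entries, then a single completion entry.
full_history = [{"dataset": "crm", "status": "polling"}] * 60 + [
    {"dataset": "crm", "status": "complete"}
]

# Assumption: the cap keeps the first LOG_LIMIT entries, so the final
# "complete" entry never reaches the client.
truncated = full_history[:LOG_LIMIT]
status_from_logs = infer_status_from_logs(truncated)

# The new field carries the task's own status, independent of log volume.
task_status_by_dataset_name = {"crm": "complete"}
status_from_field = task_status_by_dataset_name["crm"]
```

With the truncated logs the inferred status stays at `polling`, while the field-based lookup reports `complete`.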

Code Changes

  • Added task_status_by_dataset_name helper on PrivacyRequest model — queries RequestTask rows and groups their status by dataset name
  • Added task_status_by_dataset_name to PrivacyRequestVerboseResponse schema
  • Populated the new field in _shared_privacy_request_search when verbose=True
  • Updated usePrivacyRequestEventLogs hook to use the new field for hasPolling, hasError, hasAwaitingProcessing instead of scanning execution logs
  • Added task_status_by_dataset_name to the PrivacyRequestEntity TypeScript type
  • Added backend test covering the new field in the verbose response
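As a rough sketch of what "groups their status by dataset name" could look like, the snippet below collapses task rows into one status per dataset. `TaskRow`, the status strings, and the precedence rule for datasets with several tasks are assumptions made for illustration; the actual helper in this PR may resolve that case differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class TaskRow:
    """Hypothetical stand-in for a RequestTask row."""

    dataset_name: str
    status: str


# Assumed precedence when one dataset has several tasks: earlier entries win.
PRECEDENCE = ["error", "polling", "awaiting_processing", "pending", "complete"]


def group_status_by_dataset(tasks: Iterable[TaskRow]) -> Dict[str, str]:
    """Return one status per dataset name, using the assumed precedence."""
    statuses: Dict[str, List[str]] = defaultdict(list)
    for task in tasks:
        statuses[task.dataset_name].append(task.status)
    # min() over the precedence index picks the most "urgent" status.
    return {
        name: min(values, key=PRECEDENCE.index)
        for name, values in statuses.items()
    }


tasks = [
    TaskRow("crm", "complete"),
    TaskRow("crm", "polling"),
    TaskRow("billing", "complete"),
]
result = group_status_by_dataset(tasks)
```

Note that `PRECEDENCE.index` raises `ValueError` for a status outside the list, so a real implementation would need to cover every status the model can hold.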

Steps to Confirm

  1. Run a privacy request with a polling integration and verify the activity timeline shows correct task status indicators
  2. Confirm the verbose GET /api/v1/privacy-request?verbose=True response includes task_status_by_dataset_name with correct statuses per dataset

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

Contributor

vercel bot commented Mar 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Mar 24, 2026 4:51pm
fides-privacy-center Ignored Ignored Mar 24, 2026 4:51pm

@nreyes-dev nreyes-dev marked this pull request as ready for review March 18, 2026 17:12
@nreyes-dev nreyes-dev requested review from a team as code owners March 18, 2026 17:12
@nreyes-dev nreyes-dev requested review from adamsachs and speaker-ender and removed request for a team March 18, 2026 17:12
@greptile-apps

This comment was marked as resolved.

@nreyes-dev nreyes-dev changed the title ENG-2737: RequestTask.status refactor WIP ENG-2737: RequestTask.status refactor Mar 18, 2026
Contributor

@adamsachs adamsachs left a comment

not a major concern from the BE, and this seems like it'll provide very useful functionality so i'm good moving forward with this.

but the update here has (i think) basically exposed some longstanding flaws in this endpoint that could start to degrade performance at scale, so i think we should at least get a follow-up queued up to refactor and remove those problems 👍

PrivacyRequest.execution_and_audit_logs_by_dataset = property(
    execution_and_audit_logs_by_dataset_name
)
PrivacyRequest.task_status_by_dataset = property(task_status_by_dataset_name)
Contributor

so i think this effectively runs task_status_by_dataset_name for each privacy request record returned in the response, right? since task_status_by_dataset_name has a DB query, that's an n+1 query pattern, which is never great. it's true that the line above, with execution_and_audit_logs_by_dataset_name, already has a similar problem, so this problem has been here for a little while...we're just making it a bit worse, now it's a 2n + 1 query pattern :)

in general, overwriting the class-level property like we're doing here, e.g. PrivacyRequest.task_status_by_dataset, is strange and potentially not thread safe. it just does not seem like the right way to do this.

BUT again, none of that is new with your PR here, you're just following the existing pattern which has some (rather significant) flaws. it does exacerbate those flaws a bit. i don't think this is enough of a reason not to move forward with this, but i think we should probably make a note (and a followup ticket) to address the concerns that have been lurking here, because they will likely come back to bite us if we don't:

  1. not monkeypatch the PrivacyRequest class properties and avoid thread safety issues
  2. remove the n + 1 query pattern(s) from this endpoint

fwiw, here's the roundup from Claude as i dug into this a bit:

Is this N+1?

Yes, it is. Here's the execution flow:

  1. Line 488: paginate(query, params) executes the main query — loads N PrivacyRequest objects
  2. Lines 490-501: Loop over items, setting instance attributes (identity, resume_endpoint, etc.)
  3. Line 503: return paginated — FastAPI serializes using PrivacyRequestVerboseResponse
  4. During serialization: Pydantic reads instance.task_status_by_dataset, which triggers the property getter, which runs a DB query per instance

So for a page of N results: 1 + 2N queries (1 paginated fetch + N for execution_and_audit_logs_by_dataset + N for the new task_status_by_dataset). Classic N+1, and the PR adds a second N+1 on top of the existing one.
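The 1 + 2N count can be reproduced with a small, self-contained sketch. It uses an in-memory SQLite database with made-up table and column names (not the project's schema) and a hypothetical `run` wrapper that records every query issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE privacy_request (id TEXT PRIMARY KEY);
    CREATE TABLE execution_log (privacy_request_id TEXT, status TEXT);
    CREATE TABLE request_task (privacy_request_id TEXT, status TEXT);
    INSERT INTO privacy_request VALUES ('pr_1'), ('pr_2'), ('pr_3');
    """
)

queries = []  # every SQL string issued through run()


def run(sql, params=()):
    """Execute one query, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# 1 query: the paginated fetch of N privacy requests (N = 3 here).
page = run("SELECT id FROM privacy_request")

# 2N queries: one per request for logs, one per request for task status.
for (request_id,) in page:
    run("SELECT status FROM execution_log WHERE privacy_request_id = ?", (request_id,))
    run("SELECT status FROM request_task WHERE privacy_request_id = ?", (request_id,))

query_count = len(queries)
```

For a page of three requests this issues seven queries, matching 1 + 2N.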

On the class-level property monkey-patching mechanism

This pattern is worth scrutinizing. What's happening:

PrivacyRequest.task_status_by_dataset = property(task_status_by_dataset_name)

This mutates the class itself, not individual instances. A few concerns:

Thread safety. If two concurrent requests hit this endpoint (one with verbose=True, one without), they race on setting the class property. One could overwrite the other's assignment between the property set and serialization. This is a pre-existing issue (same pattern for execution_and_audit_logs_by_dataset), but every new property added compounds the risk.
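The shared-state part of this concern is easy to see in isolation. The sketch below uses a hypothetical `PrivacyRequestStub` class rather than the real model, and it does not reproduce a race; it only shows that assigning a property on the class changes what every existing instance returns, which is what makes a race possible.

```python
class PrivacyRequestStub:
    """Hypothetical stand-in for the PrivacyRequest model."""

    task_status_by_dataset = None  # class-level default

    def __init__(self, request_id):
        self.id = request_id


def task_status_by_dataset_name(self):
    """Placeholder getter; the real helper would query the database."""
    return {"crm": "complete"}


existing = PrivacyRequestStub("pr_1")  # created before any patching
before = existing.task_status_by_dataset

# Class-level assignment: this mutates the class, not one instance.
PrivacyRequestStub.task_status_by_dataset = property(task_status_by_dataset_name)

after = existing.task_status_by_dataset
other = PrivacyRequestStub("pr_2").task_status_by_dataset
```

Here `before` is `None`, while `after` and `other` both return the getter's value, even though neither instance was touched directly.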

Invisible data flow. The DB query that populates this field fires lazily during Pydantic serialization, far from where the "data fetching" logically lives. You can't see it by reading the route handler; you have to know that the property descriptor triggers a query. This makes the session lifecycle hard to reason about (if the session closes before serialization, you get DetachedInstanceError).

The type annotation is misleading. On the model:
task_status_by_dataset: Optional[property] = None
The runtime type is actually Optional[Dict[str, str]]; property here is the Python descriptor type, not the return type. The annotation exists only to give Pydantic/SQLAlchemy a class-level default of None that the property descriptor later overwrites.

What a cleaner approach would look like. The loop at lines 490-501 already iterates over every item. You could eagerly compute and set instance attributes there:

if filters.verbose:
    for item in paginated.items:
        item.task_status_by_dataset = task_status_by_dataset_name(item)

This would be instance-level (no class mutation), explicit (visible data fetching in the route), and wouldn't change the query count (still N queries). It would also eliminate the thread-safety issue. The same could be done for execution_and_audit_logs_by_dataset.

Or, to actually fix the N+1, both fields could be batch-computed with a single query for all privacy request IDs in the page, then assigned per-instance. But that's a bigger refactor.
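A batched version might look roughly like the sketch below: one query for every privacy request ID on the page, grouped in memory, then assigned per instance. The SQLite schema, the `batch_task_statuses` helper, and the `SimpleNamespace` items are illustrative assumptions, not the code that was eventually merged. It also assumes one task row per (request, dataset) pair; with several rows, the last one read would win.

```python
import sqlite3
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, List


def batch_task_statuses(
    conn: sqlite3.Connection, request_ids: List[str]
) -> Dict[str, Dict[str, str]]:
    """Fetch task statuses for all request IDs with a single query."""
    if not request_ids:
        return {}
    placeholders = ", ".join("?" for _ in request_ids)
    rows = conn.execute(
        "SELECT privacy_request_id, dataset_name, status FROM request_task "
        f"WHERE privacy_request_id IN ({placeholders})",
        request_ids,
    )
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for request_id, dataset_name, status in rows:
        grouped[request_id][dataset_name] = status
    return dict(grouped)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE request_task (
        privacy_request_id TEXT, dataset_name TEXT, status TEXT
    );
    INSERT INTO request_task VALUES
        ('pr_1', 'crm', 'complete'),
        ('pr_1', 'billing', 'polling'),
        ('pr_2', 'crm', 'error');
    """
)

# Stand-ins for the paginated items; pr_3 has no tasks at all.
items = [SimpleNamespace(id=i) for i in ("pr_1", "pr_2", "pr_3")]
statuses = batch_task_statuses(conn, [item.id for item in items])

# Eager, instance-level assignment: no class mutation, no lazy query.
for item in items:
    item.task_status_by_dataset = statuses.get(item.id, {})
```

The query count for this field drops from N to 1 regardless of page size, and the data fetching is visible at the point where the items are prepared.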

Bottom line: the PR correctly follows the existing pattern, so this isn't a criticism of the PR itself, but the pattern is fragile enough that it's worth flagging as tech debt, especially now that there are two properties using it.

Contributor Author

Yeah, I couldn't unsee the ugliness once I saw it, so I ended up fixing all this. Take a look and let me know what you think. I fixed both execution_and_audit_logs_by_dataset and task_status_by_dataset: now we batch into single queries and use eager instance-level assignment instead of class-level property descriptors.

@nreyes-dev nreyes-dev requested a review from adamsachs March 24, 2026 16:57
Contributor

@adamsachs adamsachs left a comment

nice, your update makes sense/looks good to me from taking a quick look through it! thanks for making these updates.

i think it should be good to go. i'd recommend a bit of manual testing if you haven't already, just because now the changes are a bit broader 👍

@nreyes-dev
Contributor Author

> nice, your update makes sense/looks good to me from taking a quick look through it! thanks for making these updates.
>
> i think it should be good to go. i'd recommend a bit of manual testing if you haven't already, just because now the changes are a bit broader 👍

I did test it manually, although my local environment doesn't have a lot of privacy requests with different task statuses etc, so... I don't know. Maybe we can also test this later in a test cloud env?

@nreyes-dev nreyes-dev added this pull request to the merge queue Mar 24, 2026
Merged via the queue into main with commit c40fcdf Mar 24, 2026
61 checks passed
@nreyes-dev nreyes-dev deleted the nreyes/eng-2737 branch March 24, 2026 17:22