Add SQLAlchemy deferred loading#7249

Merged
JadeCara merged 5 commits into main from ENG-2126-deferred-loading on Jan 26, 2026

@JadeCara JadeCara commented Jan 26, 2026

Ticket ENG-2126

Description Of Changes

🎯 Out-of-memory errors on workers when processing DSR tasks

One identified cause is that large data columns are loaded whenever tasks are queried. This PR follows the existing pattern of deferring the loading of large columns on Privacy Request objects and makes the following changes:

  • query_with_deferred_data() method for lazy loading
  • Applied to cached_erasure_results_by_collection_key()
  • Applied to cached_consent_results_by_collection_key()
  • Applied to scheduler and service layer queries
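
As an illustration of the deferred-loading pattern described above, here is a minimal sketch using SQLAlchemy's `defer()` loader option. The `RequestTaskSketch` model, its `access_data` column, and the in-memory SQLite database are hypothetical stand-ins and not the actual fides code; the real method may take different arguments and defer more columns.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, defer

Base = declarative_base()


class RequestTaskSketch(Base):
    """Hypothetical stand-in for RequestTask; not the real fides model."""

    __tablename__ = "request_task_sketch"

    id = Column(Integer, primary_key=True)
    status = Column(String, nullable=False)
    access_data = Column(Text)  # stands in for a large JSON payload column

    @classmethod
    def query_with_deferred_data(cls, db):
        # Leave the large column out of the initial SELECT; SQLAlchemy
        # loads it lazily only if the attribute is accessed later.
        return db.query(cls).options(defer(cls.access_data))


engine = create_engine("sqlite://")  # in-memory database for the sketch
Base.metadata.create_all(engine)

# Insert one row with a large payload.
with Session(engine) as db:
    db.add(RequestTaskSketch(id=1, status="complete", access_data="x" * 10_000))
    db.commit()

# Query in a fresh session so nothing is already held in memory.
with Session(engine) as db:
    task = RequestTaskSketch.query_with_deferred_data(db).filter_by(id=1).one()
    unloaded_before = set(inspect(task).unloaded)  # payload not fetched yet
    payload_length = len(task.access_data)  # first access triggers the load
    unloaded_after = set(inspect(task).unloaded)
```

Callers that only read metadata such as `status` never pay for the payload, while callers that do need it still get it on first attribute access, at the cost of one extra query per object.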

Code Changes

  • src/fides/api/models/privacy_request/privacy_request.py - Updated to use query with deferred data
  • src/fides/api/models/privacy_request/request_task.py - Added query_with_deferred_data
  • src/fides/api/service/privacy_request/request_service.py - Updated to use query with deferred data
  • src/fides/api/task/scheduler_utils.py - Updated to use query with deferred data

Steps to Confirm

  1. Start FidesPlus pointed at this endpoint
  2. Run 1 or more access requests, run 1 or more erasure requests.
  3. All tests should pass.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@JadeCara JadeCara marked this pull request as ready for review January 26, 2026 20:51
@JadeCara JadeCara requested a review from a team as a code owner January 26, 2026 20:51
@JadeCara JadeCara requested review from galvana and thabofletcher and removed request for a team January 26, 2026 20:51
greptile-apps bot commented Jan 26, 2026

Greptile Overview

Greptile Summary

This PR adds SQLAlchemy deferred loading to prevent out-of-memory (OOM) errors when processing DSR tasks. The changes implement a new query_with_deferred_data() method on RequestTask that defers loading of large JSON columns (_access_data, _data_for_erasures, collection, traversal_details) when only metadata is needed.

Key changes:

  • Added RequestTask.query_with_deferred_data() with configurable deferred loading for access and erasure data columns
  • Applied deferred loading to methods that only need task metadata: get_pending_downstream_tasks(), upstream_tasks_objects(), cached_erasure_results_by_collection_key(), cached_consent_results_by_collection_key(), and requeue_polling_tasks()
  • Refactored use_dsr_3_0_scheduler() to check for RequestTasks first (cheap SQL count) before checking cache, avoiding the expensive get_raw_access_results() call that was causing 622MB+ memory usage per call

The implementation follows established patterns from PrivacyRequest.query_with_deferred_data() and correctly applies deferred loading only where metadata queries are performed, not where actual data payloads are needed.
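
The ordering described for use_dsr_3_0_scheduler() (cheap count first, cache check second, never loading the result payloads) can be sketched as follows. Only that ordering comes from the summary above; the function name, its callable parameters, and the exact return rules are assumptions made for illustration and do not reproduce the fides implementation.

```python
from typing import Callable, List


def choose_scheduler_sketch(
    count_request_tasks: Callable[[], int],
    legacy_cache_keys_exist: Callable[[], bool],
    default_use_scheduler: bool = True,
) -> bool:
    """Hypothetical decision helper; the return rules are assumed."""
    # Fast path: a SQL COUNT returns a single integer and loads no JSON.
    if count_request_tasks() > 0:
        return True
    # Slower path: only ask whether legacy cache keys exist, without
    # deserializing the cached access results themselves.
    if legacy_cache_keys_exist():
        return False
    return default_use_scheduler


probe_calls: List[str] = []


def fake_cache_probe() -> bool:
    # Records each call so the short-circuit behaviour is observable.
    probe_calls.append("probed")
    return True


with_tasks = choose_scheduler_sketch(lambda: 3, fake_cache_probe)
calls_after_fast_path = len(probe_calls)
without_tasks = choose_scheduler_sketch(lambda: 0, fake_cache_probe)
```

When tasks already exist the cache is never touched, so the common path costs one count query regardless of how large the stored results are.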

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk, addressing critical OOM issues through deferred loading
  • Score reflects solid implementation following established patterns and addressing a critical performance issue. Minor concern about lack of test coverage for the new deferred loading functionality, but the changes are straightforward and well-documented. The refactoring of use_dsr_3_0_scheduler() is a significant optimization that prevents OOM errors on the hot path.
  • Pay close attention to src/fides/api/task/scheduler_utils.py to ensure the cache key prefix logic is correct

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/fides/api/models/privacy_request/request_task.py | Added query_with_deferred_data() method with comprehensive docstring; applied to get_pending_downstream_tasks() and upstream_tasks_objects() |
| src/fides/api/models/privacy_request/privacy_request.py | Applied deferred loading to get_raw_masking_counts() and get_consent_results() to avoid loading large JSON columns |
| src/fides/api/task/scheduler_utils.py | Refactored to check RequestTasks first (fast path), then check cache for DSR 2.0 data instead of loading access results (prevents OOM) |

@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments

Co-authored-by: Adrian Galvan <adrian@ethyca.com>
@JadeCara JadeCara added this pull request to the merge queue Jan 26, 2026
Merged via the queue into main with commit 1a30b6c Jan 26, 2026
55 checks passed
@JadeCara JadeCara deleted the ENG-2126-deferred-loading branch January 26, 2026 23:25
JadeCara added a commit that referenced this pull request Jan 29, 2026
Co-authored-by: Jade Wibbels <jade@ethyca.com>
Co-authored-by: Adrian Galvan <adrian@ethyca.com>