ENG-2293 - Add is_leaf column to StagedResource and backfill #7263

Merged
vcruces merged 22 commits into main from ENG-2293
Feb 10, 2026

@vcruces vcruces commented Jan 28, 2026

Ticket ENG-2293

Description Of Changes

Adds an is_leaf column to the StagedResource model to efficiently identify leaf fields (fields with no children and non-object data types) in detection/discovery results. This column enables faster queries by avoiding expensive runtime calculations of resource_type = 'Field' AND children = [].
The migration includes conditional logic to create the index directly for smaller tables (<1M rows) or defer it to post-upgrade index creation for larger tables to avoid blocking migrations.
A new post-upgrade backfill system is introduced to populate the is_leaf column for existing data. The backfill is designed to be:

  • Idempotent: Safe to run multiple times
  • Resumable: Can be stopped and restarted, will pick up where it left off
  • Non-blocking: Uses small batches with delays and SKIP LOCKED to minimize database impact
  • Error-resilient: Retries transient errors, tracks failures, fails gracefully

A new admin API endpoint (POST /admin/backfill) is also added for manual triggering of backfills with configurable batch size and delay parameters.
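
The idempotent and resumable properties described above can be illustrated with a minimal sketch. It is not the code from this PR: it uses an in-memory SQLite table as a stand-in for stagedresource, a simplified leaf predicate (the data-type check is omitted), and a hypothetical function name. On PostgreSQL the inner SELECT would additionally use FOR UPDATE SKIP LOCKED, which SQLite does not support.

```python
import sqlite3
import time


def backfill_is_leaf_batched(conn, batch_size=2, batch_delay_seconds=0.0):
    """Fill NULL is_leaf values in small batches; return total rows updated."""
    total = 0
    while True:
        # Only rows that are still NULL are selected, so a re-run after an
        # interruption resumes where the last run stopped, and a re-run
        # after completion updates nothing.
        cur = conn.execute(
            """
            UPDATE stagedresource
               SET is_leaf = (resource_type = 'Field' AND children = '[]')
             WHERE id IN (SELECT id FROM stagedresource
                           WHERE is_leaf IS NULL
                           ORDER BY id
                           LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        time.sleep(batch_delay_seconds)  # pause between batches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stagedresource ("
    "id INTEGER PRIMARY KEY, resource_type TEXT, children TEXT, is_leaf BOOLEAN)"
)
conn.executemany(
    "INSERT INTO stagedresource (resource_type, children) VALUES (?, ?)",
    [("Table", '["a"]'), ("Field", "[]"), ("Field", '["x"]'),
     ("Field", "[]"), ("Schema", '["t"]')],
)

first_run = backfill_is_leaf_batched(conn)   # processes all pending rows
second_run = backfill_is_leaf_batched(conn)  # nothing left to do
leaf_ids = [row[0] for row in conn.execute(
    "SELECT id FROM stagedresource WHERE is_leaf = 1 ORDER BY id")]
```

Because the loop's only state is "which rows are still NULL", stopping it at any point loses no progress.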

Code Changes

  • Added is_leaf column to StagedResource model with composite index on (monitor_config_id, is_leaf)
  • Added Alembic migration d05acec55c64 to add the column with conditional index creation
  • Added post_upgrade_backfill.py module with batched backfill infrastructure including Redis locking, retry logic, and progress tracking
  • Added POST /admin/backfill endpoint with BACKFILL_EXEC scope for manual backfill triggering
  • Added entry to post_upgrade_index_creation.py for deferred index creation on large tables
  • Added unit tests for backfill logic and admin endpoint

Steps to Confirm

  • Run migrations and verify is_leaf column is added to stagedresource table
  • Verify existing staged resources with is_leaf = NULL get backfilled on application startup
  • Test manual backfill via POST /api/v1/admin/backfill endpoint (requires backfill:exec scope)
  • Verify backfill correctly sets is_leaf = true for fields with no children and non-object data types
  • Verify backfill correctly sets is_leaf = false for databases, schemas, tables, fields with children, or object-type fields
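
For the manual trigger step, a request could be assembled roughly as follows. This is a sketch under assumptions: the PR text only says the endpoint accepts configurable batch size and delay, so the JSON body shape, the parameter names batch_size and batch_delay_seconds, the bearer-token header, and the base URL are all illustrative guesses rather than the documented API.

```python
import json
import urllib.request


def build_backfill_request(base_url, token, batch_size=5000, batch_delay_seconds=1.0):
    """Build (but do not send) a POST request for the admin backfill endpoint."""
    # Assumed payload shape; the real endpoint may use query parameters instead.
    payload = {"batch_size": batch_size, "batch_delay_seconds": batch_delay_seconds}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/admin/backfill",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # The token is assumed to carry the backfill:exec scope.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_backfill_request("http://localhost:8080", "example-token")
# urllib.request.urlopen(request) would actually send it.
```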

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vcruces vcruces force-pushed the ENG-2293 branch 2 times, most recently from 964a8b0 to 46c5634 Compare January 28, 2026 20:36
@vcruces vcruces marked this pull request as ready for review January 28, 2026 22:10
@vcruces vcruces requested a review from a team as a code owner January 28, 2026 22:10
@vcruces vcruces requested review from adamsachs and thabofletcher and removed request for a team and thabofletcher January 28, 2026 22:10
@vcruces vcruces added the do not merge label Jan 28, 2026

greptile-apps bot commented Jan 28, 2026

Greptile Overview

Greptile Summary

This PR adds an is_leaf column to the StagedResource model to efficiently identify leaf fields in detection/discovery results, replacing expensive runtime calculations. The implementation includes a sophisticated post-upgrade backfill system designed for production safety.

Key Changes:

  • Database Schema: Adds nullable is_leaf boolean column to stagedresource table with a partial index ix_stagedresource_monitor_leaf_status_urn on (monitor_config_id, is_leaf, diff_status, urn) WHERE is_leaf IS NOT NULL
  • Migration Strategy: Conditional index creation based on table size (<1M rows creates index immediately, >=1M defers to post-upgrade), avoiding blocking migrations on large tables
  • Backfill Infrastructure: New reusable backfill system with Redis locking, retry with exponential backoff, batch processing with delays (default 5000 rows/batch, 1s delay), progress tracking, and backfill_history table for idempotency
  • API: New POST /admin/backfill endpoint with BACKFILL_EXEC scope for manual triggering, GET /admin/backfill for status monitoring
  • Semantic Design: null indicates "not applicable" for non-datastore monitors (not just temporary backfill state), allowing partial index optimization
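
The size-based migration strategy can be sketched as a small decision function. The index name, columns, WHERE clause, and 1M-row threshold come from the summary above; the function names and the use of CREATE INDEX CONCURRENTLY for the deferred path are assumptions made for illustration, not confirmed details of the migration.

```python
INDEX_NAME = "ix_stagedresource_monitor_leaf_status_urn"
ROW_THRESHOLD = 1_000_000  # tables below this get the index inside the migration


def index_ddl(concurrently: bool) -> str:
    """Return the partial-index DDL, optionally built without blocking writes."""
    keyword = "CREATE INDEX CONCURRENTLY" if concurrently else "CREATE INDEX"
    return (
        f"{keyword} IF NOT EXISTS {INDEX_NAME} "
        "ON stagedresource (monitor_config_id, is_leaf, diff_status, urn) "
        "WHERE is_leaf IS NOT NULL"
    )


def plan_index_creation(row_count: int) -> tuple[str, str]:
    """Decide where the index is created: in the migration or after upgrade."""
    if row_count < ROW_THRESHOLD:
        return ("migration", index_ddl(concurrently=False))
    # Assumption: the deferred path builds the index concurrently.
    return ("post_upgrade", index_ddl(concurrently=True))


small_table_plan = plan_index_creation(250_000)
large_table_plan = plan_index_creation(3_000_000)
```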

Architecture Highlights:

  • Batched updates with SKIP LOCKED to minimize lock contention
  • Automatic startup backfill via APScheduler with replace_existing=True preventing duplicate jobs
  • Proper lock release in finally blocks throughout
  • Comprehensive test coverage including edge cases for retry logic, consecutive failures, and lock management
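
As an illustration of the retry behaviour mentioned in the summary, the following sketch retries a batch operation with exponentially growing delays. TransientError, run_with_retry, and the default values are hypothetical names and numbers chosen for the example; the PR's actual retry helper may differ.

```python
import time


class TransientError(Exception):
    """Stand-in for a recoverable database error (e.g. a deadlock)."""


def run_with_retry(operation, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError wait base_delay * 2**attempt and retry."""
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the failure
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Demo: the operation fails twice, then succeeds. Delays are recorded
# instead of slept so the example runs instantly.
calls = {"n": 0}
delays = []


def flaky_batch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated deadlock")
    return 42


result = run_with_retry(flaky_batch, sleep=delays.append)
```

Injecting the sleep function keeps the helper easy to unit test, which fits the retry edge cases the summary says are covered.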

Backfill Logic: Sets is_leaf=True for Field resources with no children AND non-object data types; is_leaf=False for all other datastore resources (Database, Schema, Table, Fields with children/object types); null for non-datastore monitor types.
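
The rule in the paragraph above can be written as a small pure function. This is a sketch of the stated semantics only; compute_is_leaf and its parameters are hypothetical, and how the real code decides that a field has an object data type is not shown in this PR page, so it is passed in as a boolean.

```python
from typing import Optional, Sequence


def compute_is_leaf(
    resource_type: str,
    children: Optional[Sequence[str]],
    is_object_type: bool,
    is_datastore_monitor: bool = True,
) -> Optional[bool]:
    """Return True/False for datastore resources, None when not applicable."""
    if not is_datastore_monitor:
        return None  # "not applicable", kept out of the partial index
    # A leaf is a Field with no children whose data type is not an object.
    return resource_type == "Field" and not children and not is_object_type


examples = {
    "plain field": compute_is_leaf("Field", [], False),
    "field with children": compute_is_leaf("Field", ["db.t.f.sub"], False),
    "object field": compute_is_leaf("Field", [], True),
    "table": compute_is_leaf("Table", ["db.t.f"], False),
    "non-datastore": compute_is_leaf("Field", [], False, is_datastore_monitor=False),
}
```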

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-architected with comprehensive error handling, proper locking mechanisms, extensive test coverage, and follows database migration best practices. The backfill system is idempotent, resumable, and designed to minimize production impact with batching and delays. One previous comment about the placeholder down_revision has been addressed - it now properly references the actual revision a1b2c3d4e5f7.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/fides/api/alembic/migrations/versions/xx_2026_02_03_2029_841e0b148993_add_backfill_history.py | Adds backfill_history table with backfill_name primary key and completed_at timestamp to track completed backfills |
| src/fides/api/alembic/migrations/versions/xx_2026_02_03_2033_81d2400b16ab_add_is_leaf_to_stagedresource.py | Adds nullable is_leaf boolean column to stagedresource table with conditional index creation based on table size (<1M rows), includes proper downgrade logic that clears backfill tracking |
| src/fides/api/migrations/post_upgrade_backfill.py | Core backfill orchestration module with Redis locking, scheduled task initialization with replace_existing=True (line 154), and proper lock management in try-finally blocks |
| src/fides/api/migrations/backfill_scripts/backfill_stagedresource_is_leaf.py | Implements is_leaf backfill logic using batched decorator, correctly setting is_leaf=True for Fields with no children and non-object data types using SKIP LOCKED |
| src/fides/api/migrations/backfill_scripts/utils.py | Robust backfill infrastructure with retry logic, exponential backoff, Redis locking, progress tracking, and @batched_backfill decorator for resumable operations |
| src/fides/api/api/v1/endpoints/admin.py | Adds POST and GET /admin/backfill endpoints with BACKFILL_EXEC scope, proper lock acquisition, background task execution, and status reporting |
| tests/ctl/api/test_admin.py | Comprehensive tests for backfill endpoints covering lock acquisition, conflict handling, parameter validation, and status reporting, but mock lock could be improved |

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 2 comments

@vcruces vcruces force-pushed the ENG-2293 branch 4 times, most recently from 35940be to b52ea74 Compare February 2, 2026 16:01
@adamsachs adamsachs left a comment

nice work! this generally looks good to me, and thank you for establishing a general framework for these 'backfill' scripts :)

i don't see anything problematic with the particular backfill script here, and the framework generally seems functionally solid: i like the API hooks for visibility/manual intervention, and the batching + retry logic, etc.

my most substantive question is around how we imagine future backfill scripts slotting in: what's the lifecycle of the scripts, and do they get deleted from the codebase?

other than that, i called out a few minor points. let me know what you think!

raise


def backfill_is_leaf(
Contributor

i like what you've done to abstract away the particular backfill task from a generic backfill framework. i think one thing that could help formalize that framework a bit would be to pull out the particular backfill task (and its sub-functions) into its own module/file, separate from the generic 'backfill' framework module you've got here. what do you think?

Contributor Author

I like that idea! I’ll move it to another file

Contributor Author

I split the original file into 3 files (added a utils module), and moved shared logic into a decorator for easier reuse across backfills. I also added a README explaining how to use it -> a740f86

Comment on lines +350 to +354
# Backfill is_leaf column (added in migration d05acec55c64)
results.append(backfill_is_leaf(db, batch_size, batch_delay_seconds))

# Add future backfills here:
# results.append(backfill_some_other_column(db, batch_size, batch_delay_seconds))
Contributor

OK nice, this seems pretty easy to slot in more backfill scripts as needed!

generally, do you imagine we'll be removing backfill_is_leaf after a few releases, when we're sure that all of our clients have completed it? or do we just leave it in there in perpetuity, and rely on it basically being skipped over once it's been completed?

i do wonder whether it's worth establishing a very lightweight db table to keep track of the backfills, just so we can definitively say 'backfill x has been completed on this fides instance', basically to persist the BackfillResults. i get a bit worried about us relying on the logical check (get_pending_is_leaf_count): if we ever change some of the application logic and get_pending_is_leaf_count is no longer valid, we have to remember to update this backfill script, which isn't really part of the 'normal' application code...

i don't know the right answer here, just posing the question to make sure we get this right and don't create too much of a maintenance burden going forward.

Contributor Author

@vcruces vcruces Feb 3, 2026

I was initially imagining that we’d remove these backfills as we’re able to, and in this specific case, create a follow-up ticket to remove the script once we’re confident all clients are covered. As you said, as long as the logic doesn’t change, having the script around doesn’t really hurt, but at some point it probably should be disabled or removed.

I do like the idea of tracking backfill completion in the database. It’s definitely safer and avoids relying on application logic that could change over time. That said, I also wonder about the long-term maintenance of an extra table like that (and the risk of it becoming something we forget about).

I’m going to try implementing it and see how much effort it actually is. I don’t expect it to take too long (Cursor helping 😄). I’ll report back with what I find and we can decide from there.

One additional consideration: if we ever roll back a column related to a backfill, we’d also need to remove the corresponding row from the new table.

Contributor Author

Implemented the database table solution in this commit -> 1096ee8
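
The completion-tracking idea discussed in this thread can be sketched as follows. The table name backfill_history and its backfill_name / completed_at columns come from the file overview earlier in the PR; the helper names, the backfill name string, and the use of SQLite are assumptions for a self-contained example. The last helper reflects the rollback consideration raised above: a downgrade that drops the column should also clear the tracking row.

```python
import sqlite3
from datetime import datetime, timezone

BACKFILL_NAME = "stagedresource-is_leaf"  # hypothetical identifier


def is_backfill_completed(conn, name):
    row = conn.execute(
        "SELECT 1 FROM backfill_history WHERE backfill_name = ?", (name,)
    ).fetchone()
    return row is not None


def mark_backfill_completed(conn, name):
    # INSERT OR IGNORE keeps the call idempotent if it runs twice.
    conn.execute(
        "INSERT OR IGNORE INTO backfill_history (backfill_name, completed_at) "
        "VALUES (?, ?)",
        (name, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def clear_backfill_record(conn, name):
    """Called on downgrade so a later upgrade re-runs the backfill."""
    conn.execute("DELETE FROM backfill_history WHERE backfill_name = ?", (name,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE backfill_history ("
    "backfill_name TEXT PRIMARY KEY, completed_at TEXT NOT NULL)"
)

before = is_backfill_completed(conn, BACKFILL_NAME)
mark_backfill_completed(conn, BACKFILL_NAME)
mark_backfill_completed(conn, BACKFILL_NAME)  # second call is a no-op
after = is_backfill_completed(conn, BACKFILL_NAME)
clear_backfill_record(conn, BACKFILL_NAME)
after_downgrade = is_backfill_completed(conn, BACKFILL_NAME)
```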

BACKFILL_LOCK_TTL = 300 # 5 minutes TTL, refreshed every 10 batches


def acquire_backfill_lock() -> bool:
Contributor

is there a reason we're not using our redis_lock constructs in lock.py? or at least the native redis.lock?

Contributor Author

It initially felt more practical not to use redis_lock since I didn’t want to pass the lock around as a parameter or define it as a global. That said, using redis.lock does give us lock.owned(), which is an advantage.

I’ve updated it to use redis.lock in this commit -> 04772a9
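
A rough sketch of the acquire / owned / release pattern discussed here is shown below. To stay runnable without a Redis server it uses a hypothetical InMemoryLock stand-in; the method names acquire(blocking=...), owned(), and release() mirror those on redis-py's Lock object, but the stand-in does not model tokens or TTL expiry, and run_exclusively is an illustrative helper, not the PR's code.

```python
class InMemoryLock:
    """Stand-in with the same method names as redis-py's Lock (no TTL, no token)."""

    def __init__(self):
        self._held = False

    def acquire(self, blocking=True):
        if self._held:
            return False
        self._held = True
        return True

    def owned(self):
        return self._held

    def release(self):
        self._held = False


def run_exclusively(lock, task):
    """Run task() only if the lock is free; always release in finally."""
    if not lock.acquire(blocking=False):
        return None  # another worker is already running the backfill
    try:
        return task()
    finally:
        # With a real TTL-based lock, owned() guards against releasing a
        # lock that expired and was taken over by someone else.
        if lock.owned():
            lock.release()


lock = InMemoryLock()
result = run_exclusively(lock, lambda: "done")
released_after_success = not lock.owned()

lock.acquire()
skipped = run_exclusively(lock, lambda: "should not run")
lock.release()
```

With a real client, the lock object would come from something like redis.Redis().lock(name, timeout=300), matching the 5-minute TTL shown in the snippet under review.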

@vcruces vcruces force-pushed the ENG-2293 branch 3 times, most recently from 6b1fd22 to 3fd4eda Compare February 5, 2026 14:06

vcruces commented Feb 6, 2026

@greptileai

@greptile-apps greptile-apps bot left a comment

8 files reviewed, 3 comments

@vcruces vcruces force-pushed the ENG-2293 branch 4 times, most recently from 2ac7943 to a980566 Compare February 6, 2026 22:25

vcruces commented Feb 6, 2026

@greptileai

@greptile-apps greptile-apps bot left a comment

7 files reviewed, no comments

@adamsachs adamsachs left a comment

great work here! thanks for the care you took in addressing all of my feedback. the batched_backfill decorator is really clean, that was a great improvement to help formalize this as a framework we can build on! 👌

raise


@batched_backfill(
Contributor

very nice to have this as a decorator 👏
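
To give a sense of what a decorator like this might look like, here is a minimal sketch. The real @batched_backfill signature is not visible on this page, so the parameters (name, batch_delay_seconds), the returned dictionary, and the convention that the wrapped function processes one batch and returns the number of rows it updated are all assumptions. Locking, retries, and history tracking are left out.

```python
import functools
import time


def batched_backfill(name, batch_delay_seconds=0.0):
    """Turn a 'process one batch' function into a loop that runs until done."""

    def decorator(process_batch):
        @functools.wraps(process_batch)
        def runner(*args, **kwargs):
            total = 0
            while True:
                updated = process_batch(*args, **kwargs)
                if updated == 0:  # nothing pending: the backfill is complete
                    return {"name": name, "total_updated": total}
                total += updated
                time.sleep(batch_delay_seconds)

        return runner

    return decorator


@batched_backfill(name="is_leaf")
def fill_missing_flags(rows, batch_size=2):
    """Process up to batch_size rows whose is_leaf is still None."""
    pending = [r for r in rows if r["is_leaf"] is None][:batch_size]
    for r in pending:
        r["is_leaf"] = r["resource_type"] == "Field" and not r["children"]
    return len(pending)


rows = [
    {"resource_type": "Table", "children": ["c"], "is_leaf": None},
    {"resource_type": "Field", "children": [], "is_leaf": None},
    {"resource_type": "Field", "children": ["n"], "is_leaf": None},
]
result = fill_missing_flags(rows)
```

Each new backfill then only has to supply its own per-batch function, which is the reuse benefit noted in the review.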

@vcruces vcruces added this pull request to the merge queue Feb 10, 2026
@vcruces vcruces removed the do not merge label Feb 10, 2026
Merged via the queue into main with commit 3b22f14 Feb 10, 2026
54 of 55 checks passed
@vcruces vcruces deleted the ENG-2293 branch February 10, 2026 15:59

2 participants