Traversal optimizations by galvana · Pull Request #7244 · ethyca/fides

galvana · 2026-01-17T01:16:46Z

Description Of Changes

This PR significantly improves the performance of graph traversal operations by replacing O(N²) algorithms with O(N+E) alternatives.

Performance Improvements

| Optimization | Before | After | Improvement |
| --- | --- | --- | --- |
| `traverse()` in traversal.py | O(N²) | O(N+E) | ~100x faster at 3000 nodes |
| `compute_all_descendants()` | O(N²) | O(N+E) | ~100x faster at 3000 nodes |
| Dataset-level `after` deps | O(N²) | O(N) | Linear lookup |

Key Changes:

- traverse() in traversal.py: O(N²) → O(N+E)
  - The legacy version used MatchingQueue.pop_first_match(), which scans the queue on each iteration
  - The new version uses Kahn's algorithm with in-degree tracking
- compute_all_descendants() in create_request_tasks.py: O(N²) → O(N+E)
  - Previously called networkx.descendants() for each node in a loop
  - Now pre-computes all descendants in a single reverse topological pass
- Dataset-level after dependencies: O(N²) → O(N)
  - Previously scanned all nodes to find the collections in a dataset
  - Now uses a pre-built _collections_by_dataset index for O(1) lookups
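
For readers unfamiliar with Kahn's algorithm, here is a minimal, generic sketch of the in-degree technique named above. It is not the code from traversal.py: the function name `kahn_order` and the plain string nodes are illustrative assumptions, and the real traversal also has to handle `after` dependencies and reachability checks.

```python
from collections import deque
from typing import Dict, List, Tuple


def kahn_order(nodes: List[str], edges: List[Tuple[str, str]]) -> List[str]:
    """Return the nodes in dependency order (hypothetical helper, not fides code)."""
    successors: Dict[str, List[str]] = {node: [] for node in nodes}
    in_degree: Dict[str, int] = {node: 0 for node in nodes}
    for upstream, downstream in edges:
        successors[upstream].append(downstream)
        in_degree[downstream] += 1

    # Nodes with no unmet dependencies can run immediately.
    ready = deque(node for node in nodes if in_degree[node] == 0)
    order: List[str] = []
    while ready:
        current_node = ready.popleft()
        order.append(current_node)
        # Each edge is examined exactly once, which gives O(N+E) overall.
        for next_node in successors[current_node]:
            in_degree[next_node] -= 1
            if in_degree[next_node] == 0:
                ready.append(next_node)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; not every node could be ordered")
    return order


print(kahn_order(
    ["root", "a", "b", "c"],
    [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c")],
))
```

The key difference from a queue scan is that a node is enqueued at the moment its last dependency completes, so no pass over the pending nodes is needed.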

Performance comparison

| Collections | Legacy O(N²) | Optimized O(N+E) | Speedup |
| --- | --- | --- | --- |
| 100 | ~12 ms | ~1 ms | 12x |
| 1,000 | ~1.1 s | ~7 ms | 161x |
| 5,000 | ~27 s | ~30 ms | 900x |
| 10,000 | ~2.2 min | ~80 ms | 1,680x |
| 25,000 | ~14 min* | ~500 ms | 1,688x |
| 50,000 | ~57 min* | ~1 s | 3,415x |

* Extrapolated from O(N²) growth curve
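
The starred figures can be reproduced approximately with the quadratic scaling rule t(N) ≈ t(N₀) · (N / N₀)². The sketch below assumes the 10,000-collection measurement (about 2.2 min) is the baseline; the PR does not say which data point the extrapolation actually used.

```python
def extrapolate_quadratic(base_seconds: float, base_size: int, target_size: int) -> float:
    """Scale a measured runtime to a larger input, assuming O(N^2) growth."""
    return base_seconds * (target_size / base_size) ** 2


# Assumed baseline: ~2.2 min measured at 10,000 collections.
baseline_seconds = 2.2 * 60
for size in (25_000, 50_000):
    minutes = extrapolate_quadratic(baseline_seconds, 10_000, size) / 60
    print(f"{size} collections: ~{minutes:.0f} min")
```

Under that assumption the estimates come out at roughly 14 and 55 minutes, in line with the table's ~14 min and ~57 min.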

Code Changes

- Added optimized traverse() method using Kahn's algorithm with in-degree tracking
- Added compute_all_descendants() function for O(N+E) descendant computation
- Pre-indexed edges by node address (edges_by_node) for O(1) edge lookups
- Pre-built dataset-to-collections index (_collections_by_dataset) for O(1) lookups
- Added skip_verification parameter to BaseTraversal to avoid redundant traversal during reachability checks
- Retained _traverse_legacy() method for test comparison purposes
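
The single reverse topological pass for descendants can be sketched as follows. This is a stdlib-only illustration of the idea, not the code in create_request_tasks.py; the name `all_descendants` and the adjacency-dict input are assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def all_descendants(successors: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map every node to the set of nodes reachable from it.

    `successors` must list every node as a key (leaf nodes map to an empty list).
    """
    # Step 1: topological order via in-degree counting.
    in_degree = {node: 0 for node in successors}
    for children in successors.values():
        for child in children:
            in_degree[child] += 1
    ready = deque(node for node, degree in in_degree.items() if degree == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in successors[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)
    if len(order) != len(successors):
        raise ValueError("graph contains a cycle")

    # Step 2: walk the order backwards so every child is finished
    # before any of its parents, then merge the child sets upward.
    descendants: Dict[str, Set[str]] = {node: set() for node in successors}
    for node in reversed(order):
        for child in successors[node]:
            descendants[node].add(child)
            descendants[node] |= descendants[child]
    return descendants


graph = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(all_descendants(graph))
```

Each node and edge is visited once, but the set unions and the stored result grow with the size of the descendant sets, which in a dense graph can approach N² entries in total. That is worth keeping in mind when weighing the memory footprint of very large task graphs.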

Tests

New test file test_traversal_optimization_comparison.py validates that the optimized algorithm produces identical results to the legacy implementation:

| Test Class | Purpose |
| --- | --- |
| `TestTraversalComparison` | Validates equivalence on deterministic graph topologies (linear chains, star graphs, diamond graphs, graphs with `after` dependencies) |
| `TestRandomGraphEquivalence` | Validates equivalence on randomly generated graphs with varying densities and sizes |
| `TestTraversalErrorEquivalence` | Ensures both algorithms raise the same errors for invalid graphs (disconnected nodes, unreachable collections) |
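
As a rough illustration of the random-graph comparison idea, the sketch below generates seeded random DAGs and checks a quadratic scan-based ordering against an in-degree ordering. All names here (`random_dag`, `scan_order`, `kahn_order`, `is_valid_order`) are hypothetical and unrelated to the actual test file; the sketch only checks that both orders are valid, whereas the real tests compare full traversal results.

```python
import random
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def random_dag(node_count: int, edge_probability: float, seed: int) -> Tuple[List[int], List[Edge]]:
    """Build a random DAG; edges only point from lower to higher ids, so no cycles."""
    rng = random.Random(seed)
    nodes = list(range(node_count))
    edges = [(u, v) for u in nodes for v in nodes[u + 1:] if rng.random() < edge_probability]
    return nodes, edges


def scan_order(nodes: List[int], edges: List[Edge]) -> List[int]:
    """Quadratic baseline: rescan the pending list for the first runnable node."""
    parents: Dict[int, Set[int]] = {node: set() for node in nodes}
    for upstream, downstream in edges:
        parents[downstream].add(upstream)
    pending, done, order = list(nodes), set(), []
    while pending:
        match = next(node for node in pending if parents[node] <= done)
        pending.remove(match)
        done.add(match)
        order.append(match)
    return order


def kahn_order(nodes: List[int], edges: List[Edge]) -> List[int]:
    """Linear-time ordering using in-degree counters."""
    successors: Dict[int, List[int]] = {node: [] for node in nodes}
    in_degree = {node: 0 for node in nodes}
    for upstream, downstream in edges:
        successors[upstream].append(downstream)
        in_degree[downstream] += 1
    ready = deque(node for node in nodes if in_degree[node] == 0)
    order: List[int] = []
    while ready:
        current_node = ready.popleft()
        order.append(current_node)
        for next_node in successors[current_node]:
            in_degree[next_node] -= 1
            if in_degree[next_node] == 0:
                ready.append(next_node)
    return order


def is_valid_order(order: List[int], nodes: List[int], edges: List[Edge]) -> bool:
    """True when every node appears once and every edge points forward in the order."""
    position = {node: index for index, node in enumerate(order)}
    return sorted(order) == sorted(nodes) and all(position[u] < position[v] for u, v in edges)


for seed in range(5):
    nodes, edges = random_dag(node_count=30, edge_probability=0.2, seed=seed)
    assert is_valid_order(scan_order(nodes, edges), nodes, edges)
    assert is_valid_order(kahn_order(nodes, edges), nodes, edges)
```

Fixing the seed keeps any failure reproducible, which matters when a random topology exposes a difference between the two implementations.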

Steps to Confirm

Existing tests should pass.

greptile-apps · 2026-01-24T07:05:43Z

Greptile Overview

Greptile Summary

This PR implements significant performance optimizations for graph traversal operations, achieving ~100x speedup at 3000 nodes by replacing O(N²) algorithms with O(N+E) alternatives.

Key optimizations:

- traverse() method: Replaced MatchingQueue.pop_first_match() scanning with Kahn's algorithm variant using in-degree tracking for O(N+E) complexity
- compute_all_descendants(): Added function to pre-compute all descendants in single reverse topological pass instead of calling networkx.descendants() per node
- Dataset indexing: Pre-built _collections_by_dataset index for O(1) dataset-to-collections lookups
- Edge indexing: Pre-indexed edges by node address (edges_by_node) for O(1) edge lookups
- Verification optimization: Added skip_verification parameter to avoid redundant traversal during reachability checks
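
The indexing optimizations follow a common pattern: group items by key once, then answer each lookup from a dictionary instead of scanning every node. A minimal sketch, assuming a simplified `CollectionAddress` stand-in with `dataset` and `collection` fields (the helper name `index_by_dataset` is hypothetical):

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class CollectionAddress(NamedTuple):
    """Simplified stand-in for a dataset/collection address; not the fides class."""
    dataset: str
    collection: str


def index_by_dataset(addresses: List[CollectionAddress]) -> Dict[str, List[CollectionAddress]]:
    """Group addresses by dataset in one O(N) pass."""
    index: Dict[str, List[CollectionAddress]] = defaultdict(list)
    for address in addresses:
        index[address.dataset].append(address)
    return dict(index)


addresses = [
    CollectionAddress("postgres", "users"),
    CollectionAddress("mongo", "events"),
    CollectionAddress("postgres", "orders"),
]
collections_by_dataset = index_by_dataset(addresses)
# Each lookup is now a dictionary access rather than a scan over all nodes.
print(collections_by_dataset["postgres"])
```

The same grouping approach would apply to an edge index keyed by node address.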

Code quality:

- Comprehensive test suite validates equivalence between optimized and legacy implementations across multiple graph topologies
- Legacy _traverse_legacy() method retained for reference and comparison
- Clear documentation of algorithmic approach and complexity analysis

Confidence Score: 4/5

This PR is safe to merge with minimal risk - well-tested performance optimization
Score reflects thorough testing approach and algorithmic correctness. The optimized implementation is validated against the legacy implementation across multiple test scenarios. The only minor style issue is a single-character variable name in the legacy code (which is intentionally kept for reference). The changes are well-documented with clear complexity analysis.
No files require special attention - the implementation is solid and well-tested

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/fides/api/graph/traversal.py | Replaced O(N²) traversal algorithm with O(N+E) Kahn's algorithm variant using in-degree tracking; added pre-indexing optimizations for edges and dataset collections; retained legacy implementation for comparison |
| src/fides/api/task/create_request_tasks.py | Added `compute_all_descendants()` function for O(N+E) descendant computation; updated task creation functions to pre-compute descendants once instead of calling networkx.descendants per node |
| src/fides/api/graph/node_filters.py | Updated `OptionalIdentityFilter` to use `skip_verification` parameter and explicit `traverse()` call to avoid redundant verification overhead |
| tests/ops/graph/test_traversal_optimization_comparison.py | Comprehensive test suite comparing optimized and legacy traversal algorithms across deterministic topologies, random graphs, and error cases |

greptile-apps

1 file reviewed, 1 comment

greptile-apps · 2026-01-24T07:05:51Z

Additional Comments (1)

src/fides/api/graph/traversal.py
Single-character variable name `n`: use a descriptive name such as `current_node`.

Context used: dashboard rule "Use full names for variables, not 1 to 2 characters"

adamsachs

nice work! this looks great, and it looks like we can get some very serious optimizations for very little (if any) trade-off. the main thing to look out for, i think, would be a significant increase in space complexity, i.e. is there any chance we'd significantly increase our memory footprint with massive task graphs?

i left a few relatively minor comments along the way that'd be nice to address, mainly re: code cleanup. there does still seem to be a test failing that's due to a change here, so we'll want to address that before merging.

sorry it took me so long to get to this - i wanted to give it the time it deserved!

adamsachs · 2026-02-10T03:02:23Z

src/fides/api/graph/traversal.py

+        take action on completed traversal.
+        """
+        total_nodes = len(self.traversal_node_dict)
+        logger.debug("Starting traversal of {} nodes", total_nodes)


OK, i think this all makes sense - i definitely understand the algorithm after staring at the code for a little while, though i can't say i've fully thought through all the edge cases...the test coverage does make me feel better! it's a nice algorithm :)

to help make the traversal logic a bit more tractable - have you considered breaking it up into some helper functions? it just seems like we'll want to be able to take a look at the function and spend ~a minute to understand the high-level steps/procedure of the algorithm. as it's written, that's pretty hard to do - the closest i can get is just reading the comments. it'd be much easier to reason about, i think, if some of these sub-operations/steps were just broken out into a helper so that you don't have the cognitive overload as you look through this main function.

i know the 'legacy' version of the function didn't break anything up, but that doesn't seem like something we want to emulate - i don't think it's ever good to be disabling the lint check for too many locals...is there a reason we want to keep the traversal function so monolithic here if we're completely overhauling it?

galvana · 2026-02-10

I think I'll leave this for a follow-up ticket. If we can get this in for the upcoming release it would be a huge win! We can clean it up later.

adamsachs · 2026-02-10

k, sounds good!

somewhat relatedly...if we're going to keep the legacy traverse function in the code base for the time being, i wonder if it's worth just keeping it as a fallback to be used in actual execution -- hidden behind a config. this would only be temporary, for a few releases. just to derisk if there is any regression with the new algorithm, it'd be easy to switch back without requiring a patch release.

thoughts? maybe it's overkill, if you're fully confident in the new approach!

galvana · 2026-02-11

It doesn't hurt to add a setting. I want to use ConfigProxy but I know there's a weird bug there that we haven't been able to reproduce. I'll just use the regular FidesConfig.
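
A config-gated fallback of the kind discussed in this thread could look roughly like the sketch below. The `TraversalSettings` class and the `use_legacy_traversal` flag are hypothetical placeholders; the actual setting name and how it is wired into FidesConfig are not shown on this page.

```python
from dataclasses import dataclass
from typing import Callable, List

TraversalFn = Callable[[], List[str]]


@dataclass(frozen=True)
class TraversalSettings:
    # Hypothetical flag name; defaults to the optimized algorithm.
    use_legacy_traversal: bool = False


def choose_traversal(
    settings: TraversalSettings, optimized: TraversalFn, legacy: TraversalFn
) -> TraversalFn:
    """Return the legacy implementation only when the flag is set."""
    return legacy if settings.use_legacy_traversal else optimized


def optimized_traverse() -> List[str]:
    return ["optimized"]


def legacy_traverse() -> List[str]:
    return ["legacy"]


traverse = choose_traversal(TraversalSettings(), optimized_traverse, legacy_traverse)
print(traverse())
```

Keeping the choice in one function means a regression can be mitigated by flipping the setting, and the fallback can be deleted in one place once the new algorithm has proven itself.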

adamsachs · 2026-02-10T03:05:49Z

tests/ops/graph/test_traversal_optimization_comparison.py

+    return True, "Results are equivalent"
+
+
+class TestTraversalComparison:


nit: probably able to dedupe these tests with a single parameterized test?

Traversal optimizations

364ae4b

galvana added 3 commits January 16, 2026 17:40

N² to N

11c4ded

Adding tests

e93cd7a

Cleaning up tests

f02bbbe

galvana marked this pull request as ready for review January 24, 2026 07:03

galvana requested a review from a team as a code owner January 24, 2026 07:03

galvana requested review from adamsachs and removed request for a team January 24, 2026 07:03

Adding change log

a658673

greptile-apps bot reviewed Jan 24, 2026

galvana added the "run unsafe ci checks" label (runs fides-related CI checks that require sensitive credentials) Jan 24, 2026

Merge branch 'main' into traversal-optimizations

9a1c013

Merge branch 'main' into traversal-optimizations

2fa342e

Merge branch 'main' into traversal-optimizations

98e4b24

Merge branch 'main' into traversal-optimizations

d7ea344

adamsachs approved these changes Feb 10, 2026

Merge branch 'main' into traversal-optimizations

35d6088

Adrian Galvan and others added 2 commits February 10, 2026 16:36

Adding setting to rollback to legacy traversal

f53bb93

Merge branch 'main' into traversal-optimizations

9a88bb7

galvana enabled auto-merge February 11, 2026 00:41

galvana added this pull request to the merge queue Feb 11, 2026

Merged via the queue into main with commit 868acf5 Feb 11, 2026
52 of 55 checks passed

galvana deleted the traversal-optimizations branch February 11, 2026 01:35

galvana merged 12 commits into main from traversal-optimizations
