Conversation
…te loop When update_existing_connection_configs_for_connector_type processes multiple connection configs of the same type, each call to upsert_dataset_config_from_template was reading and then writing SaasTemplateDataset mid-loop. The first iteration's write would set the stored baseline to the new template, so subsequent iterations used the new template as their merge baseline instead of the old one — causing the 3-way merge to misidentify unchanged customer fields as customer modifications and silently skip applying template updates on those configs. Fix reads the stored baseline once before the loop and passes it as an optional snapshot parameter through update_saas_instance → upsert_dataset_config_from_template. All iterations now merge against the same correct pre-update baseline. The write-back on each iteration is idempotent (all write the same value) so it is left in place. Affects any deployment with multiple connection configs of the same SaaS connector type, regardless of whether they belong to the same system or different systems. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
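To make the failure mode concrete, here is a toy field-level 3-way merge, not the actual fides `merge_datasets` implementation; the field names are invented. When the baseline has already advanced to the new template, an untouched customer field no longer matches the baseline, is misread as a customer edit, and the template update is dropped:

```python
def three_way_merge(customer, baseline, new_template):
    """Keep fields the customer changed relative to the baseline; otherwise adopt the new template."""
    merged = {}
    for key in new_template:
        if key in customer and customer[key] != baseline.get(key):
            merged[key] = customer[key]      # looks like a customer modification: preserve it
        else:
            merged[key] = new_template[key]  # unchanged field: apply the template update
    return merged

old_template = {"endpoint": "/v1/users"}
new_template = {"endpoint": "/v2/users"}
customer = dict(old_template)  # the customer never touched this field

# Correct baseline (the old template): the update is applied
assert three_way_merge(customer, old_template, new_template) == {"endpoint": "/v2/users"}

# Corrupted baseline (already advanced to the new template by a prior loop
# iteration): the untouched field is misread as a customer edit and kept stale
assert three_way_merge(customer, new_template, new_template) == {"endpoint": "/v1/users"}
```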
📝 Walkthrough
Capture a pre-update snapshot of `SaasTemplateDataset` and thread it through the batch update so every iteration merges against the same baseline.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler
    participant ConnSvc as ConnectionService
    participant Updater as saas_config_updater.py
    participant DB as Database/SaasTemplateDataset
    participant Merger as merge_datasets
    Scheduler->>ConnSvc: update_existing_connection_configs_for_connector_type(connector_type, template)
    ConnSvc->>DB: read SaasTemplateDataset (snapshot)
    DB-->>ConnSvc: stored_dataset_json
    ConnSvc->>ConnSvc: iterate configs -> call update_saas_instance(..., stored_dataset_json)
    ConnSvc->>Updater: update_saas_instance(..., stored_dataset_json)
    Updater->>Merger: upsert_dataset_config_from_template(..., stored_dataset_json)
    Merger-->>Updater: merged dataset (uses provided stored_dataset_json)
    Updater->>DB: persist updated DatasetConfig
    DB-->>ConnSvc: ack
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
…loop update_saas_configs (run on every server startup) had the same mid-loop SaasTemplateDataset write corruption as update_existing_connection_configs_for_connector_type. Snapshot the stored baseline before the per-connector-type loop and pass it through to update_saas_instance. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
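The read-once-then-loop pattern, reduced to a minimal runnable sketch with invented names (the real code threads the snapshot through `update_saas_instance`): the buggy variant re-reads shared state that the loop itself mutates, while the fixed variant snapshots it before the first iteration.

```python
import copy

def upsert(cfg, store, new_template, snapshot=None):
    """Use the caller's snapshot when given; otherwise read the stored baseline."""
    baseline = snapshot if snapshot is not None else store["baseline"]
    # A value equal to the baseline means "unchanged by the customer": take the template
    if cfg["value"] == baseline:
        cfg["value"] = new_template
    store["baseline"] = new_template  # write-back happens on every call

def run(configs, store, new_template, use_snapshot):
    snapshot = copy.deepcopy(store["baseline"]) if use_snapshot else None
    for cfg in configs:
        upsert(cfg, store, new_template, snapshot)

# Buggy path: the first iteration's write-back corrupts the second's baseline
store = {"baseline": "v1"}
configs = [{"value": "v1"}, {"value": "v1"}]
run(configs, store, "v2", use_snapshot=False)
assert [c["value"] for c in configs] == ["v2", "v1"]  # second config silently skipped

# Fixed path: both configs merge against the pre-update snapshot
store = {"baseline": "v1"}
configs = [{"value": "v1"}, {"value": "v1"}]
run(configs, store, "v2", use_snapshot=True)
assert [c["value"] for c in configs] == ["v2", "v2"]
```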
Greptile Summary: This PR fixes a real correctness bug in the SaaS template batch-update path when multiple connection configs of the same connector type are updated together. Key issues found:
Confidence Score: 3/5
Important Files Changed
|
🧹 Nitpick comments (2)
tests/service/test_connection_service.py (1)
1680-1794: Good test coverage for the baseline snapshot fix. The test correctly validates that both connection configs receive the template update when processed in a batch. This directly exercises the fix for the baseline corruption issue.
Consider adding cleanup for created test resources.
Other tests in this file (e.g., `test_upsert_dataset_with_merge` at line 926) explicitly clean up created resources. This test creates two `ConnectionConfig` and two `DatasetConfig` objects but doesn't delete them, which could leave test data in the database and potentially affect other tests.
🧹 Suggested cleanup addition
assert "products" in collection_names, ( f"Template update did not land on {instance_key}: " f"collections={collection_names}" ) + + # Clean up + for suffix in ("alpha", "beta"): + instance_key = f"multi_config_{suffix}" + connection_config = ConnectionConfig.get_by( + db, field="key", value=f"multi_config_connection_{suffix}" + ) + if connection_config: + connection_config.delete(db)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/service/test_connection_service.py` around lines 1680 - 1794: This test creates two ConnectionConfig and two DatasetConfig records (via ConnectionConfig.create and DatasetConfig.create_or_update) and a SaasTemplateDataset (via SaasTemplateDataset.get_or_create) but doesn't remove them; add cleanup at the end of test_update_existing_connection_configs_preserves_merge_baseline_across_multiple_configs to delete the created resources by filtering for the instance fides_key values ("multi_config_alpha", "multi_config_beta") and the connector_type and removing the matching DatasetConfig and ConnectionConfig records and the SaasTemplateDataset so subsequent tests are not affected.

src/fides/service/connection/connection_service.py (1)
847-859: Add `.lower()` to snapshot lookup for consistency with case-insensitive connection config query.
The connection config query uses `connector_type.lower()` (line 837), but the snapshot retrieval uses `connector_type` directly (line 852). Although `SaaSConfig` enforces lowercase normalization on the type field, adding `.lower()` here improves consistency with the defensive case-insensitive pattern already established for the connection config query.
🔧 Suggested fix
```diff
 stored_dataset_template = SaasTemplateDataset.get_by(
-    self.db, field="connection_type", value=connector_type
+    self.db, field="connection_type", value=connector_type.lower()
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fides/service/connection/connection_service.py` around lines 847 - 859, The snapshot lookup should use a lowercased connector_type to match the case-insensitive query earlier: change the SaasTemplateDataset.get_by call that assigns stored_dataset_template to pass connector_type.lower() instead of connector_type, and keep the subsequent stored_dataset_json_snapshot logic unchanged so the snapshot remains a dict when present; update references to connector_type in that get_by call only (SaasTemplateDataset.get_by, stored_dataset_template, stored_dataset_json_snapshot).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bdd5a19d-0fc1-417a-868e-d288488c4113
📒 Files selected for processing (3)
- src/fides/api/util/saas_config_updater.py
- src/fides/service/connection/connection_service.py
- tests/service/test_connection_service.py
- Make test actually exercise the bug by introducing a new "orders" collection in v2 that is absent from v1, so the assertion is not trivially satisfied
- Add test cleanup for created ConnectionConfig, DatasetConfig, and SaasTemplateDataset resources
- Use connector_type.lower() in snapshot lookup for consistency with case-insensitive connection config query

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/fides/service/connection/connection_service.py (1)
701-824: ⚠️ Potential issue | 🟠 Major: Baseline can still advance on partial batch failures.
`update_existing_connection_configs_for_connector_type` continues after exceptions (Line 877), but `upsert_dataset_config_from_template` still persists `SaasTemplateDataset` on each successful iteration (Line 812). If one config fails mid-batch, the shared baseline is already moved forward, and retrying failed configs can merge against the wrong baseline.
🛠 Suggested fix direction
```diff
diff --git a/src/fides/service/connection/connection_service.py b/src/fides/service/connection/connection_service.py
@@
     def upsert_dataset_config_from_template(
         self,
         connection_config: ConnectionConfig,
         template: ConnectorTemplate,
         template_values: SaasConnectionTemplateValues,
         stored_dataset_json: Optional[dict] = None,
+        persist_template_baseline: bool = True,
     ) -> DatasetConfig:
@@
-        if stored_dataset_template:
+        if persist_template_baseline and stored_dataset_template:
             stored_dataset_template.dataset_json = yaml.safe_load(template.dataset)[
                 "dataset"
             ][0]
             stored_dataset_template.save(db=self.db)
@@
     def update_saas_instance(
         self,
         connection_config: ConnectionConfig,
         template: ConnectorTemplate,
         saas_config_instance: SaaSConfig,
         stored_dataset_json: Optional[dict] = None,
+        persist_template_baseline: bool = True,
     ) -> None:
@@
         self.upsert_dataset_config_from_template(
             connection_config,
             template,
             template_vals,
             stored_dataset_json=stored_dataset_json,
+            persist_template_baseline=persist_template_baseline,
         )
@@
+        all_updates_succeeded = True
         for connection_config in connection_configs:
             ...
             try:
                 self.update_saas_instance(
                     connection_config,
                     template,
                     saas_config_instance,
                     stored_dataset_json=stored_dataset_json_snapshot,
+                    persist_template_baseline=False,
                 )
             except Exception:
+                all_updates_succeeded = False
                 logger.exception(...)
+
+        if all_updates_succeeded and stored_dataset_template:
+            stored_dataset_template.dataset_json = yaml.safe_load(template.dataset)["dataset"][0]
+            stored_dataset_template.save(db=self.db)
```
Also applies to: 847-881
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fides/service/connection/connection_service.py` around lines 701 - 824, The bug is that upsert_dataset_config_from_template persists SaasTemplateDataset (stored_dataset_template.dataset_json and save) on each successful iteration, advancing the shared baseline during batch updates in update_existing_connection_configs_for_connector_type; to fix, avoid mutating/persisting stored_dataset_template inside upsert_dataset_config_from_template when a pre-read snapshot is supplied: if stored_dataset_json is not None (the batch caller provided the baseline), skip the block that assigns stored_dataset_template.dataset_json and calls stored_dataset_template.save(db=self.db), so the shared baseline only changes when callers intentionally allow a DB write (or add an explicit flag to control persistence) — update the logic around stored_dataset_template and its save call in upsert_dataset_config_from_template to respect stored_dataset_json.
📒 Files selected for processing (2)
- src/fides/service/connection/connection_service.py
- tests/service/test_connection_service.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fides/service/connection/connection_service.py`:
- Around line 847-859: The snapshot lookup uses connector_type.lower() but the
earlier lookup in upsert_dataset_config_from_template uses connector_type
without normalization, causing inconsistent get_by exact-match behavior on
SaasTemplateDataset; update the lookup in upsert_dataset_config_from_template
(and any other SaasTemplateDataset.get_by calls like in
update_existing_connection_configs_for_connector_type) to normalize
connector_type with .lower() so both paths use the same lowercase key when
calling SaasTemplateDataset.get_by and producing stored_dataset_json_snapshot.
📒 Files selected for processing (1)
src/fides/service/connection/connection_service.py
Vagoasdf
left a comment
Overall the logic is sound and good. Just a few nitpicks for consistency, and some suggestions to avoid possible unintended effects.
- Normalize connector_type with .lower() in upsert_dataset_config_from_template and saas_config_updater for defensive consistency
- Add explicit CtlDataset cleanup in test (not cascade-deleted with ConnectionConfig)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/service/test_connection_service.py (1)
1836-1862: Consider using a pytest fixture for cleanup resilience. The inline cleanup won't execute if the test fails before reaching it, potentially leaving test resources in the database. A fixture with `yield` or `try`/`finally` would ensure cleanup regardless of test outcome. That said, this is consistent with the cleanup pattern used in other tests in this file.
♻️ Optional: Convert to fixture-based cleanup
```python
@pytest.fixture
def multi_config_test_resources(self, db: Session, stored_dataset: Dict[str, Any]):
    """Creates test resources and ensures cleanup."""
    connector_type = "multi_config_connector"
    created_configs = []
    # ... setup code ...
    yield {
        "connector_type": connector_type,
        "configs": created_configs,
        # ... other resources ...
    }
    # Cleanup runs even on test failure
    for suffix in ("alpha", "beta"):
        # ... cleanup code ...
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/service/test_connection_service.py` around lines 1836 - 1862, The inline cleanup loop in the test (removing DatasetConfig, CtlDataset, ConnectionConfig, and SaasTemplateDataset for keys like "multi_config_{suffix}" and "multi_config_connection_{suffix}") can be skipped if the test fails early; refactor by extracting the resource setup/teardown into a pytest fixture (or wrap the test body in try/finally) that yields control and always performs cleanup: create a fixture (e.g., multi_config_test_resources) that sets up the configs, yields the created resource identifiers (connector_type, created config keys), and in its teardown iterates suffix in ("alpha","beta") to find and delete DatasetConfig (DatasetConfig.filter / .first), delete associated CtlDataset (query CtlDataset by id and db.delete/db.commit), delete ConnectionConfig (ConnectionConfig.get_by / .delete), and remove SaasTemplateDataset (SaasTemplateDataset.get_by / .delete) so cleanup runs even on test failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fides/api/util/saas_config_updater.py`:
- Around line 60-62: The comparison in the query uses
ConnectionConfig.saas_config["type"].astext == connector_type.lower(), which
only lowercases the literal side and breaks case-insensitive matching; update
the left side to use sa_func.lower(ConnectionConfig.saas_config["type"].astext)
so both sides are compared in lowercase (matching the approach in
connection_service.py), and ensure sa_func is imported/available in
saas_config_updater.py if not already.
📒 Files selected for processing (2)
- src/fides/api/util/saas_config_updater.py
- tests/service/test_connection_service.py
The baseline snapshot held a reference to the ORM object's dict, which SQLAlchemy could refresh in-place during a session flush. deepcopy detaches the snapshot so it stays stable across loop iterations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
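A stand-in illustration with plain dicts (no SQLAlchemy here; the ORM refresh is simulated with an in-place `clear()`/`update()`): a bare reference to the mutable attribute tracks later mutations, while `copy.deepcopy` yields an independent snapshot.

```python
import copy

# Hypothetical stand-in for the ORM row: dataset_json is a mutable dict attribute
orm_row = {"dataset_json": {"fides_key": "stripe", "collections": ["customer"]}}

aliased = orm_row["dataset_json"]                   # reference: follows in-place refreshes
detached = copy.deepcopy(orm_row["dataset_json"])   # independent snapshot

# Simulate SQLAlchemy refreshing the row in place during a session flush
orm_row["dataset_json"].clear()
orm_row["dataset_json"].update(
    {"fides_key": "stripe", "collections": ["customer", "orders"]}
)

assert aliased["collections"] == ["customer", "orders"]  # the reference drifted
assert detached["collections"] == ["customer"]           # the snapshot stayed stable
```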
Aligns saas_config_updater.py with connection_service.py by lowercasing the DB column value too, not just the Python side. Prevents misses if a stored saas_config["type"] has mixed case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
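The principle, sketched in plain Python (`find_configs` and its inputs are invented; the actual change lowercases the database column in SQL, e.g. via `sa_func.lower(...)` as the review suggested): lowercasing only the query side misses a mixed-case stored value, while normalizing both sides matches it.

```python
def find_configs(configs, connector_type):
    """Match stored connector types case-insensitively by lowercasing both sides."""
    wanted = connector_type.lower()
    return [c for c in configs if c.get("type", "").lower() == wanted]

stored = [{"type": "Mailchimp"}, {"type": "stripe"}]

# Lowercasing only the Python side misses the mixed-case stored value
assert [c for c in stored if c["type"] == "MAILCHIMP".lower()] == []

# Lowercasing both sides finds it
assert find_configs(stored, "MAILCHIMP") == [{"type": "Mailchimp"}]
```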
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test cleanup was deleting CtlDataset first, which caused SQLAlchemy to NULL out datasetconfig.ctl_dataset_id — violating its NOT NULL constraint. Reorder to delete ConnectionConfig (cascading to DatasetConfig) first, then the orphaned CtlDataset. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
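The ordering constraint can be reproduced with an in-memory SQLite schema. This is a simplification: in the PR the NULLing is done by SQLAlchemy's relationship handling rather than a database-level `ON DELETE SET NULL`, and the tables are trimmed down from the real models. Deleting the referenced `ctl_dataset` row first forces the NOT NULL foreign key column to be nulled and fails; deleting the referencing row first succeeds.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE ctl_dataset (id INTEGER PRIMARY KEY)")
db.execute("""
    CREATE TABLE datasetconfig (
        id INTEGER PRIMARY KEY,
        ctl_dataset_id INTEGER NOT NULL
            REFERENCES ctl_dataset(id) ON DELETE SET NULL
    )
""")
db.execute("INSERT INTO ctl_dataset VALUES (1)")
db.execute("INSERT INTO datasetconfig VALUES (1, 1)")

# Wrong order: deleting the referenced dataset tries to NULL a NOT NULL column
try:
    db.execute("DELETE FROM ctl_dataset WHERE id = 1")
    raise AssertionError("expected an integrity error")
except sqlite3.IntegrityError:
    pass

# Right order: delete the referencing config first, then the orphaned dataset
db.execute("DELETE FROM datasetconfig WHERE id = 1")
db.execute("DELETE FROM ctl_dataset WHERE id = 1")
```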
Force-pushed: 83643c4 to 9cab58b
…te loop (#7578) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Ticket ENG-2947
Summary
- `SaasTemplateDataset` stores one row per connector type as the merge baseline for the 3-way dataset merge (customer edits vs old template vs new template)
- `update_existing_connection_configs_for_connector_type` loops over all connection configs of a given type and calls `upsert_dataset_config_from_template` for each, which both reads and writes `SaasTemplateDataset` on every iteration

Fix: read `SaasTemplateDataset` once before the loop and pass it as an optional `stored_dataset_json` snapshot through `update_saas_instance` → `upsert_dataset_config_from_template`. All iterations merge against the same correct pre-update baseline. Single-integration callers (`instantiate_connection`) are unchanged: they pass nothing and the existing read-from-DB path applies.

Changes

- `upsert_dataset_config_from_template`: new optional `stored_dataset_json` param; uses it as merge baseline when provided, otherwise reads `SaasTemplateDataset` as before
- `update_saas_instance`: threads the optional `stored_dataset_json` param through
- `update_existing_connection_configs_for_connector_type`: snapshots `SaasTemplateDataset` once before the loop, passes snapshot to each iteration

Test plan

- `test_update_existing_connection_configs_preserves_merge_baseline_across_multiple_configs`: creates two connection configs of the same type, runs the batch update, and asserts both received the template update (fails without the fix on whichever config is processed second)
- `test_upsert_dataset_with_merge` and `test_update_existing_connection_configs_case_insensitive` continue to pass

Summary by CodeRabbit
Bug Fixes
Tests