Add circuit breaker to redis version cache #7536
Merged
Greptile Summary
This PR implements a circuit breaker pattern to prevent repeated slow timeouts when Redis is down, improving application resilience during Redis outages.
The implementation is clean, well-documented, and solves a real production problem where Redis outages could cause cascading timeouts.
Confidence Score: 5/5
Last reviewed commit: a316875
adamsachs approved these changes on Mar 2, 2026
OK, i think this should hopefully help the problem we're seeing. but i'd at least recommend we fix a major gap in our logging that is preventing us from learning more about what's actually happening on our nightly environment! i've called that out in a comment
src/fides/api/service/connectors/saas/connector_registry_service.py
Outdated
Ticket
[TODO: fill in ticket number]
Description Of Changes
When Redis is unavailable, the redis_version_cached decorator's fallback logic works correctly (it returns the stale cache or calls the function directly), but each invocation still attempts a Redis connection before falling back. During startup, update_saas_configs called ConnectorRegistry.connector_types() and then ConnectorRegistry.get_connector_template() per connector type, each of which called _get_combined_templates(), triggering a Redis version check via CustomConnectorTemplateLoader. This meant ~50+ redundant Redis calls in a loop, even though the Redis cache only serves custom templates (not the file-based ones). Because the standalone Redis client had no socket timeouts (defaulting to None, which blocks indefinitely), startup could hang for an extended period when Redis was down.
Code Changes
- src/fides/api/util/cache.py: Added socket_connect_timeout=10.0 and socket_timeout=10.0 to the standalone Redis client constructor, matching the cluster client, which already had these. Prevents indefinite blocking on a single Redis call.
- src/fides/api/util/redis_version_cache.py: Added a circuit breaker to _get_redis_version and _bump_redis_version. After the first Redis failure, subsequent calls within a 30-second cooldown window raise immediately without contacting Redis. The circuit resets automatically after the cooldown expires or upon a successful Redis call.
- src/fides/api/service/connectors/saas/connector_registry_service.py: Added get_all_templates() public method to ConnectorRegistry to allow callers to load all templates once.
- src/fides/api/util/saas_config_updater.py: Changed update_saas_configs to call ConnectorRegistry.get_all_templates() once before the loop instead of calling connector_types() + get_connector_template() per iteration. This reduces Redis version checks from ~50+ to exactly 1.
- tests/ops/util/test_redis_version_cache.py: Added TestCircuitBreaker class with 4 tests (circuit opens after failure, resets after success, closes after cooldown, prevents repeated calls in a startup loop). Updated the autouse fixture to reset circuit breaker state between tests.
Steps to Confirm
1. pytest --no-cov tests/ops/util/test_redis_version_cache.py -v (all 19 tests should pass: 15 existing + 4 new)
2. pytest --no-cov tests/ops/service/connectors/test_connector_registry_service.py -v
3. pytest --no-cov tests/ops/api/v1/endpoints/test_connector_template_endpoints.py -v
4. Stop Redis (docker stop fides-redis), restart the fides server, and confirm it starts up without hanging. The redis_version_cache decorator should log a debug message about Redis being unavailable and fall back gracefully. Restart Redis afterward (docker start fides-redis) and confirm normal caching resumes.
CHANGELOG.md updated
main
downgrade() migration is correct and works
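For readers who have not opened the diff, the circuit breaker behavior described under Code Changes can be sketched roughly as follows. This is a minimal illustration, not the code in redis_version_cache.py: the CircuitBreaker and CircuitOpenError names, the injectable clock, and the use of Python's built-in ConnectionError as a stand-in for a Redis failure are all assumptions made for the example. Only the 30-second cooldown, the "open after the first failure" rule, and the reset on success or cooldown expiry come from the description.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

COOLDOWN_SECONDS = 30.0  # cooldown window stated in the PR description


class CircuitOpenError(ConnectionError):
    """Hypothetical error raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Hypothetical helper: opens on the first failure, closes on cooldown or success."""

    def __init__(
        self,
        cooldown: float = COOLDOWN_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown
        self._clock = clock  # injectable so tests need not sleep
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """Return True while the cooldown window after a failure is still running."""
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self._cooldown:
            self._opened_at = None  # cooldown expired: allow the next attempt
            return False
        return True

    def call(self, fn: Callable[[], T]) -> T:
        """Run fn unless the circuit is open; record failures and successes."""
        if self.is_open():
            # Fail fast without contacting the backend at all.
            raise CircuitOpenError("backend marked unavailable; skipping call")
        try:
            result = fn()
        except ConnectionError:
            self._opened_at = self._clock()  # first failure opens the circuit
            raise
        self._opened_at = None  # a successful call resets the circuit
        return result
```

With a breaker like this wrapped around the version check, a startup loop that hits a dead Redis would pay for one failed connection attempt and then fail fast for the rest of the cooldown window. Passing the clock in as a parameter is a design choice for the sketch only; it lets the cooldown be exercised in tests without waiting 30 seconds.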