Replace BaseHTTPMiddleware with pure ASGI middlewares#7230
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
|
Greptile OverviewGreptile SummaryThis PR refactors all middleware from Key Changes:
Previous Issues Addressed:
Confidence Score: 4/5
Important Files Changed
|
|
@greptile please re-review |
|
@greptile please re-review |
adamsachs
left a comment
There was a problem hiding this comment.
great job finding what seems to be a serious performance improvement. from what i can tell, your pure ASGI replacements achieve parity with the existing middleware functionality 👍
the major tradeoff here is readability/maintainability: the pure ASGI versions are more verbose and 'clunky', as expected, since we're deliberately not using the BaseHTTPMiddleware abstraction anymore. but could we regain some of the benefits of that abstraction by just making our own? that's my main feedback here - it seems like there is quite a bit of room to make our 'own' base http middleware abstraction that's lighter weight (and more performant) than starlette's, but at least gives us some of benefits of a base class: shared/consistent code, less boilerplate, etc. that's not necessarily a blocker, but i'd like to at least know we're planning a quick followup, or else we risk never doing that. unless there's a specific reason you've decided not to make that abstraction?
beyond that, just a couple of nits i spotted along the way.
generally, what do you think are the biggest risks? i know we have some pretty solid test coverage, but some middleware things (e.g. rate limiting) can be pretty hard to cover in an automated test suite, so i'd be curious to know if there's anything you're particularly worried about and whether we can get ahead of some 'manual' (or more integration) testing to de-risk that.
src/fides/api/asgi_middleware.py
Outdated
| if scope["type"] != "http": | ||
| await self.app(scope, receive, send) | ||
| return | ||
|
|
||
| start_time = perf_counter() | ||
| status_code = 500 # Default in case of exception | ||
|
|
||
| # Extract request info from scope | ||
| method = scope.get("method", "UNKNOWN") | ||
| path = scope.get("path", "/") | ||
|
|
||
| # Get Fides-Client header | ||
| headers = dict(scope.get("headers", [])) | ||
| fides_client = headers.get(b"fides-client", b"unknown").decode("latin-1") |
There was a problem hiding this comment.
i think there's quite a bit of room here for a useful abstraction/standardization, specifically around the basic parsing/validation of the scope object: e.g. filtering out non-HTTP requests, retrieving the headers dict, etc.
should we not create a base class here that these can all inherit from, so that we get some of that boiler plate 'for free' on each middleware class, and so that its standardized? or at least some helper methods to reduce code duplication? i know we don't want to reinvent the BaseHTTPMiddleware wholesale, but i think it'd be prudent to build in a few more abstractions here, unless there's a compelling reason not to...
src/fides/api/util/rate_limit.py
Outdated
There was a problem hiding this comment.
weird, it has a bunch of tests but does seem unused..
There was a problem hiding this comment.
well you removed a use of it, so that would sorta make sense :)
| # Validate header before SlowAPI processes the request | ||
| fastapi_app.add_middleware(RateLimitIPValidationMiddleware) | ||
| # Required for default rate limiting to work | ||
| fastapi_app.add_middleware(SlowAPIMiddleware) |
There was a problem hiding this comment.
i think this is also a basehttpmiddleware. looks like they have a pure ASGI version though, SlowAPIASGIMiddleware (see laurentS/slowapi#113 for some background, i haven't looked into whether there are any tradeoffs/reasons to not switch over...)
There was a problem hiding this comment.
oh interesting! I kind of skipped over the external packages ones. Since we don't enable them by default, I might look into changing it as a follow-up PR
There was a problem hiding this comment.
oh ok so now we use our self-rolled rate limiting middleware instead of this one, by default?
because we do have rate limiting by default, right?
There was a problem hiding this comment.
So we might have it on in fides cloud but it requires setting rate_limit_client_ip_header , which is empty by default
There was a problem hiding this comment.
I have sanity checked: if configured, we use both our rate limit one (which I think just does IP validation? perhaps confusingly named? ) and the slow API one.
I'll look into replacing the SlowAPI one with the ASGI version in a follow up PR
| @@ -112,6 +118,19 @@ | |||
| GZipMiddleware, minimum_size=1000, compresslevel=5 | |||
There was a problem hiding this comment.
this one looks good, seems to already be pure ASGI 👍
vcruces
left a comment
There was a problem hiding this comment.
LGTM, really nice work! I imagine this wasn’t easy. Thanks for adding so many tests, they give confidence for something that seems hard to test 😓
tests/api/test_logging.py
Outdated
| sent_messages.append(message) | ||
|
|
||
| # Call the middleware - it should catch the exception and send a 500 response | ||
| await middleware(scope, receive, send) |
There was a problem hiding this comment.
I might be mistaken and this may just be similar logic, but is this logic being repeated in a few places? If so, would it be worth extracting it for reuse?
| @@ -0,0 +1,115 @@ | |||
| # pylint: disable=missing-docstring, redefined-outer-name | |||
There was a problem hiding this comment.
Given that we already have tests/ops/api/v1/test_middleware.py, would it make sense to rename this one (or the other)? I find it a bit confusing to tell what each file contains
There was a problem hiding this comment.
oh that's some git mishap , I meant to have moved test_middleware into the /middleware folder but I see that both files are kept , I'll fix. thanks for the callout!
adamsachs
left a comment
There was a problem hiding this comment.
looks good, thank you for taking the time to put that base class in place - i think it makes all this code a lot cleaner, and will continue to pay dividends as we add more middleware!
src/fides/api/asgi_middleware.py
Outdated
| ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]] | ||
|
|
||
|
|
||
| class BaseASGIMiddleware: |
There was a problem hiding this comment.
love it! even more extensive than i imagined, this makes the middleware implementations so much cleaner.
we should submit a PR to https://github.com/Kludex/starlette that adds this to the core library :)
| return | ||
| await self.handle_http(scope, receive, send) | ||
|
|
||
| async def handle_http(self, scope: Scope, receive: Receive, send: Send) -> None: |
There was a problem hiding this comment.
you prefer to leave this with a default impl rather than as an abstractmethod (and making this class an ABC)? not saying that's wrong, just wanted to check it's a deliberate (or at least defensible) choice :)
There was a problem hiding this comment.
hmm yeah abstract method might be better, I did debate about it since this seemed like a reasonable default, but also the original BaseHTTPMiddleware does raise NotImplemented for the dispatch method.
There was a problem hiding this comment.
sure that works, makes sense to model BaseHTTPMiddleware closely in that respect 👍
Ticket ENG-2409
Description Of Changes
While debugging performance issues for the consent v3 APIs, we also ran some benchmarks on a no-op
/pingendpoint and found it couldn't handle a large number of RPS. Trying to find the cause, we found that theBaseHTTPMiddlewareclass (used by all our middlewares!) is extremely slow . Even adding a no-opBaseHTTPMiddlewarethat just calls the next on the stack majorly affects performance.This PR replace our middleware implementation with a pure ASGI middleware that does not rely on
BaseHTTPMiddlewareclass. This is meant to be just a refactor of the underlying implementation, but all middleware behavior should stay the same.Steps to Confirm
Pre-Merge Checklist
CHANGELOG.mdupdatedmaindowngrade()migration is correct and works