feat: Migrate POST /setup/tag endpoint to FastAPI #257
0xmaster7 wants to merge 5 commits into openml:main from
Conversation
Walkthrough: Adds a new database module.
🚥 Pre-merge checks: ❌ 1 failed check (1 warning) | ✅ 4 passed
Hey - I've left some high-level feedback:
- The POST /setup/tag endpoint currently takes `setup_id` from the body but `tag` will default to a query parameter (no `Body()` annotation), which may not match the legacy API expectations; consider explicitly declaring `tag: Annotated[str, Body(...)]` or introducing a request model for clarity and consistency.
- In `get_tags_for`, you only need the tag values but are doing `SELECT *`; narrowing this to `SELECT tag` will reduce unnecessary data transfer and make the intent clearer.
- The `create_tag_exists_error` helper returns HTTP 500 for an already-tagged entity, which semantically represents a client conflict rather than a server error; consider using a 4xx status (e.g., 409) unless you must preserve legacy behavior tied to code 473.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The POST /setup/tag endpoint currently takes `setup_id` from the body but `tag` will default to a query parameter (no `Body()` annotation), which may not match the legacy API expectations; consider explicitly declaring `tag: Annotated[str, Body(...)]` or introducing a request model for clarity and consistency.
- In `get_tags_for`, you only need the tag values but are doing `SELECT *`; narrowing this to `SELECT tag` will reduce unnecessary data transfer and make the intent clearer.
- The `create_tag_exists_error` helper returns HTTP 500 for an already-tagged entity, which semantically represents a client conflict rather than a server error; consider using a 4xx status (e.g., 409) unless you must preserve legacy behavior tied to code 473.
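The first bullet's `Body()` point can be sketched without a FastAPI install by using a stand-in marker class (the `Body` dataclass below is hypothetical; the real router would import `Body` from fastapi):

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints

# Hypothetical stand-in for fastapi.Body(); the real router would import
# Body from fastapi instead of defining it here.
@dataclass(frozen=True)
class Body:
    embed: bool = True

# Declaring the parameter as Annotated[str, Body(...)] makes the intent
# explicit: `tag` is read from the request body, not the query string.
def tag_setup(setup_id: int, tag: Annotated[str, Body(embed=True)]) -> None:
    ...

hints = get_type_hints(tag_setup, include_extras=True)
base, marker = get_args(hints["tag"])
print(base is str, marker.embed)  # True True
```

The same annotation shape applies verbatim once the marker is the real fastapi `Body`.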
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/routers/openml/setups.py (1)
15-20: Add a Python unit test for the `103 Authentication failed` path. Given migration-suite limitations, this path should be validated in standard unit tests for this router module.
If useful, I can draft the exact pytest cases for the unauthenticated path and expected payload shape.
Based on learnings: "Migration tests for OpenML API endpoints that rely on php_api for data staging cannot test authentication failure paths (Code 103) when the legacy PHP API returns XML instead of JSON. These paths should be tested in standard Python unit tests after dependent endpoints are migrated."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/setups.py` around lines 15 - 20, Add a unit test that invokes create_authentication_failed_error() (or the router function that returns the authentication failure) and asserts it raises/returns an HTTPException with status_code HTTPStatus.PRECONDITION_FAILED and detail matching {"code":"103","message":"Authentication failed"}; implement the test in pytest by importing create_authentication_failed_error from src.routers.openml.setups, calling it (or catching the raised HTTPException if the router endpoint raises one), and asserting both the status_code and exact detail payload shape and types to ensure the 103 authentication-failed path is validated independent of migration-suite integration tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/database/setups.py`:
- Around line 20-32: The get_tags_for function currently selects all columns and
lacks deterministic ordering; update the SQL in connection.execute to select
only the tag column (e.g., SELECT tag FROM setup_tag) and add an explicit ORDER
BY (for example ORDER BY tag ASC) while keeping the parameterized filter (WHERE
setup_id = :setup_id) so you return a stable, minimal list of tags for the given
setup_id.
In `@src/routers/openml/setups.py`:
- Around line 58-64: The current check-then-insert in the route using
database.setups.get_tags_for and database.setups.tag is race-prone; add a UNIQUE
constraint on (setup_id, tag) at the DB schema level and update the tag write
path to handle duplicate-key errors by translating them into the existing
create_tag_exists_error(setup_id, tag) response. Concretely, add the unique
index in migrations, then catch the DB-specific unique-violation exception
either inside database.setups.tag or around its call in this router and on that
exception raise/create the same "already tagged" error instead of propagating
the DB error.
---
Nitpick comments:
In `@src/routers/openml/setups.py`:
- Around line 15-20: Add a unit test that invokes
create_authentication_failed_error() (or the router function that returns the
authentication failure) and asserts it raises/returns an HTTPException with
status_code HTTPStatus.PRECONDITION_FAILED and detail matching
{"code":"103","message":"Authentication failed"}; implement the test in pytest
by importing create_authentication_failed_error from src.routers.openml.setups,
calling it (or catching the raised HTTPException if the router endpoint raises
one), and asserting both the status_code and exact detail payload shape and
types to ensure the 103 authentication-failed path is validated independent of
migration-suite integration tests.
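A self-contained sketch of the test described above, assuming the payload shape from the prompt; the `HTTPException` class here is a minimal stand-in for fastapi's, and the helper is reimplemented locally rather than imported from `src.routers.openml.setups`:

```python
from http import HTTPStatus

# Minimal stand-in for fastapi.HTTPException so the sketch runs standalone.
class HTTPException(Exception):
    def __init__(self, status_code: int, detail: dict) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

# Assumed shape of the helper, per the review prompt above.
def create_authentication_failed_error() -> HTTPException:
    return HTTPException(
        status_code=HTTPStatus.PRECONDITION_FAILED,
        detail={"code": "103", "message": "Authentication failed"},
    )

def test_authentication_failed_error() -> None:
    # Pin down both the status code and the exact detail payload.
    error = create_authentication_failed_error()
    assert error.status_code == HTTPStatus.PRECONDITION_FAILED
    assert error.detail == {"code": "103", "message": "Authentication failed"}

test_authentication_failed_error()
```

In the real test module the two stand-ins would be replaced by the actual imports and the function collected by pytest.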
```python
def get_tags_for(id_: int, connection: Connection) -> list[str]:
    """Get all tags for a specific setup."""
    rows = connection.execute(
        text(
            """
            SELECT *
            FROM setup_tag
            WHERE id = :setup_id
            """,
        ),
        parameters={"setup_id": id_},
    )
    return [row.tag for row in rows]
```
Return tags in a deterministic order and avoid over-fetching.
`SELECT *` without ordering can yield unstable tag ordering in responses and tests. Fetch only `tag` and add an explicit `ORDER BY`.
💡 Suggested query refinement
```diff
 def get_tags_for(id_: int, connection: Connection) -> list[str]:
     """Get all tags for a specific setup."""
     rows = connection.execute(
         text(
             """
-            SELECT *
+            SELECT tag
             FROM setup_tag
             WHERE id = :setup_id
+            ORDER BY tag
             """,
         ),
         parameters={"setup_id": id_},
     )
     return [row.tag for row in rows]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/database/setups.py` around lines 20 - 32, The get_tags_for function
currently selects all columns and lacks deterministic ordering; update the SQL
in connection.execute to select only the tag column (e.g., SELECT tag FROM
setup_tag) and add an explicit ORDER BY (for example ORDER BY tag ASC) while
keeping the parameterized filter (WHERE setup_id = :setup_id) so you return a
stable, minimal list of tags for the given setup_id.
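The narrowed, ordered query can be exercised against a toy sqlite3 table (the column names mirror `setup_tag`, but this is an illustration only, not the project's actual database schema):

```python
import sqlite3

# Toy table mirroring setup_tag; an illustration, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE setup_tag (id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO setup_tag (id, tag) VALUES (?, ?)",
    [(1, "weka"), (1, "baseline"), (2, "other"), (1, "study_14")],
)

def get_tags_for(id_: int) -> list[str]:
    # Fetch only the tag column and order it so results are deterministic.
    rows = conn.execute(
        "SELECT tag FROM setup_tag WHERE id = ? ORDER BY tag", (id_,)
    )
    return [row[0] for row in rows]

print(get_tags_for(1))  # ['baseline', 'study_14', 'weka']
```

Insertion order no longer leaks into the response, which also makes test assertions on the tag list stable.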
```python
    tags = database.setups.get_tags_for(setup_id, expdb_db)
    if tag.casefold() in [t.casefold() for t in tags]:
        raise create_tag_exists_error(setup_id, tag)

    # 5. APPLY THE TAG
    database.setups.tag(setup_id, tag, user_id=user.user_id, connection=expdb_db)
```
Duplicate prevention is race-prone under concurrent requests.
The current check-then-insert flow can still insert duplicates when two requests run at the same time. Enforce uniqueness in the DB (e.g., unique key on setup/tag) and map duplicate-write failure to the existing “already tagged” error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/setups.py` around lines 58 - 64, The current
check-then-insert in the route using database.setups.get_tags_for and
database.setups.tag is race-prone; add a UNIQUE constraint on (setup_id, tag) at
the DB schema level and update the tag write path to handle duplicate-key errors
by translating them into the existing create_tag_exists_error(setup_id, tag)
response. Concretely, add the unique index in migrations, then catch the
DB-specific unique-violation exception either inside database.setups.tag or
around its call in this router and on that exception raise/create the same
"already tagged" error instead of propagating the DB error.
…ster7/server-api into feature/setup-tag-migration
♻️ Duplicate comments (1)
src/routers/openml/setups.py (1)
59-65: ⚠️ Potential issue | 🟠 Major: Duplicate check is still race-prone under concurrency.
The check-then-insert flow can still admit duplicates when requests race. Enforce uniqueness at the DB level (`setup_tag(id, tag)`) and translate duplicate-key failures to `create_tag_exists_error(...)` in the write path.
💡 Suggested router-side handling (plus DB unique constraint in migrations)

```diff
+from sqlalchemy.exc import IntegrityError
@@
-    tags = database.setups.get_tags_for(setup_id, expdb_db)
-    if tag.casefold() in [t.casefold() for t in tags]:
+    tags = database.setups.get_tags_for(setup_id, expdb_db)
+    if tag.casefold() in {t.casefold() for t in tags}:
         raise create_tag_exists_error(setup_id, tag)
-    # 5. APPLY THE TAG
-    database.setups.tag(setup_id, tag, user_id=user.user_id, connection=expdb_db)
+    # 5. APPLY THE TAG
+    try:
+        database.setups.tag(setup_id, tag, user_id=user.user_id, connection=expdb_db)
+    except IntegrityError as exc:
+        # Requires UNIQUE(id, tag) at DB level
+        raise create_tag_exists_error(setup_id, tag) from exc
-    return {
-        "setup_tag": {"id": str(setup_id), "tag": [*tags, tag]},
-    }
+    updated_tags = database.setups.get_tags_for(setup_id, expdb_db)
+    return {"setup_tag": {"id": str(setup_id), "tag": updated_tags}}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/setups.py` around lines 59 - 65, The current check-then-insert (using database.setups.get_tags_for then database.setups.tag) is race-prone; add a DB-level unique constraint on the setup_tag(id, tag) column and modify the write path that calls database.setups.tag to catch the database unique-violation/duplicate-key exception (e.g., IntegrityError/UniqueViolation from your DB driver) and translate it into create_tag_exists_error(setup_id, tag). You can keep the case-insensitive pre-check for fast-fail using database.setups.get_tags_for, but rely on the DB constraint as the source of truth and handle the duplicate-key error in the same scope where database.setups.tag is invoked so concurrent inserts return create_tag_exists_error instead of creating duplicates.
🧹 Nitpick comments (1)
src/routers/openml/setups.py (1)
37-37: Consolidate the `tag` parameter to use only `SystemString64` as the validation constraint. The current annotation has two conflicting `Body()` instances in the metadata, which creates ambiguity during constraint processing. This pattern is inconsistent with similar parameters elsewhere (e.g., `datasets.py`). Use `Annotated[str, SystemString64]` to align with the codebase convention, or explicitly merge the constraints if the `embed=False` behavior is required: `tag: Annotated[str, SystemString64]`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/setups.py` at line 37, The tag parameter currently includes conflicting metadata (Body(..., embed=False) and SystemString64); remove the Body() metadata and consolidate to use only SystemString64 for validation by changing the annotation for tag to use Annotated[str, SystemString64] (or if embed=False is actually required, move that constraint into a single explicit validator instead of leaving two metadata entries) so the parameter in the function/signature that declares tag uses only SystemString64 for constraint processing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/routers/openml/setups.py`:
- Around line 59-65: The current check-then-insert (using
database.setups.get_tags_for then database.setups.tag) is race-prone; add a
DB-level unique constraint on the setup_tag(id, tag) column and modify the write
path that calls database.setups.tag to catch the database
unique-violation/duplicate-key exception (e.g., IntegrityError/UniqueViolation
from your DB driver) and translate it into create_tag_exists_error(setup_id,
tag). You can keep the case-insensitive pre-check for fast-fail using
database.setups.get_tags_for, but rely on the DB constraint as the source of
truth and handle the duplicate-key error in the same scope where
database.setups.tag is invoked so concurrent inserts return
create_tag_exists_error instead of creating duplicates.
---
Nitpick comments:
In `@src/routers/openml/setups.py`:
- Line 37: The tag parameter currently includes conflicting metadata (Body(...,
embed=False) and SystemString64); remove the Body() metadata and consolidate to
use only SystemString64 for validation by changing the annotation for tag to use
Annotated[str, SystemString64] (or if embed=False is actually required, move
that constraint into a single explicit validator instead of leaving two metadata
entries) so the parameter in the function/signature that declares tag uses only
SystemString64 for constraint processing.
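To illustrate the "single metadata entry" point, a stdlib-only sketch where `MaxLength` is a hypothetical marker standing in for the project's `SystemString64` constraint type:

```python
from typing import Annotated, get_args

# Hypothetical marker standing in for the project's SystemString64
# constraint type; the real one lives in the codebase's type definitions.
class MaxLength:
    def __init__(self, limit: int) -> None:
        self.limit = limit

SystemString64 = Annotated[str, MaxLength(64)]

# With a single metadata entry there is nothing for the framework to
# reconcile: the parameter carries exactly one validation constraint.
def tag_setup(setup_id: int, tag: SystemString64) -> None:
    ...

base, *metadata = get_args(SystemString64)
print(base is str, len(metadata))  # True 1
```

Stacking a second `Body()` entry into the same `Annotated` is what produces the conflicting-metadata ambiguity the nitpick describes.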
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/database/setups.py
- src/routers/openml/setups.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/database/setups.py
fixes #64
Migrated the POST /setup/tag endpoint to the new FastAPI backend. Added auth and ownership checks so only admins or the original uploader can tag a setup, which fixes the issue mentioned in #19. Also hooked up the DB queries for the `algorithm_setup` and `setup_tag` tables.