Skip to content

feat: Migrate POST /setup/tag endpoint to FastAPI#257

Open
0xmaster7 wants to merge 5 commits intoopenml:mainfrom
0xmaster7:feature/setup-tag-migration
Open

feat: Migrate POST /setup/tag endpoint to FastAPI#257
0xmaster7 wants to merge 5 commits intoopenml:mainfrom
0xmaster7:feature/setup-tag-migration

Conversation

@0xmaster7
Copy link

fixes #64

Migrated the POST /setup/tag endpoint to the new fastapi backend. Added auth and ownership checks so only admins or the original uploader can tag a setup, which fixes the issue mentioned in #19. Also hooked up the DB queries for the algorithm_setup and setup_tag tables.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 26, 2026

Walkthrough

Adds a new database module src/database/setups.py with three helpers: get(id_, connection), get_tags_for(id_, connection), and tag(id_, tag_, *, user_id, connection). Introduces a new router src/routers/openml/setups.py exposing POST /setup/tag that enforces authentication, verifies setup existence, checks authorization (admin or uploader), detects duplicate tags (case-insensitive), inserts a new tag, and returns updated tags. Adds two HTTP error constructor helpers for authentication failure and existing-tag cases. Registers the new router in src/main.py.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: migrating the POST /setup/tag endpoint to FastAPI, which is the primary objective of this PR.
Description check ✅ Passed The description directly relates to the changeset by explaining the endpoint migration, authentication/ownership checks, and database query implementation, all of which are present in the code changes.
Linked Issues check ✅ Passed The PR successfully implements the POST /setup/tag endpoint (#64) with auth and ownership checks (#19), adding three database helpers and the endpoint router with proper validation logic.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the POST /setup/tag endpoint and its supporting infrastructure, with no extraneous modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The POST /setup/tag endpoint currently takes setup_id from the body but tag will default to a query parameter (no Body() annotation), which may not match the legacy API expectations; consider explicitly declaring tag: Annotated[str, Body(...)] or introducing a request model for clarity and consistency.
  • In get_tags_for, you only need the tag values but are doing SELECT *; narrowing this to SELECT tag will reduce unnecessary data transfer and make the intent clearer.
  • The create_tag_exists_error helper returns HTTP 500 for an already-tagged entity, which semantically represents a client conflict rather than a server error; consider using a 4xx status (e.g., 409) unless you must preserve legacy behavior tied to code 473.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The POST /setup/tag endpoint currently takes `setup_id` from the body but `tag` will default to a query parameter (no `Body()` annotation), which may not match the legacy API expectations; consider explicitly declaring `tag: Annotated[str, Body(...)]` or introducing a request model for clarity and consistency.
- In `get_tags_for`, you only need the tag values but are doing `SELECT *`; narrowing this to `SELECT tag` will reduce unnecessary data transfer and make the intent clearer.
- The `create_tag_exists_error` helper returns HTTP 500 for an already-tagged entity, which semantically represents a client conflict rather than a server error; consider using a 4xx status (e.g., 409) unless you must preserve legacy behavior tied to code 473.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/routers/openml/setups.py (1)

15-20: Add a Python unit test for the 103 Authentication failed path.

Given migration-suite limitations, this path should be validated in standard unit tests for this router module.

If useful, I can draft the exact pytest cases for the unauthenticated path and expected payload shape.
Based on learnings: "Migration tests for OpenML API endpoints that rely on php_api for data staging cannot test authentication failure paths (Code 103) when the legacy PHP API returns XML instead of JSON. These paths should be tested in standard Python unit tests after dependent endpoints are migrated."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/setups.py` around lines 15 - 20, Add a unit test that
invokes create_authentication_failed_error() (or the router function that
returns the authentication failure) and asserts it raises/returns an
HTTPException with status_code HTTPStatus.PRECONDITION_FAILED and detail
matching {"code":"103","message":"Authentication failed"}; implement the test in
pytest by importing create_authentication_failed_error from
src.routers.openml.setups, calling it (or catching the raised HTTPException if
the router endpoint raises one), and asserting both the status_code and exact
detail payload shape and types to ensure the 103 authentication-failed path is
validated independent of migration-suite integration tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/database/setups.py`:
- Around line 20-32: The get_tags_for function currently selects all columns and
lacks deterministic ordering; update the SQL in connection.execute to select
only the tag column (e.g., SELECT tag FROM setup_tag) and add an explicit ORDER
BY (for example ORDER BY tag ASC) while keeping the parameterized filter (WHERE
setup_id = :setup_id) so you return a stable, minimal list of tags for the given
setup_id.

In `@src/routers/openml/setups.py`:
- Around line 58-64: The current check-then-insert in the route using
database.setups.get_tags_for and database.setups.tag is race-prone; add a UNIQUE
constraint on (setup_id, tag) at the DB schema level and update the tag write
path to handle duplicate-key errors by translating them into the existing
create_tag_exists_error(setup_id, tag) response. Concretely, add the unique
index in migrations, then catch the DB-specific unique-violation exception
either inside database.setups.tag or around its call in this router and on that
exception raise/create the same "already tagged" error instead of propagating
the DB error.

---

Nitpick comments:
In `@src/routers/openml/setups.py`:
- Around line 15-20: Add a unit test that invokes
create_authentication_failed_error() (or the router function that returns the
authentication failure) and asserts it raises/returns an HTTPException with
status_code HTTPStatus.PRECONDITION_FAILED and detail matching
{"code":"103","message":"Authentication failed"}; implement the test in pytest
by importing create_authentication_failed_error from src.routers.openml.setups,
calling it (or catching the raised HTTPException if the router endpoint raises
one), and asserting both the status_code and exact detail payload shape and
types to ensure the 103 authentication-failed path is validated independent of
migration-suite integration tests.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a0fed5 and 4451c7d.

📒 Files selected for processing (3)
  • src/database/setups.py
  • src/main.py
  • src/routers/openml/setups.py

Comment on lines 20 to 32
def get_tags_for(id_: int, connection: Connection) -> list[str]:
"""Get all tags for a specific setup."""
rows = connection.execute(
text(
"""
SELECT *
FROM setup_tag
WHERE id = :setup_id
""",
),
parameters={"setup_id": id_},
)
return [row.tag for row in rows]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Return tags in a deterministic order and avoid over-fetching.

SELECT * without ordering can yield unstable tag ordering in responses and tests. Fetch only tag and add an explicit ORDER BY.

💡 Suggested query refinement
 def get_tags_for(id_: int, connection: Connection) -> list[str]:
     """Get all tags for a specific setup."""
     rows = connection.execute(
         text(
             """
-            SELECT *
+            SELECT tag
             FROM setup_tag
             WHERE id = :setup_id
+            ORDER BY tag
             """,
         ),
         parameters={"setup_id": id_},
     )
     return [row.tag for row in rows]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/database/setups.py` around lines 20 - 32, The get_tags_for function
currently selects all columns and lacks deterministic ordering; update the SQL
in connection.execute to select only the tag column (e.g., SELECT tag FROM
setup_tag) and add an explicit ORDER BY (for example ORDER BY tag ASC) while
keeping the parameterized filter (WHERE setup_id = :setup_id) so you return a
stable, minimal list of tags for the given setup_id.

Comment on lines +58 to +64
tags = database.setups.get_tags_for(setup_id, expdb_db)
if tag.casefold() in [t.casefold() for t in tags]:
raise create_tag_exists_error(setup_id, tag)

# 5. APPLY THE TAG
database.setups.tag(setup_id, tag, user_id=user.user_id, connection=expdb_db)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Duplicate prevention is race-prone under concurrent requests.

The current check-then-insert flow can still insert duplicates when two requests run at the same time. Enforce uniqueness in the DB (e.g., unique key on setup/tag) and map duplicate-write failure to the existing “already tagged” error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/setups.py` around lines 58 - 64, The current
check-then-insert in the route using database.setups.get_tags_for and
database.setups.tag is race-prone; add a UNIQUE constraint on (setup_id, tag) at
the DB schema level and update the tag write path to handle duplicate-key errors
by translating them into the existing create_tag_exists_error(setup_id, tag)
response. Concretely, add the unique index in migrations, then catch the
DB-specific unique-violation exception either inside database.setups.tag or
around its call in this router and on that exception raise/create the same
"already tagged" error instead of propagating the DB error.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/routers/openml/setups.py (1)

59-65: ⚠️ Potential issue | 🟠 Major

Duplicate check is still race-prone under concurrency.

The check-then-insert flow can still admit duplicates when requests race. Enforce uniqueness at the DB level (setup_tag(id, tag)) and translate duplicate-key failures to create_tag_exists_error(...) in the write path.

💡 Suggested router-side handling (plus DB unique constraint in migrations)
+from sqlalchemy.exc import IntegrityError
@@
-    tags = database.setups.get_tags_for(setup_id, expdb_db)
-    if tag.casefold() in [t.casefold() for t in tags]:
+    tags = database.setups.get_tags_for(setup_id, expdb_db)
+    if tag.casefold() in {t.casefold() for t in tags}:
         raise create_tag_exists_error(setup_id, tag)
 
-    # 5. APPLY THE TAG
-    database.setups.tag(setup_id, tag, user_id=user.user_id, connection=expdb_db)
+    # 5. APPLY THE TAG
+    try:
+        database.setups.tag(setup_id, tag, user_id=user.user_id, connection=expdb_db)
+    except IntegrityError as exc:
+        # Requires UNIQUE(id, tag) at DB level
+        raise create_tag_exists_error(setup_id, tag) from exc
 
-    return {
-        "setup_tag": {"id": str(setup_id), "tag": [*tags, tag]},
-    }
+    updated_tags = database.setups.get_tags_for(setup_id, expdb_db)
+    return {"setup_tag": {"id": str(setup_id), "tag": updated_tags}}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/setups.py` around lines 59 - 65, The current
check-then-insert (using database.setups.get_tags_for then database.setups.tag)
is race-prone; add a DB-level unique constraint on the setup_tag(id, tag) column
and modify the write path that calls database.setups.tag to catch the database
unique-violation/duplicate-key exception (e.g., IntegrityError/UniqueViolation
from your DB driver) and translate it into create_tag_exists_error(setup_id,
tag). You can keep the case-insensitive pre-check for fast-fail using
database.setups.get_tags_for, but rely on the DB constraint as the source of
truth and handle the duplicate-key error in the same scope where
database.setups.tag is invoked so concurrent inserts return
create_tag_exists_error instead of creating duplicates.
🧹 Nitpick comments (1)
src/routers/openml/setups.py (1)

37-37: Consolidate the tag parameter to use only SystemString64 as the validation constraint.

The current annotation has two conflicting Body() instances in the metadata, which creates ambiguity during constraint processing. This pattern is inconsistent with similar parameters elsewhere (e.g., datasets.py). Use Annotated[str, SystemString64] to align with the codebase convention, or explicitly merge the constraints if the embed=False behavior is required:

tag: Annotated[str, SystemString64]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/setups.py` at line 37, The tag parameter currently
includes conflicting metadata (Body(..., embed=False) and SystemString64);
remove the Body() metadata and consolidate to use only SystemString64 for
validation by changing the annotation for tag to use Annotated[str,
SystemString64] (or if embed=False is actually required, move that constraint
into a single explicit validator instead of leaving two metadata entries) so the
parameter in the function/signature that declares tag uses only SystemString64
for constraint processing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/routers/openml/setups.py`:
- Around line 59-65: The current check-then-insert (using
database.setups.get_tags_for then database.setups.tag) is race-prone; add a
DB-level unique constraint on the setup_tag(id, tag) column and modify the write
path that calls database.setups.tag to catch the database
unique-violation/duplicate-key exception (e.g., IntegrityError/UniqueViolation
from your DB driver) and translate it into create_tag_exists_error(setup_id,
tag). You can keep the case-insensitive pre-check for fast-fail using
database.setups.get_tags_for, but rely on the DB constraint as the source of
truth and handle the duplicate-key error in the same scope where
database.setups.tag is invoked so concurrent inserts return
create_tag_exists_error instead of creating duplicates.

---

Nitpick comments:
In `@src/routers/openml/setups.py`:
- Line 37: The tag parameter currently includes conflicting metadata (Body(...,
embed=False) and SystemString64); remove the Body() metadata and consolidate to
use only SystemString64 for validation by changing the annotation for tag to use
Annotated[str, SystemString64] (or if embed=False is actually required, move
that constraint into a single explicit validator instead of leaving two metadata
entries) so the parameter in the function/signature that declares tag uses only
SystemString64 for constraint processing.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4451c7d and 8dfd48a.

📒 Files selected for processing (2)
  • src/database/setups.py
  • src/routers/openml/setups.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/database/setups.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

POST /setup/tag

1 participant