
[FEAT] Add POST /datasets/untag endpoint #263

Open
ritoban23 wants to merge 1 commit into openml:main from ritoban23:feature/issue-20-dataset-untag

Conversation

@ritoban23

Summary

Implements the POST /datasets/untag endpoint (issue #20), part of #6.

Changes

  • src/database/datasets.py — added untag() function to delete a row from dataset_tag
  • src/routers/openml/datasets.py — added untag_dataset endpoint and create_tag_not_found_error() helper (error code 474)
  • tests/routers/openml/dataset_tag_test.py — added tests covering: unauthenticated requests, successful untag, tag-not-found error, and invalid tag validation

Behaviour

  • Requires authentication (error 103 if missing)
  • Returns 474 if the tag is not present on the dataset
  • Returns {"data_untag": {"id": "<id>"}} on success

@coderabbitai

coderabbitai bot commented Feb 28, 2026

Walkthrough

This pull request introduces an untag feature for datasets. The changes include: a new database function untag() in src/database/datasets.py that removes tags from datasets using a DELETE query; a new POST /untag API endpoint in src/routers/openml/datasets.py with authentication requirements, tag existence validation, and error handling; and comprehensive test coverage in tests/routers/openml/dataset_tag_test.py for authorization checks, successful untagging across user roles, tag validation, and error scenarios.

🚥 Pre-merge checks: 2 passed, 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 7.69%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately summarizes the main change, adding a new POST /datasets/untag endpoint, which matches the changeset's database function, router endpoint, and corresponding tests.
  • Description check ✅ Passed: The description is well-structured and directly tied to the changeset, detailing the three modified files, behavioural expectations including error codes, and authentication requirements.



@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • Instead of fetching all tags and doing a case-insensitive membership check in Python before deleting, consider letting untag return the number of affected rows (via rowcount) and derive the tag-not-found error from that to avoid an extra query and potential race conditions.
  • The case-insensitive tag comparison in untag_dataset currently rebuilds a list on every request; you could simplify and make this more efficient by normalizing tags once to a set of casefold()ed values or by normalizing input on insert and comparing directly.
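The rowcount-based approach from the first bullet can be sketched as follows. This uses the standard library's sqlite3 with a case-insensitive column collation as a stand-in for the real expdb connection and the project's `untag` function; table layout and names are simplified for illustration.

```python
import sqlite3

# Stand-in for the expdb database. COLLATE NOCASE makes the DELETE
# itself case-insensitive, so the check and the delete cannot diverge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_tag (id INTEGER, tag TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO dataset_tag VALUES (1, 'Foo')")

def untag(connection: sqlite3.Connection, data_id: int, tag: str) -> bool:
    """Delete the tag row; return False when nothing matched.

    No pre-check SELECT: 'tag not found' (error 474) is derived from
    the number of deleted rows, avoiding the extra query and the race.
    """
    cur = connection.execute(
        "DELETE FROM dataset_tag WHERE id = ? AND tag = ?", (data_id, tag)
    )
    return cur.rowcount > 0
```

Here `untag(conn, 1, "foo")` deletes the stored `'Foo'` row and returns True; a second call returns False, which the endpoint would translate into the 474 error.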
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location path="src/routers/openml/datasets.py" line_range="64-67" />
<code_context>
+        raise create_authentication_failed_error()
+
+    tags = database.datasets.get_tags_for(data_id, expdb_db)
+    if tag.casefold() not in [t.casefold() for t in tags]:
+        raise create_tag_not_found_error(data_id, tag)
+
+    database.datasets.untag(data_id, tag, connection=expdb_db)
+    return {
+        "data_untag": {"id": str(data_id)},
</code_context>
<issue_to_address>
**issue (bug_risk):** Case-insensitive existence check but case-sensitive delete can lead to silent no-op.

The membership check uses `casefold()` but `untag` receives the raw `tag`. If the stored tag differs only by case (e.g. "Foo" vs "foo"), the check can succeed while the `DELETE` affects 0 rows (depending on DB collation), so the endpoint reports success without untagging. Normalize the tag consistently for both check and delete, or enforce a canonical casing in storage so their behavior matches.
</issue_to_address>
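The mismatch described in the comment can be reproduced in isolation. The snippet below is a sketch with made-up data, not the project's code: the case-insensitive check passes while an exact match fails, and resolving the input to the canonical stored spelling keeps check and delete consistent.

```python
stored_tags = ["Foo", "bar"]   # hypothetical tags already on the dataset
tag = "foo"                    # raw user input

# The case-insensitive membership check passes...
found = tag.casefold() in [t.casefold() for t in stored_tags]

# ...but the raw input matches no stored value exactly, so an
# exact-match DELETE would silently remove zero rows.
exact = tag in stored_tags

# Fix: resolve the input to the canonical stored spelling and pass
# that to the delete instead of the raw input.
canonical = next(
    (t for t in stored_tags if t.casefold() == tag.casefold()), None
)
```

With these inputs, `found` is True, `exact` is False, and `canonical` is `"Foo"`, the value that should reach the DELETE.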



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 63-69: Validation currently compares tag case-insensitively using
tags = database.datasets.get_tags_for(data_id, expdb_db) but then calls
database.datasets.untag(data_id, tag, connection=expdb_db) with the raw input,
which can no-op on case-sensitive DBs; change the flow to find the canonical
stored tag from tags (e.g., pick the element t from tags where t.casefold() ==
tag.casefold()) and pass that canonical value to database.datasets.untag; keep
the same create_tag_not_found_error path when no match is found and return the
same payload using the data_id.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72989df and 5ae0a22.

📒 Files selected for processing (3)
  • src/database/datasets.py
  • src/routers/openml/datasets.py
  • tests/routers/openml/dataset_tag_test.py

Comment on lines +63 to +69
tags = database.datasets.get_tags_for(data_id, expdb_db)
if tag.casefold() not in [t.casefold() for t in tags]:
raise create_tag_not_found_error(data_id, tag)

database.datasets.untag(data_id, tag, connection=expdb_db)
return {
"data_untag": {"id": str(data_id)},
⚠️ Potential issue | 🟠 Major

Use the canonical stored tag for delete to avoid false-success on mixed-case input.

Validation is case-insensitive, but deletion uses the raw input. On a case-sensitive collation, this can return success without removing anything.

💡 Suggested fix
-    tags = database.datasets.get_tags_for(data_id, expdb_db)
-    if tag.casefold() not in [t.casefold() for t in tags]:
+    tags = database.datasets.get_tags_for(data_id, expdb_db)
+    matching_tag = next((existing for existing in tags if existing.casefold() == tag.casefold()), None)
+    if matching_tag is None:
         raise create_tag_not_found_error(data_id, tag)
 
-    database.datasets.untag(data_id, tag, connection=expdb_db)
+    database.datasets.untag(data_id, matching_tag, connection=expdb_db)
