
chore: enable selective docstring linting (ruff)#261

Open
Toton642 wants to merge 26 commits into openml:main from Toton642:feat/enable-docstring-linting

Conversation

@Toton642

Summary

Enable docstring linting gradually by removing the blanket "D" ignore from the ruff configuration in pyproject.toml. This change keeps specific D-codes that are still noisy (e.g., D203, D204, D213), and preserves per-file ignores for tests.

Changes

  • Remove global "D" from tool.ruff.lint.ignore in pyproject.toml.
  • Keep specific D rules in the ignore list to avoid immediate noise.
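The resulting configuration might look roughly like the sketch below. Only D203, D204, and D213 are named in this summary, so the surrounding layout and the per-file ignore for tests are assumptions about the repository's pyproject.toml, not a copy of the actual diff:

```toml
[tool.ruff.lint]
ignore = [
    # The blanket "D" entry is gone; only the still-noisy docstring
    # rules remain individually ignored.
    "D203",  # incompatible with D211 (blank line before class docstring)
    "D204",  # one blank line required after class docstring
    "D213",  # incompatible with D212 (multi-line summary position)
]

[tool.ruff.lint.per-file-ignores]
"tests/*" = ["D"]  # tests keep the blanket docstring exemption
```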

Motivation

This aligns with issue #212: make docstrings more consistent and improve readability/maintenance. We enable docstring checks incrementally so maintainers can opt to fix violations progressively.

Checklist

  • Small, one-file change
  • Non-breaking (only changes linter behavior)
  • Ready for maintainer review

Related: openml/server-api issue #212.


coderabbitai bot commented Feb 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces an IntEnum-based error catalog with an RFC 9457-compatible exception framework: adds ProblemDetailError with per-instance status/code overrides, many domain-specific exception subclasses (e.g., DatasetNotFoundError, StudyNotFoundError, FlowNotFoundError, TaskNotFoundError, TagAlreadyExistsError, InternalError), and a FastAPI exception handler that emits application/problem+json; registers the handler in create_api. Updates routers and tests to raise/expect these exceptions instead of HTTPException/HTTPStatus payloads. Also adjusts linting in pyproject.toml to gradually enable docstring (D) rules while retaining a small set of ignored D-codes, and removes the old _format_error helper.

Possibly related issues

  • openml/server-api issue 212 — Both changes modify pyproject.toml Ruff configuration to stop globally ignoring D-class docstring rules and instead progressively enable docstring linting while retaining specific noisy D-codes.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 37.68%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the title accurately describes the main change: enabling selective docstring linting in the ruff configuration by removing the blanket "D" ignore while keeping specific noisy D-codes.
  • Description check — ✅ Passed: the description is well related to the changeset, explaining the motivation (issue #212), the specific changes to pyproject.toml, and the gradual approach to enabling docstring checks.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues and left some high-level feedback:

  • The new RFC 9457 error model is introduced cleanly, but many routers still raise bare HTTPException elsewhere in the codebase; consider auditing remaining HTTPException usages and migrating them to ProblemDetailError subclasses to keep error responses consistent in format and semantics.
  • Several error messages and legacy codes (e.g., the various 'Unknown dataset.'/no-results cases) are now repeated across routers when instantiating ProblemDetailError subclasses; consider adding small factory helpers or classmethods on the error types to centralize these messages and codes and avoid divergence over time.
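The second suggestion could take the form of classmethod factories like the hypothetical sketch below. `DatasetError`, its fields, and the legacy codes 111/372 echo values mentioned elsewhere in this review, but the class shape is illustrative, not the project's actual API:

```python
from http import HTTPStatus


class DatasetError(Exception):
    """Stand-in for a ProblemDetailError subclass, for this sketch only."""

    def __init__(self, detail: str, *, status: HTTPStatus, code: int) -> None:
        super().__init__(detail)
        self.detail = detail
        self.status = status
        self.code = code

    # Classmethod factories keep legacy messages and codes in one place,
    # so routers never repeat the literal strings and they cannot diverge.
    @classmethod
    def unknown(cls) -> "DatasetError":
        return cls("Unknown dataset.", status=HTTPStatus.NOT_FOUND, code=111)

    @classmethod
    def no_results(cls) -> "DatasetError":
        return cls("No results", status=HTTPStatus.NOT_FOUND, code=372)
```

A router would then write `raise DatasetError.unknown()` instead of re-typing the message and code at each call site.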
## Individual Comments

### Comment 1
<location path="src/database/users.py" line_range="14-13" />
<code_context>
 # If `allow_test_api_keys` is set, the key may also be one of `normaluser`,
 # `normaluser2`, or `abc` (admin).
 api_key_pattern = r"^[0-9a-fA-F]{32}$"
-if load_configuration()["development"].get("allow_test_api_keys"):
+if load_configuration().get("development", {}).get("allow_test_api_keys"):
     api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"

</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using a default `{}` here may unintentionally hide configuration issues

Previously, a missing "development" section would raise, exposing misconfiguration early. With `.get("development", {})`, a missing section now silently falls back to production-like behavior (no test API keys), which can be harder to diagnose.

If a missing "development" section should be treated as an error, consider keeping the strict indexing (`load_configuration()["development"]`) and only using `.get("allow_test_api_keys")` on that, or at least emit a log when the section is absent.
</issue_to_address>
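The "at least emit a log" variant of that suggestion might look like the following; `allow_test_api_keys` is a hypothetical helper written for illustration, not a function from the codebase:

```python
import logging

logger = logging.getLogger(__name__)


def allow_test_api_keys(config: dict) -> bool:
    """Decide whether test API keys are allowed (illustrative helper).

    Strict on the section, lenient on the key: a missing "development"
    section is logged loudly instead of silently falling back to
    production-like behavior.
    """
    development = config.get("development")
    if development is None:
        logger.warning(
            "Configuration has no 'development' section; "
            "test API keys are disabled.",
        )
        return False
    return bool(development.get("allow_test_api_keys"))
```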

### Comment 2
<location path="tests/routers/openml/task_type_test.py" line_range="41-46" />
<code_context>
 ) -> None:
-    assert response.status_code == HTTPStatus.PRECONDITION_FAILED
-    assert response.json()["detail"] == {"code": "372", "message": "No results"}
+    assert response.status_code == HTTPStatus.NOT_FOUND
+    assert response.headers["content-type"] == "application/problem+json"
+    error = response.json()
</code_context>
<issue_to_address>
**nitpick (testing):** Consider also asserting the task type error `status` field in the problem+json body

Since the problem+json body also includes a `status` field, it would be helpful to assert `error["status"] == HTTPStatus.NOT_FOUND` to keep the body shape consistent with RFC 9457 and in sync with the HTTP status code.
</issue_to_address>
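One way to keep such assertions consistent across tests is a small helper like the sketch below; the helper name and the exact members it checks are assumptions, not code from the test suite:

```python
from http import HTTPStatus


def assert_problem_json(body: dict, *, status: HTTPStatus, type_uri: str) -> None:
    """Assert the RFC 9457 members of a problem+json body stay consistent."""
    assert body["type"] == type_uri
    # The body's "status" member must mirror the HTTP status code.
    assert body["status"] == int(status)
    assert isinstance(body["detail"], str)
```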


Comment on lines 41 to 46
assert response.status_code == HTTPStatus.NOT_FOUND
assert response.headers["content-type"] == "application/problem+json"
error = response.json()
assert error["type"] == TaskTypeNotFoundError.uri
assert error["code"] == "241"
assert "Unknown task type" in error["detail"]

nitpick (testing): Consider also asserting the task type error status field in the problem+json body

Since the problem+json body also includes a status field, it would be helpful to assert error["status"] == HTTPStatus.NOT_FOUND to keep the body shape consistent with RFC 9457 and in sync with the HTTP status code.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (4)
pyproject.toml (1)

66-66: Clarify the DTZ comment.

The comment # to add is ambiguous. Consider clarifying what this means—is it a reminder to enable DTZ checks in the future, or something else? A more descriptive comment like # TODO: enable timezone checks would be clearer.

Proposed fix
-    "DTZ", # to add
+    "DTZ",  # TODO: enable timezone-aware datetime checks
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 66, The comment next to the "DTZ" entry is ambiguous;
update the comment to clearly state the intention for "DTZ" (e.g., whether it is
a reminder, a TODO to enable timezone checks, or documentation) so future
reviewers understand the purpose; locate the line containing the "DTZ" string
and replace `# to add` with a concise, descriptive comment like `# TODO: enable
timezone checks` or `# reminder: add DTZ validation` depending on the intended
action.
tests/routers/openml/datasets_list_datasets_test.py (1)

295-304: Reuse _assert_empty_result to avoid assertion drift.

This branch re-implements the same checks already centralized in _assert_empty_result(response). Reusing the helper will keep contracts synchronized in one place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/datasets_list_datasets_test.py` around lines 295 - 304,
Replace the duplicated error-format assertions inside the "if php_is_error or
py_is_error" branch with a call to the centralized helper: keep the parity check
assert php_is_error == py_is_error (using original.status_code and
response.status_code as currently written), then call
_assert_empty_result(response) instead of re-implementing content-type and JSON
error field checks (which are already validated by _assert_empty_result). This
ensures the branch uses the single source of truth for empty-result contract
validation.
src/routers/openml/datasets.py (1)

262-265: Docstring should list concrete raised exceptions.

Line 264 says this raises ProblemDetailError, but the function raises DatasetNotFoundError and DatasetNoAccessError. Listing concrete exceptions improves accuracy and doc clarity.

📝 Proposed docstring update
-    Raises ProblemDetailError if the dataset does not exist or the user can not access it.
+    Raises:
+        DatasetNotFoundError: If the dataset does not exist.
+        DatasetNoAccessError: If the user does not have access to the dataset.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/datasets.py` around lines 262 - 265, Update the function
docstring that currently says it raises ProblemDetailError to list the concrete
exceptions actually raised (DatasetNotFoundError and DatasetNoAccessError) so
callers know the exact error types; locate the fetch function in
src/routers/openml/datasets.py (the docstring block around the comment) and
replace or augment the "Raises" section to enumerate DatasetNotFoundError and
DatasetNoAccessError with brief descriptions, removing or clarifying the generic
ProblemDetailError reference if present.
src/core/errors.py (1)

63-66: Consider defaulting instance from the incoming request.

If exc.instance is unset, populating it from request.url.path (or full URL) improves traceability with no behavior break for existing exceptions.

🧭 Proposed enhancement
-    if exc.instance is not None:
-        content["instance"] = exc.instance
+    if exc.instance is not None:
+        content["instance"] = exc.instance
+    else:
+        content["instance"] = str(request.url.path)

Also applies to: 81-83

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/errors.py` around lines 63 - 66, If exc.instance is empty inside
problem_detail_exception_handler, populate it from the incoming request (e.g.,
request.url.path or request.url) before building the JSONResponse so the
ProblemDetail includes a traceable instance; apply the same change to the other
handler referenced around lines 81-83 (set exc.instance = exc.instance or
str(request.url.path) or similar) so existing behavior is preserved but
exceptions gain request-derived instance information.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Line 55: Remove the invalid duplicate line-length setting under the
[tool.ruff.lint] table: keep the existing line-length defined at the top-level
[tool.ruff] (already at line 52) and delete the `line-length = 100` entry inside
[tool.ruff.lint]; ensure no other duplicate top-level options are misplaced
under the lint subsection.

In `@src/core/errors.py`:
- Around line 79-80: The code is coercing the exception code to a string which
breaks clients expecting numeric codes; in the exception payload logic (around
exc and content["code"] in src/core/errors.py) stop calling str(...) and instead
assign the value directly (e.g., content["code"] = exc.code) only when exc.code
is not None so the payload preserves the original int or str type defined by the
model.

In `@src/routers/openml/datasets.py`:
- Around line 295-298: The user-facing error string assigned to msg in
src/routers/openml/datasets.py repeats “not processed yet”; update the msg
assignment to remove the duplicated wording and make it concise (e.g., "Dataset
not processed yet. Features are not available; please wait a few minutes.") so
the variable msg contains a single, clear sentence without repetition.
- Around line 56-58: The code checks for a missing authenticated user (variable
user) but raises AuthenticationFailedError; change this to raise
AuthenticationRequiredError to match other handlers (e.g., the use at line ~322)
so clients can consistently detect missing credentials; update the raise
expression from AuthenticationFailedError(msg) to
AuthenticationRequiredError(msg) wherever the intent is "no authenticated user"
rather than a failed authentication attempt.

In `@src/routers/openml/study.py`:
- Around line 106-109: The create flow currently does a check-then-act with
study.alias, calling database.studies.get_by_alias(...) then
database.studies.create(...), which allows a race where the insert can raise a
DB constraint error; wrap the call to database.studies.create(study, user,
expdb) in a try/except that catches the DB integrity/constraint exception (e.g.,
sqlalchemy.exc.IntegrityError or the project's DB error class) and on catching
it raise StudyAliasExistsError("Study alias already exists.") so concurrent
alias conflicts return the intended 409-style error; keep the initial pre-check
but add this exception translation around create().

In `@tests/routers/openml/migration/datasets_migration_test.py`:
- Around line 35-40: The test makes an unguarded call to original.json() when
new.status_code != HTTPStatus.OK which can raise JSON decode errors if the
legacy PHP API returns non-JSON (e.g., XML); modify the failure branch around
the HTTPStatus.OK check to first verify original.headers["content-type"]
indicates JSON (or attempt original.json() inside a try/except catching
JSONDecodeError) before asserting the presence of the "error" key, and if
parsing fails assert a meaningful fallback (e.g., compare content-types or mark
the response as non-JSON) so the assertion on "error" only runs when
original.json() succeeded.


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72989df and 9832365.

📒 Files selected for processing (21)
  • pyproject.toml
  • src/core/errors.py
  • src/core/formatting.py
  • src/database/users.py
  • src/main.py
  • src/routers/mldcat_ap/dataset.py
  • src/routers/openml/datasets.py
  • src/routers/openml/flows.py
  • src/routers/openml/qualities.py
  • src/routers/openml/study.py
  • src/routers/openml/tasks.py
  • src/routers/openml/tasktype.py
  • tests/routers/openml/dataset_tag_test.py
  • tests/routers/openml/datasets_list_datasets_test.py
  • tests/routers/openml/datasets_test.py
  • tests/routers/openml/flows_test.py
  • tests/routers/openml/migration/datasets_migration_test.py
  • tests/routers/openml/migration/flows_migration_test.py
  • tests/routers/openml/qualities_test.py
  • tests/routers/openml/study_test.py
  • tests/routers/openml/task_type_test.py
💤 Files with no reviewable changes (1)
  • src/core/formatting.py

Toton642 and others added 4 commits February 28, 2026 05:42
- (Comment 1) Use strict indexing for development section and add logging when missing
  This ensures configuration issues are surfaced early rather than silently defaulting
  to production behavior.

- (Comment 2) Add assertion for status field in problem+json test
  Verifies RFC 9457 compliance by asserting status field matches HTTP status code.
The line-length configuration should only be defined in [tool.ruff], not
duplicated in the [tool.ruff.lint] subsection. This fixes an invalid
configuration that was accidentally introduced when enabling docstring linting.

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/routers/openml/datasets_test.py (1)

192-196: Same type concern for code field; consider adding consistent assertions.

Similar to the other test, this asserts error["code"] == "274" (string) while DatasetProcessingError._default_code is 274 (integer). Verify this matches the actual serialized response.

Additionally, consider asserting on title and status fields for consistency with test_error_unknown_dataset:

♻️ Optional: Add title and status assertions for consistency
     assert response.headers["content-type"] == "application/problem+json"
     error = response.json()
     assert error["type"] == DatasetProcessingError.uri
+    assert error["title"] == "Dataset Processing Error"
+    assert error["status"] == HTTPStatus.PRECONDITION_FAILED
     assert error["code"] == "274"
     assert "No features found" in error["detail"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/datasets_test.py` around lines 192 - 196, The test
currently asserts error["code"] == "274" while
DatasetProcessingError._default_code is the integer 274; update the assertion to
match the actual serialized type (either assert error["code"] ==
DatasetProcessingError._default_code or assert error["code"] ==
str(DatasetProcessingError._default_code) depending on the API serializer) and
add consistency checks for the problem fields by asserting error["title"] ==
DatasetProcessingError.title and error["status"] ==
DatasetProcessingError.status; locate the failing assertions in the same test
(tests/routers/openml/datasets_test.py) and adjust the code comparison and add
the two new assertions for title and status.


📥 Commits

Reviewing files that changed from the base of the PR and between 9d865d8 and d893f56.

📒 Files selected for processing (7)
  • pyproject.toml
  • src/core/conversions.py
  • src/database/users.py
  • src/schemas/datasets/dcat.py
  • src/schemas/datasets/mldcat_ap.py
  • tests/routers/openml/datasets_test.py
  • tests/routers/openml/task_type_test.py
✅ Files skipped from review due to trivial changes (2)
  • src/core/conversions.py
  • src/schemas/datasets/mldcat_ap.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • tests/routers/openml/task_type_test.py

Toton642 and others added 9 commits February 28, 2026 06:07
Changed from AuthenticationFailedError to AuthenticationRequiredError when
user is None in the tag_dataset handler (line 58). This provides consistent
error semantics across handlers - AuthenticationRequiredError indicates
missing credentials, while AuthenticationFailedError indicates invalid
credentials. Matches the pattern used in update_dataset_status (line 322)
and other handlers.

Also removed the now-unused AuthenticationFailedError import.
Added try/except around database.studies.create() to catch IntegrityError
when a concurrent request creates a study with the same alias between the
check and insert operations. This ensures the API consistently returns
StudyAliasExistsError (409) instead of exposing database constraint errors.

The initial pre-check is kept for the common case, while the exception
handler catches the rare race condition scenario.
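That commit's pattern can be illustrated with a self-contained sketch. The `IntegrityError` class and the database callables here are placeholders for `sqlalchemy.exc.IntegrityError` and `database.studies`, which are not imported in this sketch:

```python
class IntegrityError(Exception):
    """Stand-in for sqlalchemy.exc.IntegrityError."""


class StudyAliasExistsError(Exception):
    """Stand-in for the 409-style ProblemDetailError subclass."""


def create_study(alias: str, *, get_by_alias, create) -> int:
    # The pre-check covers the common case with a friendly error...
    if get_by_alias(alias) is not None:
        raise StudyAliasExistsError("Study alias already exists.")
    try:
        return create(alias)
    except IntegrityError:
        # ...while the except covers the race where a concurrent request
        # inserted the same alias between the check and the insert.
        raise StudyAliasExistsError("Study alias already exists.") from None
```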
Added try/except around original.json() call in the error response branch
to catch JSONDecodeError when the legacy PHP API returns non-JSON content
(e.g., XML error responses). This prevents test failures when the PHP API
returns unexpected content types.

The test now:
- Attempts to parse JSON and assert the 'error' key if successful
- Silently passes if JSON parsing fails, relying on status code comparison
- Maintains the assertion that both APIs return errors in the same cases
Removed str() coercion of exc.code in problem_detail_exception_handler. The code field is defined as int|str|None and should preserve its original type in the JSON response payload.

This ensures numeric codes remain as integers and clients receive the expected data type.
Added module and function docstrings to:
- src/config.py: Configuration management module and functions
- src/core/__init__.py: Core utilities package
- src/core/access.py: Access control utilities
- src/core/conversions.py: Conversion utilities with improved formatting
- src/core/errors.py: ProblemDetailError.__init__ docstring
- src/core/formatting.py: Formatting utilities
- src/database/__init__.py: Database layer package

Improves code documentation and satisfies ruff's docstring linting (D100, D103, D104, D107 checks).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/core/errors.py (1)

82-89: Type annotation may be too narrow for code field.

The content dict is typed as dict[str, str | int], but exc.code can be int | str (Line 65). While this works at runtime, the annotation could be more precise.

💡 Suggested type adjustment
-    content: dict[str, str | int] = {
+    content: dict[str, str | int | None] = {

Or alternatively, since code is only added when not None:

-    content: dict[str, str | int] = {
+    content: dict[str, str | int] = {  # code field may be int or str
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/errors.py` around lines 82 - 89, The dict annotation for the
variable named content is too narrow for the optional exc.code value; update the
type to accept any possible code type (e.g., change content: dict[str, str |
int] to content: dict[str, Any] and import Any from typing) so that adding
content["code"] = exc.code (where exc.code is int | str) is correctly typed;
adjust the annotation near the content dict and ensure exc.code usage remains
gated by the existing if exc.code is not None check.


📥 Commits

Reviewing files that changed from the base of the PR and between d893f56 and 2518d9b.

📒 Files selected for processing (11)
  • enable-docstring-linting.patch
  • src/config.py
  • src/core/__init__.py
  • src/core/access.py
  • src/core/conversions.py
  • src/core/errors.py
  • src/core/formatting.py
  • src/database/__init__.py
  • src/routers/openml/datasets.py
  • src/routers/openml/study.py
  • tests/routers/openml/migration/datasets_migration_test.py
✅ Files skipped from review due to trivial changes (2)
  • src/database/__init__.py
  • src/core/__init__.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/routers/openml/study.py
  • src/core/conversions.py
  • src/core/formatting.py

Comment on lines +120 to +124
# RFC 9457: Python API now returns problem+json format
assert response.headers["content-type"] == "application/problem+json"
error = response.json()
assert error["type"] == DatasetNotFoundError.uri
assert error["code"] == "111"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's find DatasetNotFoundError definition
fd -t f ".*\.py$" | xargs rg -l "class DatasetNotFoundError"

Repository: openml/server-api

Length of output: 79


🏁 Script executed:

#!/bin/bash
# Find problem_detail_exception_handler
fd -t f ".*\.py$" | xargs rg -l "problem_detail_exception_handler"

Repository: openml/server-api

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Check the test file to see the context around the assertion
cat -n tests/routers/openml/migration/datasets_migration_test.py | sed -n '110,130p'

Repository: openml/server-api

Length of output: 827


🏁 Script executed:

#!/bin/bash
# Look at DatasetNotFoundError definition
cat -n src/core/errors.py | head -100

Repository: openml/server-api

Length of output: 3850


🏁 Script executed:

#!/bin/bash
# Search for DatasetNotFoundError specifically
rg -n "class DatasetNotFoundError" src/core/errors.py -A10

Repository: openml/server-api

Length of output: 475


🏁 Script executed:

#!/bin/bash
# Look at problem_detail_exception_handler implementation
rg -n "problem_detail_exception_handler" src/core/errors.py -A30

Repository: openml/server-api

Length of output: 1087


Fix assertion to expect integer code value.

The assertion on line 124 expects error["code"] == "111" (string), but DatasetNotFoundError._default_code = 111 is an int. Since problem_detail_exception_handler preserves the code type directly when building the response content (line 89), the JSON response will contain an integer, not a string. Change the assertion to assert error["code"] == 111.
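The type mismatch is easy to confirm with a plain JSON round trip, which is what the handler effectively does once the `str(...)` coercion is removed:

```python
import json

# The handler builds the payload with the original int, so the serialized
# body carries a JSON number, which loads back as a Python int.
body = json.dumps({"code": 111})
error = json.loads(body)

assert error["code"] == 111     # passes: JSON numbers round-trip as int
assert error["code"] != "111"   # the string comparison the old test made
```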

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/migration/datasets_migration_test.py` around lines 120 -
124, Update the test to expect an integer code instead of a string: change the
assertion that checks error["code"] from "111" to 111 so it matches
DatasetNotFoundError._default_code (111) and the behavior of
problem_detail_exception_handler which preserves the numeric type when
constructing the JSON response; locate the assertion in
tests/routers/openml/migration/datasets_migration_test.py around the block that
checks response.headers["content-type"] and error = response.json().

Resolved conflict in src/core/conversions.py by keeping correct docstring placement before imports.
- Fix D205 in tasks template docstring
- Fix D205 and D404 in schema docstrings
- Make pre-commit ruff hook pass

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/routers/openml/tasks.py (1)

37-50: ⚠️ Potential issue | 🟡 Minor

Fix fill_template docstring return type mismatch.

Line 49 says returns: str, but the function returns dict[str, JSON] (see Lines 85-95). This doc mismatch will mislead maintainers.

✏️ Proposed docstring correction
 def fill_template(
@@
 ) -> dict[str, JSON]:
     """Fill in the XML template used for task descriptions.
 
     Return the result converted to JSON.
@@
-    template, str:
-        A string represent XML, as detailed below.
+    template, str:
+        A string representing XML, as detailed below.
@@
-    returns: str
-        The template with values filled in.
+    returns: dict[str, JSON]
+        The template with values filled in and converted to JSON.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/tasks.py` around lines 37-50, the docstring for
fill_template incorrectly states "returns: str" while the function actually
returns a mapping converted to JSON (dict[str, JSON]). Update the return section
of the fill_template docstring to reflect the correct type (e.g., "returns:
dict[str, JSON]" or "returns: dict" with a short note that keys are input names
and values are JSON-serializable values) and adjust the brief description to say
the filled template is returned as a JSON-like dict rather than a string.
🧹 Nitpick comments (2)
src/schemas/datasets/mldcat_ap.py (1)

1-2: Avoid module-wide suppression for docstring punctuation checks.

Line 1 disables D400/D415 for the entire file. Since this can be fixed directly in the module docstring, it’s better to keep those checks active and limit ignores to what’s truly needed.

Proposed tweak
-# ruff: noqa: D101, D102, D400, D415
-"""Based on MLDCAT-AP 1.0.0: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/
+# ruff: noqa: D101, D102
+"""MLDCAT-AP 1.0.0 application profile.
+
+Reference: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/
 
 This is an application profile, aimed to extend the use of DCAT-AP,
 originally envisaged for the description of a machine learning process,
 developed in collaboration with OpenML.
 """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/schemas/datasets/mldcat_ap.py` around lines 1-2, remove the module-wide
suppression of D400 and D415 from the ruff noqa at the top of the file and
instead fix the module docstring so it satisfies punctuation rules: edit the
module-level docstring (the triple-quoted string starting the file) to include a
proper one-line summary ending with a period and ensure the following
description lines are properly separated and punctuated; keep other needed
suppressions (D101, D102) if necessary but delete D400 and D415 from the "#
ruff: noqa" directive.
src/core/conversions.py (1)

68-77: Nit: Docstring says "lists" but function operates on any Sequence.

The function uses isinstance(obj, Sequence) which includes tuples and other sequences, not just lists. Consider updating the docstring for accuracy.

📝 Suggested docstring clarification
 def nested_remove_single_element_list(obj: Any) -> Any:
-    """Recursively unwrap single-element lists in nested data structures."""
+    """Recursively unwrap single-element sequences in nested data structures."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/conversions.py` around lines 68-77, the docstring for
nested_remove_single_element_list is misleading: the function checks
isinstance(obj, Sequence) so it unwraps single-element sequences (not just
lists). Update the docstring on nested_remove_single_element_list to say it
recursively unwraps single-element sequences (e.g., lists, tuples, and other
Sequence types) in nested data structures, and optionally mention that strings
are treated as atomic to avoid being iterated.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2518d9b and 880fc25.

📒 Files selected for processing (6)
  • src/core/conversions.py
  • src/database/datasets.py
  • src/database/evaluations.py
  • src/routers/openml/tasks.py
  • src/schemas/datasets/dcat.py
  • src/schemas/datasets/mldcat_ap.py
✅ Files skipped from review due to trivial changes (1)
  • src/database/evaluations.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/schemas/datasets/dcat.py

