
chore: enable selective docstring linting (ruff)#261

Open
Toton642 wants to merge 26 commits into openml:main from Toton642:feat/enable-docstring-linting

Conversation

@Toton642

Summary

Enable docstring linting gradually by removing the blanket "D" ignore from the ruff configuration in pyproject.toml. This change keeps specific D-codes that are still noisy (e.g., D203, D204, D213), and preserves per-file ignores for tests.

Changes

  • Remove global "D" from tool.ruff.lint.ignore in pyproject.toml.
  • Keep specific D rules in the ignore list to avoid immediate noise.
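The resulting configuration might look roughly like the sketch below. Only D203, D204, and D213 are named in this summary, so the surrounding layout and the per-file ignore for tests are assumptions about the repository's pyproject.toml, not a copy of the actual diff:

```toml
[tool.ruff.lint]
ignore = [
    # The blanket "D" entry is gone; only the still-noisy docstring
    # rules remain individually ignored.
    "D203",  # incompatible with D211 (blank line before class docstring)
    "D204",  # one blank line required after class docstring
    "D213",  # incompatible with D212 (multi-line summary position)
]

[tool.ruff.lint.per-file-ignores]
"tests/*" = ["D"]  # tests keep the blanket docstring exemption
```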

Motivation

This aligns with issue #212: make docstrings more consistent and improve readability/maintenance. We enable docstring checks incrementally so maintainers can opt to fix violations progressively.

Checklist

  • Small, one-file change
  • Non-breaking (only changes linter behavior)
  • Ready for maintainer review

Related: openml/server-api issue #212.


coderabbitai bot commented Feb 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces an IntEnum-based error catalog with an RFC 9457-compatible exception framework: adds ProblemDetailError with per-instance status/code overrides, many domain-specific exception subclasses (e.g., DatasetNotFoundError, StudyNotFoundError, FlowNotFoundError, TaskNotFoundError, TagAlreadyExistsError, InternalError), and a FastAPI exception handler that emits application/problem+json; registers the handler in create_api. Updates routers and tests to raise/expect these exceptions instead of HTTPException/HTTPStatus payloads. Also adjusts linting in pyproject.toml to gradually enable docstring (D) rules while retaining a small set of ignored D-codes, and removes the old _format_error helper.

Possibly related issues

  • openml/server-api issue 212 — Both changes modify pyproject.toml Ruff configuration to stop globally ignoring D-class docstring rules and instead progressively enable docstring linting while retaining specific noisy D-codes.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 37.68%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the title accurately describes the main change: enabling selective docstring linting in the ruff configuration by removing the blanket "D" ignore while keeping specific noisy D-codes.
  • Description check — ✅ Passed: the description is well related to the changeset, explaining the motivation (issue #212), the specific changes to pyproject.toml, and the gradual approach to enabling docstring checks.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues and left some high-level feedback:

  • The new RFC 9457 error model is introduced cleanly, but many routers still raise bare HTTPException elsewhere in the codebase; consider auditing remaining HTTPException usages and migrating them to ProblemDetailError subclasses to keep error responses consistent in format and semantics.
  • Several error messages and legacy codes (e.g., the various 'Unknown dataset.'/no-results cases) are now repeated across routers when instantiating ProblemDetailError subclasses; consider adding small factory helpers or classmethods on the error types to centralize these messages and codes and avoid divergence over time.
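The second suggestion could take the form of classmethod factories like the hypothetical sketch below. `DatasetError`, its fields, and the legacy codes 111/372 echo values mentioned elsewhere in this review, but the class shape is illustrative, not the project's actual API:

```python
from http import HTTPStatus


class DatasetError(Exception):
    """Stand-in for a ProblemDetailError subclass, for this sketch only."""

    def __init__(self, detail: str, *, status: HTTPStatus, code: int) -> None:
        super().__init__(detail)
        self.detail = detail
        self.status = status
        self.code = code

    # Classmethod factories keep legacy messages and codes in one place,
    # so routers never repeat the literal strings and they cannot diverge.
    @classmethod
    def unknown(cls) -> "DatasetError":
        return cls("Unknown dataset.", status=HTTPStatus.NOT_FOUND, code=111)

    @classmethod
    def no_results(cls) -> "DatasetError":
        return cls("No results", status=HTTPStatus.NOT_FOUND, code=372)
```

A router would then write `raise DatasetError.unknown()` instead of re-typing the message and code at each call site.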
## Individual Comments

### Comment 1
<location path="src/database/users.py" line_range="14-13" />
<code_context>
 # If `allow_test_api_keys` is set, the key may also be one of `normaluser`,
 # `normaluser2`, or `abc` (admin).
 api_key_pattern = r"^[0-9a-fA-F]{32}$"
-if load_configuration()["development"].get("allow_test_api_keys"):
+if load_configuration().get("development", {}).get("allow_test_api_keys"):
     api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"

</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using a default `{}` here may unintentionally hide configuration issues

Previously, a missing "development" section would raise, exposing misconfiguration early. With `.get("development", {})`, a missing section now silently falls back to production-like behavior (no test API keys), which can be harder to diagnose.

If a missing "development" section should be treated as an error, consider keeping the strict indexing (`load_configuration()["development"]`) and only using `.get("allow_test_api_keys")` on that, or at least emit a log when the section is absent.
</issue_to_address>
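The "at least emit a log" variant of that suggestion might look like the following; `allow_test_api_keys` is a hypothetical helper written for illustration, not a function from the codebase:

```python
import logging

logger = logging.getLogger(__name__)


def allow_test_api_keys(config: dict) -> bool:
    """Decide whether test API keys are allowed (illustrative helper).

    Strict on the section, lenient on the key: a missing "development"
    section is logged loudly instead of silently falling back to
    production-like behavior.
    """
    development = config.get("development")
    if development is None:
        logger.warning(
            "Configuration has no 'development' section; "
            "test API keys are disabled.",
        )
        return False
    return bool(development.get("allow_test_api_keys"))
```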

### Comment 2
<location path="tests/routers/openml/task_type_test.py" line_range="41-46" />
<code_context>
 ) -> None:
-    assert response.status_code == HTTPStatus.PRECONDITION_FAILED
-    assert response.json()["detail"] == {"code": "372", "message": "No results"}
+    assert response.status_code == HTTPStatus.NOT_FOUND
+    assert response.headers["content-type"] == "application/problem+json"
+    error = response.json()
</code_context>
<issue_to_address>
**nitpick (testing):** Consider also asserting the task type error `status` field in the problem+json body

Since the problem+json body also includes a `status` field, it would be helpful to assert `error["status"] == HTTPStatus.NOT_FOUND` to keep the body shape consistent with RFC 9457 and in sync with the HTTP status code.
</issue_to_address>
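One way to keep such assertions consistent across tests is a small helper like the sketch below; the helper name and the exact members it checks are assumptions, not code from the test suite:

```python
from http import HTTPStatus


def assert_problem_json(body: dict, *, status: HTTPStatus, type_uri: str) -> None:
    """Assert the RFC 9457 members of a problem+json body stay consistent."""
    assert body["type"] == type_uri
    # The body's "status" member must mirror the HTTP status code.
    assert body["status"] == int(status)
    assert isinstance(body["detail"], str)
```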


Comment on lines 41 to 46
assert response.status_code == HTTPStatus.NOT_FOUND
assert response.headers["content-type"] == "application/problem+json"
error = response.json()
assert error["type"] == TaskTypeNotFoundError.uri
assert error["code"] == "241"
assert "Unknown task type" in error["detail"]

nitpick (testing): Consider also asserting the task type error status field in the problem+json body

Since the problem+json body also includes a status field, it would be helpful to assert error["status"] == HTTPStatus.NOT_FOUND to keep the body shape consistent with RFC 9457 and in sync with the HTTP status code.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (4)
pyproject.toml (1)

66-66: Clarify the DTZ comment.

The comment # to add is ambiguous. Consider clarifying what this means—is it a reminder to enable DTZ checks in the future, or something else? A more descriptive comment like # TODO: enable timezone checks would be clearer.

Proposed fix
-    "DTZ", # to add
+    "DTZ",  # TODO: enable timezone-aware datetime checks
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 66, The comment next to the "DTZ" entry is ambiguous;
update the comment to clearly state the intention for "DTZ" (e.g., whether it is
a reminder, a TODO to enable timezone checks, or documentation) so future
reviewers understand the purpose; locate the line containing the "DTZ" string
and replace `# to add` with a concise, descriptive comment like `# TODO: enable
timezone checks` or `# reminder: add DTZ validation` depending on the intended
action.
tests/routers/openml/datasets_list_datasets_test.py (1)

295-304: Reuse _assert_empty_result to avoid assertion drift.

This branch re-implements the same checks already centralized in _assert_empty_result(response). Reusing the helper will keep contracts synchronized in one place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/datasets_list_datasets_test.py` around lines 295 - 304,
Replace the duplicated error-format assertions inside the "if php_is_error or
py_is_error" branch with a call to the centralized helper: keep the parity check
assert php_is_error == py_is_error (using original.status_code and
response.status_code as currently written), then call
_assert_empty_result(response) instead of re-implementing content-type and JSON
error field checks (which are already validated by _assert_empty_result). This
ensures the branch uses the single source of truth for empty-result contract
validation.
src/routers/openml/datasets.py (1)

262-265: Docstring should list concrete raised exceptions.

Line 264 says this raises ProblemDetailError, but the function raises DatasetNotFoundError and DatasetNoAccessError. Listing concrete exceptions improves accuracy and doc clarity.

📝 Proposed docstring update
-    Raises ProblemDetailError if the dataset does not exist or the user can not access it.
+    Raises:
+        DatasetNotFoundError: If the dataset does not exist.
+        DatasetNoAccessError: If the user does not have access to the dataset.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/datasets.py` around lines 262 - 265, Update the function
docstring that currently says it raises ProblemDetailError to list the concrete
exceptions actually raised (DatasetNotFoundError and DatasetNoAccessError) so
callers know the exact error types; locate the fetch function in
src/routers/openml/datasets.py (the docstring block around the comment) and
replace or augment the "Raises" section to enumerate DatasetNotFoundError and
DatasetNoAccessError with brief descriptions, removing or clarifying the generic
ProblemDetailError reference if present.
src/core/errors.py (1)

63-66: Consider defaulting instance from the incoming request.

If exc.instance is unset, populating it from request.url.path (or full URL) improves traceability with no behavior break for existing exceptions.

🧭 Proposed enhancement
-    if exc.instance is not None:
-        content["instance"] = exc.instance
+    if exc.instance is not None:
+        content["instance"] = exc.instance
+    else:
+        content["instance"] = str(request.url.path)

Also applies to: 81-83

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/errors.py` around lines 63 - 66, If exc.instance is empty inside
problem_detail_exception_handler, populate it from the incoming request (e.g.,
request.url.path or request.url) before building the JSONResponse so the
ProblemDetail includes a traceable instance; apply the same change to the other
handler referenced around lines 81-83 (set exc.instance = exc.instance or
str(request.url.path) or similar) so existing behavior is preserved but
exceptions gain request-derived instance information.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Line 55: Remove the invalid duplicate line-length setting under the
[tool.ruff.lint] table: keep the existing line-length defined at the top-level
[tool.ruff] (already at line 52) and delete the `line-length = 100` entry inside
[tool.ruff.lint]; ensure no other duplicate top-level options are misplaced
under the lint subsection.

In `@src/core/errors.py`:
- Around line 79-80: The code is coercing the exception code to a string which
breaks clients expecting numeric codes; in the exception payload logic (around
exc and content["code"] in src/core/errors.py) stop calling str(...) and instead
assign the value directly (e.g., content["code"] = exc.code) only when exc.code
is not None so the payload preserves the original int or str type defined by the
model.

In `@src/routers/openml/datasets.py`:
- Around line 295-298: The user-facing error string assigned to msg in
src/routers/openml/datasets.py repeats “not processed yet”; update the msg
assignment to remove the duplicated wording and make it concise (e.g., "Dataset
not processed yet. Features are not available; please wait a few minutes.") so
the variable msg contains a single, clear sentence without repetition.
- Around line 56-58: The code checks for a missing authenticated user (variable
user) but raises AuthenticationFailedError; change this to raise
AuthenticationRequiredError to match other handlers (e.g., the use at line ~322)
so clients can consistently detect missing credentials; update the raise
expression from AuthenticationFailedError(msg) to
AuthenticationRequiredError(msg) wherever the intent is "no authenticated user"
rather than a failed authentication attempt.

In `@src/routers/openml/study.py`:
- Around line 106-109: The create flow currently does a check-then-act with
study.alias, calling database.studies.get_by_alias(...) then
database.studies.create(...), which allows a race where the insert can raise a
DB constraint error; wrap the call to database.studies.create(study, user,
expdb) in a try/except that catches the DB integrity/constraint exception (e.g.,
sqlalchemy.exc.IntegrityError or the project's DB error class) and on catching
it raise StudyAliasExistsError("Study alias already exists.") so concurrent
alias conflicts return the intended 409-style error; keep the initial pre-check
but add this exception translation around create().

In `@tests/routers/openml/migration/datasets_migration_test.py`:
- Around line 35-40: The test makes an unguarded call to original.json() when
new.status_code != HTTPStatus.OK which can raise JSON decode errors if the
legacy PHP API returns non-JSON (e.g., XML); modify the failure branch around
the HTTPStatus.OK check to first verify original.headers["content-type"]
indicates JSON (or attempt original.json() inside a try/except catching
JSONDecodeError) before asserting the presence of the "error" key, and if
parsing fails assert a meaningful fallback (e.g., compare content-types or mark
the response as non-JSON) so the assertion on "error" only runs when
original.json() succeeded.


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72989df and 9832365.

📒 Files selected for processing (21)
  • pyproject.toml
  • src/core/errors.py
  • src/core/formatting.py
  • src/database/users.py
  • src/main.py
  • src/routers/mldcat_ap/dataset.py
  • src/routers/openml/datasets.py
  • src/routers/openml/flows.py
  • src/routers/openml/qualities.py
  • src/routers/openml/study.py
  • src/routers/openml/tasks.py
  • src/routers/openml/tasktype.py
  • tests/routers/openml/dataset_tag_test.py
  • tests/routers/openml/datasets_list_datasets_test.py
  • tests/routers/openml/datasets_test.py
  • tests/routers/openml/flows_test.py
  • tests/routers/openml/migration/datasets_migration_test.py
  • tests/routers/openml/migration/flows_migration_test.py
  • tests/routers/openml/qualities_test.py
  • tests/routers/openml/study_test.py
  • tests/routers/openml/task_type_test.py
💤 Files with no reviewable changes (1)
  • src/core/formatting.py

Toton642 and others added 4 commits February 28, 2026 05:42
- (Comment 1) Use strict indexing for development section and add logging when missing
  This ensures configuration issues are surfaced early rather than silently defaulting
  to production behavior.

- (Comment 2) Add assertion for status field in problem+json test
  Verifies RFC 9457 compliance by asserting status field matches HTTP status code.
The line-length configuration should only be defined in [tool.ruff], not
duplicated in the [tool.ruff.lint] subsection. This fixes an invalid
configuration that was accidentally introduced when enabling docstring linting.

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/routers/openml/datasets_test.py (1)

192-196: Same type concern for code field; consider adding consistent assertions.

Similar to the other test, this asserts error["code"] == "274" (string) while DatasetProcessingError._default_code is 274 (integer). Verify this matches the actual serialized response.

Additionally, consider asserting on title and status fields for consistency with test_error_unknown_dataset:

♻️ Optional: Add title and status assertions for consistency
     assert response.headers["content-type"] == "application/problem+json"
     error = response.json()
     assert error["type"] == DatasetProcessingError.uri
+    assert error["title"] == "Dataset Processing Error"
+    assert error["status"] == HTTPStatus.PRECONDITION_FAILED
     assert error["code"] == "274"
     assert "No features found" in error["detail"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/datasets_test.py` around lines 192 - 196, The test
currently asserts error["code"] == "274" while
DatasetProcessingError._default_code is the integer 274; update the assertion to
match the actual serialized type (either assert error["code"] ==
DatasetProcessingError._default_code or assert error["code"] ==
str(DatasetProcessingError._default_code) depending on the API serializer) and
add consistency checks for the problem fields by asserting error["title"] ==
DatasetProcessingError.title and error["status"] ==
DatasetProcessingError.status; locate the failing assertions in the same test
(tests/routers/openml/datasets_test.py) and adjust the code comparison and add
the two new assertions for title and status.


📥 Commits

Reviewing files that changed from the base of the PR and between 9d865d8 and d893f56.

📒 Files selected for processing (7)
  • pyproject.toml
  • src/core/conversions.py
  • src/database/users.py
  • src/schemas/datasets/dcat.py
  • src/schemas/datasets/mldcat_ap.py
  • tests/routers/openml/datasets_test.py
  • tests/routers/openml/task_type_test.py
✅ Files skipped from review due to trivial changes (2)
  • src/core/conversions.py
  • src/schemas/datasets/mldcat_ap.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • tests/routers/openml/task_type_test.py

Toton642 and others added 9 commits February 28, 2026 06:07
Changed from AuthenticationFailedError to AuthenticationRequiredError when
user is None in the tag_dataset handler (line 58). This provides consistent
error semantics across handlers - AuthenticationRequiredError indicates
missing credentials, while AuthenticationFailedError indicates invalid
credentials. Matches the pattern used in update_dataset_status (line 322)
and other handlers.

Also removed the now-unused AuthenticationFailedError import.
Added try/except around database.studies.create() to catch IntegrityError
when a concurrent request creates a study with the same alias between the
check and insert operations. This ensures the API consistently returns
StudyAliasExistsError (409) instead of exposing database constraint errors.

The initial pre-check is kept for the common case, while the exception
handler catches the rare race condition scenario.
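That commit's pattern can be illustrated with a self-contained sketch. The `IntegrityError` class and the database callables here are placeholders for `sqlalchemy.exc.IntegrityError` and `database.studies`, which are not imported in this sketch:

```python
class IntegrityError(Exception):
    """Stand-in for sqlalchemy.exc.IntegrityError."""


class StudyAliasExistsError(Exception):
    """Stand-in for the 409-style ProblemDetailError subclass."""


def create_study(alias: str, *, get_by_alias, create) -> int:
    # The pre-check covers the common case with a friendly error...
    if get_by_alias(alias) is not None:
        raise StudyAliasExistsError("Study alias already exists.")
    try:
        return create(alias)
    except IntegrityError:
        # ...while the except covers the race where a concurrent request
        # inserted the same alias between the check and the insert.
        raise StudyAliasExistsError("Study alias already exists.") from None
```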
Added try/except around original.json() call in the error response branch
to catch JSONDecodeError when the legacy PHP API returns non-JSON content
(e.g., XML error responses). This prevents test failures when the PHP API
returns unexpected content types.

The test now:
- Attempts to parse JSON and assert the 'error' key if successful
- Silently passes if JSON parsing fails, relying on status code comparison
- Maintains the assertion that both APIs return errors in the same cases
Removed str() coercion of exc.code in problem_detail_exception_handler. The code field is defined as int|str|None and should preserve its original type in the JSON response payload.

This ensures numeric codes remain as integers and clients receive the expected data type.
Added module and function docstrings to:
- src/config.py: Configuration management module and functions
- src/core/__init__.py: Core utilities package
- src/core/access.py: Access control utilities
- src/core/conversions.py: Conversion utilities with improved formatting
- src/core/errors.py: ProblemDetailError.__init__ docstring
- src/core/formatting.py: Formatting utilities
- src/database/__init__.py: Database layer package

Improves code documentation and satisfies ruff's docstring linting (D100, D103, D104, D107 checks).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/core/errors.py (1)

82-89: Type annotation may be too narrow for code field.

The content dict is typed as dict[str, str | int], but exc.code can be int | str (Line 65). While this works at runtime, the annotation could be more precise.

💡 Suggested type adjustment
-    content: dict[str, str | int] = {
+    content: dict[str, str | int | None] = {

Or alternatively, since code is only added when not None:

-    content: dict[str, str | int] = {
+    content: dict[str, str | int] = {  # code field may be int or str
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/errors.py` around lines 82 - 89, The dict annotation for the
variable named content is too narrow for the optional exc.code value; update the
type to accept any possible code type (e.g., change content: dict[str, str |
int] to content: dict[str, Any] and import Any from typing) so that adding
content["code"] = exc.code (where exc.code is int | str) is correctly typed;
adjust the annotation near the content dict and ensure exc.code usage remains
gated by the existing if exc.code is not None check.


📥 Commits

Reviewing files that changed from the base of the PR and between d893f56 and 2518d9b.

📒 Files selected for processing (11)
  • enable-docstring-linting.patch
  • src/config.py
  • src/core/__init__.py
  • src/core/access.py
  • src/core/conversions.py
  • src/core/errors.py
  • src/core/formatting.py
  • src/database/__init__.py
  • src/routers/openml/datasets.py
  • src/routers/openml/study.py
  • tests/routers/openml/migration/datasets_migration_test.py
✅ Files skipped from review due to trivial changes (2)
  • src/database/__init__.py
  • src/core/__init__.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/routers/openml/study.py
  • src/core/conversions.py
  • src/core/formatting.py

Comment on lines +120 to +124
# RFC 9457: Python API now returns problem+json format
assert response.headers["content-type"] == "application/problem+json"
error = response.json()
assert error["type"] == DatasetNotFoundError.uri
assert error["code"] == "111"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's find DatasetNotFoundError definition
fd -t f ".*\.py$" | xargs rg -l "class DatasetNotFoundError"

Repository: openml/server-api

Length of output: 79


🏁 Script executed:

#!/bin/bash
# Find problem_detail_exception_handler
fd -t f ".*\.py$" | xargs rg -l "problem_detail_exception_handler"

Repository: openml/server-api

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Check the test file to see the context around the assertion
cat -n tests/routers/openml/migration/datasets_migration_test.py | sed -n '110,130p'

Repository: openml/server-api

Length of output: 827


🏁 Script executed:

#!/bin/bash
# Look at DatasetNotFoundError definition
cat -n src/core/errors.py | head -100

Repository: openml/server-api

Length of output: 3850


🏁 Script executed:

#!/bin/bash
# Search for DatasetNotFoundError specifically
rg -n "class DatasetNotFoundError" src/core/errors.py -A10

Repository: openml/server-api

Length of output: 475


🏁 Script executed:

#!/bin/bash
# Look at problem_detail_exception_handler implementation
rg -n "problem_detail_exception_handler" src/core/errors.py -A30

Repository: openml/server-api

Length of output: 1087


Fix assertion to expect integer code value.

The assertion on line 124 expects error["code"] == "111" (string), but DatasetNotFoundError._default_code = 111 is an int. Since problem_detail_exception_handler preserves the code type directly when building the response content (line 89), the JSON response will contain an integer, not a string. Change the assertion to assert error["code"] == 111.
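The type mismatch is easy to confirm with a plain JSON round trip, which is what the handler effectively does once the `str(...)` coercion is removed:

```python
import json

# The handler builds the payload with the original int, so the serialized
# body carries a JSON number, which loads back as a Python int.
body = json.dumps({"code": 111})
error = json.loads(body)

assert error["code"] == 111     # passes: JSON numbers round-trip as int
assert error["code"] != "111"   # the string comparison the old test made
```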

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/migration/datasets_migration_test.py` around lines 120 -
124, Update the test to expect an integer code instead of a string: change the
assertion that checks error["code"] from "111" to 111 so it matches
DatasetNotFoundError._default_code (111) and the behavior of
problem_detail_exception_handler which preserves the numeric type when
constructing the JSON response; locate the assertion in
tests/routers/openml/migration/datasets_migration_test.py around the block that
checks response.headers["content-type"] and error = response.json().

Resolved conflict in src/core/conversions.py by keeping correct docstring placement before imports.
- Fix D205 in tasks template docstring
- Fix D205 and D404 in schema docstrings
- Make pre-commit ruff hook pass

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/routers/openml/tasks.py (1)

37-50: ⚠️ Potential issue | 🟡 Minor

Fix fill_template docstring return type mismatch.

Line 49 says returns: str, but the function returns dict[str, JSON] (see Lines 85-95). This doc mismatch will mislead maintainers.

✏️ Proposed docstring correction
 def fill_template(
@@
 ) -> dict[str, JSON]:
     """Fill in the XML template used for task descriptions.
 
     Return the result converted to JSON.
@@
-    template, str:
-        A string represent XML, as detailed below.
+    template, str:
+        A string representing XML, as detailed below.
@@
-    returns: str
-        The template with values filled in.
+    returns: dict[str, JSON]
+        The template with values filled in and converted to JSON.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/tasks.py` around lines 37-50, the docstring for
fill_template incorrectly states "returns: str" while the function actually
returns a mapping converted to JSON (dict[str, JSON]). Update the return section
of the fill_template docstring to reflect the correct type (e.g., "returns:
dict[str, JSON]" or "returns: dict" with a short note that keys are input names
and values are JSON-serializable values) and adjust the brief description to say
the filled template is returned as a JSON-like dict rather than a string.
🧹 Nitpick comments (2)
src/schemas/datasets/mldcat_ap.py (1)

1-2: Avoid module-wide suppression for docstring punctuation checks.

Line 1 disables D400/D415 for the entire file. Since this can be fixed directly in the module docstring, it’s better to keep those checks active and limit ignores to what’s truly needed.

Proposed tweak
-# ruff: noqa: D101, D102, D400, D415
-"""Based on MLDCAT-AP 1.0.0: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/
+# ruff: noqa: D101, D102
+"""MLDCAT-AP 1.0.0 application profile.
+
+Reference: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/
 
 This is an application profile, aimed to extend the use of DCAT-AP,
 originally envisaged for the description of a machine learning process,
 developed in collaboration with OpenML.
 """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/schemas/datasets/mldcat_ap.py` around lines 1-2, remove the module-wide
suppression of D400 and D415 from the ruff noqa at the top of the file and
instead fix the module docstring so it satisfies punctuation rules: edit the
module-level docstring (the triple-quoted string starting the file) to include a
proper one-line summary ending with a period and ensure the following
description lines are properly separated and punctuated; keep other needed
suppressions (D101, D102) if necessary but delete D400 and D415 from the "#
ruff: noqa" directive.
src/core/conversions.py (1)

68-77: Nit: Docstring says "lists" but function operates on any Sequence.

The function uses isinstance(obj, Sequence) which includes tuples and other sequences, not just lists. Consider updating the docstring for accuracy.

📝 Suggested docstring clarification
 def nested_remove_single_element_list(obj: Any) -> Any:
-    """Recursively unwrap single-element lists in nested data structures."""
+    """Recursively unwrap single-element sequences in nested data structures."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/conversions.py` around lines 68-77, the docstring for
nested_remove_single_element_list is misleading: the function checks
isinstance(obj, Sequence) so it unwraps single-element sequences (not just
lists). Update the docstring on nested_remove_single_element_list to say it
recursively unwraps single-element sequences (e.g., lists, tuples, and other
Sequence types) in nested data structures, and optionally mention that strings
are treated as atomic to avoid being iterated.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2518d9b and 880fc25.

📒 Files selected for processing (6)
  • src/core/conversions.py
  • src/database/datasets.py
  • src/database/evaluations.py
  • src/routers/openml/tasks.py
  • src/schemas/datasets/dcat.py
  • src/schemas/datasets/mldcat_ap.py
✅ Files skipped from review due to trivial changes (1)
  • src/database/evaluations.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/schemas/datasets/dcat.py

