chore: enable selective docstring linting (ruff) #261
Toton642 wants to merge 26 commits into openml:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough
Replaces an IntEnum-based error catalog with an RFC 9457-compatible exception framework: adds ProblemDetailError with per-instance status/code overrides, many domain-specific exception subclasses (e.g., DatasetNotFoundError, StudyNotFoundError, FlowNotFoundError, TaskNotFoundError, TagAlreadyExistsError, InternalError), and a FastAPI exception handler that emits application/problem+json; registers the handler in create_api. Updates routers and tests to raise/expect these exceptions instead of HTTPException/HTTPStatus payloads. Also adjusts linting in pyproject.toml to gradually enable docstring (D) rules while retaining a small set of ignored D-codes, and removes the old _format_error helper.
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning)
Hey - I've found 2 issues and left some high-level feedback:
- The new RFC 9457 error model is introduced cleanly, but many routers still raise bare `HTTPException` elsewhere in the codebase; consider auditing remaining `HTTPException` usages and migrating them to `ProblemDetailError` subclasses to keep error responses consistent in format and semantics.
- Several error messages and legacy codes (e.g., the various 'Unknown dataset.'/no-results cases) are now repeated across routers when instantiating `ProblemDetailError` subclasses; consider adding small factory helpers or classmethods on the error types to centralize these messages and codes and avoid divergence over time.
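The factory suggestion above could be sketched as follows; the `for_id` classmethod and the exact message wording are hypothetical, chosen only to illustrate centralizing the legacy text:

```python
class DatasetNotFoundError(Exception):
    """Sketch: a factory that owns a repeated error message in one place."""

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.detail = detail

    @classmethod
    def for_id(cls, dataset_id: int) -> "DatasetNotFoundError":
        # One place owns the legacy wording, so routers cannot diverge.
        return cls(f"Unknown dataset. Dataset with id {dataset_id} not found.")
```

Routers would then raise `DatasetNotFoundError.for_id(dataset_id)` instead of repeating the string.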
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new RFC 9457 error model is introduced cleanly, but many routers still raise bare `HTTPException` elsewhere in the codebase; consider auditing remaining `HTTPException` usages and migrating them to `ProblemDetailError` subclasses to keep error responses consistent in format and semantics.
- Several error messages and legacy codes (e.g., the various 'Unknown dataset.'/no-results cases) are now repeated across routers when instantiating `ProblemDetailError` subclasses; consider adding small factory helpers or classmethods on the error types to centralize these messages and codes and avoid divergence over time.
## Individual Comments
### Comment 1
<location path="src/database/users.py" line_range="14-13" />
<code_context>
# If `allow_test_api_keys` is set, the key may also be one of `normaluser`,
# `normaluser2`, or `abc` (admin).
api_key_pattern = r"^[0-9a-fA-F]{32}$"
-if load_configuration()["development"].get("allow_test_api_keys"):
+if load_configuration().get("development", {}).get("allow_test_api_keys"):
api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using a default `{}` here may unintentionally hide configuration issues
Previously, a missing "development" section would raise, exposing misconfiguration early. With `.get("development", {})`, a missing section now silently falls back to production-like behavior (no test API keys), which can be harder to diagnose.
If a missing "development" section should be treated as an error, consider keeping the strict indexing (`load_configuration()["development"]`) and only using `.get("allow_test_api_keys")` on that, or at least emit a log when the section is absent.
</issue_to_address>
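The suggestion can be sketched as a small helper; `allow_test_api_keys` is a hypothetical name, and `config` stands in for the dict returned by `load_configuration()`:

```python
import logging

logger = logging.getLogger(__name__)


def allow_test_api_keys(config: dict) -> bool:
    """Return whether test API keys are enabled, surfacing a missing section."""
    development = config.get("development")
    if development is None:
        # Log instead of silently behaving like production.
        logger.warning(
            "Configuration has no 'development' section; test API keys disabled.",
        )
        return False
    return bool(development.get("allow_test_api_keys"))
```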
### Comment 2
<location path="tests/routers/openml/task_type_test.py" line_range="41-46" />
<code_context>
) -> None:
- assert response.status_code == HTTPStatus.PRECONDITION_FAILED
- assert response.json()["detail"] == {"code": "372", "message": "No results"}
+ assert response.status_code == HTTPStatus.NOT_FOUND
+ assert response.headers["content-type"] == "application/problem+json"
+ error = response.json()
</code_context>
<issue_to_address>
**nitpick (testing):** Consider also asserting the task type error `status` field in the problem+json body
Since the problem+json body also includes a `status` field, it would be helpful to assert `error["status"] == HTTPStatus.NOT_FOUND` to keep the body shape consistent with RFC 9457 and in sync with the HTTP status code.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
assert response.status_code == HTTPStatus.NOT_FOUND
assert response.headers["content-type"] == "application/problem+json"
error = response.json()
assert error["type"] == TaskTypeNotFoundError.uri
assert error["code"] == "241"
assert "Unknown task type" in error["detail"]
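A small helper along these lines would keep the suggested `status` assertion consistent across tests; `assert_problem_json` and its parameters are hypothetical, not part of the PR:

```python
from http import HTTPStatus


def assert_problem_json(body: dict, *, status: HTTPStatus, type_uri: str, code) -> None:
    """Assert the RFC 9457 body shape, including the in-body `status` member."""
    assert body["type"] == type_uri
    assert body["code"] == code
    # Keep the body's `status` member in sync with the HTTP status code.
    assert body["status"] == int(status)
```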
Actionable comments posted: 6
🧹 Nitpick comments (4)
pyproject.toml (1)
66-66: Clarify the DTZ comment.
The comment `# to add` is ambiguous. Consider clarifying what this means: is it a reminder to enable DTZ checks in the future, or something else? A more descriptive comment like `# TODO: enable timezone checks` would be clearer.
Proposed fix:
- "DTZ", # to add
+ "DTZ", # TODO: enable timezone-aware datetime checks
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 66, The comment next to the "DTZ" entry is ambiguous; update the comment to clearly state the intention for "DTZ" (e.g., whether it is a reminder, a TODO to enable timezone checks, or documentation) so future reviewers understand the purpose; locate the line containing the "DTZ" string and replace `# to add` with a concise, descriptive comment like `# TODO: enable timezone checks` or `# reminder: add DTZ validation` depending on the intended action.
tests/routers/openml/datasets_list_datasets_test.py (1)
295-304: Reuse `_assert_empty_result` to avoid assertion drift.
This branch re-implements the same checks already centralized in `_assert_empty_result(response)`. Reusing the helper will keep contracts synchronized in one place.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/datasets_list_datasets_test.py` around lines 295 - 304, Replace the duplicated error-format assertions inside the "if php_is_error or py_is_error" branch with a call to the centralized helper: keep the parity check assert php_is_error == py_is_error (using original.status_code and response.status_code as currently written), then call _assert_empty_result(response) instead of re-implementing content-type and JSON error field checks (which are already validated by _assert_empty_result). This ensures the branch uses the single source of truth for empty-result contract validation.
src/routers/openml/datasets.py (1)
262-265: Docstring should list concrete raised exceptions.
Line 264 says this raises `ProblemDetailError`, but the function raises `DatasetNotFoundError` and `DatasetNoAccessError`. Listing concrete exceptions improves accuracy and doc clarity.
📝 Proposed docstring update
- Raises ProblemDetailError if the dataset does not exist or the user can not access it.
+ Raises:
+     DatasetNotFoundError: If the dataset does not exist.
+     DatasetNoAccessError: If the user does not have access to the dataset.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/datasets.py` around lines 262 - 265, Update the function docstring that currently says it raises ProblemDetailError to list the concrete exceptions actually raised (DatasetNotFoundError and DatasetNoAccessError) so callers know the exact error types; locate the fetch function in src/routers/openml/datasets.py (the docstring block around the comment) and replace or augment the "Raises" section to enumerate DatasetNotFoundError and DatasetNoAccessError with brief descriptions, removing or clarifying the generic ProblemDetailError reference if present.
src/core/errors.py (1)
63-66: Consider defaulting `instance` from the incoming request.
If `exc.instance` is unset, populating it from `request.url.path` (or the full URL) improves traceability with no behavior break for existing exceptions.
🧭 Proposed enhancement
- if exc.instance is not None:
-     content["instance"] = exc.instance
+ if exc.instance is not None:
+     content["instance"] = exc.instance
+ else:
+     content["instance"] = str(request.url.path)
Also applies to: 81-83
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/errors.py` around lines 63 - 66, If exc.instance is empty inside problem_detail_exception_handler, populate it from the incoming request (e.g., request.url.path or request.url) before building the JSONResponse so the ProblemDetail includes a traceable instance; apply the same change to the other handler referenced around lines 81-83 (set exc.instance = exc.instance or str(request.url.path) or similar) so existing behavior is preserved but exceptions gain request-derived instance information.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 55: Remove the invalid duplicate line-length setting under the
[tool.ruff.lint] table: keep the existing line-length defined at the top-level
[tool.ruff] (already at line 52) and delete the `line-length = 100` entry inside
[tool.ruff.lint]; ensure no other duplicate top-level options are misplaced
under the lint subsection.
In `@src/core/errors.py`:
- Around line 79-80: The code is coercing the exception code to a string which
breaks clients expecting numeric codes; in the exception payload logic (around
exc and content["code"] in src/core/errors.py) stop calling str(...) and instead
assign the value directly (e.g., content["code"] = exc.code) only when exc.code
is not None so the payload preserves the original int or str type defined by the
model.
In `@src/routers/openml/datasets.py`:
- Around line 295-298: The user-facing error string assigned to msg in
src/routers/openml/datasets.py repeats “not processed yet”; update the msg
assignment to remove the duplicated wording and make it concise (e.g., "Dataset
not processed yet. Features are not available; please wait a few minutes.") so
the variable msg contains a single, clear sentence without repetition.
- Around line 56-58: The code checks for a missing authenticated user (variable
user) but raises AuthenticationFailedError; change this to raise
AuthenticationRequiredError to match other handlers (e.g., the use at line ~322)
so clients can consistently detect missing credentials; update the raise
expression from AuthenticationFailedError(msg) to
AuthenticationRequiredError(msg) wherever the intent is "no authenticated user"
rather than a failed authentication attempt.
In `@src/routers/openml/study.py`:
- Around line 106-109: The create flow currently does a check-then-act with
study.alias, calling database.studies.get_by_alias(...) then
database.studies.create(...), which allows a race where the insert can raise a
DB constraint error; wrap the call to database.studies.create(study, user,
expdb) in a try/except that catches the DB integrity/constraint exception (e.g.,
sqlalchemy.exc.IntegrityError or the project's DB error class) and on catching
it raise StudyAliasExistsError("Study alias already exists.") so concurrent
alias conflicts return the intended 409-style error; keep the initial pre-check
but add this exception translation around create().
In `@tests/routers/openml/migration/datasets_migration_test.py`:
- Around line 35-40: The test makes an unguarded call to original.json() when
new.status_code != HTTPStatus.OK which can raise JSON decode errors if the
legacy PHP API returns non-JSON (e.g., XML); modify the failure branch around
the HTTPStatus.OK check to first verify original.headers["content-type"]
indicates JSON (or attempt original.json() inside a try/except catching
JSONDecodeError) before asserting the presence of the "error" key, and if
parsing fails assert a meaningful fallback (e.g., compare content-types or mark
the response as non-JSON) so the assertion on "error" only runs when
original.json() succeeded.
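The guarded parse this prompt asks for can be sketched as a small helper; `parse_error_body` is hypothetical, and a real test would inspect the httpx response's headers and call `response.json()` inside the guard instead:

```python
import json


def parse_error_body(text: str, content_type: str):
    """Return the parsed JSON error body, or None for non-JSON responses."""
    if "json" not in content_type:
        return None  # e.g. the legacy PHP API answered with XML
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

The caller then asserts on the `"error"` key only when parsing succeeded, falling back to the status-code comparison otherwise.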
---
Nitpick comments:
In `@pyproject.toml`:
- Line 66: The comment next to the "DTZ" entry is ambiguous; update the comment
to clearly state the intention for "DTZ" (e.g., whether it is a reminder, a TODO
to enable timezone checks, or documentation) so future reviewers understand the
purpose; locate the line containing the "DTZ" string and replace `# to add` with
a concise, descriptive comment like `# TODO: enable timezone checks` or `#
reminder: add DTZ validation` depending on the intended action.
In `@src/core/errors.py`:
- Around line 63-66: If exc.instance is empty inside
problem_detail_exception_handler, populate it from the incoming request (e.g.,
request.url.path or request.url) before building the JSONResponse so the
ProblemDetail includes a traceable instance; apply the same change to the other
handler referenced around lines 81-83 (set exc.instance = exc.instance or
str(request.url.path) or similar) so existing behavior is preserved but
exceptions gain request-derived instance information.
In `@src/routers/openml/datasets.py`:
- Around line 262-265: Update the function docstring that currently says it
raises ProblemDetailError to list the concrete exceptions actually raised
(DatasetNotFoundError and DatasetNoAccessError) so callers know the exact error
types; locate the fetch function in src/routers/openml/datasets.py (the
docstring block around the comment) and replace or augment the "Raises" section
to enumerate DatasetNotFoundError and DatasetNoAccessError with brief
descriptions, removing or clarifying the generic ProblemDetailError reference if
present.
In `@tests/routers/openml/datasets_list_datasets_test.py`:
- Around line 295-304: Replace the duplicated error-format assertions inside the
"if php_is_error or py_is_error" branch with a call to the centralized helper:
keep the parity check assert php_is_error == py_is_error (using
original.status_code and response.status_code as currently written), then call
_assert_empty_result(response) instead of re-implementing content-type and JSON
error field checks (which are already validated by _assert_empty_result). This
ensures the branch uses the single source of truth for empty-result contract
validation.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
pyproject.toml, src/core/errors.py, src/core/formatting.py, src/database/users.py, src/main.py, src/routers/mldcat_ap/dataset.py, src/routers/openml/datasets.py, src/routers/openml/flows.py, src/routers/openml/qualities.py, src/routers/openml/study.py, src/routers/openml/tasks.py, src/routers/openml/tasktype.py, tests/routers/openml/dataset_tag_test.py, tests/routers/openml/datasets_list_datasets_test.py, tests/routers/openml/datasets_test.py, tests/routers/openml/flows_test.py, tests/routers/openml/migration/datasets_migration_test.py, tests/routers/openml/migration/flows_migration_test.py, tests/routers/openml/qualities_test.py, tests/routers/openml/study_test.py, tests/routers/openml/task_type_test.py
💤 Files with no reviewable changes (1)
- src/core/formatting.py
- (Comment 1) Use strict indexing for the development section and add logging when it is missing. This ensures configuration issues are surfaced early rather than silently defaulting to production behavior.
- (Comment 2) Add an assertion for the status field in the problem+json test. This verifies RFC 9457 compliance by asserting that the status field matches the HTTP status code.
The line-length configuration should only be defined in [tool.ruff], not duplicated in the [tool.ruff.lint] subsection. This fixes an invalid configuration that was accidentally introduced when enabling docstring linting.
🧹 Nitpick comments (1)
tests/routers/openml/datasets_test.py (1)
192-196: Same type concern for the `code` field; consider adding consistent assertions.
Similar to the other test, this asserts `error["code"] == "274"` (string) while `DatasetProcessingError._default_code` is `274` (integer). Verify this matches the actual serialized response.
Additionally, consider asserting on the `title` and `status` fields for consistency with `test_error_unknown_dataset`:
♻️ Optional: Add title and status assertions for consistency
  assert response.headers["content-type"] == "application/problem+json"
  error = response.json()
  assert error["type"] == DatasetProcessingError.uri
+ assert error["title"] == "Dataset Processing Error"
+ assert error["status"] == HTTPStatus.PRECONDITION_FAILED
  assert error["code"] == "274"
  assert "No features found" in error["detail"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/datasets_test.py` around lines 192 - 196, The test currently asserts error["code"] == "274" while DatasetProcessingError._default_code is the integer 274; update the assertion to match the actual serialized type (either assert error["code"] == DatasetProcessingError._default_code or assert error["code"] == str(DatasetProcessingError._default_code) depending on the API serializer) and add consistency checks for the problem fields by asserting error["title"] == DatasetProcessingError.title and error["status"] == DatasetProcessingError.status; locate the failing assertions in the same test (tests/routers/openml/datasets_test.py) and adjust the code comparison and add the two new assertions for title and status.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/routers/openml/datasets_test.py`:
- Around line 192-196: The test currently asserts error["code"] == "274" while
DatasetProcessingError._default_code is the integer 274; update the assertion to
match the actual serialized type (either assert error["code"] ==
DatasetProcessingError._default_code or assert error["code"] ==
str(DatasetProcessingError._default_code) depending on the API serializer) and
add consistency checks for the problem fields by asserting error["title"] ==
DatasetProcessingError.title and error["status"] ==
DatasetProcessingError.status; locate the failing assertions in the same test
(tests/routers/openml/datasets_test.py) and adjust the code comparison and add
the two new assertions for title and status.
📒 Files selected for processing (7)
pyproject.toml, src/core/conversions.py, src/database/users.py, src/schemas/datasets/dcat.py, src/schemas/datasets/mldcat_ap.py, tests/routers/openml/datasets_test.py, tests/routers/openml/task_type_test.py
✅ Files skipped from review due to trivial changes (2)
- src/core/conversions.py
- src/schemas/datasets/mldcat_ap.py
🚧 Files skipped from review as they are similar to previous changes (2)
- pyproject.toml
- tests/routers/openml/task_type_test.py
Changed from AuthenticationFailedError to AuthenticationRequiredError when user is None in the tag_dataset handler (line 58). This provides consistent error semantics across handlers - AuthenticationRequiredError indicates missing credentials, while AuthenticationFailedError indicates invalid credentials. Matches the pattern used in update_dataset_status (line 322) and other handlers. Also removed the now-unused AuthenticationFailedError import.
…nto feat/enable-docstring-linting
Added try/except around database.studies.create() to catch IntegrityError when a concurrent request creates a study with the same alias between the check and insert operations. This ensures the API consistently returns StudyAliasExistsError (409) instead of exposing database constraint errors. The initial pre-check is kept for the common case, while the exception handler catches the rare race condition scenario.
Added try/except around original.json() call in the error response branch to catch JSONDecodeError when the legacy PHP API returns non-JSON content (e.g., XML error responses). This prevents test failures when the PHP API returns unexpected content types. The test now: - Attempts to parse JSON and assert the 'error' key if successful - Silently passes if JSON parsing fails, relying on status code comparison - Maintains the assertion that both APIs return errors in the same cases
Removed str() coercion of exc.code in problem_detail_exception_handler. The code field is defined as int|str|None and should preserve its original type in the JSON response payload. This ensures numeric codes remain as integers and clients receive the expected data type.
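The type-preservation rule this commit describes can be shown in isolation; `build_error_content` is a hypothetical helper mirroring the handler's code-field logic:

```python
def build_error_content(code):
    """Add `code` only when present, without coercing it to a string."""
    content = {}
    if code is not None:
        content["code"] = code  # an int stays an int, a str stays a str
    return content
```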
Added module and function docstrings to: - src/config.py: Configuration management module and functions - src/core/__init__.py: Core utilities package - src/core/access.py: Access control utilities - src/core/conversions.py: Conversion utilities with improved formatting - src/core/errors.py: ProblemDetailError.__init__ docstring - src/core/formatting.py: Formatting utilities - src/database/__init__.py: Database layer package Improves code documentation and satisfies ruff's docstring linting (D100, D103, D104, D107 checks).
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/core/errors.py (1)
82-89: Type annotation may be too narrow for the `code` field.
The `content` dict is typed as `dict[str, str | int]`, but `exc.code` can be `int | str` (Line 65). While this works at runtime, the annotation could be more precise.
💡 Suggested type adjustment
- content: dict[str, str | int] = {
+ content: dict[str, str | int | None] = {
Or alternatively, since `code` is only added when not None:
- content: dict[str, str | int] = {
+ content: dict[str, str | int] = {  # code field may be int or str
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/errors.py` around lines 82 - 89, The dict annotation for the variable named content is too narrow for the optional exc.code value; update the type to accept any possible code type (e.g., change content: dict[str, str | int] to content: dict[str, Any] and import Any from typing) so that adding content["code"] = exc.code (where exc.code is int | str) is correctly typed; adjust the annotation near the content dict and ensure exc.code usage remains gated by the existing if exc.code is not None check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/routers/openml/migration/datasets_migration_test.py`:
- Around line 120-124: Update the test to expect an integer code instead of a
string: change the assertion that checks error["code"] from "111" to 111 so it
matches DatasetNotFoundError._default_code (111) and the behavior of
problem_detail_exception_handler which preserves the numeric type when
constructing the JSON response; locate the assertion in
tests/routers/openml/migration/datasets_migration_test.py around the block that
checks response.headers["content-type"] and error = response.json().
---
Nitpick comments:
In `@src/core/errors.py`:
- Around line 82-89: The dict annotation for the variable named content is too
narrow for the optional exc.code value; update the type to accept any possible
code type (e.g., change content: dict[str, str | int] to content: dict[str, Any]
and import Any from typing) so that adding content["code"] = exc.code (where
exc.code is int | str) is correctly typed; adjust the annotation near the
content dict and ensure exc.code usage remains gated by the existing if exc.code
is not None check.
📒 Files selected for processing (11)
enable-docstring-linting.patch, src/config.py, src/core/__init__.py, src/core/access.py, src/core/conversions.py, src/core/errors.py, src/core/formatting.py, src/database/__init__.py, src/routers/openml/datasets.py, src/routers/openml/study.py, tests/routers/openml/migration/datasets_migration_test.py
✅ Files skipped from review due to trivial changes (2)
- src/database/__init__.py
- src/core/__init__.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/routers/openml/study.py
- src/core/conversions.py
- src/core/formatting.py
# RFC 9457: Python API now returns problem+json format
assert response.headers["content-type"] == "application/problem+json"
error = response.json()
assert error["type"] == DatasetNotFoundError.uri
assert error["code"] == "111"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's find DatasetNotFoundError definition
fd -t f ".*\.py$" | xargs rg -l "class DatasetNotFoundError"
Repository: openml/server-api
Length of output: 79
🏁 Script executed:
#!/bin/bash
# Find problem_detail_exception_handler
fd -t f ".*\.py$" | xargs rg -l "problem_detail_exception_handler"
Repository: openml/server-api
Length of output: 91
🏁 Script executed:
#!/bin/bash
# Check the test file to see the context around the assertion
cat -n tests/routers/openml/migration/datasets_migration_test.py | sed -n '110,130p'
Repository: openml/server-api
Length of output: 827
🏁 Script executed:
#!/bin/bash
# Look at DatasetNotFoundError definition
cat -n src/core/errors.py | head -100
Repository: openml/server-api
Length of output: 3850
🏁 Script executed:
#!/bin/bash
# Search for DatasetNotFoundError specifically
rg -n "class DatasetNotFoundError" src/core/errors.py -A10
Repository: openml/server-api
Length of output: 475
🏁 Script executed:
#!/bin/bash
# Look at problem_detail_exception_handler implementation
rg -n "problem_detail_exception_handler" src/core/errors.py -A30
Repository: openml/server-api
Length of output: 1087
Fix assertion to expect integer code value.
The assertion on line 124 expects error["code"] == "111" (string), but DatasetNotFoundError._default_code = 111 is an int. Since problem_detail_exception_handler preserves the code type directly when building the response content (line 89), the JSON response will contain an integer, not a string. Change the assertion to assert error["code"] == 111.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/routers/openml/migration/datasets_migration_test.py` around lines 120 -
124, Update the test to expect an integer code instead of a string: change the
assertion that checks error["code"] from "111" to 111 so it matches
DatasetNotFoundError._default_code (111) and the behavior of
problem_detail_exception_handler which preserves the numeric type when
constructing the JSON response; locate the assertion in
tests/routers/openml/migration/datasets_migration_test.py around the block that
checks response.headers["content-type"] and error = response.json().
Resolved conflict in src/core/conversions.py by keeping correct docstring placement before imports.
- Fix D205 in tasks template docstring
- Fix D205 and D404 in schema docstrings
- Make pre-commit ruff hook pass
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/routers/openml/tasks.py (1)
37-50: ⚠️ Potential issue | 🟡 Minor
Fix `fill_template` docstring return type mismatch.
Line 49 says `returns: str`, but the function returns `dict[str, JSON]` (see Lines 85-95). This doc mismatch will mislead maintainers.
✏️ Proposed docstring correction
  def fill_template(
  @@
  ) -> dict[str, JSON]:
      """Fill in the XML template used for task descriptions. Return the result converted to JSON.
  @@
-     template, str:
-         A string represent XML, as detailed below.
+     template, str:
+         A string representing XML, as detailed below.
  @@
-     returns: str
-         The template with values filled in.
+     returns: dict[str, JSON]
+         The template with values filled in and converted to JSON.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/tasks.py` around lines 37 - 50, The docstring for fill_template incorrectly states "returns: str" while the function actually returns a mapping converted to JSON (dict[str, JSON]). Update the return section of the fill_template docstring to reflect the correct type (e.g., "returns: dict[str, JSON]" or "returns: dict" with a short note that keys are input names and values are JSON-serializable values) and adjust the brief description to say the filled template is returned as a JSON-like dict rather than a string.
🧹 Nitpick comments (2)
src/schemas/datasets/mldcat_ap.py (1)
1-2: Avoid module-wide suppression for docstring punctuation checks.
Line 1 disables `D400`/`D415` for the entire file. Since this can be fixed directly in the module docstring, it's better to keep those checks active and limit ignores to what's truly needed.
Proposed tweak
-# ruff: noqa: D101, D102, D400, D415 -"""Based on MLDCAT-AP 1.0.0: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/ +# ruff: noqa: D101, D102 +"""MLDCAT-AP 1.0.0 application profile. + +Reference: https://semiceu.github.io/MLDCAT-AP/releases/1.0.0/ This is an application profile, aimed to extend the use of DCAT-AP, originally envisaged for the description of a machine learning process, developed in collaboration with OpenML. """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/schemas/datasets/mldcat_ap.py` around lines 1 - 2, Remove the module-wide suppression of D400 and D415 from the ruff noqa at the top of the file and instead fix the module docstring so it satisfies punctuation rules: edit the module-level docstring (the triple-quoted string starting the file) to include a proper one-line summary ending with a period and ensure the following description lines are properly separated and punctuated; keep other needed suppressions (D101, D102) if necessary but delete D400 and D415 from the "# ruff: noqa" directive.src/core/conversions.py (1)
68-77: Nit: Docstring says "lists" but the function operates on any `Sequence`.
The function uses `isinstance(obj, Sequence)`, which includes tuples and other sequences, not just lists. Consider updating the docstring for accuracy.
📝 Suggested docstring clarification
  def nested_remove_single_element_list(obj: Any) -> Any:
-     """Recursively unwrap single-element lists in nested data structures."""
+     """Recursively unwrap single-element sequences in nested data structures."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/conversions.py` around lines 68 - 77, The docstring for nested_remove_single_element_list is misleading: the function checks isinstance(obj, Sequence) so it unwraps single-element sequences (not just lists). Update the docstring on nested_remove_single_element_list to say it recursively unwraps single-element sequences (e.g., lists, tuples, and other Sequence types) in nested data structures, and optionally mention that strings are treated as atomic to avoid being iterated.
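Under the behavior the review describes (Sequence-based unwrapping, strings treated as atomic), such a function could look like the sketch below; this follows the reviewed docstring, not necessarily the repository's exact implementation:

```python
from collections.abc import Sequence


def nested_remove_single_element_list(obj):
    """Recursively unwrap single-element sequences in nested data structures.

    Lists, tuples, and other Sequence types are unwrapped when they hold
    exactly one element; strings and bytes are treated as atomic values.
    """
    if isinstance(obj, dict):
        return {k: nested_remove_single_element_list(v) for k, v in obj.items()}
    if isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        if len(obj) == 1:
            return nested_remove_single_element_list(obj[0])
        return [nested_remove_single_element_list(v) for v in obj]
    return obj
```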
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/routers/openml/tasks.py`:
- Around line 37-50: The docstring for fill_template incorrectly states
"returns: str" while the function actually returns a mapping converted to JSON
(dict[str, JSON]). Update the return section of the fill_template docstring to
reflect the correct type (e.g., "returns: dict[str, JSON]" or "returns: dict"
with a short note that keys are input names and values are JSON-serializable
values) and adjust the brief description to say the filled template is returned
as a JSON-like dict rather than a string.
---
Nitpick comments:
In `@src/core/conversions.py`:
- Around line 68-77: The docstring for nested_remove_single_element_list is
misleading: the function checks isinstance(obj, Sequence) so it unwraps
single-element sequences (not just lists). Update the docstring on
nested_remove_single_element_list to say it recursively unwraps single-element
sequences (e.g., lists, tuples, and other Sequence types) in nested data
structures, and optionally mention that strings are treated as atomic to avoid
being iterated.
In `@src/schemas/datasets/mldcat_ap.py`:
- Around line 1-2: Remove the module-wide suppression of D400 and D415 from the
ruff noqa at the top of the file and instead fix the module docstring so it
satisfies punctuation rules: edit the module-level docstring (the triple-quoted
string starting the file) to include a proper one-line summary ending with a
period and ensure the following description lines are properly separated and
punctuated; keep other needed suppressions (D101, D102) if necessary but delete
D400 and D415 from the "# ruff: noqa" directive.
📒 Files selected for processing (6)
src/core/conversions.py, src/database/datasets.py, src/database/evaluations.py, src/routers/openml/tasks.py, src/schemas/datasets/dcat.py, src/schemas/datasets/mldcat_ap.py
✅ Files skipped from review due to trivial changes (1)
- src/database/evaluations.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/schemas/datasets/dcat.py
Summary
Enable docstring linting gradually by removing the blanket "D" ignore from the ruff configuration in `pyproject.toml`. This change keeps specific D-codes that are still noisy (e.g., D203, D204, D213) and preserves per-file ignores for tests.
Changes
- Removed "D" from `tool.ruff.lint.ignore` in `pyproject.toml`.
This aligns with issue #212: make docstrings more consistent and improve readability/maintenance. We enable docstring checks incrementally so maintainers can opt to fix violations progressively.
Checklist
Related: openml/server-api issue #212. (#212)