Skip to content

[ENH] RFC9457 compliant errors#238

Open
PGijsbers wants to merge 9 commits intomainfrom
errors
Open

[ENH] RFC9457 compliant errors#238
PGijsbers wants to merge 9 commits intomainfrom
errors

Conversation

@PGijsbers
Copy link
Contributor

Support RFC9457 compliant errors.

@PGijsbers PGijsbers added the enhancement New feature or request label Feb 12, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Replaces ad-hoc HTTPException usage with an RFC 9457 "application/problem+json" surface: adds ProblemDetailError, many domain-specific ProblemDetailError subclasses, and problem_detail_exception_handler in src/core/errors.py; removes the old DatasetError IntEnum. Registers the handler in src/main.py. Updates routers to raise the new problem-detail exceptions (datasets, flows, studies, tasks, qualities, task types, mldcat dataset service). Adjusts tests to expect problem+json responses. Minor unrelated tweaks: removed an error-format helper in src/core/formatting.py and made a safer config lookup in src/database/users.py.

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.87% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title '[ENH] RFC9457 compliant errors' directly corresponds to the main objective of implementing RFC 9457 compliance across the codebase.
Description check ✅ Passed The PR description 'Support RFC9457 compliant errors' is directly related to the changeset, which implements RFC 9457 compliance throughout the error handling system.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch errors

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
src/routers/openml/qualities.py (1)

38-45: Semantic mismatch: DatasetNotFoundError overridden to behave as DatasetNoDataFileError.

The overrides (code=113, status_code=412) exactly match DatasetNoDataFileError's defaults. This means the response type URI will say dataset-not-found while the legacy code says "no data file" — contradictory signals for API consumers.

If the PHP API truly returned 412/113 for this case, consider raising DatasetNoDataFileError directly (which already has those defaults), or introduce a dedicated error class for this legacy endpoint behavior.

Proposed fix using existing error class
-from core.errors import DatasetNotFoundError
+from core.errors import DatasetNoDataFileError
 ...
         # Backwards compatibility: PHP API returns 412 with code 113
         msg = "Unknown dataset."
-        no_data_file = 113
-        raise DatasetNotFoundError(
+        raise DatasetNoDataFileError(
             msg,
-            code=no_data_file,
-            status_code=HTTPStatus.PRECONDITION_FAILED,
         )

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 8 issues, and left some high level feedback:

  • The ProblemDetailError docstring and problem_detail_exception_handler docstring still refer to ProblemDetailException; consider updating those names to avoid confusion for future readers.
  • For unknown datasets, most endpoints now use HTTPStatus.NOT_FOUND, but get_qualities still returns PRECONDITION_FAILED while mapping to ProblemType.DATASET_NOT_FOUND; consider aligning this status code with the other dataset-not-found cases unless there is a strong backward-compatibility reason not to.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `ProblemDetailError` docstring and `problem_detail_exception_handler` docstring still refer to `ProblemDetailException`; consider updating those names to avoid confusion for future readers.
- For unknown datasets, most endpoints now use `HTTPStatus.NOT_FOUND`, but `get_qualities` still returns `PRECONDITION_FAILED` while mapping to `ProblemType.DATASET_NOT_FOUND`; consider aligning this status code with the other dataset-not-found cases unless there is a strong backward-compatibility reason not to.

## Individual Comments

### Comment 1
<location> `src/core/errors.py:62` </location>
<code_context>
+    """Exception that produces RFC 9457 compliant error responses.
+
+    Usage:
+        raise ProblemDetailException(
+            status_code=HTTPStatus.NOT_FOUND,
+            detail="Dataset 123 was not found.",
+            title="Dataset Not Found",
+            type_="https://openml.org/problems/dataset-not-found",
+            code="111",  # Extension field for legacy error codes
+        )
+    """
</code_context>

<issue_to_address>
**issue (typo):** The docstring example references `ProblemDetailException`, but the actual class is `ProblemDetailError`.

This mismatch in the example can confuse users and any tooling that consumes docstrings. Please update the example (and any related text) to use `ProblemDetailError` instead.

```suggestion
        raise ProblemDetailError(
```
</issue_to_address>

### Comment 2
<location> `src/core/errors.py:92-101` </location>
<code_context>
+        super().__init__(detail or title or "An error occurred")
+
+
+def problem_detail_exception_handler(
+    request: Request,  # noqa: ARG001
+    exc: ProblemDetailError,
+) -> JSONResponse:
+    """FastAPI exception handler for ProblemDetailException.
+
+    Returns a response with:
</code_context>

<issue_to_address>
**issue (typo):** The exception handler docstring mentions `ProblemDetailException` instead of `ProblemDetailError`.

Since this handler is registered for `ProblemDetailError` in `main.py`, the docstring should reference `ProblemDetailError` as well to keep the documentation accurate and avoid confusion in editors and generated docs.

```suggestion
def problem_detail_exception_handler(
    request: Request,  # noqa: ARG001
    exc: ProblemDetailError,
) -> JSONResponse:
    """FastAPI exception handler for ProblemDetailError.

    Returns a response with:
    - Content-Type: application/problem+json
    - RFC 9457 compliant JSON body
    """
```
</issue_to_address>

### Comment 3
<location> `src/routers/openml/qualities.py:40-42` </location>
<code_context>
-        raise HTTPException(status_code=HTTPStatus.NOT_FOUND, detail=error)
+        raise_problem(
+            status_code=HTTPStatus.NOT_FOUND,
+            type_=ProblemType.DATASET_NOT_FOUND,
+            detail="Unknown dataset.",
+            code=DatasetError.NOT_FOUND,
</code_context>

<issue_to_address>
**issue:** The problem type and legacy error code for the missing dataset seem inconsistent.

Here `type_` is `ProblemType.DATASET_NOT_FOUND`, while the legacy `code` previously used for this case was `DatasetError.NO_DATA_FILE` with the message "Unknown dataset". To keep the problem type and legacy error code semantically aligned, consider either using `DatasetError.NOT_FOUND` when the intent is "unknown dataset", or adjusting `type_` to better match the specific "no data file" condition if that’s what this branch represents.
</issue_to_address>

### Comment 4
<location> `src/core/errors.py:212` </location>
<code_context>
+        code: Optional legacy OpenML error code (for backwards compatibility).
+        **extensions: Additional extension fields to include in the response.
+    """
+    title = PROBLEM_TITLES.get(type_)
+    if code is not None:
+        extensions["code"] = str(code)
</code_context>

<issue_to_address>
**suggestion:** Consider providing a sensible fallback title when the problem type URI is not in `PROBLEM_TITLES`.

If a new `type_` is added without updating `PROBLEM_TITLES`, `title` becomes `None` and is omitted. To keep responses self-describing, consider falling back to `HTTPStatus(status_code).phrase` or a generic title when `type_` is not found.

Suggested implementation:

```python
    title = PROBLEM_TITLES.get(type_)
    if title is None:
        try:
            # Fallback to the standard HTTP status phrase when the problem type is unknown.
            title = HTTPStatus(status_code).phrase
        except ValueError:
            # As a last resort, provide a generic, self-describing title.
            title = "Unknown error"
    if code is not None:
        extensions["code"] = str(code)

```

To complete this change, ensure that `HTTPStatus` is imported at the top of `src/core/errors.py`:

1. Add `from http import HTTPStatus` alongside your other imports, if it is not already present.
2. If `HTTPStatus` is already imported elsewhere in the file, no further changes are needed.
</issue_to_address>

### Comment 5
<location> `tests/routers/openml/datasets_test.py:31-37` </location>
<code_context>
-    assert response.status_code == HTTPStatus.PRECONDITION_FAILED
-    assert response.json()["detail"] == {"code": "103", "message": "Authentication failed"}
+    assert response.status_code == HTTPStatus.UNAUTHORIZED
+    assert response.headers["content-type"] == "application/problem+json"
+    error = response.json()
+    assert error["type"] == ProblemType.AUTHENTICATION_FAILED
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding dedicated unit tests for the new RFC9457 error helpers (ProblemDetail, ProblemDetailError, problem_detail_exception_handler, raise_problem).

The current endpoint tests give good integration coverage, but please also add unit tests for `core.errors` itself (e.g. in `tests/core/test_errors.py`) to make the new helpers safer to change. In particular, cover:
- `ProblemDetail`: default `type` of `about:blank`, `model_dump(exclude_none=True)` omitting `None` fields, and `type_/type` aliasing.
- `ProblemDetailError`: storing `status_code`, building a `ProblemDetail` with matching `status`, and preserving arbitrary `extensions` (multiple keys and value types).
- `problem_detail_exception_handler`: returns a `JSONResponse` with `application/problem+json`, merges `problem` and `extensions` correctly, and ensures `status` in the body matches the HTTP status.
- `raise_problem`: looks up `title` from `PROBLEM_TITLES`, coerces numeric `code` to strings, and passes through `instance` and extra extension fields.
These will better protect RFC9457 behavior from regressions than endpoint-only tests.
</issue_to_address>

### Comment 6
<location> `tests/routers/openml/migration/datasets_migration_test.py:31-36` </location>
<code_context>

     if new.status_code != HTTPStatus.OK:
-        assert original.json()["error"] == new.json()["detail"]
+        # RFC 9457: Python API now returns problem+json format
+        assert new.headers["content-type"] == "application/problem+json"
+        # Both APIs should return error responses in the same cases
+        assert "error" in original.json()
         return
</code_context>

<issue_to_address>
**suggestion (testing):** Strengthen migration test for dataset error responses by asserting basic RFC9457 fields.

This currently only checks the `application/problem+json` content type and that the PHP response has an `"error"` key. To better protect the new behavior while keeping this as a migration test (not a full schema check), consider also asserting minimal RFC9457 fields, for example:
```python
error = new.json()
assert "type" in error
assert "status" in error
assert error["status"] == new.status_code
```
This gives early warning if the problem-details structure or status field diverges from expectations without making the test brittle.

```suggestion
    if new.status_code != HTTPStatus.OK:
        # RFC 9457: Python API now returns problem+json format
        assert new.headers["content-type"] == "application/problem+json"
        # Both APIs should return error responses in the same cases
        assert "error" in original.json()

        # Minimal RFC 9457 structure checks for Python API response
        error = new.json()
        assert "type" in error
        assert "status" in error
        assert error["status"] == new.status_code
        return
```
</issue_to_address>

### Comment 7
<location> `tests/routers/openml/migration/datasets_migration_test.py:209-213` </location>
<code_context>
     assert response.status_code == original.status_code

     if response.status_code != HTTPStatus.OK:
-        error = response.json()["detail"]
-        error["code"] = str(error["code"])
-        assert error == original.json()["error"]
+        # RFC 9457: Python API now returns problem+json format
+        assert response.headers["content-type"] == "application/problem+json"
+        error = response.json()
+        # Verify Python API returns properly typed RFC 9457 response
+        assert "type" in error
+        assert "status" in error
+        # Both APIs should error in the same cases
+        assert "error" in original.json()
         return
</code_context>

<issue_to_address>
**suggestion (testing):** Preserve explicit checks for legacy error codes in the features migration test to ensure backward-compatibility.

This test used to compare the legacy `code` field between PHP and Python. After switching to RFC9457, it only validates `type`/`status` and the presence of an `"error"` block. Since this endpoint historically exposed specific error codes (272/273/274), please keep a minimal assertion that the legacy `code` is still present and string-typed in the Python response, e.g.:
```python
error = response.json()
assert "code" in error
assert isinstance(error["code"], str)
# optionally:
assert error["code"] in {"272", "273", "274"}
```

```suggestion
        # RFC 9457: Python API now returns problem+json format
        assert new.headers["content-type"] == "application/problem+json"
        error = new.json()
        # Verify Python API returns properly typed RFC 9457 response
        assert "type" in error
        assert "status" in error
        # Preserve legacy error code field for backward-compatibility
        assert "code" in error
        assert isinstance(error["code"], str)
        assert error["code"] in {"272", "273", "274"}
        # Both APIs should error in the same cases
        assert "error" in original.json()
        return
```
</issue_to_address>

### Comment 8
<location> `src/core/errors.py:114` </location>
<code_context>
+
+# Problem type URIs for OpenML-specific errors
+# These should be documented at the corresponding URLs
+class ProblemType:
+    """Problem type URIs for common OpenML errors."""
+
</code_context>

<issue_to_address>
**issue (complexity):** Consider collapsing the parallel problem-type URI/title structures and separate extensions dict into richer enum and model types to reduce indirection and duplication.

You can reduce indirection without changing functionality by collapsing a couple of parallel structures.

### 1. Collapse `ProblemType` + `PROBLEM_TITLES`

You currently maintain URIs and titles separately. You can avoid the extra dict, runtime lookup, and mismatch risk by making `ProblemType` an enum that carries both `uri` and `title`, and letting `raise_problem` accept that enum:

```python
from enum import Enum

class ProblemType(Enum):
    DATASET_NOT_FOUND = (
        "https://openml.org/problems/dataset-not-found",
        "Dataset Not Found",
    )
    DATASET_NO_ACCESS = (
        "https://openml.org/problems/dataset-no-access",
        "Dataset Access Denied",
    )
    # ... other entries ...

    @property
    def uri(self) -> str:
        return self.value[0]

    @property
    def title(self) -> str:
        return self.value[1]
```

Then simplify `raise_problem` to:

```python
def raise_problem(
    status_code: HTTPStatus | int,
    type_: ProblemType,
    detail: str,
    *,
    instance: str | None = None,
    code: int | str | None = None,
    **extensions: ExtensionValue,
) -> NoReturn:
    if code is not None:
        extensions["code"] = str(code)

    raise ProblemDetailError(
        status_code=status_code,
        detail=detail,
        title=type_.title,
        type_=type_.uri,
        instance=instance,
        **extensions,
    )
```

Call sites become more self-documenting and you can drop `PROBLEM_TITLES` entirely:

```python
raise_problem(
    HTTPStatus.NOT_FOUND,
    ProblemType.DATASET_NOT_FOUND,
    detail="Dataset 123 was not found.",
)
```

### 2. Fold `extensions` into `ProblemDetail`

You don’t need both `ProblemDetail` and a separate `extensions` dict plus merge logic. You can let `ProblemDetail` accept arbitrary extra fields and construct it directly with extensions:

```python
class ProblemDetail(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,
        extra="allow",  # allow arbitrary extension fields
    )

    type_: str = Field(
        default="about:blank",
        alias="type",
        serialization_alias="type",
    )
    title: str | None = None
    status: int | None = None
    detail: str | None = None
    instance: str | None = None
```

Then `ProblemDetailError` doesn’t need to store `extensions` separately:

```python
class ProblemDetailError(Exception):
    def __init__(
        self,
        status_code: HTTPStatus | int,
        detail: str | None = None,
        title: str | None = None,
        type_: str = "about:blank",
        instance: str | None = None,
        **extensions: ExtensionValue,
    ) -> None:
        self.status_code = int(status_code)
        self.problem = ProblemDetail(
            type_=type_,
            title=title,
            status=self.status_code,
            detail=detail,
            instance=instance,
            **extensions,  # extensions become top-level JSON fields
        )
        super().__init__(detail or title or "An error occurred")
```

And the handler becomes a straight dump (no merge needed):

```python
def problem_detail_exception_handler(
    request: Request,  # noqa: ARG001
    exc: ProblemDetailError,
) -> JSONResponse:
    return JSONResponse(
        status_code=exc.status_code,
        content=exc.problem.model_dump(by_alias=True, exclude_none=True),
        media_type="application/problem+json",
    )
```

This keeps the wire format identical (extensions are still top-level fields) while removing one data structure (`extensions` on the exception) and one merge step from the handler. Combined with the enum change, you remove duplication and some mental overhead without changing behavior or the RFC 9457-compliant API surface.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 31 to 36
if new.status_code != HTTPStatus.OK:
assert original.json()["error"] == new.json()["detail"]
# RFC 9457: Python API now returns problem+json format
assert new.headers["content-type"] == "application/problem+json"
# Both APIs should return error responses in the same cases
assert "error" in original.json()
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Strengthen migration test for dataset error responses by asserting basic RFC9457 fields.

This currently only checks the application/problem+json content type and that the PHP response has an "error" key. To better protect the new behavior while keeping this as a migration test (not a full schema check), consider also asserting minimal RFC9457 fields, for example:

error = new.json()
assert "type" in error
assert "status" in error
assert error["status"] == new.status_code

This gives early warning if the problem-details structure or status field diverges from expectations without making the test brittle.

Suggested change
if new.status_code != HTTPStatus.OK:
assert original.json()["error"] == new.json()["detail"]
# RFC 9457: Python API now returns problem+json format
assert new.headers["content-type"] == "application/problem+json"
# Both APIs should return error responses in the same cases
assert "error" in original.json()
return
if new.status_code != HTTPStatus.OK:
# RFC 9457: Python API now returns problem+json format
assert new.headers["content-type"] == "application/problem+json"
# Both APIs should return error responses in the same cases
assert "error" in original.json()
# Minimal RFC 9457 structure checks for Python API response
error = new.json()
assert "type" in error
assert "status" in error
assert error["status"] == new.status_code
return

Comment on lines +209 to 213
# RFC 9457: Python API now returns problem+json format
assert new.headers["content-type"] == "application/problem+json"
# Both APIs should error in the same cases
assert "error" in original.json()
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Preserve explicit checks for legacy error codes in the features migration test to ensure backward-compatibility.

This test used to compare the legacy code field between PHP and Python. After switching to RFC9457, it only validates type/status and the presence of an "error" block. Since this endpoint historically exposed specific error codes (272/273/274), please keep a minimal assertion that the legacy code is still present and string-typed in the Python response, e.g.:

error = response.json()
assert "code" in error
assert isinstance(error["code"], str)
# optionally:
assert error["code"] in {"272", "273", "274"}
Suggested change
# RFC 9457: Python API now returns problem+json format
assert new.headers["content-type"] == "application/problem+json"
# Both APIs should error in the same cases
assert "error" in original.json()
return
# RFC 9457: Python API now returns problem+json format
assert new.headers["content-type"] == "application/problem+json"
error = new.json()
# Verify Python API returns properly typed RFC 9457 response
assert "type" in error
assert "status" in error
# Preserve legacy error code field for backward-compatibility
assert "code" in error
assert isinstance(error["code"], str)
assert error["code"] in {"272", "273", "274"}
# Both APIs should error in the same cases
assert "error" in original.json()
return

@codecov
Copy link

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 62.66094% with 87 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@294ca47). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/routers/openml/datasets.py 6.45% 29 Missing ⚠️
src/routers/openml/study.py 7.69% 24 Missing ⚠️
src/core/errors.py 88.51% 17 Missing ⚠️
src/routers/openml/flows.py 33.33% 4 Missing ⚠️
src/routers/openml/tasks.py 33.33% 4 Missing ⚠️
src/routers/openml/qualities.py 40.00% 3 Missing ⚠️
src/routers/mldcat_ap/dataset.py 50.00% 2 Missing ⚠️
src/routers/openml/tasktype.py 50.00% 2 Missing ⚠️
src/database/users.py 0.00% 0 Missing and 1 partial ⚠️
src/main.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #238   +/-   ##
=======================================
  Coverage        ?   56.15%           
=======================================
  Files           ?       32           
  Lines           ?     1284           
  Branches        ?      105           
=======================================
  Hits            ?      721           
  Misses          ?      561           
  Partials        ?        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/routers/mldcat_ap/dataset.py (1)

153-154: ⚠️ Potential issue | 🟡 Minor

Unhandled StopIteration if quality name doesn't match any results.

Line 154 uses a generator expression with next() but has no fallback. If quality_name doesn't match any quality in the list, this will raise a raw StopIteration, resulting in an uncontrolled 500. Consider catching this and raising a proper raise_problem error consistent with the new error pattern.

Suggested fix
-    quality = next(q for q in qualities if q.name == quality_name)
+    quality = next((q for q in qualities if q.name == quality_name), None)
+    if quality is None:
+        raise_problem(
+            status_code=HTTPStatus.NOT_FOUND,
+            type_=ProblemType.NO_RESULTS,
+            detail=f"Quality '{quality_name}' not found for distribution {distribution_id}.",
+        )
tests/routers/openml/dataset_tag_test.py (1)

83-84: ⚠️ Potential issue | 🟡 Minor

Pre-existing bug: missing = in API key query parameter.

Line 84 has f"/datasets/tag?api_key{ApiKey.ADMIN}" which produces a URL like /datasets/tag?api_keyabc instead of /datasets/tag?api_key=abc. The test likely still passes because FastAPI body validation rejects the invalid tag before authentication is checked, but the request is sent without a valid API key.

🐛 Proposed fix
-        f"/datasets/tag?api_key{ApiKey.ADMIN}",
+        f"/datasets/tag?api_key={ApiKey.ADMIN}",
src/routers/openml/datasets.py (1)

255-278: ⚠️ Potential issue | 🟡 Minor

Docstring still references ProblemDetailException.

Line 262 says "Raises ProblemDetailException" but the actual exception class is ProblemDetailError.

Proposed fix
-    Raises ProblemDetailException if the dataset does not exist or the user can not access it.
+    Raises ProblemDetailError if the dataset does not exist or the user can not access it.
🤖 Fix all issues with AI agents
In `@src/core/errors.py`:
- Around line 58-89: Update the class docstring to reference the correct class
name: replace occurrences of "ProblemDetailException" with "ProblemDetailError"
(in the usage example) so the example matches the actual class name
ProblemDetailError; ensure any other mentions inside the docstring align with
the class identifier ProblemDetailError and keep the example parameters
(status_code, detail, title, type_, code) unchanged.

In `@src/routers/openml/qualities.py`:
- Around line 38-43: The raised problem in raise_problem uses mismatched enums:
ProblemType.DATASET_NOT_FOUND with code=DatasetError.NO_DATA_FILE; update this
to be consistent by either changing code to DatasetError.NOT_FOUND to match
ProblemType.DATASET_NOT_FOUND, or change type_ to
ProblemType.DATASET_NO_DATA_FILE if the intent is to signal a missing data file;
adjust the call site (raise_problem) accordingly following the pattern used in
datasets.py so ProblemType and DatasetError align (e.g., use
ProblemType.DATASET_NOT_FOUND + DatasetError.NOT_FOUND or
ProblemType.DATASET_NO_DATA_FILE + DatasetError.NO_DATA_FILE).
🧹 Nitpick comments (3)
tests/routers/openml/migration/datasets_migration_test.py (1)

235-242: Consider asserting a specific ProblemType for feature error cases.

Lines 239–240 only check that "type" and "status" keys exist in the error payload, unlike other migration tests (e.g., Lines 112–113) which assert exact ProblemType values. If multiple error types are possible here (e.g., dataset not found vs. features not processed), a set-based check like assert error["type"] in {ProblemType.DATASET_NOT_FOUND, ProblemType.DATASET_NO_FEATURES, ...} would be more precise while still flexible.

src/core/errors.py (2)

92-109: Extension keys could shadow standard RFC 9457 fields.

content.update(exc.extensions) on Line 103 could overwrite standard problem detail fields (type, title, status, detail, instance) if an extension key collides. Currently raise_problem only adds "code", but direct ProblemDetailError construction could introduce collisions.

Consider either validating extension keys or applying them with a namespace/prefix guard.

Proposed defensive guard
     content = exc.problem.model_dump(by_alias=True, exclude_none=True)
-    content.update(exc.extensions)
+    reserved = {"type", "title", "status", "detail", "instance"}
+    safe_extensions = {k: v for k, v in exc.extensions.items() if k not in reserved}
+    content.update(safe_extensions)

162-190: Consider a test to enforce PROBLEM_TITLES covers all ProblemType constants.

raise_problem silently falls back to title=None when a type URI isn't in PROBLEM_TITLES. If a new ProblemType constant is added without a corresponding title entry, responses will lack a title with no warning. A simple test iterating over ProblemType class attributes and asserting presence in PROBLEM_TITLES would prevent this drift.

Also applies to: 193-222

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/core/errors.py`:
- Line 93: Update the docstring that currently reads "FastAPI exception handler
for ProblemDetailException" to reference the correct class name
ProblemDetailError; locate the docstring associated with the ProblemDetailError
FastAPI exception handler (search for ProblemDetailError and its handler
function) and change the class name in the text so the docstring correctly
states "FastAPI exception handler for ProblemDetailError."
- Around line 99-100: The code currently merges exc.extensions into the
ProblemDetail dict with content.update(exc.extensions), which allows extensions
to overwrite RFC 9457 standard members; change this to filter or validate
extensions before merging: identify the standard keys (type, title, status,
detail, instance) and either remove any of those keys from exc.extensions or
raise a validation error in ProblemDetailError.__init__ when a collision occurs,
then merge only the allowed extension keys (e.g., filtered_extensions) into
content instead of exc.extensions to ensure standard fields cannot be
overwritten.
🧹 Nitpick comments (1)
src/core/errors.py (1)

111-187: No enforcement that every ProblemType has a corresponding PROBLEM_TITLES entry.

Adding a new constant to ProblemType without updating PROBLEM_TITLES will silently produce responses with title: null. Consider adding a module-level assertion or a test to keep them in sync.

# e.g., at module level or in a test:
_problem_types = {v for k, v in vars(ProblemType).items() if not k.startswith("_")}
assert _problem_types == set(PROBLEM_TITLES), "ProblemType and PROBLEM_TITLES are out of sync"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant