Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReplaces ad-hoc HTTPException usage with an RFC 9457 "application/problem+json" surface: adds 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 🧹 Recent nitpick comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 8 issues, and left some high level feedback:
- The
ProblemDetailErrordocstring andproblem_detail_exception_handlerdocstring still refer toProblemDetailException; consider updating those names to avoid confusion for future readers. - For unknown datasets, most endpoints now use
HTTPStatus.NOT_FOUND, butget_qualitiesstill returnsPRECONDITION_FAILEDwhile mapping toProblemType.DATASET_NOT_FOUND; consider aligning this status code with the other dataset-not-found cases unless there is a strong backward-compatibility reason not to.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `ProblemDetailError` docstring and `problem_detail_exception_handler` docstring still refer to `ProblemDetailException`; consider updating those names to avoid confusion for future readers.
- For unknown datasets, most endpoints now use `HTTPStatus.NOT_FOUND`, but `get_qualities` still returns `PRECONDITION_FAILED` while mapping to `ProblemType.DATASET_NOT_FOUND`; consider aligning this status code with the other dataset-not-found cases unless there is a strong backward-compatibility reason not to.
## Individual Comments
### Comment 1
<location> `src/core/errors.py:62` </location>
<code_context>
+ """Exception that produces RFC 9457 compliant error responses.
+
+ Usage:
+ raise ProblemDetailException(
+ status_code=HTTPStatus.NOT_FOUND,
+ detail="Dataset 123 was not found.",
+ title="Dataset Not Found",
+ type_="https://openml.org/problems/dataset-not-found",
+ code="111", # Extension field for legacy error codes
+ )
+ """
</code_context>
<issue_to_address>
**issue (typo):** The docstring example references `ProblemDetailException`, but the actual class is `ProblemDetailError`.
This mismatch in the example can confuse users and any tooling that consumes docstrings. Please update the example (and any related text) to use `ProblemDetailError` instead.
```suggestion
raise ProblemDetailError(
```
</issue_to_address>
### Comment 2
<location> `src/core/errors.py:92-101` </location>
<code_context>
+ super().__init__(detail or title or "An error occurred")
+
+
+def problem_detail_exception_handler(
+ request: Request, # noqa: ARG001
+ exc: ProblemDetailError,
+) -> JSONResponse:
+ """FastAPI exception handler for ProblemDetailException.
+
+ Returns a response with:
</code_context>
<issue_to_address>
**issue (typo):** The exception handler docstring mentions `ProblemDetailException` instead of `ProblemDetailError`.
Since this handler is registered for `ProblemDetailError` in `main.py`, the docstring should reference `ProblemDetailError` as well to keep the documentation accurate and avoid confusion in editors and generated docs.
```suggestion
def problem_detail_exception_handler(
request: Request, # noqa: ARG001
exc: ProblemDetailError,
) -> JSONResponse:
"""FastAPI exception handler for ProblemDetailError.
Returns a response with:
- Content-Type: application/problem+json
- RFC 9457 compliant JSON body
"""
```
</issue_to_address>
### Comment 3
<location> `src/routers/openml/qualities.py:40-42` </location>
<code_context>
- raise HTTPException(status_code=HTTPStatus.NOT_FOUND, detail=error)
+ raise_problem(
+ status_code=HTTPStatus.NOT_FOUND,
+ type_=ProblemType.DATASET_NOT_FOUND,
+ detail="Unknown dataset.",
+ code=DatasetError.NOT_FOUND,
</code_context>
<issue_to_address>
**issue:** The problem type and legacy error code for the missing dataset seem inconsistent.
Here `type_` is `ProblemType.DATASET_NOT_FOUND`, while the legacy `code` previously used for this case was `DatasetError.NO_DATA_FILE` with the message "Unknown dataset". To keep the problem type and legacy error code semantically aligned, consider either using `DatasetError.NOT_FOUND` when the intent is "unknown dataset", or adjusting `type_` to better match the specific "no data file" condition if that’s what this branch represents.
</issue_to_address>
### Comment 4
<location> `src/core/errors.py:212` </location>
<code_context>
+ code: Optional legacy OpenML error code (for backwards compatibility).
+ **extensions: Additional extension fields to include in the response.
+ """
+ title = PROBLEM_TITLES.get(type_)
+ if code is not None:
+ extensions["code"] = str(code)
</code_context>
<issue_to_address>
**suggestion:** Consider providing a sensible fallback title when the problem type URI is not in `PROBLEM_TITLES`.
If a new `type_` is added without updating `PROBLEM_TITLES`, `title` becomes `None` and is omitted. To keep responses self-describing, consider falling back to `HTTPStatus(status_code).phrase` or a generic title when `type_` is not found.
Suggested implementation:
```python
title = PROBLEM_TITLES.get(type_)
if title is None:
try:
# Fallback to the standard HTTP status phrase when the problem type is unknown.
title = HTTPStatus(status_code).phrase
except ValueError:
# As a last resort, provide a generic, self-describing title.
title = "Unknown error"
if code is not None:
extensions["code"] = str(code)
```
To complete this change, ensure that `HTTPStatus` is imported at the top of `src/core/errors.py`:
1. Add `from http import HTTPStatus` alongside your other imports, if it is not already present.
2. If `HTTPStatus` is already imported elsewhere in the file, no further changes are needed.
</issue_to_address>
### Comment 5
<location> `tests/routers/openml/datasets_test.py:31-37` </location>
<code_context>
- assert response.status_code == HTTPStatus.PRECONDITION_FAILED
- assert response.json()["detail"] == {"code": "103", "message": "Authentication failed"}
+ assert response.status_code == HTTPStatus.UNAUTHORIZED
+ assert response.headers["content-type"] == "application/problem+json"
+ error = response.json()
+ assert error["type"] == ProblemType.AUTHENTICATION_FAILED
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding dedicated unit tests for the new RFC9457 error helpers (ProblemDetail, ProblemDetailError, problem_detail_exception_handler, raise_problem).
The current endpoint tests give good integration coverage, but please also add unit tests for `core.errors` itself (e.g. in `tests/core/test_errors.py`) to make the new helpers safer to change. In particular, cover:
- `ProblemDetail`: default `type` of `about:blank`, `model_dump(exclude_none=True)` omitting `None` fields, and `type_/type` aliasing.
- `ProblemDetailError`: storing `status_code`, building a `ProblemDetail` with matching `status`, and preserving arbitrary `extensions` (multiple keys and value types).
- `problem_detail_exception_handler`: returns a `JSONResponse` with `application/problem+json`, merges `problem` and `extensions` correctly, and ensures `status` in the body matches the HTTP status.
- `raise_problem`: looks up `title` from `PROBLEM_TITLES`, coerces numeric `code` to strings, and passes through `instance` and extra extension fields.
These will better protect RFC9457 behavior from regressions than endpoint-only tests.
</issue_to_address>
### Comment 6
<location> `tests/routers/openml/migration/datasets_migration_test.py:31-36` </location>
<code_context>
if new.status_code != HTTPStatus.OK:
- assert original.json()["error"] == new.json()["detail"]
+ # RFC 9457: Python API now returns problem+json format
+ assert new.headers["content-type"] == "application/problem+json"
+ # Both APIs should return error responses in the same cases
+ assert "error" in original.json()
return
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen migration test for dataset error responses by asserting basic RFC9457 fields.
This currently only checks the `application/problem+json` content type and that the PHP response has an `"error"` key. To better protect the new behavior while keeping this as a migration test (not a full schema check), consider also asserting minimal RFC9457 fields, for example:
```python
error = new.json()
assert "type" in error
assert "status" in error
assert error["status"] == new.status_code
```
This gives early warning if the problem-details structure or status field diverges from expectations without making the test brittle.
```suggestion
if new.status_code != HTTPStatus.OK:
# RFC 9457: Python API now returns problem+json format
assert new.headers["content-type"] == "application/problem+json"
# Both APIs should return error responses in the same cases
assert "error" in original.json()
# Minimal RFC 9457 structure checks for Python API response
error = new.json()
assert "type" in error
assert "status" in error
assert error["status"] == new.status_code
return
```
</issue_to_address>
### Comment 7
<location> `tests/routers/openml/migration/datasets_migration_test.py:209-213` </location>
<code_context>
assert response.status_code == original.status_code
if response.status_code != HTTPStatus.OK:
- error = response.json()["detail"]
- error["code"] = str(error["code"])
- assert error == original.json()["error"]
+ # RFC 9457: Python API now returns problem+json format
+ assert response.headers["content-type"] == "application/problem+json"
+ error = response.json()
+ # Verify Python API returns properly typed RFC 9457 response
+ assert "type" in error
+ assert "status" in error
+ # Both APIs should error in the same cases
+ assert "error" in original.json()
return
</code_context>
<issue_to_address>
**suggestion (testing):** Preserve explicit checks for legacy error codes in the features migration test to ensure backward-compatibility.
This test used to compare the legacy `code` field between PHP and Python. After switching to RFC9457, it only validates `type`/`status` and the presence of an `"error"` block. Since this endpoint historically exposed specific error codes (272/273/274), please keep a minimal assertion that the legacy `code` is still present and string-typed in the Python response, e.g.:
```python
error = response.json()
assert "code" in error
assert isinstance(error["code"], str)
# optionally:
assert error["code"] in {"272", "273", "274"}
```
```suggestion
# RFC 9457: Python API now returns problem+json format
assert new.headers["content-type"] == "application/problem+json"
error = new.json()
# Verify Python API returns properly typed RFC 9457 response
assert "type" in error
assert "status" in error
# Preserve legacy error code field for backward-compatibility
assert "code" in error
assert isinstance(error["code"], str)
assert error["code"] in {"272", "273", "274"}
# Both APIs should error in the same cases
assert "error" in original.json()
return
```
</issue_to_address>
### Comment 8
<location> `src/core/errors.py:114` </location>
<code_context>
+
+# Problem type URIs for OpenML-specific errors
+# These should be documented at the corresponding URLs
+class ProblemType:
+ """Problem type URIs for common OpenML errors."""
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider collapsing the parallel problem-type URI/title structures and separate extensions dict into richer enum and model types to reduce indirection and duplication.
You can reduce indirection without changing functionality by collapsing a couple of parallel structures.
### 1. Collapse `ProblemType` + `PROBLEM_TITLES`
You currently maintain URIs and titles separately. You can avoid the extra dict, runtime lookup, and mismatch risk by making `ProblemType` an enum that carries both `uri` and `title`, and letting `raise_problem` accept that enum:
```python
from enum import Enum
class ProblemType(Enum):
DATASET_NOT_FOUND = (
"https://openml.org/problems/dataset-not-found",
"Dataset Not Found",
)
DATASET_NO_ACCESS = (
"https://openml.org/problems/dataset-no-access",
"Dataset Access Denied",
)
# ... other entries ...
@property
def uri(self) -> str:
return self.value[0]
@property
def title(self) -> str:
return self.value[1]
```
Then simplify `raise_problem` to:
```python
def raise_problem(
status_code: HTTPStatus | int,
type_: ProblemType,
detail: str,
*,
instance: str | None = None,
code: int | str | None = None,
**extensions: ExtensionValue,
) -> NoReturn:
if code is not None:
extensions["code"] = str(code)
raise ProblemDetailError(
status_code=status_code,
detail=detail,
title=type_.title,
type_=type_.uri,
instance=instance,
**extensions,
)
```
Call sites become more self-documenting and you can drop `PROBLEM_TITLES` entirely:
```python
raise_problem(
HTTPStatus.NOT_FOUND,
ProblemType.DATASET_NOT_FOUND,
detail="Dataset 123 was not found.",
)
```
### 2. Fold `extensions` into `ProblemDetail`
You don’t need both `ProblemDetail` and a separate `extensions` dict plus merge logic. You can let `ProblemDetail` accept arbitrary extra fields and construct it directly with extensions:
```python
class ProblemDetail(BaseModel):
model_config = ConfigDict(
populate_by_name=True,
extra="allow", # allow arbitrary extension fields
)
type_: str = Field(
default="about:blank",
alias="type",
serialization_alias="type",
)
title: str | None = None
status: int | None = None
detail: str | None = None
instance: str | None = None
```
Then `ProblemDetailError` doesn’t need to store `extensions` separately:
```python
class ProblemDetailError(Exception):
def __init__(
self,
status_code: HTTPStatus | int,
detail: str | None = None,
title: str | None = None,
type_: str = "about:blank",
instance: str | None = None,
**extensions: ExtensionValue,
) -> None:
self.status_code = int(status_code)
self.problem = ProblemDetail(
type_=type_,
title=title,
status=self.status_code,
detail=detail,
instance=instance,
**extensions, # extensions become top-level JSON fields
)
super().__init__(detail or title or "An error occurred")
```
And the handler becomes a straight dump (no merge needed):
```python
def problem_detail_exception_handler(
request: Request, # noqa: ARG001
exc: ProblemDetailError,
) -> JSONResponse:
return JSONResponse(
status_code=exc.status_code,
content=exc.problem.model_dump(by_alias=True, exclude_none=True),
media_type="application/problem+json",
)
```
This keeps the wire format identical (extensions are still top-level fields) while removing one data structure (`extensions` on the exception) and one merge step from the handler. Combined with the enum change, you remove duplication and some mental overhead without changing behavior or the RFC 9457-compliant API surface.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| if new.status_code != HTTPStatus.OK: | ||
| assert original.json()["error"] == new.json()["detail"] | ||
| # RFC 9457: Python API now returns problem+json format | ||
| assert new.headers["content-type"] == "application/problem+json" | ||
| # Both APIs should return error responses in the same cases | ||
| assert "error" in original.json() | ||
| return |
There was a problem hiding this comment.
suggestion (testing): Strengthen migration test for dataset error responses by asserting basic RFC9457 fields.
This currently only checks the application/problem+json content type and that the PHP response has an "error" key. To better protect the new behavior while keeping this as a migration test (not a full schema check), consider also asserting minimal RFC9457 fields, for example:
error = new.json()
assert "type" in error
assert "status" in error
assert error["status"] == new.status_codeThis gives early warning if the problem-details structure or status field diverges from expectations without making the test brittle.
| if new.status_code != HTTPStatus.OK: | |
| assert original.json()["error"] == new.json()["detail"] | |
| # RFC 9457: Python API now returns problem+json format | |
| assert new.headers["content-type"] == "application/problem+json" | |
| # Both APIs should return error responses in the same cases | |
| assert "error" in original.json() | |
| return | |
| if new.status_code != HTTPStatus.OK: | |
| # RFC 9457: Python API now returns problem+json format | |
| assert new.headers["content-type"] == "application/problem+json" | |
| # Both APIs should return error responses in the same cases | |
| assert "error" in original.json() | |
| # Minimal RFC 9457 structure checks for Python API response | |
| error = new.json() | |
| assert "type" in error | |
| assert "status" in error | |
| assert error["status"] == new.status_code | |
| return |
| # RFC 9457: Python API now returns problem+json format | ||
| assert new.headers["content-type"] == "application/problem+json" | ||
| # Both APIs should error in the same cases | ||
| assert "error" in original.json() | ||
| return |
There was a problem hiding this comment.
suggestion (testing): Preserve explicit checks for legacy error codes in the features migration test to ensure backward-compatibility.
This test used to compare the legacy code field between PHP and Python. After switching to RFC9457, it only validates type/status and the presence of an "error" block. Since this endpoint historically exposed specific error codes (272/273/274), please keep a minimal assertion that the legacy code is still present and string-typed in the Python response, e.g.:
error = response.json()
assert "code" in error
assert isinstance(error["code"], str)
# optionally:
assert error["code"] in {"272", "273", "274"}| # RFC 9457: Python API now returns problem+json format | |
| assert new.headers["content-type"] == "application/problem+json" | |
| # Both APIs should error in the same cases | |
| assert "error" in original.json() | |
| return | |
| # RFC 9457: Python API now returns problem+json format | |
| assert new.headers["content-type"] == "application/problem+json" | |
| error = new.json() | |
| # Verify Python API returns properly typed RFC 9457 response | |
| assert "type" in error | |
| assert "status" in error | |
| # Preserve legacy error code field for backward-compatibility | |
| assert "code" in error | |
| assert isinstance(error["code"], str) | |
| assert error["code"] in {"272", "273", "274"} | |
| # Both APIs should error in the same cases | |
| assert "error" in original.json() | |
| return |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #238 +/- ##
=======================================
Coverage ? 56.15%
=======================================
Files ? 32
Lines ? 1284
Branches ? 105
=======================================
Hits ? 721
Misses ? 561
Partials ? 2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/routers/mldcat_ap/dataset.py (1)
153-154:⚠️ Potential issue | 🟡 MinorUnhandled
StopIterationif quality name doesn't match any results.Line 154 uses a generator expression with
next()but has no fallback. Ifquality_namedoesn't match any quality in the list, this will raise a rawStopIteration, resulting in an uncontrolled 500. Consider catching this and raising a properraise_problemerror consistent with the new error pattern.Suggested fix
- quality = next(q for q in qualities if q.name == quality_name) + quality = next((q for q in qualities if q.name == quality_name), None) + if quality is None: + raise_problem( + status_code=HTTPStatus.NOT_FOUND, + type_=ProblemType.NO_RESULTS, + detail=f"Quality '{quality_name}' not found for distribution {distribution_id}.", + )tests/routers/openml/dataset_tag_test.py (1)
83-84:⚠️ Potential issue | 🟡 MinorPre-existing bug: missing
=in API key query parameter.Line 84 has
f"/datasets/tag?api_key{ApiKey.ADMIN}"which produces a URL like/datasets/tag?api_keyabcinstead of/datasets/tag?api_key=abc. The test likely still passes because FastAPI body validation rejects the invalid tag before authentication is checked, but the request is sent without a valid API key.🐛 Proposed fix
- f"/datasets/tag?api_key{ApiKey.ADMIN}", + f"/datasets/tag?api_key={ApiKey.ADMIN}",src/routers/openml/datasets.py (1)
255-278:⚠️ Potential issue | 🟡 MinorDocstring still references
ProblemDetailException.Line 262 says "Raises ProblemDetailException" but the actual exception class is
ProblemDetailError.Proposed fix
- Raises ProblemDetailException if the dataset does not exist or the user can not access it. + Raises ProblemDetailError if the dataset does not exist or the user can not access it.
🤖 Fix all issues with AI agents
In `@src/core/errors.py`:
- Around line 58-89: Update the class docstring to reference the correct class
name: replace occurrences of "ProblemDetailException" with "ProblemDetailError"
(in the usage example) so the example matches the actual class name
ProblemDetailError; ensure any other mentions inside the docstring align with
the class identifier ProblemDetailError and keep the example parameters
(status_code, detail, title, type_, code) unchanged.
In `@src/routers/openml/qualities.py`:
- Around line 38-43: The raised problem in raise_problem uses mismatched enums:
ProblemType.DATASET_NOT_FOUND with code=DatasetError.NO_DATA_FILE; update this
to be consistent by either changing code to DatasetError.NOT_FOUND to match
ProblemType.DATASET_NOT_FOUND, or change type_ to
ProblemType.DATASET_NO_DATA_FILE if the intent is to signal a missing data file;
adjust the call site (raise_problem) accordingly following the pattern used in
datasets.py so ProblemType and DatasetError align (e.g., use
ProblemType.DATASET_NOT_FOUND + DatasetError.NOT_FOUND or
ProblemType.DATASET_NO_DATA_FILE + DatasetError.NO_DATA_FILE).
🧹 Nitpick comments (3)
tests/routers/openml/migration/datasets_migration_test.py (1)
235-242: Consider asserting a specificProblemTypefor feature error cases.Lines 239–240 only check that
"type"and"status"keys exist in the error payload, unlike other migration tests (e.g., Lines 112–113) which assert exactProblemTypevalues. If multiple error types are possible here (e.g., dataset not found vs. features not processed), a set-based check likeassert error["type"] in {ProblemType.DATASET_NOT_FOUND, ProblemType.DATASET_NO_FEATURES, ...}would be more precise while still flexible.src/core/errors.py (2)
92-109: Extension keys could shadow standard RFC 9457 fields.
content.update(exc.extensions)on Line 103 could overwrite standard problem detail fields (type,title,status,detail,instance) if an extension key collides. Currentlyraise_problemonly adds"code", but directProblemDetailErrorconstruction could introduce collisions.Consider either validating extension keys or applying them with a namespace/prefix guard.
Proposed defensive guard
content = exc.problem.model_dump(by_alias=True, exclude_none=True) - content.update(exc.extensions) + reserved = {"type", "title", "status", "detail", "instance"} + safe_extensions = {k: v for k, v in exc.extensions.items() if k not in reserved} + content.update(safe_extensions)
162-190: Consider a test to enforcePROBLEM_TITLEScovers allProblemTypeconstants.
raise_problemsilently falls back totitle=Nonewhen a type URI isn't inPROBLEM_TITLES. If a newProblemTypeconstant is added without a corresponding title entry, responses will lack a title with no warning. A simple test iterating overProblemTypeclass attributes and asserting presence inPROBLEM_TITLESwould prevent this drift.Also applies to: 193-222
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/core/errors.py`:
- Line 93: Update the docstring that currently reads "FastAPI exception handler
for ProblemDetailException" to reference the correct class name
ProblemDetailError; locate the docstring associated with the ProblemDetailError
FastAPI exception handler (search for ProblemDetailError and its handler
function) and change the class name in the text so the docstring correctly
states "FastAPI exception handler for ProblemDetailError."
- Around line 99-100: The code currently merges exc.extensions into the
ProblemDetail dict with content.update(exc.extensions), which allows extensions
to overwrite RFC 9457 standard members; change this to filter or validate
extensions before merging: identify the standard keys (type, title, status,
detail, instance) and either remove any of those keys from exc.extensions or
raise a validation error in ProblemDetailError.__init__ when a collision occurs,
then merge only the allowed extension keys (e.g., filtered_extensions) into
content instead of exc.extensions to ensure standard fields cannot be
overwritten.
🧹 Nitpick comments (1)
src/core/errors.py (1)
111-187: No enforcement that everyProblemTypehas a correspondingPROBLEM_TITLESentry.Adding a new constant to
ProblemTypewithout updatingPROBLEM_TITLESwill silently produce responses withtitle: null. Consider adding a module-level assertion or a test to keep them in sync.# e.g., at module level or in a test: _problem_types = {v for k, v in vars(ProblemType).items() if not k.startswith("_")} assert _problem_types == set(PROBLEM_TITLES), "ProblemType and PROBLEM_TITLES are out of sync"
Support RFC9457 compliant errors.