Conversation
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
No actionable comments were generated in the recent review.
Hey - I've found 4 issues and left some high-level feedback:
- In `test_attach_task_to_study_needs_owner`, the setup updates study id 7 but the request uses `study_id=1` and `ApiKey.OWNER_USER`; this looks inconsistent with the test name/intent and the previous behavior (non-owner access) and may no longer be verifying the correct scenario.
- The `APIKey` type in `src/database/users.py` now hardcodes specific literal keys (e.g. `abc`, `normaluser`), which tightly couples production validation to test credentials; consider keeping the production constraint generic and injecting or configuring test-only keys separately.
- Several tests now rely on magic ids for studies and datasets (e.g. 7, 161–163); it may be more maintainable to centralize these in descriptive constants so the intent is clearer and updates to the test image only require changes in one place.
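On that last point, a small sketch of what descriptive constants could look like (the names here are hypothetical; the project's `tests/constants.py` may already define equivalents):

```python
# tests/constants.py (illustrative names only)
TASK_STUDY_ID = 7                   # study targeted by the attach-task tests
NEW_DATASET_IDS = [161, 162, 163]   # datasets added with the new test DB image
```

Tests would then reference these names instead of repeating the literal ids.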
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `test_attach_task_to_study_needs_owner`, the setup updates study id 7 but the request uses `study_id=1` and `ApiKey.OWNER_USER`; this looks inconsistent with the test name/intent and the previous behavior (non-owner access) and may no longer be verifying the correct scenario.
- The `APIKey` type in `src/database/users.py` now hardcodes specific literal keys (e.g. `abc`, `normaluser`), which tightly couples production validation to test credentials; consider keeping the production constraint generic and injecting or configuring test-only keys separately.
- Several tests now rely on magic ids for studies and datasets (e.g. 7, 161–163); it may be more maintainable to centralize these in descriptive constants so the intent is clearer and updates to the test image only require changes in one place.
## Individual Comments
### Comment 1
<location> `src/database/users.py:9-11` </location>
<code_context>
# Enforces str is 32 hexadecimal characters, does not check validity.
-APIKey = Annotated[str, StringConstraints(pattern=r"^[0-9a-fA-F]{32}$")]
+APIKey = Annotated[
+ str,
+ StringConstraints(pattern=r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"),
+]
</code_context>
<issue_to_address>
**issue (bug_risk):** Regex alternation is not fully anchored, which allows extra characters for some variants.
Because of the grouping in `r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"`, `^` only anchors the first alternative and `$` only the last. As a result, values like `"abcXYZ"` or `"fooabc"` will incorrectly match. To require the entire string to be one of the allowed values, anchor a non-capturing alternation instead: `r"^(?:[0-9a-fA-F]{32}|abc|normaluser|normaluser2)$"`.
</issue_to_address>
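To see the effect outside Pydantic, a quick check with the standard `re` module (Pydantic's exact matching semantics may differ, but the unanchored alternation is the hazard either way):

```python
import re

loose = r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"   # current pattern
strict = r"^(?:[0-9a-fA-F]{32}|abc|normaluser|normaluser2)$"      # fully anchored

for candidate in ("abc", "abcXYZ", "fooabc"):
    print(candidate, bool(re.search(loose, candidate)), bool(re.search(strict, candidate)))
# abc    True True
# abcXYZ True False   <- substring match slips through the loose pattern
# fooabc True False
```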
### Comment 2
<location> `src/database/users.py:11` </location>
<code_context>
-APIKey = Annotated[str, StringConstraints(pattern=r"^[0-9a-fA-F]{32}$")]
+APIKey = Annotated[
+ str,
+ StringConstraints(pattern=r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"),
+]
</code_context>
<issue_to_address>
**🚨 question (security):** Allowing specific non-hex API keys in production validation may weaken API key semantics.
By adding literals like `abc`, `normaluser`, and `normaluser2`, this type now permits hard-coded “magic” API keys instead of enforcing the 32-char hex contract. If these are only for tests/fixtures, consider keeping the production type strict (hex-only) and routing test keys through separate configuration or test-only validation, so these values can’t accidentally become valid in real environments.
</issue_to_address>
### Comment 3
<location> `tests/routers/openml/study_test.py:534-526` </location>
<code_context>
+ assert response.json() == {"study_id": 7, "main_entity_type": StudyType.TASK}
def test_attach_task_to_study_needs_owner(py_api: TestClient, expdb_test: Connection) -> None:
- expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
+ expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 7"))
response = _attach_tasks_to_study(
study_id=1,
task_ids=[2, 3, 4],
- api_key="00000000000000000000000000000000",
+ api_key=ApiKey.OWNER_USER,
py_api=py_api,
expdb_test=expdb_test,
</code_context>
<issue_to_address>
**issue (testing):** The `test_attach_task_to_study_needs_owner` test now uses an owner API key and mismatched study ids, which likely breaks the intent of the test.
This test should assert that a non-owner cannot attach tasks. As written, it:
- Updates `study.status` for `id = 7` but calls `_attach_tasks_to_study` with `study_id=1`, so the precondition may not hold for the target study.
- Uses `ApiKey.OWNER_USER`, which turns a “needs owner” negative test into an owner / positive-path call.
Please use the same `study_id` in both the SQL and the `_attach_tasks_to_study` call, and use a non-owner API key (e.g. `ApiKey.SOME_USER`) here, keeping `ApiKey.OWNER_USER` for the success-case test.
</issue_to_address>
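A sketch of the shape that suggestion implies, reusing the suite's existing helper and fixtures (the assertions on the rejection response are left to the existing test body):

```python
def test_attach_task_to_study_needs_owner(py_api: TestClient, expdb_test: Connection) -> None:
    # Same study id in the precondition and in the request under test.
    expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
    response = _attach_tasks_to_study(
        study_id=1,
        task_ids=[2, 3, 4],
        api_key=ApiKey.SOME_USER,  # non-owner; ApiKey.OWNER_USER stays in the success-case test
        py_api=py_api,
        expdb_test=expdb_test,
    )
    # ...assert the non-owner request is rejected, as in the original test.
```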
### Comment 4
<location> `tests/routers/openml/migration/datasets_migration_test.py:228` </location>
<code_context>
else:
# The old API formats bool as string in lower-case
feature[key] = str(value) if not isinstance(value, bool) else str(value).lower()
- assert python_body == original.json()["data_features"]["feature"]
+ original_features = original.json()["data_features"]["feature"]
+ for feature in original_features:
+ feature.pop("ontology", None)
+ assert python_body == original_features
</code_context>
<issue_to_address>
**suggestion (testing):** Dropping the `ontology` field in `test_datasets_feature_is_identical` removes coverage of the new ontology data.
Popping `ontology` avoids failures but also means we no longer verify any ontology data, despite it being newly added.
To preserve coverage, add explicit checks before popping, for example:
```python
original_features = original.json()["data_features"]["feature"]
for feature in original_features:
assert "ontology" in feature # and/or more specific checks
feature.pop("ontology", None)
```
Alternatively, add a separate test that asserts the expected ontology fields, rather than dropping them silently.
</issue_to_address>
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main     #236   +/-   ##
=======================================
  Coverage        ?   53.58%
=======================================
  Files           ?       32
  Lines           ?     1116
  Branches        ?      101
=======================================
  Hits            ?      598
  Misses          ?      516
  Partials        ?        2
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/database/users.py`:
- Around line 8-12: The APIKey regex in the APIKey Annotated type is vulnerable
because its alternation is not fully anchored and allows substring matches
(symbol: APIKey, StringConstraints), so replace the pattern with a single
anchored grouped alternation like ^([0-9a-fA-F]{32}|abc|normaluser|normaluser2)$
to ensure full-string matches; remove test-only literals (abc, normaluser,
normaluser2) from production or move them to test helpers if they must exist;
also update the comment above APIKey to accurately state what the pattern
enforces (or remove the stale comment about 32 hex chars).
In `@tests/users.py`:
- Around line 12-16: The test ApiKey literals (ApiKey.ADMIN, ApiKey.SOME_USER,
ApiKey.OWNER_USER) are non-hex and forced a relaxation of the production APIKey
validator; instead, update the test DB image so the users' session_hash values
are 32-character hex strings and restore the strict validator in
src/database/users.py; specifically, change the test DB records for users
corresponding to "abc"/"normaluser"/"normaluser2" to use valid 32-char hex
session_hash values, revert any weakened validation logic, and keep the
ApiKey.StrEnum values matching those hex session_hashes (or leave the enum alone
if it references test keys) so production validation remains strict.
🧹 Nitpick comments (2)
tests/routers/openml/study_test.py (1)

534-543: Line 535: Setting study 7 to `in_preparation` appears unnecessary here.

This test operates on study 1 (line 537), and the helper `_attach_tasks_to_study` already sets study 1 to `in_preparation` (line 514). The `UPDATE ... WHERE id = 7` on line 535 has no effect on the test outcome and looks like a copy-paste leftover from `test_attach_task_to_study` above.

Remove the unnecessary update:

```diff
 def test_attach_task_to_study_needs_owner(py_api: TestClient, expdb_test: Connection) -> None:
-    expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 7"))
     response = _attach_tasks_to_study(
```

tests/routers/openml/datasets_list_datasets_test.py (1)

96-102: Consider deriving `extra_datasets` from constants to avoid duplication.

The hardcoded `[161, 162, 163]` duplicates knowledge that lives in `tests/constants.py` (where `IN_PREPARATION_ID` and `DATASETS` were updated with these same IDs). If the DB image changes again, this list needs manual sync.

That said, it's a minor concern given the explanatory comment on line 96 and the test's tight coupling to a specific DB snapshot.
```diff
 class ApiKey(StrEnum):
-    ADMIN = "AD000000000000000000000000000000"
-    SOME_USER = "00000000000000000000000000000000"
-    OWNER_USER = "DA1A0000000000000000000000000000"
+    ADMIN = "abc"
+    SOME_USER = "normaluser2"
+    OWNER_USER = "normaluser"
+    DATASET_130_OWNER = "DA1A0000000000000000000000000000"
```
🛠️ Refactor suggestion | 🟠 Major
Non-hex API keys drive the production validation change — consider updating the test DB instead.
The literal keys "abc", "normaluser", and "normaluser2" don't conform to the hex-format expected by the production APIKey validator, which is why src/database/users.py had to be weakened. A cleaner approach would be to update the session_hash values for these users in the test database image to use 32-character hex strings, keeping production validation strict.
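If the test image were updated instead, fixed 32-character hex keys are easy to generate once and bake into the fixtures (the values produced below are illustrative, not the project's actual keys):

```python
import secrets

# Each call yields 32 hexadecimal characters, matching the strict APIKey pattern.
for _ in range(3):
    print(secrets.token_hex(16))
```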
I agree, which is why I had constructed the previous test database keys that way. But the new keys were chosen by @josvandervelde and are now used in both openml-python and openml java. We need to coordinate changing the keys again, so I am not updating the image for this PR.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
SubhadityaMukherjee left a comment
minor comments
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/database/users.py`:
- Around line 12-14: The code accesses load_configuration()["development"]
directly which raises KeyError if the [development] section is missing; update
the logic around api_key_pattern to access the development section safely (e.g.,
use load_configuration().get("development", {}) or assign cfg =
load_configuration(); dev = cfg.get("development", {}) ) and then check
dev.get("allow_test_api_keys") before changing api_key_pattern so imports won’t
crash; modify the code that sets api_key_pattern (and any other direct
["development"] usages) to use .get and a sensible default.
```python
api_key_pattern = r"^[0-9a-fA-F]{32}$"
if load_configuration()["development"].get("allow_test_api_keys"):
    api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
```
["development"] will KeyError if the section is absent from config.toml.
Line 13 mixes strict (["development"]) and safe (.get("allow_test_api_keys")) dictionary access. If a config file omits the [development] section entirely, this crashes at import time. Use .get() consistently or handle the missing section:
Proposed fix
```diff
-api_key_pattern = r"^[0-9a-fA-F]{32}$"
-if load_configuration()["development"].get("allow_test_api_keys"):
-    api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
+api_key_pattern = r"^[0-9a-fA-F]{32}$"
+_config = load_configuration()
+if _config.get("development", {}).get("allow_test_api_keys"):
+    api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
```

```bash
#!/bin/bash
# Check if [development] section is always present in config files
fd "config.toml" --exec cat {}
```
I’m somehow unable to accept all changes but all have been fixed so I’m just dismissing it
The new test database image has a few differences. It includes some runs and ontologies, so it will be useful in the future. It's also self-contained, so a startup script shouldn't be needed, which helps with test containers (#232).
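For illustration, a self-contained image can be started straight from a test without any bootstrap script; a minimal sketch with `testcontainers` (the image name and readiness log line are assumptions, not the project's actual values):

```python
from testcontainers.core.container import DockerContainer
from testcontainers.core.waiting_utils import wait_for_logs

# Hypothetical image name; the real test database image is defined by the project.
with DockerContainer("openml/test-database:latest").with_exposed_ports(3306) as db:
    wait_for_logs(db, "ready for connections")   # assumed readiness message
    host = db.get_container_host_ip()
    port = db.get_exposed_port(3306)
    # Tests would point their database connection at host:port here.
```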