Skip to content

[MNT] Use new test database image#236

Merged
PGijsbers merged 13 commits intomainfrom
enh/use-new-db-image
Feb 12, 2026
Merged

[MNT] Use new test database image#236
PGijsbers merged 13 commits intomainfrom
enh/use-new-db-image

Conversation

@PGijsbers
Copy link
Contributor

The new test database image has a few differences. It includes some runs and ontologies so it will be useful in the future. It's also self contained so a startup script shouldn't be needed, with helps with test containers (#232).

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 11, 2026

Walkthrough

Updated docker-compose.yaml: database image tag changed and the database-setup service removed. src/config.toml adds a [development] section with allow_test_api_keys = true. src/database/users.py loads configuration and defines APIKey validation pattern at module scope, switching to allow test literals when allow_test_api_keys is true. Tests adjusted: tests/constants.py adds datasets 161–163; multiple dataset-, study-, and user-related tests updated; tests/users.py changes user constants and the ApiKey enum and adds DATASET_130_OWNER.

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adopting a new test database image, which aligns with the primary modifications to docker-compose.yaml and supporting test updates.
Description check ✅ Passed The description is related to the changeset, explaining the motivation behind using a new database image that is self-contained and removes the need for a startup script.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch enh/use-new-db-image

No actionable comments were generated in the recent review. 🎉


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In test_attach_task_to_study_needs_owner, the setup updates study id 7 but the request uses study_id=1 and ApiKey.OWNER_USER; this looks inconsistent with the test name/intent and the previous behavior (non-owner access) and may no longer be verifying the correct scenario.
  • The APIKey type in src/database/users.py now hardcodes specific literal keys (e.g. abc, normaluser), which tightly couples production validation to test credentials; consider keeping the production constraint generic and injecting or configuring test-only keys separately.
  • Several tests now rely on magic ids for studies and datasets (e.g. 7, 161–163); it may be more maintainable to centralize these in descriptive constants so the intent is clearer and updates to the test image only require changes in one place.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `test_attach_task_to_study_needs_owner`, the setup updates study id 7 but the request uses `study_id=1` and `ApiKey.OWNER_USER`; this looks inconsistent with the test name/intent and the previous behavior (non-owner access) and may no longer be verifying the correct scenario.
- The `APIKey` type in `src/database/users.py` now hardcodes specific literal keys (e.g. `abc`, `normaluser`), which tightly couples production validation to test credentials; consider keeping the production constraint generic and injecting or configuring test-only keys separately.
- Several tests now rely on magic ids for studies and datasets (e.g. 7, 161–163); it may be more maintainable to centralize these in descriptive constants so the intent is clearer and updates to the test image only require changes in one place.

## Individual Comments

### Comment 1
<location> `src/database/users.py:9-11` </location>
<code_context>

 # Enforces str is 32 hexadecimal characters, does not check validity.
-APIKey = Annotated[str, StringConstraints(pattern=r"^[0-9a-fA-F]{32}$")]
+APIKey = Annotated[
+    str,
+    StringConstraints(pattern=r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"),
+]

</code_context>

<issue_to_address>
**issue (bug_risk):** Regex alternation is not fully anchored, which allows extra characters for some variants.

Because of the grouping in `r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"`, `^` only anchors the first alternative and `$` only the last. As a result, values like `"abcXYZ"` or `"fooabc"` will incorrectly match. To require the entire string to be one of the allowed values, anchor a non-capturing alternation instead: `r"^(?:[0-9a-fA-F]{32}|abc|normaluser|normaluser2)$"`.
</issue_to_address>

### Comment 2
<location> `src/database/users.py:11` </location>
<code_context>
-APIKey = Annotated[str, StringConstraints(pattern=r"^[0-9a-fA-F]{32}$")]
+APIKey = Annotated[
+    str,
+    StringConstraints(pattern=r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"),
+]

</code_context>

<issue_to_address>
**🚨 question (security):** Allowing specific non-hex API keys in production validation may weaken API key semantics.

By adding literals like `abc`, `normaluser`, and `normaluser2`, this type now permits hard-coded “magic” API keys instead of enforcing the 32-char hex contract. If these are only for tests/fixtures, consider keeping the production type strict (hex-only) and routing test keys through separate configuration or test-only validation, so these values can’t accidentally become valid in real environments.
</issue_to_address>

### Comment 3
<location> `tests/routers/openml/study_test.py:534-526` </location>
<code_context>
+    assert response.json() == {"study_id": 7, "main_entity_type": StudyType.TASK}


 def test_attach_task_to_study_needs_owner(py_api: TestClient, expdb_test: Connection) -> None:
-    expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
+    expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 7"))
     response = _attach_tasks_to_study(
         study_id=1,
         task_ids=[2, 3, 4],
-        api_key="00000000000000000000000000000000",
+        api_key=ApiKey.OWNER_USER,
         py_api=py_api,
         expdb_test=expdb_test,
</code_context>

<issue_to_address>
**issue (testing):** The `test_attach_task_to_study_needs_owner` test now uses an owner API key and mismatched study ids, which likely breaks the intent of the test.

This test should assert that a non-owner cannot attach tasks. As written, it:
- Updates `study.status` for `id = 7` but calls `_attach_tasks_to_study` with `study_id=1`, so the precondition may not hold for the target study.
- Uses `ApiKey.OWNER_USER`, which turns a “needs owner” negative test into an owner / positive-path call.

Please update the same `study_id` in both the SQL and `_attach_tasks_to_study`, and use a non-owner API key (e.g. `ApiKey.SOME_USER`) here, keeping `ApiKey.OWNER_USER` for the success-case test.
</issue_to_address>

### Comment 4
<location> `tests/routers/openml/migration/datasets_migration_test.py:228` </location>
<code_context>
             else:
                 # The old API formats bool as string in lower-case
                 feature[key] = str(value) if not isinstance(value, bool) else str(value).lower()
-    assert python_body == original.json()["data_features"]["feature"]
+    original_features = original.json()["data_features"]["feature"]
+    for feature in original_features:
+        feature.pop("ontology", None)
+    assert python_body == original_features
</code_context>

<issue_to_address>
**suggestion (testing):** Dropping the `ontology` field in `test_datasets_feature_is_identical` removes coverage of the new ontology data.

Popping `ontology` avoids failures but also means we no longer verify any ontology data, despite it being newly added.

To preserve coverage, add explicit checks before popping, for example:

```python
overiginal_features = original.json()["data_features"]["feature"]
for feature in original_features:
    assert "ontology" in feature  # and/or more specific checks
    feature.pop("ontology", None)
```

Alternatively, add a separate test that asserts the expected ontology fields, rather than dropping them silently.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@codecov
Copy link

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@7c0916b). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/database/users.py 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #236   +/-   ##
=======================================
  Coverage        ?   53.58%           
=======================================
  Files           ?       32           
  Lines           ?     1116           
  Branches        ?      101           
=======================================
  Hits            ?      598           
  Misses          ?      516           
  Partials        ?        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/database/users.py`:
- Around line 8-12: The APIKey regex in the APIKey Annotated type is vulnerable
because its alternation is not fully anchored and allows substring matches
(symbol: APIKey, StringConstraints), so replace the pattern with a single
anchored grouped alternation like ^([0-9a-fA-F]{32}|abc|normaluser|normaluser2)$
to ensure full-string matches; remove test-only literals (abc, normaluser,
normaluser2) from production or move them to test helpers if they must exist;
also update the comment above APIKey to accurately state what the pattern
enforces (or remove the stale comment about 32 hex chars).

In `@tests/users.py`:
- Around line 12-16: The test ApiKey literals (ApiKey.ADMIN, ApiKey.SOME_USER,
ApiKey.OWNER_USER) are non-hex and forced a relaxation of the production APIKey
validator; instead, update the test DB image so the users' session_hash values
are 32-character hex strings and restore the strict validator in
src/database/users.py; specifically, change the test DB records for users
corresponding to "abc"/"normaluser"/"normaluser2" to use valid 32-char hex
session_hash values, revert any weakened validation logic, and keep the
ApiKey.StrEnum values matching those hex session_hashes (or leave the enum alone
if it references test keys) so production validation remains strict.
🧹 Nitpick comments (2)
tests/routers/openml/study_test.py (1)

534-543: Line 535: Setting study 7 to in_preparation appears unnecessary here.

This test operates on study 1 (line 537), and the helper _attach_tasks_to_study already sets study 1 to in_preparation (line 514). The UPDATE ... WHERE id = 7 on line 535 has no effect on the test outcome and looks like a copy-paste leftover from test_attach_task_to_study above.

Remove the unnecessary update
 def test_attach_task_to_study_needs_owner(py_api: TestClient, expdb_test: Connection) -> None:
-    expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 7"))
     response = _attach_tasks_to_study(
tests/routers/openml/datasets_list_datasets_test.py (1)

96-102: Consider deriving extra_datasets from constants to avoid duplication.

The hardcoded [161, 162, 163] duplicates knowledge that lives in tests/constants.py (where IN_PREPARATION_ID and DATASETS were updated with these same IDs). If the DB image changes again, this list needs manual sync.

That said, it's a minor concern given the explanatory comment on Line 96 and the test's tight coupling to a specific DB snapshot.

Comment on lines 12 to +16
class ApiKey(StrEnum):
ADMIN = "AD000000000000000000000000000000"
SOME_USER = "00000000000000000000000000000000"
OWNER_USER = "DA1A0000000000000000000000000000"
ADMIN = "abc"
SOME_USER = "normaluser2"
OWNER_USER = "normaluser"
DATASET_130_OWNER = "DA1A0000000000000000000000000000"
Copy link
Contributor

@coderabbitai coderabbitai bot Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Non-hex API keys drive the production validation change — consider updating the test DB instead.

The literal keys "abc", "normaluser", and "normaluser2" don't conform to the hex-format expected by the production APIKey validator, which is why src/database/users.py had to be weakened. A cleaner approach would be to update the session_hash values for these users in the test database image to use 32-character hex strings, keeping production validation strict.

🤖 Prompt for AI Agents
In `@tests/users.py` around lines 12 - 16, The test ApiKey literals (ApiKey.ADMIN,
ApiKey.SOME_USER, ApiKey.OWNER_USER) are non-hex and forced a relaxation of the
production APIKey validator; instead, update the test DB image so the users'
session_hash values are 32-character hex strings and restore the strict
validator in src/database/users.py; specifically, change the test DB records for
users corresponding to "abc"/"normaluser"/"normaluser2" to use valid 32-char hex
session_hash values, revert any weakened validation logic, and keep the
ApiKey.StrEnum values matching those hex session_hashes (or leave the enum alone
if it references test keys) so production validation remains strict.

Copy link
Contributor Author

@PGijsbers PGijsbers Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, which is why I had constructed the previous test database keys that way. But the new keys were chosen by @josvandervelde and are now used in both openml-python and openml java. We need to coordinate changing the keys again, so I am not updating the image for this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Collaborator

@SubhadityaMukherjee SubhadityaMukherjee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/database/users.py`:
- Around line 12-14: The code accesses load_configuration()["development"]
directly which raises KeyError if the [development] section is missing; update
the logic around api_key_pattern to access the development section safely (e.g.,
use load_configuration().get("development", {}) or assign cfg =
load_configuration(); dev = cfg.get("development", {}) ) and then check
dev.get("allow_test_api_keys") before changing api_key_pattern so imports won’t
crash; modify the code that sets api_key_pattern (and any other direct
["development"] usages) to use .get and a sensible default.

Comment on lines +12 to +14
api_key_pattern = r"^[0-9a-fA-F]{32}$"
if load_configuration()["development"].get("allow_test_api_keys"):
api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

["development"] will KeyError if the section is absent from config.toml.

Line 13 mixes strict (["development"]) and safe (.get("allow_test_api_keys")) dictionary access. If a config file omits the [development] section entirely, this crashes at import time. Use .get() consistently or handle the missing section:

Proposed fix
-api_key_pattern = r"^[0-9a-fA-F]{32}$"
-if load_configuration()["development"].get("allow_test_api_keys"):
-    api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
+api_key_pattern = r"^[0-9a-fA-F]{32}$"
+_config = load_configuration()
+if _config.get("development", {}).get("allow_test_api_keys"):
+    api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
#!/bin/bash
# Check if [development] section is always present in config files
fd "config.toml" --exec cat {}
🤖 Prompt for AI Agents
In `@src/database/users.py` around lines 12 - 14, The code accesses
load_configuration()["development"] directly which raises KeyError if the
[development] section is missing; update the logic around api_key_pattern to
access the development section safely (e.g., use
load_configuration().get("development", {}) or assign cfg =
load_configuration(); dev = cfg.get("development", {}) ) and then check
dev.get("allow_test_api_keys") before changing api_key_pattern so imports won’t
crash; modify the code that sets api_key_pattern (and any other direct
["development"] usages) to use .get and a sensible default.

@SubhadityaMukherjee SubhadityaMukherjee dismissed their stale review February 11, 2026 15:09

I’m somehow unable to accept all changes but all have been fixed so I’m just dismissing it

@PGijsbers PGijsbers merged commit 294ca47 into main Feb 12, 2026
9 checks passed
@PGijsbers PGijsbers deleted the enh/use-new-db-image branch February 12, 2026 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants