Conversation
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
No actionable comments were generated in the recent review.
Hey - I've found 4 issues and left some high-level feedback:
- In `test_attach_task_to_study_needs_owner`, the setup updates study id 7 but the request uses `study_id=1` and `ApiKey.OWNER_USER`; this looks inconsistent with the test name/intent and the previous behavior (non-owner access) and may no longer be verifying the correct scenario.
- The `APIKey` type in `src/database/users.py` now hardcodes specific literal keys (e.g. `abc`, `normaluser`), which tightly couples production validation to test credentials; consider keeping the production constraint generic and injecting or configuring test-only keys separately.
- Several tests now rely on magic ids for studies and datasets (e.g. 7, 161–163); it may be more maintainable to centralize these in descriptive constants so the intent is clearer and updates to the test image only require changes in one place.
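On that last point, a small sketch of what descriptive constants could look like (the names here are hypothetical; the project's `tests/constants.py` may already define equivalents):

```python
# tests/constants.py (illustrative names only)
TASK_STUDY_ID = 7                   # study targeted by the attach-task tests
NEW_DATASET_IDS = [161, 162, 163]   # datasets added with the new test DB image
```

Tests would then reference these names instead of repeating the literal ids.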
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `test_attach_task_to_study_needs_owner`, the setup updates study id 7 but the request uses `study_id=1` and `ApiKey.OWNER_USER`; this looks inconsistent with the test name/intent and the previous behavior (non-owner access) and may no longer be verifying the correct scenario.
- The `APIKey` type in `src/database/users.py` now hardcodes specific literal keys (e.g. `abc`, `normaluser`), which tightly couples production validation to test credentials; consider keeping the production constraint generic and injecting or configuring test-only keys separately.
- Several tests now rely on magic ids for studies and datasets (e.g. 7, 161–163); it may be more maintainable to centralize these in descriptive constants so the intent is clearer and updates to the test image only require changes in one place.
## Individual Comments
### Comment 1
<location> `src/database/users.py:9-11` </location>
<code_context>
# Enforces str is 32 hexadecimal characters, does not check validity.
-APIKey = Annotated[str, StringConstraints(pattern=r"^[0-9a-fA-F]{32}$")]
+APIKey = Annotated[
+ str,
+ StringConstraints(pattern=r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"),
+]
</code_context>
<issue_to_address>
**issue (bug_risk):** Regex alternation is not fully anchored, which allows extra characters for some variants.
Because of the grouping in `r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"`, `^` only anchors the first alternative and `$` only the last. As a result, values like `"abcXYZ"` or `"fooabc"` will incorrectly match. To require the entire string to be one of the allowed values, anchor a non-capturing alternation instead: `r"^(?:[0-9a-fA-F]{32}|abc|normaluser|normaluser2)$"`.
</issue_to_address>
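To see the effect outside Pydantic, a quick check with the standard `re` module (Pydantic's exact matching semantics may differ, but the unanchored alternation is the hazard either way):

```python
import re

loose = r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"   # current pattern
strict = r"^(?:[0-9a-fA-F]{32}|abc|normaluser|normaluser2)$"      # fully anchored

for candidate in ("abc", "abcXYZ", "fooabc"):
    print(candidate, bool(re.search(loose, candidate)), bool(re.search(strict, candidate)))
# abc    True True
# abcXYZ True False   <- substring match slips through the loose pattern
# fooabc True False
```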
### Comment 2
<location> `src/database/users.py:11` </location>
<code_context>
-APIKey = Annotated[str, StringConstraints(pattern=r"^[0-9a-fA-F]{32}$")]
+APIKey = Annotated[
+ str,
+ StringConstraints(pattern=r"^([0-9a-fA-F]{32})|(abc)|(normaluser)|(normaluser2)$"),
+]
</code_context>
<issue_to_address>
**🚨 question (security):** Allowing specific non-hex API keys in production validation may weaken API key semantics.
By adding literals like `abc`, `normaluser`, and `normaluser2`, this type now permits hard-coded “magic” API keys instead of enforcing the 32-char hex contract. If these are only for tests/fixtures, consider keeping the production type strict (hex-only) and routing test keys through separate configuration or test-only validation, so these values can’t accidentally become valid in real environments.
</issue_to_address>
### Comment 3
<location> `tests/routers/openml/study_test.py:534-526` </location>
<code_context>
+ assert response.json() == {"study_id": 7, "main_entity_type": StudyType.TASK}
def test_attach_task_to_study_needs_owner(py_api: TestClient, expdb_test: Connection) -> None:
- expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
+ expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 7"))
response = _attach_tasks_to_study(
study_id=1,
task_ids=[2, 3, 4],
- api_key="00000000000000000000000000000000",
+ api_key=ApiKey.OWNER_USER,
py_api=py_api,
expdb_test=expdb_test,
</code_context>
<issue_to_address>
**issue (testing):** The `test_attach_task_to_study_needs_owner` test now uses an owner API key and mismatched study ids, which likely breaks the intent of the test.
This test should assert that a non-owner cannot attach tasks. As written, it:
- Updates `study.status` for `id = 7` but calls `_attach_tasks_to_study` with `study_id=1`, so the precondition may not hold for the target study.
- Uses `ApiKey.OWNER_USER`, which turns a “needs owner” negative test into an owner / positive-path call.
Please use the same `study_id` in both the SQL and the `_attach_tasks_to_study` call, and use a non-owner API key (e.g. `ApiKey.SOME_USER`) here, keeping `ApiKey.OWNER_USER` for the success-case test.
</issue_to_address>
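A sketch of the shape that suggestion implies, reusing the suite's existing helper and fixtures (the assertions on the rejection response are left to the existing test body):

```python
def test_attach_task_to_study_needs_owner(py_api: TestClient, expdb_test: Connection) -> None:
    # Same study id in the precondition and in the request under test.
    expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 1"))
    response = _attach_tasks_to_study(
        study_id=1,
        task_ids=[2, 3, 4],
        api_key=ApiKey.SOME_USER,  # non-owner; ApiKey.OWNER_USER stays in the success-case test
        py_api=py_api,
        expdb_test=expdb_test,
    )
    # ...assert the non-owner request is rejected, as in the original test.
```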
### Comment 4
<location> `tests/routers/openml/migration/datasets_migration_test.py:228` </location>
<code_context>
else:
# The old API formats bool as string in lower-case
feature[key] = str(value) if not isinstance(value, bool) else str(value).lower()
- assert python_body == original.json()["data_features"]["feature"]
+ original_features = original.json()["data_features"]["feature"]
+ for feature in original_features:
+ feature.pop("ontology", None)
+ assert python_body == original_features
</code_context>
<issue_to_address>
**suggestion (testing):** Dropping the `ontology` field in `test_datasets_feature_is_identical` removes coverage of the new ontology data.
Popping `ontology` avoids failures but also means we no longer verify any ontology data, despite it being newly added.
To preserve coverage, add explicit checks before popping, for example:
```python
original_features = original.json()["data_features"]["feature"]
for feature in original_features:
assert "ontology" in feature # and/or more specific checks
feature.pop("ontology", None)
```
Alternatively, add a separate test that asserts the expected ontology fields, rather than dropping them silently.
</issue_to_address>
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main     #236   +/-   ##
=======================================
  Coverage        ?   53.58%
=======================================
  Files           ?       32
  Lines           ?     1116
  Branches        ?      101
=======================================
  Hits            ?      598
  Misses          ?      516
  Partials        ?        2
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/database/users.py`:
- Around line 8-12: The APIKey regex in the APIKey Annotated type is vulnerable
because its alternation is not fully anchored and allows substring matches
(symbol: APIKey, StringConstraints), so replace the pattern with a single
anchored grouped alternation like ^([0-9a-fA-F]{32}|abc|normaluser|normaluser2)$
to ensure full-string matches; remove test-only literals (abc, normaluser,
normaluser2) from production or move them to test helpers if they must exist;
also update the comment above APIKey to accurately state what the pattern
enforces (or remove the stale comment about 32 hex chars).
In `@tests/users.py`:
- Around line 12-16: The test ApiKey literals (ApiKey.ADMIN, ApiKey.SOME_USER,
ApiKey.OWNER_USER) are non-hex and forced a relaxation of the production APIKey
validator; instead, update the test DB image so the users' session_hash values
are 32-character hex strings and restore the strict validator in
src/database/users.py; specifically, change the test DB records for users
corresponding to "abc"/"normaluser"/"normaluser2" to use valid 32-char hex
session_hash values, revert any weakened validation logic, and keep the
ApiKey.StrEnum values matching those hex session_hashes (or leave the enum alone
if it references test keys) so production validation remains strict.
🧹 Nitpick comments (2)
tests/routers/openml/study_test.py (1)

534-543: Line 535: Setting study 7 to `in_preparation` appears unnecessary here.

This test operates on study 1 (line 537), and the helper `_attach_tasks_to_study` already sets study 1 to `in_preparation` (line 514). The `UPDATE ... WHERE id = 7` on line 535 has no effect on the test outcome and looks like a copy-paste leftover from `test_attach_task_to_study` above.

Remove the unnecessary update:

```diff
 def test_attach_task_to_study_needs_owner(py_api: TestClient, expdb_test: Connection) -> None:
-    expdb_test.execute(text("UPDATE study SET status = 'in_preparation' WHERE id = 7"))
     response = _attach_tasks_to_study(
```

tests/routers/openml/datasets_list_datasets_test.py (1)

96-102: Consider deriving `extra_datasets` from constants to avoid duplication.

The hardcoded `[161, 162, 163]` duplicates knowledge that lives in `tests/constants.py` (where `IN_PREPARATION_ID` and `DATASETS` were updated with these same IDs). If the DB image changes again, this list needs manual sync.

That said, it's a minor concern given the explanatory comment on line 96 and the test's tight coupling to a specific DB snapshot.
```diff
 class ApiKey(StrEnum):
-    ADMIN = "AD000000000000000000000000000000"
-    SOME_USER = "00000000000000000000000000000000"
-    OWNER_USER = "DA1A0000000000000000000000000000"
+    ADMIN = "abc"
+    SOME_USER = "normaluser2"
+    OWNER_USER = "normaluser"
+    DATASET_130_OWNER = "DA1A0000000000000000000000000000"
```
🛠️ Refactor suggestion | 🟠 Major
Non-hex API keys drive the production validation change — consider updating the test DB instead.
The literal keys "abc", "normaluser", and "normaluser2" don't conform to the hex-format expected by the production APIKey validator, which is why src/database/users.py had to be weakened. A cleaner approach would be to update the session_hash values for these users in the test database image to use 32-character hex strings, keeping production validation strict.
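If the test image were updated instead, fixed 32-character hex keys are easy to generate once and bake into the fixtures (the values produced below are illustrative, not the project's actual keys):

```python
import secrets

# Each call yields 32 hexadecimal characters, matching the strict APIKey pattern.
for _ in range(3):
    print(secrets.token_hex(16))
```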
I agree, which is why I had constructed the previous test database keys that way. But the new keys were chosen by @josvandervelde and are now used in both openml-python and openml java. We need to coordinate changing the keys again, so I am not updating the image for this PR.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
SubhadityaMukherjee left a comment
minor comments
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/database/users.py`:
- Around line 12-14: The code accesses load_configuration()["development"]
directly which raises KeyError if the [development] section is missing; update
the logic around api_key_pattern to access the development section safely (e.g.,
use load_configuration().get("development", {}) or assign cfg =
load_configuration(); dev = cfg.get("development", {}) ) and then check
dev.get("allow_test_api_keys") before changing api_key_pattern so imports won’t
crash; modify the code that sets api_key_pattern (and any other direct
["development"] usages) to use .get and a sensible default.
```python
api_key_pattern = r"^[0-9a-fA-F]{32}$"
if load_configuration()["development"].get("allow_test_api_keys"):
    api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
```
["development"] will KeyError if the section is absent from config.toml.
Line 13 mixes strict (["development"]) and safe (.get("allow_test_api_keys")) dictionary access. If a config file omits the [development] section entirely, this crashes at import time. Use .get() consistently or handle the missing section:
Proposed fix
```diff
-api_key_pattern = r"^[0-9a-fA-F]{32}$"
-if load_configuration()["development"].get("allow_test_api_keys"):
-    api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
+api_key_pattern = r"^[0-9a-fA-F]{32}$"
+_config = load_configuration()
+if _config.get("development", {}).get("allow_test_api_keys"):
+    api_key_pattern = r"^([0-9a-fA-F]{32}|normaluser|normaluser2|abc)$"
```

```bash
#!/bin/bash
# Check if [development] section is always present in config files
fd "config.toml" --exec cat {}
```
I’m somehow unable to accept all changes but all have been fixed so I’m just dismissing it
The new test database image has a few differences. It includes some runs and ontologies, so it will be useful in the future. It's also self-contained, so a startup script shouldn't be needed, which helps with test containers (#232).
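For illustration, a self-contained image can be started straight from a test without any bootstrap script; a minimal sketch with `testcontainers` (the image name and readiness log line are assumptions, not the project's actual values):

```python
from testcontainers.core.container import DockerContainer
from testcontainers.core.waiting_utils import wait_for_logs

# Hypothetical image name; the real test database image is defined by the project.
with DockerContainer("openml/test-database:latest").with_exposed_ports(3306) as db:
    wait_for_logs(db, "ready for connections")   # assumed readiness message
    host = db.get_container_host_ip()
    port = db.get_exposed_port(3306)
    # Tests would point their database connection at host:port here.
```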