
Update to function as out-of-the-box test server#13

Open
PGijsbers wants to merge 16 commits intomainfrom
setup-test-locally

Conversation

Contributor

@PGijsbers PGijsbers commented Jan 26, 2026

This updates the routing and data of the images to allow an out-of-the-box test server on a local machine.

Currently the updated configuration allows running the openml-python unit tests that require the test server (see openml/openml-python#1630).

I still have to cross-check that I didn't break other functionality in the process.

NGINX now also listens on port 8000 on the Docker network.
This is an important step towards being able to start these `services`
and have them function as a local test server for openml-python,
among others.

# Update openml.expdb.dataset with the same url
mysql -hdatabase -uroot -pok -e 'UPDATE openml_expdb.dataset DS, openml.file FL SET DS.url = FL.filepath WHERE DS.did = FL.id;'

Contributor Author

@PGijsbers PGijsbers Jan 30, 2026

These removed updates are now embedded in the state of the database on the new image
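One way to check that the new image indeed embeds the updated URLs (a sketch only; same tables and credentials as the removed command above):

# Hypothetical sanity check, run from a container on the same Docker network:
mysql -hdatabase -uroot -pok -e 'SELECT DS.did, DS.url, FL.filepath FROM openml_expdb.dataset DS JOIN openml.file FL ON DS.did = FL.id LIMIT 5;'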

sed -i -E 's/^(::1\t)localhost (.*)$/\1\2/g' /etc/hosts.new
cat /etc/hosts.new > /etc/hosts
rm /etc/hosts.new

Contributor Author

For other containers, updating /etc/hosts through configuration was sufficient.
For this one, the pre-existing /etc/hosts took precedence, so it needed to be updated.
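For illustration, a sketch of the effect of the sed call above, assuming a typical default /etc/hosts layout:

# before: ::1	localhost ip6-localhost ip6-loopback
# after:  ::1	ip6-localhost ip6-loopback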

- "8000:8000"
networks:
default:
ipv4_address: 172.28.0.2
Contributor Author

The static IP address is required so that we can add entries to the /etc/hosts file of other containers, making them contact NGINX when they resolve localhost.
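For example, a minimal sketch of how another container could be pointed at that address, assuming Compose's extra_hosts option and a hypothetical openml-python service entry:

  # Hypothetical service: resolve "localhost" to the NGINX container's static IP above.
  openml-python:
    extra_hosts:
      - "localhost:172.28.0.2"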

@PGijsbers PGijsbers changed the title from "[WIP] Update to function as out-of-the-box test server" to "Update to function as out-of-the-box test server" on Feb 5, 2026
@PGijsbers PGijsbers marked this pull request as ready for review February 5, 2026 15:25
@@ -1,4 +1,4 @@
-CONFIG=api_key=AD000000000000000000000000000000;server=http://php-api:80/
+CONFIG=api_key=abc;server=http://php-api:80/

I don't understand: here the API key is changed from AD000000000000000000000000000000 to abc ...

Contributor Author

AD000000000000000000000000000000 was the API key in the old test database image, but this has been changed to abc to match the test server database.

Contributor Author

The evaluation engine currently needs administrator access.

Comment on lines +1 to +2
apikey=normaluser
server=http://localhost:8000/api/v1/xml

... and here the API key is changed from AD000000000000000000000000000000 to normaluser.

So far, these were the keys for developers:

php-api (v1) test-server: normaluser
php-api (v1) local-server: AD000000000000000000000000000000

Has anything changed here?

Also, what are the API keys for python-api (v2), now that it will also be added to the services with a frozen Docker image?

Contributor Author

This configuration is just for when you spin up an openml-python container to use the Python API. That container does not need administrator access, so I changed the key to normaluser, which is a normal read-write account.
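Equivalently (a sketch, assuming the openml-python configuration module), the same values can be set from Python code:

import openml

# Point the client at the local test server and use the normal read-write account.
openml.config.server = "http://localhost:8000/api/v1/xml"
openml.config.apikey = "normaluser"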

Contributor Author

The Python-based REST API uses the keys that are in the database. The server is unaffected, but I will need to update the keys that are used in its tests.

Member

@josvandervelde josvandervelde left a comment

Looking good! I encountered some problems when using Python to connect to the locally running containers.

minio:
  profiles: ["all", "minio", "evaluation-engine"]
- image: openml/test-minio:v0.1.20241110
+ image: openml/test-minio:v0.1.20260204
Member

This minio contains most parquet files out of the box, but not all!

bash-5.1# ls /data/datasets/0000/0001
dataset_1.pq  phpFsFYVN
bash-5.1# ls /data/datasets/0000/0128
iris.arff

This is probably a mistake?

Also, it contains some weird files:

bash-5.1# ls /data/datasets/0000
 0000           '0000?C=S;O=A'   '0000?C=D;O=A' '0000?C=M;O=A' '0000?C=N;O=D'  ....

Contributor Author

Apparently the weird files are Apache mod_autoindex pages: https://httpd.apache.org/docs/2.4/mod/mod_autoindex.html
Harmless, but I'll update the wget command to exclude them.
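A sketch of what that exclusion could look like (the URL is a placeholder; only the reject pattern is the point), using wget's --reject-regex option, which matches against the full URL:

# Hypothetical mirror command; the regex drops the mod_autoindex sort links such as "0000?C=S;O=A".
wget --recursive --no-parent --reject-regex '\?C=' https://example.org/data/datasets/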

Contributor Author

The omission of 128 was accidental, but it turned out to be useful for the openml-python API tests that require an ARFF file (which isn't easily downloaded anymore if parquet files are present). I will hold off on adding that parquet file because:

  • I would need to update openml-python (or at least its tests) accordingly
  • Services should be able to handle a missing parquet file for now, as not all datasets have parquet files in production either (see the sketch below)

As for the reason it was skipped: that's worth looking into. For now, I'll add a note to the readme.
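A sketch of the kind of fallback meant in the second bullet (hypothetical helper and file patterns; not openml-python's actual loading code):

from pathlib import Path

import pandas as pd
from scipy.io import arff


def load_dataset(directory: Path) -> pd.DataFrame:
    # Prefer a parquet file; fall back to the ARFF file when no parquet is present.
    parquet_files = sorted(directory.glob("*.pq"))
    if parquet_files:
        return pd.read_parquet(parquet_files[0])
    data, _meta = arff.loadarff(next(directory.glob("*.arff")))
    return pd.DataFrame(data)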

my_task = openml.tasks.get_task(my_task.task_id)
from sklearn import compose, ensemble, impute, neighbors, preprocessing, pipeline, tree
clf = tree.DecisionTreeClassifier()
run = openml.runs.run_model_on_task(clf, my_task)
Member

I get errors here:

OSError: Repetition level histogram size mismatch on

Traceback (most recent call last):
  File "/openml/openml/datasets/dataset.py", line 593, in _parse_data_from_pq
    data = pd.read_parquet(data_file)

It seems to have something to do with the pyarrow version in openml-python. Maybe unrelated to this PR, but I haven't seen these problems before. Do you see these problems as well?

Contributor Author

Yes, I sent a message on Slack about it. Basically, the openml-python image is so outdated that the newly generated parquet files cannot be loaded. If you use a shell as the entrypoint and first update pyarrow and pandas, it works fine.
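For reference, a sketch of that workaround (image name and commands assumed; adjust to whichever openml-python image the compose file uses):

# Hypothetical: start the container with a shell instead of its default entrypoint ...
docker run -it --rm --entrypoint /bin/bash openml/openml-python
# ... then, inside the container, upgrade the parquet stack before reading the new files.
pip install --upgrade pyarrow pandas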
