feat: add SLURM integration test workflow #28
Conversation
Add a GitHub Actions workflow that sets up a real SLURM cluster with Apptainer on a GPU runner to test the schedule_evals workflow end-to-end.

- New workflow: slurm-integration.yml
  - Sets up SLURM (slurmctld + slurmd) on an AWS GPU runner
  - Installs Apptainer and builds the test container
  - Pre-downloads the tiny-gpt2 model and the arc_easy dataset
  - Runs an integration test that validates the full workflow
- New test container: tests/integration/ci.def
  - Based on PyTorch with CUDA support
  - Includes lm_eval and dependencies
- New integration test: tests/integration/test_slurm.py
  - Submits a real SLURM job via schedule_evals
  - Waits for completion and validates the results JSON (see the sketch after this list)
- Updated clusters.yaml with the CI cluster configuration
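For illustration, the end-to-end test could look roughly like the sketch below. This is a minimal sketch only: the `python -m schedule_evals` invocation, its flags, the results path, and the polling helper are assumptions, not the actual test code in tests/integration/test_slurm.py.

```python
# Minimal sketch of an end-to-end SLURM integration test.
# Hypothetical names: the schedule_evals entry point, its flags, the results
# file layout, and the timeouts are assumptions, not the repository's API.
import json
import subprocess
import time
from pathlib import Path


def wait_for_slurm_queue_to_drain(timeout_s: int = 900, poll_s: int = 15) -> None:
    """Poll `squeue` until no jobs remain or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        out = subprocess.run(["squeue", "--noheader"], capture_output=True, text=True)
        if not out.stdout.strip():
            return
        time.sleep(poll_s)
    raise TimeoutError("SLURM job did not finish in time")


def test_schedule_evals_end_to_end(tmp_path: Path) -> None:
    # Submit a real SLURM job via the schedule_evals workflow (invocation assumed).
    subprocess.run(
        ["python", "-m", "schedule_evals", "--cluster", "ci", "--output-dir", str(tmp_path)],
        check=True,
    )
    wait_for_slurm_queue_to_drain()

    # Validate that the run produced a parsable results JSON with at least one metric.
    results = list(tmp_path.rglob("results*.json"))
    assert results, "no results JSON written by the evaluation job"
    payload = json.loads(results[0].read_text())
    assert payload.get("results"), "results JSON contains no task metrics"
```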
Switch to CPU instances (i7ie) since GPU quota is not available:

- Remove GPU/GRES configuration from the SLURM setup
- Update the test to support --dry-run mode for CPU-only testing (see the sketch below)
- Validate sbatch script generation without actual job execution
- Update the CI cluster config to use the debug partition

Full GPU testing can be re-enabled later when quota is available.
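In CPU-only dry-run mode the test can stop short of submission and only check the generated sbatch script. A minimal sketch, again with assumed flag names and run-dir layout:

```python
# Minimal sketch of the CPU-only dry-run check.
# The --dry-run flag, the *.sbatch file layout, and the exact SBATCH directives
# asserted here are assumptions, not the actual implementation.
import subprocess
from pathlib import Path


def test_schedule_evals_dry_run(tmp_path: Path) -> None:
    # --dry-run should write the sbatch script without calling sbatch.
    subprocess.run(
        ["python", "-m", "schedule_evals", "--cluster", "ci", "--dry-run",
         "--output-dir", str(tmp_path)],
        check=True,
    )
    scripts = list(tmp_path.rglob("*.sbatch"))
    assert scripts, "dry run did not generate an sbatch script"
    text = scripts[0].read_text()
    # The debug partition from the CI cluster config should be baked into the script.
    assert "#SBATCH --partition=debug" in text
    # No GPU resources should be requested on the CPU-only runner.
    assert "--gres=gpu" not in text
```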
@JeniaJitsev this is the base PR that #37 is built upon. Once this one is merged, I'll merge #37.
Can you give the current cost of running the pipeline on AWS once? (I see that the timeout is currently set to 45 min; at 45 min the cost would be about $0.40 per commit, which would be a bit annoying.)
Do we even need a GPU for testing? If we could run on a CPU machine, it would be much cheaper.
Referenced diff lines:

uv pip install --system --break-system-packages nltk

# Pre-load lighteval registry to trigger tinyBenchmarks data download at build time
/opt/uv-tools/lighteval/bin/python -c "from lighteval.tasks.registry import Registry; Registry.load_all_task_configs(load_multilingual=False)"
Can't we use uvx instead of hardcoding the path here?
Best we can do is: $UV_TOOL_DIR/lighteval/bin/python -c "from lighteval.tasks.registry import Registry; Registry.load_all_task_configs(load_multilingual=True)"
@geoalgo one test run currently takes about 15 minutes. The instance I use costs about $0.30-$0.72 per hour depending on whether we use spot or on-demand instances (spot is fine for now). That works out to $0.075-$0.18 per run, plus some service charge for the runs-on service, so at most about $0.25 per run.
Ok thanks. How hard is it to deactivate AWS? Is it enough to just comment out the GitHub Action? I am a bit worried about the complexity hit we are taking here, including needing to manage an AWS account (compared to, for instance, a manual / semi-automatic integration test that would run on our own clusters). We could merge it, but we should have a very easy way to remove it.
It now runs automatically on PRs to main and on changes to the paths I defined in the workflow file, and it's very easy to disable. Yes, you could also do a semi-automatic setup where it runs on one of the clusters we have access to. I would say the overhead of having an AWS account is quite minimal.
Add a GitHub Actions workflow that sets up a real SLURM cluster with Apptainer on a GPU runner to test the schedule_evals workflow end-to-end.
Relevant files:
- .github/workflows/build-and-push-apptainer.yml
- tests/integration contains all the stuff needed to understand our SLURM integration testing. It tests (1) dry-run sbatch script generation / setup of the run dir, (2) dataset download for all datasets needed for the tasks in task-groups.yaml, and (3) end-to-end scheduling and running of the first task in every task group, to save time (sketched below). The latter tests both lm-eval and lighteval tasks.

Regarding test suite support: this bumps lighteval to the latest version (installed from the GitHub repo) and adapts the launch args accordingly.
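To keep CI time bounded while still covering both backends, the scheduling test could pick only the first task of each group. A sketch, assuming task-groups.yaml maps group names to task lists and reusing the hypothetical schedule_evals invocation from above:

```python
# Sketch of scheduling the first task per group from task-groups.yaml.
# The file layout (group name -> list of task names), the --tasks flag, and the
# schedule_evals entry point are assumptions, not the repository's actual API.
import subprocess

import pytest
import yaml

with open("task-groups.yaml") as fh:
    TASK_GROUPS = yaml.safe_load(fh)

# One representative task per group keeps runtime low while still touching
# every group, and thereby both lm-eval and lighteval tasks.
FIRST_TASKS = [tasks[0] for tasks in TASK_GROUPS.values()]


@pytest.mark.parametrize("task", FIRST_TASKS)
def test_schedule_first_task_of_each_group(task, tmp_path):
    subprocess.run(
        ["python", "-m", "schedule_evals", "--cluster", "ci",
         "--tasks", task, "--output-dir", str(tmp_path)],
        check=True,
    )
```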