Conversation
Co-authored-by: Norbert Eicker <n.eicker@fz-juelich.de>
- This saves them from premature killing by psmgmt
Co-authored-by: Norbert Eicker <n.eicker@fz-juelich.de>
Co-authored-by: Norbert Eicker <n.eicker@fz-juelich.de>
- Requires Slurm 23.11 or later or a backport of this function
run exactly once on every proc
610f2bc to
a72d3e0
Compare
|
Modified CI testing to use different container names for the srun/rshlaunch vs. SPANK jobs. While this didn't make any difference for GitHub Actions, using the same names prevented running these jobs simultaneously when running the CI locally through |
a72d3e0 to
cada056
Compare
| if (result == -1) { | ||
| sdprintf(1, "ERROR: spindleExitBE returned and error on location %s\n", args.location); | ||
| return -1; | ||
| if (!args.location) { |
There was a problem hiding this comment.
The commpath branch gets rid of location. Changing args.location to args.commpath should work, but I'd want to test that out. If your PR goes in first, I can update this in the commpath branch.
| j += env_value_len; | ||
| #if defined(CUSTOM_GETENV) && defined(CUSTOM_GETENV_FREE) | ||
| free(env_value); | ||
| if(env_value != env_value_str) { |
There was a problem hiding this comment.
I understand how this fixes the underlying problem, but it did take a while to figure out the code paths (which is an issue of the original function, not this fix). Would like the function to be more obviously correct, but this PR probably isn't the place for that.
|
I've found a bug in this when running inside an allocation on fewer nodes than were allocated. Hold off on merging until I can fix it. |
This pull request updates the Slurm SPANK plugin. It includes the commits from #52, for which the original description was:
The following additional changes are made:
parse_location_implwould segfault if a customgetenvwas used because memory would befreed which was not allocated throughmalloc.spindle_args_twas not initialized in the SPANK plugin.isBEProcfunction is adapted to also support selecting a single process per node to send the shutdown message.srun --spindleis used but the Spindle level isoff.RM=slurm-pluginKnown issue: Support for sessions in the SPANK plugin is not implemented and will be addressed in a future PR.