Pull request overview
This PR introduces asynchronous queuing for long-running VM operations (notably prepare_base) by scheduling host commands via at and tracking completion through new backend requests, alongside related refactors to base-preparation hooks and tests.
Changes:
- Added command queueing via `at` (`queue_command`) and a new `wait_job` request to poll job completion.
- Refactored the base preparation flow to support deferred finalization (`post_prepare_base`) and renamed the subclass hook to `after_prepare_base`.
- Added/updated tests to exercise queued base preparation and removed an older download-related test.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| t/vm/d10_not_download.t | Removes test_dont_download coverage and its invocation. |
| t/q20_queue_req.t | Adds new tests for queued requests (wait_job/post_prepare_base) including failure and success paths. |
| t/30_request.t | Updates base-prep test to go through Ravada::Request->prepare_base and adds a Void-specific extra volume. |
| lib/Ravada/Volume/Void.pm | Adjusts prepare_base signature to accept an extra argument. |
| lib/Ravada/Volume/QCOW2.pm | Adds request-aware prepare_base path that queues work instead of running synchronously. |
| lib/Ravada/Volume/ISO.pm | Adjusts prepare_base signature for consistency with request-aware calls. |
| lib/Ravada/Volume/Class.pm | Updates prepare_base wrapper to pass through request context and defers post-processing if base file doesn’t exist yet. |
| lib/Ravada/VM/Void.pm | Refactors remove_file to use the new _remove_file_os helper. |
| lib/Ravada/VM/KVM.pm | Enhances remove_file to delete existing non-volume files via _remove_file_os. |
| lib/Ravada/VM.pm | Introduces queue_command, wait_job plumbing, _remove_file_os, and queue directory helpers. |
| lib/Ravada/Request.pm | Adds wait_job and post_prepare_base request args support plus check_requests validation behavior. |
| lib/Ravada/Domain/KVM.pm | Renames post_prepare_base hook to after_prepare_base. |
| lib/Ravada/Domain.pm | Refactors prepare_base workflow to schedule post_prepare_base and support queued per-volume base creation. |
| lib/Ravada.pm | Adds backend handlers for wait_job and post_prepare_base, and integrates check_requests error propagation. |
```perl
my @base_img = $self->_do_prepare_base($with_cd, $request);

die "Error: No information files returned from prepare_base"
    if !scalar (\@base_img);
```
This empty-result check is ineffective: scalar(\@base_img) is always truthy because it’s an array reference. If _do_prepare_base returns an empty list, this will not die. Use scalar(@base_img) (or @base_img) to correctly detect “no base images generated”.
```diff
-    if !scalar (\@base_img);
+    if !scalar(@base_img);
```
```perl
    refresh_storage
    set_time
    open_exposed_ports
    manage_pools
    screenshot
    prepare_base
    wait_job
);
```
wait_job is added as a non-duplicate command, but it is not added to any %COMMAND type group. As a result, Ravada::Request->type will classify it as 'long', which may bypass the intended disk/IO concurrency limits during base preparation. Consider adding wait_job (and likely post_prepare_base) to the appropriate command group (e.g. secondary for polling or disk for the finalization step) so scheduling/limits are consistent.
```perl
sub _remove_file_os($self, @files) {
    my %done;
    for my $file (@files) {
        next if $done{$file};
```
_remove_file_os intends to de-duplicate file paths via %done, but %done is never updated, so duplicates won’t actually be skipped. Add $done{$file} = 1; after the duplicate check (or use a hash slice) so repeated paths don’t trigger repeated deletes/remote rm calls.
```diff
 next if $done{$file};
+$done{$file} = 1;
```
t/q20_queue_req.t
```perl
    $domain->add_volume( format => 'qcow2', size => 1*1024*1024);
}

my $n_vols = scalar( grep {$_ =~ /img|qcow2|raw/ } $domain->list_volumes);
```
The volume count regex grep { $_ =~ /img|qcow2|raw/ } ... is very loose (it matches any path containing those substrings) and can miscount volumes, making the subsequent is(scalar(@req), $n_vols+1) assertion flaky. Consider tightening it to match extensions explicitly (e.g., \.(img|qcow2|raw)$).
```diff
-my $n_vols = scalar( grep {$_ =~ /img|qcow2|raw/ } $domain->list_volumes);
+my $n_vols = scalar( grep { $_ =~ /\.(?:img|qcow2|raw)$/ } $domain->list_volumes );
```
```perl
my ($job) = $err =~ m{^job (\d+) at}m;
die $err if !$job;
if ($err && $err =~ /warning: commands will be executed/) {
    $err = '';
}
```
queue_command parses the scheduled job id only from stderr ($err). Which stream carries the "job N at …" line is implementation-dependent (POSIX at specifies standard error, but the line can end up on stdout in practice, depending on the at implementation and how run_command captures the streams). Relying on a single stream is therefore fragile: whenever the line lands on $out, $job stays undef and the method dies even though the job was queued. Parse the job line from whichever stream carries it, and treat real errors from $err separately.
```diff
-my ($job) = $err =~ m{^job (\d+) at}m;
-die $err if !$job;
-if ($err && $err =~ /warning: commands will be executed/) {
-    $err = '';
-}
-warn "err='$err'" if $err;
+# Depending on the at implementation, the "job N at ..." line may be on stdout.
+my ($job) = $out =~ m{^job (\d+) at}m;
+# Fallback for implementations that print the job line to stderr (per POSIX).
+if (!$job && $err) {
+    ($job) = $err =~ m{^job (\d+) at}m;
+}
+if ($err) {
+    # Ignore common informational warning about missing tty / atd behavior.
+    if ($err =~ /warning: commands will be executed/) {
+        $err = '';
+    # Ignore lines that only contain the job information.
+    } elsif ($err !~ m{^job \d+ at}m) {
+        die $err;
+    }
+}
+die "Unable to queue job with at: could not parse job id" if !$job;
```
```perl
my $file = $volume->file;
my $file_base = $base{$volume->info->{target}};
next if !$file_base;

my $base = Ravada::Volume->new(
    file => $file_base
    ,is_base => 1
    ,domain => $self
);
$base->clone(file => $volume->file);
```
_clone_volumes_base clones the base into $volume->file without removing/replacing the existing volume file first. For formats like QCOW2, qemu-img create ... $file_clone will fail if $file_clone already exists, which is the normal case here. This needs to remove/rename the existing volume file before cloning (similar to Ravada::Volume::Class::_post_prepare_base which does vm->remove_file($self->file) before cloning).
```diff
 }
-$self->post_prepare_base();
+$self->after_prepare_base();
 return @base_img;
 }

-=head2 post_prepare_base
+=head2 after_prepare_base

 Placeholder for optional method implemented in subclasses. This will
 run after preparing the base files.

 =cut

-sub post_prepare_base($self) {}
+sub after_prepare_base($self) {}

+sub _before_post_prepare_base($self) {
+    $self->_data('is_base' => 0 );
+    $self->_clone_volumes_base();
+    $self->_data('is_base' => 1 );
+}
+
+sub post_prepare_base($self) { }
+
+sub _after_post_prepare_base($self) {
+    $self->after_prepare_base();
+}
```
after_prepare_base is invoked during _do_prepare_base and then invoked again via the post_prepare_base after-modifier (_after_post_prepare_base). This means subclass implementations (e.g. KVM’s backing-store setup) can run twice, and the first run may occur before queued base files exist. Consider running after_prepare_base only in one place, or making the first call conditional on whether async queueing is being used.
lib/Ravada/Volume/QCOW2.pm
```diff
-sub prepare_base($self) {
+sub prepare_base($self, $req=undef) {

+cluck if !defined $req;
```
cluck if !defined $req; will emit stack traces whenever prepare_base is invoked without a request object, which is noisy in production and can make real warnings harder to spot. If this is just for debugging, gate it behind a debug flag or remove it once the new call path is stable.
```diff
-cluck if !defined $req;
```
```perl
$self->write_file("$file_queue.sh",
    "#!/bin/sh\n"
    ."@$command > $file_queue.out 2> $file_queue.err"
);
```
The generated queue script interpolates @$command directly into a /bin/sh script without any shell-escaping/quoting, which can break when args contain spaces and can become command-injection if any arg is influenced by external input. Build the script with proper shell escaping (or avoid the shell entirely by writing a wrapper that execs the command safely).
```perl
chmod(oct(744),"$file_queue.sh") or die "$! chmod $file_queue";
} else {
    my ( $out, $err) = $self->run_command("chmod",'744',"$file_queue.sh");
```
The queued script is chmod’d to 0744, making it readable/executable by group/others and potentially exposing command lines and output paths under /run/user/.... Using a more restrictive mode like 0700 (and ensuring the queue dir has tight permissions) would reduce information exposure.
```diff
-chmod(oct(744),"$file_queue.sh") or die "$! chmod $file_queue";
-} else {
-    my ( $out, $err) = $self->run_command("chmod",'744',"$file_queue.sh");
+chmod(oct(700),"$file_queue.sh") or die "$! chmod $file_queue";
+} else {
+    my ( $out, $err) = $self->run_command("chmod",'700',"$file_queue.sh");
```
Queue long requests using the system command `at`