-
Notifications
You must be signed in to change notification settings - Fork 245
fix: write provision.json on any node controller failures, add E2E #7650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
760e44e to
7986052
Compare
7986052 to
a1ac625
Compare
a1ac625 to
753ddcc
Compare
753ddcc to
1b85a48
Compare
1b85a48 to
a1fc2d3
Compare
a1fc2d3 to
d7ba7dc
Compare
|
LGTM. Thanks for improving the logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 47 out of 71 changed files in this pull request and generated 5 comments.
| // ReturnErrorOnVMSSCreation indicates whether to return error on VMSS creation failure or fail the test immediately. | ||
| ReturnErrorOnVMSSCreation bool |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming ReturnErrorOnVMSSCreation to something more descriptive like AllowVMSSCreationFailure or PropagateVMSSCreationError, which better conveys its purpose of controlling error propagation behavior rather than just "returning" an error.
| // ReturnErrorOnVMSSCreation indicates whether to return error on VMSS creation failure or fail the test immediately. | |
| ReturnErrorOnVMSSCreation bool | |
| // PropagateVMSSCreationError controls whether VMSS creation failures should be propagated back to the caller as an error | |
| // (when true) or cause the test to fail immediately (when false). | |
| PropagateVMSSCreationError bool |
| }, | ||
| AKSNodeConfigMutator: func(config *aksnodeconfigv1.Configuration) { | ||
| // Intentionally causing a failure here | ||
| //config.Version = "v200" |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the test Test_Ubuntu2204_Failure_Scriptless, there's a commented-out line that appears to be an alternative way to cause failure. This commented code should either be removed or have a clear comment explaining why it's kept for future reference.
| //config.Version = "v200" |
| defer func() { | ||
| if r := recover(); r != nil { | ||
| slog.Error("panic recovered in runProvision", "panic", r) | ||
| provisionResult.ExitCode = strconv.Itoa(240) | ||
| provisionResult.Error = fmt.Sprintf("panic during provisioning: %v", r) | ||
| err = fmt.Errorf("panic during provisioning: %v", r) | ||
| a.writeCompleteFileOnError(provisionResult, err) | ||
| } |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the panic recovery handler, writeCompleteFileOnError is being called inside the defer function. However, this function is also called at line 74 after runProvision returns. This could lead to duplicate file writes or unnecessary operations. Consider whether calling it within the defer is necessary, or if the panic should be allowed to propagate differently.
| } | ||
| baseDir := filepath.Dir(provisionJSONFilePath) | ||
| if writeErr := os.MkdirAll(baseDir, 0755); writeErr != nil { | ||
| slog.Error("failed to create directory for provision.json file", "path", baseDir, "error", writeErr) | ||
| } | ||
| if writeErr := os.WriteFile(provisionJSONFilePath, data, 0600); writeErr != nil { | ||
| slog.Error("failed to write provision.json file", "path", provisionJSONFilePath, "error", writeErr) | ||
| } |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the writeCompleteFileOnError function, errors during JSON marshaling and directory/file creation are logged but not returned. While this is appropriate for a best-effort cleanup function, consider whether the function should return early after any failure to avoid attempting subsequent operations that may also fail.
| } | |
| baseDir := filepath.Dir(provisionJSONFilePath) | |
| if writeErr := os.MkdirAll(baseDir, 0755); writeErr != nil { | |
| slog.Error("failed to create directory for provision.json file", "path", baseDir, "error", writeErr) | |
| } | |
| if writeErr := os.WriteFile(provisionJSONFilePath, data, 0600); writeErr != nil { | |
| slog.Error("failed to write provision.json file", "path", provisionJSONFilePath, "error", writeErr) | |
| } | |
| } else { | |
| baseDir := filepath.Dir(provisionJSONFilePath) | |
| if writeErr := os.MkdirAll(baseDir, 0755); writeErr != nil { | |
| slog.Error("failed to create directory for provision.json file", "path", baseDir, "error", writeErr) | |
| } else if writeErr := os.WriteFile(provisionJSONFilePath, data, 0600); writeErr != nil { | |
| slog.Error("failed to write provision.json file", "path", provisionJSONFilePath, "error", writeErr) | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
What type of PR is this?
/kind bug
What this PR does / why we need it:
aks node controller can have mutiple failures even before it starts running cse, capture everything
Which issue(s) this PR fixes:
Fixes #
Requirements:
Special notes for your reviewer:
Release note: