Skip to content

Conversation

@awesomenix
Copy link
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

aks node controller can have mutiple failures even before it starts running cse, capture everything

Which issue(s) this PR fixes:

Fixes #

Requirements:

  • uses conventional commit messages
  • includes documentation
  • adds unit tests
  • tested upgrade from previous version
  • commits are GPG signed and Github marks them as verified

Special notes for your reviewer:

Release note:

none

@Devinwong
Copy link
Collaborator

LGTM. Thanks for improving the logging

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 47 out of 71 changed files in this pull request and generated 5 comments.

Comment on lines +188 to +189
// ReturnErrorOnVMSSCreation indicates whether to return error on VMSS creation failure or fail the test immediately.
ReturnErrorOnVMSSCreation bool
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming ReturnErrorOnVMSSCreation to something more descriptive like AllowVMSSCreationFailure or PropagateVMSSCreationError, which better conveys its purpose of controlling error propagation behavior rather than just "returning" an error.

Suggested change
// ReturnErrorOnVMSSCreation indicates whether to return error on VMSS creation failure or fail the test immediately.
ReturnErrorOnVMSSCreation bool
// PropagateVMSSCreationError controls whether VMSS creation failures should be propagated back to the caller as an error
// (when true) or cause the test to fail immediately (when false).
PropagateVMSSCreationError bool

Copilot uses AI. Check for mistakes.
},
AKSNodeConfigMutator: func(config *aksnodeconfigv1.Configuration) {
// Intentionally causing a failure here
//config.Version = "v200"
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test Test_Ubuntu2204_Failure_Scriptless, there's a commented-out line that appears to be an alternative way to cause failure. This commented code should either be removed or have a clear comment explaining why it's kept for future reference.

Suggested change
//config.Version = "v200"

Copilot uses AI. Check for mistakes.
Comment on lines +153 to +160
defer func() {
if r := recover(); r != nil {
slog.Error("panic recovered in runProvision", "panic", r)
provisionResult.ExitCode = strconv.Itoa(240)
provisionResult.Error = fmt.Sprintf("panic during provisioning: %v", r)
err = fmt.Errorf("panic during provisioning: %v", r)
a.writeCompleteFileOnError(provisionResult, err)
}
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the panic recovery handler, writeCompleteFileOnError is being called inside the defer function. However, this function is also called at line 74 after runProvision returns. This could lead to duplicate file writes or unnecessary operations. Consider whether calling it within the defer is necessary, or if the panic should be allowed to propagate differently.

Copilot uses AI. Check for mistakes.
Comment on lines +192 to +199
}
baseDir := filepath.Dir(provisionJSONFilePath)
if writeErr := os.MkdirAll(baseDir, 0755); writeErr != nil {
slog.Error("failed to create directory for provision.json file", "path", baseDir, "error", writeErr)
}
if writeErr := os.WriteFile(provisionJSONFilePath, data, 0600); writeErr != nil {
slog.Error("failed to write provision.json file", "path", provisionJSONFilePath, "error", writeErr)
}
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the writeCompleteFileOnError function, errors during JSON marshaling and directory/file creation are logged but not returned. While this is appropriate for a best-effort cleanup function, consider whether the function should return early after any failure to avoid attempting subsequent operations that may also fail.

Suggested change
}
baseDir := filepath.Dir(provisionJSONFilePath)
if writeErr := os.MkdirAll(baseDir, 0755); writeErr != nil {
slog.Error("failed to create directory for provision.json file", "path", baseDir, "error", writeErr)
}
if writeErr := os.WriteFile(provisionJSONFilePath, data, 0600); writeErr != nil {
slog.Error("failed to write provision.json file", "path", provisionJSONFilePath, "error", writeErr)
}
} else {
baseDir := filepath.Dir(provisionJSONFilePath)
if writeErr := os.MkdirAll(baseDir, 0755); writeErr != nil {
slog.Error("failed to create directory for provision.json file", "path", baseDir, "error", writeErr)
} else if writeErr := os.WriteFile(provisionJSONFilePath, data, 0600); writeErr != nil {
slog.Error("failed to write provision.json file", "path", provisionJSONFilePath, "error", writeErr)
}
}

Copilot uses AI. Check for mistakes.
@awesomenix awesomenix enabled auto-merge (squash) January 15, 2026 00:57
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@awesomenix awesomenix merged commit 632a093 into main Jan 15, 2026
46 of 47 checks passed
@awesomenix awesomenix deleted the nishp/fix/howtossh branch January 15, 2026 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants