
@cpuck (Contributor) commented Jan 13, 2026

Updated the troubleshooting guide for the Update Service to address issues related to high memory usage and service termination. Added mitigation steps and PowerShell scripts for validation and cleanup.

Copilot AI review requested due to automatic review settings January 13, 2026 23:15
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a new mitigation step to the troubleshooting guide for Update Service memory issues. The new section provides PowerShell commands to delete failed Update action plan instances, which can contribute to high memory usage.

Changes:

  • Added a new "Remove failed Update action plan instances" section with a PowerShell script to clean up old failed action plans
  • Added blank line at end of file for proper formatting

Comment on lines +104 to +115
Import-Module ECEClient -DisableNameChecking
$failedUpdates = Get-ActionPlanInstances | ? { $_.Status -eq "Failed" -and $_.ActionPlanName -match "MAS Update" } | sort LastModifiedDateTime -Descending | select -Skip 1
$instanceIDs = $failedUpdates.InstanceID
$eceClient = Create-ECEClusterServiceClient
$deleteActionPlanInstanceDescription = New-Object Microsoft.AzureStack.Solution.Deploy.EnterpriseCloudEngine.Controllers.Models.DeleteActionPlanInstanceDescription
foreach ($actionPlanInstanceId in $instanceIDs) {
   # remove old instance
   $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID
   $eceClient.DeleteActionPlanInstance($deleteActionPlanInstanceDescription).Wait()
}

Copilot AI Jan 13, 2026

The PowerShell script lacks error handling that is present in other mitigation scripts in this document. This script performs state-changing delete operations that could fail. Add $ErrorActionPreference = "Stop" at the beginning of the script to ensure errors are caught and the script stops if operations fail.

Copilot generated this review using guidance from repository custom instructions.
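
The effect of the suggested `$ErrorActionPreference = "Stop"` can be sketched without touching ECE at all. The example below is only an illustration: `Invoke-FailingStep` is a hypothetical stand-in for a step that reports an error, not a function from the ECEClient module. With the preference set to `Stop`, the error becomes terminating, so the statement after the failing call never runs and the `catch` block receives the error.

```powershell
# Turn non-terminating errors into terminating ones for the rest of the script.
$ErrorActionPreference = 'Stop'

# Hypothetical stand-in for a step that fails; not part of ECEClient.
function Invoke-FailingStep {
    [CmdletBinding()]
    param()
    Write-Error 'simulated failure'
}

$reachedNextStep = $false
$caughtMessage = $null

try {
    Invoke-FailingStep
    # Skipped: the error above stops execution of the try block.
    $reachedNextStep = $true
}
catch {
    # Record what went wrong so it can be reported to the operator.
    $caughtMessage = $_.Exception.Message
}

Write-Output "Reached next step: $reachedNextStep"
Write-Output "Caught: $caughtMessage"
```

Without the first line, `Write-Error` would only log the error and the script would carry on to the next statement, which is the behavior the review comment is trying to avoid for delete operations.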
foreach ($actionPlanInstanceId in $instanceIDs) {
   # remove old instance
   $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID

Copilot AI Jan 13, 2026

Variable name case mismatch: The foreach loop declares $actionPlanInstanceId (lowercase 'd') but the loop body references it as $actionPlanInstanceID (uppercase 'ID') on line 113. PowerShell variable names are case-insensitive, so both spellings resolve to the same variable and the deletion still runs against the intended instance. The mismatch is a readability and consistency issue rather than a functional bug. The loop variable should be spelled consistently.

Suggested change
-    $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID
+    $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceId

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +115
Import-Module ECEClient -DisableNameChecking
$failedUpdates = Get-ActionPlanInstances | ? { $_.Status -eq "Failed" -and $_.ActionPlanName -match "MAS Update" } | sort LastModifiedDateTime -Descending | select -Skip 1
$instanceIDs = $failedUpdates.InstanceID
$eceClient = Create-ECEClusterServiceClient
$deleteActionPlanInstanceDescription = New-Object Microsoft.AzureStack.Solution.Deploy.EnterpriseCloudEngine.Controllers.Models.DeleteActionPlanInstanceDescription
foreach ($actionPlanInstanceId in $instanceIDs) {
   # remove old instance
   $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID
   $eceClient.DeleteActionPlanInstance($deleteActionPlanInstanceDescription).Wait()
}

Copilot AI Jan 13, 2026

This script deletes action plan instances without defensive validation. Before performing the delete operations, add a check to verify that failed updates were found and provide user confirmation or output about what will be deleted. For example, check if $failedUpdates is empty and inform the user how many instances will be deleted.

Copilot generated this review using guidance from repository custom instructions.
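
A minimal sketch of the check described in this comment follows. It runs against sample objects rather than a real system: the `$instances` array and its values are made up for illustration and stand in for the output of `Get-ActionPlanInstances`, whose property names are taken from the script under review. No delete call is made here; the sketch only selects the candidates and reports them.

```powershell
# Sample data standing in for Get-ActionPlanInstances output (hypothetical values).
$instances = @(
    [pscustomobject]@{ InstanceID = 'a1'; Status = 'Failed';    ActionPlanName = 'MAS Update'; LastModifiedDateTime = [datetime]'2026-01-05' }
    [pscustomobject]@{ InstanceID = 'b2'; Status = 'Failed';    ActionPlanName = 'MAS Update'; LastModifiedDateTime = [datetime]'2026-01-10' }
    [pscustomobject]@{ InstanceID = 'c3'; Status = 'Failed';    ActionPlanName = 'MAS Update'; LastModifiedDateTime = [datetime]'2026-01-12' }
    [pscustomobject]@{ InstanceID = 'd4'; Status = 'Completed'; ActionPlanName = 'MAS Update'; LastModifiedDateTime = [datetime]'2026-01-11' }
    [pscustomobject]@{ InstanceID = 'e5'; Status = 'Failed';    ActionPlanName = 'Other plan'; LastModifiedDateTime = [datetime]'2026-01-09' }
)

# Same filter as the reviewed script: newest failed Update first, then skip it.
# Wrapping in @() guarantees an array, so .Count is reliable for 0 or 1 results.
$failedUpdates = @(
    $instances |
        Where-Object { $_.Status -eq 'Failed' -and $_.ActionPlanName -match 'MAS Update' } |
        Sort-Object LastModifiedDateTime -Descending |
        Select-Object -Skip 1
)

if ($failedUpdates.Count -eq 0) {
    Write-Output 'No older failed Update action plan instances found; nothing to delete.'
}
else {
    Write-Output "Would delete $($failedUpdates.Count) failed Update action plan instance(s):"
    $failedUpdates | ForEach-Object {
        Write-Output ("  {0} (last modified {1:yyyy-MM-dd})" -f $_.InstanceID, $_.LastModifiedDateTime)
    }
}
```

Listing the candidates before deleting gives the operator a chance to confirm that the most recent failed instance is the one being kept.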
Comment on lines +104 to +114
Import-Module ECEClient -DisableNameChecking
$failedUpdates = Get-ActionPlanInstances | ? { $_.Status -eq "Failed" -and $_.ActionPlanName -match "MAS Update" } | sort LastModifiedDateTime -Descending | select -Skip 1
$instanceIDs = $failedUpdates.InstanceID
$eceClient = Create-ECEClusterServiceClient
$deleteActionPlanInstanceDescription = New-Object Microsoft.AzureStack.Solution.Deploy.EnterpriseCloudEngine.Controllers.Models.DeleteActionPlanInstanceDescription
foreach ($actionPlanInstanceId in $instanceIDs) {
   # remove old instance
   $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID
   $eceClient.DeleteActionPlanInstance($deleteActionPlanInstanceDescription).Wait()

Copilot AI Jan 13, 2026

The code formatting has inconsistent indentation with trailing spaces on lines 107, 109, 110, 111, and 112. Remove trailing whitespace and use consistent spacing to match the formatting style of other PowerShell scripts in this document.

Suggested change
- Import-Module ECEClient -DisableNameChecking
- $failedUpdates = Get-ActionPlanInstances | ? { $_.Status -eq "Failed" -and $_.ActionPlanName -match "MAS Update" } | sort LastModifiedDateTime -Descending | select -Skip 1
- $instanceIDs = $failedUpdates.InstanceID
- $eceClient = Create-ECEClusterServiceClient
- $deleteActionPlanInstanceDescription = New-Object Microsoft.AzureStack.Solution.Deploy.EnterpriseCloudEngine.Controllers.Models.DeleteActionPlanInstanceDescription
- foreach ($actionPlanInstanceId in $instanceIDs) {
-    # remove old instance
-    $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID
-    $eceClient.DeleteActionPlanInstance($deleteActionPlanInstanceDescription).Wait()
+ Import-Module ECEClient -DisableNameChecking
+ $failedUpdates = Get-ActionPlanInstances | ? { $_.Status -eq "Failed" -and $_.ActionPlanName -match "MAS Update" } | sort LastModifiedDateTime -Descending | select -Skip 1
+ $instanceIDs = $failedUpdates.InstanceID
+ $eceClient = Create-ECEClusterServiceClient
+ $deleteActionPlanInstanceDescription = New-Object Microsoft.AzureStack.Solution.Deploy.EnterpriseCloudEngine.Controllers.Models.DeleteActionPlanInstanceDescription
+ foreach ($actionPlanInstanceId in $instanceIDs) {
+ # remove old instance
+ $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID
+ $eceClient.DeleteActionPlanInstance($deleteActionPlanInstanceDescription).Wait()

Comment on lines +101 to +102
Delete all failed Update action plans except for the last failed one.

Copilot AI Jan 13, 2026

The section description is too brief for a potentially destructive operation. The instruction should warn users that this operation will permanently delete action plan instances and cannot be undone. Consider adding context about why keeping the last failed instance is important for troubleshooting.

Suggested change
- Delete all failed Update action plans except for the last failed one.
+ > [!WARNING]
+ > The following script permanently deletes failed Update action plan instances and cannot be undone. Only run this step after you have collected any required logs or when explicitly instructed by support.
+ Delete all failed Update action plans except for the last failed one so that you retain the most recent failure for troubleshooting and comparison while removing older, no-longer-needed instances that can interfere with subsequent update attempts.
