Revise TSG for Update Service memory issues #230
base: main
Updated the troubleshooting guide for the Update Service to address issues related to high memory usage and service termination. Added mitigation steps and PowerShell scripts for validation and cleanup.
Pull request overview
This PR adds a new mitigation step to the troubleshooting guide for Update Service memory issues. The new section provides PowerShell commands to delete failed Update action plan instances, which can contribute to high memory usage.
Changes:
- Added a new "Remove failed Update action plan instances" section with a PowerShell script that cleans up old failed action plans
- Added blank line at end of file for proper formatting
| Import-Module ECEClient -DisableNameChecking | ||
| $failedUpdates = Get-ActionPlanInstances | ? { $_.Status -eq "Failed" -and $_.ActionPlanName -match "MAS Update" } | sort LastModifiedDateTime -Descending | select -Skip 1 | ||
| $instanceIDs = $failedUpdates.InstanceID | ||
| $eceClient = Create-ECEClusterServiceClient | ||
| $deleteActionPlanInstanceDescription = New-Object Microsoft.AzureStack.Solution.Deploy.EnterpriseCloudEngine.Controllers.Models.DeleteActionPlanInstanceDescription | ||
| foreach ($actionPlanInstanceId in $instanceIDs) { | ||
| # remove old instance | ||
| $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID | ||
| $eceClient.DeleteActionPlanInstance($deleteActionPlanInstanceDescription).Wait() | ||
| } |
Copilot
AI
Jan 13, 2026
The PowerShell script lacks error handling that is present in other mitigation scripts in this document. This script performs state-changing delete operations that could fail. Add $ErrorActionPreference = "Stop" at the beginning of the script to ensure errors are caught and the script stops if operations fail.
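As a minimal, self-contained sketch of what that setting changes (it does not touch ECE; the missing path and the `$deleted` counter are illustrative assumptions, not part of the PR): with `$ErrorActionPreference = "Stop"`, an error that would normally be non-terminating is thrown instead, so the statement after the failing call never runs.

```powershell
# Sketch only: demonstrates the effect of $ErrorActionPreference = "Stop".
$ErrorActionPreference = "Stop"

$deleted = 0
# Hypothetical path that is guaranteed not to exist.
$missing = Join-Path ([System.IO.Path]::GetTempPath()) ("no-such-item-" + [guid]::NewGuid())

try {
    # Get-Item on a missing path normally reports a non-terminating error and continues.
    # With "Stop" it throws, so the counter below is never incremented.
    Get-Item -Path $missing | Out-Null
    $deleted++
}
catch {
    Write-Warning "Operation failed, stopping: $($_.Exception.Message)"
}

Write-Output "Completed operations: $deleted"
```

In the script under review, the same pattern would mean a failed step halts the cleanup rather than letting the loop carry on silently.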
| foreach ($actionPlanInstanceId in $instanceIDs) { | ||
| # remove old instance | ||
| $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID |
Copilot
AI
Jan 13, 2026
Variable name case mismatch: The foreach loop declares $actionPlanInstanceId (lowercase 'd') but the body references $actionPlanInstanceID (uppercase 'ID') on line 113. PowerShell variable names are case-insensitive, so both spellings resolve to the same variable and the deletion still works, but the inconsistent casing hurts readability and makes it look as if a different variable is being used. The loop variable should be spelled consistently.
| $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID | |
| $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceId |
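To make the scope of this suggestion concrete, here is a small stand-alone check, using made-up IDs rather than real action plan instances. It shows that the two spellings refer to the same variable, so the change is about consistency rather than behavior.

```powershell
# Sketch only: the IDs are placeholders, not real instance IDs.
$seen = @()
foreach ($actionPlanInstanceId in @("id-1", "id-2")) {
    # Different casing, same variable: PowerShell variable names are case-insensitive.
    $seen += $actionPlanInstanceID
}

$seen -join ","   # id-1,id-2
```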
Copilot
AI
Jan 13, 2026
This script deletes action plan instances without defensive validation. Before performing the delete operations, add a check to verify that failed updates were found and provide user confirmation or output about what will be deleted. For example, check if $failedUpdates is empty and inform the user how many instances will be deleted.
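One way such a pre-check could look, sketched against stand-in data so it runs without the ECEClient module. The `$failedUpdates` objects below are hypothetical placeholders for the filtered output of `Get-ActionPlanInstances`, and nothing is deleted.

```powershell
# Sketch only: hypothetical stand-in for the filtered Get-ActionPlanInstances output.
$failedUpdates = @(
    [pscustomobject]@{ InstanceID = "aaa"; LastModifiedDateTime = [datetime]"2026-01-02" }
    [pscustomobject]@{ InstanceID = "bbb"; LastModifiedDateTime = [datetime]"2026-01-01" }
)

# Wrap in @() so .Count is reliable for zero, one, or many results.
$instanceIDs = @($failedUpdates | ForEach-Object { $_.InstanceID })

if ($instanceIDs.Count -eq 0) {
    Write-Host "No older failed Update action plan instances found; nothing to delete."
}
else {
    # Report what would be removed before any delete call is made.
    Write-Host "The following $($instanceIDs.Count) instance(s) would be deleted:"
    $failedUpdates | Format-Table InstanceID, LastModifiedDateTime
}
```

The `@()` wrapper matters because a pipeline that returns nothing yields `$null`, and one that returns a single object yields a scalar rather than an array.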
Copilot
AI
Jan 13, 2026
The code formatting has inconsistent indentation with trailing spaces on lines 107, 109, 110, 111, and 112. Remove trailing whitespace and use consistent spacing to match the formatting style of other PowerShell scripts in this document.
| Import-Module ECEClient -DisableNameChecking | |
| $failedUpdates = Get-ActionPlanInstances | ? { $_.Status -eq "Failed" -and $_.ActionPlanName -match "MAS Update" } | sort LastModifiedDateTime -Descending | select -Skip 1 | |
| $instanceIDs = $failedUpdates.InstanceID | |
| $eceClient = Create-ECEClusterServiceClient | |
| $deleteActionPlanInstanceDescription = New-Object Microsoft.AzureStack.Solution.Deploy.EnterpriseCloudEngine.Controllers.Models.DeleteActionPlanInstanceDescription | |
| foreach ($actionPlanInstanceId in $instanceIDs) { | |
| # remove old instance | |
| $deleteActionPlanInstanceDescription.ActionPlanInstanceID = $actionPlanInstanceID | |
| $eceClient.DeleteActionPlanInstance($deleteActionPlanInstanceDescription).Wait() | |
| Delete all failed Update action plans except for the last failed one. | ||
Copilot
AI
Jan 13, 2026
The section description is too brief for a potentially destructive operation. The instruction should warn users that this operation will permanently delete action plan instances and cannot be undone. Consider adding context about why keeping the last failed instance is important for troubleshooting.
| Delete all failed Update action plans except for the last failed one. | |
| > [!WARNING] | |
| > The following script permanently deletes failed Update action plan instances and cannot be undone. Only run this step after you have collected any required logs or when explicitly instructed by support. | |
| Delete all failed Update action plans except for the last failed one so that you retain the most recent failure for troubleshooting and comparison while removing older, no‑longer‑needed instances that can interfere with subsequent update attempts. |
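The "all except the last failed one" selection can be illustrated with the same filter, sort, and skip steps the script uses, applied here to hypothetical sample objects instead of real action plan instances. The instance IDs and dates are invented for the example.

```powershell
# Sketch only: sample objects standing in for Get-ActionPlanInstances output.
$plans = @(
    [pscustomobject]@{ InstanceID = "old";    Status = "Failed";    ActionPlanName = "MAS Update"; LastModifiedDateTime = [datetime]"2026-01-01" }
    [pscustomobject]@{ InstanceID = "newest"; Status = "Failed";    ActionPlanName = "MAS Update"; LastModifiedDateTime = [datetime]"2026-01-10" }
    [pscustomobject]@{ InstanceID = "middle"; Status = "Failed";    ActionPlanName = "MAS Update"; LastModifiedDateTime = [datetime]"2026-01-05" }
    [pscustomobject]@{ InstanceID = "ok";     Status = "Completed"; ActionPlanName = "MAS Update"; LastModifiedDateTime = [datetime]"2026-01-12" }
)

# Keep only failed Update plans, newest first, then skip the newest one so it is retained.
$toDelete = $plans |
    Where-Object { $_.Status -eq "Failed" -and $_.ActionPlanName -match "MAS Update" } |
    Sort-Object LastModifiedDateTime -Descending |
    Select-Object -Skip 1

$toDelete.InstanceID -join ","   # middle,old
```

The most recent failure ("newest") and the non-failed plan ("ok") are left untouched; only the older failures are selected for deletion.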