fix(poll): track consequent errors during polling W-18203875 #1663

Conversation
Increase timeout from 1 to 3 seconds in retry limit tests to ensure the retry limit is reached before timeout on all platforms. The 1-second timeout was causing race conditions on Windows where only 16 retries completed instead of the expected 20 due to execution overhead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Replace total retry limit with consecutive error retry limit to prevent infinite loops from repeated errors while allowing long-running operations to poll indefinitely until timeout.

Key changes:
- Track consecutive retryable errors separately from normal polling
- Reset counter on successful status check
- Default limit of 25 consecutive errors (configurable via SF_METADATA_POLL_ERROR_RETRY_LIMIT)
- Remove PollingClient retryLimit to allow unlimited normal polling
- Add error message for retry limit exceeded without wrapper duplication

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
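The counter logic from the commit message can be sketched as follows. This is a simplified model, not the actual `MetadataTransfer` implementation; `checkStatus` and the counter name follow the diff in this PR, while the rest (the `poll` wrapper, the status shape) is illustrative:

```typescript
// Simplified sketch of consecutive-error tracking during polling.
// Not the real MetadataTransfer code; names beyond the diff are illustrative.
type PollResult = { completed: boolean; error?: unknown };

// Default of 25, overridable via the env var named in the commit message.
const ERROR_RETRY_LIMIT = Number(process.env.SF_METADATA_POLL_ERROR_RETRY_LIMIT ?? 25);

let consecutiveErrorRetries = 0;

async function poll(checkStatus: () => Promise<{ done: boolean }>): Promise<PollResult> {
  try {
    const status = await checkStatus();
    // Any valid response (even an in-progress one) resets the counter.
    consecutiveErrorRetries = 0;
    return { completed: status.done };
  } catch (e) {
    consecutiveErrorRetries += 1;
    if (consecutiveErrorRetries >= ERROR_RETRY_LIMIT) {
      // completed: true stops the PollingClient; the stored error is
      // surfaced afterwards instead of a generic timeout.
      return { completed: true, error: e };
    }
    return { completed: false };
  }
}
```

The key property is that only an unbroken run of exceptions trips the limit; a single successful status check starts the count over, so a flaky-but-alive connection can still poll until the `--wait` timeout.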
src/client/metadataTransfer.ts (Outdated)

```typescript
this.errorRetryLimit = calculateErrorRetryLimit(this.logger);
this.errorRetryLimitExceeded = undefined;

// Set a very high retryLimit for PollingClient to prevent it from stopping on errors
```
todo for me:
remove this comment, Claude was still setting a high retryLimit after the last changes
```typescript
const err = e as Error | SfError;

// Don't wrap the error retry limit exceeded error
if (err instanceof SfError && err.message.includes('consecutive retryable errors')) {
```
avoiding the wrap mostly because the final error printed to the user had duplicate message parts, like "Metadata API request failed: Metadata API request failed"
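The duplication the reviewer describes can be shown with a minimal sketch. The message shapes here are assumptions for illustration, not the exact `SfError` API:

```typescript
// Sketch of why wrapping is skipped for the retry-limit error: wrapping it
// again would duplicate the "Metadata API request failed" prefix.
function wrapUnlessRetryLimit(err: Error): Error {
  if (err.message.includes('consecutive retryable errors')) {
    // Our own limit-exceeded error already carries a complete message.
    return err;
  }
  // Everything else gets the generic prefix.
  return new Error(`Metadata API request failed: ${err.message}`);
}
```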
```typescript
try {
  mdapiStatus = await this.checkStatus();
  // Reset error counter on successful status check
  this.consecutiveErrorRetries = 0;
```
successful status != successful operation
A successful status check means the metadata API returned a valid response; it can still be `InProgress`.
The errors caught in the catch block are exceptions thrown by jsforce (network errors, parsing failures, etc.)
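That distinction can be made concrete with a small sketch. The status values are a simplified stand-in for the Metadata API's deploy/retrieve states, not the full set:

```typescript
// A successful status check just means the API answered; the operation may
// still be running. Simplified status model for illustration only.
type MdapiStatus = { status: 'InProgress' | 'Succeeded' | 'Failed' };

function operationFinished(s: MdapiStatus): boolean {
  // InProgress is a successful *check* (the error counter resets) but not a
  // finished *operation* (polling continues).
  return s.status !== 'InProgress';
}
```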
```typescript
  error: e,
  count: this.errorRetryLimit,
};
return { completed: true };
```
`completed: true` signals the polling client to stop
Increase the default consecutive error retry limit from 25 to 1000 to provide more tolerance for intermittent network issues during long-running deploy/retrieve operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
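Resolving the limit from the env var might look roughly like this. The helper name `calculateErrorRetryLimit` appears in the diff above; the parsing and fallback logic here is an assumption, using the default of 1000 introduced by this commit:

```typescript
// Assumed sketch of the env-var override: fall back to the default of 1000
// when SF_METADATA_POLL_ERROR_RETRY_LIMIT is unset or not a positive integer.
function calculateErrorRetryLimit(): number {
  const raw = process.env.SF_METADATA_POLL_ERROR_RETRY_LIMIT;
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 1000;
}
```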
[skip ci]
Looks good to me! Tested this by disconnecting wifi mid deploy/retrieve.




What does this PR do?

Updates the `MetadataTransfer` class to track errors during retrieve/deploy polling.

The polling is done using sfdx-core's `PollingClient` without specifying `retryLimit`, so it runs until a timeout happens:
https://github.com/forcedotcom/sfdx-core/blob/5564069767b85a96e73f8bf88dbdd3d7e4b5da03/src/status/pollingClient.ts#L131

During a retrieve/deploy with `sf`, if the metadata API starts to constantly return one of the retryable errors (backend or metadata issue?), the polling keeps going until a timeout (`--wait` flag value) and then throws a generic "client timed out" error without much info about the real issue. This PR adds an error tracker to count consecutive errors during polling checks, allowing the operation to fail fast with the underlying error once the limit is reached.
What issues does this PR fix or reference?
@W-18203875@
Functionality Before

SDR does not define a retry limit, so consecutive API errors are retried until timeout, and a generic timeout error is thrown.
Functionality After

SDR tracks consecutive errors, throws after 25 consecutive errors during polling, and allows customizing the limit via an env var.