
Conversation

@cristiand391 (Member) commented on Jan 6, 2026

What does this PR do?

Updates the MetadataTransfer class to track errors during retrieve/deploy polling.

The polling is done using sfdx-core's PollingClient without specifying retryLimit, so it runs until a timeout happens:
https://github.com/forcedotcom/sfdx-core/blob/5564069767b85a96e73f8bf88dbdd3d7e4b5da03/src/status/pollingClient.ts#L131
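
For context, here is a minimal sketch of how such a client can be set up; the helper name, frequency, and timeout values are illustrative, not the exact MetadataTransfer code:

import { PollingClient, StatusResult } from '@salesforce/core';
import { Duration } from '@salesforce/kit';

// Hypothetical helper: create a PollingClient with no retryLimit, so it only
// stops when the poll callback reports completion or the timeout elapses.
const pollUntilDoneOrTimeout = async (
  poll: () => Promise<StatusResult>,
  waitMinutes: number
): Promise<void> => {
  const client = await PollingClient.create({
    poll,
    frequency: Duration.milliseconds(500), // illustrative; SDR computes its own frequency
    timeout: Duration.minutes(waitMinutes), // maps to the --wait flag value
  });
  await client.subscribe();
};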

During a retrieve/deploy with sf, if the Metadata API starts constantly returning one of the retryable errors (backend or metadata issue?), the polling keeps going until the timeout (the --wait flag value) is reached and then throws a generic "client timed out" error without much info about the real issue.

This PR adds an error tracker that counts consecutive errors during polling checks (see the sketch below), allowing it to:

  1. throw if the same error has been retried X times in a row
  2. still retry intermittent, flaky API responses during a long-running poll
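
A minimal sketch of the idea, assuming a checkStatus stand-in for the Metadata API status call; the names and the limit value are illustrative, not the exact SDR implementation:

import { StatusResult } from '@salesforce/core';

const ERROR_RETRY_LIMIT = 25; // illustrative; the real limit is configurable (see below)

let consecutiveErrorRetries = 0;
let lastPollError: Error | undefined;

// Stand-in for the real Metadata API status call made by MetadataTransfer.
declare function checkStatus(): Promise<{ done: boolean }>;

const poll = async (): Promise<StatusResult> => {
  try {
    const status = await checkStatus();
    consecutiveErrorRetries = 0; // any valid response (even InProgress) resets the counter
    return { completed: status.done };
  } catch (e) {
    lastPollError = e instanceof Error ? e : new Error(String(e));
    consecutiveErrorRetries += 1;
    // Stop polling once the same failure repeats too many times in a row;
    // the caller then surfaces lastPollError instead of a generic timeout.
    return { completed: consecutiveErrorRetries >= ERROR_RETRY_LIMIT };
  }
};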

What issues does this PR fix or reference?

@W-18203875@

Functionality Before

Because SDR did not define a retry limit, consecutive API errors were retried until the timeout and a generic timeout error was thrown.

Functionality After

SDR tracks consecutive errors, throws after 25 consecutive errors during polling, and allows customizing the limit via an env var.

(screenshot attached)
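
A hypothetical sketch of how the limit could be resolved from the env var; the PR's actual calculateErrorRetryLimit may differ in its details:

const DEFAULT_ERROR_RETRY_LIMIT = 25;

// Hypothetical resolver: fall back to the default when the env var is unset
// or not a positive integer.
export const resolveErrorRetryLimit = (): number => {
  const raw = process.env.SF_METADATA_POLL_ERROR_RETRY_LIMIT;
  const parsed = raw ? Number.parseInt(raw, 10) : NaN;
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_ERROR_RETRY_LIMIT;
};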

cristiand391 and others added 4 commits January 6, 2026 12:43
Increase timeout from 1 to 3 seconds in retry limit tests to ensure
the retry limit is reached before timeout on all platforms. The
1-second timeout was causing race conditions on Windows where only
16 retries completed instead of the expected 20 due to execution
overhead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace total retry limit with consecutive error retry limit to prevent
infinite loops from repeated errors while allowing long-running operations
to poll indefinitely until timeout.

Key changes:
- Track consecutive retryable errors separately from normal polling
- Reset counter on successful status check
- Default limit of 25 consecutive errors (configurable via SF_METADATA_POLL_ERROR_RETRY_LIMIT)
- Remove PollingClient retryLimit to allow unlimited normal polling
- Add error message for retry limit exceeded without wrapper duplication

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@cristiand391 cristiand391 requested a review from a team as a code owner January 6, 2026 20:31
this.errorRetryLimit = calculateErrorRetryLimit(this.logger);
this.errorRetryLimitExceeded = undefined;

// Set a very high retryLimit for PollingClient to prevent it from stopping on errors
@cristiand391 (Member, Author) commented:

todo for me:
remove this comment, Claude was still setting a high retryLimit after the last changes

const err = e as Error | SfError;

// Don't wrap the error retry limit exceeded error
if (err instanceof SfError && err.message.includes('consecutive retryable errors')) {
@cristiand391 (Member, Author) commented:

avoid wrapping mostly because the final error printed to the user had duplicate message parts like "Metadata API request failed: Metadata API request failed"
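
A rough sketch of that rethrow-vs-wrap decision; the function name and wrapper message are illustrative, not the exact SDR code:

import { SfError } from '@salesforce/core';

const toUserError = (e: unknown): SfError => {
  const err = e instanceof Error ? e : new Error(String(e));
  // The retry-limit error is already user-facing; wrapping it again would
  // duplicate the "Metadata API request failed" prefix in the final output.
  if (err instanceof SfError && err.message.includes('consecutive retryable errors')) {
    return err;
  }
  return new SfError(`Metadata API request failed: ${err.message}`);
};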

try {
mdapiStatus = await this.checkStatus();
// Reset error counter on successful status check
this.consecutiveErrorRetries = 0;
@cristiand391 (Member, Author) commented:

successful status != successful operation

a successful status check means the Metadata API returned a valid response; it can still be InProgress.
The errors caught in the catch block are exceptions thrown by jsforce (network errors, parsing failures, etc.)

error: e,
count: this.errorRetryLimit,
};
return { completed: true };
@cristiand391 (Member, Author) commented:

completed: true to signal the polling client to stop
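
A minimal sketch of this flag-then-throw pattern, assuming illustrative field names: the poll callback records the failure and returns completed: true so the PollingClient stops, and the caller surfaces the recorded error afterwards.

import { SfError } from '@salesforce/core';

type RetryLimitExceeded = { error: Error; count: number };

let errorRetryLimitExceeded: RetryLimitExceeded | undefined;

// Called by the transfer code after PollingClient.subscribe() resolves.
const assertRetryLimitNotExceeded = (): void => {
  if (errorRetryLimitExceeded) {
    throw new SfError(
      `Polling stopped after ${errorRetryLimitExceeded.count} consecutive retryable errors: ${errorRetryLimitExceeded.error.message}`
    );
  }
};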

@cristiand391 changed the title from "fix(poll): add dynamic retry limits for polling W-18203875" to "fix(poll): track consequent errors during polling W-18203875" on Jan 6, 2026
cristiand391 and others added 2 commits January 7, 2026 13:42
Increase the default consecutive error retry limit from 25 to 1000
to provide more tolerance for intermittent network issues during
long-running deploy/retrieve operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@iowillhoit (Contributor) commented:

Looks good to me! Tested this by disconnecting wifi mid deploy/retrieve. ENOTFOUND is one of the retry-able errors outlined here

  • Default retries is 1000 (deploy) (screenshot)
  • Overrode the default with SF_METADATA_POLL_ERROR_RETRY_LIMIT=50 (screenshot)
  • When it reaches the limit, it throws an error with the last known error (screenshot)
  • Same with retrieves (screenshot)

@iowillhoit iowillhoit merged commit 588ed78 into main Jan 7, 2026
3 checks passed
@iowillhoit iowillhoit deleted the cd/retry-limit branch January 7, 2026 21:07