Resolve OTLP export failures instead of rejecting #6856

Closed
alexanderMontague wants to merge 1 commit into main from dlm-resolve-otel-export-failures

@alexanderMontague

WHY are these changes introduced?

Related to incident #inv-18682 (February 2026 OTLP failures). Stacks on #6851.

When the OTLP endpoint is unreachable, the OTLPMetricExporter's internal HTTP retry mechanism produces an OTLPExporterError: Export failed with retryable status as an unhandled promise rejection. This bypasses all existing try/catch layers (analytics.ts, FailSafeOtelService) because:

  1. BaseOtelService.record() is synchronous — the export happens asynchronously via a detached forceFlush() call
  2. The OTLP transport's retry logic rejects independently of the export() callback pattern
  3. Node.js (v15+) by default treats unhandled promise rejections as uncaught exceptions, terminating the process

The existing .catch(() => {}) on forceFlush() only catches the InstantaneousMetricReader.onForceFlush() rejection, not the transport-level rejections from the HTTP exporter internals.
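A minimal sketch of the failure mode described above, using hypothetical stand-ins rather than the real exporter: failingExport, recordSync, and demo are illustrative names, not part of cli-kit or OpenTelemetry. It shows that a try/catch around a synchronous call has already finished by the time a detached promise rejects, so only a handler attached to that promise can see the error.

```typescript
// Hypothetical stand-in for a transport whose retry logic rejects on a later tick.
function failingExport(): Promise<void> {
  return Promise.resolve().then(() => {
    throw new Error('Export failed with retryable status')
  })
}

// Synchronous, like record(): it starts the export but does not await it.
// The onLateFailure hook exists only so this sketch can observe the rejection;
// without a handler on the promise, the rejection would be unhandled.
function recordSync(onLateFailure: (err: Error) => void): void {
  failingExport().catch(onLateFailure)
}

interface DemoResult {
  caughtBySyncTry: boolean
  lateMessage: string
}

function demo(): Promise<DemoResult> {
  return new Promise<DemoResult>((resolve) => {
    let caughtBySyncTry = false
    try {
      // The try block completes before the export promise settles.
      recordSync((err) => resolve({caughtBySyncTry, lateMessage: err.message}))
    } catch (_err) {
      // Never reached for the asynchronous rejection.
      caughtBySyncTry = true
    }
  })
}
```

Calling demo() resolves with caughtBySyncTry still false, which is the gap the surrounding try/catch layers cannot close on their own.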

WHAT is this pull request doing?

Fixes the problem at the single chokepoint — InstantaneousMetricReader.onForceFlush():

  1. Resolves instead of rejecting on export failure — metrics export is never fatal
  2. Logs failures via diag.error() at the layer where the failure actually occurs
  3. Removes the redundant .catch(() => {}) from BaseOtelService since forceFlush() can no longer reject
  4. Adds tests for the InstantaneousMetricReader verifying it resolves on both success and failure

This makes all paths safe (forceFlush(), shutdown(), and any future callers) without needing scattered .catch() handlers.
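The resolve-instead-of-reject idea can be sketched as follows. This is not the code from the pull request: ResultCode, ExportResult, Exporter, logError, and flushSafely are local stand-ins assumed for illustration, with logError playing the role that diag.error() plays in the description. The sketch only covers failures that reach the export callback or are thrown synchronously by export().

```typescript
// Local stand-ins; not imported from any library.
enum ResultCode {
  Success,
  Failed,
}

interface ExportResult {
  code: ResultCode
  error?: Error
}

interface Exporter {
  export(batch: string[], done: (result: ExportResult) => void): void
}

// Stand-in logger that records messages so callers can inspect them.
const logged: string[] = []
function logError(message: string): void {
  logged.push(message)
}

// Always resolves: a failed export is logged, never propagated as a rejection.
function flushSafely(exporter: Exporter, batch: string[]): Promise<void> {
  return new Promise<void>((resolve) => {
    try {
      exporter.export(batch, (result) => {
        if (result.code !== ResultCode.Success) {
          const reason = result.error ? result.error.message : 'unknown error'
          logError('metrics export failed: ' + reason)
        }
        resolve()
      })
    } catch (err) {
      // A synchronous throw from export() is also logged and swallowed.
      logError('metrics export threw: ' + String(err))
      resolve()
    }
  })
}
```

Because the returned promise never rejects, callers in this sketch need no .catch() of their own. The trade-off is that a failure is visible only through the log.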

How to test your changes?

npx vitest run packages/cli-kit/src/public/node/vendor/otel-js/export/InstantaneousMetricReader.test.ts
npx vitest run packages/cli-kit/src/private/node/otel-metrics.test.ts
npx vitest run packages/cli-kit/src/public/node/analytics.test.ts

Measuring impact

  • n/a - bug fix for OTLP endpoint outage resilience

Checklist

  • I've considered possible cross-platform impacts (Mac, Linux, Windows)
  • I've considered possible documentation changes

🤖 Generated with Claude Code

The OTLPMetricExporter's internal HTTP retry mechanism can produce
unhandled promise rejections that bypass all try/catch layers and
crash the CLI process. Fix this at the source by resolving on export
failure in InstantaneousMetricReader instead of rejecting, and logging
via diag.error(). This makes forceFlush() and shutdown() paths both
safe without needing scattered .catch() handlers.

Co-Authored-By: Claude <noreply@anthropic.com>
@alexanderMontague alexanderMontague requested a review from a team as a code owner February 13, 2026 16:20
@github-actions

We detected some changes at packages/*/src and there are no updates in the .changeset.
If the changes are user-facing, run pnpm changeset add to track your changes and include them in the next release CHANGELOG.

Caution

DO NOT create changesets for features which you do not wish to be included in the public changelog of the next CLI release.

@github-actions

Coverage report

St.   Category     Percentage   Covered / Total
🟡    Statements   78.9%        14554/18447
🟡    Branches     73.22%       7222/9863
🟡    Functions    79.09%       3704/4683
🟡    Lines        79.25%       13761/17365

Test suite run success

3774 tests passing in 1455 suites.

Report generated by 🧪jest coverage report action from b8f39a0
