Do not crash on metrics errors#6854
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Coverage report
Test suite run success3775 tests passing in 1455 suites. Report generated by 🧪jest coverage report action from 10c96e1 |
b53ecc5 to
dfa63b2
Compare
packages/cli-kit/src/public/node/vendor/otel-js/service/FailSafeOtelService.ts
Outdated
Show resolved
Hide resolved
dfa63b2 to
dccc1c6
Compare
f365302 to
54c045e
Compare
Differences in type declarationsWe detected differences in the type declarations generated by Typescript for this branch compared to the baseline ('main' branch). Please, review them to ensure they are backward-compatible. Here are some important things to keep in mind:
New type declarationspackages/cli-kit/dist/public/node/vendor/otel-js/service/FailSafeOtelService.d.tsimport type { MetricAttributes } from '@opentelemetry/api';
import { OtelService, OnRecordCallback } from './types.js';
/**
* Wraps an OtelService to ensure telemetry failures never crash the CLI.
*
* This service attempts to record metrics on a "fail-safe" basis - if recording
* fails, the error is logged (in verbose mode) but not propagated.
* This ensures telemetry infrastructure issues (like OTLP endpoint outages) don't
* block developer workflows.
*
* Only wraps the methods that are actually called from outside the service.
* Initialization errors are handled by the caller.
*
* Related: incident #inv-18682 (February 2026 OTLP failures)
*/
export declare class FailSafeOtelService {
private readonly inner;
private hasLoggedError;
constructor(inner: OtelService);
get serviceName(): string;
record<TAttributes extends MetricAttributes = any>(...args: Parameters<OnRecordCallback<TAttributes>>): void;
private logError;
}
Existing type declarationsWe found no diffs with existing type declarations |
dccc1c6 to
8363f46
Compare
…if analytics reporting fails
54c045e to
2048bb9
Compare
8363f46 to
10c96e1
Compare
|
We detected some changes at Caution DO NOT create changesets for features which you do not wish to be included in the public changelog of the next CLI release. |
| test('resolves without rejecting when export error is undefined', async () => { | ||
| const exporter = createMockExporter(ExportResultCode.FAILED) | ||
| const {reader, provider} = createReaderWithProvider(exporter) | ||
|
|
There was a problem hiding this comment.
We should perhaps test that the diag.error comes through in this and the above case.
| @@ -41,13 +41,12 @@ export class InstantaneousMetricReader extends MetricReader { | |||
| diag.error('PeriodicExportingMetricReader: metrics collection errors', ...errors) | |||
There was a problem hiding this comment.
Nit: This isn't anything we did in this PR, but this class name isn't correct any more -- perhaps a miss when vendoring?
| await provider.shutdown() | ||
| }) | ||
|
|
||
| test('resolves without rejecting on export failure', async () => { |
There was a problem hiding this comment.
Should we also add a test case for export returning synchronous exceptions?

WHY are these changes introduced?
Resolves incident #inv-18682 (February 2026 OTLP failures)
When the OTLP telemetry endpoint is unavailable or experiencing failures, the CLI currently crashes with unhandled errors, forcing users to run commands with
SHOPIFY_CLI_NO_ANALYTICS=1as a workaround. This violates the principle that telemetry should be transparent to users - it should never impact CLI functionality.Related to: PR #6851 (adds Bugsnag reporting for telemetry failures)
WHAT is this pull request doing?
Changes:
Always resolve the OTEL metrics emitter - There is no need to halt execution/reject the emitting process if OTEL reporting fails. This is a non-critical process, and we should allow things to continue to function normally even if this fails
HOW is this fixing things?
EXIT_CODE=0now as we are always resolving. Previously, they were exiting with1