Log on metrics submission errors #6851

Open
dmerand wants to merge 1 commit into main from dlm-log-metrics-errors
Conversation

dmerand (Contributor) commented Feb 11, 2026

WHY are these changes introduced?

Related to incident #inv-18682 (February 2026 OTLP failures)

When telemetry collection fails (e.g., the OTLP endpoint is unavailable), the CLI currently logs only to debug output, so these failures go unnoticed unless users run with --verbose. This leaves the operations team blind to systemic telemetry infrastructure issues.

During the February 2026 incident, OTLP failures were silently occurring and only discovered when users manually reported CLI issues. We need visibility into telemetry failures without impacting the developer experience.

WHAT is this pull request doing?

Changes:

  1. Reports telemetry failures to Bugsnag (marked as 'expected_error' - infrastructure issue, not a CLI bug)
  2. Maintains existing behavior: telemetry failures remain transparent to users (no warnings/errors shown)
  3. Adds test coverage for the new error reporting behavior

Implementation:

  • Imports sendErrorToBugsnag in analytics.ts
  • Calls it in the catch block when telemetry reporting fails (either Monorail or OTLP)
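
The catch-block behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `submitTelemetrySafely`, `Reporter`, and `DebugLogger` are hypothetical names, and the reporter signature (an error plus an exit mode) is an assumption standing in for sendErrorToBugsnag.

```typescript
// Hypothetical stand-ins: the real helpers live in cli-kit, and their
// signatures are assumed here rather than copied from the PR.
type ExitMode = 'expected_error' | 'unexpected_error'
type Reporter = (error: unknown, exitMode: ExitMode) => Promise<unknown>
type DebugLogger = (message: string) => void

// Runs a telemetry submission and never lets a failure reach the caller.
async function submitTelemetrySafely(
  submit: () => Promise<void>,
  reportError: Reporter,
  logDebug: DebugLogger,
): Promise<void> {
  try {
    await submit()
  } catch (error) {
    const detail = error instanceof Error ? error.message : String(error)
    // Existing behavior: the failure is only visible in debug output.
    logDebug(`Failed to report usage analytics: ${detail}`)
    try {
      // Infrastructure failure, so it is tagged as expected rather than as a CLI bug.
      await reportError(error, 'expected_error')
    } catch (reportingError) {
      // Reporting is best effort: a failing reporter must not crash the CLI.
      const reportingDetail =
        reportingError instanceof Error ? reportingError.message : String(reportingError)
      logDebug(`Failed to report telemetry error: ${reportingDetail}`)
    }
  }
}
```

Passing the reporter and logger in as parameters keeps the sketch self-contained. In the CLI itself these would be the imported helpers rather than injected functions.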

What this achieves:

  • ✅ Operations team gets visibility into OTLP/Monorail outages
  • ✅ Users never see telemetry errors (transparent operation)
  • ✅ Protected against flooding Bugsnag (existing rate limits apply)

Protection against overwhelming error reporting:

  • Bugsnag reporting has built-in rate limiting: 300 reports per day per user
  • This limit is shared across ALL error types (CLI bugs + telemetry failures)
  • If OTLP has an extended outage, each user can still send at most 300 reports per day
  • Rate limit tracking is per-user (stored in local config), not global
  • Once the limit is hit, all subsequent errors are silently dropped with a debug log
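
To illustrate how a per-user daily cap of this kind behaves, here is a minimal sketch that uses a fixed calendar-day counter. It is an assumption for illustration only: the CLI's actual rate limiter may use a rolling window or a different storage shape, and `ReportQuota` and `consumeReportSlot` are hypothetical names.

```typescript
// Hypothetical shape of the per-user counter kept in local config.
interface ReportQuota {
  day: string // UTC day the counter applies to, e.g. "2026-02-11"
  count: number // reports already sent on that day
}

const DAILY_LIMIT = 300 // figure quoted in this PR description

// Returns whether a report may be sent, together with the quota to persist.
function consumeReportSlot(
  quota: ReportQuota | undefined,
  now: Date,
  limit: number = DAILY_LIMIT,
): {allowed: boolean; quota: ReportQuota} {
  const today = now.toISOString().slice(0, 10)
  // A missing or stale counter starts a fresh day.
  const current: ReportQuota =
    quota !== undefined && quota.day === today ? quota : {day: today, count: 0}
  if (current.count >= limit) {
    // Over the limit: the caller drops the report (the CLI logs this at debug level).
    return {allowed: false, quota: current}
  }
  return {allowed: true, quota: {day: today, count: current.count + 1}}
}
```

Because the counter is shared by every error type and stored per user, a long telemetry outage uses up the same budget that genuine CLI bug reports draw from.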

How to test your changes?

Automated tests:

pnpm test packages/cli-kit/src/public/node/analytics.test.ts

The new test, "reports telemetry failures to Bugsnag", verifies:

  • sendErrorToBugsnag is called when telemetry fails
  • Error is categorized as 'expected_error' (not a CLI bug)
  • CLI doesn't crash if Bugsnag reporting fails

Manual testing:
Manual testing in development is limited because Bugsnag reporting is disabled in local/development environments (isLocalEnvironment() returns true).

The changes have been verified through:

  • ✅ Code review of built output (changes are present)
  • ✅ Unit tests pass
  • ✅ TypeScript compilation succeeds

In production, reporting takes effect when:

  • isLocalEnvironment() returns false
  • Bugsnag reporting is enabled

Under those conditions, telemetry failures are caught and reported as expected errors.

Note: This PR provides operational visibility. A separate PR addresses preventing CLI crashes from unhandled OTLP rejections.

Post-release steps

n/a

Measuring impact

  • n/a - this doesn't need measurement, e.g. a linting rule or a bug-fix

This is an operational improvement. Impact will be measured by:

  • Reduced time to detect telemetry infrastructure issues
  • Fewer incidents like inv-18682 going unnoticed

Checklist

  • I've considered possible cross-platform impacts (Mac, Linux, Windows)
    • Uses existing cross-platform error reporting mechanism
  • I've considered possible documentation changes
    • No user-facing changes, no documentation needed

dmerand (Contributor, Author) commented Feb 11, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

github-actions bot (Contributor) commented Feb 11, 2026

Coverage report

St.  Category    Percentage  Covered / Total
🟡   Statements  78.9%       14556/18448
🟡   Branches    73.23%      7223/9863
🟡   Functions   79.09%      3704/4683
🟡   Lines       79.25%      13763/17366

Test suite run success

3772 tests passing in 1453 suites.

Report generated by 🧪jest coverage report action from 2048bb9

dmerand force-pushed the dlm-log-metrics-errors branch from ad62394 to f365302 on February 12, 2026 14:24
dmerand closed this Feb 12, 2026
dmerand reopened this Feb 12, 2026
}
outputDebug(message)

if (error instanceof Error) {

We don't need to wrap this as sendErrorToBugsnag already does this internally.

alexanderMontague marked this pull request as ready for review on February 13, 2026 17:17
alexanderMontague requested a review from a team as a code owner on February 13, 2026 17:17
github-actions (Contributor)

We detected some changes at packages/*/src and there are no updates in the .changeset.
If the changes are user-facing, run pnpm changeset add to track your changes and include them in the next release CHANGELOG.

Caution

DO NOT create changesets for features which you do not wish to be included in the public changelog of the next CLI release.
