From 0aed46fe6c0702507c37b72216fae3a22b3529b7 Mon Sep 17 00:00:00 2001
From: Prashanth Josyula
Date: Mon, 29 Dec 2025 11:10:14 -0800
Subject: [PATCH 1/7] Add documentation for activator_autoscaler_reachable metric

Document the new activator_autoscaler_reachable gauge metric that indicates
whether the autoscaler is reachable from the activator component
(1 = reachable, 0 = not reachable).
---
 .../observability/metrics/serving-metrics.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/versioned/serving/observability/metrics/serving-metrics.md b/docs/versioned/serving/observability/metrics/serving-metrics.md
index c6d4343365..636feafe1e 100644
--- a/docs/versioned/serving/observability/metrics/serving-metrics.md
+++ b/docs/versioned/serving/observability/metrics/serving-metrics.md
@@ -92,6 +92,20 @@ Name | Type | Description
 `kn.configuration.name` | string | Knative Configuration name associated with this Revision
 `kn.revision.name` | string | The name of the Revision
 
+### `activator_autoscaler_reachable`
+
+**Instrument Type:** Int64Gauge
+
+**Unit ([UCUM](https://ucum.org)):** {reachable}
+
+**Description:** Whether the autoscaler is reachable from the activator (1 = reachable, 0 = not reachable)
+
+This metric helps operators identify connectivity issues between the activator and autoscaler components. The metric is recorded:
+
+- When stats are successfully sent to the autoscaler (value = 1)
+- When stats fail to send to the autoscaler (value = 0)
+- Periodically every 5 seconds based on connection status check
+
 ### HTTP metrics
 
 Since the activator receives and forwards requests to the user workload it has both HTTP server and client metrics.
From 316a43d2a1a9fa67ac8d646ea188fac852756bad Mon Sep 17 00:00:00 2001
From: Prashanth Josyula
Date: Tue, 30 Dec 2025 09:00:04 -0800
Subject: [PATCH 2/7] Correcting the metric name inline with knative standards

---
 docs/versioned/serving/observability/metrics/serving-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/versioned/serving/observability/metrics/serving-metrics.md b/docs/versioned/serving/observability/metrics/serving-metrics.md
index 636feafe1e..a6116c5316 100644
--- a/docs/versioned/serving/observability/metrics/serving-metrics.md
+++ b/docs/versioned/serving/observability/metrics/serving-metrics.md
@@ -92,7 +92,7 @@ Name | Type | Description
 `kn.configuration.name` | string | Knative Configuration name associated with this Revision
 `kn.revision.name` | string | The name of the Revision
 
-### `activator_autoscaler_reachable`
+### `kn.activator.autoscaler.reachable`
 
 **Instrument Type:** Int64Gauge
 
From 2e295c086ad64adb562f5b088a7bde1909d73619 Mon Sep 17 00:00:00 2001
From: Prashanth Josyula
Date: Mon, 12 Jan 2026 22:25:10 -0800
Subject: [PATCH 3/7] Added a new metric

---
 .../observability/metrics/serving-metrics.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/versioned/serving/observability/metrics/serving-metrics.md b/docs/versioned/serving/observability/metrics/serving-metrics.md
index a6116c5316..6337939cb2 100644
--- a/docs/versioned/serving/observability/metrics/serving-metrics.md
+++ b/docs/versioned/serving/observability/metrics/serving-metrics.md
@@ -106,6 +106,20 @@ This metric helps operators identify connectivity issues between the activator a
 - When stats fail to send to the autoscaler (value = 0)
 - Periodically every 5 seconds based on connection status check
 
+### `kn.activator.autoscaler.connection_errors_total`
+
+**Instrument Type:** Int64Counter
+
+**Unit ([UCUM](https://ucum.org)):** {error}
+
+**Description:** Total number of autoscaler connection errors from the activator
+
+This counter increments each time the activator fails to communicate with the autoscaler. It complements the `kn.activator.autoscaler.reachable` gauge by providing a cumulative count of errors, which is useful for:
+
+- Detecting flaky connections that might be missed by point-in-time gauge sampling
+- Creating rate-based alerts (e.g., alert if error rate exceeds threshold over 5 minutes)
+- Tracking connection stability trends over time
+
 ### HTTP metrics
 
 Since the activator receives and forwards requests to the user workload it has both HTTP server and client metrics.
From 9b061b153fa8d3f57a53f10b0b9b7e4b02df57cb Mon Sep 17 00:00:00 2001
From: Prashanth Josyula
Date: Wed, 14 Jan 2026 21:38:11 -0800
Subject: [PATCH 4/7] Added a new metric

---
 .../observability/metrics/serving-metrics.md | 29 +++++++++++++------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/docs/versioned/serving/observability/metrics/serving-metrics.md b/docs/versioned/serving/observability/metrics/serving-metrics.md
index 6337939cb2..5d4a57e692 100644
--- a/docs/versioned/serving/observability/metrics/serving-metrics.md
+++ b/docs/versioned/serving/observability/metrics/serving-metrics.md
@@ -92,29 +92,40 @@ Name | Type | Description
 `kn.configuration.name` | string | Knative Configuration name associated with this Revision
 `kn.revision.name` | string | The name of the Revision
 
-### `kn.activator.autoscaler.reachable`
+### `kn.activator.reachable`
 
 **Instrument Type:** Int64Gauge
 
 **Unit ([UCUM](https://ucum.org)):** {reachable}
 
-**Description:** Whether the autoscaler is reachable from the activator (1 = reachable, 0 = not reachable)
+**Description:** Whether a peer is reachable from the activator (1 = reachable, 0 = not reachable)
 
-This metric helps operators identify connectivity issues between the activator and autoscaler components. The metric is recorded:
+The following attributes are included with the metric
+
+Name | Type | Description
+-|-|-
+`peer` | string | The peer service the activator is connecting to (e.g., `autoscaler`)
+
+This metric helps operators identify connectivity issues between the activator and its peer components. The metric is recorded:
 
-- When stats are successfully sent to the autoscaler (value = 1)
-- When stats fail to send to the autoscaler (value = 0)
-- Periodically every 5 seconds based on connection status check
+- When a connection is established (value = 1)
+- When a connection is lost (value = 0)
 
-### `kn.activator.autoscaler.connection_errors_total`
+### `kn.activator.connection_errors`
 
 **Instrument Type:** Int64Counter
 
 **Unit ([UCUM](https://ucum.org)):** {error}
 
-**Description:** Total number of autoscaler connection errors from the activator
+**Description:** Number of connection errors from the activator
+
+The following attributes are included with the metric
+
+Name | Type | Description
+-|-|-
+`peer` | string | The peer service the activator is connecting to (e.g., `autoscaler`)
 
-This counter increments each time the activator fails to communicate with the autoscaler. It complements the `kn.activator.autoscaler.reachable` gauge by providing a cumulative count of errors, which is useful for:
+This counter increments each time the activator fails to communicate with a peer. It complements the `kn.activator.reachable` gauge by providing a cumulative count of errors, which is useful for:
 
 - Detecting flaky connections that might be missed by point-in-time gauge sampling
 - Creating rate-based alerts (e.g., alert if error rate exceeds threshold over 5 minutes)
From d97a4cd0ec4927452917897b932d4426c5533635 Mon Sep 17 00:00:00 2001
From: Prashanth Josyula
Date: Thu, 15 Jan 2026 11:04:31 -0800
Subject: [PATCH 5/7] Apply suggestion from @dprotaso

Co-authored-by: Dave Protasowski
---
 docs/versioned/serving/observability/metrics/serving-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/versioned/serving/observability/metrics/serving-metrics.md b/docs/versioned/serving/observability/metrics/serving-metrics.md
index 5d4a57e692..c32c6b9b90 100644
--- a/docs/versioned/serving/observability/metrics/serving-metrics.md
+++ b/docs/versioned/serving/observability/metrics/serving-metrics.md
@@ -92,7 +92,7 @@ Name | Type | Description
 `kn.configuration.name` | string | Knative Configuration name associated with this Revision
 `kn.revision.name` | string | The name of the Revision
 
-### `kn.activator.reachable`
+### `kn.activator.stats.conn.reachable`
 
 **Instrument Type:** Int64Gauge
 
From 89049fb3ffcf06322920be7e673c2de08b52e9b2 Mon Sep 17 00:00:00 2001
From: Prashanth Josyula
Date: Thu, 15 Jan 2026 11:04:47 -0800
Subject: [PATCH 6/7] Apply suggestion from @dprotaso

Co-authored-by: Dave Protasowski
---
 docs/versioned/serving/observability/metrics/serving-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/versioned/serving/observability/metrics/serving-metrics.md b/docs/versioned/serving/observability/metrics/serving-metrics.md
index c32c6b9b90..5ec7c7d978 100644
--- a/docs/versioned/serving/observability/metrics/serving-metrics.md
+++ b/docs/versioned/serving/observability/metrics/serving-metrics.md
@@ -111,7 +111,7 @@ This metric helps operators identify connectivity issues between the activator a
 - When a connection is established (value = 1)
 - When a connection is lost (value = 0)
 
-### `kn.activator.connection_errors`
+### `kn.activator.stats.conn.errors`
 
 **Instrument Type:** Int64Counter
 
From 8ac66e089db5b073aee442c9242986ccb4986be1 Mon Sep 17 00:00:00 2001
From: Prashanth Josyula
Date: Thu, 15 Jan 2026 11:04:59 -0800
Subject: [PATCH 7/7] Apply suggestion from @dprotaso

Co-authored-by: Dave Protasowski
---
 docs/versioned/serving/observability/metrics/serving-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/versioned/serving/observability/metrics/serving-metrics.md b/docs/versioned/serving/observability/metrics/serving-metrics.md
index 5ec7c7d978..9f7167e62e 100644
--- a/docs/versioned/serving/observability/metrics/serving-metrics.md
+++ b/docs/versioned/serving/observability/metrics/serving-metrics.md
@@ -125,7 +125,7 @@ Name | Type | Description
 -|-|-
 `peer` | string | The peer service the activator is connecting to (e.g., `autoscaler`)
 
-This counter increments each time the activator fails to communicate with a peer. It complements the `kn.activator.reachable` gauge by providing a cumulative count of errors, which is useful for:
+This counter increments each time the activator fails to communicate with a peer. It complements the `kn.activator.stats.conn.reachable` gauge by providing a cumulative count of errors, which is useful for:
 
 - Detecting flaky connections that might be missed by point-in-time gauge sampling
 - Creating rate-based alerts (e.g., alert if error rate exceeds threshold over 5 minutes)
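
The rationale the series gives for pairing the gauge with a counter (flaky connections can be missed by point-in-time gauge sampling) can be illustrated with a small simulation. This is a toy sketch, not Knative code: the `ConnMetrics` class, the event names, and the 30-second scrape interval are hypothetical, and it assumes every lost connection is counted as exactly one error.

```python
from dataclasses import dataclass


@dataclass
class ConnMetrics:
    """Toy stand-ins for the gauge and counter documented in the patches."""

    reachable: int = 1  # gauge: last recorded state (1 = reachable, 0 = not)
    errors: int = 0     # counter: cumulative, never decreases

    def on_connected(self) -> None:
        self.reachable = 1

    def on_lost(self) -> None:
        # Assumption: a lost connection flips the gauge and counts one error.
        self.reachable = 0
        self.errors += 1


def simulate(events, scrape_times):
    """Replay (time, kind) events and sample both metrics at each scrape."""
    metrics = ConnMetrics()
    pending = sorted(events)
    samples = []
    i = 0
    for t in sorted(scrape_times):
        # Apply every event that happened at or before this scrape.
        while i < len(pending) and pending[i][0] <= t:
            if pending[i][1] == "lost":
                metrics.on_lost()
            else:
                metrics.on_connected()
            i += 1
        samples.append((t, metrics.reachable, metrics.errors))
    return samples


# Two short outages, each recovering before the next (hypothetical) scrape.
events = [(12, "lost"), (14, "connected"), (41, "lost"), (43, "connected")]
samples = simulate(events, scrape_times=[0, 30, 60])

for t, reachable, errors in samples:
    print(f"t={t:>2}s reachable={reachable} errors={errors}")
```

In this run the gauge reads 1 at every scrape, so both outages are invisible to it, while the counter rises from 0 to 2. A rate-based alert on the counter would therefore still see the flapping.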