PropertiesMeterFilter.java filters distribution.slo, distribution.minimum-expected-value and distribution.maximum-expected-value for active metrics/LONG_TASK_TIMER #49190

@jtorkkel

Spring Boot introduced active metrics in Spring Boot 3.

By default, active metrics contain _count, _sum and _max (http_client_requests_active_seconds_gcount as an example).

Spring has the following properties to control distribution metrics. Only one of them is applied, in priority order (in Spring Boot 2, percentiles and slo/sla worked simultaneously, but it is OK that only one works at a time).

#1 autogenerated buckets with min and max
management.metrics.distribution.percentile-histogram.http=true
management.metrics.distribution.minimum-expected-value.http=20
management.metrics.distribution.maximum-expected-value.http=40s

#2 explicit buckets
management.metrics.distribution.slo.http=10, 50, 100, 500, 1s, 5s, 10s, 20s, 30s, 40s

#3 explicit percentiles
#management.metrics.distribution.percentiles.http = 0.5, 0.9, 0.999

Unfortunately, the active metrics configuration does not work as expected and is inconsistent.

Based on #48410, the normal metrics distribution configuration should be applied to active metrics, but currently only percentiles can be applied. jonatan-ivanov proposed creating a new issue.

The following are discarded:

  • slo is discarded from active metrics
  • minimum-expected-value is discarded from active metrics
  • maximum-expected-value is discarded from active metrics

They are discarded because MeterValue/getValue returns null if meterType is LONG_TASK_TIMER. This same "filtering" causes all 3 issues. slo is filtered in convertServiceLevelObjectives and min/max in convertMeterValue; both call doLookup directly instead of calling lookupWithFallbackToAll.

Percentiles work "accidentally", as they do not use a converter and call lookupWithFallbackToAll (percentile-histogram, expiry and buffer length are also set, because they also use lookupWithFallbackToAll without a converter).
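The difference between the two lookup paths can be illustrated with a small stand-alone model. This is only a sketch, not the Spring Boot source: `LongTaskTimerLookupSketch`, `convert` and `convertedLookup` are hypothetical names, the values are plain numbers instead of `MeterValue` instances, and the prefix-walking `doLookup` is an assumption about how the real method behaves.

```java
import java.util.Map;
import java.util.function.Supplier;

// Simplified stand-in for the two lookup paths; NOT the Spring Boot implementation.
class LongTaskTimerLookupSketch {

    enum MeterType { TIMER, LONG_TASK_TIMER }

    // Hypothetical model of MeterValue.getValue: no value for LONG_TASK_TIMER.
    static Double convert(String raw, MeterType type) {
        if (type == MeterType.LONG_TASK_TIMER) {
            return null;
        }
        return Double.valueOf(raw);
    }

    // Assumed behavior: try the full name, then each shorter dot-separated prefix.
    static <T> T doLookup(Map<String, T> values, String name, Supplier<T> fallback) {
        while (!name.isEmpty()) {
            T result = values.get(name);
            if (result != null) {
                return result;
            }
            int lastDot = name.lastIndexOf('.');
            name = (lastDot != -1) ? name.substring(0, lastDot) : "";
        }
        return fallback.get();
    }

    // Converted path (slo, min, max): direct doLookup, no "all" fallback, then conversion.
    static Double convertedLookup(Map<String, String> values, String name, MeterType type) {
        String raw = doLookup(values, name, () -> null);
        return (raw != null) ? convert(raw, type) : null;
    }

    // Unconverted path (percentiles, percentile-histogram, expiry, buffer length).
    static <T> T lookupWithFallbackToAll(Map<String, T> values, String name, T defaultValue) {
        if (values.isEmpty()) {
            return defaultValue;
        }
        return doLookup(values, name, () -> values.getOrDefault("all", defaultValue));
    }

    public static void main(String[] args) {
        Map<String, String> max = Map.of("http", "40000");
        // The same property resolves for a timer but is dropped for the active meter.
        System.out.println(convertedLookup(max, "http.client.requests", MeterType.TIMER));
        System.out.println(convertedLookup(max, "http.client.requests.active", MeterType.LONG_TASK_TIMER));
        // The unconverted path is independent of the meter type and honors "all".
        System.out.println(lookupWithFallbackToAll(Map.of("all", true), "http.client.requests.active", false));
    }
}
```

In this model the converted path loses the value in two ways: the conversion returns null for the long task timer, and a key named "all" is never consulted.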

Filtering min/max is very problematic, as active metrics use a default min value of 120s and a max value of 2h. The resulting buckets are suitable for real long-running tasks that execute for tens of minutes, but are completely wrong for normal HTTP client and server calls, for example. Because min/max are filtered, the correct values cannot be set.

Fixing slo causes another issue. Once it is fixed, there is no way to remove buckets from active metrics. The only option is to set a single explicit bucket for each metric, like below:

management.metrics.distribution.slo.http.client.requests.active=5s
management.metrics.distribution.slo.http.server.requests.active=5s

The following does not work, as it is filtered away by Spring:

management.metrics.distribution.slo.http.client.requests.active=
management.metrics.distribution.slo.http.server.requests.active=

And the following would cause an error due to validation in validate/slo:

management.metrics.distribution.slo.http.client.requests.active=0
management.metrics.distribution.slo.http.server.requests.active=0

To disable buckets for active metrics, Spring should detect a "-1", "0" or "remove" value and internally set the active slo to "new double[0]", which passes validation and removes the buckets.

Spring has a nice feature to enable percentile-histogram or percentiles for all metrics:
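A minimal sketch of that sentinel handling, under stated assumptions: `SloSentinelSketch` and `parseSlo` are hypothetical names, the property value arrives as a raw string, and only plain numeric boundaries are parsed (duration values such as "5s" are left out for brevity). It is not how Spring Boot binds the property today.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical helper illustrating the proposed "disable" sentinels for slo.
class SloSentinelSketch {

    // Values proposed in this issue as markers for "no buckets".
    private static final Set<String> DISABLE_MARKERS = Set.of("-1", "0", "remove");

    // Returns an empty array for a sentinel, otherwise the parsed boundaries.
    static double[] parseSlo(String property) {
        String trimmed = property.trim();
        if (DISABLE_MARKERS.contains(trimmed)) {
            return new double[0];
        }
        return Arrays.stream(trimmed.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(parseSlo("remove").length);
        System.out.println(Arrays.toString(parseSlo("10, 50, 100")));
    }
}
```

Mapping the sentinel to an empty array, rather than to a zero boundary, is the point of the proposal: the empty array is what is expected to pass validation while producing no buckets.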

management.metrics.distribution.percentile-histogram.all=true
management.metrics.distribution.percentiles.all=0.5, 0.9

Unfortunately, "all" works only when "lookupWithFallbackToAll" is used, and thus does not work for

  • slo
  • min
  • max

which go through the converter, which also discards values for LONG_TASK_TIMER. The usefulness is therefore limited, as min/max must use at least an explicit metric name prefix.

Also, disabling the distribution config of active metrics must be done separately for each active metric. It would be great to have "all.active", which would disable all active distribution config.

This could be done by adding the following code to lookupWithFallbackToAll; it would always remove the active distribution config.
A proper all.active would be more complex, in order to allow mixing all.active and "xxx".active, as doLookup should by default also match properties without the active suffix when all.active is not defined.

lookupWithFallbackToAll

        private <T> T lookupWithFallbackToAll(Map<String, T> values, String name, T defaultValue) {
            if (values.isEmpty()) {
                return defaultValue;
            }
            // Start of added "all.active" handling: always wins over more specific keys.
            if (name.endsWith("active") && values.containsKey("all.active")) {
                return values.get("all.active");
            }
            // End of added "all.active" handling.
            return doLookup(values, name, () -> values.getOrDefault("all", defaultValue));
        }
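To make the "filter always" caveat concrete, here is a runnable sketch of the proposed change outside of Spring Boot. `AllActiveLookupSketch` is a hypothetical class name, and its `doLookup` is an assumed prefix-walking stand-in for the real method, so the behavior shown is illustrative only.

```java
import java.util.Map;
import java.util.function.Supplier;

// Stand-alone model of the proposed "all.active" handling; not the Spring Boot source.
class AllActiveLookupSketch {

    // Assumed stand-in for doLookup: full name first, then shorter prefixes.
    static <T> T doLookup(Map<String, T> values, String name, Supplier<T> fallback) {
        while (!name.isEmpty()) {
            T result = values.get(name);
            if (result != null) {
                return result;
            }
            int lastDot = name.lastIndexOf('.');
            name = (lastDot != -1) ? name.substring(0, lastDot) : "";
        }
        return fallback.get();
    }

    static <T> T lookupWithFallbackToAll(Map<String, T> values, String name, T defaultValue) {
        if (values.isEmpty()) {
            return defaultValue;
        }
        // Proposed addition: "all.active" short-circuits every name ending in "active".
        if (name.endsWith("active") && values.containsKey("all.active")) {
            return values.get("all.active");
        }
        return doLookup(values, name, () -> values.getOrDefault("all", defaultValue));
    }

    public static void main(String[] args) {
        Map<String, Boolean> histogram = Map.of(
                "all", true,
                "all.active", false,
                "http.client.requests.active", true);
        // Regular meters still use "all".
        System.out.println(lookupWithFallbackToAll(histogram, "http.server.requests", false));
        // Active meters use "all.active", even when a more specific key exists.
        System.out.println(lookupWithFallbackToAll(histogram, "http.client.requests.active", false));
    }
}
```

In this model the explicit `http.client.requests.active` entry is ignored once `all.active` is present, which is why a proper implementation that allows mixing the two would need a more careful lookup order.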

Summary of the proposed fix:

  • remove meter type LONG_TASK_TIMER value filtering
  • always use lookupWithFallbackToAll, so that all distribution fields support "all" ([DistributionStatisticConfig configure](https://github.com/spring-projects/spring-boot/blob/main/module/spring-boot-micrometer-metrics/src/main/java/org/springframework/boot/micrometer/metrics/autoconfigure/PropertiesMeterFilter.java#L87))
  • support disabling slo for active metrics
  • add support for "all.active" to disable distribution for most/all metrics in

Tested using Spring Boot 3.5.8 and Java 21.
