Description
Spring Boot 3 introduced active metrics.
By default, active metrics contain _count, _sum and _max (http_client_requests_active_seconds_gcount, for example).
Spring Boot has the following properties to control distribution metrics. Only one of them is applied, in priority order (in Spring Boot 2, percentiles and slo/sla worked simultaneously, but it is acceptable that only one applies at a time).
#1 autogenerated buckets with min and max
management.metrics.distribution.percentile-histogram.http=true
management.metrics.distribution.minimum-expected-value.http=20
management.metrics.distribution.maximum-expected-value.http=40s
#2 explicit buckets
management.metrics.distribution.slo.http=10, 50, 100, 500, 1s, 5s, 10s, 20s, 30s, 40s
#3 explicit percentiles
#management.metrics.distribution.percentiles.http = 0.5, 0.9, 0.999
Unfortunately active metrics configuration does not work as expected and is inconsistent.
Based on 48410, the normal metrics distribution configuration should be applied to active metrics, but currently only percentiles can be applied. jonatan-ivanov proposed creating a new issue.
The following are discarded:
- slo is discarded from active metrics
- minimum-expected-value is discarded from active metrics
- maximum-expected-value is discarded from active metrics
They are discarded because MeterValue/getValue returns null if meterType is LONG_TASK_TIMER. This same filtering causes all 3 issues. slo is filtered in convertServiceLevelObjectives, and min/max in convertMeterValue; both call doLookup directly instead of calling lookupWithFallbackToAll.
Percentiles work "accidentally" because they do not use a converter and call lookupWithFallbackToAll (percentile-histogram, expiry and buffer length are also set because they also use lookupWithFallbackToAll without a converter).
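The two lookup paths described above can be modeled roughly as follows. This is a simplified, dependency-free sketch and not Spring Boot's actual code: the class LookupSketch, the MeterType enum, the convert helper and lookupConverted are hypothetical stand-ins, and configured values are assumed to be plain millisecond numbers.

```java
import java.util.Map;
import java.util.function.Supplier;

public class LookupSketch {

    public enum MeterType { TIMER, DISTRIBUTION_SUMMARY, LONG_TASK_TIMER }

    // Hypothetical stand-in for the conversion step: in this model a configured
    // duration (in milliseconds) is only converted for TIMER, so every other
    // meter type, including LONG_TASK_TIMER, yields null.
    public static Double convert(long configuredMillis, MeterType type) {
        if (type == MeterType.TIMER) {
            return (double) configuredMillis;
        }
        return null;
    }

    // Walks the dotted name from most specific to least specific prefix.
    public static <T> T doLookup(Map<String, T> values, String name, Supplier<T> fallback) {
        while (!name.isEmpty()) {
            T result = values.get(name);
            if (result != null) {
                return result;
            }
            int lastDot = name.lastIndexOf('.');
            name = (lastDot != -1) ? name.substring(0, lastDot) : "";
        }
        return fallback.get();
    }

    // Path without a converter: prefix lookup, then the "all" key, then the default.
    public static <T> T lookupWithFallbackToAll(Map<String, T> values, String name, T defaultValue) {
        if (values.isEmpty()) {
            return defaultValue;
        }
        return doLookup(values, name, () -> values.getOrDefault("all", defaultValue));
    }

    // Path with a converter: prefix lookup without "all", then conversion by meter type.
    public static Double lookupConverted(Map<String, Long> values, String name, MeterType type) {
        Long configured = doLookup(values, name, () -> null);
        return (configured != null) ? convert(configured, type) : null;
    }

    public static void main(String[] args) {
        Map<String, Long> minimum = Map.of("http", 20L);
        System.out.println(lookupConverted(minimum, "http.client.requests", MeterType.TIMER));
        System.out.println(lookupConverted(minimum, "http.client.requests.active", MeterType.LONG_TASK_TIMER));
    }
}
```

In this model the same "http" entry reaches the timer but is dropped for the long task timer, and an "all" entry is never seen by the converted path.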
Filtering min/max is very problematic because active metrics use a default minimum of 120s and a default maximum of 2h. The resulting buckets suit real long-running tasks that execute for tens of minutes, but are completely wrong for normal HTTP client and server calls, for example. Because min/max are filtered, the correct values cannot be set.
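To illustrate why buckets starting at 120s carry no information for typical HTTP calls, here is a small stand-alone sketch. The bucket bounds and sample durations are made up for illustration (only the 120s and 2h limits come from the text above), and BucketFitSketch is a hypothetical helper, not part of Micrometer or Spring Boot.

```java
import java.time.Duration;
import java.util.List;

public class BucketFitSketch {

    // Returns the index of the first bucket whose upper bound is >= the sample,
    // or bounds.size() when the sample exceeds every bound (the +Inf bucket).
    public static int bucketIndex(List<Duration> bounds, Duration sample) {
        for (int i = 0; i < bounds.size(); i++) {
            if (sample.compareTo(bounds.get(i)) <= 0) {
                return i;
            }
        }
        return bounds.size();
    }

    public static void main(String[] args) {
        // Illustrative bounds between the 120s minimum and the 2h maximum.
        List<Duration> longTaskBounds = List.of(
                Duration.ofSeconds(120), Duration.ofMinutes(10), Duration.ofHours(2));
        // Illustrative bounds in the range of the slo example from this issue.
        List<Duration> httpBounds = List.of(
                Duration.ofMillis(100), Duration.ofSeconds(1), Duration.ofSeconds(10), Duration.ofSeconds(40));

        for (Duration sample : List.of(Duration.ofMillis(50), Duration.ofMillis(800), Duration.ofSeconds(5))) {
            System.out.println(sample
                    + " -> long-task bucket " + bucketIndex(longTaskBounds, sample)
                    + ", http bucket " + bucketIndex(httpBounds, sample));
        }
    }
}
```

With the long-task bounds every sample lands in the first bucket, while the HTTP-sized bounds spread the same samples over several buckets.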
Fixing slo causes another issue. Once fixed, there is no way to remove buckets from active metrics. The only option is to set a single explicit bucket for each metric, like below
management.metrics.distribution.slo.http.client.requests.active=5s
management.metrics.distribution.slo.http.server.requests.active=5s
The following does not work because it is filtered away by Spring
management.metrics.distribution.slo.http.client.requests.active=
management.metrics.distribution.slo.http.server.requests.active=
And the following would cause an error due to the validation in validate/slo
management.metrics.distribution.slo.http.client.requests.active=0
management.metrics.distribution.slo.http.server.requests.active=0
To disable active metrics, Spring should detect a "-1", "0" or "remove" string and internally set the active slo to "new double[0]", which passes validation and removes the buckets.
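A minimal sketch of the suggested sentinel handling might look like this. It is only an illustration of the idea: SloSentinelSketch and parseSlo are hypothetical names, values are assumed to be plain millisecond numbers (no unit suffixes such as 1s), and the positive-value check only imitates the kind of validation mentioned above.

```java
import java.util.Set;

public class SloSentinelSketch {

    // Sentinel strings taken from the suggestion above; all of them mean "no buckets".
    private static final Set<String> DISABLE = Set.of("-1", "0", "remove");

    // Parses a comma-separated list of millisecond values. A sentinel value
    // yields an empty array instead of failing the positive-value check.
    public static double[] parseSlo(String raw) {
        String trimmed = raw.trim();
        if (DISABLE.contains(trimmed)) {
            return new double[0];
        }
        String[] parts = trimmed.split(",");
        double[] result = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            double value = Double.parseDouble(parts[i].trim());
            if (value <= 0) {
                throw new IllegalArgumentException("slo values must be positive: " + value);
            }
            result[i] = value;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseSlo("remove").length);
        System.out.println(parseSlo("10, 50, 100").length);
    }
}
```

Only a value that consists solely of a sentinel is treated as "disable"; a 0 inside a longer list is still rejected in this sketch.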
Spring has a nice feature to enable percentile-histogram or percentiles for all metrics buckets
management.prometheus.metrics.distribution.percentile-histogram.all=10, 50, 100, 500, 1s, 5s, 10s, 20s, 30s, 40s
management.prometheus.metrics.distribution.percentiles.all=0.5, 0.9
Unfortunately "all" works only when "lookupWithFallbackToAll" is used, thus it does not work for
- slo
- min
- max
which use a converter that also discards values for LONG_TASK_TIMER. Thus their usefulness is limited, as min/max must use at least an explicit metric name prefix.
Also, disabling active metrics must be done for each active metric individually. It would be great to have "all.active", which would disable all active distribution config.
This could be done by adding the following code to lookupWithFallbackToAll, which would always remove the active distribution.
A proper all.active would be more complex, to allow mixing all.active and "xxx".active, because doLookup should by default also match properties without active when all.active is not defined.
private <T> T lookupWithFallbackToAll(Map<String, T> values, String name, T defaultValue) {
    if (values.isEmpty()) {
        return defaultValue;
    }
    // start: add all.active handling, filter always
    if (name.endsWith("active") && values.containsKey("all.active")) {
        return values.get("all.active");
    }
    // end: add all.active handling
    return doLookup(values, name, () -> values.getOrDefault("all", defaultValue));
}
Summary of the proposed fix:
- Remove the meter type LONG_TASK_TIMER value filtering.
- Always use lookupWithFallbackToAll, so that all distribution fields support "all" (DistributionStatisticConfig configure: https://github.com/spring-projects/spring-boot/blob/main/module/spring-boot-micrometer-metrics/src/main/java/org/springframework/boot/micrometer/metrics/autoconfigure/PropertiesMeterFilter.java#L87).
- Support disabling slo for active metrics.
- Add support for "all-active" to disable distribution for most/all metrics in
Tested using Spring Boot 3.5.8 and Java 21
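For the more complete all.active handling mentioned earlier (mixing all.active with an explicit "xxx".active entry), one possible precedence could be sketched as below. This is only one interpretation of the idea, not an existing Spring Boot feature: AllActiveLookupSketch and its lookup method are hypothetical, and the rule that shorter prefixes such as http are skipped for active meters once all.active is defined is an assumption.

```java
import java.util.Map;

public class AllActiveLookupSketch {

    public static <T> T lookup(Map<String, T> values, String name, T defaultValue) {
        if (values.isEmpty()) {
            return defaultValue;
        }
        if (name.endsWith(".active") && values.containsKey("all.active")) {
            // Assumption: an explicit "<name>.active" entry wins over all.active,
            // while shorter prefixes (for example "http") no longer apply.
            T explicit = values.get(name);
            return (explicit != null) ? explicit : values.get("all.active");
        }
        // Otherwise keep the usual behavior: walk the dotted prefixes, then "all".
        String current = name;
        while (!current.isEmpty()) {
            T result = values.get(current);
            if (result != null) {
                return result;
            }
            int lastDot = current.lastIndexOf('.');
            current = (lastDot != -1) ? current.substring(0, lastDot) : "";
        }
        return values.getOrDefault("all", defaultValue);
    }

    public static void main(String[] args) {
        Map<String, String> slo = Map.of(
                "http", "http-buckets",
                "all.active", "no-buckets",
                "http.server.requests.active", "server-active-buckets");
        System.out.println(lookup(slo, "http.client.requests.active", "default"));
        System.out.println(lookup(slo, "http.server.requests.active", "default"));
        System.out.println(lookup(slo, "http.client.requests", "default"));
    }
}
```

When all.active is not defined, an active meter still falls back to the shorter prefix, which matches the default behavior described above.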