[FEAT] Add per-model token and error Prometheus metrics (part of #699) by ardecode · Pull Request #813 · vllm-project/production-stack

ardecode · 2026-01-31T16:21:35Z

Summary

This PR adds model-level Prometheus metrics to improve observability in the router.

Metrics added

vllm:input_tokens_total with labels [server, model]
vllm:output_tokens_total with labels [server, model]
vllm:request_errors_total with labels [server, model, error_type]
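For reference, counters with these names and label sets could be declared with the Python `prometheus_client` library roughly as follows. This is a minimal sketch, not the code from this PR: the variable names, help strings, private registry, and sample label values are all assumptions made for illustration.

```python
from prometheus_client import CollectorRegistry, Counter

# A private registry keeps this sketch isolated from the process-wide default.
registry = CollectorRegistry()

# Metric names and label sets come from the PR description; the variable
# names and help strings are illustrative assumptions.
input_tokens_total = Counter(
    "vllm:input_tokens_total",
    "Input (prompt) tokens counted per server and model",
    ["server", "model"],
    registry=registry,
)
output_tokens_total = Counter(
    "vllm:output_tokens_total",
    "Output (completion) tokens counted per server and model",
    ["server", "model"],
    registry=registry,
)
request_errors_total = Counter(
    "vllm:request_errors_total",
    "Request errors counted per server, model, and error type",
    ["server", "model", "error_type"],
    registry=registry,
)

# Hypothetical label values, for illustration only.
server, model = "http://backend-0:8000", "example-model"
input_tokens_total.labels(server=server, model=model).inc(12)
output_tokens_total.labels(server=server, model=model).inc(30)
request_errors_total.labels(server=server, model=model, error_type="TimeoutError").inc()
```

Each distinct combination of label values becomes its own time series, so the `error_type` label is best kept to a small set of values such as exception class names.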

Implementation details

Token usage is extracted from non-streaming responses.
Error metrics capture exception types for easier debugging.
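The two points above can be sketched as follows. This is an illustrative stand-in, not the router's implementation: it assumes an OpenAI-style response body with a `usage` object containing `prompt_tokens` and `completion_tokens`, and it uses plain dictionaries in place of Prometheus counters. The helper names and sample values are hypothetical.

```python
import json
from collections import defaultdict

# In-memory stand-ins for the Prometheus counters (an assumption of this sketch).
input_tokens = defaultdict(int)    # keyed by (server, model)
output_tokens = defaultdict(int)   # keyed by (server, model)
request_errors = defaultdict(int)  # keyed by (server, model, error_type)


def record_usage(server: str, model: str, body: bytes) -> None:
    """Add token counts from a complete (non-streaming) JSON response body."""
    try:
        payload = json.loads(body)
    except (ValueError, UnicodeDecodeError):
        return  # body is not JSON, so there is nothing to count
    usage = payload.get("usage") if isinstance(payload, dict) else None
    if not isinstance(usage, dict):
        return  # no usage block in the response
    # A missing or null field counts as 0; a present field is added as-is.
    input_tokens[(server, model)] += int(usage.get("prompt_tokens") or 0)
    output_tokens[(server, model)] += int(usage.get("completion_tokens") or 0)


def record_error(server: str, model: str, exc: BaseException) -> None:
    """Count an error under the exception's class name."""
    request_errors[(server, model, type(exc).__name__)] += 1


# Usage with hypothetical values.
record_usage(
    "backend-0",
    "example-model",
    b'{"usage": {"prompt_tokens": 7, "completion_tokens": 3}}',
)
record_error("backend-0", "example-model", TimeoutError("upstream timed out"))
```

Using the exception class name as the label value keeps the label set bounded, which is why it suits a Prometheus label better than the full error message would.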

Scope relative to #699

This PR implements part of the metrics proposed in #699.

Not included (follow-ups)

Per-model request count metrics
Per-model latency histograms/summaries
Resource or load indicators

Refs #699

gemini-code-assist · 2026-01-31T16:21:51Z

Summary of Changes

Hello @ardecode, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds model-level Prometheus metrics to the router. It tracks token usage for non-streaming requests and counts request errors by model and error type, as part of the broader monitoring work proposed in #699.

Highlights

New Metrics Introduced: Three new model-level Prometheus metrics have been added: "vllm:input_tokens_total", "vllm:output_tokens_total", and "vllm:request_errors_total".
Token Usage Tracking: Input and output token counts are now tracked for non-streaming requests, with labels for the "server" and "model".
Error Tracking: Request errors are now captured and tracked, including the "server", "model", and the specific "error_type" (exception class name).
Model Name Extraction: The "model" name is extracted from the incoming request body to provide granular metrics per model.
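As a rough illustration of the last highlight, the `model` label could be read from a JSON request body like this. The function name and the `"unknown"` fallback are assumptions made for the sketch; how the PR handles missing or malformed bodies is not shown on this page.

```python
import json


def extract_model(request_body: bytes, default: str = "unknown") -> str:
    """Return the "model" field of a JSON request body, or a fallback label."""
    try:
        payload = json.loads(request_body)
    except (ValueError, UnicodeDecodeError):
        return default  # body is not valid JSON
    if isinstance(payload, dict):
        model = payload.get("model")
        if isinstance(model, str) and model:
            return model
    return default  # field missing, empty, or not a string
```

Falling back to a fixed label rather than raising means a malformed request can still be counted, at the cost of grouping such requests under one series.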

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces valuable per-model Prometheus metrics for token usage and errors, which will significantly improve observability. The implementation is well-structured. I've identified a minor bug in the token counting logic that could lead to undercounting when a token count is zero and have provided a suggestion to fix it. Overall, this is a great addition.
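The diff the review refers to is not reproduced on this page, so the following is only a hypothetical sketch of how a zero token count can cause undercounting: a combined truthiness check treats `0` the same as a missing value and discards both counts. The function names are invented for illustration.

```python
from typing import Tuple


def count_tokens_truthy(usage: dict) -> Tuple[int, int]:
    """Buggy variant: a zero in either field discards both counts."""
    prompt = usage.get("prompt_tokens")
    completion = usage.get("completion_tokens")
    if prompt and completion:  # 0 is falsy, so (12, 0) is dropped entirely
        return prompt, completion
    return 0, 0


def count_tokens_explicit(usage: dict) -> Tuple[int, int]:
    """Fixed variant: only a missing (None) field is treated as absent."""
    prompt = usage.get("prompt_tokens")
    completion = usage.get("completion_tokens")
    return (
        prompt if prompt is not None else 0,
        completion if completion is not None else 0,
    )
```

With `{"prompt_tokens": 12, "completion_tokens": 0}`, the first variant records nothing while the second still records the 12 prompt tokens.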

src/vllm_router/services/request_service/request.py

Add model-level Prometheus metrics to enable granular observability:
- vllm:input_tokens_total counter with [server, model] labels
- vllm:output_tokens_total counter with [server, model] labels
- vllm:request_errors_total counter with [server, model, error_type] labels

Token tracking extracts usage data from non-streaming responses.
Error tracking captures exception types for debugging.

Part of vllm-project#699

Signed-off-by: ardecode <desaiarijit@gmail.com>

ardecode · 2026-01-31T16:30:16Z

@zerofishnoodles ready for review!


ardecode · 2026-02-05T00:24:00Z

Hi @ruizhang0101 ready for review!

ruizhang0101

LGTM :)))

gemini-code-assist bot reviewed Jan 31, 2026

src/vllm_router/services/request_service/request.py (outdated, resolved)

ardecode force-pushed the feature/prometheus-model-metrics branch from 5f0a68b to 57321a4 · January 31, 2026 16:26

Update src/vllm_router/services/request_service/request.py

1a7b895

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Arijit Desai <desaiarijit@gmail.com>

ardecode force-pushed the feature/prometheus-model-metrics branch from d21b316 to 1a7b895 · January 31, 2026 17:20

ruizhang0101 approved these changes Feb 5, 2026


ruizhang0101 added 6 commits February 5, 2026 10:28

Merge branch 'main' into feature/prometheus-model-metrics

6de1c2e

Merge branch 'main' into feature/prometheus-model-metrics

76c473f

Merge branch 'main' into feature/prometheus-model-metrics

0914231

Merge branch 'main' into feature/prometheus-model-metrics

138572e

Merge branch 'main' into feature/prometheus-model-metrics

91cee7c

Merge branch 'main' into feature/prometheus-model-metrics

7dbb516

ruizhang0101 merged commit 5c93f5c into vllm-project:main Feb 10, 2026
14 checks passed

ruizhang0101 merged 8 commits into vllm-project:main from ardecode:feature/prometheus-model-metrics
