Conversation

@adrian-cowham
Contributor

This allows inference to preserve existing behavior when enforcing per-model rate limits/quotas. That is, enforcement is done at connection time.
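For illustration only, here is a minimal sketch of what sending the model at connection time could look like, assuming the model name travels as a `model` query parameter on the connection URL (suggested by the branch name, not confirmed here). The helper name `with_model_param`, the host `example.invalid`, and the model strings are all hypothetical and are not taken from this PR's diff.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_model_param(base_url: str, model: str) -> str:
    """Return base_url with a `model` query parameter added or replaced.

    Hypothetical helper: the parameter name `model` is an assumption.
    """
    parts = urlsplit(base_url)
    # Keep every existing parameter except a stale `model` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "model"
    ]
    query.append(("model", model))
    # urlencode percent-encodes reserved characters such as "/" in model names.
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder endpoint and model name, for illustration only.
url = with_model_param("wss://example.invalid/v1/stt", "provider/model-a")
```

With the model visible in the URL, a server could reject an over-quota request during the handshake rather than after the stream has started, which is the behavior the description says is preserved.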

…ng. This allows inference to preserve existing behavior when enforcing per-model rate limits/quotas. That is, enforcement is done at connection time.
@chenghao-mou chenghao-mou requested a review from a team January 6, 2026 18:25
@chenghao-mou
Member

chenghao-mou commented Jan 6, 2026

/test-stt

I had to try this command. It didn't work 🤷

But changes look good to me. QQ about quota: for fallbacks, the quota/limit is enforced inside Inference, right?

@chenghao-mou chenghao-mou merged commit f6946db into main Jan 7, 2026
17 of 18 checks passed
@chenghao-mou chenghao-mou deleted the ac/inference-model-query-param branch January 7, 2026 19:53
3 participants