@athitten athitten commented Feb 7, 2026

Adds `inference_max_seq_len` to the Ray mbridge deployment path. This option was previously exposed only in the PyTriton path. It needs to be settable when deploying for eval benchmarks like HumanEval, which use a large `max_tokens` value.

Signed-off-by: Abhishree <abhishreetm@gmail.com>
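The wiring is roughly the following (a minimal sketch with hypothetical names; the actual NeMo deploy script's flag spelling, defaults, and deployment call signature may differ): a CLI flag is parsed and forwarded into the Ray deployment call, mirroring what the PyTriton path already does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical deploy-script parser; the flag name mirrors the
    # option already exposed on the PyTriton path.
    parser = argparse.ArgumentParser(description="Deploy a model via Ray (sketch)")
    parser.add_argument(
        "--inference_max_seq_len",
        type=int,
        default=4096,  # assumed default, not taken from the PR
        help="Max total sequence length (prompt + generated tokens) the "
             "deployed model accepts; raise it for benchmarks like HumanEval "
             "that request a large max_tokens.",
    )
    return parser


def deploy_kwargs(args: argparse.Namespace) -> dict:
    # Forward the parsed flag into the (hypothetical) Ray mbridge
    # deployment call's keyword arguments.
    return {"inference_max_seq_len": args.inference_max_seq_len}
```

Without such a flag, the Ray path falls back to whatever sequence-length limit is hard-wired in the deployment defaults, which is what caused failures on large-`max_tokens` benchmarks.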
copy-pr-bot bot commented Feb 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@athitten athitten added r0.4.0 Cherry-pick PR to r0.4.0 release branch and removed deploy LLM scripts labels Feb 7, 2026

athitten commented Feb 7, 2026

/ok to test 8668e30

@oyilmaz-nvidia oyilmaz-nvidia merged commit 0997912 into main Feb 9, 2026
26 checks passed
@oyilmaz-nvidia oyilmaz-nvidia deleted the athitten/inf_max_seqlen_ray branch February 9, 2026 21:08
ko3n1g pushed a commit that referenced this pull request Feb 9, 2026
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>