
Conversation

@seokhyunan
Contributor

@seokhyunan seokhyunan commented Dec 14, 2025

Purpose

Summary

  • Fixes chat postprocessing to drop empty assistant tool_calls lists.
  • Ensures chat templates correctly identify these messages as text responses rather than tool calls, preventing assistant content from being omitted.
  • Leaves non-empty tool calls unchanged while continuing to normalize their arguments.

Problem

  • When using gpt-oss via the vllm serve Chat API, model_response.choices[0].message.model_dump(exclude_none=True) includes tool_calls=[].
  • If this empty list is passed back into the next payload's messages, the chat template incorrectly routes logic to the tool-call branch, so the assistant's text content is dropped/ignored (see the Test Result section below).
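Why the empty list survives serialization: `exclude_none=True` only drops keys whose value is `None`, and `[]` is not `None`. A minimal stand-alone illustration with plain dicts (not the actual OpenAI client types):

```python
# Illustration only: mimic model_dump(exclude_none=True) on a plain dict.
# exclude_none drops keys whose value is None, but [] is not None,
# so tool_calls=[] survives and gets echoed back in the next request.
message = {"role": "assistant", "content": "2", "tool_calls": [], "refusal": None}

dumped = {k: v for k, v in message.items() if v is not None}

print(dumped)
# → {'role': 'assistant', 'content': '2', 'tool_calls': []}
```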

Fix

  • Modified _postprocess_messages in vllm/entrypoints/chat_utils.py to remove empty assistant tool_calls before argument normalization.
  • This ensures the chat template treats the message as standard assistant content, while valid tool calls still undergo argument parsing/normalization.
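The core of the fix can be sketched as follows. This is a simplified stand-in, not the actual vLLM code: the real `_postprocess_messages` in `vllm/entrypoints/chat_utils.py` operates on vLLM's message types and also normalizes tool-call arguments.

```python
# Hedged sketch of the fix: remove 'tool_calls': [] from assistant
# messages so chat templates treat them as plain text responses.
def drop_empty_tool_calls(messages: list[dict]) -> list[dict]:
    for message in messages:
        if message.get("role") == "assistant" and message.get("tool_calls") == []:
            # An empty list would route the template into the tool-call
            # branch and silently drop the assistant's text content.
            del message["tool_calls"]
        # Non-empty tool_calls are left untouched (the real code goes on
        # to parse/normalize their arguments).
    return messages

messages = [
    {"role": "user", "content": "Calculate 1+1"},
    {"role": "assistant", "content": "2", "tool_calls": []},
]
drop_empty_tool_calls(messages)
print(messages[1])
# → {'role': 'assistant', 'content': '2'}
```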

Test Plan

Test code

from openai import OpenAI
import json
import urllib.request

MODEL_ID = "openai/gpt-oss-20b"
client = OpenAI(base_url="http://localhost:8000/v1", api_key=MODEL_ID, timeout=1800)
DETOK_BASE = "http://localhost:8000/detokenize"

def vllm_openai_tokenizer_request(payload):
    url = DETOK_BASE
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {MODEL_ID}"
    }
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

def vllm_openai_detokenize_token_ids(token_ids):
    payload = {
        "model": MODEL_ID,
        "tokens": token_ids,
    }
    response = vllm_openai_tokenizer_request(payload)
    return response["prompt"]

messages_without_empty_tool_calls = [
    {'role': 'user', 'content': 'Calculate 1+1'},
    {'content': '2', 'role': 'assistant', 'reasoning': 'The user asks: "Calculate 1+1". The answer is 2.'},
    {'role': 'user', 'content': 'What did I ask you to do previously?'},
]

messages_with_empty_tool_calls = [
    {'role': 'user', 'content': 'Calculate 1+1'},
    {'tool_calls': [], 'content': '2', 'role': 'assistant', 'reasoning': 'The user asks: "Calculate 1+1". The answer is 2.'},
    {'role': 'user', 'content': 'What did I ask you to do previously?'},
]

payload_with_empty_tool_calls = {
    "model": MODEL_ID,
    "messages": messages_with_empty_tool_calls,
    "max_tokens": 512,
    "temperature": 0.2,
    "extra_body": {"return_token_ids": True},
}

payload_without_empty_tool_calls = {
    "model": MODEL_ID,
    "messages": messages_without_empty_tool_calls,
    "max_tokens": 512,
    "temperature": 0.2,
    "extra_body": {"return_token_ids": True},
}

response = client.chat.completions.create(**payload_without_empty_tool_calls)
prompt_tokens = response.prompt_token_ids
prompt_without_empty_tool_calls = vllm_openai_detokenize_token_ids(prompt_tokens)
print("Detokenized prompt without empty tool calls:")
print(prompt_without_empty_tool_calls)

response = client.chat.completions.create(**payload_with_empty_tool_calls)
prompt_tokens = response.prompt_token_ids
prompt_with_empty_tool_calls = vllm_openai_detokenize_token_ids(prompt_tokens)
print("\nDetokenized prompt with empty tool calls:")
print(prompt_with_empty_tool_calls)

print("\nAre the prompts identical?", prompt_without_empty_tool_calls == prompt_with_empty_tool_calls)

vllm serve command

vllm serve \
  --model "openai/gpt-oss-20b" \
  --api-key "openai/gpt-oss-20b" \
  --gpu-memory-utilization 0.8 \
  --reasoning-parser openai_gptoss \
  --tool-call-parser openai \
  --enable-auto-tool-choice

Test Result

Before fix

Detokenized prompt without empty tool calls:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-14

Reasoning: medium

# Valid channels: analysis, final. Channel must be included for every message.<|end|><|start|>developer<|message|><|end|><|start|>user<|message|>Calculate 1+1<|end|><|start|>assistant<|message|>2<|end|><|start|>user<|message|>What did I ask you to do previously?<|end|><|start|>assistant

Detokenized prompt with empty tool calls:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-14

Reasoning: medium

# Valid channels: analysis, final. Channel must be included for every message.<|end|><|start|>developer<|message|><|end|><|start|>user<|message|>Calculate 1+1<|end|><|start|>user<|message|>What did I ask you to do previously?<|end|><|start|>assistant

Are the prompts identical? False

After fix

Detokenized prompt without empty tool calls:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-14

Reasoning: medium

# Valid channels: analysis, final. Channel must be included for every message.<|end|><|start|>developer<|message|><|end|><|start|>user<|message|>Calculate 1+1<|end|><|start|>assistant<|channel|>final<|message|>2<|end|><|start|>user<|message|>What did I ask you to do previously?<|end|><|start|>assistant

Detokenized prompt with empty tool calls:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-14

Reasoning: medium

# Valid channels: analysis, final. Channel must be included for every message.<|end|><|start|>developer<|message|><|end|><|start|>user<|message|>Calculate 1+1<|end|><|start|>assistant<|channel|>final<|message|>2<|end|><|start|>user<|message|>What did I ask you to do previously?<|end|><|start|>assistant

Are the prompts identical? True


…plates

Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small but essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@seokhyunan seokhyunan changed the title [Bugfix] Drop empty tool_calls lists to keep assistant replies in templates [Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template Dec 14, 2025
@mergify mergify bot added the frontend label Dec 14, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a bugfix to correctly handle empty tool_calls lists in assistant messages. The change in _postprocess_messages prevents chat templates from misinterpreting these messages as tool calls, which previously caused the assistant's text content to be dropped. The implementation is correct and robust, safely removing the empty tool_calls list while leaving non-empty ones unaffected. The provided test plan thoroughly demonstrates the issue and validates the fix. The change is well-targeted and improves the reliability of chat processing.

Collaborator

@chaunceyjiang chaunceyjiang left a comment


Thanks~

@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 15, 2025
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) December 15, 2025 02:05
@chaunceyjiang chaunceyjiang merged commit b337647 into vllm-project:main Dec 15, 2025
49 checks passed
@seokhyunan seokhyunan deleted the fix/chat-empty-tool-calls branch December 15, 2025 10:05
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Dec 15, 2025
…t template (vllm-project#30648)

Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
joa-stdn pushed a commit to joa-stdn/vllm that referenced this pull request Dec 15, 2025
…t template (vllm-project#30648)

Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
teddygood pushed a commit to teddygood/vllm that referenced this pull request Dec 16, 2025
…t template (vllm-project#30648)

Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
