
Conversation

@seokhyunan
Contributor

@seokhyunan seokhyunan commented Dec 14, 2025

Purpose

Summary

  • Fixes chat postprocessing to drop empty assistant tool_calls lists.
  • Ensures chat templates correctly identify these messages as text responses rather than tool calls, preventing assistant content from being omitted.
  • Leaves non-empty tool calls unchanged while continuing to normalize their arguments.

Problem

  • When using gpt-oss via the vllm serve Chat API, model_response.choices[0].message.model_dump(exclude_none=True) includes tool_calls=[].
  • If this empty list is passed back into the next payload's messages, the chat template incorrectly routes logic to the tool-call branch, so the assistant's text content is dropped/ignored (see the Test Result section below).
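Why the empty list survives serialization: `exclude_none=True` only drops keys whose value is `None`, and `[]` is not `None`. A minimal stand-alone illustration with plain dicts (not the actual OpenAI client types):

```python
# Illustration only: mimic model_dump(exclude_none=True) on a plain dict.
# exclude_none drops keys whose value is None, but [] is not None,
# so tool_calls=[] survives and gets echoed back in the next request.
message = {"role": "assistant", "content": "2", "tool_calls": [], "refusal": None}

dumped = {k: v for k, v in message.items() if v is not None}

print(dumped)
# → {'role': 'assistant', 'content': '2', 'tool_calls': []}
```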

Fix

  • Modified _postprocess_messages in vllm/entrypoints/chat_utils.py to remove empty assistant tool_calls before argument normalization.
  • This ensures the chat template treats the message as standard assistant content, while valid tool calls still undergo argument parsing/normalization.
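The core of the fix can be sketched as follows. This is a simplified stand-in, not the actual vLLM code: the real `_postprocess_messages` in `vllm/entrypoints/chat_utils.py` operates on vLLM's message types and also normalizes tool-call arguments.

```python
# Hedged sketch of the fix: remove 'tool_calls': [] from assistant
# messages so chat templates treat them as plain text responses.
def drop_empty_tool_calls(messages: list[dict]) -> list[dict]:
    for message in messages:
        if message.get("role") == "assistant" and message.get("tool_calls") == []:
            # An empty list would route the template into the tool-call
            # branch and silently drop the assistant's text content.
            del message["tool_calls"]
        # Non-empty tool_calls are left untouched (the real code goes on
        # to parse/normalize their arguments).
    return messages

messages = [
    {"role": "user", "content": "Calculate 1+1"},
    {"role": "assistant", "content": "2", "tool_calls": []},
]
drop_empty_tool_calls(messages)
print(messages[1])
# → {'role': 'assistant', 'content': '2'}
```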

Test Plan

Test code

from openai import OpenAI
import json
import urllib.request

MODEL_ID = "openai/gpt-oss-20b"
client = OpenAI(base_url="http://localhost:8000/v1", api_key=MODEL_ID, timeout=1800)
DETOK_BASE = "http://localhost:8000/detokenize"

def vllm_openai_tokenizer_request(payload):
    url = DETOK_BASE
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {MODEL_ID}"
    }
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

def vllm_openai_detokenize_token_ids(token_ids):
    payload = {
        "model": MODEL_ID,
        "tokens": token_ids,
    }
    response = vllm_openai_tokenizer_request(payload)
    return response["prompt"]

messages_without_empty_tool_calls = [
    {'role': 'user', 'content': 'Calculate 1+1'},
    {'content': '2', 'role': 'assistant', 'reasoning': 'The user asks: "Calculate 1+1". The answer is 2.'},
    {'role': 'user', 'content': 'What did I ask you to do previously?'},
]

messages_with_empty_tool_calls = [
    {'role': 'user', 'content': 'Calculate 1+1'},
    {'tool_calls': [], 'content': '2', 'role': 'assistant', 'reasoning': 'The user asks: "Calculate 1+1". The answer is 2.'},
    {'role': 'user', 'content': 'What did I ask you to do previously?'},
]

payload_with_empty_tool_calls = {
    "model": MODEL_ID,
    "messages": messages_with_empty_tool_calls,
    "max_tokens": 512,
    "temperature": 0.2,
    "extra_body": {"return_token_ids": True},
}

payload_without_empty_tool_calls = {
    "model": MODEL_ID,
    "messages": messages_without_empty_tool_calls,
    "max_tokens": 512,
    "temperature": 0.2,
    "extra_body": {"return_token_ids": True},
}

response = client.chat.completions.create(**payload_without_empty_tool_calls)
prompt_tokens = response.prompt_token_ids
prompt_without_empty_tool_calls = vllm_openai_detokenize_token_ids(prompt_tokens)
print("Detokenized prompt without empty tool calls:")
print(prompt_without_empty_tool_calls)

response = client.chat.completions.create(**payload_with_empty_tool_calls)
prompt_tokens = response.prompt_token_ids
prompt_with_empty_tool_calls = vllm_openai_detokenize_token_ids(prompt_tokens)
print("\nDetokenized prompt with empty tool calls:")
print(prompt_with_empty_tool_calls)

print("\nAre the prompts identical?", prompt_without_empty_tool_calls == prompt_with_empty_tool_calls)

vllm serve command

vllm serve \
  --model "openai/gpt-oss-20b" \
  --api-key "openai/gpt-oss-20b" \
  --gpu-memory-utilization 0.8 \
  --reasoning-parser openai_gptoss \
  --tool-call-parser openai \
  --enable-auto-tool-choice

Test Result

Before fix

Detokenized prompt without empty tool calls:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-14

Reasoning: medium

# Valid channels: analysis, final. Channel must be included for every message.<|end|><|start|>developer<|message|><|end|><|start|>user<|message|>Calculate 1+1<|end|><|start|>assistant<|message|>2<|end|><|start|>user<|message|>What did I ask you to do previously?<|end|><|start|>assistant

Detokenized prompt with empty tool calls:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-14

Reasoning: medium

# Valid channels: analysis, final. Channel must be included for every message.<|end|><|start|>developer<|message|><|end|><|start|>user<|message|>Calculate 1+1<|end|><|start|>user<|message|>What did I ask you to do previously?<|end|><|start|>assistant

Are the prompts identical? False

After fix

Detokenized prompt without empty tool calls:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-14

Reasoning: medium

# Valid channels: analysis, final. Channel must be included for every message.<|end|><|start|>developer<|message|><|end|><|start|>user<|message|>Calculate 1+1<|end|><|start|>assistant<|channel|>final<|message|>2<|end|><|start|>user<|message|>What did I ask you to do previously?<|end|><|start|>assistant

Detokenized prompt with empty tool calls:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-12-14

Reasoning: medium

# Valid channels: analysis, final. Channel must be included for every message.<|end|><|start|>developer<|message|><|end|><|start|>user<|message|>Calculate 1+1<|end|><|start|>assistant<|channel|>final<|message|>2<|end|><|start|>user<|message|>What did I ask you to do previously?<|end|><|start|>assistant

Are the prompts identical? True


…plates

Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small but essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@seokhyunan seokhyunan changed the title [Bugfix] Drop empty tool_calls lists to keep assistant replies in templates [Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template Dec 14, 2025
@mergify mergify bot added the frontend label Dec 14, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a bugfix to correctly handle empty tool_calls lists in assistant messages. The change in _postprocess_messages prevents chat templates from misinterpreting these messages as tool calls, which previously caused the assistant's text content to be dropped. The implementation is correct and robust, safely removing the empty tool_calls list while leaving non-empty ones unaffected. The provided test plan thoroughly demonstrates the issue and validates the fix. The change is well-targeted and improves the reliability of chat processing.

Collaborator

@chaunceyjiang chaunceyjiang left a comment


Thanks~

@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 15, 2025
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) December 15, 2025 02:05
@chaunceyjiang chaunceyjiang merged commit b337647 into vllm-project:main Dec 15, 2025
49 checks passed
@seokhyunan seokhyunan deleted the fix/chat-empty-tool-calls branch December 15, 2025 10:05
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Dec 15, 2025
…t template (vllm-project#30648)

Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
joa-stdn pushed a commit to joa-stdn/vllm that referenced this pull request Dec 15, 2025
…t template (vllm-project#30648)

Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
teddygood pushed a commit to teddygood/vllm that referenced this pull request Dec 16, 2025
…t template (vllm-project#30648)

Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
