
[Fix] Use pre-tokenized prompts in VLLMwithChatTemplate to avoid modifying model input #2434

Open

suhmily10 wants to merge 1 commit into open-compass:main from suhmily10:fix/vllm-avoid-modifying-input-sequence

Conversation

@suhmily10

Summary

VLLMwithChatTemplate.generate() currently calls apply_chat_template(tokenize=False) to produce text, then manually strips the BOS token as a workaround for vLLM re-adding it during tokenization (add_special_tokens=True). This approach silently modifies the model's intended input sequence and can cause incorrect evaluation results for models whose chat templates deliberately include BOS.

This PR fixes the issue by:

  • Using apply_chat_template(tokenize=True) to obtain token IDs directly
  • Passing them as pre-tokenized prompts ({"prompt_token_ids": ...}) to vLLM

This preserves the exact token sequence the chat template produces, without any manual modification, and avoids the double-BOS problem entirely.
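
Conceptually, the change looks like the sketch below. This is an illustration rather than the exact OpenCompass code; `messages`, `tokenizer`, `llm`, and `sampling_params` stand in for values the class already has on hand.

```python
# Illustrative sketch (not the exact OpenCompass code).

# Before: render text, then strip BOS so vLLM's tokenizer does not add a second one.
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
if prompt_text.startswith(tokenizer.bos_token):
    prompt_text = prompt_text[len(tokenizer.bos_token):]  # silently alters the intended input
outputs = llm.generate(prompt_text, sampling_params)

# After: tokenize with the chat template and hand vLLM the token IDs directly,
# bypassing its internal tokenization (and the double-BOS problem).
prompt_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True)
outputs = llm.generate({"prompt_token_ids": prompt_ids}, sampling_params)
```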

Motivation

The previous workaround (lines 128-134) had several issues:

  1. Modifies model input — stripping BOS changes the token sequence the model was designed to receive
  2. Fragile — only handles text-level BOS prefix; fails if the tokenizer represents BOS differently
  3. Unnecessary — vLLM natively supports prompt_token_ids, which bypasses its internal tokenization entirely (a self-contained example follows this list)
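
For reference, a self-contained example of vLLM's pre-tokenized prompt interface. The model name and prompt are placeholders; only `apply_chat_template(tokenize=True)` and the `{"prompt_token_ids": ...}` prompt form are the pieces this PR relies on.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Placeholder model; any chat model whose template emits BOS behaves the same way.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id)

messages = [{"role": "user", "content": "Hello!"}]
# The chat template decides whether BOS is present; the result is not touched afterwards.
prompt_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True)

# Passing prompt_token_ids skips vLLM's own tokenization (and add_special_tokens) entirely.
outputs = llm.generate({"prompt_token_ids": prompt_ids},
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```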

Test plan

  • Verified the fix preserves the same token IDs that apply_chat_template produces (no extra/missing BOS); a minimal check is sketched after this list
  • Run evaluation with a model that has BOS in its chat template (e.g., LLaMA-based) and confirm results match expectations
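
A minimal version of that check (the model name is a placeholder for any LLaMA-family model whose chat template includes BOS):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
messages = [{"role": "user", "content": "Hello!"}]

# Token IDs the fixed code now passes to vLLM verbatim.
ids_direct = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True)

# What the old path effectively produced: rendered text re-tokenized by vLLM
# with add_special_tokens=True.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
ids_retokenized = tokenizer(text, add_special_tokens=True).input_ids

print("direct:      ", ids_direct[:3])
print("re-tokenized:", ids_retokenized[:3])  # expect a duplicated BOS at the front here
```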

Made with Cursor

[Fix] Use pre-tokenized prompts in VLLMwithChatTemplate to avoid modifying model input

The previous code called apply_chat_template(tokenize=False) to get text,
then stripped the BOS token as a workaround for vLLM re-adding it during
tokenization. This approach modifies the model's intended input sequence.

Instead, use apply_chat_template(tokenize=True) to obtain token IDs
directly, and pass them as pre-tokenized prompts (prompt_token_ids) to
vLLM. This preserves the exact token sequence the chat template produces
without any manual modification.

Made-with: Cursor