
Feature Request: Include "stream_options" for the streaming request in the OpenAI API #184

Open
Seikilos opened this issue Jan 22, 2025 · 0 comments

Currently, the OpenAI API implementation (0.3.8) does not return usage information (how many tokens were sent in the prompt and how many were generated) when streaming is enabled.

This was an issue for OpenAI as well for some time. They have since fixed it (https://community.openai.com/t/usage-stats-now-available-when-using-streaming-with-the-chat-completions-api-or-completions-api/738156) by adding another option to the streaming request:

"stream": true,
"stream_options": {
  "include_usage": true
}

This results in a usage block being returned with the final chunk, just before the DONE event.
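
For illustration, here is a minimal sketch of how a client could consume that usage block with the official `openai` Python SDK, if LM Studio supported `stream_options` (the local base URL and the model name are placeholders for my setup, not part of any API):

```python
from openai import OpenAI

# Assumed local OpenAI-compatible endpoint (e.g. LM Studio); adjust to your setup.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Regular chunks carry content deltas; per OpenAI's behavior, the final
    # usage-bearing chunk has an empty `choices` list and a populated `usage`.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage is not None:
        print(f"\nprompt tokens: {chunk.usage.prompt_tokens}, "
              f"completion tokens: {chunk.usage.completion_tokens}")
```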

Without this option, I am forced either to disable streaming, which is not optimal for slow models on my machine, or to give up on reliably knowing when I reach the context limit.

(To get the configured context limit, I cannot use the OpenAI API but have to resort to LM Studio's own REST API at /api/v0/models/. Not optimal, but still a way to get the total context size. Also, FYI: LM Studio's own chat completions endpoint, /api/v0/chat/completions, does not contain the token stats during streaming either. A sketch of the workaround follows below.)
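
For reference, a small sketch of that workaround against LM Studio's REST API (the port is my local default, and the `max_context_length` field name is an assumption based on the /api/v0/models response I see):

```python
import requests

# Assumed LM Studio REST endpoint; port may differ per setup.
resp = requests.get("http://localhost:1234/api/v0/models")
resp.raise_for_status()

for model in resp.json().get("data", []):
    # `max_context_length` is the field the v0 API appears to expose
    # for the total context size of a loaded model.
    print(model.get("id"), model.get("max_context_length"))
```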
