
Feature Request: Include "stream_options" for the streaming request in the OpenAI API #184

Open
Seikilos opened this issue Jan 22, 2025 · 0 comments

Currently, the OpenAI API implementation (0.3.8) does not return usage information (how many tokens were sent in the prompt and how many were generated) when streaming is enabled.

This was an issue for OpenAI as well for some time. They have since fixed it (https://community.openai.com/t/usage-stats-now-available-when-using-streaming-with-the-chat-completions-api-or-completions-api/738156) by adding another option to the streaming request:

"stream": true,
"stream_options": {
  "include_usage": true
}

This results in a usage block being returned with the final chunk, just before the DONE event.
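
For illustration, here is a minimal sketch of how a client could consume that usage block with the official `openai` Python SDK, if LM Studio supported `stream_options` (the local base URL and the model name are placeholders for my setup, not part of any API):

```python
from openai import OpenAI

# Assumed local OpenAI-compatible endpoint (e.g. LM Studio); adjust to your setup.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Regular chunks carry content deltas; per OpenAI's behavior, the final
    # usage-bearing chunk has an empty `choices` list and a populated `usage`.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage is not None:
        print(f"\nprompt tokens: {chunk.usage.prompt_tokens}, "
              f"completion tokens: {chunk.usage.completion_tokens}")
```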

Without this option, I am forced either to disable streaming, which is not optimal for slow models on my machine, or to give up on reliably knowing when I reach the context limit.

(To get the configured context limit, I cannot use the OpenAI API but have to resort to LM Studio's own REST API at /api/v0/models/. Not optimal, but still a way to get the total context size. Also, FYI: LM Studio's own chat completions endpoint, /api/v0/chat/completions, does not contain the token stats during streaming either. A sketch of the workaround follows below.)
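
For reference, a small sketch of that workaround against LM Studio's REST API (the port is my local default, and the `max_context_length` field name is an assumption based on the /api/v0/models response I see):

```python
import requests

# Assumed LM Studio REST endpoint; port may differ per setup.
resp = requests.get("http://localhost:1234/api/v0/models")
resp.raise_for_status()

for model in resp.json().get("data", []):
    # `max_context_length` is the field the v0 API appears to expose
    # for the total context size of a loaded model.
    print(model.get("id"), model.get("max_context_length"))
```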
