Support vLLM 0.10.2 #278
Conversation
Summary of Changes
Hello @pan-x-c, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on modernizing the project's infrastructure by upgrading its core dependencies, most notably moving to vLLM version 0.10.2. This upgrade necessitated adjustments to the Docker build environment, updates to other related Python packages like Ray, and modifications to the vLLM engine's configuration to accommodate API changes, such as the deprecation of the V0 engine and changes in logging parameters. These changes ensure continued compatibility and leverage the latest improvements from the vLLM ecosystem.
Highlights
- vLLM Version Upgrade: The `vllm` dependency has been updated to support versions up to `0.10.2`, enabling compatibility with the latest features and fixes.
- Dependency Updates: The `ray[default]` dependency has been upgraded to `>=2.48.0`, and the strict version constraint on `transformers` has been removed.
- Docker Environment Update: The Docker base image has been upgraded to `nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu22.04` to align with newer CUDA versions.
- vLLM Engine Configuration Alignment: The `use_v1` parameter, related to the V0 engine, has been removed from vLLM configurations and tests, reflecting its deprecation in vLLM 0.10.2.
- Dynamic Logging Configuration: Implemented conditional logic to handle changes in vLLM's logging parameter (`disable_log_requests` vs. `enable_log_requests`) based on the installed vLLM version (see the sketch below).
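To make that last point concrete, here is a minimal sketch of version-gated argument handling. The flag names `disable_log_requests` and `enable_log_requests` come from this PR's description; the helper name `build_engine_args`, the `0.10` cutoff, and the use of `importlib.metadata`/`packaging` are illustrative assumptions, not the project's actual implementation.

```python
from importlib.metadata import version

from packaging.version import Version
from vllm import AsyncEngineArgs  # assumes vLLM is installed


def build_engine_args(log_requests: bool = False, **kwargs) -> AsyncEngineArgs:
    """Build AsyncEngineArgs across vLLM versions that renamed the
    request-logging flag (disable_log_requests -> enable_log_requests)."""
    # NOTE: the exact cutoff release is an assumption; consult the vLLM
    # changelog for when enable_log_requests was introduced.
    if Version(version("vllm")) >= Version("0.10"):
        kwargs["enable_log_requests"] = log_requests
    else:
        kwargs["disable_log_requests"] = not log_requests
    return AsyncEngineArgs(**kwargs)
```

A caller would pass the remaining engine options through `kwargs`, e.g. `build_engine_args(model="Qwen/Qwen2.5-7B-Instruct", log_requests=False)` (the model name here is only a placeholder).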
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request updates dependencies to support vllm version 0.10.2, including ray and transformers, and updates the CUDA version in the Dockerfile accordingly. The code changes primarily focus on removing the now-obsolete use_v1 configuration and adapting to a parameter rename in vllm.AsyncEngineArgs. My review has identified a critical issue in trinity/common/models/vllm_model.py related to this adaptation. The current implementation for conditionally setting an engine argument is likely to cause a runtime AttributeError. I have provided a detailed explanation and a suggested refactoring to resolve this potential bug. The other changes appear correct and consistent with the goal of the pull request.
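One way to avoid that kind of AttributeError is to branch on which flag the installed `AsyncEngineArgs` actually exposes instead of on the version string. The sketch below is only illustrative: it assumes `AsyncEngineArgs` is a dataclass (as in recent vLLM releases), and it is not necessarily the refactoring suggested in the review.

```python
import dataclasses

from vllm import AsyncEngineArgs  # assumes vLLM is installed


def set_request_logging(engine_kwargs: dict, enable: bool) -> dict:
    """Set whichever request-logging flag this vLLM build accepts."""
    field_names = {f.name for f in dataclasses.fields(AsyncEngineArgs)}
    if "enable_log_requests" in field_names:      # newer vLLM
        engine_kwargs["enable_log_requests"] = enable
    elif "disable_log_requests" in field_names:   # older vLLM
        engine_kwargs["disable_log_requests"] = not enable
    return engine_kwargs
```

Checking the dataclass fields keeps the decision tied to the API that is actually present, so a patch release that shuffles flag names does not turn into a hard-to-diagnose runtime error.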
[Several CI runs were triggered with `/unittest-all` and `/unittest-module-common`; the GitHub Test Reporter by CTRF posted a result summary after each run. Earlier runs listed failed and skipped tests; the final `/unittest-module-common` report lists no failed tests.]
Description
As the title says
Checklist
Please check the following items before code is ready to be reviewed.