To clarify, my goal is to obtain actual computer vision (CV) pipeline results from the video, such as object detection, activity recognition, or other visual analytics, using the NVIDIA VSS APIs. Currently, I can get only a text result from the Summarization API, but I am looking for APIs that simultaneously return structured CV data (such as detected objects, activities, or annotated results) in JSON or a similar format.
When I tested NVIDIA VSS, I saw detected objects overlaid on the video frames, as shown in the screenshot below:
I am very interested in your VSS (Video Search and Summarization Agent) project. I rented eight A100 GPUs and followed your documentation for setup and testing.
When deploying VSS, I used the nvila-highres model (nvila-lite-15b-highres-lita).
Overall, it works well, but I have a few issues related to video analysis.
Video summarization works effectively, but in the Q&A section, the accuracy is low. For example, after summarizing my custom traffic video, when I ask, “Are there any collisions or accidents here?” the response only mentions 1-2 accidents, even though there are many in the video.
Regarding the VSS API, I cannot find APIs for alerts, audio summarization, or CV pipeline data. I reviewed the Swagger API after starting the VSS service and noticed that the alerts API only provides live-stream data. I want an alerts API that can deliver summarized results similar to what I see in your NVIDIA VSS demonstrations.
You previously mentioned that the CV pipeline API would be discussed in a future update. Could you please check the status of the alerts, audio summarization, and CV pipeline APIs again?
In summary, I would like to improve the accuracy of the Q&A, and gain access to APIs for video alerts, audio summarization, and CV pipeline results.
If anything is unclear, please let me know. I appreciate your assistance with these issues.
I am developing a custom project using the NVIDIA VSS API. However, I couldn’t find some APIs that are available in NVIDIA VSS, such as alerts, audio summarization, and CV pipeline APIs.
I am building my project on the FastAPI Swagger API.
I can access the Swagger API after deploying VSS.
Currently, the Swagger API does not include all of the service APIs that VSS provides.
Therefore, I need comprehensive API documentation that covers all functions provided by NVIDIA VSS.
Could you please provide or guide me on how to access all these APIs?
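For reference, here is how I check which endpoints my deployment actually exposes. This is only a minimal sketch: it assumes the VSS service still serves FastAPI's default schema route, `/openapi.json`, which a deployment may rename or disable. The helper names `list_endpoints`, `fetch_spec`, and `main` are my own and not part of VSS.

```python
import json
from urllib.request import urlopen

# HTTP verbs that can appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs found in an OpenAPI schema, sorted by path."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda e: (e[1], e[0]))


def fetch_spec(base_url: str) -> dict:
    """Download the schema, assuming FastAPI's default /openapi.json route."""
    url = f"{base_url.rstrip('/')}/openapi.json"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def main(base_url: str) -> None:
    """Print every endpoint the service at base_url advertises."""
    for method, path in list_endpoints(fetch_spec(base_url)):
        print(method, path)
```

Calling `main()` with the base URL of the running VSS service prints one line per advertised endpoint, which makes it easy to confirm whether alerts, audio, or CV routes are present in a given release.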
Sure. At present, the FastAPI Swagger APIs may not be fully developed yet. We will confirm this as soon as possible. If there are any updates, we will reply promptly.
We suggest that you create a separate topic for this issue and attach your video. We will give it a try on our side.
We will provide the alert-related APIs in the upcoming version first.
The CV pipeline and audio APIs will take more time to implement. It is not yet certain when they will be released.