How to get VSS API for CV Pipeline

Hello, NVIDIA VSS Support team.

I am trying to find the VSS API for the CV pipeline.

(VSS API Glossary — Video Search and Summarization Agent)

According to this doc, I can get the summarization API for video, but I can’t find an API for the CV pipeline.

When deploying VSS, I used the overrides.yaml file below.
overrides.txt (1.7 KB)

I successfully deployed NVIDIA VSS with the CV pipeline, but I can’t find a CV pipeline API in the docs.
Could you help with this?

Sincerely
Taopik H.

Could you please elaborate on what kind of CV pipeline API is needed? How do you plan to use these APIs, and what feature do you want to achieve? Thanks

To clarify, my goal is to obtain actual computer vision (CV) pipeline results, such as object detection, activity recognition, or other visual analytics, from the video using NVIDIA VSS APIs. Currently, I can only get text results from the Summarization API, but I am looking for APIs that can also return structured CV data (detected objects, activities, or annotated results) in JSON or a similar format at the same time.

When I tested NVIDIA VSS, I saw detected objects overlaid on the video frames, as shown in the screenshot below:

How can I get the same results through an API? I haven’t been able to find a suitable API in the NVIDIA VSS Swagger documentation.

In summary, I am looking for an API that can provide CV pipeline data (such as detected objects on video frames) during the summarization process.

Thanks for considering. Looking forward to your response.

Sincerely
Taopik H.

Thank you for your suggestion. We will discuss this new feature.

Hello Support Team,

I am very interested in your VSS (Video Search and Summarization Agent) project. I rented 8 * A100 GPUs and followed your documentation for setup and testing.
When deploying VSS, I used the nvila-highres model (nvila-lite-15b-highres-lita).
Overall, it works well, but I have a few issues related to video analysis.

  1. Video summarization works effectively, but in the Q&A section, the accuracy is low. For example, after summarizing my custom traffic video, when I ask, “Are there any collisions or accidents here?” the response only mentions 1-2 accidents, even though there are many in the video.

  2. Regarding the VSS API, I cannot find APIs for alerts, audio summarization, or CV pipeline data. I reviewed the Swagger API after starting the VSS service and noticed that the alerts API only covers live streams. I want an alerts API that can deliver results for summarized videos, similar to what I see in your NVIDIA VSS demonstrations.

You previously mentioned that the CV pipeline API would be discussed in a future update. Could you please check again the status of the alerts, audio summarization, and CV pipeline APIs?

In summary, I would like to improve the accuracy of the Q&A, and gain access to APIs for video alerts, audio summarization, and CV pipeline results.

If anything is unclear, please let me know. I appreciate your assistance with these issues.

Sincerely
Taopik H

Sure. Do you want us to provide all the APIs in the FastAPI Swagger API page or in our Python CLI client?

I am developing a custom project using the NVIDIA VSS API. However, I couldn’t find some APIs that are available in NVIDIA VSS, such as alerts, audio summarization, and CV pipeline APIs.

I am building my project against the FastAPI Swagger API.
I can access the Swagger API after deploying VSS.

Screenshot 2025-07-14 091106

Currently, the Swagger API doesn’t include all of the service APIs that VSS provides.
Therefore, I need comprehensive API documentation that covers all functions provided by NVIDIA VSS.

Could you please provide or guide me on how to access all these APIs?
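
For anyone wanting to check exactly which endpoints a given deployment exposes, here is a minimal sketch that lists them from the OpenAPI schema. FastAPI serves that schema at /openapi.json by default; whether the VSS service keeps that default path is an assumption, and the host and port in the usage comment are hypothetical placeholders.

```python
import json
from urllib.request import urlopen

# HTTP verbs that can appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(spec):
    """Return (METHOD, path, summary) tuples from an OpenAPI spec dict, sorted by path."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items may also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path, operation.get("summary", "")))
    return sorted(endpoints, key=lambda e: (e[1], e[0]))


def fetch_spec(base_url):
    """Download the schema, assuming the FastAPI default path /openapi.json."""
    with urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=10) as resp:
        return json.load(resp)


# Usage (hypothetical address, replace with your own VSS host and port):
#   for method, path, summary in list_endpoints(fetch_spec("http://localhost:8100")):
#       print(f"{method:7s} {path}  {summary}")
```

This only reports what the running service publishes, so it can confirm that an endpoint is missing but cannot reveal undocumented ones.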

Sure. At present, the FastAPI Swagger APIs may not be fully developed yet. We will confirm this as soon as possible. If there are any updates, we will reply promptly.

Thank you very much. Looking forward to your response.

Have you tested this with CV enabled? Do you also see this issue when CV pipeline is enabled?

Yes, I have tested with CV enabled, but the result is the same.

To deploy with CV enabled, I used the overrides.yaml below, following the doc.

nim-llm:
  env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: "0,1,2,3"
  - name: NIM_MAX_MODEL_LEN
    value: "128000"
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit


vss:
  applicationSpecs:
    vss-deployment:
      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsUser: 0
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: vila-1.5 # Or "openai-compat" or "custom" or "nvila"
          - name: MODEL_PATH
            value: "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
          - name: NVIDIA_VISIBLE_DEVICES
            value: "5,6,7"
          - name: INSTALL_PROPRIETARY_CODECS
            value: "true"
          - name: DISABLE_CV_PIPELINE
            value: "false"
          - name: GDINO_INFERENCE_INTERVAL
            value: "1"
          - name: NUM_CV_CHUNKS_PER_GPU
            value: "2"
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit



nemo-embedding:
  applicationSpecs:
    embedding-deployment:
      containers:
        embedding-container:
          env:
          - name: NVIDIA_VISIBLE_DEVICES
            value: '4'
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit


nemo-rerank:
  applicationSpecs:
    ranking-deployment:
      containers:
        ranking-container:
          env:
          - name: NVIDIA_VISIBLE_DEVICES
            value: '4'
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit

After summarizing, I can see the Set-Of-Marks (SOM) videos, but the Q&A accuracy is still low.
This is the problem.
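
A quick way to sanity-check an overrides file like the one above before deploying is to read the relevant values back out of it. The sketch below works on the parsed YAML as a plain dict (for example from PyYAML's yaml.safe_load, which is not imported here). The helper names are made up for illustration, and treating a missing DISABLE_CV_PIPELINE as "CV disabled" is an assumption rather than documented VSS behavior.

```python
def env_map(container):
    """Turn an env list of {name, value} entries into a name -> value dict."""
    return {e["name"]: str(e.get("value", "")) for e in container.get("env", [])}


def vss_env(overrides):
    """Env vars of the vss container, following the nesting used in the overrides above."""
    container = (
        overrides.get("vss", {})
        .get("applicationSpecs", {})
        .get("vss-deployment", {})
        .get("containers", {})
        .get("vss", {})
    )
    return env_map(container)


def cv_pipeline_enabled(overrides):
    # Assumption: an unset variable is treated as "CV disabled" in this check.
    return vss_env(overrides).get("DISABLE_CV_PIPELINE", "true").lower() == "false"


def gpu_ids(value):
    """Parse a NVIDIA_VISIBLE_DEVICES string such as "5,6,7" into a set of ints."""
    return {int(x) for x in value.split(",") if x.strip()}


def gpu_overlap(overrides):
    """GPU ids assigned to both nim-llm and the vss container (empty set if none)."""
    llm = env_map(overrides.get("nim-llm", {})).get("NVIDIA_VISIBLE_DEVICES", "")
    vss = vss_env(overrides).get("NVIDIA_VISIBLE_DEVICES", "")
    return gpu_ids(llm) & gpu_ids(vss)


# Usage, assuming PyYAML is installed:
#   import yaml
#   overrides = yaml.safe_load(open("overrides.yaml"))
#   print(cv_pipeline_enabled(overrides), gpu_overlap(overrides))
```

For the file in this post, the check would report CV enabled and no GPU shared between nim-llm ("0,1,2,3") and vss ("5,6,7"), so the configuration itself does not look like the cause of the low Q&A accuracy.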

We suggest that you create a separate topic for this issue and attach your video. We will give it a try on our side.

We will provide the alert-related APIs first, in the upcoming version.
The CV pipeline and audio APIs will take more time to implement, and it is not yet certain when they will be released.