Trying to run the microservices again on the example-hybrid-rag project. I load the llama-3.1-8b-instruct container without any issue, using the API key that I created on the build.nvidia.com site, and I also put that key in the NVCF secrets area. Once the container is running, I can reach its IP address with the curl test, which works perfectly, and I can also reach it with the NIM ChatUI.html. However, when I try to reach it from the project, using either the remote or the local option, I get errors. I know the NIM is running correctly because both the curl test and the NIM ChatUI work. This used to work great, but now I have issues. I've reinstalled Ubuntu 22.04. The system has 196 GB of memory and two A6000 GPUs. This is the error that I get:

*** ERR: Unable to process query. ***
Message: Response ended prematurely
I should also mention that I am running all of this locally. The NIM and the AI Workbench project are on the same machine.
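For reference, the curl test I mention is roughly the following (a minimal sketch, assuming the NIM exposes its usual OpenAI-compatible endpoint on port 8000; the host address and model name are placeholders for my setup):

```bash
# Basic inference check against the running NIM.
# Replace <nim-host-ip> with the address of the machine running the NIM
# (localhost in my case, since everything is on one machine).
curl -s "http://<nim-host-ip>:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32
      }'
```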
More error information. This is from a fresh start. It looks like a Docker issue as well. However, I am able to get Local to work with ungated model loads.
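For what it's worth, this is roughly how I'm gathering that information on the Docker side (just a sketch; the container name is whatever `docker ps` reports for the NIM on your system):

```bash
# List running containers to confirm the NIM container is up.
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"

# Tail the NIM container logs for startup or inference errors
# (replace <nim-container-name> with the name shown by docker ps).
docker logs --tail 100 <nim-container-name>
```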
Hi - just seeing this.
I believe that there’ve been difficulties in running the NIM locally on a Windows system.
Best thing would be to follow the directions to run the NIM on a remote.
We may consider refactoring the project to run the NIM locally via Docker Compose instead, and I think this would resolve the issue.
I just tried to run this from a remote system and I am getting proxy errors again. I pinged the address of the system that has the NIM running and I do in fact get a reply. I even used the NIM ChatUI and that works as well. Just a reminder that this used to work flawlessly.
Thanks for letting us know. Yes, I have been able to reproduce the issue and have pushed a fix. See here.
Also, when using the NIM microservice option locally, be sure to add the right configurations to the project for your particular host environment. See the readme here.
Could you kindly provide a more detailed example of how to configure the following on a Windows 11 machine (WSL2):
- Add the following under Environment > Mounts:
  - A Docker Socket Mount: This is a mount for the docker socket for the container to properly interact with the host Docker Engine.
    - Type: Host Mount
    - Target: /var/host-run
    - Source: /var/run
    - Description: Docker socket Host Mount
  - A Filesystem Mount: This is a mount to properly run and manage your LOCAL_NIM_HOME on the host from inside the project container for generating the model repo.
    - Type: Host Mount
    - Target: /mnt/host-home
    - Source: (Your LOCAL_NIM_HOME location), for example /mnt/c/Users/<my-user> for Windows or /home/<my-user> for Linux
    - Description: Host mount for LOCAL_NIM_HOME
Docker socket mount: The docker socket host mount is our early solution for mounting the Docker service directly into the main project container, so that from inside the project container you can spin up your own 'sidecar' NIM container for inference that runs locally on your current system alongside the project container.
The socket is typically located on your host at unix:///var/run/docker.sock; when prompted by AI Workbench, mount it into the main project container, where it appears at unix:///var/host-run/docker.sock per the configuration above.
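As a quick sanity check (just a sketch based on the mount configuration above, not a required step), you can verify from inside the main project container that the socket is reachable:

```bash
# Inside the main project container: confirm the mounted socket exists...
ls -l /var/host-run/docker.sock

# ...and that the Docker CLI can reach the host engine through it.
docker -H unix:///var/host-run/docker.sock ps
```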
Note: We have since started supporting Docker Compose, which may be a more elegant solution for multi-container projects than this earlier workaround.
Filesystem Mount: Every NIM container is required to have a $LOCAL_NIM_CACHE location attached to it. This is the location on the host machine that will contain and cache the downloaded model weights, etc., for the NIM container that gets pulled and run. It can be any location on your host system with write permissions enabled; for example, you can specify /tmp for this mount when prompted. Please see the NIM documentation for details.
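To make that concrete, here is a rough sketch of preparing a cache location and launching the NIM by hand; the image tag, port, and the /opt/nim/.cache path inside the container follow the typical example in the NIM documentation and may differ for your setup:

```bash
# Create a writable cache location on the host for the downloaded model weights.
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod -R a+w "$LOCAL_NIM_CACHE"

# Run the NIM container with the cache mounted (adjust the image tag as needed).
docker run --rm --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u "$(id -u)" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```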