Trying to run the microservices again on the example-hybrid-rag project. I load the llama-3.1-8b-instruct container without any issue, using the API key that I created on the build.nvidia.com site, and I also put that key in the NVCF secrets area. Once the container is running, I can reach its IP address with the curl test, which works perfectly, and I can also reach it with the NIM ChatUI.html. However, when I try to reach it from the project, using either the remote or the local option, I get errors. I know the NIM is running correctly because both the curl test and the NIM ChatUI work. This used to work great, but now I have issues. I've reinstalled Ubuntu 22.04. The system has 196 GB of memory and two A6000 GPUs. This is the error that I get:

*** ERR: Unable to process query. ***
Message: Response ended prematurely
I should also mention that I am running all of this locally. The NIM and the AI Workbench project are on the same machine.
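For reference, the curl test I mention is roughly the following (a minimal sketch, assuming the NIM exposes its usual OpenAI-compatible endpoint on port 8000; the host address and model name are placeholders for my setup):

```bash
# Basic inference check against the running NIM.
# Replace <nim-host-ip> with the address of the machine running the NIM
# (localhost in my case, since everything is on one machine).
curl -s "http://<nim-host-ip>:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32
      }'
```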
More error information. This is from a fresh start. It looks like a Docker issue as well. However, I am able to get Local to work with ungated model loads.
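For what it's worth, this is roughly how I'm gathering that information on the Docker side (just a sketch; the container name is whatever `docker ps` reports for the NIM on your system):

```bash
# List running containers to confirm the NIM container is up.
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"

# Tail the NIM container logs for startup or inference errors
# (replace <nim-container-name> with the name shown by docker ps).
docker logs --tail 100 <nim-container-name>
```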
Hi - just seeing this.
I believe that there’ve been difficulties in running the NIM locally on a Windows system.
Best thing would be to follow the directions to run the NIM on a remote.
We may consider refactoring the project to run the NIM locally via Docker Compose instead, and I think this would resolve the issue.
I just tried to run this from a remote system and I am getting proxy errors again. I pinged the address of the system that has the NIM running and I do in fact get a reply. I even used the NIM ChatUI and that works as well. Just a reminder that this used to work flawlessly.
Thanks for letting us know. Yes, I have been able to reproduce the issue and have pushed a fix. See here.
Also, when using the NIM microservice option locally, be sure to add the right configurations to the project for your particular host environment. See the readme here.
Could you kindly provide a more detailed example of how to configure the following on a Windows 11 machine (WSL2):
- Add the following under Environment > Mounts:
  - A Docker Socket Mount: This is a mount for the docker socket for the container to properly interact with the host Docker Engine.
    - Type: Host Mount
    - Target: /var/host-run
    - Source: /var/run
    - Description: Docker socket Host Mount
  - A Filesystem Mount: This is a mount to properly run and manage your LOCAL_NIM_HOME on the host from inside the project container for generating the model repo.
    - Type: Host Mount
    - Target: /mnt/host-home
    - Source: (Your LOCAL_NIM_HOME location), for example /mnt/c/Users/<my-user> for Windows or /home/<my-user> for Linux
    - Description: Host mount for LOCAL_NIM_HOME
Docker socket mount: The docker socket host mount is our early solution for mounting the Docker service directly into the main project container, so that from inside the project container you can spin up your own 'sidecar' NIM container for inference that runs locally on your current system alongside the project container.
The socket is typically located on your host at unix:///var/run/docker.sock; when prompted by AI Workbench, mount it into the main project container, where it appears at unix:///var/host-run/docker.sock per the configuration above.
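As a quick sanity check (just a sketch based on the mount configuration above, not a required step), you can verify from inside the main project container that the socket is reachable:

```bash
# Inside the main project container: confirm the mounted socket exists...
ls -l /var/host-run/docker.sock

# ...and that the Docker CLI can reach the host engine through it.
docker -H unix:///var/host-run/docker.sock ps
```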
Note: We have since started supporting Docker Compose, which may be a more elegant solution for multi-container projects than this earlier workaround.
Filesystem Mount: Every NIM container is required to have a $LOCAL_NIM_CACHE location attached to it. This is the location on the host machine that will contain and cache the downloaded model weights, etc., for the NIM container that gets pulled and run. It can be any location on your host system with write permissions enabled; for example, you can specify /tmp for this mount when prompted. Please see the NIM documentation for details.
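To make that concrete, here is a rough sketch of preparing a cache location and launching the NIM by hand; the image tag, port, and the /opt/nim/.cache path inside the container follow the typical example in the NIM documentation and may differ for your setup:

```bash
# Create a writable cache location on the host for the downloaded model weights.
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod -R a+w "$LOCAL_NIM_CACHE"

# Run the NIM container with the cache mounted (adjust the image tag as needed).
docker run --rm --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u "$(id -u)" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```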