We recently purchased the Jetson Orin Nano (Developer Kit) and installed JetPack 6. After launching it, we proceeded with the initial tutorials, such as this text-to-text tutorial: text-generation-webui - NVIDIA Jetson AI Lab.
I followed the tutorial instructions precisely. However, I encountered an issue where the chatbot responds extremely slowly:
Model: llama-2-7b-chat.Q4_0.gguf
Model loader: llama.cpp
n_batch: 512
As the tutorial suggested, I set n-gpu-layers to 128. When I did this, the Jetson froze, and I had to restart it by unplugging it and plugging it back in.
With n-gpu-layers: 0, the chatbot at least works, but it is still extremely slow, and the Jetson becomes very sluggish overall.
Am I missing something in my setup? It seems like the model isn’t running on the GPU, given how slow it is.
Hi @vmikala, setting n-gpu-layers=0 disables CUDA, so the model is running on the CPU only. See here for steps on mounting swap and freeing up more memory:
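For reference, the usual way to add a disk-backed swapfile on Jetson looks roughly like the sketch below (the 8G size and /swapfile path are placeholders — adjust to your storage; disabling zram first is optional):

```shell
# Optionally disable zram so memory isn't spent on compressed swap
sudo systemctl disable nvzramconfig

# Create and enable an 8 GB swapfile (size/path are examples)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make it persistent across reboots
echo "/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab

# Verify
swapon --show
free -h
```

A reboot after disabling zram makes the change take effect; `free -h` should then show the new swap total.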
Also, llama.cpp is not the most optimized backend, so I wouldn't get too hung up on it before moving on. You might want to give Ollama a shot next (there is a page for that on Jetson AI Lab too); while it also uses llama.cpp underneath, Ollama is generally easier to use and works better out of the box.
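If you want to try Ollama, a minimal sketch is below — the install script is Ollama's official one, and the `llama2:7b-chat` tag is an assumption matching the same Llama 2 7B chat model you were using (check the Ollama library for current tags):

```shell
# Install Ollama (official install script)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and chat with the same model family interactively
ollama run llama2:7b-chat
```

Ollama handles GPU offload automatically, so there is no n-gpu-layers knob to tune by hand.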
There are smaller language models on this page that may be a better fit for the Nano too:
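As an example, running a smaller model through Ollama is one command — the model tags below are just illustrations of smaller options available in the Ollama library, not specific recommendations from that page:

```shell
# ~1B-4B parameter models leave far more headroom on an 8 GB Nano
ollama run tinyllama
# or
ollama run phi3:mini
```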