Jetson Orin Nano local small models perform insanely slow

vmikala · June 6, 2024, 12:14pm

Hi community!

We recently purchased the Jetson Orin Nano (Developer Kit) and installed JetPack 6. After launching it, we proceeded with the initial tutorials, such as this text-to-text tutorial: text-generation-webui - NVIDIA Jetson AI Lab.

I followed the tutorial instructions precisely. However, I encountered an issue where the chatbot responds extremely slowly:

Model: llama-2-7b-chat.Q4_0.gguf
Model loader: llama.cpp
n_batch: 512

As the tutorial suggested, I set n-gpu-layers to 128. When I did this, the Jetson froze, and I had to restart it by unplugging and plugging it back in.

With n-gpu-layers: 0, the chatbot at least works, but it is still extremely slow, and the Jetson becomes very sluggish overall.

Am I missing something in my setup? It seems like the model isn’t running on the GPU, given how slow it is.

Here is the terminal output for reference:

Terminal:

Output generated in 1211.71 seconds (0.02 tokens/s, 29 tokens, context 68, seed 806630856)
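For scale, the reported rate can be reproduced from the two numbers in that log line. This is only a minimal sketch of the arithmetic; `tokens_per_second` is a hypothetical helper name, not part of text-generation-webui.

```bash
# Throughput = generated tokens / elapsed seconds.
# tokens_per_second is a hypothetical helper, not a text-generation-webui command.
tokens_per_second() {
    awk -v tokens="$1" -v seconds="$2" 'BEGIN { printf "%.2f\n", tokens / seconds }'
}

# Values taken from the log line above: 29 tokens in 1211.71 seconds.
tokens_per_second 29 1211.71   # prints 0.02
```

That works out to roughly 42 seconds per generated token.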

I would appreciate any advice or insights on how to resolve this issue. Thank you!

dusty_nv · June 6, 2024, 6:44pm

Hi @vmikala, setting n-gpu-layers=0 disables CUDA, so the model is running on the CPU only. See here for steps for mounting swap and freeing up more memory:

https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#mounting-swap

# System Setup

Install the latest version of JetPack 4 on Nano/TX1/TX2, JetPack 5 on Xavier, or JetPack 6 on Orin.  The following versions are supported:

* JetPack 4.6.1+ (>= L4T R32.7.1)
* JetPack 5.1+  (>= L4T R35.2.1)
* JetPack 6.0 DP (L4T R36.2.0)
> [!NOTE]  
> <sup>- Building on/for x86 platforms isn't supported at this time (one can typically install/run packages the upstream way there)</sup><br>
> <sup>- The below steps are optional for [pulling/running](/docs/run.md) existing container images from registry, but recommended for building containers locally.</sup>

## Clone the Repo

This will download and install the jetson-containers utilities:

```bash
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
```

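Because the CPU and GPU on Jetson share the same physical memory, a quick check of whether the model file fits in what is currently free can help explain a freeze like the one described above. The following is a rough sketch, not part of jetson-containers: `meminfo_kb` and `model_fits` are hypothetical helper names, the model path is whatever you pass in, and the comparison ignores the KV cache and other runtime overhead, so treat a "fits" result as optimistic.

```bash
#!/usr/bin/env bash

# Print one field of /proc/meminfo in kB, e.g. MemAvailable or SwapTotal.
# The optional second argument overrides the file path (handy for testing).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Succeed if a model of the given size (bytes) is smaller than the given
# amount of memory (kB). Assumption: KV cache and runtime overhead are ignored.
model_fits() {
    [ "$1" -lt "$(( $2 * 1024 ))" ]
}

# Hypothetical usage: pass the path to the .gguf file as the first argument.
model_path="${1:-}"
if [ -n "$model_path" ] && [ -f "$model_path" ]; then
    model_bytes=$(stat -c %s "$model_path")   # GNU stat: file size in bytes
    avail_kb=$(meminfo_kb MemAvailable)
    swap_kb=$(meminfo_kb SwapTotal)
    echo "model: ${model_bytes} bytes, available RAM: ${avail_kb} kB, swap: ${swap_kb} kB"
    if model_fits "$model_bytes" "$avail_kb"; then
        echo "The model file fits in currently available RAM."
    else
        echo "The model file is larger than available RAM; free memory or add swap first."
    fi
fi
```

If swap shows as 0 kB and the model does not fit, the swap and memory steps in the linked setup guide are the place to start.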

Also, llama.cpp is not the most optimized backend, so I wouldn’t get too hung up on it before moving on. You might want to give Ollama a shot next (there is a page for it on Jetson AI Lab too). It also uses llama.cpp underneath, but it is generally easier to use and works better out of the box.

There are smaller language models on this page that can be a better fit for Nano too:

system · June 20, 2024, 6:45pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
