Is there any updated TensorRT-LLM support for the NVIDIA AGX Orin?

It does not seem like TensorRT-LLM v0.12.0 supports Llama 3 models (I tried building engines for Llama 3.2-1B). Is it possible to build TensorRT-LLM from source to get that support? If so, has anyone successfully done so? If there is an updated TensorRT-LLM built specifically for the AGX Orin that supports newer models, I would love to know.
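For reference, the engine-build flow I was attempting follows the LLaMA example in the TensorRT-LLM repo; the paths below are placeholders for my local checkpoint and output directories, and the exact flags may differ between releases:

# Convert the Hugging Face checkpoint to TensorRT-LLM format (local paths are placeholders)
python3 examples/llama/convert_checkpoint.py \
    --model_dir ./Llama-3.2-1B \
    --output_dir ./llama-3.2-1b-ckpt \
    --dtype float16

# Build the engine from the converted checkpoint
trtllm-build \
    --checkpoint_dir ./llama-3.2-1b-ckpt \
    --output_dir ./llama-3.2-1b-engine \
    --gemm_plugin float16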

I think the answer, for now, is no, for the following reasons.

TensorRT-LLM/v0.12.0-jetson is the only version of trtllm that built the kernels for the jetson/tegra/aarch64 architecture.

TensorRT-LLM/v0.12.0-jetson support-matrix.md lists LLaMA/LLaMA 2/LLaMA 3/LLaMA 3.1


I attempted to compile TensorRT-LLM from the 0.21.0rc? branch.

It fails in several places on the Jetson AGX Orin devkit; the trtllm kernels won't build on Orin.

trtllm is built against TensorRT 10.11.0.33, while the AGX Orin ships with TensorRT 10.7.0.23-1.
The conan.io step requires the tensorrt Python package version 10.10.0.31 to be installed so that it can then install version 10.11.0.33.

Collecting tensorrt_cu12==10.10.0.31 (from tensorrt~=10.10.0->-r /home/scott/.git/TensorRT-LLM/requirements.txt (line 23))
  Using cached tensorrt_cu12-10.10.0.31.tar.gz (18 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 35, in <module>
        File "/tmp/pip-install-2bxsrteq/tensorrt-cu12_4ceb32d6a92145169c08a0fa6ccdfa28/setup.py", line 71, in <module>
          raise RuntimeError("TensorRT does not currently build wheels for Tegra systems")
      RuntimeError: TensorRT does not currently build wheels for Tegra systems
      [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
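You can confirm the version mismatch on the device itself; roughly like this (run the grep from inside the TensorRT-LLM checkout, and note the Python binding may not be installed on every JetPack image):

# TensorRT packages installed by JetPack
dpkg -l | grep -i tensorrt

# Version of the TensorRT Python binding, if present
python3 -c "import tensorrt; print(tensorrt.__version__)"

# Version TensorRT-LLM expects, from its requirements file
grep -i tensorrt requirements.txt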

I used to be able to compile versions earlier than v0.19.0 (or maybe v0.18.0), but they never worked because the kernels were not built.

Around v0.19.0, trtllm changed its compilation method to use the conan.io package manager running within its own Python .venv.
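For reference, the native (non-Docker) build path I was attempting goes through scripts/build_wheel.py; this is only a sketch, and the accepted flags have changed between releases:

# From the TensorRT-LLM checkout; 87-real targets the Orin GPU (sm_87)
python3 ./scripts/build_wheel.py --clean --cuda_architectures "87-real"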

I tried to build TensorRT-LLM from source like so:

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull

make -C docker release_build CUDA_ARCHS="'86-real;87-real'"

I followed the official tutorial. I had to edit the Dockerfile.multi file, but after that it seemed to build correctly. However, I can't build any engines without running into an "Aborted (core dumped)" error. How were you even able to build anything above the official v0.12.0 version? Would you happen to still have those Docker images? If so, would they support Llama 3.2?

If not, I don't know how NVIDIA expects developers to do anything serious on their edge devices if we have to wait for them to provide the support.

TensorRT-LLM/v0.12.0-jetson is the only version of trtllm that built the kernels for the jetson/tegra/aarch64 architecture.

Every other release only builds the cpp *.cu kernel binaries for x86_64.
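One way to see this is to check the architecture of the compiled libraries that end up inside the wheel; something like the following, assuming the libraries sit under tensorrt_llm/libs/ in the wheel (the exact layout and wheel location may differ between releases):

# Unpack the built wheel (wherever build_wheel.py reported it) and inspect the compiled libraries
unzip -o tensorrt_llm-*.whl -d /tmp/trtllm_wheel
# A build that actually targets Jetson reports "ARM aarch64" here, not "x86-64"
file /tmp/trtllm_wheel/tensorrt_llm/libs/*.so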

Up until TensorRT-LLM v0.15 I was able to build its wheel, but it was never functional because it did not build the kernels. I tried many different methods and looked at the trtllm code in pull requests where individuals tried to build for aarch64/Jetson. The one person who succeeded is an NVIDIA employee who was building for another purpose, and that is around when dusty-nv got trtllm v0.12.0-jetson to build so we would have something that worked.

At some point there existed somewhere at NVIDIA the CMakeLists.txt files that would build the kernels for aarch64/Jetson, but that must be proprietary, non-open-source code, because the files have never been published anywhere I've been able to find (github.com, the nvidia GitLab, nv-tegra.nvidia.com).
