# tensorrt_demos
This repo is a collection of examples demonstrating how to optimize Caffe/TensorFlow/DarkNet/PyTorch models with TensorRT.
Highlights:
* Run an optimized "MODNet" video matting model at ~21 FPS on Jetson Xavier NX.
* Run an optimized "yolov4-416" object detector at ~4.6 FPS on Jetson Nano.
* Run an optimized "yolov3-416" object detector at ~4.9 FPS on Jetson Nano.
* Run an optimized "ssd_mobilenet_v1_coco" object detector ("trt_ssd_async.py") at 27~28 FPS on Jetson Nano.
* Run an optimized "MTCNN" face detector at 6~11 FPS on Jetson Nano.
* Run an optimized "GoogLeNet" image classifier at "~16 ms per image (inference only)" on Jetson Nano.
Supported hardware:
* NVIDIA Jetson
  - All NVIDIA Jetson Developer Kits, e.g. [Jetson AGX Orin DevKit](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/#advanced-features), [Jetson AGX Xavier DevKit](https://developer.nvidia.com/embedded/jetson-agx-xavier-developer-kit), [Jetson Xavier NX DevKit](https://developer.nvidia.com/embedded/jetson-xavier-nx-devkit), Jetson TX2 DevKit, [Jetson Nano DevKit](https://developer.nvidia.com/embedded/jetson-nano-developer-kit).
* x86_64 PC with modern NVIDIA GPU(s). Refer to [README_x86.md](https://github.com/jkjung-avt/tensorrt_demos/blob/master/README_x86.md) for more information.
Table of contents
-----------------
* [Prerequisite](#prerequisite)
* [Demo #1: GoogLeNet](#googlenet)
* [Demo #2: MTCNN](#mtcnn)
* [Demo #3: SSD](#ssd)
* [Demo #4: YOLOv3](#yolov3)
* [Demo #5: YOLOv4](#yolov4)
* [Demo #6: Using INT8 and DLA core](#int8_and_dla)
* [Demo #7: MODNet](#modnet)
<a name="prerequisite"></a>
Prerequisite
------------
The code in this repository was tested on Jetson Nano, TX2, and Xavier NX DevKits. In order to run the demos below, first make sure the proper version of the JetPack image is installed on the target Jetson system. For reference: [Setting up Jetson Nano: The Basics](https://jkjung-avt.github.io/setting-up-nano/) and [Setting up Jetson Xavier NX](https://jkjung-avt.github.io/setting-up-xavier-nx/).
More specifically, the target Jetson system must have TensorRT libraries installed.
* Demo #1 and Demo #2: work with TensorRT 3.x+.
* Demo #3: requires TensorRT 5.x+.
* Demo #4 and Demo #5: require TensorRT 6.x+.
* Demo #6 part 1: INT8 requires TensorRT 6.x+ and only works on GPUs with CUDA compute capability 6.1+.
* Demo #6 part 2: DLA core requires TensorRT 7.x+ (only tested on Jetson Xavier NX).
* Demo #7: requires TensorRT 7.x+.
You could check which version of TensorRT has been installed on your Jetson system by looking at the file names of the TensorRT libraries. For example, TensorRT v5.1.6 (from JetPack-4.2.2) was present on one of my Jetson Nano DevKits.
```shell
$ ls /usr/lib/aarch64-linux-gnu/libnvinfer.so*
/usr/lib/aarch64-linux-gnu/libnvinfer.so
/usr/lib/aarch64-linux-gnu/libnvinfer.so.5
/usr/lib/aarch64-linux-gnu/libnvinfer.so.5.1.6
```
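Alternatively, assuming TensorRT was installed through JetPack's Debian packages, you could query the package manager directly:
```shell
$ dpkg -l | grep -i tensorrt
```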
Furthermore, all demo programs in this repository require the "cv2" (OpenCV) module for python3. You could use the "cv2" module that comes with JetPack. Or, if you'd prefer building your own, refer to [Installing OpenCV 3.4.6 on Jetson Nano](https://jkjung-avt.github.io/opencv-on-nano/) for how to build opencv-3.4.6 from source and install it on your Jetson system.
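A quick sanity check that python3 can see the module:
```shell
$ python3 -c "import cv2; print(cv2.__version__)"
```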
If you plan to run Demo #3 (SSD), you'd also need to have "tensorflow-1.x" installed. You could probably use the [official tensorflow wheels provided by NVIDIA](https://docs.nvidia.com/deeplearning/frameworks/pdf/Install-TensorFlow-Jetson-Platform.pdf), or refer to [Building TensorFlow 1.12.2 on Jetson Nano](https://jkjung-avt.github.io/build-tensorflow-1.12.2/) for how to install tensorflow-1.12.2 on the Jetson system.
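You could verify the installed tensorflow version in the same way:
```shell
$ python3 -c "import tensorflow as tf; print(tf.__version__)"
```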
Or if you plan to run Demo #4 and Demo #5, you'd need to have "protobuf" installed. I recommend installing "protobuf-3.8.0" using my [install_protobuf-3.8.0.sh](https://github.com/jkjung-avt/jetson_nano/blob/master/install_protobuf-3.8.0.sh) script. This script would take a couple of hours to finish on a Jetson system. Alternatively, doing `pip3 install` with a recent version of "protobuf" should also work (but might run a little more slowly).
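For example, the pip route (the python module to check afterwards is named "google.protobuf"):
```shell
$ pip3 install protobuf
$ python3 -c "import google.protobuf; print(google.protobuf.__version__)"
```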
In case you are setting up a Jetson Nano, TX2 or Xavier NX from scratch to run these demos, you could refer to the following blog posts.
* [JetPack-4.6](https://jkjung-avt.github.io/jetpack-4.6/)
* [JetPack-4.5](https://jkjung-avt.github.io/jetpack-4.5/)
* [Setting up Jetson Xavier NX](https://jkjung-avt.github.io/setting-up-xavier-nx/)
* [JetPack-4.4 for Jetson Nano](https://jkjung-avt.github.io/jetpack-4.4/)
* [JetPack-4.3 for Jetson Nano](https://jkjung-avt.github.io/jetpack-4.3/)
<a name="googlenet"></a>
Demo #1: GoogLeNet
------------------
This demo illustrates how to convert a prototxt file and a caffemodel file into a TensorRT engine file, and how to classify images with the optimized TensorRT engine.
Step-by-step:
1. Clone this repository.
```shell
$ cd ${HOME}/project
$ git clone https://github.com/jkjung-avt/tensorrt_demos.git
$ cd tensorrt_demos
```
2. Build the TensorRT engine from the pre-trained googlenet (ILSVRC2012) model. Note that I downloaded the pre-trained model files from [BVLC caffe](https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet) and have put a copy of all necessary files in this repository.
```shell
$ cd ${HOME}/project/tensorrt_demos/googlenet
$ make
$ ./create_engine
```
3. Build the Cython code. Install Cython if not previously installed.
```shell
$ sudo pip3 install Cython
$ cd ${HOME}/project/tensorrt_demos
$ make
```
4. Run the "trt_googlenet.py" demo program. For example, run the demo using a USB webcam (/dev/video0) as the input.
```shell
$ cd ${HOME}/project/tensorrt_demos
$ python3 trt_googlenet.py --usb 0 --width 1280 --height 720
```
Here's a screenshot of the demo (JetPack-4.2.2, i.e. TensorRT 5).

5. The demo program supports 5 different image/video inputs. You could do `python3 trt_googlenet.py --help` to read the help messages. Or more specifically, the following inputs could be specified:
* `--image test_image.jpg`: an image file, e.g. jpg or png.
* `--video test_video.mp4`: a video file, e.g. mp4 or ts. An optional `--video_looping` flag could be enabled if needed.
* `--usb 0`: USB webcam (/dev/video0).
* `--rtsp rtsp://admin:[email protected]/live.sdp`: RTSP source, e.g. an IP cam. An optional `--rtsp_latency` argument could be used to adjust the latency setting in this case.
* `--onboard 0`: Jetson onboard camera.
In addition, you could use `--width` and `--height` to specify the desired input image size, and use `--do_resize` to force resizing of the image/video file source.
The `--usb`, `--rtsp` and `--onboard` video sources usually produce image frames at 30 FPS. If the TensorRT engine inference code runs faster than that (which happens easily on an x86_64 PC with a good GPU), one particular image frame could be inferenced multiple times before the next frame becomes available. This causes problems in the object detector demos, since the original image could have been altered (bounding boxes drawn) and the altered image would be fed into inference again. To cope with this problem, use the optional `--copy_frame` flag to force copying/cloning of image frames internally, as in the example below.
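For instance, a hypothetical invocation combining several of these options (the file name test_video.mp4 is just a placeholder):
```shell
$ python3 trt_googlenet.py --video test_video.mp4 --video_looping \
                           --width 640 --height 480 --do_resize
```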
6. Check out my blog post for implementation details:
* [Running TensorRT Optimized GoogLeNet on Jetson Nano](https://jkjung-avt.github.io/tensorrt-googlenet/)
<a name="mtcnn"></a>
Demo #2: MTCNN
--------------
This demo builds upon the previous one. It converts 3 sets of prototxt and caffemodel files into 3 TensorRT engines, namely the PNet, RNet and ONet. Then it combines the 3 engine files to implement MTCNN, a very good face detector.
Assuming this repository has been cloned at "${HOME}/project/tensorrt_demos", follow these steps: