# DCNv2 ONNX to TensorRT
This repo provides the code necessary to build a custom TensorRT plugin for networks containing DCNv2 layers. The setup here assumes that you have already created your ONNX model and just want to convert it to TensorRT.
> NOTE: The CUDA kernels for this plugin are slightly modified from [tensorRTIntegrate](https://github.com/dlunion/tensorRTIntegrate/blob/master/src/onnxplugin/plugins/DCNv2.cu),
> with the TensorRT plugin API code rewritten to wrap them in a standalone plugin.
## Overview
As a quick overview: once we have our container set up properly with all necessary packages and have our original ONNX model, we can follow these few commands to create the TensorRT engine using the custom plugin.
```sh
# Convert attributes of ONNX model
$ python scripts/insert_dcn_plugin.py --input=/models/original.onnx --output=/models/modified.onnx
# Build the TensorRT plugin
$ make -j$(nproc)
# Use trtexec to build and execute the TensorRT engine
$ trtexec --onnx=/models/modified.onnx --plugins=build/dcn_plugin.so --workspace=2000 --saveEngine=/models/dcnv2_trt_fp32.engine
# OR (for FP16)
$ trtexec --onnx=/models/modified.onnx --plugins=build/dcn_plugin.so --workspace=2000 --saveEngine=/models/dcnv2_trt_fp16.engine --fp16
```
The sections below give further explanations and customization options for a more detailed account of what's going on behind the scenes.
## Setup
This material was built on top of the [TensorRT NGC image](https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt) and tested for functionality. TensorRT container versions 20.07 and 20.08 were used to test this plugin. We will also need to download [OSS TensorRT](https://github.com/nvidia/TensorRT) so that we can use ONNX GraphSurgeon to make some slight modifications to our ONNX model file.
### With Dockerfile
The easiest way to get started is to use the provided [Dockerfile](Dockerfile) to create the Docker image with all the dependencies pre-installed. To do that, follow these steps:
```sh
# Build the docker image
$ bash scripts/docker/build.sh
# Launch an interactive container (you will need to provide the full path to the directory containing your ONNX model, /models in this example)
$ bash scripts/docker/launch.sh /models
```
You should now be inside the container with all of the dependencies installed, and you can run the commands above to modify the ONNX model, build the TensorRT plugin, and then create/run the TensorRT engine for the model.
### Without Dockerfile
If you choose not to use the Dockerfile, it will be a little more work upfront. You will need to pull the TensorRT NGC container image manually and install some dependencies before you are able to run the commands for ONNX model conversion, plugin building, and TensorRT engine creation.
The following commands should reproduce the environment that the Dockerfile creates:
```sh
# Pull the TensorRT container image
$ docker pull nvcr.io/nvidia/tensorrt:20.08-py3
# Launch the container
$ docker run --gpus 1 \
-v <path_to_onnx_model>:/models \
--name <name_for_docker_container> \
--network host \
--rm \
-i -t \
nvcr.io/nvidia/tensorrt:20.08-py3 \
bash
```
Once inside the container, you will need to install a few things before getting started:
```sh
# Clone OSS TensorRT
$ git clone -b master https://github.com/nvidia/TensorRT TensorRT
$ cd TensorRT/tools/onnx-graphsurgeon
$ make install
$ cd -
# Install Python bindings for TensorRT
$ /opt/tensorrt/python/python_setup.sh
```
This should give you the same environment as the Dockerfile above. You should then be able to go through the process of modifying the ONNX model, building the TensorRT plugin, and creating the TensorRT engine for the model.
## ONNX Model Conversion
We will need to make a slight conversion to our ONNX model so that we are able to convert it to a TensorRT engine. The first modification (which isn't strictly necessary, but makes everything easier) is to replace the ONNX `Plugin` node with a more meaningful `DCNv2_TRT` node. At this point, this is just a placeholder since ONNX doesn't know how to interpret the DCNv2 layer anyway. To do that, we are going to use [ONNX-GraphSurgeon](https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon).
```python
dcn_nodes = [node for node in graph.nodes if node.op == "Plugin"]
for node in dcn_nodes:
node.op = "DCNv2_TRT"
```
This simply renames all of the `Plugin` nodes to `DCNv2_TRT`, making them easier for our TensorRT plugin to match.
The second (arguably more important) change is to convert the layer's attributes from a single string into usable individual attributes for the TensorRT plugin. Before this conversion, our attributes have 2 fields (`info` and `name`). The `info` field is a string of the following form:
```json
{"dilation": [1, 1], "padding": [1, 1], "stride": [1, 1], "deformable_groups": 1}
```
What we actually want is to separate this string into individual attributes so they don't have to be parsed as a string by the TensorRT plugin creator (which would be much more difficult). So we modify the ONNX graph with something similar to the following:
```python
import json

# For each of the nodes renamed above, expand the JSON "info" string
# into individual attributes
for node in dcn_nodes:
    attrs = json.loads(node.attrs["info"])
    node.attrs.update(attrs)
    del node.attrs["info"]
```
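The snippets above operate on a graph that has already been loaded with ONNX-GraphSurgeon. For completeness, a minimal sketch of loading the model and re-exporting it afterwards (the paths here are just examples) might look like:
```python
import onnx
import onnx_graphsurgeon as gs

# Import the ONNX model into an ONNX-GraphSurgeon graph
graph = gs.import_onnx(onnx.load("/models/original.onnx"))

# ... apply the node renaming and attribute conversion shown above ...

# Clean up and export the modified graph back to an ONNX model
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "/models/modified.onnx")
```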
The [insert_dcn_plugin.py script](scripts/insert_dcn_plugin.py) provided with this repo does exactly this and only requires the user to provide the path to the input ONNX model and the name of the output model. It can be used as follows:
```sh
python scripts/insert_dcn_plugin.py --input models/<original_onnx_model>.onnx --output models/<modified_onnx_model>.onnx
```
## Plugin
Now that we have all our packages installed, we can go ahead and build the TensorRT plugin that we will use to convert the ONNX model to TensorRT. We have provided a Makefile that compiles the `.cpp` and `.cu` files and links the appropriate libraries (including `-lcudart`, `-lcublas`, `-lnvinfer`, `-lnvparsers`, etc.). To use it, simply run:
```sh
$ make -j$(nproc)
```
This will produce the necessary shared object file (i.e. `build/dcn_plugin.so`) that will be used to build the TensorRT engine.
## TensorRT Engine
To create the TensorRT engine and test it with the plugin, we will use `trtexec`. This lets us run synthetic data through the network to get an idea of its speed and also save a serialized engine that we can use later. Note that we provide a reasonably large workspace, since TensorRT (and the plugin) can use it during engine building to select the best tactics and produce the most optimized engine.
```sh
$ trtexec --onnx=<path_to_onnx_model>.onnx --plugins=build/dcn_plugin.so --workspace=2000 --saveEngine=<path_to_output_trt_engine>.engine
```
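When the saved engine is later deserialized outside of `trtexec`, the plugin library still needs to be loaded into the process first so that TensorRT can find the `DCNv2_TRT` creator. A minimal sketch using the TensorRT Python bindings (reusing the example paths from above) might look like:
```python
import ctypes
import tensorrt as trt

# Load the custom plugin library so its creator is registered with TensorRT
ctypes.CDLL("build/dcn_plugin.so")

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

# Deserialize the engine that trtexec saved
with open("/models/dcnv2_trt_fp32.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
```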
## Explanation of TensorRT Plugin Development
Now that we have created the TensorRT engine, let's dive a little deeper into how we were able to do that.
### IPluginV2DynamicExt
The first thing that we want to point out is that we are going to base our plugin on the [IPluginV2DynamicExt](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin_v2_dynamic_ext.html) class, which gives us the ability to use a lot of the functionality that TensorRT already has built in. You can see where we built our plugin class around the IPluginV2DynamicExt class [here](DCNv2Plugin.h#L60).
The first step is to create the constructors and destructor for our TensorRT plugin (in this case, `DCNv2PluginDynamic`). You can see an example of that [here](DCNv2Plugin.h#L63-69):
```cpp
DCNv2PluginDynamic();
DCNv2PluginDynamic(const void* data, size_t length, const std::string& name);
DCNv2PluginDynamic(DCNv2Parameters param, const std::string& name);
~DCNv2PluginDynamic() override;
```
Note that here, we have two different ways that a `DCNv2PluginDynamic` can be created: either from a serialized buffer (the `data`/`length` constructor, used when deserializing an engine) or directly from a `DCNv2Parameters` struct.