CUDNN Library
User Guide
www.nvidia.com
cuDNN Library
DU-06702-001_v5.1|2
Chapter 1.
INTRODUCTION
cuDNN's convolution routines aim for performance competitive with the fastest GEMM
(matrix multiply) based implementations of such routines while using significantly less
memory.
cuDNN features customizable data layouts, supporting flexible dimension ordering,
striding, and subregions for the 4D tensors used as inputs and outputs to all of its
routines. This flexibility allows easy integration into any neural network implementation
and avoids the input/output transposition steps sometimes necessary with GEMM-based
convolutions.
cuDNN offers a context-based API that allows for easy multithreading and (optional)
interoperability with CUDA streams.
Chapter 2.
GENERAL DESCRIPTION
2.1. Programming Model
The cuDNN Library exposes a Host API but assumes that for operations using the GPU,
the necessary data is directly accessible from the device.
An application using cuDNN must initialize a handle to the library context by calling
cudnnCreate(). This handle is explicitly passed to every subsequent library function
that operates on GPU data. Once the application finishes using cuDNN, it can release
the resources associated with the library handle using cudnnDestroy(). This
approach allows the user to explicitly control the library's functioning when using
multiple host threads, GPUs and CUDA streams. For example, an application can use
cudaSetDevice() to associate different devices with different host threads and in each
of those host threads, use a unique cuDNN handle which directs library calls to the
device associated with it. cuDNN library calls made with different handles will thus
automatically run on different devices. The device associated with a particular cuDNN
context is assumed to remain unchanged between the corresponding cudnnCreate()
and cudnnDestroy() calls. In order for the cuDNN library to use a different device
within the same host thread, the application must set the new device to be used by
calling cudaSetDevice() and then create another cuDNN context, which will be
associated with the new device, by calling cudnnCreate().
2.2. Notation
As of cuDNN v4, we have adopted a mathematically-inspired notation for layer inputs
and outputs, using x, y, dx, dy, b, w for common layer parameters. This was done to
improve readability and make the meaning of parameters easier to understand. All layers now
follow a uniform convention that during inference:
y = layerFunction(x, otherParams).
For convolution, for example, y = w*x + b, where w is the matrix of filter weights, x is the
previous layer's data (during inference), y is the next layer's data, b is the bias and * is the
convolution operator.
In backpropagation routines the parameters keep their meanings. dx, dy, dw, db
always refer to the gradient of the final network error function with respect to a given
parameter. So dy in all backpropagation routines always refers to the error gradient
backpropagated through the network computation graph so far. Similarly, other
parameters in more specialized layers, such as dMeans or dBnBias, refer to
gradients of the loss function with respect to those parameters.
w is used in the API for both the width of the x tensor and the convolution filter
matrix. To resolve this ambiguity, we use w and filter notation interchangeably for the
convolution filter weight matrix. The meaning is clear from the context since the
layer width is always referenced near its height.
2.3. Tensor Descriptor
The cuDNN Library describes data holding images, videos and any other data
with a generic n-D tensor defined by the following parameters: the number of
dimensions (nbDims), the data type, an array holding the size of each dimension,
and an array holding the stride of each dimension (the number of elements to skip
to reach the next element along that dimension).
The first two dimensions define respectively the batch number n and the number of
feature maps c. This tensor definition allows, for example, some dimensions to
overlap each other within the same tensor by having the stride of one dimension
smaller than the product of the dimension and the stride of the next dimension. In
cuDNN, unless specified otherwise, all routines support tensors with overlapping
dimensions for forward-pass input tensors; however, dimensions of the output tensors
cannot overlap. Even though this tensor format supports negative strides (which can be
useful for data mirroring), cuDNN routines do not support tensors with negative strides
unless specified otherwise.
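4-D tensor formats (N: batch, C: feature maps, H: height, W: width): NCHW, NHWC, CHWN
5-D tensor formats (D: depth): NCDHW, NDHWC, CDHWN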
2.3.4. Fully-packed tensors
A tensor is defined as XYZ-fully-packed if and only if:
the number of tensor dimensions is equal to the number of letters preceding the
fully-packed suffix;
the stride of the i-th dimension is equal to the product of the (i+1)-th dimension by
the (i+1)-th stride;
the stride of the last dimension is 1.
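The two stride rules above translate directly into code. The following is a minimal C sketch, not part of the cuDNN API: the helper names packed_strides and is_fully_packed are hypothetical, dimensions are assumed to be listed outermost first, and only the stride rules are checked (the letter-count rule is a naming convention).

```c
#include <stdbool.h>

/* Hypothetical helper, not a cuDNN API: fills strideA[] with the strides of
 * a fully-packed tensor whose sizes are given in dimA[], outermost first. */
static void packed_strides(int nbDims, const int dimA[], int strideA[])
{
    strideA[nbDims - 1] = 1;                        /* last dimension: stride 1 */
    for (int i = nbDims - 2; i >= 0; --i)
        strideA[i] = dimA[i + 1] * strideA[i + 1];  /* dim[i+1] * stride[i+1] */
}

/* Hypothetical helper: true if the strides satisfy both stride rules of the
 * fully-packed definition. */
static bool is_fully_packed(int nbDims, const int dimA[], const int strideA[])
{
    if (strideA[nbDims - 1] != 1)
        return false;
    for (int i = 0; i < nbDims - 1; ++i)
        if (strideA[i] != dimA[i + 1] * strideA[i + 1])
            return false;
    return true;
}
```

For an NCHW tensor with n=2, c=3, h=4, w=5, packed_strides yields the strides 60, 20, 5, 1; any padding between rows or feature maps makes is_fully_packed return false.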
2.3.5. Partially-packed tensors
The partially 'XYZ-packed' terminology only applies in the context of a tensor format
described with a superset of the letters used to define a partially-packed tensor. A
WXYZ tensor is defined as XYZ-packed if and only if:
the strides of all dimensions NOT referenced in the -packed suffix are greater than or
equal to the product of the next dimension by the next stride;
the stride of each dimension referenced in the -packed suffix in position i is equal to
the product of the (i+1)-th dimension by the (i+1)-th stride;
if the tensor's last dimension is present in the -packed suffix, its stride is 1.
For example, an NHWC tensor that is WC-packed has c_stride equal to 1 and
w_stride equal to c_dim x c_stride. In practice, the -packed suffix usually refers to the
fastest-changing dimensions of a tensor, but it is also possible to refer to an NCHW tensor
that is only N-packed.
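The partially-packed rules can be sketched in C as follows. This is an illustration only, not a cuDNN function: is_partially_packed and its packedMask argument (one flag per dimension, set for the letters that appear in the -packed suffix) are hypothetical, and dimensions are assumed to be listed outermost first.

```c
#include <stdbool.h>

/* Hypothetical helper, not a cuDNN API: checks the partially-packed rules.
 * packedMask[i] is true when dimension i appears in the -packed suffix. */
static bool is_partially_packed(int nbDims, const int dimA[], const int strideA[],
                                const bool packedMask[])
{
    for (int i = 0; i < nbDims; ++i) {
        bool last = (i == nbDims - 1);
        if (packedMask[i]) {
            /* packed dimension: exact product, or stride 1 for the last one */
            int expected = last ? 1 : dimA[i + 1] * strideA[i + 1];
            if (strideA[i] != expected)
                return false;
        } else if (!last) {
            /* non-packed dimension: gaps are allowed, overlap is not */
            if (strideA[i] < dimA[i + 1] * strideA[i + 1])
                return false;
        }
    }
    return true;
}
```

With NHWC dimensions {1, 4, 5, 3} and strides {64, 16, 3, 1} (one padding element after each row), the tensor is WC-packed but not fully packed.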
2.3.7. Overlapping tensors
A tensor is defined to be overlapping if iterating over the full range of its dimensions
produces the same address more than once.
In practice, an overlapped tensor will have stride[i-1] < stride[i]*dim[i] for some i
in the interval [1, nbDims), using zero-based indexing.
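The definition can be applied literally on small tensors by enumerating every index and recording the element offset it maps to. The C sketch below is a hypothetical test helper, not a cuDNN function; it assumes 1 to 8 dimensions, every dimension at least 1, and non-negative strides. Its cost grows with the number of elements, which is why the stride inequality above is the practical check.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not a cuDNN API: returns true if two different indices
 * of the tensor map to the same element offset. Small tensors only. */
static bool is_overlapping(int nbDims, const int dimA[], const int strideA[])
{
    if (nbDims < 1 || nbDims > 8)
        abort();

    long total = 1, maxOffset = 0;
    for (int i = 0; i < nbDims; ++i) {
        total *= dimA[i];
        maxOffset += (long)(dimA[i] - 1) * strideA[i];
    }

    bool *seen = calloc((size_t)maxOffset + 1, sizeof *seen);
    if (seen == NULL)
        abort();

    int idx[8] = {0};
    bool overlap = false;
    for (long k = 0; k < total && !overlap; ++k) {
        long offset = 0;
        for (int i = 0; i < nbDims; ++i)
            offset += (long)idx[i] * strideA[i];
        if (seen[offset])
            overlap = true;            /* same address produced twice */
        seen[offset] = true;

        /* advance the multi-index, last dimension fastest */
        for (int i = nbDims - 1; i >= 0; --i) {
            if (++idx[i] < dimA[i])
                break;
            idx[i] = 0;
        }
    }
    free(seen);
    return overlap;
}
```

For a 3 x 4 tensor, strides {4, 1} do not overlap, while strides {2, 1} do: consecutive rows share two elements, and the inequality 2 < 1*4 flags the same case.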
2.4. Thread Safety
The library is thread safe and its functions can be called from multiple host threads,
even with the same handle. When sharing a handle across host threads, extreme care
needs to be taken to ensure that any changes to the handle configuration in one thread
do not adversely affect cuDNN function calls in others. This is especially true for the
destruction of the handle. It is not recommended that multiple threads share the same
cuDNN handle.
2.5. Reproducibility (determinism)
By design, most of cuDNN's routines from a given version generate the same bit-wise
results across runs when executed on GPUs with the same architecture and the same
number of SMs. However, bit-wise reproducibility is not guaranteed across versions,
as the implementation of a given routine may change. With the current release, the
following routines do not guarantee reproducibility because they use atomic operations:
cudnnConvolutionBackwardFilter when
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 or
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3 is used
cudnnConvolutionBackwardData when
CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 is used
cudnnPoolingBackward when CUDNN_POOLING_MAX is used
cudnnSpatialTfSamplerBackward
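Atomic accumulation makes the order of floating-point additions depend on thread scheduling, and floating-point addition is not associative, so a different order can change the bit-wise result. The plain C sketch below (not cuDNN code; the function names are hypothetical) shows the effect with three values. It assumes the code is compiled without options such as -ffast-math that allow the compiler to reassociate additions.

```c
/* Sum the same three values in two different orders. */
static float sum_left(float a, float b, float c)  { return (a + b) + c; }
static float sum_right(float a, float b, float c) { return a + (b + c); }
```

With a = 1e20f, b = -1e20f, c = 1.0f, sum_left returns 1.0f while sum_right returns 0.0f, because 1.0f is lost when added to -1e20f first.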
When beta[0] is zero, the output is not read and may contain any
uninitialized data (including NaN). The storage data type for alpha[0] and beta[0] is float
for HALF and FLOAT tensors, and double for DOUBLE tensors. These parameters are
passed using a host memory pointer.
For improved performance, it is advised to use beta[0] = 0.0. Use a non-zero value for
beta[0] only when blending with prior values stored in the output tensor is needed.
In release n+1, the legacy API entry "foo" is remapped to a new API "foo_v<f>",
where f is some cuDNN version earlier than n.
Also in release n+1, the unsuffixed API entry "foo" is modified to have the same
signature as "foo_v<n>". "foo_v<n>" is retained as-is.
The deprecated former API entry with an earlier suffix _v<f> and the new API entry
with suffix _v<n> are maintained in this release.
In release n+2, both suffixed entries of a given entry are removed.
As a rule of thumb, when a routine appears in two forms, one with a suffix and one with
no suffix, the non-suffixed entry is to be treated as deprecated. In this case, users are
strongly advised to migrate to the new suffixed API entry to guarantee backward
compatibility with the following cuDNN release. When a routine appears with multiple
suffixes, the unsuffixed API entry is mapped to the highest-numbered suffix. In that
case, users are strongly advised to use the non-suffixed API entry to guarantee backward
compatibility with the following cuDNN release.
Chapter 3.
CUDNN DATATYPES REFERENCE
This chapter describes all the types and enums of the cuDNN library API.
3.1. cudnnHandle_t
cudnnHandle_t is a pointer to an opaque structure holding the cuDNN library context.
The cuDNN library context must be created using cudnnCreate() and the returned
handle must be passed to all subsequent library function calls. The context should be
destroyed at the end using cudnnDestroy(). The context is associated with only one
GPU device, the current device at the time of the call to cudnnCreate(). However,
multiple contexts can be created on the same GPU device.
3.2. cudnnStatus_t
cudnnStatus_t is an enumerated type used for function status returns. All cuDNN
library functions return their status, which can be one of the following values:
Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_NOT_INITIALIZED
CUDNN_STATUS_ALLOC_FAILED
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_ARCH_MISMATCH
CUDNN_STATUS_MAPPING_ERROR
CUDNN_STATUS_EXECUTION_FAILED
CUDNN_STATUS_INTERNAL_ERROR
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_LICENSE_ERROR
3.3. cudnnTensorDescriptor_t
cudnnTensorDescriptor_t is a pointer to an opaque structure holding the
description of a generic n-D dataset. cudnnCreateTensorDescriptor() is used
to create one instance, and one of the routines cudnnSetTensorNdDescriptor(),
cudnnSetTensor4dDescriptor() or cudnnSetTensor4dDescriptorEx() must be
used to initialize this instance.
3.4. cudnnFilterDescriptor_t
cudnnFilterDescriptor_t is a pointer to an opaque structure holding the description
of a filter dataset. cudnnCreateFilterDescriptor() is used to create one instance,
and cudnnSetFilterDescriptor() must be used to initialize this instance.
3.5. cudnnConvolutionDescriptor_t
cudnnConvolutionDescriptor_t is a pointer to an opaque structure holding the
description of a convolution operation. cudnnCreateConvolutionDescriptor()
is used to create one instance, and cudnnSetConvolutionNdDescriptor() or
cudnnSetConvolution2dDescriptor() must be used to initialize this instance.
3.6. cudnnNanPropagation_t
cudnnNanPropagation_t is an enumerated type used to indicate if some routines
should propagate NaN values. This enumerated type is used as a field for the
cudnnActivationDescriptor_t descriptor and cudnnPoolingDescriptor_t
descriptor.
Value
Meaning
CUDNN_NOT_PROPAGATE_NAN
CUDNN_PROPAGATE_NAN
3.7. cudnnActivationDescriptor_t
cudnnActivationDescriptor_t is a pointer to an opaque structure holding the
description of an activation operation. cudnnCreateActivationDescriptor() is used
to create one instance, and cudnnSetActivationDescriptor() must be used to
initialize this instance.
3.8. cudnnPoolingDescriptor_t
cudnnPoolingDescriptor_t is a pointer to an opaque structure holding
the description of a pooling operation. cudnnCreatePoolingDescriptor()
is used to create one instance, and cudnnSetPoolingNdDescriptor() or
cudnnSetPooling2dDescriptor() must be used to initialize this instance.
3.9. cudnnOpTensorOp_t
cudnnOpTensorOp_t is an enumerated type used to indicate the tensor operation to be
used by the cudnnOpTensor() routine. This enumerated type is used as a field for the
cudnnOpTensorDescriptor_t descriptor.
Value
Meaning
CUDNN_OP_TENSOR_ADD
CUDNN_OP_TENSOR_MUL
CUDNN_OP_TENSOR_MIN
CUDNN_OP_TENSOR_MAX
3.10. cudnnOpTensorDescriptor_t
cudnnOpTensorDescriptor_t is a pointer to an opaque structure holding the
description of a tensor operation, used as a parameter to cudnnOpTensor().
cudnnCreateOpTensorDescriptor() is used to create one instance, and
cudnnSetOpTensorDescriptor() must be used to initialize this instance.
3.11. cudnnDataType_t
cudnnDataType_t is an enumerated type indicating the data type to which a tensor
descriptor refers.
Value
Meaning
CUDNN_DATA_FLOAT
CUDNN_DATA_DOUBLE
CUDNN_DATA_HALF
3.12. cudnnTensorFormat_t
cudnnTensorFormat_t is an enumerated type used by
cudnnSetTensor4dDescriptor() to create a tensor with a pre-defined layout.
Value
Meaning
CUDNN_TENSOR_NCHW
The data is laid out in the following order: images, feature maps,
rows, columns. The strides are implicitly defined
in such a way that the data are contiguous in
memory with no padding between images, feature
maps, rows, and columns; the columns are the
inner dimension and the images are the outermost
dimension.
CUDNN_TENSOR_NHWC
3.13. cudnnConvolutionMode_t
cudnnConvolutionMode_t is an enumerated type used by
cudnnSetConvolutionDescriptor() to configure a convolution descriptor. The
filter used for the convolution can be applied in two different ways, corresponding
mathematically to a convolution or to a cross-correlation. (A cross-correlation is
equivalent to a convolution with its filter rotated by 180 degrees.)
Value
Meaning
CUDNN_CONVOLUTION
CUDNN_CROSS_CORRELATION
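To make the distinction concrete, here is a small valid-mode (no padding, unit stride) reference in plain C. It is a hypothetical illustration, not how cuDNN implements either mode: setting flip rotates the filter by 180 degrees, which gives a mathematical convolution; leaving it clear gives a cross-correlation.

```c
/* Hypothetical valid-mode 2-D reference, not a cuDNN API.
 * x is H x W, w is R x S, y is (H-R+1) x (W-S+1), all row-major.
 * flip != 0: convolution (filter rotated 180 degrees); flip == 0: cross-correlation. */
static void conv2d_ref(const float *x, int H, int W, const float *w, int R, int S,
                       int flip, float *y)
{
    int outW = W - S + 1;
    for (int p = 0; p + R <= H; ++p)
        for (int q = 0; q + S <= W; ++q) {
            float acc = 0.0f;
            for (int r = 0; r < R; ++r)
                for (int s = 0; s < S; ++s) {
                    int fr = flip ? R - 1 - r : r;   /* rotated filter row */
                    int fs = flip ? S - 1 - s : s;   /* rotated filter column */
                    acc += x[(p + r) * W + (q + s)] * w[fr * S + fs];
                }
            y[p * outW + q] = acc;
        }
}
```

With x = {1, 2, 3, 4} (2 x 2) and w = {1, 0, 0, 0}, cross-correlation gives 1 and convolution gives 4, which equals the cross-correlation with the rotated filter {0, 0, 0, 1}.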
3.14. cudnnConvolutionFwdPreference_t
cudnnConvolutionFwdPreference_t is an enumerated type used by
cudnnGetConvolutionForwardAlgorithm() to help the choice of the algorithm used
for the forward convolution.
Value
Meaning
CUDNN_CONVOLUTION_FWD_NO_WORKSPACE
CUDNN_CONVOLUTION_FWD_PREFER_FASTEST
CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT
In this configuration, the routine
cudnnGetConvolutionForwardAlgorithm() will
3.15. cudnnConvolutionFwdAlgo_t
cudnnConvolutionFwdAlgo_t is an enumerated type that exposes the different
algorithms available to execute the forward convolution operation.
Value
Meaning
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
This algorithm expresses the convolution as a
CUDNN_CONVOLUTION_FWD_ALGO_GEMM
CUDNN_CONVOLUTION_FWD_ALGO_DIRECT
CUDNN_CONVOLUTION_FWD_ALGO_FFT
CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED
This algorithm uses the Winograd Transform
3.16. cudnnConvolutionFwdAlgoPerf_t
cudnnConvolutionFwdAlgoPerf_t is a structure containing performance results
returned by cudnnFindConvolutionForwardAlgorithm().
Member Name
Explanation
cudnnConvolutionFwdAlgo_t algo
cudnnStatus_t status
float time
size_t memory
3.17. cudnnConvolutionBwdFilterPreference_t
cudnnConvolutionBwdFilterPreference_t is an enumerated type used by
cudnnGetConvolutionBackwardFilterAlgorithm() to help the choice of the
algorithm used for the backward filter convolution.
Value
Meaning
CUDNN_CONVOLUTION_BWD_FILTER_PREFER_FASTEST
In this configuration, the routine
cudnnGetConvolutionBackwardFilterAlgorithm()
CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT
In this configuration, the routine
cudnnGetConvolutionBackwardFilterAlgorithm()
3.18. cudnnConvolutionBwdFilterAlgo_t
cudnnConvolutionBwdFilterAlgo_t is an enumerated type that exposes the different
algorithms available to execute the backward filter convolution operation.
Value
Meaning
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED
This algorithm uses the Winograd Transform
3.19. cudnnConvolutionBwdFilterAlgoPerf_t
cudnnConvolutionBwdFilterAlgoPerf_t is a structure containing performance
results returned by cudnnFindConvolutionBackwardFilterAlgorithm().
Member Name
Explanation
cudnnConvolutionBwdFilterAlgo_t algo
cudnnStatus_t status
CUDNN_STATUS_INTERNAL_ERROR if any
float time
size_t memory
3.20. cudnnConvolutionBwdDataPreference_t
cudnnConvolutionBwdDataPreference_t is an enumerated type used by
cudnnGetConvolutionBackwardDataAlgorithm() to help the choice of the
algorithm used for the backward data convolution.
Value
Meaning
CUDNN_CONVOLUTION_BWD_DATA_NO_WORKSPACE
CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT
In this configuration, the routine
cudnnGetConvolutionBackwardDataAlgorithm()
3.21. cudnnConvolutionBwdDataAlgo_t
cudnnConvolutionBwdDataAlgo_t is an enumerated type that exposes the different
algorithms available to execute the backward data convolution operation.
Value
Meaning
CUDNN_CONVOLUTION_BWD_DATA_ALGO_0
CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
the matrix that holds the input tensor data. The
results are deterministic.
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT
CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD
CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD_NONFUSED
This algorithm uses the Winograd Transform
3.22. cudnnConvolutionBwdDataAlgoPerf_t
cudnnConvolutionBwdDataAlgoPerf_t is a structure containing performance results
returned by cudnnFindConvolutionBackwardDataAlgorithm().
Member Name
Explanation
cudnnConvolutionBwdDataAlgo_t algo
cudnnStatus_t status
float time
size_t memory
3.23. cudnnSoftmaxAlgorithm_t
cudnnSoftmaxAlgorithm_t is used to select an implementation of the softmax
function used in cudnnSoftmaxForward() and cudnnSoftmaxBackward().
Value
Meaning
CUDNN_SOFTMAX_FAST
CUDNN_SOFTMAX_ACCURATE
CUDNN_SOFTMAX_LOG
3.24. cudnnSoftmaxMode_t
cudnnSoftmaxMode_t is used to select the data over which cudnnSoftmaxForward()
and cudnnSoftmaxBackward() compute their results.
Value
Meaning
CUDNN_SOFTMAX_MODE_INSTANCE
CUDNN_SOFTMAX_MODE_CHANNEL
3.25. cudnnPoolingMode_t
cudnnPoolingMode_t is an enumerated type passed to
cudnnSetPoolingDescriptor() to select the pooling method to be used by
cudnnPoolingForward() and cudnnPoolingBackward().
Value
Meaning
CUDNN_POOLING_MAX
CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING
The values inside the pooling window will be
CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING
The values inside the pooling window will be
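The two average-pooling modes differ only for windows that extend into the padding. The C sketch below assumes, as the mode names suggest, that padded positions contribute zero to the sum and that only the divisor changes: the full window size for CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING and the number of in-bounds elements for CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING. The helper avg_pool_window is hypothetical and 1-D for brevity.

```c
/* Hypothetical 1-D reference, not a cuDNN API: average of the window of
 * `window` elements starting at input index `start` (negative = padding).
 * includePadding != 0 divides by the full window size; otherwise by the
 * number of elements that fall inside the input. */
static float avg_pool_window(const float *x, int n, int start, int window,
                             int includePadding)
{
    float sum = 0.0f;
    int valid = 0;
    for (int i = start; i < start + window; ++i)
        if (i >= 0 && i < n) {          /* padded positions add nothing */
            sum += x[i];
            ++valid;
        }
    int divisor = includePadding ? window : valid;
    return divisor > 0 ? sum / (float)divisor : 0.0f;
}
```

For x = {3, 6, 9} and a 3-wide window starting one element into the padding, the include-padding average is 9/3 = 3 and the exclude-padding average is 9/2 = 4.5; windows fully inside the input give the same result in both modes.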
3.26. cudnnActivationMode_t
cudnnActivationMode_t is an enumerated type used to select the neuron activation
function used in cudnnActivationForward() and cudnnActivationBackward().
Value
Meaning
CUDNN_ACTIVATION_SIGMOID
CUDNN_ACTIVATION_RELU
CUDNN_ACTIVATION_TANH
CUDNN_ACTIVATION_CLIPPED_RELU
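The piecewise-linear modes can be written as scalar functions. The following is a minimal C sketch, not cuDNN code, of the forward and backward rules for CUDNN_ACTIVATION_RELU and CUDNN_ACTIVATION_CLIPPED_RELU. The function names and the `ceiling` argument (a stand-in for the clipping threshold) are hypothetical, and the gradient chosen exactly at the kinks (x == 0, x == ceiling) is a convention of this sketch.

```c
/* Forward ReLU: max(x, 0). */
static float relu_fwd(float x) { return x > 0.0f ? x : 0.0f; }

/* Forward clipped ReLU: min(max(x, 0), ceiling). */
static float clipped_relu_fwd(float x, float ceiling)
{
    float r = relu_fwd(x);
    return r < ceiling ? r : ceiling;
}

/* Backward: the incoming gradient dy passes through only where the forward
 * function has slope 1; elsewhere the gradient is 0. */
static float relu_bwd(float x, float dy) { return x > 0.0f ? dy : 0.0f; }

static float clipped_relu_bwd(float x, float dy, float ceiling)
{
    return (x > 0.0f && x < ceiling) ? dy : 0.0f;
}
```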
3.27. cudnnLRNMode_t
cudnnLRNMode_t is an enumerated type used to specify the mode of operation in
cudnnLRNCrossChannelForward() and cudnnLRNCrossChannelBackward().
Value
Meaning
CUDNN_LRN_CROSS_CHANNEL_DIM1
3.28. cudnnDivNormMode_t
cudnnDivNormMode_t is an enumerated type used to specify the
mode of operation in cudnnDivisiveNormalizationForward() and
cudnnDivisiveNormalizationBackward().
Value
Meaning
CUDNN_DIVNORM_PRECOMPUTED_MEANS
and the gradient over means is computed
independently. In this mode to yield a net gradient
over the entire LCN computational graph the
destDiffMeans result should be backpropagated
through the user's means layer (which can
be implemented using average pooling) and
added to the destDiffData tensor produced by
cudnnDivisiveNormalizationBackward.
3.29. cudnnBatchNormMode_t
cudnnBatchNormMode_t is an enumerated type used to specify the mode
of operation in cudnnBatchNormalizationForwardInference(),
cudnnBatchNormalizationForwardTraining(),
cudnnBatchNormalizationBackward() and cudnnDeriveBNTensorDescriptor()
routines.
Value
Meaning
CUDNN_BATCHNORM_PER_ACTIVATION
CUDNN_BATCHNORM_SPATIAL
3.30. cudnnRNNDescriptor_t
cudnnRNNDescriptor_t is a pointer to an opaque structure holding the description of
an RNN operation. cudnnCreateRNNDescriptor() is used to create one instance, and
cudnnSetRNNDescriptor() must be used to initialize this instance.
3.31. cudnnRNNMode_t
cudnnRNNMode_t is an enumerated type used to specify the type of network
used in the cudnnRNNForwardInference(), cudnnRNNForwardTraining(),
cudnnRNNBackwardData() and cudnnRNNBackwardWeights() routines.
Value
Meaning
CUDNN_RNN_RELU
A single-gate recurrent neural network with a ReLU
activation function.
In the forward pass the output $h_t$ for a given
iteration can be computed from the recurrent
input $h_{t-1}$ and the previous layer input $x_t$ given
matrices W, R and biases $b_W$, $b_R$ from the
following equation:
$$h_t = \mathrm{ReLU}(W_i x_t + R_i h_{t-1} + b_{W_i} + b_{R_i})$$
where $\mathrm{ReLU}(x) = \max(x, 0)$.
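The recurrence above can be checked against a small CPU reference. This C sketch is not how cuDNN evaluates the layer; it assumes a single layer and a single time step, row-major W (hiddenSize x inputSize) and R (hiddenSize x hiddenSize), and uses the hypothetical name rnn_relu_step.

```c
/* Hypothetical CPU reference, not a cuDNN API, of one CUDNN_RNN_RELU step:
 * h = ReLU(W x + R hPrev + bW + bR). h must not alias hPrev, because every
 * output element reads all of hPrev. */
static void rnn_relu_step(int inputSize, int hiddenSize,
                          const float *W, const float *R,
                          const float *bW, const float *bR,
                          const float *x, const float *hPrev, float *h)
{
    for (int i = 0; i < hiddenSize; ++i) {
        float acc = bW[i] + bR[i];                  /* both bias terms */
        for (int j = 0; j < inputSize; ++j)
            acc += W[i * inputSize + j] * x[j];     /* input projection */
        for (int j = 0; j < hiddenSize; ++j)
            acc += R[i * hiddenSize + j] * hPrev[j]; /* recurrent projection */
        h[i] = acc > 0.0f ? acc : 0.0f;             /* ReLU(x) = max(x, 0) */
    }
}
```

With W = {1, -1}, R = identity, bW = {0, 0}, bR = {0.5, 0.5}, x = {2} and hPrev = {1, 1}, the pre-activations are 3.5 and -0.5, so h = {3.5, 0}.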
CUDNN_RNN_TANH
CUDNN_LSTM
CUDNN_GRU
$i_t$, $r_t$, $h'_t$ represent the input, reset and new gates,
respectively.
3.32. cudnnDirectionMode_t
cudnnDirectionMode_t is an enumerated type used to specify the recurrence
pattern in the cudnnRNNForwardInference(), cudnnRNNForwardTraining(),
cudnnRNNBackwardData() and cudnnRNNBackwardWeights() routines.
Value
Meaning
CUDNN_UNIDIRECTIONAL
CUDNN_BIDIRECTIONAL
3.33. cudnnRNNInputMode_t
cudnnRNNInputMode_t is an enumerated type used to specify the behavior of the
first layer in the cudnnRNNForwardInference(), cudnnRNNForwardTraining(),
cudnnRNNBackwardData() and cudnnRNNBackwardWeights() routines.
Value
Meaning
CUDNN_LINEAR_INPUT
CUDNN_SKIP_INPUT
3.34. cudnnDropoutDescriptor_t
cudnnDropoutDescriptor_t is a pointer to an opaque structure holding the
description of a dropout operation. cudnnCreateDropoutDescriptor() is used
to create one instance, cudnnSetDropoutDescriptor() is used to initialize this
instance, and cudnnDestroyDropoutDescriptor() is used to destroy this instance.
3.35. cudnnSpatialTransformerDescriptor_t
cudnnSpatialTransformerDescriptor_t is a pointer to an opaque
3.36. cudnnSamplerType_t
cudnnSamplerType_t is an enumerated type passed to
cudnnSetSpatialTransformerNdDescriptor() to select the sampler type to be used
by cudnnSpatialTfSamplerForward() and cudnnSpatialTfSamplerBackward().
Value
Meaning
CUDNN_SAMPLER_BILINEAR
Chapter 4.
CUDNN API REFERENCE
This chapter describes the API of all the routines of the cuDNN library.
4.1. cudnnGetVersion
size_t cudnnGetVersion()
This function returns the version number of the cuDNN Library. It returns the
CUDNN_VERSION define present in the cudnn.h header file. Starting with release R2, the
routine can be used to dynamically identify the current cuDNN Library used by the
application. The define CUDNN_VERSION can be used to have the same application linked
against different cuDNN versions using conditional compilation statements.
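The returned number packs the major, minor and patch levels into one integer. The C sketch below assumes the encoding used by cuDNN 5.x headers, CUDNN_VERSION == CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL; the helper decode_cudnn_version is hypothetical and does not call the library.

```c
#include <stddef.h>

/* Hypothetical helper, not a cuDNN API: splits a version number encoded as
 * major * 1000 + minor * 100 + patch (assumed cuDNN 5.x encoding). */
static void decode_cudnn_version(size_t version, int *major, int *minor, int *patch)
{
    *major = (int)(version / 1000);
    *minor = (int)(version % 1000 / 100);
    *patch = (int)(version % 100);
}
```

In an application, the runtime value would come from cudnnGetVersion() and the compile-time value from CUDNN_VERSION; comparing their major numbers at start-up is one way to detect a header and library mismatch. For example, 5105 decodes to 5.1.5.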
4.2. cudnnGetErrorString
const char * cudnnGetErrorString(cudnnStatus_t status)
4.3. cudnnCreate
cudnnStatus_t cudnnCreate(cudnnHandle_t *handle)
This function initializes the cuDNN library and creates a handle to an opaque
structure holding the cuDNN library context. It allocates hardware resources on
the host and device and must be called prior to making any other cuDNN library
calls. The cuDNN library context is tied to the current CUDA device. To use the
library on multiple devices, one cuDNN handle needs to be created for each device.
For a given device, multiple cuDNN handles with different configurations (e.g.,
different current CUDA streams) may be created. Because cudnnCreate allocates
some internal resources, the release of those resources by calling cudnnDestroy will
implicitly call cudaDeviceSynchronize; therefore, the recommended best practice
is to call cudnnCreate/cudnnDestroy outside of performance-critical code paths.
For multithreaded applications that use the same device from different threads, the
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_NOT_INITIALIZED
CUDNN_STATUS_ALLOC_FAILED
4.4. cudnnDestroy
cudnnStatus_t cudnnDestroy(cudnnHandle_t handle)
This function releases hardware resources used by the cuDNN library. This function
is usually the last call with a particular handle to the cuDNN library. Because
cudnnCreate allocates some internal resources, the release of those resources by
calling cudnnDestroy will implicitly call cudaDeviceSynchronize; therefore,
the recommended best practice is to call cudnnCreate/cudnnDestroy outside of
performance-critical code paths.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_NOT_INITIALIZED
4.5. cudnnSetStream
cudnnStatus_t cudnnSetStream(cudnnHandle_t handle, cudaStream_t streamId)
This function sets the cuDNN library stream, which will be used to execute all
subsequent calls to the cuDNN library functions with that particular handle. If the
cuDNN library stream is not set, all kernels use the default (NULL) stream. In particular,
this routine can be used to change the stream between kernel launches and then to reset
the cuDNN library stream back to NULL.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.6. cudnnGetStream
cudnnStatus_t cudnnGetStream(cudnnHandle_t handle, cudaStream_t *streamId)
This function gets the cuDNN library stream, which is being used to execute all calls to
the cuDNN library functions. If the cuDNN library stream is not set, all kernels use the
default NULL stream.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.7. cudnnCreateTensorDescriptor
cudnnStatus_t cudnnCreateTensorDescriptor(cudnnTensorDescriptor_t *tensorDesc)
This function creates a generic Tensor descriptor object by allocating the memory needed
to hold its opaque structure. The data is initialized to be all zero.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.8. cudnnSetTensor4dDescriptor
cudnnStatus_t
cudnnSetTensor4dDescriptor( cudnnTensorDescriptor_t tensorDesc,
cudnnTensorFormat_t format,
cudnnDataType_t dataType,
int n,
int c,
int h,
int w )
This function initializes a previously created generic Tensor descriptor object into a
4D tensor. The strides of the four dimensions are inferred from the format parameter
and set in such a way that the data is contiguous in memory with no padding between
dimensions.
The total size of a tensor including the potential padding between dimensions is
limited to 2 Giga-elements of type datatype.
Param
In/out
Meaning
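tensorDesc
input/output
Handle to a previously created tensor descriptor.
format
input
Type of format.
dataType
input
Data type.
n
input
Number of images.
c
input
Number of feature maps per image.
h
input
Height of each feature map.
w
input
Width of each feature map.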
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.9. cudnnSetTensor4dDescriptorEx
cudnnStatus_t
cudnnSetTensor4dDescriptorEx( cudnnTensorDescriptor_t tensorDesc,
cudnnDataType_t dataType,
int n,
int c,
int h,
int w,
int nStride,
int cStride,
int hStride,
int wStride );
This function initializes a previously created generic Tensor descriptor object into a
4D tensor, similarly to cudnnSetTensor4dDescriptor but with the strides explicitly
passed as parameters. This can be used to lay out the 4D tensor in any order or simply to
define gaps between dimensions.
At present, some cuDNN routines have limited support for strides; those routines will
return CUDNN_STATUS_NOT_SUPPORTED if a Tensor4D object with an unsupported
stride is used. cudnnTransformTensor can be used to convert the data to a
supported layout.
The total size of a tensor including the potential padding between dimensions is
limited to 2 Giga-elements of type datatype.
Param
In/out
Meaning
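tensorDesc
input/output
Handle to a previously created tensor descriptor.
dataType
input
Data type.
n
input
Number of images.
c
input
Number of feature maps per image.
h
input
Height of each feature map.
w
input
Width of each feature map.
nStride
input
Stride between two consecutive images.
cStride
input
Stride between two consecutive feature maps.
hStride
input
Stride between two consecutive rows.
wStride
input
Stride between two consecutive columns.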
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
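As an illustration of laying out a 4D tensor in another order or with gaps, the C sketch below computes stride sets that could then be passed as the nStride, cStride, hStride and wStride arguments of cudnnSetTensor4dDescriptorEx. The helpers are hypothetical and do not call cuDNN; whether a given routine accepts the resulting strides is subject to the limitations described above.

```c
/* Hypothetical helpers, not cuDNN APIs. Strides are returned in the
 * n, c, h, w order in which cudnnSetTensor4dDescriptorEx takes them. */

/* NHWC layout: channels innermost, images outermost, no gaps. */
static void nhwc_strides(int c, int h, int w,
                         int *nStride, int *cStride, int *hStride, int *wStride)
{
    *cStride = 1;
    *wStride = c;
    *hStride = w * c;
    *nStride = h * w * c;
}

/* NCHW layout with every row padded to rowPitch >= w elements. */
static void nchw_padded_strides(int c, int h, int rowPitch,
                                int *nStride, int *cStride, int *hStride, int *wStride)
{
    *wStride = 1;
    *hStride = rowPitch;          /* gap of rowPitch - w elements after each row */
    *cStride = h * rowPitch;
    *nStride = c * h * rowPitch;
}

/* Element offset of (in, ic, ih, iw) for an arbitrary stride set. */
static long offset4d(int in, int ic, int ih, int iw,
                     int nStride, int cStride, int hStride, int wStride)
{
    return (long)in * nStride + (long)ic * cStride
         + (long)ih * hStride + (long)iw * wStride;
}
```

For a 2 x 3 x 4 x 5 tensor stored as NHWC, the strides are nStride = 60, cStride = 1, hStride = 15, wStride = 3, and the last element (1, 2, 3, 4) lands at offset 119.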
4.10. cudnnGetTensor4dDescriptor
cudnnStatus_t
cudnnGetTensor4dDescriptor( cudnnTensorDescriptor_t tensorDesc,
cudnnDataType_t *dataType,
int *n,
int *c,
int *h,
int *w,
int *nStride,
int *cStride,
int *hStride,
int *wStride )
This function queries the parameters of the previously initialized Tensor4D descriptor
object.
Param
In/out
Meaning
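tensorDesc
input
Handle to a previously initialized tensor descriptor.
dataType
output
Data type.
n
output
Number of images.
c
output
Number of feature maps per image.
h
output
Height of each feature map.
w
output
Width of each feature map.
nStride
output
Stride between two consecutive images.
cStride
output
Stride between two consecutive feature maps.
hStride
output
Stride between two consecutive rows.
wStride
output
Stride between two consecutive columns.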
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.11. cudnnSetTensorNdDescriptor
cudnnStatus_t
cudnnSetTensorNdDescriptor( cudnnTensorDescriptor_t tensorDesc,
cudnnDataType_t dataType,
int nbDims,
int dimA[],
int strideA[])
Param
In/out
Meaning
tensorDesc
input/output
Handle to a previously created tensor descriptor.
dataType
input
Data type.
nbDims
input
Number of dimensions of the tensor.
dimA
input
Array of dimension nbDims that contains the size of the tensor for every
dimension.
strideA
input
Array of dimension nbDims that contains the stride of the tensor for every
dimension.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.12. cudnnGetTensorNdDescriptor
cudnnStatus_t
cudnnGetTensorNdDescriptor( const cudnnTensorDescriptor_t tensorDesc,
int nbDimsRequested,
cudnnDataType_t *dataType,
int *nbDims,
int dimA[],
int strideA[])
This function retrieves values stored in a previously initialized Tensor descriptor object.
Param
In/out
Meaning
tensorDesc
input
nbDimsRequested
input
dataType
output
Data type.
nbDims
output
dimA
output
strideA
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.13. cudnnDestroyTensorDescriptor
cudnnStatus_t cudnnDestroyTensorDescriptor(cudnnTensorDescriptor_t tensorDesc)
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.14. cudnnTransformTensor
cudnnStatus_t
cudnnTransformTensor( cudnnHandle_t handle,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *y )
This function copies the scaled data from one tensor to another tensor with a different
layout. Those descriptors need to have the same dimensions but not necessarily the
same strides. The input and output tensors must not overlap in any way (i.e., tensors
cannot be transformed in place). This function can be used to convert a tensor with an
unsupported format to a supported one.
Param
In/out
Meaning
handle
input
alpha, beta
input
Pointers to scaling factors (in host memory) used to blend the source
value with prior value in the destination tensor as follows: dstValue =
alpha[0]*srcValue + beta[0]*priorDstValue. Please refer to the description of the
alpha and beta scaling parameters in Chapter 2 for additional details.
xDesc
input
x
input
yDesc
input
y
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
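A CPU reference helps pin down what the routine computes: every element is read through the source strides and written through the destination strides, with the alpha/beta blend applied. The C sketch below is hypothetical (transform4d is not a cuDNN function), handles only 4-D float tensors with strides given in n, c, h, w order, and assumes a zero beta[0] means the prior destination is not read.

```c
/* Hypothetical CPU reference, not a cuDNN API: same dimensions, independent
 * strides, y = alpha[0] * x + beta[0] * y_prior. x and y must not overlap. */
static void transform4d(const float *alpha, const float *x, const int xs[4],
                        const float *beta, float *y, const int ys[4],
                        const int dims[4])
{
    for (int n = 0; n < dims[0]; ++n)
        for (int c = 0; c < dims[1]; ++c)
            for (int h = 0; h < dims[2]; ++h)
                for (int w = 0; w < dims[3]; ++w) {
                    long xo = (long)n * xs[0] + (long)c * xs[1]
                            + (long)h * xs[2] + (long)w * xs[3];
                    long yo = (long)n * ys[0] + (long)c * ys[1]
                            + (long)h * ys[2] + (long)w * ys[3];
                    float prior = beta[0] == 0.0f ? 0.0f : beta[0] * y[yo];
                    y[yo] = alpha[0] * x[xo] + prior;
                }
}
```

Converting a 1 x 2 x 2 x 2 tensor from NCHW (strides {8, 4, 2, 1}) to NHWC (strides {8, 1, 4, 2}) turns x = {0, 1, ..., 7} into y = {0, 4, 1, 5, 2, 6, 3, 7}.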
4.15. cudnnAddTensor
cudnnStatus_t
cudnnAddTensor( cudnnHandle_t handle,
const void *alpha,
const cudnnTensorDescriptor_t aDesc,
const void *A,
const void *beta,
const cudnnTensorDescriptor_t cDesc,
void *C )
This function adds the scaled values of a bias tensor to another tensor. Each dimension
of the bias tensor A must match the corresponding dimension of the destination tensor
C or must be equal to 1. In the latter case, the same value from the bias tensor for those
dimensions will be used to blend into the C tensor.
Up to dimension 5, all tensor formats are supported. Beyond those dimensions, this
routine is not supported.
Param
In/out
Meaning
handle
input
alpha, beta
input
Pointers to scaling factors (in host memory) used to blend the source
value with prior value in the destination tensor as follows: dstValue =
alpha[0]*srcValue + beta[0]*priorDstValue. Please refer to this section for
additional details.
aDesc
input
A
input
cDesc
input
C
input/output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
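A minimal CPU sketch of the broadcasting rule, assuming packed 4-D NCHW float tensors. add_tensor_ref is a hypothetical helper, not part of the cuDNN API. A size-1 dimension of A is given stride 0, so the same bias value is reused along that dimension of C.

```c
#include <assert.h>

/* Hypothetical CPU reference: C = alpha * A + beta * C for packed NCHW
   tensors, where each dimension of A equals the matching dimension of C
   or is 1. Returns 0 on success, -1 if the shapes are incompatible. */
int add_tensor_ref(float alpha, const int adim[4], const float *A,
                   float beta, const int cdim[4], float *C) {
    for (int d = 0; d < 4; ++d)
        if (adim[d] != cdim[d] && adim[d] != 1)
            return -1;

    /* Packed strides of A; broadcast (size-1) dimensions get stride 0. */
    int as[4];
    int stride = 1;
    for (int d = 3; d >= 0; --d) {
        as[d] = (adim[d] == 1) ? 0 : stride;
        stride *= adim[d];
    }

    int ci = 0; /* linear index into packed C */
    for (int n = 0; n < cdim[0]; ++n)
        for (int c = 0; c < cdim[1]; ++c)
            for (int h = 0; h < cdim[2]; ++h)
                for (int w = 0; w < cdim[3]; ++w, ++ci) {
                    int ai = n * as[0] + c * as[1] + h * as[2] + w * as[3];
                    C[ci] = alpha * A[ai] + beta * C[ci];
                }
    return 0;
}
```

With alpha = beta = 1 and a 1xCx1x1 bias, this is the usual per-channel bias addition after a convolution.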
4.16.cudnnOpTensor
cudnnStatus_t
cudnnOpTensor( cudnnHandle_t handle,
const cudnnOpTensorDescriptor_t opTensorDesc,
const void *alpha1,
const cudnnTensorDescriptor_t aDesc,
const void *A,
const void *alpha2,
const cudnnTensorDescriptor_t bDesc,
const void *B,
const void *beta,
const cudnnTensorDescriptor_t cDesc,
void *C )
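This function implements the equation C = op(alpha1[0]*A, alpha2[0]*B) + beta[0]*C, where
the operation op is specified by opTensorDesc.
If the input tensor B is the same tensor as the destination tensor C, then the input tensor
A must also be the same tensor as the destination tensor C.
All tensor formats are supported for tensors with up to 5 dimensions; this routine does not
support tensors with more dimensions.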
Param
In/out
Meaning
handle
input
opTensorDesc input
alpha1,
alpha2, beta
input
Pointers to scaling factors (in host memory) used to blend the source value
with prior value in the destination tensor as indicated by the above op
equation. Please refer to this section for additional details.
aDesc,
bDesc, cDesc
input
A, B
input
C
input/output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
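The following CPU sketch shows the element-wise semantics, assuming the op equation has the form C = op(alpha1[0]*A, alpha2[0]*B) + beta[0]*C and that A, B and C are same-shaped packed float tensors. The OpKind enum and op_tensor_ref function are hypothetical stand-ins for the operation selected through opTensorDesc; broadcasting of B is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation selector; not a cuDNN type. */
typedef enum { OP_ADD, OP_MUL, OP_MIN, OP_MAX } OpKind;

static float apply_op(OpKind op, float a, float b) {
    switch (op) {
    case OP_ADD: return a + b;
    case OP_MUL: return a * b;
    case OP_MIN: return a < b ? a : b;
    case OP_MAX: return a > b ? a : b;
    }
    return 0.0f;
}

/* C[i] = op(alpha1 * A[i], alpha2 * B[i]) + beta * C[i] for count elements. */
void op_tensor_ref(OpKind op, float alpha1, const float *A,
                   float alpha2, const float *B, float beta,
                   float *C, size_t count) {
    for (size_t i = 0; i < count; ++i)
        C[i] = apply_op(op, alpha1 * A[i], alpha2 * B[i]) + beta * C[i];
}
```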
4.17.cudnnSetTensor
cudnnStatus_t cudnnSetTensor( cudnnHandle_t handle,
const cudnnTensorDescriptor_t yDesc,
void *y,
const void *valuePtr );
Param
In/out
Meaning
handle
input
yDesc
input
y
input/output
valuePtr
input
Pointer in host memory to a single value. All elements of the y tensor will be set to
valuePtr[0]. The data type of valuePtr[0] must match the data type of tensor y.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
4.18.cudnnScaleTensor
cudnnStatus_t cudnnScaleTensor( cudnnHandle_t handle,
const cudnnTensorDescriptor_t yDesc,
void *y,
const void *alpha );
Param
In/out
Meaning
handle
input
yDesc
input
y
input/output
alpha
input
Pointer in Host memory to a single value that all elements of the tensor
will be scaled with. Please refer to this section for additional details.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
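Because tensors are described by strides, a tensor descriptor can address a subregion of a larger buffer, and scaling then affects only the addressed elements. The CPU sketch below illustrates this; scale_tensor_ref is a hypothetical helper, assuming float data and explicit 4-D dimensions and strides.

```c
#include <assert.h>

/* Hypothetical CPU sketch: y = alpha * y over a 4-D strided view.
   Elements of the buffer that the view does not address are untouched. */
void scale_tensor_ref(const int dimA[4], const int strideA[4],
                      float *y, float alpha) {
    for (int n = 0; n < dimA[0]; ++n)
        for (int c = 0; c < dimA[1]; ++c)
            for (int h = 0; h < dimA[2]; ++h)
                for (int w = 0; w < dimA[3]; ++w)
                    y[n * strideA[0] + c * strideA[1] +
                      h * strideA[2] + w * strideA[3]] *= alpha;
}
```

For example, a 1x1x2x2 view with strides {8, 8, 4, 1} over a 2x4 buffer scales only the left half of each row.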
4.19.cudnnCreateFilterDescriptor
cudnnStatus_t cudnnCreateFilterDescriptor(cudnnFilterDescriptor_t *filterDesc)
This function creates a filter descriptor object by allocating the memory needed to hold
its opaque structure.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.20.cudnnSetFilter4dDescriptor
cudnnStatus_t
cudnnSetFilter4dDescriptor( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t dataType,
cudnnTensorFormat_t format,
int k,
int c,
int h,
int w )
This function initializes a previously created filter descriptor object into a 4D filter.
The filter layout must be contiguous in memory.
Tensor format CUDNN_TENSOR_NHWC has limited support in
cudnnConvolutionForward, cudnnConvolutionBackwardData and
cudnnConvolutionBackwardFilter; please refer to each function's documentation for
more information.
Param
In/out
Meaning
filterDesc
input/
output
dataType
input
Data type.
format
input
Type of format.
k
input
c
input
h
input
w
input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
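Because the filter must be contiguous, the format argument only determines the order in which the k, c, h and w indices are laid out in memory. The sketch below computes the element offset for the NCHW and NHWC orderings mentioned in this section; filter_offset and the FilterLayout enum are hypothetical illustrations, not cuDNN functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout selector; not a cuDNN type. */
typedef enum { LAYOUT_NCHW, LAYOUT_NHWC } FilterLayout;

/* Offset (in elements) of filter element (k, c, h, w) in a contiguous
   filter with C input feature maps and an H x W window. */
size_t filter_offset(FilterLayout layout, int C, int H, int W,
                     int k, int c, int h, int w) {
    if (layout == LAYOUT_NCHW)
        return (((size_t)k * C + c) * H + h) * W + w;
    /* NHWC: k outermost, channel innermost. */
    return (((size_t)k * H + h) * W + w) * C + c;
}
```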
4.21.cudnnGetFilter4dDescriptor
cudnnStatus_t
cudnnGetFilter4dDescriptor( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t *dataType,
cudnnTensorFormat_t *format,
int *k,
int *c,
int *h,
int *w )
This function queries the parameters of the previously initialized filter descriptor object.
Param
In/out
Meaning
filterDesc
input
dataType
output
Data type.
format
output
Type of format.
k
output
c
output
h
output
w
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.22.cudnnSetFilter4dDescriptor_v3
cudnnStatus_t
cudnnSetFilter4dDescriptor_v3( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t dataType,
int k,
int c,
int h,
int w )
This function initializes a previously created filter descriptor object into a 4D filter.
The filter layout must be contiguous in memory. When this routine is used to set up a filter
descriptor, the filter format is set to CUDNN_TENSOR_NCHW.
This routine is deprecated; cudnnSetFilter4dDescriptor should be used instead.
Param
In/out
Meaning
filterDesc
input/
output
dataType
input
Data type.
k
input
c
input
h
input
w
input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.23.cudnnGetFilter4dDescriptor_v3
cudnnStatus_t
cudnnGetFilter4dDescriptor_v3( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t *dataType,
int *k,
int *c,
int *h,
int *w )
This function queries the parameters of the previously initialized filter descriptor object.
This routine is deprecated; cudnnGetFilter4dDescriptor should be used instead.
Param
In/out
Meaning
filterDesc
input
dataType
output
Data type.
k
output
c
output
h
output
w
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.24.cudnnSetFilter4dDescriptor_v4
cudnnStatus_t
cudnnSetFilter4dDescriptor_v4( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t dataType,
cudnnTensorFormat_t format,
int k,
int c,
int h,
int w )
4.25.cudnnGetFilter4dDescriptor_v4
cudnnStatus_t
cudnnGetFilter4dDescriptor_v4( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t *dataType,
cudnnTensorFormat_t *format,
int *k,
int *c,
int *h,
int *w )
4.26.cudnnSetFilterNdDescriptor
cudnnStatus_t
cudnnSetFilterNdDescriptor( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t dataType,
cudnnTensorFormat_t format,
int nbDims,
int filterDimA[])
This function initializes a previously created filter descriptor object. The filter layout
must be contiguous in memory.
Tensor format CUDNN_TENSOR_NHWC has limited support in
cudnnConvolutionForward, cudnnConvolutionBackwardData and
cudnnConvolutionBackwardFilter; please refer to each function's documentation for
more information.
Param
In/out
Meaning
filterDesc
input/
output
dataType
input
Data type.
format
input
Type of format.
nbDims
input
filterDimA
input
Array of dimension nbDims containing the size of the filter for each
dimension.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
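For a contiguous n-D filter, the stride of each dimension follows directly from filterDimA. The sketch below assumes the dimensions are listed outermost first, as in the NCHW ordering, and derives fully packed strides; packed_strides is a hypothetical helper, not part of cuDNN.

```c
#include <assert.h>

/* Fills strideA with fully packed strides for an nbDims-dimensional
   filter whose sizes are given outermost-first in filterDimA.
   Returns the total number of filter elements. */
long packed_strides(int nbDims, const int filterDimA[], long strideA[]) {
    long stride = 1;
    for (int i = nbDims - 1; i >= 0; --i) {
        strideA[i] = stride;      /* innermost dimension has stride 1 */
        stride *= filterDimA[i];
    }
    return stride;
}
```

The returned element count, multiplied by the size of the filter data type, gives the number of bytes a contiguous filter occupies.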
4.27.cudnnGetFilterNdDescriptor
cudnnStatus_t
cudnnGetFilterNdDescriptor( const cudnnFilterDescriptor_t wDesc,
int nbDimsRequested,
cudnnDataType_t *dataType,
cudnnTensorFormat_t *format,
int *nbDims,
int filterDimA[])
Param
In/out
Meaning
wDesc
input
nbDimsRequested
input
dataType
output
Data type.
format
output
Type of format.
nbDims
output
filterDimA
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.28.cudnnSetFilterNdDescriptor_v3
cudnnStatus_t
cudnnSetFilterNdDescriptor_v3( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t dataType,
int nbDims,
int filterDimA[])
This function initializes a previously created filter descriptor object. The filter layout
must be contiguous in memory. When this routine is used to set up a filter descriptor, the
filter format is set to CUDNN_TENSOR_NCHW.
This routine is deprecated; cudnnSetFilterNdDescriptor should be used instead.
Param
In/out
Meaning
filterDesc
input/
output
dataType
input
Data type.
nbDims
input
filterDimA
input
Array of dimension nbDims containing the size of the filter for each
dimension.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.29.cudnnGetFilterNdDescriptor_v3
cudnnStatus_t
cudnnGetFilterNdDescriptor_v3( const cudnnFilterDescriptor_t wDesc,
int nbDimsRequested,
cudnnDataType_t *dataType,
int *nbDims,
int filterDimA[])
Param
In/out
Meaning
wDesc
input
nbDimsRequested
input
dataType
output
Data type.
nbDims
output
filterDimA
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.30.cudnnSetFilterNdDescriptor_v4
cudnnStatus_t
cudnnSetFilterNdDescriptor_v4( cudnnFilterDescriptor_t filterDesc,
cudnnDataType_t dataType,
cudnnTensorFormat_t format,
int nbDims,
int filterDimA[])
4.31.cudnnGetFilterNdDescriptor_v4
cudnnStatus_t
cudnnGetFilterNdDescriptor_v4( const cudnnFilterDescriptor_t wDesc,
int nbDimsRequested,
cudnnDataType_t *dataType,
cudnnTensorFormat_t *format,
int *nbDims,
int filterDimA[])
4.32.cudnnDestroyFilterDescriptor
cudnnStatus_t cudnnDestroyFilterDescriptor(cudnnFilterDescriptor_t filterDesc)
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.33.cudnnCreateConvolutionDescriptor
cudnnStatus_t cudnnCreateConvolutionDescriptor(cudnnConvolutionDescriptor_t
*convDesc)
This function creates a convolution descriptor object by allocating the memory needed to
hold its opaque structure.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.34.cudnnSetConvolution2dDescriptor
cudnnStatus_t
cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
int pad_h,
int pad_w,
int u,
int v,
int upscalex,
int upscaley,
cudnnConvolutionMode_t mode )
Param
In/out
Meaning
convDesc
input/
output
pad_h
input
pad_w
input
u
input
v
input
upscalex
input
upscaley
input
mode
input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.35.cudnnGetConvolution2dDescriptor
cudnnStatus_t
cudnnGetConvolution2dDescriptor( const cudnnConvolutionDescriptor_t convDesc,
int* pad_h,
int* pad_w,
int* u,
int* v,
int* upscalex,
int* upscaley,
cudnnConvolutionMode_t *mode )
Param
In/out
Meaning
convDesc
input
pad_h
output
pad_w
output
u
output
v
output
upscalex
output
upscaley
output
mode
output
Convolution mode.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.36.cudnnGetConvolution2dForwardOutputDim
cudnnStatus_t
cudnnGetConvolution2dForwardOutputDim( const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t inputTensorDesc,
const cudnnFilterDescriptor_t filterDesc,
int *n,
int *c,
int *h,
int *w )
Param
In/out
Meaning
convDesc
input
inputTensorDesc
input
filterDesc
input
n
output
c
output
h
output
w
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_SUCCESS
4.37.cudnnSetConvolutionNdDescriptor
cudnnStatus_t
cudnnSetConvolutionNdDescriptor( cudnnConvolutionDescriptor_t convDesc,
int arrayLength,
int padA[],
int filterStrideA[],
int upscaleA[],
cudnnConvolutionMode_t mode,
cudnnDataType_t dataType )
This function initializes a previously created generic convolution descriptor object into
an n-D correlation. The same convolution descriptor can be reused in the backward path
provided it corresponds to the same layer. The convolution computation will be done in the
specified dataType, which can potentially differ from the data type of the input/output
tensors.
Param
In/out
Meaning
convDesc
input/
output
arrayLength
input
padA
input
filterStrideA
input
upscaleA
input
mode
input
dataType
input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.38.cudnnGetConvolutionNdDescriptor
cudnnStatus_t
cudnnGetConvolutionNdDescriptor( const cudnnConvolutionDescriptor_t convDesc,
int arrayLengthRequested,
int *arrayLength,
int padA[],
int filterStrideA[],
int upscaleA[],
cudnnConvolutionMode_t *mode,
cudnnDataType_t *dataType )
Param
In/out
Meaning
convDesc
input
arrayLengthRequested
input
arrayLength
output
padA
output
filterStrideA
output
upscaleA
output
mode
output
dataType
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.39.cudnnGetConvolutionNdForwardOutputDim
cudnnStatus_t
cudnnGetConvolutionNdForwardOutputDim( const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t inputTensorDesc,
const cudnnFilterDescriptor_t filterDesc,
int nbDims,
int tensorOuputDimA[] )
This function returns the dimensions of the resulting n-D tensor of a (nbDims-2)-D
convolution, given the convolution descriptor, the input tensor descriptor and the filter
descriptor. This function can help to set up the output tensor and allocate the proper
amount of memory prior to launching the actual convolution.
Each dimension of the (nbDims-2)-D images of the output tensor is computed as
follows:
outputDim = 1 + (inputDim + 2*pad - filterDim)/convolutionStride;
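The formula can be checked with a small C sketch. conv_output_dim applies it to one spatial dimension using C integer division, and conv_nd_output_dims assembles a full output shape, assuming the output keeps the batch size N of the input and has as many feature maps as the filter's first dimension (K). Both helpers are hypothetical and ignore the upscale parameters.

```c
#include <assert.h>

/* outputDim = 1 + (inputDim + 2*pad - filterDim) / convolutionStride,
   using C integer division. */
int conv_output_dim(int inputDim, int pad, int filterDim, int stride) {
    return 1 + (inputDim + 2 * pad - filterDim) / stride;
}

/* Output shape for an nbDims tensor: [N, K, spatial dims...].
   padA and strideA hold nbDims-2 entries, one per spatial dimension. */
void conv_nd_output_dims(int nbDims, const int inDimA[], const int filterDimA[],
                         const int padA[], const int strideA[], int outDimA[]) {
    outDimA[0] = inDimA[0];     /* batch size is unchanged          */
    outDimA[1] = filterDimA[0]; /* one output map per filter (K)    */
    for (int i = 2; i < nbDims; ++i)
        outDimA[i] = conv_output_dim(inDimA[i], padA[i - 2],
                                     filterDimA[i], strideA[i - 2]);
}
```

For instance, a 224x224 image with a 7x7 filter, padding 3 and stride 2 gives 1 + 223/2 = 112 in each spatial dimension.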
Param
In/out
Meaning
convDesc
input
inputTensorDesc
input
filterDesc
input
nbDims
input
tensorOuputDimA
output
Array of dimension nbDims that contains, on exit of this routine, the sizes
of the output tensor.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_SUCCESS
4.40.cudnnDestroyConvolutionDescriptor
cudnnStatus_t cudnnDestroyConvolutionDescriptor(cudnnConvolutionDescriptor_t
convDesc)
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.41.cudnnFindConvolutionForwardAlgorithm
cudnnStatus_t
cudnnFindConvolutionForwardAlgorithm( cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
const cudnnFilterDescriptor_t wDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t yDesc,
const int requestedAlgoCount,
int *returnedAlgoCount,
cudnnConvolutionFwdAlgoPerf_t *perfResults )
It is recommended to run this function prior to allocating layer data; doing otherwise
may needlessly inhibit some algorithm options due to resource usage.
Param
In/out
Meaning
handle
input
xDesc
input
wDesc
input
convDesc
input
yDesc
input
requestedAlgoCount
input
returnedAlgoCount
output
perfResults
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
handle, xDesc, wDesc or yDesc is not allocated properly.
xDesc, wDesc or yDesc has fewer than 1 dimension.
Either returnedAlgoCount or perfResults is nil.
requestedAlgoCount is less than 1.
CUDNN_STATUS_ALLOC_FAILED
CUDNN_STATUS_INTERNAL_ERROR
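Since the returned metrics are sorted by compute time, a caller can walk perfResults front to back and keep the first entry that succeeded and fits a workspace budget. In the sketch below, the AlgoPerf struct is a hypothetical stand-in modeled on the algo, status, time and memory fields of cudnnConvolutionFwdAlgoPerf_t (check cudnn.h for the exact definition), and a status of 0 is taken to mean success, matching the value of CUDNN_STATUS_SUCCESS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one performance record. */
typedef struct {
    int algo;      /* algorithm identifier                  */
    int status;    /* 0 means the trial succeeded           */
    float time;    /* measured time in milliseconds         */
    size_t memory; /* workspace bytes the algorithm needs   */
} AlgoPerf;

/* perf[] is assumed sorted by ascending time. Returns the index of the
   fastest successful entry whose workspace fits in memLimit, or -1. */
int pick_algo(const AlgoPerf *perf, int returnedAlgoCount, size_t memLimit) {
    for (int i = 0; i < returnedAlgoCount; ++i)
        if (perf[i].status == 0 && perf[i].memory <= memLimit)
            return i;
    return -1;
}
```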
4.42.cudnnFindConvolutionForwardAlgorithmEx
cudnnStatus_t
cudnnFindConvolutionForwardAlgorithmEx( cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const cudnnFilterDescriptor_t wDesc,
const void *w,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t yDesc,
void *y,
const int requestedAlgoCount,
int *returnedAlgoCount,
cudnnConvolutionFwdAlgoPerf_t *perfResults,
void *workSpace,
size_t workSpaceSizeInBytes )
The performance metrics written to perfResults are sorted so that the first element has the
lowest compute time.
This function is host blocking.
Param
In/out
Meaning
handle
input
xDesc
input
x
input
wDesc
input
w
input
convDesc
input
yDesc
input
y
input/output
requestedAlgoCount
input
returnedAlgoCount
output
perfResults
output
workSpace
input
workSpaceSizeInBytes
input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
handle, xDesc, wDesc or yDesc is not allocated properly.
xDesc, wDesc or yDesc has fewer than 1 dimension.
x, w or y is nil.
Either returnedAlgoCount or perfResults is nil.
requestedAlgoCount is less than 1.
CUDNN_STATUS_INTERNAL_ERROR
4.43.cudnnGetConvolutionForwardAlgorithm
cudnnStatus_t
cudnnGetConvolutionForwardAlgorithm( cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
const cudnnFilterDescriptor_t wDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t yDesc,
cudnnConvolutionFwdPreference_t preference,
size_t memoryLimitInBytes,
cudnnConvolutionFwdAlgo_t *algo )
This function serves as a heuristic for obtaining the best suited algorithm for
cudnnConvolutionForward for the given layer specifications. Based on the input
preference, this function will either return the fastest algorithm or the fastest algorithm
within a given memory limit. For an exhaustive search for the fastest algorithm, please
use cudnnFindConvolutionForwardAlgorithm.
Param
In/out
Meaning
handle
input
xDesc
input
wDesc
input
convDesc
input
yDesc
input
preference
input
memoryLimitInBytes
input
algo
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.44.cudnnGetConvolutionForwardWorkspaceSize
cudnnStatus_t
cudnnGetConvolutionForwardWorkspaceSize( cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
const cudnnFilterDescriptor_t wDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t yDesc,
cudnnConvolutionFwdAlgo_t algo,
size_t *sizeInBytes )
This function returns the amount of GPU memory workspace the user needs
to allocate to be able to call cudnnConvolutionForward with the specified
algorithm. The workspace allocated will then be passed to the routine
cudnnConvolutionForward. The specified algorithm can be the result of the call to
cudnnGetConvolutionForwardAlgorithm or can be chosen arbitrarily by the user.
Note that not every algorithm is available for every configuration of the input tensor
and/or every configuration of the convolution descriptor.
Param
In/out
Meaning
handle
input
xDesc
input
wDesc
input
convDesc
input
yDesc
input
algo
input
sizeInBytes
output
Amount of GPU memory needed as workspace to be able to execute a forward
convolution with the specified algo.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.45.cudnnConvolutionForward
cudnnStatus_t
cudnnConvolutionForward( cudnnHandle_t handle,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const cudnnFilterDescriptor_t wDesc,
const void *w,
const cudnnConvolutionDescriptor_t convDesc,
cudnnConvolutionFwdAlgo_t algo,
void *workSpace,
size_t workSpaceSizeInBytes,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *y )
Param
In/out
Meaning
handle
input
alpha, beta
input
Pointers to scaling factors (in host memory) used to blend the computation
result with prior value in the output layer as follows: dstValue =
alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for
additional details.
xDesc
input
x
input
Data pointer to GPU memory associated with the tensor descriptor xDesc.
wDesc
input
w
input
Data pointer to GPU memory associated with the filter descriptor wDesc.
convDesc
input
algo
input
workSpace
input
workSpaceSizeInBytes
input
yDesc
input
y
input/output
Data pointer to GPU memory associated with the tensor descriptor yDesc
that carries the result of the convolution.
This function supports only four specific combinations of data types for xDesc, wDesc,
convDesc and yDesc. See the following for an exhaustive list of these configurations.
Data Type Configuration   xDesc's, wDesc's and yDesc's Data Type   convDesc's Data Type
TRUE_HALF_CONFIG          CUDNN_DATA_HALF                          CUDNN_DATA_HALF
PSEUDO_HALF_CONFIG        CUDNN_DATA_HALF                          CUDNN_DATA_FLOAT
FLOAT_CONFIG              CUDNN_DATA_FLOAT                         CUDNN_DATA_FLOAT
DOUBLE_CONFIG             CUDNN_DATA_DOUBLE                        CUDNN_DATA_DOUBLE
Tensor format CUDNN_TENSOR_NHWC is supported only when the following are true:
algo is CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
xDesc and yDesc are NHWC HWC-packed
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
xDesc, wDesc or yDesc has a number of dimensions that is not 4 or 5
The chosen algo does not support the parameters
provided; see above for exhaustive list of parameter
support for each algo
CUDNN_STATUS_MAPPING_ERROR
CUDNN_STATUS_EXECUTION_FAILED
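To make the semantics concrete, here is a naive CPU reference for the 2-D case, assuming packed NCHW input and output, a packed KCRS filter (K output maps, C input maps, an R x S window), float data and no upscaling. conv2d_fwd_ref is hypothetical and intended for validating results on small inputs, not as a model of any cuDNN algo. The flip flag is meant to mirror the difference between the CUDNN_CONVOLUTION mode (filter flipped) and the CUDNN_CROSS_CORRELATION mode (filter not flipped).

```c
#include <assert.h>

/* y = alpha * conv(x, w) + beta * y, with zero padding and strides.
   When beta is 0 the prior value of y is not read. */
void conv2d_fwd_ref(int N, int C, int H, int W, const float *x,
                    int K, int R, int S, const float *w,
                    int padH, int padW, int strideH, int strideW, int flip,
                    float alpha, float beta, float *y) {
    int P = 1 + (H + 2 * padH - R) / strideH; /* output height */
    int Q = 1 + (W + 2 * padW - S) / strideW; /* output width  */
    for (int n = 0; n < N; ++n)
        for (int k = 0; k < K; ++k)
            for (int p = 0; p < P; ++p)
                for (int q = 0; q < Q; ++q) {
                    float acc = 0.0f;
                    for (int c = 0; c < C; ++c)
                        for (int r = 0; r < R; ++r)
                            for (int s = 0; s < S; ++s) {
                                int ih = p * strideH - padH + r;
                                int iw = q * strideW - padW + s;
                                if (ih < 0 || ih >= H || iw < 0 || iw >= W)
                                    continue; /* padded zero */
                                int fr = flip ? R - 1 - r : r;
                                int fs = flip ? S - 1 - s : s;
                                acc += x[((n * C + c) * H + ih) * W + iw] *
                                       w[((k * C + c) * R + fr) * S + fs];
                            }
                    float *out = &y[((n * K + k) * P + p) * Q + q];
                    *out = (beta == 0.0f) ? alpha * acc
                                          : alpha * acc + beta * *out;
                }
}
```

With a 2x2 filter whose only nonzero tap is at (0, 0), cross-correlation copies the top-left 2x2 block of a 3x3 input, while the flipped (convolution) mode copies the bottom-right block.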
4.46.cudnnConvolutionBackwardBias
cudnnStatus_t
cudnnConvolutionBackwardBias( cudnnHandle_t handle,
const void *alpha,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const void *beta,
const cudnnTensorDescriptor_t dbDesc,
void *db )
This function computes the convolution function gradient with respect to the bias, which
is the sum of every element belonging to the same feature map across all of the images of
the input tensor. Therefore, the number of elements produced is equal to the number of
feature maps of the input tensor.
Param
In/out
Meaning
handle
input
alpha, beta
input
Pointers to scaling factors (in host memory) used to blend the computation
result with prior value in the output layer as follows: dstValue =
alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for
additional details.
dyDesc
input
dy
input
dbDesc
input
db
output
Data pointer to GPU memory associated with the output tensor descriptor
dbDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
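The bias gradient reduces dy over every dimension except the feature-map dimension, producing one value per feature map. A CPU sketch, assuming packed NCHW float data; conv_bwd_bias_ref is a hypothetical helper, not the cuDNN implementation:

```c
#include <assert.h>

/* db[k] = alpha * sum over n, h, w of dy[n][k][h][w] + beta * db[k].
   When beta is 0 the prior value of db is not read. */
void conv_bwd_bias_ref(int N, int K, int H, int W, const float *dy,
                       float alpha, float beta, float *db) {
    for (int k = 0; k < K; ++k) {
        float sum = 0.0f;
        for (int n = 0; n < N; ++n)
            for (int i = 0; i < H * W; ++i)
                sum += dy[(n * K + k) * H * W + i];
        db[k] = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * db[k];
    }
}
```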
4.47.cudnnFindConvolutionBackwardFilterAlgorithm
cudnnStatus_t
cudnnFindConvolutionBackwardFilterAlgorithm( cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
const cudnnTensorDescriptor_t dyDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnFilterDescriptor_t dwDesc,
const int requestedAlgoCount,
int *returnedAlgoCount,
cudnnConvolutionBwdFilterAlgoPerf_t *perfResults )
The performance metrics written to perfResults are sorted so that the first element has the
lowest compute time.
This function is host blocking.
It is recommended to run this function prior to allocating layer data; doing otherwise
may needlessly inhibit some algorithm options due to resource usage.
Param
In/out
Meaning
handle
input
xDesc
input
dyDesc
input
convDesc
input
dwDesc
input
requestedAlgoCount
input
returnedAlgoCount
output
perfResults
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
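CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
handle, xDesc, dyDesc or dwDesc is not allocated properly.
xDesc, dyDesc or dwDesc has fewer than 1 dimension.
Either returnedAlgoCount or perfResults is nil.
requestedAlgoCount is less than 1.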
CUDNN_STATUS_ALLOC_FAILED
CUDNN_STATUS_INTERNAL_ERROR
4.48.cudnnFindConvolutionBackwardFilterAlgorithmEx
cudnnStatus_t
cudnnFindConvolutionBackwardFilterAlgorithmEx( cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnFilterDescriptor_t dwDesc,
void *dw,
const int requestedAlgoCount,
int *returnedAlgoCount,
cudnnConvolutionBwdFilterAlgoPerf_t *perfResults,
void *workSpace,
size_t workSpaceSizeInBytes )
Param
In/out
Meaning
handle
input
xDesc
input
x
input
Data pointer to GPU memory associated with the tensor descriptor xDesc.
dyDesc
input
dy
input
convDesc
input
dwDesc
input
dw
input/output
Data pointer to GPU memory associated with the filter descriptor dwDesc.
The content of this tensor will be overwritten with arbitrary values.
requestedAlgoCount
input
returnedAlgoCount
output
perfResults
output
workSpace
input
workSpaceSizeInBytes
input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
handle, xDesc, dyDesc or dwDesc is not allocated properly.
xDesc, dyDesc or dwDesc has fewer than 1 dimension.
x, dy or dw is nil.
Either returnedAlgoCount or perfResults is nil.
requestedAlgoCount is less than 1.
CUDNN_STATUS_INTERNAL_ERROR
4.49.cudnnGetConvolutionBackwardFilterAlgorithm
cudnnStatus_t
cudnnGetConvolutionBackwardFilterAlgorithm( cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
const cudnnTensorDescriptor_t dyDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnFilterDescriptor_t dwDesc,
cudnnConvolutionBwdFilterPreference_t preference,
size_t memoryLimitInBytes,
cudnnConvolutionBwdFilterAlgo_t *algo )
This function serves as a heuristic for obtaining the best suited algorithm for
cudnnConvolutionBackwardFilter for the given layer specifications. Based on
the input preference, this function will either return the fastest algorithm or the
fastest algorithm within a given memory limit. For an exhaustive search for the fastest
algorithm, please use cudnnFindConvolutionBackwardFilterAlgorithm.
Param
In/out
Meaning
handle
input
xDesc
input
dyDesc
input
convDesc
input
dwDesc
input
preference
input
memoryLimitInBytes
input
Specifies the maximum amount of GPU memory the user is willing to use
as a workspace. This is currently a placeholder and is not used.
algo
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
The numbers of feature maps of the input tensor and output tensor differ.
The dataType of the two tensor descriptors or of the filter differs.
4.50.cudnnGetConvolutionBackwardFilterWorkspaceSize
cudnnStatus_t
cudnnGetConvolutionBackwardFilterWorkspaceSize( cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
const cudnnTensorDescriptor_t dyDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnFilterDescriptor_t dwDesc,
cudnnConvolutionBwdFilterAlgo_t algo,
size_t *sizeInBytes )
This function returns the amount of GPU memory workspace the user needs
to allocate to be able to call cudnnConvolutionBackwardFilter with the
specified algorithm. The workspace allocated will then be passed to the routine
cudnnConvolutionBackwardFilter. The specified algorithm can be the result of the
call to cudnnGetConvolutionBackwardFilterAlgorithm or can be chosen arbitrarily
by the user. Note that not every algorithm is available for every configuration of the
input tensor and/or every configuration of the convolution descriptor.
Param
In/out
Meaning
handle
input
xDesc
input
dyDesc
input
convDesc
input
dwDesc
input
algo
input
sizeInBytes
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.51.cudnnConvolutionBackwardFilter
cudnnStatus_t
cudnnConvolutionBackwardFilter( cudnnHandle_t handle,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const cudnnConvolutionDescriptor_t convDesc,
cudnnConvolutionBwdFilterAlgo_t algo,
void *workSpace,
size_t workSpaceSizeInBytes,
const void *beta,
const cudnnFilterDescriptor_t dwDesc,
void *dw )
This function computes the convolution gradient with respect to the filter coefficients
using the specified algo, returning results in dw (described by dwDesc). Scaling factors
alpha and beta can be used to scale the computed result and the prior value of dw,
respectively.
Param
In/out
Meaning
handle
input
alpha, beta
input
Pointers to scaling factors (in host memory) used to blend the computation
result with prior value in the output layer as follows: dstValue =
alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for
additional details.
xDesc
input
x
input
Data pointer to GPU memory associated with the tensor descriptor xDesc.
dyDesc
input
dy
input
convDesc
input
algo
input
workSpace
input
workSpaceSizeInBytes
input
dwDesc
input
dw
input/
output
Data pointer to GPU memory associated with the filter gradient descriptor
dwDesc that carries the result.
This function supports only three specific combinations of data types for xDesc,
dyDesc, convDesc and dwDesc. See the following for an exhaustive list of these
configurations.
Data Type Configuration   xDesc's, dyDesc's and dwDesc's Data Type   convDesc's Data Type
PSEUDO_HALF_CONFIG        CUDNN_DATA_HALF                            CUDNN_DATA_FLOAT
FLOAT_CONFIG              CUDNN_DATA_FLOAT                           CUDNN_DATA_FLOAT
DOUBLE_CONFIG             CUDNN_DATA_DOUBLE                          CUDNN_DATA_DOUBLE
Tensor format CUDNN_TENSOR_NHWC is supported only when the following are true:
algo is CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 or CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
xDesc and dyDesc are NHWC HWC-packed
Data type configuration is PSEUDO_HALF_CONFIG or FLOAT_CONFIG
The convolution is 2-dimensional
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
Deterministic: No
xDesc Format Support: All
dyDesc Format Support: NCHW CHW-packed
Data Type Config Support: All except TRUE_HALF_CONFIG
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
Deterministic: Yes
xDesc Format Support: All
dyDesc Format Support: NCHW CHW-packed
Data Type Config Support: All
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT
Deterministic: Yes
xDesc Format Support: NCHW CHW-packed
dyDesc Format Support: NCHW CHW-packed
Data Type Config Support: PSEUDO_HALF_CONFIG, FLOAT_CONFIG
Notes:
Deterministic: No
xDesc Format Support: All
dyDesc Format Support: NCHW CHW-packed
Data Type Config Support: All except TRUE_HALF_CONFIG
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED
Deterministic: Yes
xDesc Format Support: All
dyDesc Format Support: NCHW CHW-packed
Data Type Config Support: All except DOUBLE_CONFIG
Notes:
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
Deterministic: No
xDesc Format Support: All
dyDesc Format Support: NCDHW CDHW-packed
Data Type Config Support: All except TRUE_HALF_CONFIG
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3
Deterministic: No
xDesc Format Support: NCDHW-fully-packed
dyDesc Format Support: NCDHW-fully-packed
Data Type Config Support: All except TRUE_HALF_CONFIG
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM           xDesc and dwDesc have a non-matching number of dimensions.
                                 xDesc has fewer than three dimensions.
                                 xDesc, dyDesc and dwDesc have a non-matching data type.
                                 xDesc and dwDesc have a non-matching number of input feature maps per image.
CUDNN_STATUS_NOT_SUPPORTED       xDesc, dyDesc or dwDesc has a number of dimensions that is not 4 or 5.
                                 The chosen algo does not support the parameters provided; see above for an
                                 exhaustive list of parameter support for each algo.
CUDNN_STATUS_MAPPING_ERROR
CUDNN_STATUS_EXECUTION_FAILED
4.52.cudnnFindConvolutionBackwardDataAlgorithm
cudnnStatus_t
cudnnFindConvolutionBackwardDataAlgorithm( cudnnHandle_t handle,
const cudnnFilterDescriptor_t wDesc,
const cudnnTensorDescriptor_t dyDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t dxDesc,
const int requestedAlgoCount,
int *returnedAlgoCount,
cudnnConvolutionBwdDataAlgoPerf_t *perfResults );
It is recommended to run this function prior to allocating layer data; doing otherwise
may needlessly inhibit some algorithm options due to resource usage.
Param                   In/out        Meaning
handle                  input
wDesc                   input
dyDesc                  input
convDesc                input
dxDesc                  input
requestedAlgoCount      input
returnedAlgoCount       output
perfResults             output
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM           wDesc, dyDesc or dxDesc has fewer than 1 dimension.
                                 Either returnedAlgoCount or perfResults is nil.
                                 requestedAlgoCount is less than 1.
CUDNN_STATUS_ALLOC_FAILED
CUDNN_STATUS_INTERNAL_ERROR
4.53.cudnnFindConvolutionBackwardDataAlgorithmEx
cudnnStatus_t
cudnnFindConvolutionBackwardDataAlgorithmEx( cudnnHandle_t handle,
const cudnnFilterDescriptor_t wDesc,
const void *w,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t dxDesc,
void *dx,
const int requestedAlgoCount,
int *returnedAlgoCount,
cudnnConvolutionBwdDataAlgoPerf_t *perfResults,
void *workSpace,
size_t workSpaceSizeInBytes );
Param                   In/out        Meaning
handle                  input
wDesc                   input
                        input         Data pointer to GPU memory associated with the filter descriptor wDesc.
dyDesc                  input
dy                      input         Data pointer to GPU memory associated with the tensor descriptor dyDesc.
convDesc                input
dxDesc                  input
dx                      input/output  Data pointer to GPU memory associated with the tensor descriptor dxDesc.
                                      The content of this tensor will be overwritten with arbitrary values.
requestedAlgoCount      input
returnedAlgoCount       output
perfResults             output
workSpace               input
workSpaceSizeInBytes    input
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM           wDesc, dyDesc or dxDesc has fewer than 1 dimension.
                                 w, dy or dx is nil.
                                 Either returnedAlgoCount or perfResults is nil.
                                 requestedAlgoCount is less than 1.
CUDNN_STATUS_INTERNAL_ERROR
4.54.cudnnGetConvolutionBackwardDataAlgorithm
cudnnStatus_t
cudnnGetConvolutionBackwardDataAlgorithm( cudnnHandle_t handle,
const cudnnFilterDescriptor_t wDesc,
const cudnnTensorDescriptor_t dyDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t dxDesc,
cudnnConvolutionBwdDataPreference_t preference,
size_t memoryLimitInbytes,
cudnnConvolutionBwdDataAlgo_t *algo )
This function serves as a heuristic for obtaining the best suited algorithm for
cudnnConvolutionBackwardData for the given layer specifications. Based on the
input preference, this function will either return the fastest algorithm or the fastest
algorithm within a given memory limit. For an exhaustive search for the fastest
algorithm, please use cudnnFindConvolutionBackwardDataAlgorithm.
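The two preference modes described above (fastest, or fastest within a memory limit) can be illustrated with a simple selection over candidate algorithms. This is a sketch under stated assumptions, not cuDNN's heuristic: the algo_candidate record and pick_algo function are hypothetical, and the timings and workspace sizes would have to come from measurements or estimates supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate record: an algorithm id, an estimated run time,
   and the workspace it would need. */
typedef struct {
    int    algo;
    float  time_ms;
    size_t workspace_bytes;
} algo_candidate;

/* Return the index of the fastest candidate, or -1 if none qualifies.
   When respect_limit is nonzero, candidates whose workspace exceeds
   memory_limit_bytes are skipped ("fastest within a memory limit"). */
static int pick_algo(const algo_candidate *c, int n,
                     int respect_limit, size_t memory_limit_bytes)
{
    int best = -1;
    for (int i = 0; i < n; ++i) {
        if (respect_limit && c[i].workspace_bytes > memory_limit_bytes)
            continue;
        if (best < 0 || c[i].time_ms < c[best].time_ms)
            best = i;
    }
    return best;
}
```

With candidates timed at 5, 2 and 1 ms that need 0 B, 4 KiB and 1 MiB of workspace, an unconstrained pick returns the 1 ms entry, while an 8 KiB limit returns the 2 ms one.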
Param                   In/out        Meaning
handle                  input
wDesc                   input
dyDesc                  input
convDesc                input
dxDesc                  input
preference              input
memoryLimitInbytes      input
algo                    output
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.55.cudnnGetConvolutionBackwardDataWorkspaceSize
cudnnStatus_t
cudnnGetConvolutionBackwardDataWorkspaceSize( cudnnHandle_t handle,
const cudnnFilterDescriptor_t wDesc,
const cudnnTensorDescriptor_t dyDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t dxDesc,
cudnnConvolutionBwdDataAlgo_t algo,
size_t *sizeInBytes )
This function returns the amount of GPU memory workspace the user needs
to allocate to be able to call cudnnConvolutionBackwardData with the
specified algorithm. The workspace allocated will then be passed to the routine
cudnnConvolutionBackwardData. The specified algorithm can be the result of the call
to cudnnGetConvolutionBackwardDataAlgorithm or can be chosen arbitrarily by
the user. Note that not every algorithm is available for every configuration of the input
tensor and/or every configuration of the convolution descriptor.
Param                   In/out        Meaning
handle                  input
wDesc                   input
dyDesc                  input
convDesc                input
dxDesc                  input
algo                    input
sizeInBytes             output
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.56.cudnnConvolutionBackwardData
cudnnStatus_t
cudnnConvolutionBackwardData( cudnnHandle_t handle,
const void *alpha,
const cudnnFilterDescriptor_t wDesc,
const void *w,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const cudnnConvolutionDescriptor_t convDesc,
cudnnConvolutionBwdDataAlgo_t algo,
void *workSpace,
size_t workSpaceSizeInBytes,
const void *beta,
const cudnnTensorDescriptor_t dxDesc,
void *dx );
This function computes the convolution data gradient, that is, the gradient with respect to
the layer's input tensor, from the filter w and the differential tensor dy using the specified
algo, and returns the result in dx (described by dxDesc). The scaling factors alpha and beta
blend the computed gradient with the prior contents of dx.
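As a rough model of the arithmetic, the host-side sketch below computes a data gradient for the simplest case: one image, one channel, 1-D, with alpha = 1 and beta = 0. It assumes the forward pass was the cross-correlation y[i] = sum_k w[k] * x[i*stride + k - pad]; channels, dilation and the convolution mode are deliberately left out, and the function name conv1d_backward_data is hypothetical. Each dy[i] is scattered back through the filter taps that produced it.

```c
#include <assert.h>

/* Reference 1-D backward-data pass (alpha = 1, beta = 0) for the forward
   cross-correlation y[i] = sum_k w[k] * x[i*stride + k - pad]. */
static void conv1d_backward_data(const float *w, int nk,
                                 const float *dy, int ny,
                                 float *dx, int nx,
                                 int stride, int pad)
{
    assert(stride > 0);
    for (int j = 0; j < nx; ++j)
        dx[j] = 0.0f;                      /* beta = 0: overwrite dx */
    for (int i = 0; i < ny; ++i)
        for (int k = 0; k < nk; ++k) {
            int j = i * stride + k - pad;  /* input element w[k] saw for y[i] */
            if (j >= 0 && j < nx)          /* skip padded positions */
                dx[j] += w[k] * dy[i];
        }
}
```

With w = {1, 2}, dy = {1, 1}, stride 1 and no padding, dx comes out as {1, 3, 2}: the middle input element fed both outputs, so it collects both contributions.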
Param                   In/out        Meaning
handle                  input
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue =
                                      alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for
                                      additional details.
wDesc                   input
                        input         Data pointer to GPU memory associated with the filter descriptor wDesc.
dyDesc                  input
dy                      input         Data pointer to GPU memory associated with the input differential tensor
                                      descriptor dyDesc.
convDesc                input
algo                    input
workSpace               input
workSpaceSizeInBytes    input
dxDesc                  input
dx                      input/output  Data pointer to GPU memory associated with the output tensor descriptor
                                      dxDesc that carries the result.
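The alpha/beta rule in the table above can be sketched on the host as follows. The helper name blend_output is hypothetical. The sketch also assumes that when beta is zero the prior destination value is not read at all, so an uninitialized (even NaN) destination cannot leak into the result; treat that behavior as an assumption to check against the scaling-parameter details the table refers to.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* dst[i] = alpha * result[i] + beta * dst[i].
   Assumption: with beta == 0 the prior dst value is never read. */
static void blend_output(const float *result, float *dst, size_t n,
                         float alpha, float beta)
{
    for (size_t i = 0; i < n; ++i) {
        if (beta == 0.0f)
            dst[i] = alpha * result[i];                 /* overwrite */
        else
            dst[i] = alpha * result[i] + beta * dst[i]; /* accumulate */
    }
}
```

With alpha = 1 and beta = 1 the result is added to what is already in the output, which is how gradients can be accumulated across calls.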
This function supports only three specific combinations of data types for wDesc,
dyDesc, convDesc and dxDesc. See the following for an exhaustive list of these
configurations.
Data Type Configurations
Configuration           wDesc, dyDesc and dxDesc Data Type    convDesc Data Type
PSEUDO_HALF_CONFIG      CUDNN_DATA_HALF                       CUDNN_DATA_FLOAT
FLOAT_CONFIG            CUDNN_DATA_FLOAT                      CUDNN_DATA_FLOAT
DOUBLE_CONFIG           CUDNN_DATA_DOUBLE                     CUDNN_DATA_DOUBLE
wDesc may only have format CUDNN_TENSOR_NHWC when all of the following are true:
algo is CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
dyDesc and dxDesc are NHWC HWC-packed
CUDNN_CONVOLUTION_BWD_DATA_ALGO_0
Deterministic: No
dyDesc Format Support: NCHW CHW-packed
dxDesc Format Support: All
Data Type Config Support: All except TRUE_HALF_CONFIG
CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
Deterministic: Yes
dyDesc Format Support: NCHW CHW-packed
dxDesc Format Support: All
Data Type Config Support: All
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT
Deterministic: Yes
dyDesc Format Support: NCHW CHW-packed
dxDesc Format Support: NCHW HW-packed
Data Type Config Support: PSEUDO_HALF_CONFIG, FLOAT_CONFIG
Deterministic: Yes
dyDesc Format Support: NCHW CHW-packed
dxDesc Format Support: NCHW HW-packed
Data Type Config Support: PSEUDO_HALF_CONFIG, FLOAT_CONFIG
Deterministic: Yes
xDesc Format Support: NCHW CHW-packed
yDesc Format Support: All
Data Type Config Support: PSEUDO_HALF_CONFIG, FLOAT_CONFIG
Deterministic: Yes
xDesc Format Support: NCHW CHW-packed
yDesc Format Support: All
Data Type Config Support: All except DOUBLE_CONFIG
CUDNN_CONVOLUTION_BWD_DATA_ALGO_0
Deterministic: No
Deterministic: Yes
dyDesc Format Support: NCDHW-fully-packed
dxDesc Format Support: NCDHW-fully-packed
Data Type Config Support: All
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING
Deterministic: Yes
dyDesc Format Support: NCDHW CDHW-packed
dxDesc Format Support: NCDHW DHW-packed
Data Type Config Support: All except TRUE_HALF_CONFIG
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_MAPPING_ERROR
CUDNN_STATUS_EXECUTION_FAILED
4.57.cudnnSoftmaxForward
cudnnStatus_t
cudnnSoftmaxForward( cudnnHandle_t handle,
cudnnSoftmaxAlgorithm_t algorithm,
cudnnSoftmaxMode_t mode,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *y );
Param                   In/out        Meaning
handle                  input
algorithm               input
mode                    input
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue = alpha[0]*result
                                      + beta[0]*priorDstValue. Please refer to this section for additional details.
xDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor xDesc.
yDesc                   input
                        output        Data pointer to GPU memory associated with the output tensor descriptor
                                      yDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
4.58.cudnnSoftmaxBackward
cudnnStatus_t
cudnnSoftmaxBackward( cudnnHandle_t handle,
cudnnSoftmaxAlgorithm_t algorithm,
cudnnSoftmaxMode_t mode,
const void *alpha,
const cudnnTensorDescriptor_t yDesc,
const void *yData,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const void *beta,
const cudnnTensorDescriptor_t dxDesc,
void *dx );
Param                   In/out        Meaning
handle                  input
algorithm               input
mode                    input
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue = alpha[0]*result +
                                      beta[0]*priorDstValue. Please refer to this section for additional details.
yDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor yDesc.
dyDesc                  input
dy                      input         Data pointer to GPU memory associated with the tensor descriptor dyDesc.
dxDesc                  input
dx                      output        Data pointer to GPU memory associated with the output tensor descriptor dxDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
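For the plain (non-log) softmax, the gradient has a compact closed form: given the forward output y and the incoming differential dy, each dx[i] = y[i] * (dy[i] - sum_j y[j] * dy[j]). The host-side sketch below evaluates it for one vector. It leaves out the algorithm, mode, alpha and beta arguments of cudnnSoftmaxBackward, and the name softmax_backward_ref is hypothetical.

```c
#include <assert.h>

/* Reference softmax gradient for one vector:
   dx[i] = y[i] * (dy[i] - sum_j y[j] * dy[j]). */
static void softmax_backward_ref(const float *y, const float *dy,
                                 float *dx, int n)
{
    float dot = 0.0f;                 /* sum_j y[j] * dy[j] */
    for (int j = 0; j < n; ++j)
        dot += y[j] * dy[j];
    for (int i = 0; i < n; ++i)
        dx[i] = y[i] * (dy[i] - dot);
}
```

Because the softmax outputs always sum to one, the entries of dx sum to zero, which is a quick sanity check on any implementation.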
4.59.cudnnCreatePoolingDescriptor
cudnnStatus_t cudnnCreatePoolingDescriptor( cudnnPoolingDescriptor_t *poolingDesc )
This function creates a pooling descriptor object by allocating the memory needed to
hold its opaque structure.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.60.cudnnSetPooling2dDescriptor
cudnnStatus_t
cudnnSetPooling2dDescriptor( cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t mode,
cudnnNanPropagation_t maxpoolingNanOpt,
int windowHeight,
int windowWidth,
int verticalPadding,
int horizontalPadding,
int verticalStride,
int horizontalStride )
This function initializes a previously created generic pooling descriptor object into a 2D
description.
Param                   In/out        Meaning
poolingDesc             input/output
mode                    input
maxpoolingNanOpt        input
windowHeight            input
windowWidth             input
verticalPadding         input
horizontalPadding       input
verticalStride          input
horizontalStride        input
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS             The object was set successfully.
CUDNN_STATUS_BAD_PARAM           At least one of the parameters windowHeight, windowWidth, verticalStride,
                                 horizontalStride is negative, or mode or maxpoolingNanOpt has an invalid
                                 enumerant value.
4.61.cudnnGetPooling2dDescriptor
cudnnStatus_t
cudnnGetPooling2dDescriptor( const cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t *mode,
cudnnNanPropagation_t *maxpoolingNanOpt,
int *windowHeight,
int *windowWidth,
int *verticalPadding,
int *horizontalPadding,
int *verticalStride,
int *horizontalStride )
Param                   In/out        Meaning
poolingDesc             input
mode                    output
maxpoolingNanOpt        output
windowHeight            output
windowWidth             output
verticalPadding         output
horizontalPadding       output
verticalStride          output
horizontalStride        output
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
4.62.cudnnSetPoolingNdDescriptor
cudnnStatus_t
cudnnSetPoolingNdDescriptor( cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t mode,
cudnnNanPropagation_t maxpoolingNanOpt,
int nbDims,
int windowDimA[],
int paddingA[],
int strideA[] )
Param                   In/out        Meaning
poolingDesc             input/output
mode                    input
maxpoolingNanOpt        input
nbDims                  input
windowDimA              input
paddingA                input
strideA                 input
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.63.cudnnGetPoolingNdDescriptor
cudnnStatus_t
cudnnGetPoolingNdDescriptor( const cudnnPoolingDescriptor_t poolingDesc,
int nbDimsRequested,
cudnnPoolingMode_t *mode,
cudnnNanPropagation_t *maxpoolingNanOpt,
int *nbDims,
int windowDimA[],
int paddingA[],
int strideA[] )
Param                   In/out        Meaning
poolingDesc             input
nbDimsRequested         input
mode                    output
maxpoolingNanOpt        output
nbDims                  output
windowDimA              output        Array of dimension at least nbDimsRequested that will be filled with the
                                      window parameters from the provided pooling descriptor.
paddingA                output        Array of dimension at least nbDimsRequested that will be filled with the
                                      padding parameters from the provided pooling descriptor.
strideA                 output        Array of dimension at least nbDimsRequested that will be filled with the
                                      stride parameters from the provided pooling descriptor.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_NOT_SUPPORTED
4.64.cudnnSetPooling2dDescriptor_v3
cudnnStatus_t
cudnnSetPooling2dDescriptor_v3( cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t mode,
int windowHeight,
int windowWidth,
int verticalPadding,
int horizontalPadding,
int verticalStride,
int horizontalStride )
This function initializes a previously created generic pooling descriptor object into a 2D
description.
This routine is deprecated; cudnnSetPooling2dDescriptor should be used instead.
Param                   In/out        Meaning
poolingDesc             input/output
mode                    input
windowHeight            input
windowWidth             input
verticalPadding         input
horizontalPadding       input
verticalStride          input
horizontalStride        input
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS             The object was set successfully.
CUDNN_STATUS_BAD_PARAM           At least one of the parameters windowHeight, windowWidth, verticalStride,
                                 horizontalStride is negative, or mode has an invalid enumerant value.
4.65.cudnnGetPooling2dDescriptor_v3
cudnnStatus_t
cudnnGetPooling2dDescriptor_v3( const cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t *mode,
int *windowHeight,
int *windowWidth,
int *verticalPadding,
int *horizontalPadding,
int *verticalStride,
int *horizontalStride )
Param                   In/out        Meaning
poolingDesc             input
mode                    output
windowHeight            output
windowWidth             output
verticalPadding         output
horizontalPadding       output
verticalStride          output
horizontalStride        output
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
4.66.cudnnSetPoolingNdDescriptor_v3
cudnnStatus_t
cudnnSetPoolingNdDescriptor_v3( cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t mode,
int nbDims,
int windowDimA[],
int paddingA[],
int strideA[] )
Param                   In/out        Meaning
poolingDesc             input/output
mode                    input
nbDims                  input
windowDimA              input
paddingA                input
strideA                 input
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.67.cudnnGetPoolingNdDescriptor_v3
cudnnStatus_t
cudnnGetPoolingNdDescriptor_v3( const cudnnPoolingDescriptor_t poolingDesc,
int nbDimsRequested,
cudnnPoolingMode_t *mode,
int *nbDims,
int windowDimA[],
int paddingA[],
int strideA[] )
Param                   In/out        Meaning
poolingDesc             input
nbDimsRequested         input
mode                    output
nbDims                  output
windowDimA              output        Array of dimension at least nbDimsRequested that will be filled with the
                                      window parameters from the provided pooling descriptor.
paddingA                output        Array of dimension at least nbDimsRequested that will be filled with the
                                      padding parameters from the provided pooling descriptor.
strideA                 output        Array of dimension at least nbDimsRequested that will be filled with the
                                      stride parameters from the provided pooling descriptor.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_NOT_SUPPORTED
4.68.cudnnSetPooling2dDescriptor_v4
cudnnStatus_t
cudnnSetPooling2dDescriptor_v4( cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t mode,
cudnnNanPropagation_t maxpoolingNanOpt,
int windowHeight,
int windowWidth,
int verticalPadding,
int horizontalPadding,
int verticalStride,
int horizontalStride )
4.69.cudnnGetPooling2dDescriptor_v4
cudnnStatus_t
cudnnGetPooling2dDescriptor_v4( const cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t *mode,
cudnnNanPropagation_t *maxpoolingNanOpt,
int *windowHeight,
int *windowWidth,
int *verticalPadding,
int *horizontalPadding,
int *verticalStride,
int *horizontalStride )
4.70.cudnnSetPoolingNdDescriptor_v4
cudnnStatus_t
cudnnSetPoolingNdDescriptor_v4( cudnnPoolingDescriptor_t poolingDesc,
cudnnPoolingMode_t mode,
cudnnNanPropagation_t maxpoolingNanOpt,
int nbDims,
int windowDimA[],
int paddingA[],
int strideA[] )
4.71.cudnnGetPoolingNdDescriptor_v4
cudnnStatus_t
cudnnGetPoolingNdDescriptor_v4( const cudnnPoolingDescriptor_t poolingDesc,
int nbDimsRequested,
cudnnPoolingMode_t *mode,
cudnnNanPropagation_t *maxpoolingNanOpt,
int *nbDims,
int windowDimA[],
int paddingA[],
int strideA[] )
4.72.cudnnDestroyPoolingDescriptor
cudnnStatus_t cudnnDestroyPoolingDescriptor( cudnnPoolingDescriptor_t poolingDesc )
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
4.73.cudnnGetPooling2dForwardOutputDim
cudnnStatus_t
cudnnGetPooling2dForwardOutputDim( const cudnnPoolingDescriptor_t poolingDesc,
const cudnnTensorDescriptor_t inputDesc,
int *outN,
int *outC,
int *outH,
int *outW )
This function provides the output dimensions of a tensor after 2d pooling has been
applied.
Each dimension h and w of the output images is computed as follows:
outputDim = 1 + (inputDim + 2*padding - windowDim)/poolingStride;
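A worked example of the formula: a 32 x 32 input with a 2 x 2 window, stride 2 and no padding gives 1 + (32 + 0 - 2)/2 = 16 in each of h and w. The small helper below evaluates the same expression with C integer division, so a trailing partial window is dropped; its name pool_out_dim is hypothetical.

```c
#include <assert.h>

/* outputDim = 1 + (inputDim + 2*padding - windowDim) / poolingStride,
   evaluated with C integer division as in the formula above. */
static int pool_out_dim(int input_dim, int padding, int window_dim, int stride)
{
    assert(stride > 0);
    assert(input_dim + 2 * padding >= window_dim); /* window must fit */
    return 1 + (input_dim + 2 * padding - window_dim) / stride;
}
```

For instance, an input of 5 with a window of 2 and stride 2 yields 2: windows start at positions 0 and 2, and the last element is not covered by a full window.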
Param                   In/out        Meaning
poolingDesc             input
inputDesc               input
                        output
                        output
                        output
                        output
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.74.cudnnGetPoolingNdForwardOutputDim
cudnnStatus_t
cudnnGetPoolingNdForwardOutputDim( const cudnnPoolingDescriptor_t poolingDesc,
const cudnnTensorDescriptor_t inputDesc,
int nbDims,
int outDimA[] )
This function provides the output dimensions of a tensor after Nd pooling has been
applied.
Each dimension of the (nbDims-2)-D images of the output tensor is computed as
follows:
outputDim = 1 + (inputDim + 2*padding - windowDim)/poolingStride;
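Applying the same formula to a whole outDimA array: N and C pass through unchanged, since pooling acts within each image and channel, and each of the nbDims-2 spatial sizes is computed independently. The sketch below assumes the window, padding and stride arrays hold only the spatial entries; the function name pool_out_dims_nd is hypothetical.

```c
#include <assert.h>

/* outDimA[0..1] = N, C (unchanged by pooling); outDimA[2..nbDims-1] use
   outputDim = 1 + (inputDim + 2*padding - windowDim) / poolingStride.
   Assumption: windowDimA/paddingA/strideA hold only the nbDims-2
   spatial entries. */
static void pool_out_dims_nd(int nbDims, const int inDimA[],
                             const int windowDimA[], const int paddingA[],
                             const int strideA[], int outDimA[])
{
    assert(nbDims >= 3);
    outDimA[0] = inDimA[0]; /* N */
    outDimA[1] = inDimA[1]; /* C */
    for (int d = 2; d < nbDims; ++d) {
        int s = d - 2;      /* index into the spatial-only arrays */
        assert(strideA[s] > 0);
        outDimA[d] = 1 + (inDimA[d] + 2 * paddingA[s] - windowDimA[s]) / strideA[s];
    }
}
```

A 2 x 3 x 8 x 16 x 16 input pooled with a 2 x 2 x 2 window, stride 2 and no padding gives 2 x 3 x 4 x 8 x 8.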
Param                   In/out        Meaning
poolingDesc             input
inputDesc               input
nbDims                  input
outDimA                 output
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.75.cudnnPoolingForward
cudnnStatus_t
cudnnPoolingForward( cudnnHandle_t handle,
const cudnnPoolingDescriptor_t poolingDesc,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *y );
This function computes pooling of input values (i.e., the maximum or average of several
adjacent values) to produce an output with smaller height and/or width.
All tensor formats are supported; best performance is expected when using HW-packed
tensors. Only 2 and 3 spatial dimensions are allowed.
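To make "the maximum or average of several adjacent values" concrete, the host-side sketch below pools a single H x W channel stored row-major. It is an illustration only: the name pool2d_ref is hypothetical, alpha/beta are omitted, and the average variant assumes padded positions are excluded from the count, which is only one of the possible averaging conventions.

```c
#include <assert.h>
#include <float.h>

/* Reference 2-D pooling of one H x W channel (row-major) into outH x outW.
   use_max != 0 selects max pooling; otherwise the average counts only
   in-bounds elements (padding excluded). */
static void pool2d_ref(const float *x, int H, int W,
                       int winH, int winW, int padH, int padW,
                       int strideH, int strideW, int use_max,
                       float *y, int outH, int outW)
{
    for (int oh = 0; oh < outH; ++oh) {
        for (int ow = 0; ow < outW; ++ow) {
            float acc = use_max ? -FLT_MAX : 0.0f;
            int count = 0;
            for (int kh = 0; kh < winH; ++kh) {
                for (int kw = 0; kw < winW; ++kw) {
                    int ih = oh * strideH + kh - padH;
                    int iw = ow * strideW + kw - padW;
                    if (ih < 0 || ih >= H || iw < 0 || iw >= W)
                        continue;                 /* padded position */
                    float v = x[ih * W + iw];
                    if (use_max) { if (v > acc) acc = v; }
                    else acc += v;
                    ++count;
                }
            }
            assert(count > 0);                    /* window touched real data */
            y[oh * outW + ow] = use_max ? acc : acc / (float)count;
        }
    }
}
```

On a 4 x 4 input holding 1..16 row by row, a 2 x 2 window with stride 2 gives maxima {6, 8, 14, 16} and averages {3.5, 5.5, 11.5, 13.5}.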
Param                   In/out        Meaning
handle                  input
poolingDesc             input
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue =
                                      alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for
                                      additional details.
xDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor xDesc.
yDesc                   input
                        output        Data pointer to GPU memory associated with the output tensor descriptor
                                      yDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_EXECUTION_FAILED
4.76.cudnnPoolingBackward
cudnnStatus_t
cudnnPoolingBackward( cudnnHandle_t handle,
const cudnnPoolingDescriptor_t poolingDesc,
const void *alpha,
const cudnnTensorDescriptor_t yDesc,
const void *y,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const cudnnTensorDescriptor_t xDesc,
const void *xData,
const void *beta,
const cudnnTensorDescriptor_t dxDesc,
void *dx )
Param                   In/out        Meaning
handle                  input
poolingDesc             input
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue = alpha[0]*result +
                                      beta[0]*priorDstValue. Please refer to this section for additional details.
yDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor yDesc.
dyDesc                  input
dy                      input         Data pointer to GPU memory associated with the tensor descriptor dyDesc.
xDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor xDesc.
dxDesc                  input
dx                      output        Data pointer to GPU memory associated with the output tensor descriptor dxDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_EXECUTION_FAILED
4.77.cudnnActivationForward
cudnnStatus_t
cudnnActivationForward( cudnnHandle_t handle,
cudnnActivationDescriptor_t activationDesc,
const void *alpha,
const cudnnTensorDescriptor_t srcDesc,
const void *srcData,
const void *beta,
const cudnnTensorDescriptor_t destDesc,
void *destData )
This routine applies a specified neuron activation function element-wise over each input
value.
In-place operation is allowed for this routine; i.e., xData and yData pointers
may be equal. However, this requires xDesc and yDesc descriptors to be identical
(particularly, the strides of the input and output must match for in-place operation to
be allowed).
All tensor formats are supported for 4 and 5 dimensions; however, best performance
is obtained when the strides of xDesc and yDesc are equal and HW-packed. For more
than 5 dimensions the tensors must have their spatial dimensions packed.
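The element-wise behavior and the in-place rule can be sketched on the host for ReLU and clipped ReLU. The enum act_mode and the function activation_forward_ref are hypothetical. The clipped form min(max(x, 0), ceiling) is assumed from the reluCeiling description of cudnnSetActivationDescriptor, and NaN handling (reluNanOpt) is not modeled. Passing the same buffer as x and y works because each output depends only on the input at the same index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two modes discussed here. */
typedef enum { ACT_RELU, ACT_CLIPPED_RELU } act_mode;

/* Apply the activation element-wise; x and y may alias (in-place), since
   y[i] depends only on x[i], which is read before y[i] is written.
   ceiling is used only by ACT_CLIPPED_RELU. NaN inputs map to 0 here. */
static void activation_forward_ref(act_mode mode, double ceiling,
                                   const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float v = x[i] > 0.0f ? x[i] : 0.0f;  /* ReLU: max(x, 0) */
        if (mode == ACT_CLIPPED_RELU && v > (float)ceiling)
            v = (float)ceiling;               /* clip at the ceiling */
        y[i] = v;
    }
}
```

With a ceiling of 6, the in-place call turns {-1, 0.5, 3, 10} into {0, 0.5, 3, 6}.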
Param                   In/out        Meaning
handle                  input
activationDesc          input         Activation descriptor.
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue = alpha[0]*result +
                                      beta[0]*priorDstValue. Please refer to this section for additional details.
xDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor xDesc.
yDesc                   input
                        output        Data pointer to GPU memory associated with the output tensor descriptor yDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
4.78.cudnnActivationBackward
cudnnStatus_t
cudnnActivationBackward( cudnnHandle_t handle,
cudnnActivationDescriptor_t activationDesc,
const void *alpha,
const cudnnTensorDescriptor_t srcDesc,
const void *srcData,
const cudnnTensorDescriptor_t srcDiffDesc,
const void *srcDiffData,
const cudnnTensorDescriptor_t destDesc,
const void *destData,
const void *beta,
const cudnnTensorDescriptor_t destDiffDesc,
void *destDiffData )
Param                   In/out        Meaning
handle                  input
activationDesc          input         Activation descriptor.
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue = alpha[0]*result +
                                      beta[0]*priorDstValue. Please refer to this section for additional details.
yDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor yDesc.
dyDesc                  input
dy                      input         Data pointer to GPU memory associated with the tensor descriptor dyDesc.
xDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor xDesc.
dxDesc                  input
dx                      output        Data pointer to GPU memory associated with the output tensor descriptor dxDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_EXECUTION_FAILED
4.79.cudnnCreateActivationDescriptor
cudnnStatus_t
cudnnCreateActivationDescriptor( cudnnActivationDescriptor_t *activationDesc )
This function creates an activation descriptor object by allocating the memory needed to
hold its opaque structure.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.80.cudnnSetActivationDescriptor
cudnnStatus_t
cudnnSetActivationDescriptor( cudnnActivationDescriptor_t activationDesc,
cudnnActivationMode_t mode,
cudnnNanPropagation_t reluNanOpt,
double reluCeiling )
Param                   In/out        Meaning
activationDesc          input/output
mode                    input
reluNanOpt              input
reluCeiling             input         Floating point number to specify the clipping threshold when the activation
                                      mode is set to CUDNN_ACTIVATION_CLIPPED_RELU.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS             The object was set successfully.
CUDNN_STATUS_BAD_PARAM           mode or reluNanOpt has an invalid enumerant value.
4.81.cudnnGetActivationDescriptor
cudnnStatus_t
cudnnGetActivationDescriptor( const cudnnActivationDescriptor_t activationDesc,
cudnnActivationMode_t *mode,
cudnnNanPropagation_t *reluNanOpt,
double *reluCeiling )
Param                   In/out        Meaning
activationDesc          input         Handle to a previously created activation descriptor.
mode
                        output        Floating point number to specify the clipping threshold when the activation mode
                                      is set to CUDNN_ACTIVATION_CLIPPED_RELU.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
4.82.cudnnDestroyActivationDescriptor
cudnnStatus_t
cudnnDestroyActivationDescriptor( cudnnActivationDescriptor_t activationDesc )
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
4.83.cudnnActivationForward_v3
cudnnStatus_t
cudnnActivationForward_v3( cudnnHandle_t handle,
cudnnActivationMode_t mode,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *xData,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *yData )
This routine applies a specified neuron activation function element-wise over each input
value.
This routine is deprecated; cudnnActivationForward should be used instead.
In-place operation is allowed for this routine; i.e., xData and yData pointers
may be equal. However, this requires xDesc and yDesc descriptors to be identical
(particularly, the strides of the input and output must match for in-place operation to
be allowed).
All tensor formats are supported for 4 and 5 dimensions; however, best performance
is obtained when the strides of xDesc and yDesc are equal and HW-packed. For more
than 5 dimensions the tensors must have their spatial dimensions packed.
Param                   In/out        Meaning
handle                  input
mode                    input
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue = alpha[0]*result +
                                      beta[0]*priorDstValue. Please refer to this section for additional details.
xDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor xDesc.
yDesc                   input
                        output        Data pointer to GPU memory associated with the output tensor descriptor yDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
4.84.cudnnActivationBackward_v3
cudnnStatus_t
cudnnActivationBackward_v3( cudnnHandle_t handle,
cudnnActivationMode_t mode,
const void *alpha,
const cudnnTensorDescriptor_t yDesc,
const void *y,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *beta,
const cudnnTensorDescriptor_t dxDesc,
void *dx );
Param                   In/out        Meaning
handle                  input
mode                    input
alpha, beta             input         Pointers to scaling factors (in host memory) used to blend the computation
                                      result with prior value in the output layer as follows: dstValue = alpha[0]*result +
                                      beta[0]*priorDstValue. Please refer to this section for additional details.
yDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor yDesc.
dyDesc                  input
dy                      input         Data pointer to GPU memory associated with the tensor descriptor dyDesc.
xDesc                   input
                        input         Data pointer to GPU memory associated with the tensor descriptor xDesc.
dxDesc                  input
dx                      output        Data pointer to GPU memory associated with the output tensor descriptor dxDesc.
The possible error values returned by this function and their meanings are listed below.
Return Value                     Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_EXECUTION_FAILED
4.85.cudnnActivationForward_v4
cudnnStatus_t
cudnnActivationForward_v4( cudnnHandle_t handle,
cudnnActivationDescriptor_t activationDesc,
const void *alpha,
const cudnnTensorDescriptor_t srcDesc,
const void *srcData,
const void *beta,
const cudnnTensorDescriptor_t destDesc,
void *destData )
4.86.cudnnActivationBackward_v4
cudnnStatus_t
cudnnActivationBackward_v4( cudnnHandle_t handle,
cudnnActivationDescriptor_t activationDesc,
const void *alpha,
const cudnnTensorDescriptor_t srcDesc,
const void *srcData,
const cudnnTensorDescriptor_t srcDiffDesc,
const void *srcDiffData,
const cudnnTensorDescriptor_t destDesc,
const void *destData,
const void *beta,
const cudnnTensorDescriptor_t destDiffDesc,
void *destDiffData )
4.87.cudnnCreateLRNDescriptor
cudnnStatus_t cudnnCreateLRNDescriptor( cudnnLRNDescriptor_t* normDesc )
This function allocates the memory needed to hold the data for LRN and
DivisiveNormalization layer operations and returns a descriptor used with subsequent
layer forward and backward calls.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.88.cudnnSetLRNDescriptor
cudnnStatus_t
CUDNNWINAPI cudnnSetLRNDescriptor( cudnnLRNDescriptor_t normDesc,
unsigned lrnN,
double lrnAlpha,
double lrnBeta,
double lrnK );
Param
In/
out
Meaning
lrnN
input
Normalization window width in elements. The LRN layer uses a window
[center-lookBehind, center+lookAhead], where lookBehind = floor( (lrnN-1)/2 ) and
lookAhead = lrnN-lookBehind-1. So for lrnN=10, the window is [k-4...k...k+5], with a
total of 10 samples. For the DivisiveNormalization layer, the window has the same extents
as above in all 'spatial' dimensions (dimA[2], dimA[3], dimA[4]). By default lrnN is set
to 5 in cudnnCreateLRNDescriptor.
lrnAlpha
input
Value of the alpha variance scaling parameter in the normalization formula. Inside
the library code this value is divided by the window width for LRN and by (window
width)^#spatialDimensions for DivisiveNormalization. By default this value is set to
1e-4 in cudnnCreateLRNDescriptor.
lrnBeta
input
Value of the beta power parameter in the normalization formula. By default this
value is set to 0.75 in cudnnCreateLRNDescriptor.
lrnK
input
Value of the k parameter in the normalization formula.
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
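The window and alpha-scaling rules given for lrnN and lrnAlpha can be checked with a short C sketch. Both helpers, lrn_window_for and lrn_effective_alpha, are hypothetical names used only for illustration. The second helper just applies the divisions described above.

```c
#include <assert.h>

/* Hypothetical helper type: inclusive window [lo, hi] around a center index. */
typedef struct { int lo; int hi; } lrn_window;

static lrn_window lrn_window_for(int center, unsigned lrnN)
{
    assert(lrnN >= 1);
    /* lookBehind = floor((lrnN-1)/2): unsigned integer division floors. */
    int lookBehind = (int)((lrnN - 1) / 2);
    int lookAhead = (int)lrnN - lookBehind - 1;
    lrn_window w = { center - lookBehind, center + lookAhead };
    return w;
}

/* Hypothetical helper: alpha as used inside the normalization formula.
 *   LRN:                   lrnAlpha / lrnN
 *   DivisiveNormalization: lrnAlpha / lrnN^spatialDims */
static double lrn_effective_alpha(double lrnAlpha, unsigned lrnN,
                                  int divisive, int spatialDims)
{
    double divisor = lrnN;
    if (divisive) {
        divisor = 1.0;
        for (int d = 0; d < spatialDims; ++d)
            divisor *= lrnN;
    }
    return lrnAlpha / divisor;
}
```

For example, lrn_window_for(0, 10) yields [-4, 5], which is the 10-sample window quoted in the lrnN description.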
4.89.cudnnGetLRNDescriptor
cudnnStatus_t
CUDNNWINAPI cudnnGetLRNDescriptor( cudnnLRNDescriptor_t normDesc,
unsigned *lrnN,
double *lrnAlpha,
double *lrnBeta,
double *lrnK );
This function retrieves values stored in the previously initialized LRN descriptor object.
Param
In/out
Meaning
normDesc
input
lrnN,
lrnAlpha,
lrnBeta, lrnK
output
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.90.cudnnDestroyLRNDescriptor
cudnnStatus_t cudnnDestroyLRNDescriptor(cudnnLRNDescriptor_t lrnDesc)
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.91.cudnnLRNCrossChannelForward
cudnnStatus_t CUDNNWINAPI cudnnLRNCrossChannelForward(
cudnnHandle_t handle,
cudnnLRNDescriptor_t normDesc,
cudnnLRNMode_t lrnMode,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *y);
Param
In/
out
Meaning
handle
input
normDesc input
lrnMode
input
alpha,
beta
input
Pointers to scaling factors (in host memory) used to blend the layer output value
with prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue
+ beta[0]*priorDstValue. Please refer to this section for additional details.
xDesc,
yDesc
input
x
input
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.92.cudnnLRNCrossChannelBackward
cudnnStatus_t CUDNNWINAPI cudnnLRNCrossChannelBackward(
cudnnHandle_t handle,
cudnnLRNDescriptor_t normDesc,
cudnnLRNMode_t lrnMode,
const void *alpha,
const cudnnTensorDescriptor_t yDesc,
const void *y,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *beta,
const cudnnTensorDescriptor_t dxDesc,
void *dx);
Param
In/
out
Meaning
handle
input
normDesc input
lrnMode
input
alpha,
beta
input
Pointers to scaling factors (in host memory) used to blend the layer output value with
prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue +
beta[0]*priorDstValue. Please refer to this section for additional details.
yDesc, y
input
Tensor descriptor and pointer in device memory for the layer's y data.
dyDesc,
dy
input
Tensor descriptor and pointer in device memory for the layer's input cumulative loss
differential data dy (including error backpropagation).
xDesc, x
input
Tensor descriptor and pointer in device memory for the layer's x data. Note that these
values are not modified during backpropagation.
dxDesc,
dx
output Tensor descriptor and pointer in device memory for the layer's resulting cumulative
loss differential data dx (including error backpropagation).
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.93.cudnnDivisiveNormalizationForward
cudnnStatus_t CUDNNWINAPI cudnnDivisiveNormalizationForward(
cudnnHandle_t handle,
cudnnLRNDescriptor_t normDesc,
cudnnDivNormMode_t mode,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *means,
void *temp,
void *temp2,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *y );
described in "What is the Best Multi-Stage Architecture for Object Recognition", Jarrett
2009, Local Contrast Normalization Layer section. Note that Divisive Normalization
only implements the x/max(c, sigma_x) portion of the computation, where sigma_x
is the variance over the spatial neighborhood of x. The full LCN (Local Contrast
Normalization) computation can be implemented as a two-step process:
x_m = x-mean(x);
y = x_m/max(c, sigma(x_m));
The "x-mean(x)" portion of the computation, often referred to as "subtractive
normalization", can be implemented using the cuDNN average pooling layer followed by a
call to addTensor.
Supported tensor formats are NCHW for 4D and NCDHW for 5D with any non-overlapping,
non-negative strides. Only 4D and 5D tensors are supported.
Param
In/out
Meaning
handle
input
normDesc
input
mode
input
alpha,
beta
input
Pointers to scaling factors (in host memory) used to blend the layer output
value with prior value in the destination tensor as follows: dstValue =
alpha[0]*resultValue + beta[0]*priorDstValue. Please refer to this section for
additional details.
xDesc,
yDesc
input
Tensor descriptor objects for the input and output tensors. Note that xDesc is
shared between x, means, temp and temp2 tensors.
x
input
means
input
Input means tensor data pointer in device memory. Note that this tensor can be
NULL (in that case its values are assumed to be zero during the computation).
This tensor also does not have to contain means; it can hold any values. A
frequently used variation is the result of a convolution with a normalized positive
kernel (such as a Gaussian).
temp,
temp2
workspace
Temporary tensors in device memory. These are used for computing intermediate
values during the forward pass. These tensors do not have to be preserved as
inputs from the forward to the backward pass. Both use xDesc as their descriptor.
y
output
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.94.cudnnDivisiveNormalizationBackward
cudnnStatus_t
CUDNNWINAPI cudnnDivisiveNormalizationBackward(
cudnnHandle_t handle,
cudnnLRNDescriptor_t normDesc,
cudnnDivNormMode_t mode,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *means,
const void *dy,
void *temp,
void *temp2,
const void *beta,
const cudnnTensorDescriptor_t dxDesc,
void *dx,
void *dMeans );
Param
In/
out
Meaning
handle
input
normDesc
input
mode
input
alpha,
beta
input
Pointers to scaling factors (in host memory) used to blend the layer output value
with prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue
+ beta[0]*priorDstValue. Please refer to this section for additional details.
xDesc, x,
means
input
Tensor descriptor and pointers in device memory for the layer's x and means data.
Note: the means tensor is expected to be precomputed by the user. It can also
contain any valid values (not required to be actual means, and can be for instance
a result of a convolution with a Gaussian kernel).
dy
input
Tensor pointer in device memory for the layer's dy cumulative loss differential data
(error backpropagation).
temp,
temp2
workspace
Temporary tensors in device memory. These are used for computing intermediate
values during the backward pass. These tensors do not have to be preserved from
forward to backward pass. Both use xDesc as a descriptor.
dxDesc
input
dx,
dMeans
output Tensor pointers (in device memory) for the layer's resulting cumulative gradients dx
and dMeans (dLoss/dx and dLoss/dMeans). Both share the same descriptor.
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.95.cudnnBatchNormalizationForwardInference
cudnnStatus_t CUDNNWINAPI cudnnBatchNormalizationForwardInference(
cudnnHandle_t handle,
cudnnBatchNormMode_t mode,
const void *alpha,
const void *beta,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const cudnnTensorDescriptor_t yDesc,
void *y,
const cudnnTensorDescriptor_t bnScaleBiasMeanVarDesc,
const void *bnScale,
const void *bnBias,
const void *estimatedMean,
const void *estimatedVariance,
double epsilon );
This function performs the forward BatchNormalization layer computation for inference
phase. This layer is based on the paper "Batch Normalization: Accelerating Deep Network
Training by Reducing Internal Covariate Shift", S. Ioffe, C. Szegedy, 2015.
Only 4D and 5D tensors are supported.
The input transformation performed by this function is defined as: y := beta*y + alpha
*(bnScale * (x-estimatedMean)/sqrt(epsilon + estimatedVariance)+bnBias)
The epsilon value has to be the same during training, backpropagation and inference.
Performance is much higher when HW-packed tensors are used for both x and y.
Param
Meaning
handle
mode
alpha, beta
Inputs. Pointers to scaling factors (in host memory) used to blend the
layer output value with prior value in the destination tensor as follows:
dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. Please refer to
this section for additional details.
xDesc, yDesc, x, y
Tensor descriptors and pointers in device memory for the layer's x and y
data.
bnScaleBiasMeanVarDesc,
bnScale, bnBias
Inputs. Tensor descriptor and pointers in device memory for the batch
normalization scale and bias parameters (in the original paper bias is
referred to as beta and scale as gamma).
estimatedMean,
estimatedVariance
Inputs. Mean and variance tensors (these have the same descriptor
as the bias and scale). It is suggested that resultRunningMean,
resultRunningVariance from the cudnnBatchNormalizationForwardTraining
call accumulated during the training phase are passed as inputs here.
epsilon
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.96.cudnnBatchNormalizationForwardTraining
cudnnStatus_t CUDNNWINAPI cudnnBatchNormalizationForwardTraining(
cudnnHandle_t handle,
cudnnBatchNormMode_t mode,
const void *alpha,
const void *beta,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const cudnnTensorDescriptor_t yDesc,
void *y,
const cudnnTensorDescriptor_t bnScaleBiasMeanVarDesc,
const void *bnScale,
const void *bnBias,
double exponentialAverageFactor,
void *resultRunningMean,
void *resultRunningVariance,
double epsilon,
void *resultSaveMean,
void *resultSaveInvVariance );
This function performs the forward BatchNormalization layer computation for training
phase.
Only 4D and 5D tensors are supported.
The epsilon value has to be the same during training, backpropagation and inference.
Param
Meaning
handle
mode
alpha, beta
Inputs. Pointers to scaling factors (in host memory) used to blend the
layer output value with prior value in the destination tensor as follows:
dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. Please refer to
this section for additional details.
xDesc, yDesc, x, y
Tensor descriptors and pointers in device memory for the layer's x and y
data.
bnScaleBiasMeanVarDesc
Shared tensor descriptor for all 6 of the tensors below in the argument
list. The dimensions for this tensor descriptor are dependent on the
normalization mode.
bnScale, bnBias
Inputs. Pointers in device memory for the batch normalization scale and
bias parameters (in original paper bias is referred to as beta and scale
as gamma). Note that bnBias parameter can replace the previous layer's
bias parameter for improved efficiency.
exponentialAverageFactor
resultRunningMean,
resultRunningVariance
Inputs/outputs. Running mean and variance tensors (these have the same
descriptor as the bias and scale). Both of these pointers can be NULL
but only at the same time. The value stored in resultRunningVariance
(or passed as an input in inference mode) is the moving average of
variance[x], where variance is computed over either the batch or the
spatial+batch dimensions, depending on the mode. If these pointers are not
NULL, the tensors should be initialized to some reasonable values or to 0.
epsilon
resultSaveMean,
resultSaveInvVariance
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
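exponentialAverageFactor is listed above without a description. The sketch below assumes the usual exponential moving-average update, running = factor*batchStat + (1 - factor)*running. This update becomes a cumulative average when factor = 1/(1+n) on the n-th call, counting from 0. Both the formula and the helper name update_running are assumptions, not a statement of cuDNN internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper illustrating the assumed running-statistics update:
 *   running[i] = factor*batchStat[i] + (1 - factor)*running[i]
 * With factor = 1/(1+n) on the n-th call (n counted from 0) this yields the
 * cumulative average of all batch statistics seen so far. */
static void update_running(double *running, const double *batchStat,
                           size_t count, double factor)
{
    assert(factor >= 0.0 && factor <= 1.0);
    for (size_t i = 0; i < count; ++i)
        running[i] = factor * batchStat[i] + (1.0 - factor) * running[i];
}
```

A fixed factor close to 0 makes the running statistics change slowly, whereas factor = 1 simply copies the latest batch statistics.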
4.97.cudnnBatchNormalizationBackward
cudnnStatus_t CUDNNWINAPI cudnnBatchNormalizationBackward(
cudnnHandle_t handle,
cudnnBatchNormMode_t mode,
const void *alphaDataDiff,
const void *betaDataDiff,
const void *alphaParamDiff,
const void *betaParamDiff,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const cudnnTensorDescriptor_t dyDesc,
const void *dy,
const cudnnTensorDescriptor_t dxDesc,
void *dx,
const cudnnTensorDescriptor_t bnScaleBiasDiffDesc,
const void *bnScale,
void *resultBnScaleDiff,
void *resultBnBiasDiff,
double epsilon,
const void *savedMean,
const void *savedInvVariance
);
The epsilon value has to be the same during training, backpropagation and inference.
Much higher performance when HW-packed tensors are used for all of x, dy, dx.
Param
Meaning
handle
mode
alphaDataDiff,
betaDataDiff
Inputs. Pointers to scaling factors (in host memory) used to blend the gradient
output dx with a prior value in the destination tensor as follows: dstValue =
alpha[0]*resultValue + beta[0]*priorDstValue. Please refer to this section for
additional details.
alphaParamDiff,
betaParamDiff
Inputs. Pointers to scaling factors (in host memory) used to blend the gradient
outputs resultBnScaleDiff and resultBnBiasDiff with prior values in the destination
tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue.
Please refer to this section for additional details.
xDesc, x, dyDesc, dy, dxDesc, dx
Tensor descriptors and pointers in device memory for the layer's x data,
backpropagated differential dy (inputs) and resulting differential with respect
to x, dx (output).
bnScaleBiasDiffDesc
Shared tensor descriptor for all the 5 tensors below in the argument list
(bnScale, resultBnScaleDiff, resultBnBiasDiff, savedMean, savedInvVariance).
The dimensions for this tensor descriptor are dependent on normalization
mode. Note: The data type of this tensor descriptor must be 'float' for FP16
and FP32 input tensors, and 'double' for FP64 input tensors.
bnScale
Input. Pointer in device memory for the batch normalization scale parameter
(in the original paper scale is referred to as gamma). Note that the bnBias parameter
is not needed for this layer's computation.
resultBnScaleDiff,
resultBnBiasDiff
Outputs. Pointers in device memory for the resulting scale and bias
differentials computed by this routine. Note that scale and bias gradients are
not backpropagated below this layer (since they are dead-end computation
DAG nodes).
epsilon
savedMean,
savedInvVariance
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
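The quantities produced by cudnnBatchNormalizationBackward can be cross-checked against the standard batch normalization gradients. The sketch below handles one channel in spatial mode and assumes that savedInvVariance holds 1/sqrt(variance + epsilon) for that channel. It omits the alpha/beta blending, which is equivalent to alpha = 1 and beta = 0. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host reference for ONE channel in spatial mode: `count` is
 * N*H*W and x/dy/dx hold that channel's values contiguously. With
 * xhat = (x - savedMean)*savedInvVariance and y = scale*xhat + bias:
 *   dBias  = sum(dy)
 *   dScale = sum(dy * xhat)
 *   dx     = scale*savedInvVariance/count * (count*dy - dBias - xhat*dScale)
 * savedInvVariance is assumed to be 1/sqrt(variance + epsilon).
 * alpha/beta blending is omitted (alpha = 1, beta = 0). */
static void bn_backward_channel(size_t count, const double *x, const double *dy,
                                double scale, double savedMean,
                                double savedInvVariance,
                                double *dx, double *dScale, double *dBias)
{
    assert(count > 0);
    double sumDy = 0.0, sumDyXhat = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double xhat = (x[i] - savedMean) * savedInvVariance;
        sumDy += dy[i];
        sumDyXhat += dy[i] * xhat;
    }
    *dBias = sumDy;
    *dScale = sumDyXhat;
    double k = scale * savedInvVariance / (double)count;
    for (size_t i = 0; i < count; ++i) {
        double xhat = (x[i] - savedMean) * savedInvVariance;
        dx[i] = k * ((double)count * dy[i] - sumDy - xhat * sumDyXhat);
    }
}
```

A useful sanity check is that the dx values of a channel sum to zero. This holds because the normalized output does not change when the same constant is added to every input of that channel.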
4.98.cudnnDeriveBNTensorDescriptor
cudnnStatus_t CUDNNWINAPI cudnnDeriveBNTensorDescriptor(
cudnnTensorDescriptor_t derivedBnDesc,
const cudnnTensorDescriptor_t xDesc,
cudnnBatchNormMode_t mode);
xDesc is the descriptor for the layer's x data and has to be set up with proper
dimensions prior to calling this function.
Param
In/out
Meaning
derivedBnDesc
output
xDesc
input
mode
input
Possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
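The shape of the derived descriptor can be sketched as follows, assuming a 4D xDesc of shape N x C x H x W. The sketch applies the usual rule: 1xCx1x1 in spatial mode and 1xCxHxW in per-activation mode. This rule is not spelled out in the text above. The enum and function are hypothetical stand-ins, not cudnnBatchNormMode_t or cudnnDeriveBNTensorDescriptor itself.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two batch-normalization modes. */
typedef enum { BN_MODE_PER_ACTIVATION, BN_MODE_SPATIAL } bn_mode;

/* Derived scale/bias/mean/variance shape for a 4D x of shape {N, C, H, W}:
 * spatial -> {1, C, 1, 1}, per-activation -> {1, C, H, W} (assumed rule). */
static void derive_bn_dims(const int xDims[4], bn_mode mode, int dims[4])
{
    assert(xDims[0] > 0 && xDims[1] > 0 && xDims[2] > 0 && xDims[3] > 0);
    dims[0] = 1;
    dims[1] = xDims[1];
    dims[2] = (mode == BN_MODE_SPATIAL) ? 1 : xDims[2];
    dims[3] = (mode == BN_MODE_SPATIAL) ? 1 : xDims[3];
}
```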
4.99.cudnnCreateRNNDescriptor
cudnnStatus_t cudnnCreateRNNDescriptor(cudnnRNNDescriptor_t * rnnDesc)
This function creates a generic RNN descriptor object by allocating the memory needed
to hold its opaque structure.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.100.cudnnDestroyRNNDescriptor
cudnnStatus_t cudnnDestroyRNNDescriptor(cudnnRNNDescriptor_t rnnDesc)
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.101.cudnnSetRNNDescriptor
cudnnStatus_t
cudnnSetRNNDescriptor( cudnnRNNDescriptor_t rnnDesc,
int hiddenSize,
int numLayers,
cudnnDropoutDescriptor_t dropoutDesc,
cudnnRNNInputMode_t inputMode,
cudnnDirectionMode_t direction,
cudnnRNNMode_t mode,
cudnnDataType_t dataType )
Param
In/out
Meaning
rnnDesc
input/
output
hiddenSize
input
numLayers
input
dropoutDesc
input
inputMode
input
direction
input
mode
input
dataType
input
Math precision.
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.102.cudnnGetRNNWorkspaceSize
cudnnStatus_t
cudnnGetRNNWorkspaceSize( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const int seqLength,
const cudnnTensorDescriptor_t *xDesc,
size_t *sizeInBytes)
This function is used to query the amount of workspace required to execute the RNN
described by rnnDesc with input dimensions defined by xDesc.
Param
In/out
Meaning
handle
input
rnnDesc
input
seqLength
input
xDesc
input
sizeInBytes
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.103.cudnnGetRNNTrainingReserveSize
cudnnStatus_t
cudnnGetRNNTrainingReserveSize( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const int seqLength,
const cudnnTensorDescriptor_t *xDesc,
size_t *sizeInBytes)
This function is used to query the amount of reserve space required for training
the RNN described by rnnDesc with input dimensions defined by xDesc.
The same reserve space must be passed to cudnnRNNForwardTraining,
cudnnRNNBackwardData and cudnnRNNBackwardWeights. Each of these calls
overwrites the contents of the reserve space; however, the reserve space can safely be
copied if reuse is required.
Param
In/out
Meaning
handle
input
rnnDesc
input
seqLength
input
xDesc
input
sizeInBytes
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.104.cudnnGetRNNParamsSize
cudnnStatus_t
cudnnGetRNNParamsSize( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const cudnnTensorDescriptor_t xDesc,
size_t *sizeInBytes,
cudnnDataType_t dataType)
This function is used to query the amount of parameter space required to execute the
RNN described by rnnDesc with input dimensions defined by xDesc.
Param
In/out
Meaning
handle
input
rnnDesc
input
xDesc
input
sizeInBytes
output
dataType
input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_NOT_SUPPORTED
4.105.cudnnGetRNNLinLayerMatrixParams
cudnnStatus_t
cudnnGetRNNLinLayerMatrixParams( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const int layer,
const cudnnTensorDescriptor_t xDesc,
const cudnnFilterDescriptor_t wDesc,
const void * w,
const int linLayerID,
cudnnFilterDescriptor_t linLayerMatDesc,
void ** linLayerMat)
This function is used to obtain a pointer and descriptor for the matrix parameters in
the layer selected by layer within the RNN described by rnnDesc, with input
dimensions defined by xDesc.
Param
In/out
Meaning
handle
input
rnnDesc
input
layer
input
xDesc
input
wDesc
input
w
input
linLayerID
input
linLayerMatDesc
output
linLayerMat
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.106.cudnnGetRNNLinLayerBiasParams
cudnnStatus_t
cudnnGetRNNLinLayerBiasParams( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const int layer,
const cudnnTensorDescriptor_t xDesc,
const cudnnFilterDescriptor_t wDesc,
const void * w,
const int linLayerID,
cudnnFilterDescriptor_t linLayerBiasDesc,
void ** linLayerBias)
This function is used to obtain a pointer and descriptor for the bias parameters in the
layer selected by layer within the RNN described by rnnDesc, with input dimensions
defined by xDesc.
Param
In/out
Meaning
handle
input
rnnDesc
input
layer
input
xDesc
input
wDesc
input
w
input
linLayerID
input
linLayerBiasDesc
output
linLayerBias
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.107.cudnnRNNForwardInference
cudnnStatus_t
cudnnRNNForwardInference( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const int seqLength,
const cudnnTensorDescriptor_t * xDesc,
const void * x,
const cudnnTensorDescriptor_t hxDesc,
const void * hx,
const cudnnTensorDescriptor_t cxDesc,
const void * cx,
const cudnnFilterDescriptor_t wDesc,
const void * w,
const cudnnTensorDescriptor_t *yDesc,
void * y,
const cudnnTensorDescriptor_t hyDesc,
void * hy,
const cudnnTensorDescriptor_t cyDesc,
void * cy,
void * workspace,
size_t workSpaceSizeInBytes)
This routine executes the recurrent neural network described by rnnDesc with
inputs x, hx, cx, weights w and outputs y, hy, cy. workspace is required
for intermediate storage. This function does not store data required for training;
cudnnRNNForwardTraining should be used for that purpose.
Param
In/out
Meaning
handle
input
rnnDesc
input
seqLength
input
xDesc
input
x
input
hxDesc
input
hx
input
cxDesc
input
A fully packed tensor descriptor describing the initial cell state for
LSTM networks. The third dimension of the tensor depends on the
direction argument passed to the cudnnSetRNNDescriptor call
used to initialize rnnDesc:
cx
input
wDesc
input
w
input
yDesc
input
The first dimension of the tensor n must match the first dimension
of the tensor n in xDesc.
y
output
hyDesc
input
hy
output
cyDesc
input
A fully packed tensor descriptor describing the final cell state for
LSTM networks. The third dimension of the tensor depends on the
direction argument passed to the cudnnSetRNNDescriptor call
used to initialize rnnDesc:
numLayers argument passed to the cudnnSetRNNDescriptor call
used to initialize rnnDesc. The tensor must be fully packed.
cy
output
workspace
input
workSpaceSizeInBytes
input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
CUDNN_STATUS_ALLOC_FAILED
4.108.cudnnRNNForwardTraining
cudnnStatus_t
cudnnRNNForwardTraining( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const int seqLength,
const cudnnTensorDescriptor_t *xDesc,
const void * x,
const cudnnTensorDescriptor_t hxDesc,
const void * hx,
const cudnnTensorDescriptor_t cxDesc,
const void * cx,
const cudnnFilterDescriptor_t wDesc,
const void * w,
const cudnnTensorDescriptor_t *yDesc,
void * y,
const cudnnTensorDescriptor_t hyDesc,
void * hy,
const cudnnTensorDescriptor_t cyDesc,
void * cy,
void * workspace,
size_t workSpaceSizeInBytes,
void * reserveSpace,
size_t reserveSpaceSizeInBytes)
This routine executes the recurrent neural network described by rnnDesc with
inputs x, hx, cx, weights w and outputs y, hy, cy. workspace is required for
intermediate storage. reserveSpace stores data required for training. The same
reserveSpace data must be used for future calls to cudnnRNNBackwardData and
cudnnRNNBackwardWeights if these execute on the same input data.
Param
In/out
Meaning
handle
input
rnnDesc
input
seqLength
input
xDesc
input
x
input
hxDesc
input
hx
input
cxDesc
input
A fully packed tensor descriptor describing the initial cell state for
LSTM networks. The third dimension of the tensor depends on the
direction argument passed to the cudnnSetRNNDescriptor call
used to initialize rnnDesc:
cx
input
wDesc
input
w
input
yDesc
input
The first dimension of the tensor n must match the first dimension
of the tensor n in xDesc.
y
output
hyDesc
input
hy
output
cyDesc
input
A fully packed tensor descriptor describing the final cell state for
LSTM networks. The third dimension of the tensor depends on the
direction argument passed to the cudnnSetRNNDescriptor call
used to initialize rnnDesc:
cy
output
workspace
input
workSpaceSizeInBytes
input
reserveSpace
input/
output
reserveSpaceSizeInBytes input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
CUDNN_STATUS_ALLOC_FAILED
4.109.cudnnRNNBackwardData
cudnnStatus_t
cudnnRNNBackwardData( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const int seqLength,
const cudnnTensorDescriptor_t * yDesc,
const void * y,
const cudnnTensorDescriptor_t * dyDesc,
const void * dy,
const cudnnTensorDescriptor_t dhyDesc,
const void * dhy,
const cudnnTensorDescriptor_t dcyDesc,
const void * dcy,
const cudnnFilterDescriptor_t wDesc,
const void * w,
const cudnnTensorDescriptor_t hxDesc,
const void * hx,
const cudnnTensorDescriptor_t cxDesc,
const void * cx,
const cudnnTensorDescriptor_t * dxDesc,
void * dx,
const cudnnTensorDescriptor_t dhxDesc,
void * dhx,
const cudnnTensorDescriptor_t dcxDesc,
void * dcx,
void * workspace,
size_t workSpaceSizeInBytes,
const void * reserveSpace,
size_t reserveSpaceSizeInBytes )
This routine executes the recurrent neural network described by rnnDesc with output
gradients dy, dhy, dcy, weights w and input gradients dx, dhx, dcx. workspace
is required for intermediate storage. The data in reserveSpace must have previously
been generated by cudnnRNNForwardTraining. The same reserveSpace data must be
used for future calls to cudnnRNNBackwardWeights if they execute on the same input
data.
Param
In/out
Meaning
handle
input
rnnDesc
input
seqLength
input
yDesc
input
The first dimension of the tensor n must match the first dimension
of the tensor n in dyDesc.
y
input
dyDesc
input
dy
input
dhyDesc
input
dhy
input
dcyDesc
input
dcy
input
wDesc
input
w
input
hxDesc
input
hx
input
cxDesc
input
A fully packed tensor descriptor describing the initial cell state for
LSTM networks. The third dimension of the tensor depends on the
direction argument passed to the cudnnSetRNNDescriptor call
used to initialize rnnDesc:
cx
input
dxDesc
input
dx
output
dhxDesc
input
dhx
output
dcxDesc
input
dcx
output
workspace
input
workSpaceSizeInBytes
input
reserveSpace
input/
output
reserveSpaceSizeInBytes input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
CUDNN_STATUS_ALLOC_FAILED
4.110.cudnnRNNBackwardWeights
cudnnStatus_t
cudnnRNNBackwardWeights( cudnnHandle_t handle,
const cudnnRNNDescriptor_t rnnDesc,
const int seqLength,
const cudnnTensorDescriptor_t * xDesc,
const void * x,
const cudnnTensorDescriptor_t hxDesc,
const void * hx,
const cudnnTensorDescriptor_t * yDesc,
const void * y,
const void * workspace,
size_t workSpaceSizeInBytes,
const cudnnFilterDescriptor_t dwDesc,
void * dw,
const void * reserveSpace,
size_t reserveSpaceSizeInBytes )
This routine accumulates weight gradients dw from the recurrent neural network
described by rnnDesc with inputs x, hx, and outputs y. The mode of operation in this
case is additive: the weight gradients calculated will be added to those already existing
in dw. workspace is required for intermediate storage. The data in reserveSpace must
have previously been generated by cudnnRNNBackwardData.
Param
In/out
Meaning
handle
input
rnnDesc
input
seqLength
input
xDesc
input
x
input
hxDesc
input
hx
input
yDesc
input
The first dimension of the tensor n must match the first dimension
of the tensor n in xDesc.
y
input
workspace
input
workSpaceSizeInBytes
input
dwDesc
input
dw
input/
output
reserveSpace
input
reserveSpaceSizeInBytes input
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
CUDNN_STATUS_ALLOC_FAILED
4.111.cudnnCreateDropoutDescriptor
cudnnStatus_t cudnnCreateDropoutDescriptor(cudnnDropoutDescriptor_t * dropoutDesc)
This function creates a generic dropout descriptor object by allocating the memory
needed to hold its opaque structure.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.112.cudnnDestroyDropoutDescriptor
cudnnStatus_t cudnnDestroyDropoutDescriptor(cudnnDropoutDescriptor_t dropoutDesc)
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.113.cudnnDropoutGetStatesSize
cudnnStatus_t
cudnnDropoutGetStatesSize( cudnnHandle_t handle,
size_t * sizeInBytes);
This function is used to query the amount of space required to store the states of the
random number generators used by the cudnnDropoutForward function.
Param
In/out
Meaning
handle
input
sizeInBytes
output
The possible error values returned by this function and their meanings are listed below.
Return Value
Meaning
CUDNN_STATUS_SUCCESS
4.114.cudnnDropoutGetReserveSpaceSize
cudnnStatus_t
cudnnDropoutGetReserveSpaceSize( cudnnTensorDescriptor_t xDesc,
size_t * sizeInBytes);
This function is used to query the amount of reserve space needed to run dropout with the
input dimensions given by xDesc. The same reserve space is expected to be passed to
cudnnDropoutForward and cudnnDropoutBackward, and its contents are expected
to remain unchanged between the cudnnDropoutForward and cudnnDropoutBackward
calls.
Param                     In/out          Meaning
xDesc                     input
sizeInBytes               output
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
4.115.cudnnSetDropoutDescriptor
cudnnStatus_t
cudnnSetDropoutDescriptor( cudnnDropoutDescriptor_t dropoutDesc,
cudnnHandle_t handle,
float dropout,
void * states,
size_t stateSizeInBytes,
unsigned long long seed)
This function initializes a previously created dropout descriptor object. No other
function should be writing to the memory pointed at by the states argument while this
function is running. The user is expected not to change the memory pointed at by states
for the duration of the computation.
Param                     In/out          Meaning
dropoutDesc               input/output
handle                    input
dropout                   input
states                    output
stateSizeInBytes          input
seed                      input
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_INVALID_VALUE
CUDNN_STATUS_EXECUTION_FAILED
4.116.cudnnDropoutForward
cudnnStatus_t
cudnnDropoutForward( cudnnHandle_t handle,
const cudnnDropoutDescriptor_t dropoutDesc,
const cudnnTensorDescriptor_t xdesc,
const void * x,
const cudnnTensorDescriptor_t ydesc,
void * y,
void * reserveSpace,
size_t reserveSpaceSizeInBytes)
This function performs a forward dropout operation over x, returning results in y.
Param                     In/out          Meaning
handle                    input
dropoutDesc               input
xDesc                     input
x                         input
yDesc                     input
y                         output
reserveSpace              output
reserveSpaceSizeInBytes   input           Specifies the size in bytes of the provided memory for the reserve space.
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
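The element-wise behavior of forward dropout can be sketched on the host in plain C. This is a minimal model, not the cuDNN kernel, and it rests on three assumptions:

- Kept values are scaled by 1/(1 - dropout), following the common inverted-dropout convention.
- cuDNN's opaque RNG states are replaced by a hypothetical xorshift32 generator.
- The opaque reserveSpace is represented by a one-byte-per-element keep mask. The real reserve-space layout is not documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical xorshift32 generator standing in for cuDNN's RNG states. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Host-side sketch of forward dropout. mask plays the role of the reserve
   space: it records which elements were kept so a backward pass can reuse it. */
static void dropout_forward_ref(float dropout, uint32_t *rng_state,
                                const float *x, float *y,
                                unsigned char *mask, size_t n)
{
    assert(dropout >= 0.0f && dropout < 1.0f);
    assert(*rng_state != 0); /* xorshift32 must not start from zero */
    const float scale = 1.0f / (1.0f - dropout);
    for (size_t i = 0; i < n; ++i) {
        /* Top 24 bits divided by 2^24 give a uniform value in [0, 1). */
        float u = (float)(xorshift32(rng_state) >> 8) / 16777216.0f;
        mask[i] = (u >= dropout);
        y[i] = mask[i] ? x[i] * scale : 0.0f;
    }
}
```

With dropout equal to 0, every element is kept and y equals x. For any other value, each output is either 0 or the scaled input.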
4.117.cudnnDropoutBackward
cudnnStatus_t
cudnnDropoutBackward( cudnnHandle_t handle,
const cudnnDropoutDescriptor_t dropoutDesc,
const cudnnTensorDescriptor_t dydesc,
const void * dy,
const cudnnTensorDescriptor_t dxdesc,
void * dx,
void * reserveSpace,
size_t reserveSpaceSizeInBytes)
This function performs a backward dropout operation over dy, returning results in dx.
If during the forward dropout operation a value from x was propagated to y, then during
the backward operation the corresponding value from dy will be propagated to dx;
otherwise, the dx value will be set to 0.
Better performance is obtained for fully packed tensors.
Param                     In/out          Meaning
handle                    input
dropoutDesc               input
dyDesc                    input
dy                        input
dxDesc                    input
dx                        output
reserveSpace              input
reserveSpaceSizeInBytes   input           Specifies the size in bytes of the provided memory for the reserve space.
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
CUDNN_STATUS_EXECUTION_FAILED
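The backward rule described for cudnnDropoutBackward can be sketched on the host as well. In this hypothetical model, mask stands in for the reserve space written during the forward pass. Kept gradients are multiplied by 1/(1 - dropout), on the assumption that the forward pass scaled kept values by that factor.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side sketch of backward dropout: dy flows to dx only where the
   forward pass kept the element; everywhere else dx is set to 0. */
static void dropout_backward_ref(float dropout, const unsigned char *mask,
                                 const float *dy, float *dx, size_t n)
{
    assert(dropout >= 0.0f && dropout < 1.0f);
    const float scale = 1.0f / (1.0f - dropout); /* assumed forward scale */
    for (size_t i = 0; i < n; ++i)
        dx[i] = mask[i] ? dy[i] * scale : 0.0f;
}
```

The mask must be the one produced by the matching forward call. This mirrors the requirement that the reserve space stays unchanged between cudnnDropoutForward and cudnnDropoutBackward.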
4.118.cudnnCreateSpatialTransformerDescriptor
cudnnStatus_t
cudnnCreateSpatialTransformerDescriptor(
cudnnSpatialTransformerDescriptor_t *stDesc)
This function creates a generic spatial transformer descriptor object by allocating the
memory needed to hold its opaque structure.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_ALLOC_FAILED
4.119.cudnnDestroySpatialTransformerDescriptor
cudnnStatus_t
cudnnDestroySpatialTransformerDescriptor(
cudnnSpatialTransformerDescriptor_t stDesc)
This function destroys a previously created spatial transformer descriptor object.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
4.120.cudnnSetSpatialTransformerNdDescriptor
cudnnStatus_t
cudnnSetSpatialTransformerNdDescriptor( cudnnSpatialTransformerDescriptor_t stDesc,
cudnnSamplerType_t samplerType,
cudnnDataType_t dataType,
const int nbDims,
const int dimA[]);
This function initializes a previously created generic spatial transformer descriptor object.
Param                     In/out          Meaning
stDesc                    input/output
samplerType               input
dataType                  input           Data type.
nbDims                    input
dimA                      input
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM
4.121.cudnnSpatialTfGridGeneratorForward
cudnnStatus_t
cudnnSpatialTfGridGeneratorForward( cudnnHandle_t handle,
const cudnnSpatialTransformerDescriptor_t stDesc,
const void * theta,
void * grid)
This function generates a grid of coordinates in the input tensor corresponding to each
pixel from the output tensor.
Only 2d transformation is supported.
Param                     In/out          Meaning
handle                    input
stDesc                    input
theta                     input
grid                      output
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM          handle is NULL.
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_EXECUTION_FAILED
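The grid generator's job can be illustrated with a small host-side sketch for a single image. The sketch assumes the following conventions, which describe this model rather than cuDNN's internals:

- theta is a 2x3 row-major affine matrix.
- grid is laid out as h*w pairs of (x, y).
- Output pixel coordinates are normalized to [-1, 1], with the corner pixels mapped to -1 and 1.

```c
#include <assert.h>

/* Host-side sketch of a 2-D affine grid generator for one image.
   theta: 2x3 row-major matrix applied to (xt, yt, 1).
   grid:  h * w * 2 floats; output pixel (i, j) receives its source (x, y). */
static void affine_grid_ref(const float theta[6], int h, int w, float *grid)
{
    assert(h > 1 && w > 1);
    for (int i = 0; i < h; ++i) {
        /* Normalized target row coordinate in [-1, 1] (corners aligned). */
        float yt = -1.0f + 2.0f * (float)i / (float)(h - 1);
        for (int j = 0; j < w; ++j) {
            float xt = -1.0f + 2.0f * (float)j / (float)(w - 1);
            float *g = grid + 2 * (i * w + j);
            g[0] = theta[0] * xt + theta[1] * yt + theta[2];
            g[1] = theta[3] * xt + theta[4] * yt + theta[5];
        }
    }
}
```

An identity theta reproduces the normalized pixel positions. Changing theta[2] and theta[5] shifts every sampling point by the same amount.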
4.122.cudnnSpatialTfGridGeneratorBackward
cudnnStatus_t
cudnnSpatialTfGridGeneratorBackward( cudnnHandle_t handle,
const cudnnSpatialTransformerDescriptor_t stDesc,
const void * dgrid,
void * dtheta)
This function computes the gradient of a grid generation operation.
Param                     In/out          Meaning
handle                    input
stDesc                    input
dgrid                     input
dtheta                    output
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM          handle is NULL.
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_EXECUTION_FAILED
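Each grid entry is an affine function of theta. The gradient dtheta is therefore the sum, over all output pixels, of the outer product of dgrid with the homogeneous target coordinate (xt, yt, 1). The host-side sketch below covers one image and assumes three conventions: a corner-aligned [-1, 1] normalization of output pixel positions, a 2x3 row-major theta, and an h*w*2 grid layout.

```c
#include <assert.h>

/* Host-side sketch of the affine grid generator's gradient for one image:
   dtheta[r][c] = sum over pixels of dgrid[pixel][r] * (xt, yt, 1)[c]. */
static void affine_grid_backward_ref(const float *dgrid, int h, int w,
                                     float dtheta[6])
{
    assert(h > 1 && w > 1);
    for (int k = 0; k < 6; ++k)
        dtheta[k] = 0.0f;
    for (int i = 0; i < h; ++i) {
        float yt = -1.0f + 2.0f * (float)i / (float)(h - 1);
        for (int j = 0; j < w; ++j) {
            float xt = -1.0f + 2.0f * (float)j / (float)(w - 1);
            const float *g = dgrid + 2 * (i * w + j);
            for (int r = 0; r < 2; ++r) {  /* r = 0: x row, r = 1: y row */
                dtheta[3 * r + 0] += g[r] * xt;
                dtheta[3 * r + 1] += g[r] * yt;
                dtheta[3 * r + 2] += g[r];
            }
        }
    }
}
```

A uniform dgrid contributes only to the translation terms dtheta[2] and dtheta[5]. This is because the normalized coordinates are symmetric around zero.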
4.123.cudnnSpatialTfSamplerForward
cudnnStatus_t
cudnnSpatialTfSamplerForward( cudnnHandle_t handle,
const cudnnSpatialTransformerDescriptor_t stDesc,
const void * alpha,
const cudnnTensorDescriptor_t xDesc,
const void * x,
const void * grid,
const void * beta,
cudnnTensorDescriptor_t yDesc,
void * y)
This function performs a sampler operation and generates the output tensor using the
grid given by the grid generator.
Only 2d transformation is supported.
Param                     In/out          Meaning
handle                    input
stDesc                    input
alpha,beta                input
xDesc                     input
x                         input
grid                      input
yDesc                     input
y                         output
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM          handle is NULL.
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_EXECUTION_FAILED
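The sampling step can be sketched for one single-channel image using bilinear interpolation, with the result blended into y as alpha * sampled + beta * y. This hypothetical host-side model makes three assumptions:

- grid holds (x, y) pairs normalized to [-1, 1], with corners aligned.
- Out-of-range neighbors contribute zero.
- y is not read when beta is 0.

```c
#include <assert.h>

/* Reads x[r][c] from an h x w image; out-of-range pixels read as 0. */
static float pixel_or_zero(const float *x, int h, int w, int r, int c)
{
    return (r >= 0 && r < h && c >= 0 && c < w) ? x[r * w + c] : 0.0f;
}

/* Bilinear sampling of one single-channel h x w image at oh * ow grid
   points, blended into y as alpha * sampled + beta * y. */
static void bilinear_sampler_ref(float alpha, const float *x, int h, int w,
                                 const float *grid, float beta,
                                 float *y, int oh, int ow)
{
    assert(h > 1 && w > 1);
    for (int k = 0; k < oh * ow; ++k) {
        /* Map [-1, 1] onto [0, w - 1] and [0, h - 1] (corners aligned). */
        float px = (grid[2 * k] + 1.0f) * 0.5f * (float)(w - 1);
        float py = (grid[2 * k + 1] + 1.0f) * 0.5f * (float)(h - 1);
        /* (int) truncates toward zero; step down to get the floor. */
        int c0 = (int)px, r0 = (int)py;
        if ((float)c0 > px) --c0;
        if ((float)r0 > py) --r0;
        float fx = px - (float)c0, fy = py - (float)r0;
        float v = (1.0f - fx) * (1.0f - fy) * pixel_or_zero(x, h, w, r0, c0)
                + fx * (1.0f - fy) * pixel_or_zero(x, h, w, r0, c0 + 1)
                + (1.0f - fx) * fy * pixel_or_zero(x, h, w, r0 + 1, c0)
                + fx * fy * pixel_or_zero(x, h, w, r0 + 1, c0 + 1);
        /* Skip reading y when beta is 0, so uninitialized output is harmless. */
        y[k] = (beta == 0.0f) ? alpha * v : alpha * v + beta * y[k];
    }
}
```

A real NCHW tensor would repeat the same interpolation for every channel, using the shared grid of the corresponding image.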
4.124.cudnnSpatialTfSamplerBackward
cudnnStatus_t
cudnnSpatialTfSamplerBackward( cudnnHandle_t handle,
const cudnnSpatialTransformerDescriptor_t stDesc,
const void * alpha,
const cudnnTensorDescriptor_t xDesc,
const void * x,
const void * beta,
const cudnnTensorDescriptor_t dxDesc,
void * dx,
const void * alphaDgrid,
const cudnnTensorDescriptor_t dyDesc,
const void * dy,
const void * grid,
const void * betaDgrid,
void * dgrid)
This function computes the gradient of a sampling operation.
Param                     In/out          Meaning
handle                    input
stDesc                    input
alpha,beta                input
xDesc                     input
x                         input
dxDesc                    input
dx                        output
alphaDgrid,betaDgrid      input
dyDesc                    input
dy                        input
grid                      input
dgrid                     output
The possible error values returned by this function and their meanings are listed below.
Return Value                    Meaning
CUDNN_STATUS_SUCCESS
CUDNN_STATUS_BAD_PARAM          handle is NULL.
CUDNN_STATUS_NOT_SUPPORTED
CUDNN_STATUS_EXECUTION_FAILED
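The gradients produced by the backward sampler follow from differentiating bilinear interpolation:

- Each dy value is scattered into the four neighboring dx pixels, using the interpolation weights.
- dgrid receives dy times the local slope of the image, rescaled by the [-1, 1]-to-pixel mapping.

The host-side sketch below covers one single-channel image. It is a model of the math, not cuDNN's implementation, and it assumes three things:

- Grid coordinates use a corner-aligned [-1, 1] normalization.
- Out-of-range neighbors contribute zero.
- Blending follows dx = alpha * computed + beta * dx and dgrid = alphaDgrid * computed + betaDgrid * dgrid.

```c
#include <assert.h>

static float read_or_zero(const float *x, int h, int w, int r, int c)
{
    return (r >= 0 && r < h && c >= 0 && c < w) ? x[r * w + c] : 0.0f;
}

/* Adds val to dx[r][c], ignoring out-of-range pixels. */
static void add_if_inside(float *dx, int h, int w, int r, int c, float val)
{
    if (r >= 0 && r < h && c >= 0 && c < w)
        dx[r * w + c] += val;
}

static void bilinear_sampler_backward_ref(
    float alpha, const float *x, int h, int w, float beta, float *dx,
    float alphaDgrid, const float *dy, const float *grid,
    float betaDgrid, float *dgrid, int oh, int ow)
{
    assert(h > 1 && w > 1);
    /* Apply the beta blend first (without reading dx when beta is 0);
       the loop below then adds alpha times the computed gradient. */
    for (int i = 0; i < h * w; ++i)
        dx[i] = (beta == 0.0f) ? 0.0f : beta * dx[i];
    for (int k = 0; k < oh * ow; ++k) {
        float px = (grid[2 * k] + 1.0f) * 0.5f * (float)(w - 1);
        float py = (grid[2 * k + 1] + 1.0f) * 0.5f * (float)(h - 1);
        int c0 = (int)px, r0 = (int)py;
        if ((float)c0 > px) --c0; /* floor for negative coordinates */
        if ((float)r0 > py) --r0;
        float fx = px - (float)c0, fy = py - (float)r0;
        /* Scatter alpha * dy with the bilinear weights. */
        float g = alpha * dy[k];
        add_if_inside(dx, h, w, r0, c0, g * (1.0f - fx) * (1.0f - fy));
        add_if_inside(dx, h, w, r0, c0 + 1, g * fx * (1.0f - fy));
        add_if_inside(dx, h, w, r0 + 1, c0, g * (1.0f - fx) * fy);
        add_if_inside(dx, h, w, r0 + 1, c0 + 1, g * fx * fy);
        /* Partial derivatives of the interpolated value w.r.t. px and py. */
        float x00 = read_or_zero(x, h, w, r0, c0);
        float x01 = read_or_zero(x, h, w, r0, c0 + 1);
        float x10 = read_or_zero(x, h, w, r0 + 1, c0);
        float x11 = read_or_zero(x, h, w, r0 + 1, c0 + 1);
        float dpx = (1.0f - fy) * (x01 - x00) + fy * (x11 - x10);
        float dpy = (1.0f - fx) * (x10 - x00) + fx * (x11 - x01);
        /* Chain rule through the [-1, 1] -> pixel mapping. */
        float gx = dy[k] * dpx * 0.5f * (float)(w - 1);
        float gy = dy[k] * dpy * 0.5f * (float)(h - 1);
        float *dg = dgrid + 2 * k;
        dg[0] = alphaDgrid * gx + (betaDgrid == 0.0f ? 0.0f : betaDgrid * dg[0]);
        dg[1] = alphaDgrid * gy + (betaDgrid == 0.0f ? 0.0f : betaDgrid * dg[1]);
    }
}
```

The image {1, 2, 3, 4} is linear in both axes, so the check in this example is simple. Sampling it at the center spreads a unit dy evenly over the four pixels and yields slopes of 0.5 and 1.0 in normalized coordinates.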
Chapter5.
ACKNOWLEDGMENTS
Some of the cuDNN library routines were derived from code developed by others and
are subject to the following:
5.1.University of Tennessee
Copyright (c) 2010 The University of Tennessee.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer listed in this license in the documentation and/or
other materials provided with the distribution.
* Neither the name of the copyright holders nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS,
DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY,
"MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES,
EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE
MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR
PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA
Corporation assumes no responsibility for the consequences of use of such
information or for any infringement of patents or other rights of third parties
that may result from its use. No license is granted by implication or otherwise
under any patent rights of NVIDIA Corporation. Specifications mentioned in this
publication are subject to change without notice. This publication supersedes and
replaces all other information previously supplied. NVIDIA Corporation products
are not authorized as critical components in life support devices or systems
without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA
Corporation in the U.S. and other countries. Other company and product names
may be trademarks of the respective companies with which they are associated.
Copyright
2007-2016 NVIDIA Corporation. All rights reserved.