I have a deep learning framework that uses 11 GB out of the 12 GB on a Tesla K80. To understand where this memory goes, I collected an nvprof trace with --print-gpu-trace. The nvprof output only shows four CUDA memcpys of 4 bytes each.
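For reference, this is roughly how I'm cross-checking the device memory usage outside of nvprof (just a sketch using the CUDA runtime's cudaMemGetInfo, not part of the framework itself):

```cpp
// Sketch: report how much memory is currently in use on the active GPU.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    // cudaMemGetInfo returns the free and total device memory in bytes.
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Used: %.2f GB / %.2f GB\n",
           (total_bytes - free_bytes) / 1e9, total_bytes / 1e9);
    return 0;
}
```

This confirms the ~11 GB figure, but it doesn't tell me which allocations are responsible.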
What would be the best tool or option to trace these memory allocations on the GPU?