How To Run CUDA C
What is CUDA?
CUDA is a parallel computing platform and application programming interface (API)
created by NVIDIA. It lets you harness the power of NVIDIA GPUs to dramatically
increase computing performance.
What is Google Colab?
Google Colab is a free cloud service, and the most important feature that distinguishes
Colab from other free cloud services is that it offers a GPU completely free of charge.
With Colab you can work on the GPU with CUDA C/C++ for free!
CUDA code will not run on an AMD CPU or on Intel HD graphics; you need NVIDIA
hardware inside your machine. On Colab you can take advantage of an NVIDIA GPU, and
you also get a fully functional Jupyter Notebook with TensorFlow and some other
ML/DL tools pre-installed.
I guess I’m done with the introduction. Let’s configure our learning environment.
Step 1: Go to https://colab.research.google.com in your browser and click on New
Notebook.
Step 2: We need to switch our runtime from CPU to GPU. Click on Runtime >
Change runtime type > Hardware Accelerator > GPU > Save.
Step 3: Completely uninstall any previous CUDA versions. This refreshes the CUDA
installation on the cloud instance.
Step 4: Install CUDA version 9.2 (you can copy these commands into a separate code cell).
!wget https://developer.nvidia.com/compute/cuda/9.2/Prod/local_installers/cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64 -O cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
!dpkg -i cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
!apt-key add /var/cuda-repo-9-2-local/7fa2af80.pub
!apt-get update
!apt-get install cuda-9.2
Step 5: Now you can check your CUDA installation by running the command
given below:
!nvcc --version
Output will be something like this:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:29_CDT_2018
Cuda compilation tools, release 9.2, V9.2.88
Step 6: Run the given command to install a small extension to run nvcc from
the Notebook cells.
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc_plugin
Step 7: Execute the code given below to check whether CUDA is working.
Now we are ready to run CUDA C/C++ code right in the notebook.
Important note: write the following code in its own code cell, and re-run only that cell
whenever you update the code.
To compile and run the code in your notebook, add the %%cu cell magic at the beginning of the cell.
%%cu
#include <iostream>

int main()
{
    std::cout << "Welcome To GeeksforGeeks\n";
    return 0;
}
Output:
Welcome To GeeksforGeeks
I suggest trying a program that finds the maximum element of a vector to check that
everything works properly.
%%cu
#include <cstdlib>
#include <iostream>

// Each block (launched with a single thread) scans one chunk of up to
// 256 elements and writes that chunk's maximum to b[blockIdx.x].
__global__ void maxi(const int* a, int* b, int n)
{
    int start = 256 * blockIdx.x;
    int end = start + 256;
    if (end > n) {
        end = n;
    }
    int best = a[start]; // start < n because the grid is sized from n
    for (int i = start + 1; i < end; i++) {
        if (a[i] > best) {
            best = a[i];
        }
    }
    b[blockIdx.x] = best;
}

int main()
{
    int n = 3 << 2; // 12 elements
    int* a = new int[n];
    for (int i = 0; i < n; i++) {
        a[i] = std::rand() % n;
        std::cout << a[i] << "\t";
    }
    std::cout << std::endl;

    int *ad, *bd;
    size_t size = n * sizeof(int);
    cudaMalloc(&ad, size);
    cudaMemcpy(ad, a, size, cudaMemcpyHostToDevice);
    int grids = (n + 255) / 256; // integer ceil(n / 256)
    cudaMalloc(&bd, grids * sizeof(int));
    dim3 block(1, 1); // one thread per block

    cudaEvent_t start, end;
    cudaEventCreate(&start);
    cudaEventCreate(&end);
    cudaEventRecord(start);
    // Reduce in passes until a single maximum is left in ad[0].
    while (n > 1) {
        grids = (n + 255) / 256; // recompute the grid for the current size
        maxi<<<grids, block>>>(ad, bd, n);
        n = grids;
        cudaMemcpy(ad, bd, n * sizeof(int), cudaMemcpyDeviceToDevice);
    }
    cudaEventRecord(end);
    cudaEventSynchronize(end);
    float time = 0;
    cudaEventElapsedTime(&time, start, end); // milliseconds

    int ans;
    cudaMemcpy(&ans, ad, sizeof(int), cudaMemcpyDeviceToHost);
    std::cout << "The maximum element is : " << ans << std::endl;
    std::cout << "The time required : " << time << std::endl;

    cudaEventDestroy(start);
    cudaEventDestroy(end);
    cudaFree(ad);
    cudaFree(bd);
    delete[] a;
    return 0;
}
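Output:
The program first prints the generated numbers, then "The maximum element is : "
followed by the largest of them, and "The time required : " followed by the elapsed
time in milliseconds. The exact values depend on rand() and on your GPU.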