Programming on GPU Parallel Processors with CUDA Core Support

The document describes a project to compare CPU and GPU performance for calculating the sum of prime numbers. It includes: 1) An overview of the research subjects including parallel processing, C++ programming, and CUDA programming. 2) Work assignments for group members on different aspects of the project. 3) Code examples for finding prime numbers and calculating their sum on the CPU and GPU, including flowcharts showing the parallel processing approach on the GPU.

Uploaded by

Huy Huy

HO CHI MINH UNIVERSITY OF

TECHNOLOGY AND EDUCATION

FACULTY FOR HIGH QUALITY TRAINING

COMPUTER ORGANIZATION AND

ARCHITECTURE
***
FINAL ESSAY

PROJECT: THE GPU PROGRAMMING PARALLEL

WITH CUDA CORE SUPPORT

NGÔ TIẾN TÚ 20119175

LÊ TRỌNG HOÀNG 20119132
DƯƠNG HOÀNG GIA 20119129
NGUYỄN THỊ LÂM TRÚC 20119172
HỒ NGUYỄN MINH THƯ 20119166

Ho Chi Minh City, …., ………2022

Content
1. Introduction
2. Overview
2.1. Research subjects
2.2. Support Tools
2.3. Work Assignment
3. Sum of prime number
3.1 Flowchart
3.2. Coding
3.3 Result
4. Image processing problem
4.1 Flowchart
4.2 Coding
4.3 Result
5. CONCLUSIONS AND FUTURE WORK
6. References

1. Introduction
The invention of the computer paved the way for the digital age, and its
applications in numerical computation and image analysis remain in high
demand today.
In the Computer Architecture and Organization course, we learned about the
architecture of a computer, including components such as the Central
Processing Unit (CPU), the Arithmetic Logic Unit (ALU), memory, I/O, and the
bus system. This raises a question: how can parallel processors be used to
process data, and how do they compare with a conventional CPU?
To answer it, the team investigated why the same computation can run at
different speeds on a CPU and on a GPU, since the two processors parallelize
work differently. To observe the difference, the group studied CUDA and C++
and implemented one numerical example and one image processing example.
Since its initial release in 2007, the Compute Unified Device Architecture
(CUDA) has become the de facto standard for using Graphics Processing Units
(GPUs) for non-graphical applications. NVIDIA's CUDA is a widely used
parallel computing platform and programming model, and it works only with
NVIDIA GPUs. OpenCL is a separate, vendor-neutral open standard that can be
used to write parallel code for other GPUs, such as those from AMD and Intel.
With simple programming APIs, CUDA allows the creation of massively parallel
applications that run on GPUs. CUDA C/C++ lets C and C++ developers
accelerate their applications by using the GPU. CUDA programs look like plain
C or C++ programs, with a few added keywords that express GPU parallelism.
As a result, the team decided to research Programming on GPU Parallel
Processors with CUDA Core Support in order to gain better insight into the
problem.
2. Overview

2.1. Research subjects:

- Parallel processing is a computing technique that uses two or more
processors (or processor cores) to handle different parts of a larger task.
Splitting a task across multiple processors can reduce the time it takes to
run a program.
- C++ is a general-purpose programming language with object-oriented
features, developed by the computer scientist Bjarne Stroustrup as an
extension of the C language. It was created as a cross-platform enhancement
to C that gives developers more control over memory and system resources.
- CUDA is a parallel computing platform and application programming
interface (API) that enables software to use certain types of graphics
processing units (GPUs) for general-purpose processing, a technique known as
general-purpose computing on GPUs (GPGPU).
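
The idea of splitting one task across several processors can be sketched in
plain C++ with std::thread. This is only an illustration, not the program
used in this report: the names is_prime and parallel_prime_sum are
hypothetical, and the work is divided by giving worker w every value i with
(i - 2) % workers == w.

```cpp
#include <thread>
#include <vector>

// Trial-division primality test (stops at sqrt(v) to keep the sketch fast).
bool is_prime(int v) {
    if (v < 2) return false;
    for (int j = 2; j <= v / j; ++j)
        if (v % j == 0) return false;
    return true;
}

// Hypothetical helper: sums the primes in [2, n] using `workers` threads.
// Each thread writes only its own slot of `partial`, so no locking is needed.
long long parallel_prime_sum(int n, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, w, workers, n] {
            long long local = 0;
            const int step = static_cast<int>(workers);
            for (int i = 2 + static_cast<int>(w); i <= n; i += step)
                if (is_prime(i)) local += i;
            partial[w] = local;
        });
    }
    for (auto& t : pool) t.join();   // wait for every worker to finish
    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

Calling parallel_prime_sum(n, std::thread::hardware_concurrency()) would use
one worker per hardware thread reported by the system; the result is the same
for any worker count, only the running time changes.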

2.2. Support Tools

- Microsoft Visual Studio is an integrated development environment (IDE)
developed by Microsoft for many kinds of software development, including
desktop programs, websites, web apps, web services, and mobile apps. It
includes code completion tools, compilers, and other features that make
software development easier.

2.3. Work Assignment

Members: Ngô Tiến Tú, Lê Trọng Hoàng, Dương Hoàng Gia, Nguyễn Thị Lâm Trúc,
Hồ Nguyễn Minh Thư

Works (one ✓ per assigned member):
- Overview of CUDA programming: ✓ ✓
- The functioning of parallel processors: ✓ ✓ ✓
- Program: ✓ ✓ ✓
- PowerPoint: ✓ ✓
- Report: ✓ ✓ ✓ ✓ ✓

3. Sum of prime number
3.1 Flowchart

Figure 1: Sum of prime number problem’s flowchart

- Following the flowchart, we wrote two programs: one for the CPU and one for
the GPU.

3.2. Coding

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

//Find prime numbers: s[i] = i if i is prime, otherwise s[i] = 0
void FPN(int *s, int n)
{
    int i, j;
    bool t;
    for (i = 2; i <= n; i++)
    {
        t = true;
        //Trial division by every j in [2, i)
        for (j = 2; j < i; j++)
        {
            if (i % j == 0)
            {
                t = false;
                break;
            }
        }
        s[i] = t ? i : 0;
    }
}
//Calculate the sum of prime numbers
void ttcpu(int *s, int *tt, int n)
{
    int i;
    tt[0] = 0;
    for (i = 2; i <= n; i++)
    {
        tt[0] = tt[0] + s[i];
    }
}
int main(void)
{
    //Introduce program
    printf("Program : Find the sum of prime numbers from 1 to n using CPU \n\n");
    //Declare variables
    int n, *s, *tt;
    clock_t start, end;
    double time_use;
    //Enter the value n (1000 < n < 9999)
    //printf_s and scanf_s are the Visual Studio variants of printf and scanf
    printf_s("Enter n : \nn=");
    if (scanf_s("%d", &n) != 1 || n < 2)
    {
        return 1;
    }
    //Memory allocation
    s = (int*)malloc((n + 1) * sizeof(int));
    tt = (int*)malloc(sizeof(int));
    if (s == NULL || tt == NULL)
    {
        return 1;
    }
    //Start recording time
    start = clock();
    //Find primes and calculate the sum of primes from 1 to n
    FPN(s, n);
    ttcpu(s, tt, n);
    //Finish recording time
    end = clock();
    //Find time CPU used
    time_use = (double)(end - start) / CLOCKS_PER_SEC;
    //Print results
    printf("Sum of primes from 1 to %d : %d\n", n, tt[0]);
    printf_s("Time used : %lfs\n", time_use);
    //Release memory
    free(s);
    free(tt);
    return 0;
}

Figure 2: Code for sum of prime number on CPU

#include <stdio.h>
#include <time.h>
#include <cuda_runtime.h>

//Find prime numbers: the thread with global index i tests whether i is prime
__global__ void FPN(int *s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x, j;
    bool t = true;
    if ((i >= 2) && (i <= n))
    {
        //Trial division by every j in [2, i); the loop is empty for i == 2
        for (j = 2; j < i; j++)
        {
            if (i % j == 0)
            {
                t = false;
                break;
            }
        }
        s[i] = t ? i : 0;
    }
}
//Calculate the sum of prime numbers (runs serially in a single GPU thread)
__global__ void ttgpu(int *s, int *tt, int n)
{
    int i;
    tt[0] = 0;
    for (i = 2; i <= n; i++)
    {
        tt[0] = tt[0] + s[i];
    }
}
int main(void)
{
    //Introduce program
    printf("Program : Find the sum of prime numbers from 1 to n using GPU \n\n");
    //Declare variables
    int n, *s, *tt;
    clock_t start, end;
    double time_use;
    //Enter the value n (1000 < n < 9999)
    printf_s("Enter n : \nn=");
    scanf_s("%d", &n);
    //Memory allocation (managed memory is accessible from both CPU and GPU)
    cudaMallocManaged((void**)&s, (n + 1) * sizeof(int));
    cudaMallocManaged((void**)&tt, sizeof(int));
    //Start recording time
    start = clock();
    //Find primes and calculate the sum of primes from 1 to n
    //n + 1 blocks of one thread each, so block b tests the value b
    FPN<<<n + 1, 1>>>(s, n);
    ttgpu<<<1, 1>>>(s, tt, n);
    //Wait for both kernels to finish before reading tt on the host
    cudaDeviceSynchronize();
    //Finish recording time
    end = clock();
    //Find time GPU used
    time_use = (double)(end - start) / CLOCKS_PER_SEC;
    //Print results
    printf_s("Sum of primes from 1 to %d : %d\n", n, tt[0]);
    printf_s("Time used : %lfs\n", time_use);
    //Release memory
    cudaFree(s);
    cudaFree(tt);
    return 0;
}

Figure 3: Code for sum of prime number on GPU
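
The kernel in Figure 3 identifies the number each thread tests through the
expression blockIdx.x * blockDim.x + threadIdx.x. The host-side C++ sketch
below reproduces only that index arithmetic, so it can be checked without a
GPU. The helper names global_index and blocks_needed are hypothetical and are
not part of the CUDA API.

```cpp
#include <cassert>

// Global index of a thread in a one-dimensional launch,
// mirroring blockIdx.x * blockDim.x + threadIdx.x in the kernel.
int global_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Smallest number of blocks such that
// blocks * threads_per_block >= work_items (rounded-up integer division).
int blocks_needed(int work_items, int threads_per_block) {
    assert(threads_per_block > 0);
    return (work_items + threads_per_block - 1) / threads_per_block;
}
```

For n = 9173 there are n + 1 = 9174 indices. With one thread per block, as in
the launch above, blocks_needed(9174, 1) is 9174 blocks. With 256 threads per
block it would be 36 blocks covering 9216 indices, and the kernel's check
i <= n would leave the 42 surplus threads idle.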

3.3 Result

Figure 4: Sum of prime numbers from 1 to 9173 computed on the CPU

Figure 5: Sum of prime numbers from 1 to 9173 computed on the GPU
The two figures above show the execution times of the CPU and the GPU when
each runs the program (sum of the prime numbers from 1 to n, where n is
entered by the user). The results show that the CPU's execution time is
lower than the GPU's. We therefore conclude that, for this problem at this
input size, the CPU performs the calculation faster than the GPU.

Figure 6: GPU usage while the program is running

4. Image processing problem

4.1 Flowchart
The flowchart works as follows. The first step is to read the picture from
storage. The program then reads the properties of the picture and processes
each pixel from beginning to end. In this case, the program increases the
red component. The image used in this problem is in PPM format, with 8 bits
representing the red component (values 0 to 255). If the red value of the
current pixel plus 50 is greater than 255, the new value is 255; otherwise
the new value is the old value plus 50.
Following the flowchart, we wrote two programs intended to produce the same
result. The first runs on the CPU and the other on the GPU.

Figure 7: The flowchart of image process
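
The per-pixel rule in the flowchart is a saturating addition on an 8-bit
value. A minimal C++ sketch of that rule follows. It assumes the pixels are
already in memory as interleaved R, G, B bytes, which is a simplification of
the file-based program in this report, and the names add_saturated and
increase_red are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds `amount` to one 8-bit channel and clamps the result to [0, 255].
std::uint8_t add_saturated(std::uint8_t value, int amount) {
    int sum = static_cast<int>(value) + amount;
    return static_cast<std::uint8_t>(std::max(0, std::min(255, sum)));
}

// Increases the red component of every pixel in an interleaved RGB buffer
// (R, G, B, R, G, B, ...). Green and blue are left unchanged.
void increase_red(std::vector<std::uint8_t>& rgb, int amount) {
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3)
        rgb[i] = add_saturated(rgb[i], amount);
}
```

Because each pixel is updated independently of the others, this loop is the
kind of work that can be split across many threads.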

4.2 Coding

#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    clock_t start, end;
    double time_use; // Time usage

    // The token-based reads below require a plain-text (P3) PPM file
    ifstream image("apollo.ppm");
    ofstream newimage("newimage.ppm");
    if (!image || !newimage)
    {
        cerr << "Cannot open image file" << endl;
        return 1;
    }
    start = clock(); // The initial time
    // Copy over header information
    string type = "", width = "", heigh = "", RGB = "";
    image >> type >> width >> heigh >> RGB;

    newimage << type << endl;
    newimage << width << " " << heigh << endl;
    newimage << RGB << endl;

    int r = 0, g = 0, b = 0;
    // Read one pixel per iteration; stop as soon as a full pixel cannot be
    // read, so the last pixel is not written twice at end of file
    while (image >> r >> g >> b)
    {
        // Increase red by 50, clamped to the 8-bit maximum
        if (r + 50 >= 255)
            r = 255;
        else
            r += 50;

        newimage << r << " " << g << " " << b << endl;
    }
    end = clock(); // The end time
    time_use = (double)(end - start) / CLOCKS_PER_SEC; // Compute the time usage
    cout << "CPU time: " << time_use;

    return 0;
}

Figure 8: Program of image processing on CPU

#include <cuda_runtime.h>
#include <device_launch_parameters.h>

__global__ void Histogram_CUDA(unsigned char* Image, int* Histogram);

// Host wrapper. Histogram must point to 256 counters; their initial values
// are copied to the GPU, so the caller should set them to zero first.
void Histogram_Calculation_CUDA(unsigned char* Image, int Height, int Width, int Channels, int* Histogram){
    unsigned char* Dev_Image = NULL;
    int* Dev_Histogram = NULL;

    //allocate cuda variable memory
    cudaMalloc((void**)&Dev_Image, Height * Width * Channels);
    cudaMalloc((void**)&Dev_Histogram, 256 * sizeof(int));

    //copy CPU data to GPU
    cudaMemcpy(Dev_Image, Image, Height * Width * Channels, cudaMemcpyHostToDevice);
    cudaMemcpy(Dev_Histogram, Histogram, 256 * sizeof(int), cudaMemcpyHostToDevice);

    //launch one block of one thread per pixel
    dim3 Grid_Image(Width, Height);
    Histogram_CUDA<<<Grid_Image, 1>>>(Dev_Image, Dev_Histogram);

    //copy memory back to CPU from GPU
    cudaMemcpy(Histogram, Dev_Histogram, 256 * sizeof(int), cudaMemcpyDeviceToHost);

    //free up the memory of GPU
    cudaFree(Dev_Histogram);
    cudaFree(Dev_Image);
}

// Each block reads one byte of the image and increments the matching bin
__global__ void Histogram_CUDA(unsigned char* Image, int* Histogram){
    int x = blockIdx.x;
    int y = blockIdx.y;

    // One byte per grid cell: this indexing covers only the first
    // Width * Height bytes, i.e. a single-channel image
    int Image_Idx = x + y * gridDim.x;

    // atomicAdd avoids lost updates when several blocks hit the same bin
    atomicAdd(&Histogram[Image[Image_Idx]], 1);
}
Figure 9: Program of image processing on GPU
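
The kernel in Figure 9 counts how many pixels have each 8-bit intensity. For
comparison, a sequential C++ sketch of the same counting is given below. It
assumes a single-channel image held in a std::vector, and the function name
histogram is hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Counts how many pixels have each 8-bit intensity (0..255).
std::array<int, 256> histogram(const std::vector<std::uint8_t>& gray) {
    std::array<int, 256> bins{};   // value-initialized: every counter is 0
    for (std::uint8_t v : gray)
        ++bins[v];                 // the GPU kernel does this with atomicAdd
    return bins;
}
```

On the CPU a plain increment is enough because only one thread touches the
counters. On the GPU many threads may update the same bin at once, which is
why the kernel uses atomicAdd.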

4.3 Result

Figure 4: CPU time for image processing

Figure 10: GPU time for image processing

Conclusion: For image processing, the GPU is faster than the CPU, even
though the GPU has higher latency. In the von Neumann architecture, a
program running on the CPU works directly on data stored in main memory. A
program running on the GPU must first copy the data from main memory to the
GPU's memory before processing it, and when it finishes, the processed data
must be copied back from the GPU's memory to main memory so that the CPU can
read it. These transfers make the GPU's latency higher than the CPU's. For
larger calculations, however, the execution time is much greater than the
transfer time, so the transfer time can be neglected and the GPU outperforms
the CPU.
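
The trade-off described in this conclusion can be written as a simple timing
model. The symbols are assumptions introduced here for illustration and were
not measured separately in this report: T_H2D and T_D2H are the copy times to
and from the GPU, T_kernel is the GPU execution time, and T_CPU is the CPU
execution time.

```latex
\begin{align}
T_{\text{GPU}} &= T_{\text{H2D}} + T_{\text{kernel}} + T_{\text{D2H}} \\
T_{\text{GPU}} < T_{\text{CPU}}
  &\iff T_{\text{kernel}} + \left(T_{\text{H2D}} + T_{\text{D2H}}\right) < T_{\text{CPU}} \\
T_{\text{kernel}} \gg T_{\text{H2D}} + T_{\text{D2H}}
  &\implies T_{\text{GPU}} \approx T_{\text{kernel}}
\end{align}
```

Under this model the GPU wins only when the time saved in computation exceeds
the two copy times, which is more likely for large inputs than for small ones.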

5. CONCLUSIONS AND FUTURE WORK

In this project, the group learned and investigated how to program GPU
parallel processors using CUDA and C++. The project covers only two small
applications, a numerical calculation and an image processing task, which
were used to compare CPU and GPU performance. For the first (sum of the
primes from 1 to n, where n is entered by the user), the results show that
the CPU takes less time to execute than the GPU, so in this case the CPU is
the better choice. For image processing, in contrast, the GPU outperforms
the CPU. The GPU has a higher latency than the CPU: in the von Neumann
architecture, a program running on the CPU works directly on the data in
main memory, whereas a program running on the GPU must copy the data from
main memory to GPU memory, process it, and then copy the processed data back
to main memory, where the CPU can read it. However, because the execution
time is much longer than the transfer time for larger calculations, the
transfer time can be neglected and the GPU is superior to the CPU in that
case. Analyzing and comparing CPU and GPU speeds in this way can serve as a
basis for developing methods that optimize processing speed in parallel
computation.

6. References
[1] NVIDIA, CUDA C++ Programming Guide, November 2006, www.nvidia.com.
[2] Jaegeun Han and Bharatkumar Sharma, Learn CUDA Programming: A beginner's
guide to GPU programming and parallel computing with CUDA 10.x and C/C++.
[3] Bhaumik Vaidya, Hands-On GPU-Accelerated Computer Vision with OpenCV and
CUDA: Effective techniques for processing complex image data in real time
using GPUs.
[4] Eric Young and Frank Jargstorff, Image Processing & Video Algorithms
with CUDA.
