Concurrent Kernel in OpenCL

To execute two kernels concurrently on the same memory in OpenCL: 1) Enqueue both kernels with clEnqueueNDRangeKernel on an out-of-order (concurrent) command queue. 2) Pass the event object from each kernel to the buffer read or map call that reads the results on the host. 3) Not all hardware supports true concurrent execution, so the kernels may still be serialized. 4) For simple element-wise operations, use a float2 vector data type in a single kernel for efficiency.

Uploaded by

sdancer75

5/20/2021 concurrent kernel in OpenCL

concurrent kernel in OpenCL

[0] [1] hanarce
[2018-11-16 15:58:11]
[ memory kernel opencl ]
[ https://stackoverflow.com/questions/53341388/concurrent-kernel-in-opencl ]

I would like to know how I can execute two or more different kernels in parallel, at the same time, on the same GPU using OpenCL. My main idea is to use two
different kernels (kernel A and kernel B) that use the same memory: I do not want to duplicate the memory by giving each kernel its own buffers for the “a” and “b” pointers.
Is there another way to accomplish the dual execution with an efficient memory technique? The code of the kernels is the following:

Kernel A:

__kernel void kernelA(global struct VectorStruct* a, int aLen0,
                      global struct VectorStruct* b, int bLen0,
                      global struct VectorStruct* c, int cLen0) {
    int i = get_local_id(0);
    c[(i)].x = a[(i)].x + b[(i)].x;
}

Kernel B:

__kernel void kernelB(global struct VectorStruct* a, int aLen0,
                      global struct VectorStruct* b, int bLen0,
                      global struct VectorStruct* d, int cLen0) {
    int i = get_local_id(0);
    d[(i)].y = a[(i)]

The definition for the struct VectorStruct is the following:

struct VectorStruct { int x; int y; };

In the host code I have to create four pointers: VectorStruct* a, VectorStruct* b, VectorStruct* c and VectorStruct* d. The pointers “a” and “b” hold the data that I will transfer to the GPU.
The pointer “c” will store the results of kernel A, and the pointer “d” will store the results of kernel B.

[0] [2018-11-16 16:29:50] pmdj [ ACCEPTED]

You can enqueue your 2 kernels with clEnqueueNDRangeKernel() on a concurrent command queue, i.e. one created with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE passed in the properties argument of
clCreateCommandQueue [1]. Then pass both resulting event objects to the buffer read or map call that reads the result back on the host. Note that not all hardware and OpenCL implementations
support concurrent execution of different kernels, so they may end up being serialised to some extent after all.

You can also achieve something similar with multiple serial command queues.

For your simple kernel it may be better to use a float2 (or int2, since your struct holds ints) to represent your vector and perform a vectorised (SIMD) addition in a single kernel. The OpenCL compiler should pick up on the vector
operations and distribute the operations across the parallel hardware automatically.
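A possible shape for such a single kernel is sketched below, with the OpenCL C source carried as a C string the way host programs usually embed it. It uses int2 rather than float2 because the question's struct holds ints, and it writes both components to one output buffer instead of the separate “c” and “d” buffers; the names kAddSrc, add_vectors and add_vectors_reference are illustrative assumptions. The plain C reference function is only there to check device results on the host.

```c
#include <stddef.h>

/* OpenCL C source for one kernel that adds both components at once.
 * int2 is the built-in 2-component int vector; "+" is component-wise. */
static const char *kAddSrc =
    "__kernel void add_vectors(__global const int2 *a,\n"
    "                          __global const int2 *b,\n"
    "                          __global int2 *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* Host-side struct with the same size and component order as int2. */
struct VectorStruct { int x; int y; };

/* Plain C reference of what the kernel computes, for validating output. */
static void add_vectors_reference(const struct VectorStruct *a,
                                  const struct VectorStruct *b,
                                  struct VectorStruct *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        c[i].x = a[i].x + b[i].x;
        c[i].y = a[i].y + b[i].y;
    }
}
```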

For slightly more complicated operations where this doesn't work so well, you could represent the vector's x and y coordinates as a 2-element array, and simply enqueue twice the number of
work-items on one kernel that works on alternating dimensions.
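The indexing behind the alternating-dimension idea can be sketched in plain C, simulating the work-items with a loop. This is an illustration under assumptions, not OpenCL code: each vector is stored as two consecutive ints in a flat array, gid stands in for get_global_id(0), and the function names are made up for the example.

```c
#include <stddef.h>

/* Body of one simulated work-item. Vectors are stored flat as
 * [x0, y0, x1, y1, ...]; gid stands in for get_global_id(0). */
static void add_component(size_t gid, const int *a, const int *b, int *c)
{
    size_t elem = gid / 2;        /* which vector this work-item handles */
    size_t dim  = gid % 2;        /* 0 = x, 1 = y */
    size_t idx  = 2 * elem + dim; /* equals gid; spelled out for clarity */
    c[idx] = a[idx] + b[idx];
}

/* Sequential stand-in for enqueueing 2 * n work-items on one kernel. */
static void add_vectors(const int *a, const int *b, int *c, size_t n)
{
    for (size_t gid = 0; gid < 2 * n; ++gid)
        add_component(gid, a, b, c);
}
```

Neighbouring work-items touch neighbouring ints here, which is the kind of contiguous access pattern that GPUs tend to handle well.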

Both approaches will give you much more efficient memory access patterns.

Note that your use of get_local_id(0) might be erroneous, depending on what you want to achieve. You probably want get_global_id(0) in this case, so that every work-item across all work-groups addresses its own element.
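To see why the choice of ID matters, the following plain C sketch simulates a 1D NDRange sequentially and counts how many distinct output elements get written. It assumes a zero global offset and a global size that is a multiple of the work-group size; the function name elements_written is hypothetical.

```c
#include <stddef.h>

/* Simulates a 1D NDRange and marks which output elements are written.
 * use_global_id != 0 indexes with the global ID, otherwise the local ID.
 * touched must have room for global_size entries.
 * Returns the number of distinct elements written. */
static size_t elements_written(size_t global_size, size_t local_size,
                               int use_global_id, unsigned char *touched)
{
    size_t count = 0;
    for (size_t i = 0; i < global_size; ++i)
        touched[i] = 0;

    for (size_t gid = 0; gid < global_size; ++gid) {
        size_t lid = gid % local_size; /* what get_local_id(0) would return */
        size_t idx = use_global_id ? gid : lid;
        if (!touched[idx]) {
            touched[idx] = 1;
            ++count;
        }
    }
    return count;
}
```

With 8 work-items in groups of 4, indexing by local ID writes only the first 4 elements (each of them twice), while indexing by global ID covers all 8.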

[1] https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateCommandQueue.html
