Speed up Code Execution with the Help of Pragmas in C/C++
Last Updated: 21 Sep, 2021
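Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results, so optimizations are suppressed by default. They are normally enabled with command-line flags such as -O2, although not all optimizations are controlled directly by a flag. When the command line is not under our control, GCC lets us request optimizations from inside the source file with the pragma #pragma GCC optimize, which applies to the functions defined after it. These pragmas are GCC-specific, and other compilers may ignore them.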
Example of an unoptimized program: Let us compute the prime numbers up to 10000000 with the Sieve of Eratosthenes.
Below is the code with no optimization:
C++
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;

// prime[i] stays true while i is still considered prime.
vector<bool> prime(N, true);

void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: the vector has N elements, so index N is out of range.
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}

int main()
{
    // clock() (from <ctime>) reports the processor time used so far.
    clock_t start, end;
    start = clock();
    sieveOfEratosthenes();
    end = clock();
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs";
    return 0;
}
Output:
Execution time: 0.592183 secs
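Before comparing timings, it is worth checking that the sieve itself gives the right answer on a small bound. The following is a minimal sketch, not part of the original program: countPrimesBelow is a hypothetical helper name, and it uses a local vector instead of the global one so that the bound can be varied.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the article): counts the primes strictly
// below `limit` using the same sieve idea as the program above.
std::size_t countPrimesBelow(std::size_t limit)
{
    if (limit < 3) {
        return 0; // there are no primes below 2
    }
    std::vector<bool> prime(limit, true);
    prime[0] = prime[1] = false;
    for (std::size_t i = 2; i * i < limit; ++i) {
        if (prime[i]) {
            // Start at i * i: smaller multiples were crossed out earlier.
            for (std::size_t j = i * i; j < limit; j += i) {
                prime[j] = false;
            }
        }
    }
    std::size_t count = 0;
    for (std::size_t i = 2; i < limit; ++i) {
        if (prime[i]) {
            ++count;
        }
    }
    return count;
}
```

For instance, countPrimesBelow(100) should return 25, since there are 25 primes below 100. A check like this is cheap to run after changing optimization settings.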
The following optimization levels are available:
1. O1: Optimizing compilation at O1 takes somewhat more time, and a lot more memory for large functions. The compiler tries to reduce both code size and execution time, without performing any optimization that takes a great deal of compilation time. O1 is the baseline on which the higher levels build.
Below is the implementation of the previous program with O1 optimization:
C++
#pragma GCC optimize("O1")
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;

vector<bool> prime(N, true);

void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: the vector has N elements, so index N is out of range.
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}

int main()
{
    clock_t start, end;
    start = clock();
    sieveOfEratosthenes();
    end = clock();
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}
Output:
Execution time: 0.384945 secs.
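The pragma affects every function defined after it. GCC also offers an optimize function attribute that requests a level for a single function. The sketch below assumes GCC; SIEVE_AT_O1 and sieveBelow are hypothetical names introduced only for illustration, and the macro expands to nothing on other compilers.

```cpp
#include <cstddef>
#include <vector>

// SIEVE_AT_O1 is a hypothetical macro name used only in this sketch.
// The optimize attribute is a GCC extension, so it is left empty elsewhere.
#if defined(__GNUC__) && !defined(__clang__)
#define SIEVE_AT_O1 __attribute__((optimize("O1")))
#else
#define SIEVE_AT_O1
#endif

// Only this function is requested at O1; the rest of the file keeps
// whatever level the command line selected.
SIEVE_AT_O1
std::vector<bool> sieveBelow(std::size_t limit)
{
    std::vector<bool> prime(limit, true);
    if (limit > 0) prime[0] = false;
    if (limit > 1) prime[1] = false;
    for (std::size_t i = 2; i * i < limit; ++i) {
        if (prime[i]) {
            for (std::size_t j = i * i; j < limit; j += i) {
                prime[j] = false;
            }
        }
    }
    return prime;
}
```

Calling sieveBelow(20) yields a vector in which indices 2, 3, 5, 7, 11, 13, 17 and 19 are true. The attribute changes how the function is compiled, not what it computes.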
2. O2: Optimizing compilation at O2 optimizes to a greater extent. Compared to O1, this option increases both the compilation time and the performance of the generated code. O2 turns on all the optimization flags specified by O1.
Below is the implementation of the previous program with O2 optimization:
C++
#pragma GCC optimize("O2")
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;

vector<bool> prime(N, true);

void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: the vector has N elements, so index N is out of range.
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}

int main()
{
    clock_t start, end;
    start = clock();
    sieveOfEratosthenes();
    end = clock();
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}
Output:
Execution time: 0.288337 secs.
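When only part of a file should be compiled at O2, GCC's push_options and pop_options pragmas can bracket the region so that the setting does not leak into later functions. A minimal sketch, assuming GCC; the two function names are hypothetical and exist only to show the scoping.

```cpp
#include <vector>

#pragma GCC push_options       // remember the current optimization options
#pragma GCC optimize("O2")     // applies to functions defined from here on

// Requested at O2 (on GCC).
long long sumOfSquaresO2(const std::vector<int>& values)
{
    long long total = 0;
    for (int v : values) {
        total += static_cast<long long>(v) * v;
    }
    return total;
}

#pragma GCC pop_options        // restore the options saved by push_options

// Compiled with whatever options were active before push_options.
long long sumOfSquaresDefault(const std::vector<int>& values)
{
    long long total = 0;
    for (int v : values) {
        total += static_cast<long long>(v) * v;
    }
    return total;
}
```

Both functions return the same value for the same input, for example 14 for {1, 2, 3}. Only the generated machine code may differ.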
3. O3: O3 turns on all the optimizations specified by O2 and enables a further list of flags. A few of the flags included in O3 are -floop-interchange, -floop-unroll-and-jam and -fpeel-loops.
Below is the implementation of the previous program with O3 optimization:
C++
#pragma GCC optimize("O3")
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;

vector<bool> prime(N, true);

void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: the vector has N elements, so index N is out of range.
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}

int main()
{
    clock_t start, end;
    start = clock();
    sieveOfEratosthenes();
    end = clock();
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}
Output:
Execution time: 0.580154 secs.
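A single timing measurement can vary noticeably from run to run, so differences between the figures reported above are easier to trust when each configuration is timed several times. The following is a rough sketch of one common approach, keeping the fastest of several runs. bestOfSeconds is a hypothetical helper name, and std::chrono::steady_clock is used here in place of clock().

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper (not from the article): calls `work` `runs` times and
// returns the shortest wall-clock duration in seconds.
template <typename F>
double bestOfSeconds(F work, int runs)
{
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        double elapsed = std::chrono::duration<double>(stop - start).count();
        best = std::min(best, elapsed); // keep the fastest run seen so far
    }
    return best;
}
```

With the program above, a call could look like bestOfSeconds(sieveOfEratosthenes, 5). Note that the global vector would need to be reset to true between runs for each run to do the same work.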
4. Os: Optimize for size. Os enables all O2 optimizations except those that often increase code size. It also enables -finline-functions, causes the compiler to tune for code size rather than execution speed, and performs further optimizations designed to reduce code size.
Below is the implementation of the previous program with Os optimization:
C++
#pragma GCC optimize("Os")
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;

vector<bool> prime(N, true);

void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: the vector has N elements, so index N is out of range.
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}

int main()
{
    clock_t start, end;
    start = clock();
    sieveOfEratosthenes();
    end = clock();
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}
Output:
Execution time: 0.317845 secs.
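GCC predefines the macros __OPTIMIZE__ (any optimizing compilation) and __OPTIMIZE_SIZE__ (optimizing for size rather than speed), which makes it possible to report the mode a file was built in. The sketch below relies only on their documented meaning for command-line levels. Whether they also follow a #pragma GCC optimize line is an assumption you would need to verify on your own compiler. optimizationMode is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper (not from the article): reports the compilation mode
// as seen through GCC's predefined macros.
std::string optimizationMode()
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";   // e.g. built with -Os
#elif defined(__OPTIMIZE__)
    return "speed";  // e.g. built with -O1, -O2 or -O3
#else
    return "none";   // no optimization requested
#endif
}
```

Printing the returned string at program start is a quick way to confirm which settings a judge or build script actually applied.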
5. Ofast: Ofast enables all O3 optimizations. It additionally turns on flags such as -ffast-math that disregard strict standards compliance, so floating-point results may differ from those produced at the other levels. This level is popular among competitive programmers, for whom speed usually matters more than strict floating-point behaviour. If more than one optimization level is declared, the last one declared takes effect.
Below is the implementation of the previous program with Ofast optimization:
C++
#pragma GCC optimize("Ofast")
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;

vector<bool> prime(N, true);

void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: the vector has N elements, so index N is out of range.
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}

int main()
{
    clock_t start, end;
    start = clock();
    sieveOfEratosthenes();
    end = clock();
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}
Output:
Execution time: 0.303287 secs.
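One reason Ofast can change results is that -ffast-math, which it enables, allows the compiler to reorder floating-point additions as if they were associative. The sketch below shows why that matters. It is compiled here without any fast-math setting, and the two function names are hypothetical.

```cpp
// Hypothetical helpers (not from the article): the same three numbers added
// in two different groupings.
double sumLeftToRight(double a, double b, double c)
{
    return (a + b) + c;
}

double sumRightToLeft(double a, double b, double c)
{
    return a + (b + c);
}
```

Under strict IEEE 754 double arithmetic, sumLeftToRight(1e20, -1e20, 1.0) is 1.0, while sumRightToLeft(1e20, -1e20, 1.0) is 0.0, because 1.0 is lost when it is added to -1e20 first. With fast-math the compiler is free to pick either grouping, so code that depends on one of them should avoid Ofast.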
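To optimize further at the architecture level, we can use target pragmas. A target pragma lets the compiler emit instructions from the listed instruction-set extensions (AVX, AVX2 and FMA in the example below), so the program must run on a CPU that supports them. These optimizations can produce surprising results. It is recommended to use a target together with one of the optimization levels described above.
Below is the implementation of the previous program with a target: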
C++14
#pragma GCC optimize("Ofast")
#pragma GCC target("avx,avx2,fma")
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;

vector<bool> prime(N, true);

void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: the vector has N elements, so index N is out of range.
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}

int main()
{
    clock_t start, end;
    start = clock();
    sieveOfEratosthenes();
    end = clock();
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}
Output:
Execution time: 0.292147 secs.
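Code built for AVX2 and FMA will typically fail with an illegal-instruction error on a CPU that lacks those extensions, so it can be useful to check for them at run time. The following is a minimal sketch using the GCC and Clang builtins __builtin_cpu_init and __builtin_cpu_supports on x86. cpuSupportsAvx2Fma is a hypothetical helper name, and on other compilers or architectures the sketch simply reports false.

```cpp
// Hypothetical helper (not from the article): reports whether the CPU running
// the program supports both AVX2 and FMA.
bool cpuSupportsAvx2Fma()
{
#if (defined(__GNUC__) || defined(__clang__)) \
    && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); // initialize the CPU feature data before querying it
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    return false; // builtins unavailable: assume the extensions are absent
#endif
}
```

The check itself must live in code that is not compiled for the AVX2 target, otherwise the program could fault before the test runs.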