Speed up Code executions with help of Pragma in C/C++

Last Updated : 21 Sep, 2021

Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Not all optimizations are controlled directly by a flag, and most of them are enabled only when an optimization level is requested. By default, most optimizations are disabled. Besides passing an -O option on the command line, GCC lets us request an optimization level from inside the source file with #pragma GCC optimize.

Example of an unoptimized program: Let us consider a program that finds the prime numbers up to 10000000.

Below is the code with no optimization:

C++

// C++ program to calculate the Prime
// Numbers up to 10000000 using Sieve
// of Eratosthenes with NO optimization

#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
 
// Boolean array for Prime Number
vector<bool> prime(N, true);
 
// Sieve implemented to find Prime
// Number
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // Stop before N: valid indices are 0..N-1
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
 
// Driver Code
int main()
{
    // Initialise clock to calculate
    // time required to execute without
    // optimization
    clock_t start, end;
 
    // Start clock
    start = clock();
 
    // Function call to find Prime Numbers
    sieveOfEratosthenes();
 
    // End clock
    end = clock();
 
    // Calculate the time difference
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
 
    // Print the Calculated execution time
    cout << "Execution time: " << time_taken
         << " secs";
 
    return 0;
}

Output:

Execution time: 0.592183 secs

The following optimization levels are available:

1. O1: Optimizing at O1 takes somewhat more compilation time, and considerably more memory for large functions. The compiler tries to reduce both code size and execution time, without performing optimizations that take a great deal of compilation time. O1 serves as the base on which the higher optimization levels build.

Below is the implementation of the previous program with O1 optimization:

C++

// C++ program to calculate the Prime
// Numbers up to 10000000 using Sieve
// of Eratosthenes with O1 optimization

// To see the working of controlled
// optimization "O1"
#pragma GCC optimize("O1")

#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
 
// Boolean array for Prime Number
vector<bool> prime(N, true);
 
// Sieve implemented to find Prime
// Number
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // Stop before N: valid indices are 0..N-1
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
 
// Driver Code
int main()
{
    // Initialise clock to calculate
    // time required to execute with
    // the O1 optimization
    clock_t start, end;
 
    // Start clock
    start = clock();
 
    // Function call to find Prime Numbers
    sieveOfEratosthenes();
 
    // End clock
    end = clock();
 
    // Calculate the time difference
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
 
    // Print the Calculated execution time
    cout << "Execution time: " << time_taken
         << " secs.";
 
    return 0;
}

Output:

Execution time: 0.384945 secs.

2. O2: O2 optimizes further. It turns on all optimization flags specified by O1 and adds nearly all supported optimizations that do not involve a space-speed trade-off. Compared to O1, this option increases both compilation time and the performance of the generated code.

Below is the implementation of the previous program with O2 optimization:

C++

// C++ program to calculate the Prime
// Numbers up to 10000000 using Sieve
// of Eratosthenes with O2 optimization

// To see the working of controlled
// optimization "O2"
#pragma GCC optimize("O2")

#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
 
// Boolean array for Prime Number
vector<bool> prime(N, true);
 
// Sieve implemented to find Prime
// Number
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // Stop before N: valid indices are 0..N-1
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
 
// Driver Code
int main()
{
    // Initialise clock to calculate
    // time required to execute with
    // the O2 optimization
    clock_t start, end;
 
    // Start clock
    start = clock();
 
    // Function call to find Prime Numbers
    sieveOfEratosthenes();
 
    // End clock
    end = clock();
 
    // Calculate the time difference
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
 
    // Print the Calculated execution time
    cout << "Execution time: " << time_taken
         << " secs.";
 
    return 0;
}

Output:

Execution time: 0.288337 secs.

3. O3: O3 turns on all optimizations specified by O2 and enables a list of additional flags. A few of the flags included in O3 are -floop-interchange, -floop-unroll-and-jam and -fpeel-loops. A higher level is not guaranteed to be faster: in the measurement below, O3 is slower than O2 for this program.

Below is the implementation of the previous program with O3 optimization:

C++

// C++ program to calculate the Prime
// Numbers up to 10000000 using Sieve
// of Eratosthenes with O3 optimization

// To see the working of controlled
// optimization "O3"
#pragma GCC optimize("O3")

#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
 
// Boolean array for Prime Number
vector<bool> prime(N, true);
 
// Sieve implemented to find Prime
// Number
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // Stop before N: valid indices are 0..N-1
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
 
// Driver Code
int main()
{
    // Initialise clock to calculate
    // time required to execute with
    // the O3 optimization
    clock_t start, end;
 
    // Start clock
    start = clock();
 
    // Function call to find Prime Numbers
    sieveOfEratosthenes();
 
    // End clock
    end = clock();
 
    // Calculate the time difference
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
 
    // Print the Calculated execution time
    cout << "Execution time: " << time_taken
         << " secs.";
 
    return 0;
}

Output:

Execution time: 0.580154 secs.

4. Os: Os optimizes for size. It enables all O2 optimizations except those that often increase code size. It also enables -finline-functions, causes the compiler to tune for code size rather than execution speed, and performs further optimizations designed to reduce code size.

Below is the implementation of the previous program with Os optimization:

C++

// C++ program to calculate the Prime
// Numbers up to 10000000 using Sieve
// of Eratosthenes with Os optimization

// To see the working of controlled
// optimization "Os"
#pragma GCC optimize("Os")

#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
 
// Boolean array for Prime Number
vector<bool> prime(N, true);
 
// Sieve implemented to find Prime
// Number
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // Stop before N: valid indices are 0..N-1
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
 
// Driver Code
int main()
{
    // Initialise clock to calculate
    // time required to execute with
    // the Os optimization
    clock_t start, end;
 
    // Start clock
    start = clock();
 
    // Function call to find Prime Numbers
    sieveOfEratosthenes();
 
    // End clock
    end = clock();
 
    // Calculate the time difference
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
 
    // Print the Calculated execution time
    cout << "Execution time: " << time_taken
         << " secs.";
 
    return 0;
}

Output:

Execution time: 0.317845 secs.

5. Ofast: Ofast enables all O3 optimizations, so it includes the optimizations of every level above. It also enables optimizations that are not valid for all standard-compliant programs, such as -ffast-math, so floating-point results may differ from those of the other levels. This level is popular among competitive programmers. If more than one optimization level is declared, the last one declared takes effect.

Below is the implementation of the previous program with Ofast optimization:

C++

// C++ program to calculate the Prime
// Numbers up to 10000000 using Sieve
// of Eratosthenes with Ofast optimization

// To see the working of controlled
// optimization "Ofast"
#pragma GCC optimize("Ofast")

#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
 
// Boolean array for Prime Number
vector<bool> prime(N, true);
 
// Sieve implemented to find Prime
// Number
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // Stop before N: valid indices are 0..N-1
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
 
// Driver Code
int main()
{
    // Initialise clock to calculate
    // time required to execute with
    // the Ofast optimization
    clock_t start, end;
 
    // Start clock
    start = clock();
 
    // Function call to find Prime Numbers
    sieveOfEratosthenes();
 
    // End clock
    end = clock();
 
    // Calculate the time difference
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
 
    // Print the Calculated execution time
    cout << "Execution time: " << time_taken
         << " secs.";
 
    return 0;
}

Output:

Execution time: 0.303287 secs.

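To optimize further at the architecture level, we can add a target pragma, which allows the compiler to emit instructions from the named instruction set extensions (AVX, AVX2 and FMA in the program below). The gain depends on the code and on the CPU, and the resulting program runs only on processors that support those extensions. It is recommended to use a target together with one of the optimization levels described above.

Below is the implementation of the previous program with a target: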

C++14

// C++ program to calculate the Prime
// Numbers up to 10000000 using Sieve
// of Eratosthenes with Ofast optimization
// along with target optimizations

// To see the working of controlled
// optimization "Ofast" with a target
#pragma GCC optimize("Ofast")
#pragma GCC target("avx,avx2,fma")

#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005 
using namespace std; 
 
// Boolean array for Prime Number 
vector<bool> prime(N, true); 
 
// Sieve implemented to find Prime 
// Number 
void sieveOfEratosthenes() 
{ 
    for (int i = 2; i <= sqrt(N); ++i) { 
        if (prime[i]) { 
            // Stop before N: valid indices are 0..N-1
            for (int j = i * i; j < N; j += i) {
                prime[j] = false; 
            } 
        } 
    } 
} 
 
// Driver Code 
int main() 
{ 
    // Initialise clock to calculate
    // time required to execute with
    // Ofast and the target enabled
    clock_t start, end; 
 
    // Start clock 
    start = clock(); 
 
    // Function call to find Prime Numbers 
    sieveOfEratosthenes(); 
 
    // End clock 
    end = clock(); 
 
    // Calculate the time difference 
    double time_taken 
        = double(end - start) 
        / double(CLOCKS_PER_SEC); 
 
    // Print the Calculated execution time 
    cout << "Execution time: " << time_taken 
        << " secs."; 
 
    return 0; 
} 

Output:

Execution time: 0.292147 secs.
