Compilation Time-Based Analysis using Optimized Iterative Techniques
Abstract

Compilation time has always been an important factor in the performance analysis of any system. This paper discusses various optimized iterative techniques for analyzing the performance of programs: loop unrolling, loop-level parallelism (loop-carried dependence), and loop ordering. In the first technique, a loop is unrolled up to a scale of 5 and then compared with the rolled version to find out the performance difference. The second technique finds the loop-carried dependence, interchanges the order of the statements, and exposes the parallelism. In the third, the loop order is changed to reduce the jump calls during code execution. The execution times of all three methods are compared, which is the proof of higher performance after implementing the optimized iterative techniques. The execution time may differ on different machines; the results here are calculated on a Core i5 machine with a 2.7 GHz processor under a Linux kernel.

Key words:
Compilation time; performance analysis; iterative techniques; program optimization.
1. Introduction

Optimization techniques are the helping hand of any system that is developed to perform well. Different optimization techniques are developed to reduce the execution time and the memory usage of the processor. Some of these are implemented at the machine level, while others are implemented at the source level. This paper discusses the source-level optimization of iterative methods to reduce the execution time of a program.

The first technique discusses loop unrolling against loop rolling. If a large number of loop iterations exist in the kernel pipeline, the loop iterations could potentially be the critical path of the kernel pipeline; loop unrolling can increase the pipeline throughput by allocating more hardware resources to the loop [1]. It causes a reduction of compilation time that indirectly adds up to performance.

The loop-carried dependence is the iterative dependency that exists in loops and is a barrier to implementing parallelism. To implement loop-level parallelism, we need to recognize the structure of the loops, the arrays, and any variables involved. If the findings show that the statements inside the loop are not circularly dependent, then we can alter the order of the statements so that they execute in parallel and improve the execution time. The results of this technique also count for higher performance and a reduction of compilation time [2].

The third technique involves changing the loop order. The impact of the loop order is an important factor in execution time: basically, it reduces the jump calls between instruction executions, which in turn reduces the overall execution time. It comes up with a vertical and a horizontal execution order of instructions, for example:

for (int k = 0; k < 100; k++)
{
    . . .
    for (int j = 0; j < 100; j++)
    {
        . . .
    }
}

for (int j = 0; j < 100; j++)
{
    . . .
    for (int k = 0; k < 100; k++)
    {
        . . .
    }
}

This technique also counts for a high-performance system that needs iterative solutions to be implemented.

The paper is organized as follows. It first discusses related work that has already gone through the experiment. Then the methodology is explained for all the techniques experimented with during this research. Finally, the results and the conclusion sum up the discussion of the experiment.

2. Related Work

Loop unrolling is widely helpful in different types of applications. Some of its applications exist in image processing, where image convolution takes help from loop unrolling while multiplying the matrices. It helps in creating optimized algorithms.
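As an illustration of this use, the innermost accumulation of a one-dimensional convolution can be unrolled in the same way. The following is a minimal sketch, not code from the cited works; the function name, the 5-tap kernel, and the array parameters are assumed for illustration:

// 1-D convolution (correlation form) with a 5-tap kernel.
// The inner loop over the kernel taps is fully unrolled,
// so no inner-loop counter or jump instructions remain.
void convolve5(const float in[], float out[], int n, const float k[5])
{
    for (int i = 0; i + 4 < n; i++)
    {
        out[i] = in[i] * k[0]
               + in[i + 1] * k[1]
               + in[i + 2] * k[2]
               + in[i + 3] * k[3]
               + in[i + 4] * k[4];
    }
}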
3. Proposed Methodology

This approach comes up with three different methodologies to find the best possible optimization at the source level. All three techniques are explained here with source code and results. Every function is invoked from the main() function, and for comparing the results both scenarios are discussed.

3.1 Loop unrolling

In this technique, the loop is unrolled up to a scale of 5 and then compared with its rolled version to find out the performance difference.
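As a minimal sketch of this comparison, assuming a simple array-update loop and the unroll scale of 5 stated above (the array name and bound are illustrative):

// Rolled version: one statement and one jump per iteration.
void rolled(int A[1000])
{
    for (int i = 0; i < 1000; i++)
        A[i] = A[i] + 1;
}

// Unrolled to scale 5: five statements per iteration, so the
// loop executes only one fifth of the jump instructions.
void unrolled(int A[1000])
{
    for (int i = 0; i < 1000; i += 5)
    {
        A[i] = A[i] + 1;
        A[i + 1] = A[i + 1] + 1;
        A[i + 2] = A[i + 2] + 1;
        A[i + 3] = A[i + 3] + 1;
        A[i + 4] = A[i + 4] + 1;
    }
}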
3.2 Loop level parallelism

The following code is about eliminating the loop-carried dependence and exposing the loop-level parallelism [2].

Loop-level dependence:

A[0] = A[0] + B[0];
for (int i = 0; i < 99; i++)
{
    . . .
}

The above code is about finding the dependency between the instructions and determining whether it can be removed or not.
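Since the loop bodies are elided above, the complete transformation can be sketched after the classic example in [2]; the arrays B, C, and D and the loop bound of 100 are assumed here for illustration:

// Dependent form: S2 writes B[i + 1], which S1 reads in the next
// iteration, so the dependence is carried across iterations.
for (int i = 0; i < 100; i++)
{
    A[i] = A[i] + B[i];        // S1
    B[i + 1] = C[i] + D[i];    // S2
}

// Transformed form: the first A-update is peeled off and the last
// B-update is moved after the loop, so the remaining dependence is
// within one iteration and the iterations can execute in parallel.
A[0] = A[0] + B[0];
for (int i = 0; i < 99; i++)
{
    B[i + 1] = C[i] + D[i];
    A[i + 1] = A[i + 1] + B[i + 1];
}
B[100] = C[99] + D[99];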
3.3 Loop ordering

The following code describes the loop ordering technique, which is also helpful in some scenarios to enhance the performance.

Loop order 1:

void loop1()
{
    int x[100][100];
    for (int i = 0; i < 10000; i++)
    {
        for (int j = 0; j < 100; j++)
        {
            for (int k = 0; k < 100; k++)
            {
                x[j][k] = i;
            }
        }
    }
}

Loop order 2:

void loop2()
{
    int x[100][100];
    for (int i = 0; i < 10000; i++)
    {
        for (int k = 0; k < 100; k++)
        {
            for (int j = 0; j < 100; j++)
            {
                x[j][k] = i;
            }
        }
    }
}

It can be observed that the innermost loop is interchanged with the second inner loop, thus improving the execution time.
3.4 Execution process

Execution is the process where we can compare the measured times, which indirectly is a sign of the performance evaluation of the system. To get the execution time, we come up with the following strategy:

// Get the system time before the function starts
auto start = high_resolution_clock::now();

// Invoke the function
loop1();

// Get the system time after the function stops executing
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
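For a self-contained measurement, a minimal driver could look like the following, assuming the loop1() and loop2() of Section 3.3 are linked in and the C++11 <chrono> facilities are available (the output format is illustrative):

#include <chrono>
#include <iostream>

using namespace std::chrono;

void loop1();   // defined in Section 3.3
void loop2();   // defined in Section 3.3

int main()
{
    // Time loop1(): read the clock before and after the call.
    auto start = high_resolution_clock::now();
    loop1();
    auto stop = high_resolution_clock::now();
    auto duration1 = duration_cast<microseconds>(stop - start);

    // Time loop2() in the same way for a direct comparison.
    start = high_resolution_clock::now();
    loop2();
    stop = high_resolution_clock::now();
    auto duration2 = duration_cast<microseconds>(stop - start);

    std::cout << "loop order 1: " << duration1.count() << " us\n";
    std::cout << "loop order 2: " << duration2.count() << " us\n";
    return 0;
}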
4. Results

The results are taken from two different sources after compilation. The first source is the online compiler [6]; the second source is the Linux OS. Table 1 shows the results.
LR: Loop rolling
LUR: Loop unrolling
LLP: Loop-level parallelism

Table 1: Execution times from both sources. Each technique is reported as a pair: the unoptimized version followed by the optimized one (LR vs. LUR, the loop with the carried dependence vs. LLP, and loop order 1 vs. loop order 2).

Source                 LR       LUR      Dependent   LLP      Order 1   Order 2
Online compiler [6]    1.05     0.3001   4.3828      1.8878   1.4375    0.3556
Linux OS               0.2126   0.0421   3.3843      2.903    2.5339    2.3422
5. Conclusion

The techniques for optimizing iterative methods are useful from scenario to scenario; some techniques might not be as helpful as expected. On the other hand, there is machine-to-machine performance variation, so the above results may vary if the same code is run on another machine in a different environment. Overall, these techniques are helpful in various real-time applications as well as for the OS itself, as stated above. Image processing, for instance, takes great account of loops for matrix manipulation. These techniques keep improving over time to achieve high performance on different systems.
References

[1] Z. Wang, B. He, W. Zhang, and S. Jiang, "A performance analysis framework for optimizing OpenCL applications on FPGAs," in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016, pp. 114-125.
[2] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 6th ed. Elsevier, 2019.
[3] A. Tousimojarad, W. Vanderbauwhede, and W. P. Cockshott, "2D image convolution using three parallel programming models on the Xeon Phi," arXiv preprint arXiv:1711.09791, 2017.
[4] T. M. Low, F. D. Igual, T. M. Smith, and E. S. Quintana-Orti, "Analytical modeling is enough for high-performance BLIS," ACM Transactions on Mathematical Software (TOMS), vol. 43, pp. 1-18, 2016.
[5] D. del Rio Astorga, M. F. Dolz, L. M. Sánchez, J. D. García, M. Danelutto, and M. Torquati, "Finding parallel patterns through static analysis in C++ applications," The International Journal of High Performance Computing Applications, vol. 32, pp. 779-788, 2018.
[6] C++ Debugger, OnlineGDB. Available: https://www.onlinegdb.com/ (accessed 15-Jan-2020).

Ume Farwa completed her BS (Information Technology) at the University of Education, Lahore in 2017. Presently, she is an MPhil scholar at Information Technology University, Lahore, Pakistan. Her research interests include HCI, machine learning, data mining, networks, and programming.

Khurshid Asghar is working as an Associate Professor at the Department of Computer Science, University of Okara. He earned his PhD in the field of image forensics from COMSATS University Islamabad, Pakistan. He also worked as a research associate at the Cardiff School of Computer Science and Informatics, Cardiff University, UK. His current research interests include image processing, image and video forensics, machine learning, deep learning, network security, biometrics, medical imaging and brain signals, geometric modeling, and computer programming.

Mubbashar Siddique is working as a Lecturer at the Department of Computer Sciences. He completed his BSc (Telecommunication Engineering) at the Institute of Engineering & Technology, Lahore Campus, Pakistan. He received a merit scholarship from COMSATS University Islamabad (Abbottabad Campus), Pakistan, where he completed his MS in computer science in 2010. Presently, he is a PhD scholar at COMSATS University Islamabad, Pakistan. Mr. Siddique also worked as a research associate at the Department of Cyber Defense, Graduate School of Information Security, Korea University, South Korea. He is presently working in the video and image forensics domain. Furthermore, his research interests are in the areas of image/video processing, computer vision, machine learning, data mining, and networks.