
CS 335: An Overview of Compilation

Swarnendu Biswas
Semester 2022-2023-II
CSE, IIT Kanpur

Content influenced by many excellent references, see References slide for acknowledgements.

A Bit of History

• In the early 1950s, most programming was done in assembly language
• This led to low programmer productivity and high cost of software development
• In 1954, John Backus proposed a program that translated high-level expressions into native machine code for the IBM 704 mainframe
• The Fortran (Formula Translator) I project (1954-1957) released the first compiler

CS 335 Swarnendu Biswas

Impact of Fortran

• Programmers were initially reluctant to use a high-level programming language for fear of lack of performance
• The Fortran I compiler was the first optimizing compiler
• The Fortran compiler has had a huge impact on the field of programming languages and computer science
  • Many advances in compilers were motivated by the need to generate efficient Fortran code
• Modern compilers preserve the basic structure of the Fortran I compiler!

Executing Programs

• Programming languages are an abstraction for describing computations
  • E.g., control flow constructs and data abstraction
• Advantages of high-level programming language abstractions
  • Improved productivity, fast prototyping, improved readability, maintainability, and debugging
• The abstraction needs to be translated to a machine-executable form before it can be executed



What is a Compiler?

• A compiler is system software that translates a program in a source language to an equivalent program in a target language
  • System software (e.g., OS and compilers) helps application software run

source program → Compiler → target program

• Typical “source” languages might be C, C++, or Java
• The “target” language is usually the instruction set of some processor

Important Features of a Compiler

• Generate correct code
• Improve the code according to some metric
• Provide feedback to the user, point out errors and potential mistakes in the program

Source-Source Translators

• Produce a target program in another programming language rather than the assembly language of some processor
  • Also known as transcompilers or transpilers
  • TypeScript transpiles to JavaScript, and many research compilers generate C programs
  • The output programs require further translation before they can be executed
• A typesetting program that produces PostScript can be considered a compiler
  • Typesetting LaTeX to generate PDF is compilation

Transpiler vs Compiler

• A transpiler converts between programming languages at approximately the same level of abstraction
• A “traditional” compiler translates a higher-level programming language to a lower-level language



Interpreter

• An interpreter takes as input an executable specification and produces as output the result of executing the specification

source program + input → Interpreter → output

• Scripting languages are often interpreted (e.g., Bash)

Compilers vs Interpreters

• A compiler translates the whole program at once; an interpreter executes the program one line at a time, so translation and execution happen at the same time
• Memory requirement during compilation is higher for a compiler; an interpreter maintains less state, so its memory requirement is lower
• A compiler congregates error reports; an interpreter reports errors per line
• On an error, a compiler attempts to recover and proceed past it; an interpreter stops on an error
• Examples: C, C++, and Java are typically compiled; Bash and Python are typically interpreted
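The “produces the result of executing the specification” idea can be sketched as a tiny tree-walking interpreter. This Python sketch (a minimal illustration, not a real language implementation) evaluates arithmetic expressions by walking the syntax tree that Python's own `ast` module builds:

```python
# A minimal tree-walking interpreter for arithmetic expressions:
# the "executable specification" is the expression string, and the
# output is the result of executing it.
import ast

def interpret(expr: str):
    """Evaluate an arithmetic expression by walking its syntax tree."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Add):
                return left + right
            if isinstance(node.op, ast.Mult):
                return left * right
        raise ValueError("unsupported construct")
    return walk(ast.parse(expr, mode="eval"))

print(interpret("1 + 2 * 60"))  # 121
```

Note that no target program is ever produced: the tree is re-walked on every execution, which is exactly the overhead a compiler avoids by translating once.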


More about Interpreters and Compilers

• Whether a language is interpreted or compiled is an implementation-level detail
  • If all implementations are interpreters, we say the language is interpreted
• Python is compiled to bytecode, and the bytecode is interpreted (CPython is the reference implementation)
  • Interpreting the bytecode is faster than interpreting a higher-level representation
  • PyPy both interprets and just-in-time (JIT) compiles the bytecode to optimized machine code at runtime

https://stackoverflow.com/questions/6889747/is-python-interpreted-or-compiled-or-both

Hybrid Translation Schemes

• The translation process for a few languages includes both compilation and interpretation (e.g., Lisp)
• Java is compiled from source code into a form called bytecode (.class files)
• Java virtual machines (JVMs) start execution by interpreting the bytecode
• JVMs usually also include a just-in-time compiler that compiles frequently used bytecode sequences into native code
  • JIT compilation happens at runtime and is driven by profiling
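The claim that CPython compiles source to bytecode before interpreting it can be observed directly with the standard `dis` module, which disassembles the compiled bytecode of a function (exact opcode names vary across Python versions):

```python
# Show that CPython compiles a function to bytecode before interpreting
# it: dis.get_instructions yields the compiled bytecode instructions.
import dis

def scale(x):
    return 2 * x

# The opcode names the bytecode interpreter will execute.
ops = [ins.opname for ins in dis.get_instructions(scale)]
print(ops)
```

On a typical CPython this prints opcodes such as a `LOAD` of the argument, a multiply, and a return, illustrating the lower-level representation the interpreter actually runs.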



Compilation Flow in Java with the HotSpot JVM

.java program → javac compiler → .class bytecode → template interpreter → output
(inside the HotSpot JVM, the C1 + C2 compilers JIT-compile hot bytecode alongside the template interpreter)

Language Processing

• Language processing is an important component of programming
• A large number of system software and application programs require structured input
  • Command-line interfaces in operating systems
  • Query language processing in databases
  • Typesetting systems like LaTeX


A Language-Processing System

skeletal source program → preprocessor → source program → compiler → target assembly program → assembler → relocatable machine code → linker/loader (with library and relocatable object files) → absolute machine code

Development Toolchain

Programmer → Editor → source program → Compiler → assembly code → Assembler → object code → Linker (with object files) → executable machine code → Loader → resolved machine code → Debugger (controlled execution with debug information) → programmer fixes bugs
Goals of a Compiler

• A compiler must preserve the meaning of the program being compiled
  • Proving a compiler correct is a challenging problem and an active area of research
• A compiler must improve the input program in some discernible way
• Compilation time and space required must be reasonable
• The engineering effort in building a compiler should be manageable

Applications of a Compiler

DO I = 1, N
  DO J = 1, M
    A(I,J+1) = A(I,J) + B
  ENDDO
ENDDO


Applications of a Compiler

• Perform loop transformations to help with parallelization, e.g., interchanging the two loops above:

DO I = 1, N                  DO J = 1, M
  DO J = 1, M                  DO I = 1, N
    A(I,J+1) = A(I,J) + B        A(I,J+1) = A(I,J) + B
  ENDDO                        ENDDO
ENDDO                        ENDDO

Programming Language vs Natural Language

• Natural languages
  • Interpretation of words or phrases evolves over time
    • E.g., “awful” meant worthy of awe and “bachelor” meant a young knight
  • Allow ambiguous interpretations
    • “I saw someone on the hill with a telescope.” or “I went to the bank.”
    • “Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo”
• Programming languages have well-defined structures and interpretations, and disallow ambiguity

https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo



Constructing a Compiler

• A compiler is one of the most intricate software systems
  • General-purpose compilers often involve more than a hundred thousand LoC
• Very practical demonstration of the integration of theory and engineering

Idea                             Implementation
Finite and push-down automata    Lexical and syntax analysis
Greedy algorithms                Register allocation
Fixed-point algorithms           Dataflow analysis
…                                …

• Other practical issues such as concurrency and synchronization, and optimizations for the memory hierarchy and target processor

Structure of a Compiler


Compiler Structure

• A compiler interfaces with both the source language and the target architecture

source program → Front End → intermediate representation → Back End → target program
(Front End + Back End together form the Compiler)

• The front end is responsible for understanding the input program in a source language
• The back end is responsible for translating the input program to the target architecture



Intermediate Representation

• An intermediate representation (IR) is a data structure to encode information about the input program
  • E.g., graphs, three-address code, LLVM IR
• Different IRs may be used during different phases of compilation

int f(int a, int b) {
  return a + 2*b;
}

int main() {
  return f(10, 20);
}

The corresponding LLVM IR:

define i32 @f(i32 %a, i32 %b) {
; <label>:0
  %1 = mul i32 2, %b
  %2 = add i32 %a, %1
  ret i32 %2
}

define i32 @main() {
; <label>:0
  %1 = call i32 @f(i32 10, i32 20)
  ret i32 %1
}

Advantages of a Two-Phased Compiler Structure

• Simplifies the process of writing or retargeting a compiler
  • Retargeting is the task of adapting the compiler to generate code for a new processor
• Front ends and back ends can be mixed and matched through the shared IR:

source language 1 → Front End 1 → intermediate representation → Back End 1
source language n → Front End n → intermediate representation → Back End n

Three-Phased View of a Compiler

• IR makes it possible to add more phases to compilation

source program → Front End → IR → Optimizer → IR → Back End → target program

• The optimizer is an IR→IR transformer that tries to improve the IR program in some way
• The front end consists of two or three passes that handle the details of the input source-language program
  • The IR is generated by the front end
• The optimization phase contains many passes that perform different optimizations
  • The number and purpose of these passes vary across compiler implementations
• The back end passes lower the IR representation closer to the target machine’s instruction set



Visualizing the LLVM Compiler System

[Figure: LLVM compiler pipeline; see https://blog.gopheracademy.com/advent-2018/llvm-ir-and-go/]

Implementation Choices

• Monolithic structure
  • Can potentially be more efficient, but is less flexible
• Multipass structure
  • Less complex and easier to debug
  • Can incur compile-time performance penalties


Translation in a Compiler

• Direct translation from a high-level language to machine code is difficult
  • Mismatch in the abstraction level between source code and machine code
    • Compare abstract data types and variables vs memory locations and registers
    • Control flow constructs vs jumps and returns
• Some languages are farther from machine code than others
  • For example, languages supporting the object-oriented paradigm

Phases in a Compiler



Translation in a Compiler

• Translate in small steps, where each step handles a reasonably simple, logical, and well-defined task
• Design a series of IRs to encode information across steps
  • IR should be amenable to program manipulation of various kinds (e.g., type checking, optimization, and code generation)
  • IR becomes more machine-specific and less language-specific as translation proceeds

Different Phases in a Compiler

source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator → target program
(the symbol table and the error handler interact with all phases)

Front End

• The first step in translation is to compare the input program’s structure with the language definition
• Requires a formal definition of the language, in the form of regular expressions and a context-free grammar
• Two separate passes in the front end, often called the scanner and the parser, determine whether or not the input code is a valid program defined by the grammar

Lexical Analysis

• Reads characters in the source program and groups them into a stream of tokens (or words)
• Tokens represent a syntactic category
  • The character sequence forming a token is called a lexeme
  • Tokens can be augmented with the lexical value

position = initial + rate * 60

• Tokens are ID, “=”, ID, “+”, ID, “*”, CONSTANT
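The grouping above can be sketched as a small scanner. In this Python sketch the token categories and regular expressions are illustrative assumptions, not the course’s formal definitions; it emits (category, lexeme) pairs for the example statement:

```python
# A minimal scanner sketch: group characters into tokens using
# regular expressions, one named group per syntactic category.
import re

TOKEN_SPEC = [
    ("CONSTANT", r"\d+"),
    ("ID",       r"[A-Za-z_]\w*"),
    ("OP",       r"[=+*]"),
    ("SKIP",     r"\s+"),          # whitespace separates words
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token category, lexeme) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup != "SKIP":
            # Operators are their own category, e.g. "=" or "+"
            kind = m.lastgroup if m.lastgroup != "OP" else f'"{m.group()}"'
            tokens.append((kind, m.group()))
    return tokens

print(tokenize("position = initial + rate * 60"))
```

Each pair is a token (the category) augmented with its lexical value (the lexeme), exactly the output the parser consumes next.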



Challenges in Lexical Analysis

• Identify word separators
  • The language must define rules for breaking a sentence into a sequence of words
  • Normally, white spaces and punctuation are word separators in languages
  • In programming languages, a character from a different class may also be treated as a word separator

Syntax Analysis

• Once words are formed, the next logical step is to understand the structure of the sentence
  • This is called syntax analysis or parsing
• Syntax analysis imposes a hierarchical structure on the token stream

position = initial + rate * 60

        =
      /   \
   id1     +
         /   \
      id2     *
            /   \
         id3     60
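The hierarchical structure above can be built by a small recursive-descent parser. This Python sketch assumes a hypothetical grammar (assignment → ID "=" expr; expr → term { "+" term }; term → factor { "*" factor }) and encodes the tree as nested tuples:

```python
# A minimal recursive-descent parser: "*" binds tighter than "+"
# because term() consumes all multiplications before expr() sees "+".
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok
    def factor():
        return take()                   # ID or CONSTANT leaf
    def term():
        node = factor()
        while peek() == "*":
            take()
            node = ("*", node, factor())
        return node
    def expr():
        node = term()
        while peek() == "+":
            take()
            node = ("+", node, term())
        return node
    lhs = take()                        # ID on the left of "="
    take()                              # consume "="
    return ("=", lhs, expr())

print(parse(["id1", "=", "id2", "+", "id3", "*", "60"]))
# ('=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

The nesting mirrors the slide’s tree: the multiplication is a subtree of the addition, which is a subtree of the assignment.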


Semantic Analysis

• Once the sentence is constructed, we need to interpret the meaning of the sentence

X saw someone on the hill with a telescope
JJ said JJ left JJ’s assignment at home

• This is a very challenging task for a compiler
• Programming languages define very strict rules to avoid ambiguities
  • E.g., the scope of a variable named JJ
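A type check of the kind mentioned below can be sketched over the tuple-encoded tree from the parsing slide. In this Python sketch the symbol table, the type names, and the widening rule for constants are all illustrative assumptions:

```python
# A minimal semantic-analysis sketch: compute the type of each
# expression node against a symbol table, rejecting mismatches.
def check(node, symtab):
    """Return the type of an expression node, or raise a type error."""
    if isinstance(node, tuple):
        op, left, right = node
        lt, rt = check(left, symtab), check(right, symtab)
        if op in ("+", "*") and (lt != "float" or rt != "float"):
            raise TypeError(f"cannot apply {op} to {lt} and {rt}")
        return "float"
    if node.isdigit():
        return "float"                  # constants widened to float
    return symtab[node]                 # look up the declared type

symtab = {"id1": "float", "id2": "float", "id3": "string"}
tree = ("+", "id2", ("*", "id3", "60"))
try:
    check(tree, symtab)
except TypeError as e:
    print(e)                            # cannot apply * to string and float
```

The rejected tree corresponds to an expression like `position = initial + "rate" * 60`, where a string operand appears in an arithmetic context.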



Semantic Analysis

• The compiler performs other checks like type checking and matching formal and actual arguments

position = initial + “rate” * 60

Intermediate Representation

• Once all checks pass, the front end generates an IR form of the code
  • IR is a program for an abstract machine

id1 = id2 + id3 * 60    →    t1 = inttofloat(60)
                             t2 = id3 * t1
                             t3 = t2 + id2
                             id1 = t3

Code Optimization

• Attempts to improve the IR code according to some metric
  • E.g., reduce the execution time, code size, or resource usage
• “Optimizing” compilers spend a significant amount of compilation time in this phase
• Most optimizations consist of an analysis and a transformation
  • Analysis determines where the compiler can safely and profitably apply the technique
    • Data-flow analysis tries to statically trace the flow of values at run time
    • Dependence analysis tries to estimate the possible values of array subscript expressions

Code Optimization

• Some common optimizations
  • Common sub-expression elimination, copy propagation, dead code elimination, loop-invariant code motion, strength reduction, and constant folding

t1 = inttofloat(60)          t1 = id3 * 60.0
t2 = id3 * t1          →     id1 = t1 + id2
t3 = t2 + id2
id1 = t3

Challenges with Code Optimization

• All strategies may not work for all applications
  • The compiler may need to adapt its strategies to fit specific programs
    • Choice and order of optimizations
    • Parameters that control decisions and transformations
• Active research on “autotuning” or “adaptive runtime”
  • The compiler writer cannot predict a single answer for all possible programs
  • Use learning, models, or search to find good strategies
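One of the optimizations named above, constant folding, can be sketched directly on the tuple-encoded expression trees used earlier (the encoding is an illustrative assumption): subtrees whose operands are all compile-time constants are replaced by their value.

```python
# A minimal constant-folding sketch over tuple-encoded expressions:
# evaluate constant subtrees at compile time, leave the rest intact.
def fold(node):
    """Recursively replace constant subexpressions with their value."""
    if not isinstance(node, tuple):
        return node                     # leaf: name or literal
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return {"+": left + right, "*": left * right}[op]
    return (op, left, right)

print(fold(("+", "id2", ("*", 2, 60))))   # ('+', 'id2', 120)
```

This is the analysis-plus-transformation pattern in miniature: the `isinstance` test is the (trivially safe) analysis, and the replacement is the transformation.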



Steps in Code Generation

• The back end traverses the IR code and emits code for the target machine
• The first stage is instruction selection
  • Translate IR operations into target machine instructions
  • Can take advantage of the feature set of the target machine
  • Assumes an infinite number of registers via virtual registers

t1 = id3 * 60.0    →    MOVF id3 -> r2
id1 = t1 + id2          MULF #60.0, r2 -> r2
                        MOVF id2 -> r1
                        ADDF r2, r1 -> r1
                        MOVF r1 -> id1

• Register allocation
  • Decide which values should occupy the limited set of architectural registers
• Instruction scheduling
  • Reorder instructions to maximize utilization of hardware resources and minimize cycles


Assume LOAD/STORE take 3 cycles, MUL takes 2 cycles, and ADD takes 1 cycle.

Naïve Instruction Schedule          Improved Instruction Schedule
LOAD @ADDR1, @OFF1 -> R1            LOAD @ADDR1, @OFF1 -> R1
ADD R1, R1 -> R1                    LOAD @ADDR2, @OFF2 -> R2
LOAD @ADDR2, @OFF2 -> R2            LOAD @ADDR3, @OFF3 -> R3
MUL R1, R2 -> R1                    ADD R1, R1 -> R1
LOAD @ADDR3, @OFF3 -> R2            MUL R1, R2 -> R1
MUL R1, R2 -> R1                    MUL R1, R3 -> R1
STORE R1 -> @ADDR1, @OFF1           STORE R1 -> @ADDR1, @OFF1

The improved schedule hoists the independent loads so that their latencies overlap with other work (at the cost of an extra register, R3), instead of stalling each dependent instruction behind its load.
