Compilers

Overview and Administrivia

04/01/25 © 2002-08 Hal Perkins & UW CSE A-1


Credits
 Some direct ancestors of this course
 Cornell CS 412-3 (Teitelbaum, Perkins)
 Rice CS 412 (Cooper, Kennedy, Torczon)
 UW CSE 401 (Chambers, Ruzzo, et al)
 UW CSE 582 (Perkins)
 Many grad compiler courses
 Many books (Appel; Cooper/Torczon;
Aho, [Lam,] Sethi, Ullman [Dragon Book])



Agenda
 Introductions
 What’s a compiler?
 Administrivia



Interpreters & Compilers
 Interpreter
 A program that reads a source
program and produces the results of
executing that program
 Compiler
 A program that translates a program
from one language (the source) to
another (the target)



Common Issues
 Compilers and interpreters both
must read the input – a stream of
characters – and “understand” it;
analysis

w h i l e ( k < l e n g t h ) { <nl> <tab> i f ( a [ k ] > 0 ) <nl> <tab> <tab> { n P o s + + ; } <nl> <tab> }



Interpreter
 Interpreter

Execution engine

Program execution interleaved with
analysis
running = true;
while (running) {
    analyze next statement;
    execute that statement;
}

Usually need repeated analysis of
statements (particularly in loops,
functions)

But: immediate execution, good
debugging & interaction
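
The loop above can be made concrete with a small sketch. The two-statement language used here (`set NAME INT` and `print NAME`, one statement per line) is purely hypothetical and is not part of the course material; it only serves to show analysis and execution alternating one statement at a time.

```python
def interpret(source):
    """Run a program in a hypothetical toy language, one statement per line."""
    env = {}      # variable name -> integer value
    output = []   # values produced by "print" statements
    lines = source.splitlines()
    pc = 0        # index of the next statement to analyze
    running = True
    while running:
        if pc >= len(lines):
            running = False
            continue
        # Analyze the next statement (here: just split it into words).
        parts = lines[pc].split()
        pc += 1
        if not parts:
            continue  # blank line, nothing to execute
        # Execute that statement immediately.
        op = parts[0]
        if op == "set":
            env[parts[1]] = int(parts[2])
        elif op == "print":
            output.append(env[parts[1]])
        else:
            raise ValueError(f"unknown statement: {op}")
    return output
```

Calling `interpret("set x 42\nprint x")` returns `[42]`. If the language had loops, each statement in a loop body would be split and dispatched again on every iteration, which is the repeated-analysis cost noted above.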
Compiler
 Read and analyze entire program
 Translate to semantically equivalent
program in another language
 Presumably easier to execute or more efficient
 Should “improve” the program in some fashion
 Offline process
 Tradeoff: compile time overhead (preprocessing
step) vs execution performance



Typical Implementations
 Compilers
 FORTRAN, C, C++, Java, COBOL, etc.
 Strong need for optimization in many
cases
 Interpreters
 Perl, Python, Ruby, awk, sed, shells,
Scheme/Lisp/ML, PostScript/PDF, Java VM
 Particularly effective if interpreter
overhead is low relative to execution cost
of individual statements
Hybrid approaches
 Well-known example: Java
 Compile Java source to byte codes – Java
Virtual Machine language (.class files)
 Execution

Interpret byte codes directly, or

Compile some or all byte codes to native code
 Just-In-Time compiler (JIT) – detect hot spots & compile
on the fly to native code – standard these days
 Variation: .NET
 Compilers generate MSIL
 All IL compiled to native code before
execution
Why Study Compilers? (1)
 Become a better programmer(!)
 Insight into interaction between
languages, compilers, and hardware
 Understanding of implementation
techniques
 What is all that stuff in the debugger
anyway?
 Better intuition about what your code
does
Why Study Compilers? (2)
 Compiler techniques are everywhere
 Parsing (little languages, interpreters,
XML)
 Database engines, query languages
 AI: domain-specific languages
 Text processing

TeX/LaTeX -> DVI -> PostScript -> PDF
 Hardware: VHDL; model-checking tools
 Mathematics (Mathematica, Matlab)
Why Study Compilers? (3)
 Fascinating blend of theory and
engineering
 Direct applications of theory to practice

Parsing, scanning, static analysis
 Some very difficult problems (NP-hard
or worse)

Resource allocation, “optimization”, etc.

Need to come up with good-enough
approximations/heuristics



Why Study Compilers? (4)
 Ideas from many parts of CSE
 AI: Greedy algorithms, heuristic search
 Algorithms: graph algorithms, dynamic
programming, approximation algorithms
 Theory: Grammars, DFAs and PDAs, pattern
matching, fixed-point algorithms
 Systems: Allocation & naming,
synchronization, locality
 Architecture: pipelines, instruction set use,
memory hierarchy management



Why Study Compilers? (5)
 You might even write a compiler
some day!
 You’ll almost certainly write parsers
and interpreters in some context if
you haven’t already



Structure of a Compiler
 First approximation
 Front end: analysis

Read source program and understand its
structure and meaning
 Back end: synthesis

Generate equivalent target language
program
Source -> Front End -> Back End -> Target



Implications
 Must recognize legal programs (&
complain about illegal ones)
 Must generate correct code
 Must manage storage of all
variables/data
 Must agree with OS & linker on target
format
Source -> Front End -> Back End -> Target



More Implications
 Need some sort of Intermediate
Representation(s) (IR)
 Front end maps source into IR
 Back end maps IR to target machine
code
 Often multiple IRs – higher level at first,
lower level in later phases
Source -> Front End -> Back End -> Target



Front End
source -> Scanner -> tokens -> Parser -> IR
 Split into two parts
 Scanner: Responsible for converting character
stream to token stream

Also strips out white space, comments
 Parser: Reads token stream; generates IR
 Both of these can be generated
automatically
 Source language specified by a formal grammar
 Tools read the grammar and generate scanner
& parser (either table-driven or hard-coded)



Tokens
 Token stream: Each significant
lexical chunk of the program is
represented by a token

Operators & Punctuation: {}[]!+-=*;: …

Keywords: if while return goto

Identifiers: id & actual name

Constants: kind & value; int, floating-point,
character, string, …



Scanner Example
 Input text
// this statement does very little
if (x >= y) y = 42;
 Token Stream

IF LPAREN ID(x) GEQ ID(y)

RPAREN ID(y) BECOMES INT(42) SCOLON


 Notes: tokens are atomic items, not
character strings; comments & whitespace
are not tokens (in most languages)
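
A minimal sketch of a scanner that produces the token stream above. The token names come from the slide; the regular expressions and the `(kind, value)` tuple representation are assumptions made for illustration, and a real scanner would typically be generated by a tool or cover far more of the language.

```python
import re

# Token kinds follow the slide; the patterns themselves are assumptions.
# Order matters: the keyword must be tried before the identifier pattern,
# and ">=" before "=".
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("WS",      r"\s+"),
    ("IF",      r"if\b"),
    ("INT",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("GEQ",     r">="),
    ("BECOMES", r"="),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("SCOLON",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Convert a character stream into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        pos = m.end()
        if kind in ("COMMENT", "WS"):
            continue  # comments and whitespace are not tokens
        if kind == "INT":
            tokens.append((kind, int(m.group())))
        elif kind == "ID":
            tokens.append((kind, m.group()))
        else:
            tokens.append((kind, None))  # operators and keywords carry no value
    return tokens
```

Applied to the input text on the slide, `scan` yields `IF`, `LPAREN`, `ID` with value `x`, `GEQ`, `ID` with value `y`, `RPAREN`, `ID` with value `y`, `BECOMES`, `INT` with value `42`, `SCOLON`, in that order.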



Parser Output (IR)
 Many different forms
 Engineering tradeoffs have changed
over time (e.g., memory is (almost) free
these days)
 Common output from a parser is
an abstract syntax tree
 Essential meaning of the program
without the syntactic noise



Parser Example
 Token Stream Input
IF LPAREN ID(x) GEQ ID(y) RPAREN
ID(y) BECOMES INT(42) SCOLON
 Abstract Syntax Tree
ifStmt
    >=
        ID(x)
        ID(y)
    assign
        ID(y)
        INT(42)
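
One way to get from the token stream to the tree is a hand-written recursive-descent parser. The sketch below assumes a tiny grammar that is just large enough for this example (an `if` statement or an assignment, with an optional `>=` comparison); the grammar, the `(kind, value)` token tuples, and the nested-tuple tree shape are all assumptions, not the course's MiniJava grammar.

```python
def parse(tokens):
    """Parse a list of (kind, value) tokens into a nested-tuple AST.

    Assumed grammar:
        stmt ::= IF LPAREN expr RPAREN stmt | ID BECOMES expr SCOLON
        expr ::= atom [ GEQ atom ]
        atom ::= ID | INT
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        value = tokens[pos][1]
        pos += 1
        return value

    def atom():
        if peek() == "ID":
            return ("ID", expect("ID"))
        return ("INT", expect("INT"))

    def expr():
        left = atom()
        if peek() == "GEQ":
            expect("GEQ")
            return (">=", left, atom())
        return left

    def stmt():
        if peek() == "IF":
            expect("IF")
            expect("LPAREN")
            cond = expr()
            expect("RPAREN")
            return ("ifStmt", cond, stmt())
        target = expect("ID")
        expect("BECOMES")
        value = expr()
        expect("SCOLON")
        return ("assign", ("ID", target), value)

    tree = stmt()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} after statement")
    return tree
```

Note that the parentheses and the semicolon guide the parse but leave no trace in the result: the tree keeps only `ifStmt`, `>=`, `assign`, and the leaves.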



Static Semantic Analysis
 During or (more common) after
parsing
 Type checking
 Check language requirements like
proper declarations, etc.
 Preliminary resource allocation
 Collect other information needed by
back end analysis and code generation
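
As a rough illustration of the checks listed above, here is a sketch of a type checker for the AST of `if (x >= y) y = 42;`. The nested-tuple tree shape, the type names `"int"` and `"boolean"`, and the `symbols` dictionary standing in for declarations are assumptions for this example only.

```python
def type_of(node, symbols):
    """Return the type of an expression node, or raise TypeError."""
    kind = node[0]
    if kind == "INT":
        return "int"
    if kind == "ID":
        name = node[1]
        if name not in symbols:
            raise TypeError(f"undeclared variable {name}")
        return symbols[name]
    if kind == ">=":
        if type_of(node[1], symbols) != "int" or type_of(node[2], symbols) != "int":
            raise TypeError(">= needs int operands")
        return "boolean"
    raise TypeError(f"not an expression: {kind}")

def check_stmt(node, symbols):
    """Check one statement node; returns None if it is well typed."""
    kind = node[0]
    if kind == "ifStmt":
        if type_of(node[1], symbols) != "boolean":
            raise TypeError("if condition must be boolean")
        check_stmt(node[2], symbols)
    elif kind == "assign":
        target = type_of(node[1], symbols)
        value = type_of(node[2], symbols)
        if target != value:
            raise TypeError(f"cannot assign {value} to {target}")
    else:
        raise TypeError(f"not a statement: {kind}")
```

With `symbols = {"x": "int", "y": "int"}` the example statement passes; dropping `y` from `symbols` makes the check fail with an undeclared-variable error.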



Back End
 Responsibilities
 Translate IR into target machine code
 Should produce “good” code

“good” = fast, compact, low power
consumption (pick some)
 Should use machine resources
effectively

Registers

Instructions

Memory hierarchy
Back End Structure
 Typically split into two major parts
with sub phases
 “Optimization” – code improvements

Often works on lower-level IR than parser
AST
 Code generation

Instruction selection & scheduling

Register allocation



The Result
 Input
if (x >= y)
    y = 42;

ifStmt
    >=
        ID(x)
        ID(y)
    assign
        ID(y)
        INT(42)
 Output
mov eax,[ebp+16]
cmp eax,[ebp-8]
jl L17
mov [ebp-8],42
L17:
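
A naive code generator that reproduces this output can be sketched in a few lines. The nested-tuple AST, the `offsets` table mapping each variable to its offset from `ebp`, and the `labels` counter are assumptions for illustration; the offsets 16 and -8 and the label number 17 are simply taken from the slide. There is no optimization or register allocation here: every comparison goes through `eax`.

```python
def gen_operand(node, offsets):
    """Render a leaf node as an operand: an immediate or a stack slot."""
    if node[0] == "INT":
        return str(node[1])
    return f"[ebp{offsets[node[1]]:+d}]"

def gen_stmt(node, offsets, labels):
    """Return a list of assembly lines for one statement node."""
    kind = node[0]
    if kind == "assign":
        dest = gen_operand(node[1], offsets)
        value = node[2]
        if value[0] == "INT":
            return [f"mov {dest},{value[1]}"]   # store the constant directly
        return [f"mov eax,{gen_operand(value, offsets)}", f"mov {dest},eax"]
    if kind == "ifStmt":
        op, left, right = node[1]
        if op != ">=":
            raise ValueError(f"unsupported condition: {op}")
        label = f"L{labels[0]}"
        labels[0] += 1
        code = [
            f"mov eax,{gen_operand(left, offsets)}",
            f"cmp eax,{gen_operand(right, offsets)}",
            f"jl {label}",                      # skip the body when left < right
        ]
        code += gen_stmt(node[2], offsets, labels)
        code.append(f"{label}:")
        return code
    raise ValueError(f"unsupported statement: {kind}")
```

Called on the tree above with `offsets = {"x": 16, "y": -8}` and `labels = [17]`, this returns the five lines shown under Output. The text mirrors the slide; a real assembler would usually also need an operand size on the store of 42 to memory.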



Some History (1)
 1950’s. Existence proof
 FORTRAN I (project begun 1954, compiler
delivered 1957) – competitive with
hand-optimized code
 1960’s
 New languages: ALGOL, LISP, COBOL,
SIMULA
 Formal notations for syntax, esp. BNF
 Fundamental implementation
techniques

Stack frames, recursive procedures, etc.
Some History (2)
 1970’s
 Syntax: formal methods for producing
compiler front-ends; many theorems
 Late 1970’s, 1980’s
 New languages (functional; Smalltalk &
object-oriented)
 New architectures (RISC machines,
parallel machines, memory hierarchy
issues)
 More attention to back-end issues
Some History (3)
 1990s and beyond

Compilation techniques appearing in many
new places

Just-in-time compilers (JITs)

Software analysis, verification, security

Phased compilation – blurring the lines
between “compile time” and “runtime”

Using machine learning techniques to control
optimizations(!)

Compiler technology critical to effective use
of new hardware (RISC, Itanium, complex
memory hierarchies)

The new 800 lb gorilla - multicore
CSEP 501 Course Project
 Best way to learn about compilers is to
build one
 CSEP 501 course project: Implement an
x86 compiler in Java for an object-
oriented programming language

MiniJava subset of Java from Appel book

Includes core object-oriented parts (classes,
instances, and methods, including subclasses and
inheritance)

Basic control structures (if, while)

Integer variables and expressions
Project Details
 Goal: large enough language to be interesting;
small enough to be tractable
 Project due in phases

Final result is the main thing, but timeliness and
quality of intermediate work count for something

Final report & short conference at end of the course
 Core requirements, then open-ended
 Reasonably open to alternative projects; let’s
discuss

Most likely would be a different implementation
language (C#, ML, F#, ?) or target (MIPS/SPIM, x86-
64, …)



Prerequisites
 Assume undergrad courses in:
 Data structures & algorithms

Linked lists, dictionaries, trees, hash tables, &c
 Formal languages & automata

Regular expressions, finite automata, context-free
grammars, maybe a little parsing
 Machine organization

Assembly-level programming for some machine (not
necessarily x86)
 Gaps can usually be filled in
 But be prepared to put in extra time if needed



Project Groups
 You are encouraged to work in
groups of 2 or 3

Pair programming strongly encouraged
 Space for CVS or SVN repositories +
other shared files available on UW
CSE machines

Use if desired; not required

Mail to instructor/TA if you want this
Programming Environments
 Whatever you want!

But assuming you’re using Java, your code
should compile & run using standard Sun
javac/java

Generics (Java 5/6) are fine

If you use C# or something else, you
assume some risk of the unknown

Work with other members of the class and pull
together

Class discussion list can be very helpful here

If you’re looking for a Java IDE, try Eclipse

Or NetBeans, or <name your favorite>

javac/java + emacs for the truly hardcore
CSEP 501 Web
 Everything is (or will be) at
www.cs.washington.edu/csep501
 Lecture slides will be on the course
web by mid-afternoon before each
class

Printed copies available in class at UW, but
you may want to read or print in advance
 Live video during class

But do try to join us (questions, etc.)
 Archived video and slides from class
will be posted a day or two later
Books
 Three good books:
 Aho, Lam, Sethi, Ullman, “Dragon
Book”, 2nd ed (but 1st ed is also fine)
 Appel, Modern Compiler
Implementation in Java, 2nd ed.
 Cooper & Torczon, Engineering a
Compiler
 Dragon book is the “official” text, but all would work &
we’ll draw on all three (and more)
 If we put these on reserve in the engineering library,
would anyone notice?
Coming Attractions
 Review of formal grammars
 Lexical analysis – scanning
 Background for first part of the project
 Followed by parsing …

 Good time to read the first couple of
chapters of (any of) the book(s)
