
This material is freely available and must not be sold or used for commercial purposes.

Send comments, typos and suggestions to [email protected]

Talking Compilers with ChatGPT


Lecture Notes for a 60-Hour Course on Compiler Construction

Fernando Magno Quintão Pereira


The Federal University of Minas Gerais - Brazil
December of 2024
Version 0.2

To Gabriel, a little dragonslayer,
who finds something new each day
to teach Rafa and me.


Preface

In the second semester of 2024, I was tasked with teaching Compiler Construction to
undergraduate students at the Federal University of Minas Gerais. At that time, I had been
conducting research in the field of compilers for almost twenty years. However, I had never
taught this specific course before. Instead, I was usually teaching courses like Theory and
Practice of Programming Language Design or Basic Programming to undergraduates.

That said, I did have some material prepared for a graduate course on Static Program Analysis,
which I could potentially reuse. But all of that seemed almost irrelevant, because this was the age
of ChatGPT, Gemini, and Llama! It was the era of large language models (LLMs) and all the
incredible things they enable. So, I decided to embrace these tools wholeheartedly to help create
my lecture notes.

I usually teach by asking students questions, and my lecture notes often consist of
questions—typically without the answers. These notes turned out to be excellent guidelines for
crafting a more comprehensive story to share with the Compiler Construction students. This PDF
is primarily the result of feeding ChatGPT and other LLMs those questions and refining their
responses.

The table of contents, in turn, is inspired by the ACM Undergraduate Curricula for Compilers. I
tried to cover as much of it as time and energy allowed. I've organized the contents following
Bob Nystrom's Crafting Interpreters: first I discuss the implementation of an interpreter (for a
simple expression-oriented subset of SML/NJ) and then talk about code generation (using
RISC-V as the main target).

Of course, not all the answers generated by LLMs met my expectations. For instance, I struggled
to get GPT to explain that the application "(fn x => fn y => x + y) z" would yield a closure in
SML/NJ. But these minor frustrations were far outweighed by the joy of working with these
models. The process of creating this book was both productive and exciting—I learned much
along the way.

The students of that second semester of 2024 played a key role in helping me shape a compelling
narrative. I am also deeply grateful to the teaching assistants, who reviewed the code I uploaded
to GitHub. Ultimately, this book was a collective effort involving many people, and I owe them
all my thanks. This includes friends from the Reddit community r/compilers, as well as
colleagues who also teach compiler construction. To everyone who contributed to this collection
of lecture notes, muito obrigado!

Fernando Magno Quintão Pereira


Belo Horizonte - December 24th, 2024


Table of Contents
●​ Introduction
●​ Lexical Analysis
●​ Tree-Like Program Representation
●​ Recursive-Descent Parsing
●​ Bottom-Up Parsing
●​ Parser Generators and Parser Combinators
●​ Variables and Bindings
●​ The Visitor Design Pattern
●​ Type Systems
●​ Type Checking
●​ Type Inference
●​ Anonymous Functions
●​ Recursive Functions
●​ Introduction to Code Generation
●​ Code Generation for Expressions
●​ Code Generation for Statements
●​ Code Generation for Functions
●​ Memory Allocation
●​ Pointers and Aggregate Types
●​ Code Generation for Object-Oriented Features
●​ Heap Allocation
●​ Introduction to Code Optimizations
●​ Data-Flow Analyses
●​ Static Single-Assignment Form
●​ SSA-Based Register Allocation


Introduction
How do we run programs?
There are three typical ways to execute programs:
●​ Interpretation: In this method, the program remains in a high-level representation, such
as an Abstract Syntax Tree (AST) or another intermediate form. The interpreter
processes this representation directly, executing the program step by step within a
"dispatch loop." No machine code is generated beforehand, so the interpreter must
remain active throughout the execution to manage and execute each instruction.
●​ Compilation: This method involves translating the entire program into machine code (a
binary executable) ahead of time using a compiler. Once the compilation is complete, the
resulting machine code can be executed directly by the hardware without further
interpretation or translation. The program runs natively on the machine, leading to
potentially faster execution compared to interpretation.
●​ Virtualization: Virtualization usually involves transforming the program into a low-level
code representation, often called bytecode. This bytecode is not directly executable by
the hardware but is designed to be efficiently interpreted by a virtual machine. The virtual
machine then executes the bytecode within its own dispatch loop, similar to how an
interpreter would handle a high-level representation. While the bytecode is typically more
abstract than machine code, it is more structured and lower-level than the high-level
representation used in pure interpretation.
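The interpreter's "dispatch loop" mentioned above can be sketched in a few lines of Python. This is a toy illustration (the Node class and its tags are invented for this example, not taken from any real interpreter): evaluation inspects each AST node's tag and recurses into its children, and no machine code is ever produced.

```python
# A minimal tree-walk interpreter sketch (hypothetical Node class).
class Node:
    def __init__(self, tag, *kids, value=None):
        self.tag, self.kids, self.value = tag, kids, value

def eval_node(n):
    # The dispatch loop: branch on the node's tag and recurse.
    if n.tag == "num":
        return n.value
    if n.tag == "add":
        return eval_node(n.kids[0]) + eval_node(n.kids[1])
    if n.tag == "mul":
        return eval_node(n.kids[0]) * eval_node(n.kids[1])
    raise ValueError(f"unknown node: {n.tag}")

# (1 + 2) * 4 represented as a tree:
ast = Node("mul",
           Node("add", Node("num", value=1), Node("num", value=2)),
           Node("num", value=4))
print(eval_node(ast))  # 12
```

Note that the interpreter must stay resident for the whole run: every operation goes back through `eval_node`.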

Have you ever seen the cover of "Crafting Interpreters"? It shows different
ways to analyze or execute programs. Have you ever heard of the book?
Yes, "Crafting Interpreters" by Robert Nystrom is a well-known and highly regarded book in the
programming community. It provides a hands-on, practical introduction to building interpreters
and compilers from scratch. The book is split into two parts: the first part walks readers through
building a tree-walk interpreter for a simple programming language called Lox, focusing on
concepts like lexical analysis, parsing, and interpretation. The second part dives into more
advanced topics like bytecode compilation, virtual machines, and optimization. The book is
written in a conversational, accessible style, making it ideal for both beginners and experienced
developers interested in language design and implementation. Here's the cover page:


Some people like to call "Crafting Interpreters" the "Mountain Book". That's due to its cover
page, which creatively visualizes the different paths taken by compilers and interpreters during
the execution or translation of programming languages. The image shows a mountainous terrain
that symbolizes the journey of a program from source code to its final form, whether that's
machine code or bytecode. Here’s an explanation of the key paths shown:

1. Scanning (Lexical Analysis):
○ This is the first step after source code is provided. The scanner breaks down the
source code into tokens. Tokens are meaningful elements like keywords,
variables, operators, etc.
2.​ Parsing:
○​ Once the tokens are generated, they are used by the parser to build a syntax
tree (or Abstract Syntax Tree, AST). The syntax tree represents the grammatical
structure of the source code, organizing the tokens in a way that reflects the
logical relationships of the code.
3.​ Tree-Walk Interpreter:
○ One possible path from here is a tree-walk interpreter, where the interpreter
directly walks through the syntax tree and executes the code step by step without
further translation. This is typical of simple languages and early implementations
(Ruby's original MRI interpreter, for instance, worked this way). It avoids the
need to generate intermediate code or machine code.
4.​ Transpiling:
○​ Another path from the syntax tree is transpiling, which involves converting the
code into another high-level language. Transpilers take source code in one
language and transform it into another high-level language, often for
cross-language compatibility or platform-specific optimizations.
5.​ Intermediate Representations (IRs):
○​ The diagram also shows the path toward Intermediate Representations (IRs).
These are lower-level, platform-independent forms of code that are still not
machine code. Intermediate representations allow optimization and are a crucial
step in compilers like LLVM.
6.​ Optimizing:
○​ Before generating final code, the intermediate representation can be optimized.
This step involves improving the efficiency of the code, such as reducing memory
usage, speeding up execution, or simplifying complex instructions.
7.​ Code Generation:
○​ Finally, code generation transforms the intermediate representation into
machine code that can run directly on hardware. This is the end of the
compilation process and produces an executable file or binary.
8.​ Virtual Machine (Bytecode Execution):
○​ The other key path is the generation of bytecode. Bytecode is a lower-level,
platform-independent code that is not as optimized as machine code but can be
executed by a virtual machine (like the JVM for Java or Python’s interpreter).
This path involves compiling source code into bytecode that can be interpreted or
just-in-time compiled by the virtual machine at runtime.
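To make the first two steps concrete, here is a toy scanner sketch in Python (the token categories and regular expressions are invented for this example): it converts the raw character stream into the (kind, text) tokens that a parser would then assemble into a syntax tree.

```python
import re

# Hypothetical token categories for a tiny expression language.
TOKEN_SPEC = [
    ("NUM", r"\d+"),            # integer literals
    ("ID", r"[A-Za-z_]\w*"),    # identifiers
    ("OP", r"[+\-*/=()]"),      # operators and parentheses
    ("WS", r"\s+"),             # whitespace, discarded below
]
MASTER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def scan(src):
    tokens = []
    for m in MASTER.finditer(src):
        if m.lastgroup != "WS":  # whitespace carries no meaning here
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(scan("x = 10 + y"))
# [('ID', 'x'), ('OP', '='), ('NUM', '10'), ('OP', '+'), ('ID', 'y')]
```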

Is there a definitive way to distinguish a compiler from an interpreter?


Yes, even though the difference between these tools can sometimes be blurry, we can
distinguish them based on their asymptotic complexity. A compiler's complexity is
proportional to the program's size, whereas an interpreter's complexity depends on the
program's runtime behavior (including input size and algorithm). Here's a breakdown of this idea:

1. Compiler Complexity:
○ A compiler's job is to translate the entire source code into machine code (or
another intermediate representation). The time complexity of this process is
generally proportional to the size of the source code, assuming no extreme
optimizations. It's about syntactic and semantic analysis, transformations, and
code generation, which depend on the program's size (number of tokens, AST
nodes, etc.), not its runtime behavior.
2.​ Interpreter Complexity:
○​ An interpreter, by contrast, directly executes the source code. Its complexity is
influenced by the size of the program and the nature of the program's execution,
which means both the input and the algorithm's complexity matter. The
interpreter continuously evaluates the program during execution, making the
overall computational complexity depend on how long and how deeply the code
runs.
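A small sketch can make this asymmetry concrete. In the toy code below (the tuple-based AST format is invented for this example), the "compiler" traverses the tree exactly once, with cost proportional to program size, and produces a closure; the interpreter re-traverses the tree on every execution, so its total cost grows with how often the program runs.

```python
def compile_expr(expr):
    # One traversal per AST node: cost proportional to program size.
    op, *args = expr
    if op == "const":
        v = args[0]
        return lambda env: v
    if op == "var":
        name = args[0]
        return lambda env: env[name]
    if op == "add":
        f, g = compile_expr(args[0]), compile_expr(args[1])
        return lambda env: f(env) + g(env)
    raise ValueError(op)

def interp_expr(expr, env):
    # Re-walks the tree on every call: cost proportional to how often
    # the program runs, not just to how big it is.
    op, *args = expr
    if op == "const":
        return args[0]
    if op == "var":
        return env[args[0]]
    if op == "add":
        return interp_expr(args[0], env) + interp_expr(args[1], env)
    raise ValueError(op)

body = ("add", ("var", "acc"), ("const", 1))
step = compile_expr(body)      # the tree is walked once, here...
env = {"acc": 0}
for _ in range(1000):          # ...and then executed many times cheaply
    env["acc"] = step(env)
print(env["acc"])  # 1000
```

Replacing `step(env)` with `interp_expr(body, env)` gives the same answer, but walks the tree a thousand times.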

Which languages are normally interpreted, compiled or virtualized?


AST interpreters are common among educational and toy languages. Shell scripting languages
are also normally interpreted in this way. Other examples include languages with heavy
meta-programming capabilities, such as Scheme and Lisp. Many educational languages, such
as Logo, are also AST-interpreted.


Compiled languages are typically "systems languages", that is, languages meant to implement
computer systems like operating systems, databases, web servers, video games, etc. Examples
include C, C++, Rust, Go, Fortran and Pascal.

And almost everything else is virtualized. Examples of languages whose programs are
translated to bytecode, which is then interpreted, include Python (via CPython),
Java (via the JVM), C# (via the CLR), etc.
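Python's standard library actually lets us watch this translation happen: the `dis` module disassembles the bytecode that CPython produces before interpreting a function. (The exact opcode names vary between CPython versions, so none are asserted here.)

```python
import dis

def add(a, b):
    return a + b

# CPython compiled `add` to bytecode when the function was defined;
# `dis` lets us inspect the instructions its virtual machine will run.
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)  # the addition shows up as a BINARY_* opcode
```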

It's important to note that some languages can fall into multiple categories depending on the
implementation. For instance, Python can be both interpreted directly (in some implementations)
and virtualized via bytecode (in CPython). JavaScript is another example that, depending on the
engine, may use a mix of interpretation, JIT compilation, and virtualized execution.

So, why are some languages interpreted and others compiled?


The decision to interpret or compile a programming language often comes down to a trade-off
between compilation time and execution time, which can be framed as the economy of user
time. The user experience in programming is influenced by the total time it takes to both
prepare and run a program, and this total time can be broken into two phases: the time spent
compiling (if compilation is necessary) and the time spent executing the program.

Interpretation vs. Compilation

●​ Interpreted languages (like Bash) are ideal for cases where the program's execution is
relatively short-lived or dynamic. The overhead of compiling in such cases would
outweigh any potential performance gains, so interpretation, which avoids the
compilation step and executes code directly, is more efficient in terms of overall time. For
example, in a Bash script, the primary role is to automate tasks and launch other
programs (which are often compiled). Since the script itself executes for a short time
compared to the programs it runs, interpreting it makes sense. There's little benefit to
compiling such scripts, as they aren't run long enough for the performance gains of
compilation to be worth the initial cost.
●​ Compiled languages (like C or C++) are used when performance is critical, particularly
when the program is expected to run for a long time or repeatedly. Programs in these
languages, such as operating system kernels or performance-sensitive applications,
benefit greatly from being compiled because the compilation process allows for
optimizations that make the resulting machine code run faster. While compilation takes
time, this upfront cost is justified because the compiled program can be executed many
times, and each execution benefits from the optimizations. The more a program is
expected to run, the more it makes sense to invest time in compiling and optimizing it.

Hybrid Approaches: Just-In-Time (JIT) Compilation


Some languages, like JavaScript, benefit from both interpretation and compilation, depending
on the program's runtime behavior. In many cases, JavaScript programs are short-lived, running
only while a web page is open. For these situations, interpreting the code makes sense because
it avoids the overhead of compiling something that won't be used for long.

However, when a JavaScript program runs for an extended period (e.g., a web application that
remains active), a JIT (Just-In-Time) compiler can be used. JIT compilation begins by
interpreting the code but monitors how it runs. If certain parts of the code are executed
frequently, the JIT compiler steps in to compile and optimize those parts on the fly. This hybrid
approach balances the initial speed of interpretation with the eventual performance
improvements of compilation for long-running programs.

Conclusion

The choice between interpretation and compilation can be seen as a matter of optimizing for
user time:

●​ Interpreted languages shine when the cost of compilation would exceed the benefits
because the programs are expected to run for short durations.
●​ Compiled languages are essential for long-running programs, where upfront
compilation costs are offset by significant execution speed improvements.
●​ JIT compilation provides a middle ground, allowing for initial speed via interpretation
and optimizing performance as needed for longer-running code.

This time trade-off reflects the different roles that languages play in modern computing
environments, from quick scripting tasks to performance-critical applications, with tools like JIT
adapting dynamically to both ends of the spectrum.
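The hot-code detection at the heart of a JIT can be mimicked with a toy sketch (all names here are hypothetical, and where a real JIT emits machine code, this sketch merely swaps in a cached closure): a function starts on the slow, "interpreted" path, and once its call count crosses a threshold, a "compiled" version takes over.

```python
HOT_THRESHOLD = 3

class Jit:
    def __init__(self, interp, compile_fn):
        self.interp, self.compile_fn = interp, compile_fn
        self.calls, self.compiled = 0, None

    def __call__(self, *args):
        if self.compiled is not None:
            return self.compiled(*args)      # fast path after "compilation"
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:      # the function became hot
            self.compiled = self.compile_fn()
        return self.interp(*args)            # slow, "interpreted" path

trace = []
square = Jit(interp=lambda x: trace.append("interp") or x * x,
             compile_fn=lambda: lambda x: trace.append("compiled") or x * x)

results = [square(n) for n in range(5)]
print(results)  # [0, 1, 4, 9, 16]
print(trace)    # ['interp', 'interp', 'interp', 'compiled', 'compiled']
```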

Let's take a look at some program representations. How do we visualize the
AST with clang?
Given this program below:

#include <stdio.h>

int strlen(const char *str) {
    int length = 0;
    while (str[length] != '\0') {
        length++;
    }
    return length;
}

int main() {
    char str[] = "Hello, world!";
    int length = strlen(str);
    printf("The length of the string is: %d\n", length);
    return 0;
}

We can view its AST using the following command:

$> clang -cc1 -ast-dump strlen.c

Or, if you have graphviz installed:

$> clang -cc1 -ast-view strlen.c

What about in Java? How do we view the bytecodes?

Consider:

public class T {
public static void main(String args[]) {
int x = 1;
x += (x = 2);
System.out.println("x = " + x);
}
}

We can do:

$> javac T.java
$> javap -c T

So, with Java bytecode we can compile different programming languages to
different virtual machines. Is that correct?
Yes, that's correct! Java bytecode doesn't have to run on the JVM exclusively. There
are different virtual machines that execute it. See the figure below to get an idea:


Usually, we call the three parts that make up such a language ecosystem its front-end,
its middle-end, and its back-end.

1. Front-End (Language Independence)

The front-end is responsible for:

● Lexical analysis (tokenizing the input).
● Parsing (checking the syntax).
●​ Semantic analysis (type checking, variable declarations, etc.).

The front-end translates the source code (Java, C++, Rust, etc.) into an intermediate
representation (IR). This layer is responsible for converting high-level constructs (functions,
loops, variables) into an IR, such as LLVM IR in LLVM compilers or bytecode in the JVM.

By creating a common IR, multiple programming languages can target the same runtime
environment. For example, in the LLVM ecosystem, languages like C, C++, Rust, Swift, and
others can all be compiled into LLVM IR.

2. Middle-End (Language-Independent Optimizations)


The middle-end is focused on applying optimizations to the IR generated by the front-end.


These optimizations improve the efficiency of the program by reducing the number of
instructions or improving execution time, without worrying about the underlying hardware.

Examples of middle-end optimizations include:

● Constant folding: Replacing expressions like 3 + 4 with 7 during compile-time.
● Dead code elimination: Removing parts of the code that are never executed.
●​ Loop unrolling: Expanding loops to reduce the overhead of iteration.

Since this phase operates on the IR, it is independent of both the source language and the
target machine architecture. This allows the same optimizations to be applied to code from
any language and for any platform.
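As an illustration, constant folding fits in a few lines of Python when the IR is a tiny tuple-based expression tree (a format invented for this sketch): the optimizer folds children bottom-up and replaces any operation whose operands are both constants with its computed value.

```python
def fold(expr):
    if not isinstance(expr, tuple):
        return expr                      # a leaf: constant or variable name
    op, a, b = expr
    a, b = fold(a), fold(b)              # fold the children first, bottom-up
    if isinstance(a, int) and isinstance(b, int):
        return {"add": a + b, "mul": a * b}[op]  # both constant: fold away
    return (op, a, b)                    # a variable survives: keep the op

# x * (3 + 4)  becomes  x * 7
print(fold(("mul", "x", ("add", 3, 4))))  # ('mul', 'x', 7)
```

Notice that the pass never asks what machine the code will run on: this is exactly what makes it a middle-end optimization.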

3. Back-End (Target-Specific Code Generation)

The back-end is responsible for transforming the optimized IR into machine code or bytecode
that can be executed by a specific processor or runtime environment. It maps the IR into
instructions that the target architecture can understand (such as x86, ARM, RISC-V) and
handles:

●​ Register allocation: Deciding how to use the CPU registers to hold data.
●​ Instruction scheduling: Ordering the instructions for efficient execution on a given
CPU.
●​ Target-specific optimizations: Improving performance based on the capabilities of the
hardware, such as vectorization or using specific CPU instructions.

The back-end is highly target-dependent, but the separation allows engineers to support
multiple architectures without needing to change the previous stages.

Is it the case that a system that processes programming languages might
be engineered into three parts: front-end, middle-end and back-end?
Yes! The division of programming language processing systems into front-end, middle-end, and
back-end components beautifully encapsulates a modular approach that enhances
maintainability, scalability, and flexibility. The front-end focuses on parsing and semantic
analysis, transforming source code into an intermediate representation while providing error
diagnostics, which enables better user experience and developer productivity. The middle-end
optimizes this representation through various transformations, enhancing performance and
resource usage without tying the optimizations to any specific architecture. Finally, the back-end
translates the optimized representation into target code, whether it be machine code, bytecode,
or another format, ensuring the system can adapt to different platforms and architectures. This
separation allows developers to innovate and improve each layer independently, fostering a rich
ecosystem of tools and techniques that can evolve over time while keeping the entire
compilation process efficient and effective.


Why do engineers write runtime environments like this: with a front-end, a
middle-end and a back-end?
Engineers write runtime environments and compilers with a front-end, middle-end, and
back-end structure because it enables modularity, optimization, and flexibility in the compilation
and execution process. Here's a breakdown of why this design pattern is used:

1. Separation of Concerns

Each stage in this pipeline addresses different responsibilities, which makes the system easier
to manage, extend, and optimize. By dividing the work into distinct stages, developers can focus
on the particular tasks associated with each phase without worrying about others.

● Front-End: Focuses on parsing and understanding the source code.
● Middle-End: Optimizes the code in a way that's independent of the target machine.
●​ Back-End: Transforms the optimized intermediate code into machine-specific
instructions.

2. Modularity

Dividing the runtime or compiler into these components allows for greater modularity. Each
phase can be developed, improved, or replaced independently. For example:

●​ The front-end can support multiple languages without needing to modify the back-end.
●​ The back-end can support multiple architectures without changing the front-end.

This design allows engineers to focus on specific areas of the system in isolation and build a
reusable infrastructure.

3. Cross-Platform Flexibility

This architecture is particularly useful when supporting multiple platforms. For example, LLVM
can compile code for x86, ARM, RISC-V, and other architectures, but the front-end remains the
same regardless of the final machine code target. This flexibility allows developers to build
cross-platform tools easily by simply plugging in different back-ends.

Similarly, in the Java Virtual Machine (JVM) ecosystem, bytecode generated by the front-end
can run on any platform that has a JVM (e.g., Windows, Linux, macOS). The JVM acts as a
platform-independent back-end, abstracting the machine-specific details away from the
language.

4. Efficiency and Optimizations at Different Levels

● Front-End Efficiency: Converting high-level source code to an intermediate
representation that can be processed more easily.

● Middle-End Efficiency: Language-independent optimizations that make the IR more
efficient.
●​ Back-End Efficiency: Generating efficient machine code for the specific target platform,
applying hardware-specific optimizations.

This layered approach allows for optimizations at different stages:

●​ The middle-end focuses on code improvements that are universally beneficial across all
platforms.
●​ The back-end focuses on platform-specific improvements, ensuring that the final code
runs optimally on the target hardware.

5. Reusability and Extensibility

This structure makes compilers and runtime environments extensible and reusable. For
instance:

● If a new programming language is developed, engineers only need to write a new
front-end to convert it to IR, leveraging the existing middle-end optimizations and
back-end code generation.
● If a new hardware architecture comes out, only a new back-end is needed, while existing
optimizations and front-end processing remain the same.

This approach drastically reduces development time and complexity, as adding support for new
languages or architectures becomes more modular.

Summary

The front-end, middle-end, and back-end architecture of runtime environments and compilers is
adopted because it allows for:

● Separation of concerns: Each phase handles a specific task (parsing, optimization,
code generation).
●​ Modularity: Different components can be developed, optimized, or replaced
independently.
●​ Language and platform independence: Multiple languages can target the same
runtime, and multiple hardware platforms can be supported by the same infrastructure.
●​ Reusability: It’s easier to extend support for new languages and architectures.
●​ Optimization flexibility: Optimizations can be applied universally in the middle-end or
platform-specific in the back-end.

This structure makes modern runtimes and compilers more flexible, efficient, and maintainable.


Do you have more examples of this arrangement of front-end, middle-end
and back-end?
There are many such examples indeed! Let's map out programming languages that target LLVM
IR (Intermediate Representation) and some of the computer architectures LLVM compiles to.

Programming Languages Targeting LLVM

●​ C
●​ C++
●​ Rust
●​ Swift
●​ Objective-C
●​ Fortran
●​ Julia
●​ Haskell (via GHC backend)
●​ Kotlin/Native
●​ Zig
●​ OCaml
●​ Nim

LLVM (Intermediate Representation) Compilation Targets

● x86 (Intel, AMD)
● x86-64
●​ ARM
●​ AArch64 (64-bit ARM)
●​ MIPS
●​ RISC-V
●​ PowerPC
●​ SPARC
●​ WebAssembly

So, the figure could look like:


And indeed, there are several other similar ecosystems where multiple languages target a
common intermediate layer, which is then executed on different platforms or runtimes. Here are
two more prominent examples:

JavaScript as an Intermediate Target

JavaScript is often used as a compilation target for other languages to enable web
development. Many languages compile to JavaScript, which can then run on web browsers or
server environments like Node.js.

Languages

●​ TypeScript
●​ Dart
●​ Elm
●​ CoffeeScript
●​ ReasonML/ReScript
●​ Scala.js
●​ ClojureScript
●​ Haxe
●​ Python (via Brython or Transcrypt)
●​ Ruby (via Opal)


Intermediate Representation

●​ JavaScript

Targets

● Browsers (Chrome, Firefox, Safari, Edge)
● Node.js
●​ Deno

This can be represented by the figure below:

.NET Common Language Runtime (CLR) Ecosystem

The .NET Common Language Runtime (CLR) allows various languages to be compiled into CIL
(Common Intermediate Language), which is then executed by the .NET runtime.

Languages

●​ C#
●​ F#
●​ Visual Basic .NET
●​ C++/CLI
● IronPython
● IronRuby

Intermediate Representation

●​ CIL (Common Intermediate Language)

Targets

●​ Windows
●​ Linux
●​ macOS (via .NET Core)
●​ Azure (Cloud)

This would map out as:

Why learn compilers?


Learning compilers can provide a deep understanding of programming languages and computer
systems. It helps you grasp the underlying principles of how code is translated into
machine-executable instructions. This knowledge can be invaluable because:

1.​ Understanding compiler optimizations can help you write code that performs better.
2.​ Compilers are the foundation of programming languages, and learning about them can
empower you to create your own.
3.​ A strong understanding of compilers can aid in debugging complex software issues.
4.​ Compilers offer a unique lens through which to view the field, providing a deeper
appreciation for the intricacies of programming and computer systems.


5.​ And besides, jobs are good!

Which jobs are there?


Compiler experts are highly sought after in various industries. Here are some potential job roles:

● Compiler Developer: Directly involved in designing, implementing, and maintaining
compilers for programming languages.
●​ AI Compiler Engineer: Specialists who focus on optimizing deep learning models for
hardware acceleration.
●​ Software Engineer: Compiler knowledge can be beneficial for optimizing software
performance and understanding language-specific nuances.
●​ Systems Programmer: Working on operating systems or low-level systems often
requires a deep understanding of compilers and assembly language.
●​ Security Analyst: Compiler expertise can be valuable for analyzing malware and
understanding how it exploits vulnerabilities in compilers or programming languages.
●​ Research Scientist: Compiler research is an active field, and experts can contribute to
advancements in compiler technology.
●​ Educator: Compiler experts can teach courses in computer science, programming
languages, and compiler design.

The demand for compiler experts is particularly high in industries that rely on high-performance
computing, such as:

●​ Gaming: Optimizing game engines for various platforms.
●​ Scientific Computing: Developing efficient tools for simulations and data analysis.
●​ High-Performance Computing: Creating compilers for specialized hardware
architectures.
●​ Embedded Systems: Developing compilers for resource-constrained devices.

What is an AI Compiler Engineer?


That's pretty much a compiler engineer! But they work on a specific kind of compiler: one that
processes machine-learning models. Their primary goal is to bridge the gap between the
high-level abstractions of deep learning frameworks and the low-level hardware capabilities of
accelerators like GPUs, TPUs, and specialized AI chips.

Key responsibilities of an AI Compiler Engineer include:

●​ Model Optimization: Developing techniques to reduce the computational complexity and memory footprint of deep learning models without sacrificing accuracy.
●​ Hardware Adaptation: Customizing compiler tools and techniques to target specific
hardware architectures, ensuring efficient utilization of resources.
●​ Performance Profiling: Analyzing the performance of deep learning models on different
hardware platforms to identify bottlenecks and optimization opportunities.


●​ Algorithm Development: Creating new algorithms and techniques for optimizing deep
learning computations.
●​ Collaboration: Working closely with researchers, data scientists, and hardware engineers
to develop and deploy efficient AI systems.

Why are AI Compiler Engineers in high demand?


●​ Rapid Growth of AI: The increasing adoption of AI in various industries has led to a
surge in demand for efficient and scalable deep learning solutions.
●​ Hardware Acceleration: The development of specialized hardware accelerators like
GPUs and TPUs has created new opportunities for optimizing deep learning models.
●​ Complex Models: Modern deep learning models are becoming increasingly complex,
making it challenging to achieve optimal performance on different hardware platforms.

Do you know anyone who works with compiler development here in Brazil?
There are lots of alumni from UFMG's Compilers Lab who work with compiler development
nowadays. Take a look at Slide 34+ in this presentation. We can talk about a few success
stories in the classroom!

Is it possible to work with compiler development in Belo Horizonte?


Absolutely! There are a few companies that hire compiler development teams. For instance,
Cadence Design Systems had two compiler teams with open positions in 2024:
●​ Jasper: for the compilation of RTL models into Verilog specifications.
●​ XNNC: a compiler for deep-learning models that run on embedded devices.

Which research is done in compilers?


There is a lot of research that's done in the field of compilers, for instance:
●​ Developing techniques to generate more efficient code.
●​ Creating new programming languages or extending existing ones.
●​ Building tools and frameworks for compiler development.
●​ Ensuring the correctness of compilers.
●​ Tailoring compilers for specific domains like scientific computing or embedded systems.

Let's talk a bit about the importance of compilers with a cool example.
Do you know Thompson's "Reflections on Trusting Trust"?
That's a famous paper in computer science where Ken Thompson discusses the concept of
trusting software, particularly in the context of compilers and the potential for malicious code to
be hidden in a way that is almost impossible to detect. The paper is often cited when discussing
the importance of trust and security in software systems. Thompson argues that we have no
choice but to trust the tools we use. However, this trust is based on the assumption that those
tools are not malicious. The implication is that absolute trust is impossible without rigorous
verification, which itself may be infeasible.


Thompson's exploit starts with a clean compiler, which does not contain any backdoors. This
compiler is trusted by the user and produces backdoor-free binaries. The attack leverages two
backdoors to propagate malicious behavior, even when the source code appears clean. The
process can be described in four stages, as illustrated in the figure below:

1.​ Insertion: In Thompson's hack, a (malicious) user modifies the compiler source to
introduce two backdoors:
a.​ BD1: This backdoor recognizes when the compiler is compiling the login program
and inserts a malicious backdoor (e.g., a hardcoded password) into the compiled
binary.
b.​ BD2: This self-replicating backdoor recognizes when the compiler is compiling its
own source code and reinserts both BD1 and BD2 into the new compiler binary.
2.​ Bootstrapping: Once the modified compiler source is compiled with the original
compiler, it produces a binary version of the trojan compiler. This binary now contains
both backdoors, BD1 and BD2. At this point, the source code of the trojan compiler is no
longer necessary, and can be removed. Therefore, the source code of the compiler
appears clean, as BD2 exists only at the binary level.
3.​ Attack: When the binary trojan compiler compiles the login system, it inserts BD1 into
the resulting binary, creating a trojan login system.


4.​ Propagation: When the binary trojan compiler compiles a clean version of its own
source code, it reinserts BD1 and BD2 into the new binary compiler. This ensures the
trojan behavior propagates indefinitely, even if the source code is reviewed and found to
be clean.

The genius of this attack lies in its self-propagating nature: the malicious behavior becomes
embedded at the binary level and is undetectable through inspection of the source code alone.
Trusting the compiler binary is therefore essential, as the hack demonstrates how source code
auditing alone cannot guarantee the absence of malicious behavior.
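To make these stages concrete, here is a toy model in Python, where "compiling" is reduced to string rewriting. Everything in it (the function names, the hardcoded password) is invented for illustration; it is a sketch of the two backdoors, not Thompson's actual construction:

```python
# Toy "source programs" that the trojan compiler might be asked to compile.
LOGIN_SOURCE = "return check_password(user, typed)"
COMPILER_SOURCE = "def trojan_compile(source): ..."

def trojan_compile(source):
    # In this toy, the "binary" is just the source text after rewriting.
    binary = source
    if "check_password" in source:
        # BD1: attack the login program with a hardcoded master password.
        binary = binary.replace(
            "return check_password(user, typed)",
            'return check_password(user, typed) or typed == "letmein"',
        )
    if "def trojan_compile" in source:
        # BD2: re-infect the compiler when it compiles its own (clean) source.
        binary += "  # BD1 and BD2 re-inserted here"
    return binary

print(trojan_compile(LOGIN_SOURCE))
print(trojan_compile(COMPILER_SOURCE))
```

BD1 fires only on the login program, BD2 only on the compiler's own source, and every other program passes through untouched, which is why auditing the source code reveals nothing.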

How could someone compromise the Brazilian election using this hack?
Let's walk through a step-by-step example of how a hacker with control over a compiler could
compromise a voting machine in an election, based on the principles from Thompson's paper.
1.​ Initial Setup
a.​ Voting Machine: The machine runs a specialized Linux-based software that
counts votes and sends results to a central server.
b.​ Compiler: This software is written in a programming language like C or C++ and
compiled using a compiler like GCC.
2.​ Hacker Controls the Compiler
a.​ The hacker gains control of the compiler used to build the voting software. This
could happen if the compiler is developed by a trusted but compromised third
party or if the hacker manages to inject malicious code into an existing compiler.
3.​ Inserting the Backdoor into the Compiler
a.​ The hacker modifies the compiler to recognize when it is compiling the voting
software. When this occurs, the compiler automatically inserts a backdoor into
the compiled version of the software, without altering the source code that
developers see.
b.​ Example: The backdoor might be designed to alter vote counts, either by flipping
a small percentage of votes from one candidate to another or by manipulating the
totals in a more subtle way.
4.​ Self-Replicating Compiler Hack
a.​ To ensure that this malicious behavior persists, the hacker also modifies the
compiler to recognize when it is compiling its own source code. When
recompiling itself, the compiler reinserts the backdoor code, ensuring that even if
someone tries to build a new, "clean" compiler from source, the backdoor
remains.
5.​ Compiling the Voting Software
a.​ The voting software is compiled using this compromised compiler. The
developers see nothing suspicious, as the source code appears normal.
However, the compiled executable contains the malicious backdoor.
6.​ Deploying the Voting Software
a.​ The compromised voting software is deployed on voting machines. When people
cast their votes, the software functions as expected on the surface but secretly


manipulates the results according to the instructions inserted by the compromised compiler.
7.​ Results Transmission
a.​ Once the voting is complete, the manipulated results are sent to the central
server. The central server, assuming the data from the voting machines is
accurate, records these falsified results without any indication that something is
wrong.
8.​ Detection Difficulties
a.​ Even if a forensic investigation were to take place, examining the source code of
the voting software might not reveal anything unusual. The malicious code was
inserted during compilation and is not present in the source.
b.​ If someone were to recompile the voting software using what they believe to be a
clean version of the compiler, the malicious code would reappear, making it
extremely difficult to detect or eliminate the hack.

And how could we prevent Thompson's Hack?


Well, there are many ways. One simple approach is to perform random audits of voting
machines after the election to ensure the results match the votes cast. This can be done on a
sample of machines rather than all machines to save costs while still deterring tampering.
Ideally, we should use a voting system that produces a paper record for each vote, allowing for
manual recounts if necessary.

What will students learn in this course?


In this course on Compiler Construction, students will learn key concepts and techniques used
in the design and implementation of compilers. Topics covered include:

1.​ Lexical and Syntactic Analysis: Understanding how to break down and parse
arithmetic expressions, including both top-down and bottom-up parsing techniques.
2.​ Type Systems: Verification and inference of types in programs, including working with
anonymous functions, closures, and recursive functions.
3.​ Code Generation: Techniques for generating and optimizing machine code, including
control structures and function implementation.
4.​ Memory Management: Understanding memory allocation, garbage collection, and heap
management.
5.​ Program Analysis and Optimization: Control flow graphs, liveness analysis, register
allocation, and peephole optimizations.

The course also includes practical work with 15 virtual programming labs, worth 4.5 points each,
and two major exams. The structure of the course is mostly based on Crafting Interpreters,
which is a freely available textbook on the art of designing and implementing interpreters and
compilers.


And what about the project assignment?


The homework for the Compiler Construction course is a project where students will implement
two execution methods for a subset of the Standard ML of New Jersey programming language.
The project involves writing two different execution engines:

1.​ Tree-based Interpreter: This method will handle closures and recursive functions.
2.​ Code Generator: This generates RISC-V instructions and manages anonymous
functions, but not recursive ones.

Students will work in Python, with 15 deliverables that progressively build the compiler. The
deliverables are interdependent, but some can be completed independently, even if earlier steps
are unfinished. Each deliverable comes with partially completed code that students must finish.

Key Learning Goals:

●​ Understanding compiler theory, such as type checking, type inference, and syntax/semantic analysis.
●​ Learning compiler engineering, including code generation, register allocation, and
handling recursive functions and closures.
●​ Applying these concepts through iterative development, where students must adapt to
changes (e.g., operator precedence).
●​ Gaining practical experience in debugging, optimization, and iterative design.

Students are encouraged to follow a test-driven development approach, using provided tests to
guide their implementation. They can submit their work multiple times through the Moodle
system, and partial credit is available if not all tests pass initially.

How will this course be graded?


The course is graded out of 100 points, and students need a minimum of 60 points to pass.
There are two exams, each worth 20 points. The exams primarily focus on the homework and
can be thought of as a written "interview." Students will be asked to re-implement, on paper,
certain functions and methods from the homework. While there may be one theoretical question,
it will still be closely tied to the homework.

The homework itself is worth 60 points, but includes an additional 7.5 bonus points, meaning
students can earn up to 67.5 points for the homework, though only 60 will count toward the final
grade.

Additionally, students have opportunities to earn extra credit. Up to 5 points can be earned by
answering questions on https://www.reddit.com/r/Compilers/, with each answer worth 1 point.
Notice that students can earn at most five extra points in the course! So, even if they get 12.5
extra points, that will be capped at 5.0.


Let's close with a bit of history. Do you know some of the history of early
compilers?
I do! The history of the first compilers is a cornerstone of computer science, marking the
transition from low-level, machine-oriented programming to more human-friendly, high-level
languages. Here's a summary of key milestones:

1. Early Days of Programming

In the 1940s and early 1950s, programming was done directly in machine code or assembly
language. These methods were:

●​ Highly specific to individual hardware.
●​ Difficult and error-prone.

The need for an automated way to translate higher-level abstractions into machine-readable
instructions gave rise to the concept of the compiler.

2. Grace Hopper's Contributions (1952)

Grace Hopper is often credited with developing one of the first compiler-like tools. Her work
included:

●​ The A-0 System, which she designed for the UNIVAC I. This system converted symbolic
mathematical instructions into machine code.
●​ Her groundbreaking ideas laid the foundation for the development of programming
languages and compilers.

3. The FORTRAN Compiler (1957)

The FORTRAN (FORmula TRANslation) compiler, developed by John Backus and his team at
IBM, was the first widely successful compiler. Key aspects include:

●​ Designed for numerical and scientific computing.
●​ Took about three years to develop (1954–1957).
●​ Translated high-level mathematical notation into efficient machine code.
●​ Marked a major leap in performance and usability, reducing coding time significantly.

4. LISP and Recursive Compilers (Late 1950s)

The development of LISP, a language for artificial intelligence research, brought recursive
programming and symbolic computation into the spotlight. John McCarthy’s team created a
compiler capable of handling the complexities of LISP’s syntax and semantics.

5. COBOL (1959–1960)


The COBOL (COmmon Business-Oriented Language) compiler emerged as the first attempt to
make programming languages universally understandable. It emphasized:

●​ Portability across hardware platforms.
●​ A syntax resembling natural English for business applications.

Grace Hopper played a significant role in its development.

6. Early Compiler Challenges

Building the first compilers involved overcoming technical challenges:

●​ Limited hardware resources: Compilers had to run on the same primitive systems they
were targeting.
●​ Complexity: Translating high-level abstractions into optimized machine code was a novel
and technically demanding task.
●​ Resistance: Many programmers initially resisted high-level languages, fearing a loss of
efficiency.

Legacy and Impact

The development of early compilers was transformative:

●​ It democratized computing by enabling more people to program without deep knowledge of hardware.
●​ It laid the groundwork for modern computer science, making high-level programming
languages like C, Pascal, and Python possible.

Something is bugging me: we need a compiler to compile a program, but a compiler is a program. So, what about the first compiler?
This is a great and thought-provoking question! The apparent paradox you’re pointing out is
often called the bootstrap problem. How could the first compiler be created if we need a
compiler to compile it? The answer lies in a process known as bootstrapping. Here’s a
breakdown of how it works:

The Bootstrapping Process

Bootstrapping refers to the process of building a complex system (like a compiler) from a
simpler one. For the first compiler, here's how it was likely possible:

1.​ Manual Translation: The very first compilers were often written in assembly language
or even machine code, which could be directly executed by the computer's hardware.
This is the simpler starting point. The development of the compiler began by manually writing the initial code in a low-level language.

2.​ Interpreters and Simple Compilers: Often, a simple interpreter or hand-written assembly was used to initially "interpret" or execute a few high-level instructions. Once
a simple version of the compiler was in place, this first compiler could then be used to
compile more sophisticated versions of itself.​

3.​ Self-Compilation: In some cases, the early stages of the compiler involved a
self-compiling process. The very first version of the compiler might only handle a
limited subset of the language. As the compiler was improved, more complex features
were added. Eventually, this early version could compile itself (i.e., compile a more
advanced version of the compiler).​

4.​ Two-Stage Process: In practice, the compiler development process often starts with a
"bootstrapping" phase where:
○​ A very simple "primitive" compiler is manually written in an existing low-level
language like assembly.
○​ This simple compiler can compile a more advanced version of the compiler,
which in turn can compile even more sophisticated versions.

Example: The FORTRAN Compiler

When John Backus and his team developed the first FORTRAN compiler in the 1950s, the early
stages of the process were likely done by:

●​ Writing initial pieces of the compiler by hand in assembly language (low-level code).
●​ Using that primitive assembler or interpreter to slowly build up the full FORTRAN
compiler.

Once the first working compiler was created, it could be used to compile additional versions,
eventually leading to more advanced versions of the FORTRAN compiler capable of compiling
complex programs.

In Summary:

The first compiler was possible through a combination of:

●​ Writing the initial code in a very low-level language (like assembly).
●​ Using this low-level code to build more sophisticated tools.
●​ Iterative improvements (bootstrapping) until a fully functional compiler was in place.

This bootstrap process is essential to the development of complex systems, and it illustrates
how simple, manually-crafted programs can evolve into fully automated, self-compiling tools.
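As a loose analogy (not how FORTRAN was actually built), the staging can be sketched in Python, with the host language playing the role of the hand-written stage-0 machinery and a tiny stack language standing in for the "real" language. All names here are invented for the illustration:

```python
# Stage 0: a hand-written host (here, the Python interpreter itself) builds
# the first compiler from its source text.
TINY_COMPILER_SOURCE = """
def compile_tiny(program):
    # Translate 'PUSH n' / 'ADD' lines of a tiny stack language into a
    # Python arithmetic expression.
    stack = []
    for line in program.strip().splitlines():
        op, *args = line.split()
        if op == 'PUSH':
            stack.append(args[0])
        elif op == 'ADD':
            b, a = stack.pop(), stack.pop()
            stack.append(f'({a} + {b})')
    return stack.pop()
"""

namespace = {}
exec(TINY_COMPILER_SOURCE, namespace)  # the bootstrap step
compile_tiny = namespace["compile_tiny"]

# Stage 1: the freshly built compiler compiles a user program.
code = compile_tiny("PUSH 1\nPUSH 2\nADD")
print(code, "=", eval(code))  # (1 + 2) = 3
```

The same pattern would repeat in a real bootstrap: once stage 1 exists, its source can be extended and recompiled by stage 1 itself, and the hand-written stage 0 is no longer needed.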


This John Backus was very important, right? He bridged the gap between
theory and practical application in compilers, I suppose!
Yes: he was tremendous! John Warner Backus (1924–2007) was an American computer
scientist who revolutionized programming by creating the first high-level programming language,
FORTRAN, and advancing the field of compiler design. After initially studying chemistry and
dropping out of medical school, Backus discovered his passion for mathematics and
programming at IBM in the early 1950s. He led the team that developed FORTRAN, which
drastically simplified programming and made computers more accessible to scientists and
engineers. Later in his career, Backus shifted focus to programming language theory, proposing
innovative ideas like functional programming and contributing to the formal description of
programming languages with the Backus-Naur Form (BNF). For his groundbreaking
contributions, Backus received numerous accolades, including the Turing Award in 1977.

Contributions to Computer Science

1.​ FORTRAN (1957):
○​ Led the development of the first widely used high-level programming language.
○​ FORTRAN was the first language to use a compiler that could produce efficient
machine code, demonstrating the viability of high-level languages for real-world
applications.
2.​ Compiler Design:
○​ Directed the creation of the first FORTRAN compiler, one of the most advanced
and complex software projects of its era.


○​ Introduced many foundational ideas in optimization and machine-independent programming.
3.​ Backus-Naur Form (BNF):
○​ Developed (with Peter Naur) a formal notation to describe the syntax of
programming languages.
○​ BNF became a standard in compiler design and language development,
influencing nearly every modern programming language.
4.​ Functional Programming:
○​ Authored the seminal paper "Can Programming Be Liberated from the von
Neumann Style?" (1977), which introduced the concept of functional
programming as an alternative to imperative programming.
○​ Influenced languages like Haskell and Scala by promoting ideas such as
immutability and higher-order functions.
5.​ ALGOL Development:
○​ Played a key role in the development of the ALGOL family of languages, which
introduced structured programming concepts.
6.​ Formal Language Theory:
○​ Contributed to the formalization of programming language design, enabling the
systematic development of new languages and compilers.
7.​ Turing Award (1977):
○​ Recognized for his profound impact on programming language design and
compiler development.

Backus’s work fundamentally changed how humans interact with computers, making
programming more efficient, scalable, and accessible. His contributions remain central to
computer science today.
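To give a flavor of Backus-Naur Form, here is a BNF grammar for simple arithmetic expressions (an illustration, not a fragment of the original ALGOL report):

```
<expr>   ::= <expr> "+" <term> | <expr> "-" <term> | <term>
<term>   ::= <term> "*" <factor> | <term> "/" <factor> | <factor>
<factor> ::= <number> | "(" <expr> ")"
```

Each rule says that the symbol on the left can be rewritten as any of the alternatives on the right; compiler writers still use this notation (and variants like EBNF) to specify the syntax of programming languages.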


Lexical Analysis
Code: https://github.com/pronesto/DCC053_Public/tree/main/1_ArithLexing

Imagine that you want to implement a calculator in reverse polish notation that recognizes +, -, *, / and numbers. What would be the steps?
Implementing a Reverse Polish Notation (RPN) calculator would follow a structured approach.
The process involves two main steps: creating a scanner (also known as a lexer) to recognize
the tokens, and building the actual logic to process those tokens. Here's a breakdown of the
steps:

1. Implementing the Scanner (Lexer)

The scanner's job is to take an input string (the RPN expression) and break it down into tokens,
such as numbers and operators (+, -, *, /). The tokens will later be processed to compute the
result.

Steps for the Scanner:

●​ Input: A string representing the RPN expression (e.g., "3 4 + 2 * 7 /").
●​ Output: A list of tokens, where each token is either a number or an operator.
●​ Actions:
○​ Iterate over the input string.
○​ For each character:
■​ If it’s a digit or part of a number (e.g., 3.5), recognize it as a number
token.
■​ If it’s a symbol like +, -, *, or /, recognize it as an operator token.
■​ Skip whitespace.
○​ Store the tokens in a list for further processing.

2. Processing the Tokens (Evaluation)

Once the scanner produces the list of tokens, the next step is to evaluate the RPN expression
using a stack-based approach. RPN means that operators come after their operands, so we
can evaluate expressions by pushing numbers onto the stack and applying operators when they
are encountered.

Steps for the Evaluator:

●​ Input: A list of tokens (produced by the scanner).
●​ Output: The final computed result.
●​ Actions:
○​ Initialize an empty stack.


○​ Iterate through the list of tokens:
■​ If the token is a number, push it onto the stack.
■​ If the token is an operator, pop the top two numbers from the stack, apply
the operator, and push the result back onto the stack.
○​ At the end of the iteration, the stack should contain exactly one value, which is
the result of the expression.

Could you implement the scanner?

import enum

class TokenType(enum.Enum):
    EOF = -1   # End of file
    NLN = 0    # New line
    WSP = 1    # White Space
    NUM = 2    # Number (integers)
    ADD = 202  # The token '+'
    SUB = 203  # The token '-'
    MUL = 204  # The token '*'
    DIV = 205  # The token '/'

class Token:
    operators = {TokenType.ADD, TokenType.SUB, TokenType.MUL, TokenType.DIV}

    def __init__(self, tokenText, tokenKind):
        self.text = tokenText
        self.kind = tokenKind

class Lexer:
    def __init__(self, input_string):
        self.input_string = input_string
        self.position = 0
        self.length = len(input_string)

    def next_valid_token(self):
        token = self.getToken()
        if token.kind == TokenType.WSP or token.kind == TokenType.NLN:
            token = self.next_valid_token()
        return token

    def tokens(self):
        token = self.getToken()
        while token.kind != TokenType.EOF:
            if token.kind != TokenType.WSP and token.kind != TokenType.NLN:
                yield token
            token = self.getToken()

    def getToken(self):
        if self.position >= self.length:
            return Token("", TokenType.EOF)

        current_char = self.input_string[self.position]
        self.position += 1

        if current_char.isdigit():
            # Handle numbers (NUM)
            number_text = current_char
            while (
                self.position < self.length
                and self.input_string[self.position].isdigit()
            ):
                number_text += self.input_string[self.position]
                self.position += 1
            return Token(number_text, TokenType.NUM)

        elif current_char == "+":
            return Token(current_char, TokenType.ADD)

        elif current_char == "-":
            return Token(current_char, TokenType.SUB)

        elif current_char == "*":
            return Token(current_char, TokenType.MUL)

        elif current_char == "/":
            return Token(current_char, TokenType.DIV)

        elif current_char == " ":
            return Token(current_char, TokenType.WSP)

        elif current_char == "\n":
            return Token(current_char, TokenType.NLN)

        else:
            raise ValueError(f"Unexpected character: {current_char}")

The lexer is responsible for breaking an input string into tokens, which represent numbers and
operators like +, -, *, and /. Here’s an explanation of each part of the code to help a computer
science student understand how it works:

1. Enum Class: TokenType

The TokenType class uses Python’s enum.Enum to define various token types used in the
lexer. Each token type represents a category of symbols or data in the input:

●​ EOF: End of file/input, signaling that the input string has been completely processed.
●​ NLN: New line (\n), which indicates the end of a line.
●​ WSP: Whitespace ( ), representing spaces in the input.
●​ NUM: Numbers (integer values).


●​ ADD, SUB, MUL, DIV: The four arithmetic operators (+, -, *, /).

The numbers associated with each type are mostly symbolic. The exact values don’t matter, but
they help differentiate between the types.

2. Token Class

The Token class represents a single token in the input string. A token has two main attributes:

●​ text: The string representation of the token (e.g., "5" for a number or "+" for an
operator).
●​ kind: The token's type, which comes from the TokenType enum (e.g.,
TokenType.NUM for a number, TokenType.ADD for the + operator).

This class is used to store each token's text and its type as the lexer processes the input string.

3. Lexer Class

The Lexer class is responsible for reading the input string and breaking it down into tokens.

Constructor (__init__)

●​ input_string: The input to be tokenized (e.g., an RPN expression like "3 4 +").
●​ position: Tracks the current position in the input string, initialized to 0.
●​ length: The total length of the input string.

This constructor sets up the necessary variables to iterate over the input string.

next_valid_token()

This method retrieves the next non-whitespace, non-newline token:

●​ It repeatedly calls getToken() to fetch tokens.
●​ If the token is whitespace (WSP) or a newline (NLN), it skips it and fetches the next valid
token.

tokens()

This method is a generator that returns all tokens in the input string, excluding whitespace and
newlines. It continuously calls getToken() until it reaches the end of the input (EOF).

getToken()


This method is the core of the lexer. It reads one character at a time from the input string and
returns the corresponding token. Here's how it works:

●​ End of Input Check: If the current position has reached the end of the input, return an
EOF token.
●​ Number Handling: If the current character is a digit, it tries to read an entire number:
○​ It starts with the current digit and keeps reading while the next characters are
also digits.
○​ Once it reads the whole number, it creates and returns a NUM token.
●​ Operator Handling: If the current character is +, -, *, or /, it returns the corresponding
operator token.
●​ Whitespace and Newline: If the current character is a space (" ") or newline ("\n"), it
returns the corresponding token (WSP or NLN).
●​ Error Handling: If an unrecognized character is encountered, it raises an error.

Could you walk through some example input?


Let’s go through an example where the input string is "3 4 + 5 *":

1. First token: The character '3' is a digit, so the lexer reads it and returns a Token with
kind=TokenType.NUM and text="3".
2. Second token: The lexer skips the space character; then '4' is a digit, so it reads it and
returns a Token with kind=TokenType.NUM and text="4".
3. Third token: The lexer skips the space and then finds the '+' character. It returns a
Token with kind=TokenType.ADD and text="+".
4. Fourth token: The lexer skips the space, reads the '5' and returns a Token with
kind=TokenType.NUM and text="5".
5. Fifth token: The lexer skips the space, reads the '*' and returns a Token with
kind=TokenType.MUL and text="*".

At the end of the input string, the lexer returns an EOF token to signal that the input has been
fully processed.
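To make this walkthrough concrete, here is a self-contained miniature of the scanning loop. It is independent of the Lexer class discussed above: token kinds are simplified to plain strings, so it is a sketch of the idea rather than the notes' actual code.

```python
def scan_rpn(text):
    """Scan an RPN string into (kind, text) pairs, skipping spaces/newlines."""
    ops = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in " \n":            # skip whitespace and newlines
            i += 1
        elif ch.isdigit():         # read a whole number, digit by digit
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUM", text[start:i]))
        elif ch in ops:            # single-character operators
            tokens.append((ops[ch], ch))
            i += 1
        else:
            raise ValueError(f"Unexpected character: {ch!r}")
    tokens.append(("EOF", ""))     # signal the end of the input
    return tokens

print(scan_rpn("3 4 + 5 *"))
# → [('NUM', '3'), ('NUM', '4'), ('ADD', '+'), ('NUM', '5'), ('MUL', '*'), ('EOF', '')]
```

Note how multi-digit numbers come out as a single NUM token: the inner loop keeps consuming digits before the outer loop resumes.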

Could you name some key concepts related to the scanner we just wrote?
Tokenization: This is the process of breaking down an input string into meaningful chunks
(tokens), each representing a number or operator.
State Management: The lexer uses position to keep track of where it is in the input string. It
increments position as it processes each character.
Skipping Whitespace: The lexer handles whitespace and newlines but skips over them to
focus on numbers and operators, the meaningful parts of an RPN expression.


Error Handling: The lexer raises an error if it encounters any unexpected characters, ensuring
that the input conforms to what the RPN calculator expects.

Now, can you implement the actual calculator that uses the Token?
def compute_postfix(lexer):
    stack = []

    for token in lexer.tokens():
        if token.kind == TokenType.NUM:
            # Push numbers onto the stack
            stack.append(int(token.text))
        elif token.kind in Token.operators:
            if len(stack) < 2:
                raise ValueError("Insufficient values.")

            b = stack.pop()
            a = stack.pop()

            if token.kind == TokenType.ADD:
                result = a + b
            elif token.kind == TokenType.SUB:
                result = a - b
            elif token.kind == TokenType.MUL:
                result = a * b
            elif token.kind == TokenType.DIV:
                if b == 0:
                    raise ZeroDivisionError("Division by zero.")
                result = a // b

            # Push the result back onto the stack
            stack.append(result)
        else:
            raise ValueError(f"Unexpected token type: {token.kind}")

    # The final result should be the only value left in the stack
    if len(stack) != 1:
        raise ValueError("The user input has too many values.")

    return stack[0]

This code implements a Reverse Polish Notation (RPN) calculator, which evaluates
arithmetic expressions in postfix notation. In postfix notation, operators follow their
operands, and the order of operations is implicitly determined by the placement of the operators.

Let’s break down how this function works for a computer science student:

Function Overview

The compute_postfix function evaluates an expression by:


1. Reading tokens from the lexer.
2. Using a stack to store numbers and intermediate results.
3. Performing arithmetic operations when an operator is encountered.
4. Returning the final result from the stack.

Can you walk through an example?


Let's say we are evaluating the expression "3 4 + 5 *". The steps would be as follows:

1. Input String: "3 4 + 5 *"
2. Tokens: The lexer generates the tokens 3, 4, +, 5, and *.
3. Processing:
○ Push 3 onto the stack: [3]
○ Push 4 onto the stack: [3, 4]
○ Encounter +: Pop 4 and 3, add them (3 + 4 = 7), push 7 onto the stack: [7]
○ Push 5 onto the stack: [7, 5]
○ Encounter *: Pop 5 and 7, multiply them (7 * 5 = 35), push 35 onto the stack: [35]
4. Final Result: The stack contains [35], so the result is 35.
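This trace can be reproduced with a few self-contained lines. The snippet below splits on spaces instead of using our Lexer, just to make the stack evolution visible:

```python
import operator

def eval_postfix(expr):
    """Evaluate a postfix expression, printing the stack after each token."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.floordiv}
    stack = []
    for tok in expr.split():
        if tok in ops:
            b, a = stack.pop(), stack.pop()   # the right operand is popped first
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
        print(tok, "->", stack)               # show the stack after this token
    assert len(stack) == 1, "malformed expression"
    return stack[0]

print(eval_postfix("3 4 + 5 *"))  # → 35
```

Running it prints exactly the five stack states listed above, ending with [35].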

What are the key concepts one should keep about our RPN calculator?
1.​ Stack-based Evaluation: RPN naturally fits a stack-based approach since operands are
pushed onto the stack, and operators pop them off, compute the result, and push it back.
2.​ Token Processing: The lexer is responsible for tokenizing the input, and the calculator
processes each token in sequence.
3.​ Error Handling:
○​ The code checks for sufficient values in the stack before applying operators.
○​ It handles division by zero.
○​ It ensures that only one value remains in the stack at the end.

This structure makes the code modular, allowing easy extension for more operations or
features.

What are these "tokens" you talk so much about?


In the context of programming languages, compilers, and interpreters, tokens are the smallest
units of meaning that a program's source code can be broken down into during lexical analysis
(or tokenization). Each token represents a logically cohesive piece of information, such as a
keyword, operator, identifier, or literal.

How are Tokens represented?


A token typically consists of two parts:


●​ Type (Kind): Represents the category or classification of the token (e.g., keyword,
identifier, number, operator). In your Lexer.py example, this is represented by the
TokenType enum.
●​ Text (Value): The actual text or value of the token from the source code. For example, if
the lexer identifies a number, the text would be the numeric string itself (e.g., "42").
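In code, this two-part structure can be as small as a dataclass. The sketch below is generic; the Token class used throughout these notes may differ in detail:

```python
import enum
from dataclasses import dataclass

class TokenType(enum.Enum):
    NUM = enum.auto()  # numeric literals
    ADD = enum.auto()  # the '+' operator
    IDF = enum.auto()  # identifiers

@dataclass
class Token:
    kind: TokenType  # the category (type) of the token
    text: str        # the actual characters from the source code

t = Token(TokenType.NUM, "42")
print(t.kind.name, t.text)  # → NUM 42
```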

Can you give me examples of tokens?

Keywords: Reserved words in a programming language (e.g., if, while, return).

Identifiers: Names defined by the programmer (e.g., variable names like x, counter).

Literals: Fixed values such as numbers (e.g., 123), strings (e.g., "hello"), or characters (e.g.,
'a').

Operators: Symbols that perform operations (e.g., +, -, *, /).

Delimiters: Symbols that separate statements or group code (e.g., ;, {, }).

Why do we split the string into tokens? How does that help?
Tokens allow the lexer to simplify the input for the next phase of the compiler or interpreter,
which is parsing. By converting the raw code into a sequence of tokens, the parser can more
easily understand and process the structure of the code. They help detect syntax errors early in
the compilation or interpretation process.

How do we break down a string into tokens?


For that, we use finite state automata. In other words, we recognize tokens as instances of
regular expressions. There are tools that are totally built around this notion of regular
expressions. One example is AWK, which takes in commands that involve regular expressions,
and synthesizes automata that recognize these expressions within strings. Here are a few
examples of using regular expressions with AWK for text processing:

1. Match Lines Containing a Specific Word

Print all lines that contain the word "error":

awk '/error/ {print}' file.txt

In this example, the regular expression is the literal word "error", which can be recognized by a
six-state automaton whose five transitions spell out the word: 'e' → 'r' → 'r' → 'o' → 'r'.

2. Match Lines That Start with a Pattern


Print lines that start with "INFO":

awk '/^INFO/ {print}' file.txt

The ^ symbol anchors the match to the start of the line.

3. Match Lines That End with a Pattern

Print lines that end with ".log":

awk '/\.log$/ {print}' file.txt

The $ symbol anchors the match to the end of the line.

4. Match Lines with a Specific Word Boundary

Print lines where "word" appears as a whole word:

awk '/\bword\b/ {print}' file.txt

The \b denotes a word boundary.

5. Match Lines with Digits

Print lines that contain at least one digit:

awk '/[0-9]/ {print}' file.txt

The [0-9] matches any digit.

6. Match Lines with Specific Number of Characters

Print lines with exactly 5 characters using a regular expression:

awk '/^.{5}$/ {print}' file.txt

Here, . matches any character, and {5} specifies the exact number of characters.

7. Extract Email Addresses

Extract and print email addresses from text:

awk '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/ {print}'


file.txt


These examples showcase how AWK combined with regular expressions becomes a powerful
tool for searching and processing text.

What is the relation between lexical analysis and regular expressions?


Lexical analysis is closely related to regular expressions because regular expressions provide a
powerful way to describe patterns in text, which is precisely what lexical analysis needs to do
when identifying tokens in source code. Lexical analyzers, or lexers, need to identify different
types of tokens in the source code, such as keywords, operators, identifiers, numbers, etc. Each
of these types of tokens can be described by a pattern, and regular expressions are a formal
way to describe such patterns. For example, an identifier in many programming languages
might be described by a pattern like [a-zA-Z_][a-zA-Z0-9_]*, which means a sequence starting
with a letter or underscore, followed by any number of letters, digits, or underscores. This
pattern can be expressed as a regular expression.

A lexer uses regular expressions to scan the input source code and match substrings that fit the
patterns defined for different token types. For instance, a regular expression like \d+ might be
used to recognize a sequence of digits as a numeric literal token (NUM).

Regular expressions can be efficiently compiled into finite automata, which are used by lexers to
quickly scan and recognize tokens in the input string. Tools like lex or flex generate lexers by
converting regular expressions into efficient state machines that can process input streams in
linear time.

So, in summary, lexical analysis and regular expressions are deeply connected because regular
expressions provide the formal framework to describe the patterns of the different tokens a lexer
needs to identify. Regular expressions are used to define the patterns of tokens, and the lexer
matches these patterns against the source code to produce a stream of tokens for further
processing by the parser.
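These patterns can be tried directly with Python's re module. The snippet below tests the identifier pattern and the \d+ pattern mentioned above:

```python
import re

# The identifier pattern from the paragraph above, and \d+ for numeric literals:
ident = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
num = re.compile(r"\d+")

assert ident.fullmatch("_counter1") is not None  # may start with an underscore
assert ident.fullmatch("1counter") is None       # cannot start with a digit
assert num.fullmatch("2024") is not None         # a valid NUM token
print("both patterns behave as described")
```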

So, can we say that the set of all the tokens in a programming language
form a regular language?
Yep! Tokens are strings that belong to some regular language, especially in the context of
lexical analysis. A regular language is a set of strings that can be recognized by a finite
automaton. Regular languages are defined by regular expressions, which describe patterns of
strings.

Each type of token that a lexer needs to recognize (like identifiers, numbers, keywords, or
operators) can be described by a regular expression. Since regular expressions define regular
languages, the set of all valid tokens for a given programming language can be seen as
belonging to a regular language.

During lexical analysis, the lexer uses regular expressions to identify tokens in the input string.
Each time the lexer matches a portion of the input with a regular expression, it identifies that
portion as a token. This means that the strings recognized as tokens are those that belong to
the regular language described by the corresponding regular expression. Lexers often use
finite automata (which can recognize regular languages) to efficiently match input strings
against the patterns described by regular expressions. This connection further supports the
idea that tokens are part of a regular language.

Can you give an example of an automaton that recognizes tokens?


Yes. For instance, consider the automaton below, which recognizes a language of tokens that
can be:
●​ Natural numbers: sequences of digits.
●​ Rational numbers: digits followed by a comma followed by more digits
●​ The sum operator: either '+' or the keyword 'add'
●​ The subtraction operator: the symbol '_'
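Since the figure with the automaton cannot be reproduced here, the sketch below gives one possible encoding of such a recognizer in code. The state names and the classify helper are my own, not taken from the notes:

```python
def classify(lexeme):
    """Report which token `lexeme` is, per the four rules above.
    The sum and subtraction operators are fixed strings; numbers are
    checked by walking a small DFA: start -> nat -> sep -> rat."""
    if lexeme in ("+", "add"):
        return "SUM"
    if lexeme == "_":   # the subtraction symbol, as defined above
        return "SUB"
    state = "start"
    for ch in lexeme:
        if state in ("start", "nat") and ch.isdigit():
            state = "nat"        # reading the integer part
        elif state == "nat" and ch == ",":
            state = "sep"        # just saw the comma separator
        elif state in ("sep", "rat") and ch.isdigit():
            state = "rat"        # reading the fractional part
        else:
            return "ERROR"       # no transition: reject
    # Only 'nat' and 'rat' are accepting (final) states.
    return {"nat": "NATURAL", "rat": "RATIONAL"}.get(state, "ERROR")

print(classify("42"), classify("3,14"), classify("add"), classify("_"))
# → NATURAL RATIONAL SUM SUB
```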

In short, in lexical analysis, a regular automaton (or finite automaton) is used to recognize
tokens in a stream of input. Here’s a quick breakdown of the process:

1.​ States and Transitions: The automaton consists of a set of states, including a starting
state and one or more final (accepting) states. The transitions between states are
determined by the input characters.
2.​ Token Recognition: As the automaton processes the input string character by
character, it moves from one state to another based on the defined transitions.
3.​ Final States: When the automaton reaches a final state after processing the input, it
indicates that the input string matches a specific token pattern. The particular final state
reached determines the type of token that should be returned.


This method allows for efficient tokenization of input, as regular automata can be implemented
using deterministic or nondeterministic approaches. Each token will be a path from the starting
state of the automaton to its final state.

If tokens can be recognized by automata, then it should be possible to automate the creation of lexers?
Yes! A table-driven implementation of a lexer can be structured using a transition table that
maps each state and input character to the next state and the corresponding action (token type
to return). This approach can be used by an automatic lexer generator. Here's how you might
implement a table-driven lexer, in case you want to see what it looks like:
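The original listing was lost in this transcription. The sketch below is a plausible reconstruction of the structure that the explanation that follows describes: a transition table, a dispatch step, and per-state methods. All class and method names here are my own guesses, not the notes' actual code:

```python
class TableLexer:
    """A sketch of a table-driven lexer for digit strings, '+' and '-'."""

    def __init__(self, text):
        self.text = text
        self.buffer = ""  # accumulates digits of the current number
        # Transition table: (state, character class) -> handler method.
        self.table = {
            ("start", "digit"): self.state_number,
            ("start", "+"): self.state_add,
            ("start", "-"): self.state_sub,
            ("start", "space"): self.state_skip,
            ("number", "digit"): self.state_number,
        }

    @staticmethod
    def char_class(ch):
        if ch.isdigit():
            return "digit"
        if ch == " ":
            return "space"
        return ch  # operators classify as themselves

    def state_number(self, ch):
        self.buffer += ch
        return "number", None  # stay in 'number'; no token yet

    def state_add(self, ch):
        return "start", ("ADD", ch)

    def state_sub(self, ch):
        return "start", ("SUB", ch)

    def state_skip(self, ch):
        return "start", None

    def flush(self):
        token, self.buffer = ("NUM", self.buffer), ""
        return token

    def tokens(self):
        state = "start"
        for ch in self.text:
            key = (state, self.char_class(ch))
            if key not in self.table and state == "number":
                yield self.flush()  # leaving 'number' emits the NUM token
                state = "start"
                key = (state, self.char_class(ch))
            if key not in self.table:
                raise ValueError(f"Unexpected character: {ch!r}")
            state, token = self.table[key](ch)
            if token:
                yield token
        if self.buffer:  # emit a number still pending at end of input
            yield self.flush()

print(list(TableLexer("12 + 3").tokens()))
# → [('NUM', '12'), ('ADD', '+'), ('NUM', '3')]
```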

Explanation:


1.​ Transition Table: The transition_table dictionary maps states (like 'start' and
'number') to the actions associated with input characters. Each action can be a
method that handles the logic for that input character in that state.
2.​ Dispatch Function: The dispatch method handles the logic for checking the current
character and determining which action to take based on the transition table.
3.​ State Methods: Each method (like state_number, state_add, etc.) returns the next
state and the token (if any) to be produced. For example, when a digit is recognized, it
transitions to the state_number.
4.​ Continuing States: The number state allows for continuation without returning a token
immediately, enabling the lexer to handle multi-digit numbers.

Is this table-driven approach the core principle behind lexer generators?


Yes, in a way. The basic principle of lexer generators (also called scanner generators) is to automate the
process of creating a lexer based on a high-level description of the tokens you want to
recognize. Here's how it typically works:

Key Principles of Lexer Generators:

1.​ Token Definitions: You define the tokens you want to recognize using regular
expressions. Each token can have a name and a corresponding pattern. For example:

NUM: [0-9]+
ADD: \+
SUB: -
WSP: \s+

2.​ Transition Table Creation: The lexer generator takes these definitions and generates a
transition table (or state machine) that describes how to transition between states based
on input characters. This is often done using finite automata.
3.​ Code Generation: The generator produces code that implements the lexer. This code
will include functions for handling each token type and logic for managing state
transitions.
4.​ Error Handling: Lexer generators often include error handling mechanisms to deal with
unexpected input characters or invalid token definitions.
5.​ Integration: The generated lexer code can be integrated into a parser, allowing for
seamless tokenization and syntactic analysis.
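The first two steps can be imitated in a few lines of Python: the token definitions above become (name, pattern) pairs, and the re module plays the role of the generated state machine. This is a toy stand-in for Flex, not actual Flex input:

```python
import re

# Token definitions, mirroring the (name, pattern) pairs above:
spec = [("NUM", r"[0-9]+"), ("ADD", r"\+"), ("SUB", r"-"), ("WSP", r"\s+")]

# "Generate" the lexer by fusing all patterns into one master regex,
# one named group per token type:
master = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in spec))

def tokenize(text):
    # finditer scans left to right; m.lastgroup names the rule that matched.
    for m in master.finditer(text):
        if m.lastgroup != "WSP":  # drop whitespace tokens
            yield (m.lastgroup, m.group())

print(list(tokenize("12 + 34 - 5")))
# → [('NUM', '12'), ('ADD', '+'), ('NUM', '34'), ('SUB', '-'), ('NUM', '5')]
```

One caveat of this toy: finditer silently skips characters no rule matches, while a real generated lexer would report them as errors.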

Popular Lexer Generators:

●​ Flex: A widely used lexer generator for C/C++ that takes regular expressions as input
and generates a C source file for the lexer.
●​ ANTLR: A powerful tool that can generate lexers and parsers in multiple languages,
allowing for complex grammars and language constructs.


●​ JFlex: A lexer generator for Java that is similar to Flex but tailored for Java applications.

Advantages of Lexer Generators:

●​ Efficiency: Automatically generating lexers can save time and reduce errors in
implementation.
●​ Maintainability: Changes to token definitions can be easily made in the high-level
description, and regenerating the lexer will update the code accordingly.
●​ Flexibility: Lexer generators can handle complex tokenization scenarios and provide
built-in features for common tasks.

Overall, lexer generators simplify the process of building lexers, making it easier for developers
to focus on the syntax and semantics of the language rather than the intricacies of state
management and token recognition.

But then, in the end, I can't have tokens formed by balanced parentheses?
Yes, that's correct. Regular languages, which can be described by regular expressions or finite
automata, are not capable of recognizing patterns with nested or balanced structures, such as
balanced parentheses. This is because regular languages can only handle patterns with a fixed,
linear structure and cannot track nested or recursive patterns.

To recognize balanced parentheses or similar nested structures, you'd need a more powerful
computational model, like a context-free grammar, which is used to describe context-free
languages. Context-free grammars can handle nested patterns and are typically used in parsing
tasks where balanced parentheses are common, such as in programming language syntax.
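To see what balanced parentheses require that finite automata lack, here is a sketch of a checker: the unbounded integer counter is exactly the kind of memory a finite automaton does not have.

```python
def balanced(s):
    """Check balanced parentheses with a counter. The depth variable can
    grow without bound, which no fixed set of automaton states can track."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:       # a ')' with no matching '('
                return False
    return depth == 0           # every '(' must have been closed

print(balanced("(()())"), balanced("(()"))  # → True False
```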

If context-free grammars are strictly more powerful than regular grammars, why don't we use them to recognize tokens?
Regular automata and regular grammars are equivalent in terms of the languages they can
define, which are known as regular languages. However, when it comes to lexical analysis and
token recognition, regular grammars (and their associated automata) are often preferred for
several reasons:

Reasons for Using Regular Languages in Lexical Analysis:

1.​ Simplicity: Regular languages are simpler and easier to implement compared to
context-free languages. Regular expressions can define patterns with straightforward
rules, making them efficient to parse.
2.​ Efficiency: Lexical analyzers (lexers) built using regular expressions and finite automata
can operate in linear time, O(n), where n is the length of the input string. This efficiency
is crucial for performance in compilers, as lexers typically process large amounts of text.


3.​ Determinism: Many regular languages can be implemented using deterministic finite
automata (DFAs), which allow for faster processing since they require only a single state
transition per input character. Context-free grammars, on the other hand, often involve
more complex parsing techniques, such as LL or LR parsing, which can require
backtracking or multiple passes over the input.
4.​ Separation of Concerns: Using regular languages for lexical analysis allows for a clear
separation between the lexical and syntactic analysis phases of compilation. Lexers
handle the tokenization (breaking input into tokens) while parsers handle the structure
and rules of the language. This separation simplifies the design of compilers.
5.​ Limited Scope of Tokens: The types of tokens that need to be recognized (like
keywords, identifiers, literals, operators, etc.) are generally simple and can be expressed
as regular languages. Context-free grammars are typically used for more complex
structures, like expressions or statements that may involve nested constructs.

Context-Free Grammars (CFGs):

While CFGs can describe more complex languages, their use in token recognition is often
unnecessary and can complicate the implementation. They are more suitable for parsing the
syntax of the language rather than for token recognition. In a typical compiler architecture:

●​ Lexical Analysis (Lexer): Uses regular expressions to identify and categorize tokens.
●​ Syntactic Analysis (Parser): Uses context-free grammars to analyze the grammatical
structure of the token sequences produced by the lexer.

In summary, regular languages provide the right balance of expressiveness, efficiency, and
simplicity for lexical analysis, while context-free grammars are used in later stages of the
compilation process where more complex language features are involved.

So, is it correct to say that the asymptotic complexity of a lexer built as a table-driven automaton would be lower than as a nest of if-then-else blocks?

Yes! Lexical analyzers (lexers) built as a nest of if statements could indeed have a time
complexity of O(n*T), where n is the length of the input string and T is the number of tokens (or
the number of token patterns to be tested). This is because, in the worst case, the lexer would
need to evaluate each character against all possible token patterns, leading to multiple checks
for each input character. Notice that the lexer we built manually in this class operates as a nest
of if-then-else blocks; hence, its complexity is sub-optimal.

On the other hand, a lexer built using regular expressions and finite automata operates in O(n),
since the finite automaton processes each character exactly once, following state transitions
without needing to check all token patterns individually. This makes the automaton-based lexer
more efficient, especially as the number of token patterns increases.


Talking about grammars, how would you change your implementation of compute_postfix to compute_prefix?

To compute expressions in prefix notation (where operators come before their operands), the
approach differs from the one used for postfix notation, primarily in how the expression is
processed.

Key Differences in Prefix Notation:

1. Operator First: In prefix notation, operators appear before their operands. For example,
the expression + 3 4 would be written as 3 4 + in postfix notation.
2. Processing Order: Since the operator comes first, prefix expressions can be processed
from right to left with a stack. Alternatively, as the code below does, they can be processed
from left to right with recursion: each operator recursively evaluates its operands before
applying the operation.

Here's the code that does the job:

def compute_prefix(lexer):
    """
    Evaluates an arithmetic expression in Polish Notation (Prefix Notation).

    Examples:
        >>> lexer = Lexer("+ 3 * 4 2")
        >>> compute_prefix(lexer)
        11

        >>> lexer = Lexer("+ * 3 4 2")
        >>> compute_prefix(lexer)
        14
    """
    token = lexer.next_valid_token()

    if token.kind == TokenType.NUM:
        # Base case: return the value if it's a number
        return int(token.text)
    elif token.kind in Token.operators:
        # Recursive case: evaluate the operands
        a = compute_prefix(lexer)
        b = compute_prefix(lexer)

        if token.kind == TokenType.ADD:
            return a + b
        elif token.kind == TokenType.SUB:
            return a - b
        elif token.kind == TokenType.MUL:
            return a * b
        elif token.kind == TokenType.DIV:
            if b == 0:
                raise ZeroDivisionError("Division by zero.")
            return a // b
    else:
        raise ValueError(f"Unexpected token type: {token.kind}")

How It Works:

1. Recursive Evaluation:
○ The function reads one token at a time, from left to right.
○ If the token is a number, that number is the value of the expression (the base case).
2. Handling Operators:
○ When the function reads an operator, it recursively evaluates the next two
subexpressions to obtain the operands a and b.
○ It then applies the operator to a and b and returns the result to its caller.
3. Order of Evaluation:
○ In postfix, an explicit stack holds operands until their operator arrives; in prefix, the
operator arrives first, so the call stack holds it while its operands are being computed.
○ Each recursive call consumes exactly one operand, which may itself be a whole nested
expression, such as * 4 2 inside "+ 3 * 4 2".
4. Final Result:
○ When the outermost call returns, the entire expression has been consumed, and its
return value is the result. Leftover tokens indicate a malformed expression.

In summary, a stack-based evaluator would process a prefix expression from right to left, but
recursion lets us read it left to right: the language's call stack plays the role of the explicit
stack, remembering each pending operator until its operands have been evaluated.

What if you had to add variables and keywords (add, sub, etc) to your
language? How would you do that?
There is a cheap trick to do it! We can add the keywords to a table. And then we use this table
to return the right token. For instance, let's imagine that we have general variables (symbols
formed of letters and digits, that start with letters) in addition to keywords like add and sub. So,
we can either write "+ 3 4" or "add 3 4". We need to create the new token VAR:

class TokenType(enum.Enum):
    """
    This class defines the possible tokens that we consider.
    """
    EOF = -1   # End of file
    NLN = 0    # New line
    WSP = 1    # White Space
    NUM = 2    # Number (integers)
    ADD = 202  # The token '+'
    SUB = 203  # The token '-'
    MUL = 204  # The token '*'
    DIV = 205  # The token '/'
    VAR = 206  # General variables

Then we can create a table with the keywords we have:

class Token:
    ...
    @staticmethod
    def key_word_token(text):
        """
        This method returns the token associated with a given keyword in the
        language, or a VAR token for the given text otherwise.

        Example:
            >>> Token.key_word_token('add').text
            '+'

            >>> Token.key_word_token('sub').text
            '-'

            >>> Token.key_word_token('x').text
            'x'
        """
        if text in tokens:
            return tokens[text]
        else:
            return Token(text, TokenType.VAR)

# Initialize the table of tokens somewhere outside the Token class:
tokens = {}
tokens["add"] = Token("+", TokenType.ADD)
tokens["sub"] = Token("-", TokenType.SUB)
tokens["mul"] = Token("*", TokenType.MUL)
tokens["div"] = Token("/", TokenType.DIV)

Finally, we add a special case to the Lexer, to read the new tokens:

def getToken(self):
    ...
    if current_char.isdigit():
        # Handle numbers (NUM)
        ...
    elif current_char.isalpha():
        id_text = current_char
        while (
            self.position < self.length
            and self.input_string[self.position].isalnum()
        ):
            id_text += self.input_string[self.position]
            self.position += 1
        return Token.key_word_token(id_text)
    elif current_char == "+":
        ...

Notice that we don't need to add special cases for the keywords add and sub in the lexer: that's
already taken care of in the table! With these changes, we can run tests like:

>>> lexer = Lexer("3 4 + 2 * 7 /")
>>> compute_postfix(lexer)
2

>>> lexer = Lexer("3 4 add 2 * 7 /")
>>> compute_postfix(lexer)
2

>>> lexer = Lexer("3 4 add 2 mul 7 div")
>>> compute_postfix(lexer)
2

>>> lexer = Lexer("4 3 sub")
>>> compute_postfix(lexer)
1

>>> lexer = Lexer("4 2 5 * add 1 3 2 * + div")
>>> compute_postfix(lexer)
2

Having a scanner really simplifies the implementation of these calculators.


Why are scanners so important in programming languages?
A scanner (or lexer) is a very important component in implementing programming languages! It
plays an essential role in the first phase of compilation or interpretation, where it transforms
raw source code into a series of meaningful tokens. These tokens represent the smallest units
of meaning, such as keywords, operators, identifiers, numbers, and punctuation.

Here's why a scanner is so important in programming languages:

1. Breaking Down the Source Code


●​ The source code is typically a long sequence of characters that can be difficult to
process directly. The scanner breaks this input into discrete tokens, making it easier for
later stages (like parsing) to work with structured data instead of raw text.
●​ Example: In int x = 5;, the scanner produces tokens like int (keyword), x
(identifier), = (operator), 5 (number), and ; (punctuation).

2. Removing Noise

●​ The scanner handles details like whitespace (spaces, newlines) and comments, which
are usually not relevant for execution but help with code readability.
●​ By filtering out these elements early, the scanner ensures that only meaningful parts of
the code are passed to later stages.

3. Efficiency and Simplicity

●​ Separating scanning from parsing simplifies the overall design of the compiler or
interpreter. Instead of handling raw characters during parsing (which can be complex),
the parser can focus on analyzing the structure of the program using tokens.
●​ This separation makes it easier to manage error detection and recovery during the
parsing stage.

4. Handling Lexical Errors

●​ The scanner is responsible for identifying lexical errors, such as illegal characters or
malformed numbers. This allows for early detection of issues and better error reporting
before deeper semantic checks are performed.

5. Foundation for Syntax and Semantics

●​ The tokens produced by the scanner are passed to the parser, which checks the
syntactic structure of the code (grammar). This is where higher-level constructs like
expressions, statements, and functions are formed.
●​ The correctness of the scanner is critical because improper tokenization can lead to
parsing errors and incorrect interpretation of the code's structure.

6. Supports Code Flexibility

●​ A well-designed scanner can handle different forms of tokens, enabling support for
various programming language features such as:
○​ Different number formats (integers, floating points, hexadecimals).
○​ String literals with escape sequences.
○​ Operators with different precedence levels.

7. Scalability for Complex Languages


●​ As programming languages evolve and become more complex, the scanner can be
extended to recognize more advanced lexical patterns, like identifiers in languages with
different symbol rules or keywords with context-sensitive meanings.

In summary, the scanner is fundamental to bridging the gap between raw code and a
structured program. It is the first step toward understanding and executing a programming
language. Without a scanner, managing the complexities of modern programming languages
would be far more difficult and error-prone.

So, a compiler is like a pipeline that transforms programs into different formats, and the scanner is the first part of this chain?
Yes, exactly! A compiler is essentially a pipeline that converts a program from one
representation to another at different stages. The goal is to transform high-level source code
(written by a programmer) into low-level machine code or some other executable form. Each
stage of the pipeline processes the program at a different level of abstraction, refining it step by
step. Here's a breakdown of this transformation process:

1. Source Code (Bytes/String)

●​ The starting point of the compiler pipeline is the raw source code, which is just a
sequence of characters or bytes. This is a high-level program written in a
programming language like C, Python, or Java.
●​ Example: int x = 5 + 3; is a sequence of characters in a C program.

2. Lexical Analysis (Tokens)

●​ The first transformation occurs in the lexical analysis (scanning) phase, where the raw
source code is converted into tokens. Tokens are the basic building blocks of the
language, like keywords (int), identifiers (x), operators (+), and literals (5, 3).
●​ This is where whitespace and comments are ignored, and the program is reduced to
meaningful elements.
●​ Example: The source code int x = 5 + 3; would be tokenized into something like
[INT, IDENTIFIER(x), EQUALS, NUMBER(5), PLUS, NUMBER(3),
SEMICOLON].
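To make this step concrete, here is a minimal, illustrative tokenizer in Python (a sketch written for this chapter, not the scanner of any real compiler); the token names mirror the example above:

```python
import re

# Token categories paired with regular expressions. This is a minimal,
# illustrative scanner; real compilers use more elaborate (often
# generated) automata.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("EQUALS",     r"="),
    ("PLUS",       r"\+"),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"\s+"),  # whitespace carries no meaning: discard it
]

KEYWORDS = {"int": "INT"}  # identifiers that are actually reserved words

def tokenize(code):
    tokens = []
    pos = 0
    while pos < len(code):
        for kind, pattern in TOKEN_SPEC:
            m = re.match(pattern, code[pos:])
            if m:
                if kind == "IDENTIFIER" and m.group() in KEYWORDS:
                    kind = KEYWORDS[m.group()]
                if kind != "SKIP":
                    tokens.append((kind, m.group()))
                pos += m.end()
                break
        else:
            raise ValueError(f"unexpected character: {code[pos]!r}")
    return tokens
```

With these definitions, tokenize("int x = 5 + 3;") yields exactly the token list from the example above, one (category, lexeme) pair per token.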

3. Syntactic Analysis (Abstract Syntax Tree - AST)

●​ Next is the syntactic analysis (parsing) phase, where the sequence of tokens is
analyzed according to the language's grammar to form an Abstract Syntax Tree (AST).
●​ The AST represents the structure of the program in a hierarchical way. It reflects how
different components (like expressions, statements, functions) are related.
●​ Example: The expression 5 + 3 would be represented as a tree where + is the root,
with 5 and 3 as its children.


4. Semantic Analysis

●​ In the semantic analysis phase, the AST is checked for meaning beyond syntax. This
includes checking:
○​ Type correctness (e.g., ensuring you don't add a string to a number).
○​ Scope resolution (e.g., ensuring that variables are declared before being used).
●​ This stage might annotate the AST with additional information, such as data types.

5. Intermediate Representation (IR)

●​ The AST is often transformed into an Intermediate Representation (IR). IR is a
lower-level, but still machine-independent, representation of the program that is easier to
optimize and translate into machine code.
●​ Control flow graphs (CFGs) and Static Single Assignment (SSA) are common forms
of IR, where the program's control and data flow are explicitly represented.
●​ Example: The statement int x = 5 + 3; might be translated into a sequence of IR
instructions like:

t1 = 5
t2 = 3
t3 = t1 + t2
x = t3

6. Optimization

●​ The IR is then passed through an optimization phase where transformations are made
to improve the performance and efficiency of the program. Common optimizations
include:
○​ Dead code elimination (removing code that has no effect).
○​ Constant folding (precomputing constant expressions).
○​ Loop unrolling (optimizing loop execution).
●​ The goal is to produce more efficient machine code without altering the program's
semantics.
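As a tiny illustration of constant folding, here is a hedged sketch that uses nested Python tuples as a stand-in for real IR (this is not the representation of any production compiler):

```python
def fold(expr):
    """Constant folding over expressions written as nested tuples,
    e.g. ('+', 5, 3). Anything that is not a tuple (a number or a
    variable name) is returned unchanged."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    # If both operands folded down to constants, precompute the result.
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
    return (op, left, right)
```

For instance, fold(('+', 'x', ('*', 2, 3))) rewrites the subtree ('*', 2, 3) to 6 while leaving the variable x alone, exactly the kind of semantics-preserving simplification described above.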

7. Code Generation (Target-Specific Representation)

●​ After optimization, the IR is converted into a target-specific representation, which can be
assembly code for a specific machine architecture (like x86, ARM, or RISC-V).
●​ The program is now expressed in terms of machine instructions, registers, memory
locations, and other hardware-level details.
●​ Example: The assignment x = 5 + 3 might turn into:

MOV R1, 5
MOV R2, 3


ADD R1, R1, R2
MOV [x], R1

8. Machine Code

●​ Finally, the assembly or low-level code is translated into machine code, which is a
binary representation that the CPU can directly execute. This is the final output of the
compiler.
●​ Example: The assembly instructions would be turned into a series of binary instructions
understood by the CPU, like 1101001010010000.

9. Linking (if applicable)

●​ In languages like C or C++, once the machine code is generated, linking combines the
object code with libraries and other modules to create an executable.

Summary of the Compiler Pipeline:

●​ Source Code → Tokens → AST → IR → Optimized IR → Assembly Code → Machine Code.

Each stage in the pipeline is a transformation from one representation of the program to
another, and each stage gets progressively closer to machine-executable code. This design
makes compilers modular and easier to maintain, as different stages can be modified or
optimized independently.

Do all these representations have an interpretable meaning to the compiler?
The actual meaning of programs starts to become apparent to the compiler/interpreter once we
reach the level of the abstract syntax tree. Let's see:

1.​ Source Code: At this stage, it's indeed just a sequence of bytes for the machine. To the
computer, it's a series of bits that must be processed to derive any meaning. This
processing involves interpreting the syntax and structure defined by the language's
grammar.
2.​ Tokens: These are symbolic representations extracted from the source code by a lexer
or tokenizer. While tokens are meaningful to a human who understands the language, to
the machine, they're just symbols. The semantic relationships between these tokens
aren't apparent until further processing.
3.​ AST (Abstract Syntax Tree): This is where the semantic structure starts to become
explicit. The AST represents the hierarchical syntactic structure of the source code, and
it can be traversed or interpreted to understand the code's intended behavior. It provides
a concrete way to examine the structure and relationships within the code, making it
possible to apply further transformations or optimizations.


In summary, the AST is indeed the stage where we transition from raw tokens and bytes to a
meaningful representation that can be interpreted by a machine. Your emphasis on the role of
parsing in extracting meaning aligns with this perspective.

So, is lexical analysis the entry point of every system that processes programs?
The scanner, or lexer, is indeed the entry point in the process of transforming source code into
executable behavior. Here's a breakdown of its role:

1.​ Lexical Analysis: The scanner performs lexical analysis, converting the raw source
code into a stream of tokens. These tokens are the basic building blocks of the
language's syntax, such as keywords, operators, and identifiers.
2.​ Entry Door: As the first stage in the processing pipeline, the scanner essentially serves
as the gateway for the execution engine—whether it's an interpreter, compiler, or virtual
machine. It provides the subsequent stages with a structured representation of the code
that can be further analyzed and processed.
3.​ Interaction with Execution Engines:
○​ Interpreter: The interpreter uses the tokens to build an AST or directly interpret
the behavior of the program.
○​ Compiler: The compiler translates the tokens into intermediate representations
or directly into machine code.
○​ Virtual Machine: The VM uses the tokens to generate bytecode or other
intermediate forms that it can execute.

In essence, the scanner’s output is a critical input for all downstream components of the
execution engine, making it a fundamental part of the process.

I wish I could take a look into an actual lexer used in a real-world system.
Do you know any?
Sure! Let's take a look at the lexer in the One True AWK repository: the file lex.c. This file
implements AWK's lexer. It is responsible for lexical analysis, which is the process of reading
the input and breaking it into tokens that the AWK interpreter can understand and process.

Key Responsibilities of lex.c in AWK

1.​ Tokenization:
○​ The lexer reads the input stream and splits it into meaningful units (tokens), such
as keywords, operators, numbers, strings, and identifiers.
○​ Tokens are then passed to the parser for syntactic analysis.
2.​ Regular Expression Handling:


○​ Since AWK heavily relies on patterns and regular expressions, the lexer includes
logic to identify and handle regular expressions as distinct tokens.
3.​ Whitespace and Comments:
○​ The lexer skips over whitespace and comments, ensuring they don’t interfere
with the interpretation of the AWK program.
4.​ Error Reporting:
○​ The lexer identifies invalid sequences in the input and generates appropriate
error messages.
5.​ State Management:
○​ The lexer keeps track of the current state of parsing (e.g., whether it's inside a
string, processing a regular expression, or handling an escape sequence).

Interesting Parts of the Code

If you browse lex.c, you’ll likely encounter the following (at least they were there when this
chapter was written!):

●​ Character Classes: Functions that categorize characters (e.g., isalpha for letters or
isdigit for digits).
●​ State Machine: Logic for handling different parsing states, such as transitioning
between reading identifiers, strings, or numbers.
●​ Regular Expression Parsing: Code to distinguish regular expressions from other
patterns.
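To give a flavor of the kind of state handling described above, here is a toy routine for scanning a string literal with escape sequences, written in Python for readability (an illustration only, not code from AWK's lex.c):

```python
def scan_string(src, pos):
    """Scan a double-quoted string literal starting at src[pos].
    Returns (value, next_pos). A toy version of the state handling a
    lexer like AWK's must perform; NOT actual AWK code."""
    assert src[pos] == '"', "scanner must be positioned at the opening quote"
    pos += 1
    chars = []
    while pos < len(src) and src[pos] != '"':
        if src[pos] == "\\":  # enter the "escape sequence" state
            pos += 1
            if pos >= len(src):
                raise ValueError("dangling escape at end of input")
            escapes = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
            chars.append(escapes.get(src[pos], src[pos]))
        else:
            chars.append(src[pos])
        pos += 1
    if pos >= len(src):
        raise ValueError("unterminated string literal")
    return "".join(chars), pos + 1
```

Note how the escape branch is a tiny state transition: upon seeing a backslash, the scanner temporarily interprets the next character differently, then returns to its normal state.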

Why is This Important?

The lexer is a crucial part of the AWK interpreter. It lays the groundwork for how AWK
understands your input program by converting raw text into a structured format that the parser
can use. If you’re studying lex.c, you’re diving into the core of how AWK interprets its scripting
language.

Who created AWK?


AWK was created by three prominent computer scientists at Bell Labs in the 1970s (and that's
where the name AWK actually comes from!):

1.​ Alfred V. Aho (A)


○​ Renowned for his contributions to programming languages, compilers, and
algorithms.
○​ Co-author of the famous "Dragon Book" (Compilers: Principles, Techniques, and
Tools).
○​ Played a major role in designing the theoretical underpinnings of AWK,
particularly its pattern-matching mechanisms.
2.​ Peter J. Weinberger (W)


○​ A distinguished researcher with expertise in programming, algorithms, and
software engineering.
○​ Known for his work in practical computing systems and for contributing to AWK's
features that allow concise and expressive text processing.
3.​ Brian W. Kernighan (K)
○​ A pioneer in Unix and programming languages, co-author of The C Programming
Language (K&R) with Dennis Ritchie.
○​ Instrumental in creating AWK's user-friendly syntax and design philosophy, which
emphasizes simplicity and power.

The Origins of AWK

●​ Purpose: AWK was designed as a simple and efficient text-processing tool, allowing
users to process text files, extract information, and perform data transformations using
patterns and actions.
●​ Inspiration: The language's design reflects the combined influence of Unix's pipeline
philosophy and the power of regular expressions, making AWK a perfect tool for
Unix-like systems.

Fun Fact

AWK was so influential that it inspired later scripting languages like Perl, which extended many
of AWK's ideas. Even today, AWK remains a widely used tool for quick and powerful text
processing.

Is this Aho the one-and-only Aho from the Dragon Book?


Yes: this is the Dragon-Book-Aho! Alfred Vaino Aho is a renowned computer scientist, professor,
and author, celebrated for his groundbreaking work in algorithms, programming languages, and
compilers. He was born on August 9, 1941, in Timmins, Ontario, Canada, and received his
B.A.Sc. in Engineering Physics from the University of Toronto in 1963. He later earned his Ph.D.
in Electrical Engineering/Computer Science from Princeton University in 1967.


Aho spent much of his career at Bell Labs, where he worked on various influential projects,
including AWK. He later joined Columbia University, where he became the Lawrence
Gussman Professor Emeritus of Computer Science. In 2020, he was awarded the Turing
Award, often referred to as the "Nobel Prize of Computing," alongside Jeffrey Ullman, for their
work on algorithms and theory in computer science.

Contributions to Computer Science

1.​ Programming Languages and Compilers


○​ Co-authored the classic textbook Compilers: Principles, Techniques, and Tools
(commonly called the Dragon Book), which remains a cornerstone in compiler
education.
○​ Developed foundational theories for compiler construction, including lexical
analysis, parsing, and optimization.
2.​ AWK Programming Language
○​ Co-created AWK with Peter Weinberger and Brian Kernighan, contributing to its
theoretical design and practical implementation.
3.​ Algorithms and Data Structures
○​ Co-authored the seminal textbook The Design and Analysis of Computer
Algorithms with John Hopcroft and Jeffrey Ullman, which laid the foundation for
modern algorithm design.
4.​ Formal Languages and Automata Theory
○​ Contributed extensively to the theory of regular expressions, context-free
grammars, and automata.
○​ Co-authored Introduction to Automata Theory, Languages, and Computation,
another classic textbook (co-written with Hopcroft and Ullman).


5.​ Development of GREP


○​ Contributed to early work that influenced the creation of tools like grep, which
popularized the use of regular expressions for text searching.
6.​ Contributions to Computer Science Education
○​ Authored multiple influential textbooks that have educated generations of
computer scientists.
○​ Mentored countless students during his tenure at Columbia University.
7.​ Turing Award (2020)
○​ Recognized for fundamental algorithms and their application to programming
systems, including theoretical foundations for compilers.

Legacy

Aho’s work bridges theoretical computer science and practical programming, making his
contributions essential to both academic research and real-world software development. His
name is synonymous with excellence in algorithms, programming languages, and systems, and
his tools and ideas remain influential to this day.


Tree-Like Program Representation


Code: https://github.com/pronesto/DCC053_Public/tree/main/2_ArithRepresentation

What would be a good data structure to represent infix arithmetic expressions?
A good data structure to represent infix arithmetic expressions is an expression tree. Here’s
why:

●​ Tree Structure: In an expression tree, each internal node represents an operator, and
each leaf node represents an operand. The left and right children of a binary operator
node, for instance, represent the sub-expressions for that operator.
●​ Infix Representation: The tree can naturally represent the precedence of operations in
infix notation. For example, the expression (a + b) * c would have * as the root, with the
left child being the subtree for a + b and the right child being the leaf node c.
●​ Evaluation: You can easily evaluate the expression by recursively evaluating the
subtrees.
●​ Traversal: An inorder traversal of the tree will give you the original infix expression.

Could you write a simple program that represents arithmetic expressions as a tree?
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def evaluate(node):
    """
    Produces the value of an arithmetic expression.

    Examples:
    >>> e = Node('+', Node(23), Node(19))
    >>> evaluate(e)
    42

    >>> e0 = Node('+', Node(23), Node(19))
    >>> e1 = Node('-', e0, Node(20))
    >>> evaluate(e1)
    22
    """
    if node is None:
        return 0
    # Evaluate a leaf operand.
    if node.left is None and node.right is None:
        return int(node.value)
    # Recursively evaluate the left and right subtrees.
    left_val = evaluate(node.left)
    right_val = evaluate(node.right)
    # Apply the operator.
    if node.value == "+":
        return left_val + right_val
    elif node.value == "-":
        return left_val - right_val
    elif node.value == "*":
        return left_val * right_val
    elif node.value == "/":
        return left_val / right_val

This implementation represents arithmetic expressions using a tree structure, where each node
can be an operator (like +, -, *, or /) or a value (a number). Here's a breakdown of how it works:

1.​ Node Class: Each Node object represents a part of the expression. It has three
attributes:
○​ value: This can either be an operator (like "+") or a numeric value.
○​ left: This points to the left child node (the left operand).
○​ right: This points to the right child node (the right operand).
2.​ Evaluate Function: This function recursively computes the result of the arithmetic
expression represented by the tree.
○​ Base Case: If the current node is a leaf (meaning it has no children), it returns
the integer value of that node.
○​ Recursive Case: If the node is not a leaf, it evaluates the left and right subtrees
by calling evaluate on each child. This gives the values of the left and right
operands.
○​ Operator Application: After obtaining the values of the operands, it applies the
operator stored in the current node’s value:
■​ For "+", it adds the two operands.
■​ For "-", it subtracts the right operand from the left.
■​ For "*", it multiplies them.
■​ For "/", it divides the left operand by the right.
3.​ Examples: The comments provide examples of how the tree structure represents
expressions:
○​ An expression for 23 + 19 is created as a tree with "+" as the root, 23 as the left
child, and 19 as the right child.
○​ For a more complex expression like (23 + 19) - 20, the tree has "+" as a child of
the "-" operator.

In summary, this implementation allows for the representation and evaluation of arithmetic
expressions in a structured way using a tree-like data structure, making it easy to extend or
modify for more complex expressions or operations in the future.

Could you add a second function to print the expression in infix notation?
Here's our implementation:
def print_infix(node):
    """
    This function prints the arithmetic expression in infix notation.

    Examples:
    >>> e = Node('+', Node(23), Node(19))
    >>> print_infix(e)
    '(23+19)'

    >>> e0 = Node('-', Node(23), Node(19))
    >>> e1 = Node('-', Node(20), e0)
    >>> print_infix(e1)
    '(20-(23-19))'
    """
    if node is None:
        return ""
    # If the node is a leaf (operand), just return its value.
    if node.left is None and node.right is None:
        return str(node.value)
    # Recursively get the left and right sub-expressions.
    left_expr = print_infix(node.left)
    right_expr = print_infix(node.right)
    # Wrap the expression in parentheses.
    return f"({left_expr}{node.value}{right_expr})"

This function, print_infix, converts and displays arithmetic expressions in infix notation (the
standard way of writing expressions, where operators are placed between operands). Here’s
how it works:

1.​ Function Purpose: The goal of print_infix is to represent the arithmetic expression
stored in the tree structure as a string in infix form.
2.​ Base Case:
○​ If the current node is None, it returns an empty string. This handles cases where
there might be no expression.
○​ If the node is a leaf (meaning it has no children), it returns the string
representation of its value. This is for numeric values like 23 or 19.
3.​ Recursive Case:
○​ The function recursively calls itself for the left and right child nodes of the current
node, retrieving the left and right sub-expressions.
○​ It then combines these sub-expressions with the current node's operator (the
value) placed between them.
4.​ Expression Formatting:
○​ The returned string for the current node wraps the entire expression in
parentheses. This is important to maintain the correct order of operations and
clearly define the structure of the expression.
○​ For example, if the current node represents the operator "+", and its left child is
"23" and right child is "19", the result will be "(23+19)".
5.​ Examples:
○​ For a simple expression like 23 + 19, the function outputs "(23+19)".
○​ For a more complex expression such as 20 - (23 - 19), the function builds the
string by first calculating the inner expression "(23-19)", then wrapping the
entire expression as "(20-(23-19))".


In summary, this function constructs a string representation of an arithmetic expression in infix
notation by recursively processing each node of the expression tree, ensuring that the
expression is correctly formatted with parentheses to represent the order of operations clearly.

Cool! What about postfix?
Here's the code:
def print_postfix(node):
    """
    This function prints the arithmetic expression in postfix notation.

    Examples:
    >>> e = Node('+', Node(23), Node(19))
    >>> print_postfix(e)
    '23 19 + '

    >>> e0 = Node('-', Node(23), Node(19))
    >>> e1 = Node('-', Node(20), e0)
    >>> print_postfix(e1)
    '20 23 19 - - '
    """
    if node is None:
        return ""
    # Recursively get the left and right sub-expressions.
    left_expr = print_postfix(node.left)
    right_expr = print_postfix(node.right)
    # Combine left, right, and current node's value.
    return f"{left_expr}{right_expr}{node.value} "

The print_postfix function converts and displays arithmetic expressions in postfix notation
(also known as Reverse Polish Notation, or RPN). Here’s how it differs from the previous
print_infix function:

1.​ Notation:
○​ Infix notation (used by print_infix) places operators between operands (e.g.,
23 + 19).
○​ Postfix notation places operators after their operands (e.g., 23 19 +).
This eliminates the need for parentheses to dictate order of operations, as the
sequence of operations is inherently defined by the position of the operators and
operands.
2.​ Structure:
○​ In the print_postfix function, the output is generated by recursively
processing the left and right subtrees before appending the operator. This reflects
the order in which operations should be performed.
○​ The function builds the output by first evaluating the left expression, then the right
expression, and finally adding the operator at the end.
3.​ Output Formatting:
○​ Each operand and operator is separated by a space, which helps in clearly
delineating them in the output (e.g., "23 19 + ").


○​ Unlike print_infix, there’s no need for parentheses since the structure of
postfix notation inherently indicates the order of operations.
4.​ Examples:
○​ For the expression 23 + 19, the function outputs "23 19 + ".
○​ For the expression 20 - (23 - 19), the function outputs "20 23 19 - - ". Here, it
first evaluates the inner expression "23 19 -" and then appends the outer
operator "-".

In summary, print_postfix differs from print_infix in that it constructs and outputs
expressions in postfix notation, which emphasizes a different order of evaluation and does not
require parentheses to clarify the structure of the expression. The recursive processing remains
similar, but the final string construction aligns with the rules of postfix notation.

When printing in infix notation we need parentheses, and when printing in postfix we don't need them. Why?
The need for parentheses in infix notation, but not in postfix notation, arises from how these
notations handle operator precedence and the order of operations.

Infix Notation

●​ Operator Precedence: In infix notation, the position of operators relative to operands is
crucial for determining the order of operations. For example, in the expression 3 + 4 *
5, multiplication has a higher precedence than addition, so 4 * 5 is evaluated first, and
then the result is added to 3.
●​ Parentheses: To explicitly indicate the intended order of operations, parentheses are
used in infix notation. Without parentheses, expressions could be ambiguous or
misinterpreted. For instance, in (3 + (4 * 5)), the parentheses clarify that 4 * 5
should be evaluated first.

Postfix Notation

●​ Order of Operations: In postfix notation (Reverse Polish Notation), the order of
operations is inherently clear from the position of operators and operands. The notation
places operators after their operands, which naturally dictates the order of operations
without needing additional symbols.
●​ No Need for Parentheses: Since postfix notation eliminates ambiguity by design,
there’s no need for parentheses. The expression 3 4 5 * + unambiguously means
"multiply 4 and 5, then add 3 to the result," regardless of operator precedence rules.
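This property is also what makes postfix so easy to evaluate with a stack: scan left to right, push operands, and apply each operator to the two topmost values. A small sketch in Python (the function name is ours, not from the chapter's code):

```python
def eval_postfix(tokens):
    """Evaluate a postfix expression given as a list of tokens,
    e.g. ["3", "4", "5", "*", "+"]."""
    stack = []
    for tok in tokens:
        if tok in ("+", "-", "*", "/"):
            right = stack.pop()  # operands come off in reverse order
            left = stack.pop()
            if tok == "+":
                stack.append(left + right)
            elif tok == "-":
                stack.append(left - right)
            elif tok == "*":
                stack.append(left * right)
            else:
                stack.append(left / right)
        else:
            stack.append(int(tok))
    return stack.pop()
```

For example, eval_postfix("3 4 5 * +".split()) computes 4 * 5 first and then adds 3, returning 23, with no precedence rules or parentheses involved.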


Imagine that I want to add a fifth operation to my program. Say, modulo, with symbol %. So,
now I need to modify all the functions, e.g., print_postfix, print_infix, evaluate?
Yes, when adding a new operation like modulo (%) to your program, you will need to update all
relevant functions, such as print_postfix, print_infix, and evaluate, to recognize and
correctly handle the new operation. For instance, the new version of the evaluate function will
become as in the code below:

def evaluate(node):
    if node is None:
        return 0
    if node.left is None and node.right is None:  # Leaf node (operand)
        return int(node.value)
    # Recursively evaluate the left and right subtrees.
    left_val = evaluate(node.left)
    right_val = evaluate(node.right)
    # Apply the operator.
    if node.value == '+':
        return left_val + right_val
    elif node.value == '-':
        return left_val - right_val
    elif node.value == '*':
        return left_val * right_val
    elif node.value == '/':
        return left_val / right_val
    elif node.value == '%':  # Handle the modulo operation
        return left_val % right_val

Could you think about an easier way to add new operators without having
to modify lots of stuff?
Yes, we could use a more object oriented approach. Example:

from abc import ABC, abstractmethod

class Expression(ABC):
    @abstractmethod
    def eval(self):
        raise NotImplementedError

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def eval(self):
        return self.num

class BinaryExpression(Expression):
    def __init__(self, left, right):
        self.left = left
        self.right = right
    @abstractmethod
    def eval(self):
        raise NotImplementedError

class Add(BinaryExpression):
    def eval(self):
        return self.left.eval() + self.right.eval()

class Sub(BinaryExpression):
    def eval(self):
        return self.left.eval() - self.right.eval()

class Mul(BinaryExpression):
    def eval(self):
        return self.left.eval() * self.right.eval()

class Div(BinaryExpression):
    def eval(self):
        return self.left.eval() // self.right.eval()

def evaluate(exp):
    """
    Evaluate an arithmetic expression.

    Examples:
    >>> n1 = Num(28)
    >>> n2 = Num(4)
    >>> e = Div(n1, n2)
    >>> evaluate(e)
    7

    >>> n1 = Num(7)
    >>> n2 = Num(4)
    >>> e = Mul(n1, n2)
    >>> evaluate(e)
    28
    """
    return exp.eval()

This implementation introduces a more object-oriented approach to representing arithmetic
expressions, using classes and inheritance. Here’s how it differs from the previous
representation using the Node class:

1.​ Class Hierarchy:


○​ Abstract Base Class: The Expression class is an abstract base class (ABC)
that defines the interface for all expression types. It enforces that any subclass
must implement the eval method, providing a clear structure for expression
evaluation.
○​ Concrete Classes: The Num class represents numeric values, while
BinaryExpression serves as a base class for expressions involving two
operands (like addition and subtraction). Specific operations (like addition,
subtraction, multiplication, and division) are implemented as subclasses of
BinaryExpression (e.g., Add, Sub, Mul, Div).
2.​ Separation of Concerns:
○​ Each class has a single responsibility. For example, Num only handles numeric
values, while the operation classes (e.g., Add) handle the logic for their specific
operations. This separation makes the code easier to maintain and extend.
3.​ Method Overriding:
○​ Each concrete class implements the eval method to define how that specific
type of expression should be evaluated. This allows for polymorphism, where the
same method call (eval()) can behave differently based on the object type.
4.​ No Explicit Nodes:
○​ In the previous representation, each expression was represented as a node in a
tree structure, where each node held a value and references to its children. In
this implementation, there are no explicit nodes; instead, each expression is an
object that directly represents its value or operation.
5.​ Cleaner Evaluation:
○​ The evaluation of expressions is streamlined through method overriding. For
instance, calling evaluate(e) simply calls the eval method of the expression
object, regardless of its type. This leads to cleaner and more concise evaluation
logic.
6.​ Examples:
○​ The examples demonstrate how to create numeric and operation objects and
evaluate them using the evaluate function. For instance, creating a division
operation with two Num instances directly reflects the new object-oriented
approach.

In summary, this representation focuses on encapsulating behavior (evaluation logic) within
dedicated classes, promoting a clear and organized structure. The use of inheritance and
polymorphism enhances code clarity and allows for easier modifications or extensions in the
future.
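To see the payoff of this design, consider adding the modulo operation from before: it now takes a single new class, with no edits to evaluate or to any existing class. The base classes below repeat a minimal excerpt of the chapter's code so the example is self-contained:

```python
from abc import ABC, abstractmethod

# Minimal excerpts of the chapter's classes, repeated here so that the
# example stands alone:
class Expression(ABC):
    @abstractmethod
    def eval(self):
        raise NotImplementedError

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def eval(self):
        return self.num

class BinaryExpression(Expression):
    def __init__(self, left, right):
        self.left = left
        self.right = right

# The entire extension: one new class, and no changes anywhere else.
class Mod(BinaryExpression):
    def eval(self):
        return self.left.eval() % self.right.eval()
```

Now Mod(Num(28), Num(5)).eval() yields 3, and evaluate works on Mod nodes without modification, since it only relies on the Expression interface. (To support the printing functions defined next, Mod would additionally need a __str__ method returning "%".)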

And how could you print the expression in pre/in/post-fix notation?


class Num(Expression):
def __init__(self, num):
self.num = num
def eval(self):
return self.num
def __str__(self):
return str(self.num)

class Add(BinaryExpression):
def eval(self):
return self.left.eval() + self.right.eval()
def __str__(self):

64
This material is freely available and must not be sold or used for commercial purposes.
For typos or suggestions, please contact Fernando Pereira at [email protected].

return "+"

class Sub(BinaryExpression):
def eval(self):
return self.left.eval() - self.right.eval()
def __str__(self):
return "-"

class Mul(BinaryExpression):
def eval(self):
return self.left.eval() * self.right.eval()
def __str__(self):
return "*"

class Div(BinaryExpression):
def eval(self):
return self.left.eval() // self.right.eval()
def __str__(self):
return "/"

def print_prefix(exp):
    """
    Print an arithmetic expression in prefix notation.

    Examples:
    >>> n1 = Num(28)
    >>> n2 = Num(4)
    >>> e0 = Div(n1, n2)
    >>> n3 = Num(5)
    >>> e1 = Mul(e0, n3)
    >>> f"{print_prefix(e1)} = {e1.eval()}"
    '* / 28 4 5 = 35'

    >>> n1 = Num(28)
    >>> n2 = Num(4)
    >>> e0 = Div(n1, n2)
    >>> n3 = Num(50)
    >>> e1 = Div(n3, e0)
    >>> e2 = Mul(e1, e0)
    >>> f"{print_prefix(e2)} = {e2.eval()}"
    '* / 50 / 28 4 / 28 4 = 49'
    """
    s = str(exp)
    if isinstance(exp, BinaryExpression):
        s = f"{s} {print_prefix(exp.left)} {print_prefix(exp.right)}"
    return s

def print_infix(exp):
    s = str(exp)
    if isinstance(exp, BinaryExpression):
        s = f"({print_infix(exp.left)}) {s} ({print_infix(exp.right)})"
    return s

def print_postfix(exp):
    s = str(exp)
    if isinstance(exp, BinaryExpression):
        s = f"{print_postfix(exp.left)} {print_postfix(exp.right)} {s}"
    return s
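The three traversal functions differ only in where the operator string is placed relative to its operands. A self-contained demo (the class definitions are repeated so the snippet runs on its own):

```python
class Expression:
    def eval(self):
        raise NotImplementedError

class BinaryExpression(Expression):
    def __init__(self, left, right):
        self.left, self.right = left, right

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def eval(self):
        return self.num
    def __str__(self):
        return str(self.num)

class Div(BinaryExpression):
    def eval(self):
        return self.left.eval() // self.right.eval()
    def __str__(self):
        return "/"

class Mul(BinaryExpression):
    def eval(self):
        return self.left.eval() * self.right.eval()
    def __str__(self):
        return "*"

def print_prefix(exp):
    s = str(exp)
    if isinstance(exp, BinaryExpression):
        s = f"{s} {print_prefix(exp.left)} {print_prefix(exp.right)}"
    return s

def print_infix(exp):
    s = str(exp)
    if isinstance(exp, BinaryExpression):
        s = f"({print_infix(exp.left)}) {s} ({print_infix(exp.right)})"
    return s

def print_postfix(exp):
    s = str(exp)
    if isinstance(exp, BinaryExpression):
        s = f"{print_postfix(exp.left)} {print_postfix(exp.right)} {s}"
    return s

# (28 / 4) * 5 in the three notations:
e = Mul(Div(Num(28), Num(4)), Num(5))
prefix = print_prefix(e)    # '* / 28 4 5'
infix = print_infix(e)      # '((28) / (4)) * (5)'
postfix = print_postfix(e)  # '28 4 / 5 *'
```

Notice that the infix form is fully parenthesized, while prefix and postfix need no parentheses at all: the operator position alone disambiguates the structure.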

Adding the __str__ method to each expression class allows for flexible and consistent string
representations of expressions, making it easy to print them in different notations. Here’s how
this change addresses the problem of printing in various formats:

1.​ Unified String Representation:


○​ The __str__ method provides a way to define how each type of expression
should be represented as a string. For numeric values, it returns the number as a
string, while for operators, it simply returns the operator symbol. This unified
approach allows for easy concatenation of strings when building different
notations.
2.​ Polymorphism:
○​ By overriding the __str__ method in each subclass, the code takes advantage
of polymorphism. When str(exp) is called, it dynamically resolves to the
appropriate method based on the actual type of the expression object. This
means that whether the expression is a number or an operator, the correct string
representation is used automatically.
3.​ Recursive Construction:
○​ Each printing function (print_prefix, print_infix, and print_postfix)
calls str(exp) to get the string representation of the current expression. It then
recursively constructs the string for left and right sub-expressions, building the full
expression in the desired notation.
○​ This recursive nature allows each printing function to easily adapt the expression
string according to its specific notation rules without duplicating logic for string
representation.
4.​ Separation of Concerns:
○​ The logic for how expressions are evaluated (in the eval methods) is separated
from how they are printed (in the __str__ methods and the printing functions).
This enhances code clarity and maintainability, as changes to one aspect
(evaluation or printing) can be made independently.
5.​ Flexible Notation Handling:
○​ With this structure, adding a new notation is straightforward. You simply create a
new printing function and utilize the existing __str__ methods for the
expressions. Each notation can have its own rules for how to arrange the strings
(e.g., prefix, infix, postfix) without needing to change how expressions are
represented.

66
This material is freely available and must not be sold or used for commercial purposes.
For typos or suggestions, please contact Fernando Pereira at [email protected].

In summary, adding the __str__ method streamlines the process of generating string
representations for various expression types, allowing different printing functions to construct
the desired notation flexibly and efficiently. This object-oriented design enhances modularity,
making the code easier to read, extend, and maintain.

Which design pattern does this object-oriented version of Expressions follow?
This implementation follows the "Composite" design pattern. The Composite pattern is used to
treat individual objects and compositions of objects uniformly. It allows you to compose objects
into tree structures and work with these structures as if they were individual objects.

In your code:

●​ Component: The Expression class acts as the abstract base class (or component) in
the pattern, defining a common interface (eval) for all expressions.
●​ Leaf: The Num class represents the leaf objects in the tree, which have no children and
implement the eval method to return a value directly.
●​ Composite: The BinaryExpression class and its subclasses (Add, Sub, Mul, and Div) act as composite objects. They contain references to child components (left and right) and implement the eval method by delegating to their children.

The key aspect of the Composite pattern is that it allows you to treat individual objects (like Num) and compositions of objects (like Add or Mul) uniformly through the common eval method. This is exactly what your implementation achieves.
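This uniformity is easy to demonstrate: client code never needs to ask whether it holds a leaf or a composite. A minimal self-contained sketch:

```python
class Expression:
    """Component: the common interface."""
    def eval(self):
        raise NotImplementedError

class Num(Expression):
    """Leaf: holds a value, has no children."""
    def __init__(self, num):
        self.num = num
    def eval(self):
        return self.num

class Add(Expression):
    """Composite: delegates evaluation to its children."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() + self.right.eval()

# A bare leaf and a whole tree are used through the same interface:
results = [e.eval() for e in (Num(7), Add(Num(3), Add(Num(2), Num(2))))]
# results == [7, 7]
```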

We have seen two ways to implement Expressions. The first is the procedural approach. The second is the object-oriented approach. How are they different?

1. Procedural Approach

In the procedural approach, the operations (such as evaluating, printing in infix/postfix notation)
are implemented as functions that operate on data structures representing the expressions.
Here's a summary:

●​ Characteristics:
○​ Data and functions are separate.
○​ Expressions are represented by nested data structures (e.g., tuples, lists).
○​ Functions are responsible for interpreting and manipulating these data structures.
○​ New operations may require modifying existing functions to handle new data
structures.


2. Object-Oriented Approach (Composite Design Pattern)

In the OO approach, the expressions themselves are represented as objects with methods that
define their behavior. Each expression type knows how to evaluate itself, print itself, etc.

●​ Characteristics:
○​ Data and behavior are encapsulated within objects.
○​ Each expression type is represented by a class, and operations are methods
within these classes.
○​ New operations or expression types can be added by defining new classes or
methods.

What are the relative advantages of each approach?

1. Procedural Approach

●​ Advantages:
○​ Simplicity: For small, well-defined problems, the procedural approach can be
simpler and more straightforward.
○​ Performance: It can be more efficient in terms of memory and speed because
there’s less overhead compared to objects.
○​ Easier to Understand for Simple Problems: The logic is often more
transparent, especially for those familiar with procedural programming.
●​ Disadvantages:
○​ Scalability Issues: As the number of operations or types of expressions grows,
the code can become harder to maintain. Adding a new type of expression may
require modifying multiple functions.
○​ Tight Coupling: The functions are tightly coupled to the specific data structures,
making it harder to extend or modify.
○​ Less Reusability: Code reuse is more challenging since the logic is tied to
specific data structures.

2. Object-Oriented Approach (Composite Design Pattern)

●​ Advantages:
○​ Extensibility: Adding new expression types is easier. For example, adding a new
type of expression, like modulo, only requires creating a new class, without
modifying existing ones.
○​ Encapsulation: Each expression type encapsulates its behavior, leading to
better organization and separation of concerns.
○​ Reusability: Code can be reused more effectively, especially with inheritance
and polymorphism.
○​ Scalability: The system can grow more easily, and it’s simpler to maintain and
extend.


●​ Disadvantages:
○​ Complexity: The OO approach introduces more complexity, especially for small,
simple problems.
○​ Overhead: There’s typically more memory and CPU overhead due to the use of
objects and method calls.
○​ Learning Curve: It requires a good understanding of OO principles, which might
be more challenging for those new to OO programming.
○​ Extensibility: Adding new operations to existing types is difficult. For example, to
add an operation to compute the "size" of expressions it would be necessary to
change all the subclasses of Expression, adding the new method to them.
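For example, a hypothetical size operation (counting the nodes of an expression) would have to be added to every class. A minimal sketch of that cost:

```python
class Expression:
    pass

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def size(self):  # the new operation must be added here ...
        return 1

class Add(Expression):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def size(self):  # ... and here, and again in Sub, Mul, Div, etc.
        return 1 + self.left.size() + self.right.size()

n = Add(Num(1), Add(Num(2), Num(3))).size()  # 5 nodes in the tree
```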

When to use one approach over the other?


Procedural Approach:

●​ When the problem is small and well-defined.
●​ When performance is critical, and the overhead of objects is not justified.
●​ When the number of types of expressions is limited and unlikely to grow.

Object-Oriented Approach:

●​ When you expect the system to grow or require modifications over time.
●​ When you want to add new types of expressions without changing existing code.
●​ When code maintainability, extensibility, and reusability are priorities.

Depending on the programming language, you might not have much choice. Can you give examples?
Languages like C and Pascal are classic examples of procedural programming languages: they encourage a structure where data and functions are separate. Functional programming languages, like Haskell or Elixir, although not procedural, lead to a similar organization: data types and the functions that operate on them are kept separate. In contrast, languages like Java, C#, and Smalltalk are designed with object-oriented programming (OOP) as their core paradigm. Languages like Python, C++, and JavaScript support multiple paradigms, giving you the flexibility to choose the best approach for your problem.

How would the implementation of expressions look in C, for instance?


If you were to implement expressions in C, your program would resemble the earlier tree-based representation rather than the object-oriented approach used in the later examples. Here is how the C implementation would look, and the reasons for that:

1.​ Structs Instead of Classes:


○​ In C, you would use struct to define a tree node instead of a class. Each struct
could contain a value (operator or operand) and pointers to left and right child
nodes, similar to the Node class.

typedef struct Node {
    char* value;            // for operators
    int number;             // for numbers
    struct Node* left;
    struct Node* right;
} Node;

2.​ Function-Based Approach:


●​ C is procedural, so you would define functions to evaluate and print the
expression rather than using methods within classes. This would involve passing
pointers to nodes as arguments to these functions.

int evaluate(Node* node) {
    if (node == NULL) return 0;
    if (node->left == NULL && node->right == NULL) return node->number;

    int left_val = evaluate(node->left);
    int right_val = evaluate(node->right);

    if (strcmp(node->value, "+") == 0) return left_val + right_val;
    else if (strcmp(node->value, "-") == 0) return left_val - right_val;
    // Add other operations...
}

3.​ Manual Memory Management:


○​ C requires you to manage memory manually using malloc and free. In
contrast, the object-oriented approach in languages like Python manages
memory automatically.
4.​ No Inheritance or Polymorphism:
○​ Since C does not support inheritance and polymorphism natively, you would need
to rely on a more manual approach for defining behaviors (like using function
pointers for different operations if desired).
5.​ Print Functions:
○​ You would create separate functions to handle printing in different notations,
similar to how the earlier Python functions work but without the benefit of
polymorphism.

Overall, your C implementation would focus on structured programming principles, using structs
and function calls, reflecting a more traditional tree-based approach rather than the more
abstract and modular object-oriented design seen in languages like Python or C++.

What would this interpreter look like if implemented in SML/NJ?


In Standard ML (SML), you can use algebraic data types to represent arithmetic expressions
effectively. Below is an implementation that captures the essence of the expression
representation and evaluation in SML/NJ:

(* Define the algebraic data type for expressions *)
datatype expr =
    Num of int            (* Represents a number *)
  | Add of expr * expr    (* Represents addition *)
  | Sub of expr * expr    (* Represents subtraction *)
  | Mul of expr * expr    (* Represents multiplication *)
  | Div of expr * expr    (* Represents division *)

(* Function to evaluate an arithmetic expression *)
fun eval (Num n) = n
  | eval (Add (e1, e2)) = eval e1 + eval e2
  | eval (Sub (e1, e2)) = eval e1 - eval e2
  | eval (Mul (e1, e2)) = eval e1 * eval e2
  | eval (Div (e1, e2)) = eval e1 div eval e2

(* Function to print an expression in infix notation *)
fun print_infix (Num n) = Int.toString n
  | print_infix (Add (e1, e2)) = "(" ^ print_infix e1 ^ " + " ^ print_infix e2 ^ ")"
  | print_infix (Sub (e1, e2)) = "(" ^ print_infix e1 ^ " - " ^ print_infix e2 ^ ")"
  | print_infix (Mul (e1, e2)) = "(" ^ print_infix e1 ^ " * " ^ print_infix e2 ^ ")"
  | print_infix (Div (e1, e2)) = "(" ^ print_infix e1 ^ " div " ^ print_infix e2 ^ ")"

(* Example usage *)
val expr1 = Add (Num 23, Num 19)  (* Represents 23 + 19 *)
val expr2 = Sub (expr1, Num 20)   (* Represents (23 + 19) - 20 *)

(* Evaluate expressions *)
val result1 = eval expr1          (* Should be 42 *)
val result2 = eval expr2          (* Should be 22 *)

(* Print expressions *)
val infix1 = print_infix expr1    (* Should print "(23 + 19)" *)
val infix2 = print_infix expr2    (* Should print "((23 + 19) - 20)" *)

Explanation:

1.​ Algebraic Data Types:


○​ The datatype expr defines an algebraic data type with constructors for
numeric values (Num) and binary operations (Add, Sub, Mul, Div). This captures
the recursive nature of expressions.
2.​ Evaluation Function:
○​ The eval function recursively evaluates expressions based on their structure,
using pattern matching to determine the type of expression and compute the
result.
3.​ Printing Function:


○​ The print_infix function constructs a string representation of the expression in infix notation, again using pattern matching to format each part of the expression appropriately.
4.​ Example Usage:
○​ The example usage demonstrates creating expressions, evaluating them, and
printing them in infix notation.

This approach is idiomatic in SML, leveraging its powerful type system and pattern matching
capabilities to create a clear and concise representation of arithmetic expressions.

The C and the SML/NJ implementations actually look similar in essence: the focus is on the operations, not on the types. Do you agree?
I actually do! Both the C and SML/NJ implementations emphasize a procedural or functional
approach, where the focus is on operations (like evaluation and printing) rather than on
encapsulating behavior and state within objects, which is characteristic of object-oriented
programming.

Key Points of Comparison:

1.​ Operation-Centric:
○​ In both implementations, the primary structure is defined by the operations (like
eval, print_infix, etc.) rather than by encapsulated types. This is typical of
functional programming, where functions operate on data types directly.
2.​ Pattern Matching vs. Control Structures:
○​ SML utilizes pattern matching, which succinctly handles different expression
types, while C uses conditional statements to achieve similar results. Both focus
on the behavior associated with each type rather than defining behavior within a
type itself.
3.​ Separation of Data and Behavior:
○​ In both languages, data structures (like struct in C or datatype in SML) are
separate from the functions that operate on them. This contrasts with
object-oriented programming, where methods are typically defined within classes
that encapsulate data.
4.​ Lack of Inheritance and Polymorphism:
○​ Neither implementation leverages inheritance or polymorphism, which are hallmarks of object-oriented design. Instead, they rely on pattern matching (in SML's case) and control flow constructs (in C's case).

Conclusion:

The designs reflect a more operational paradigm, which is common in procedural and functional
languages, where the focus is on defining how to manipulate data rather than on modeling
real-world entities through objects. Object-oriented programming, by contrast, emphasizes encapsulation, inheritance, and polymorphism, focusing on modeling behaviors as part of data structures. Your observation highlights a fundamental difference in design philosophy between these approaches!

It seems that adding new types would be tough in the C or in the SML
implementations, but adding new operations would be easy?
That's right! The design of both the C and SML implementations makes it relatively easy to add
new operations but more challenging to introduce new types of expressions. Here’s why:

Adding New Operations

1.​ Separation of Operations:


○​ In both implementations, operations like eval and print_infix are defined
separately from the expression types. To add a new operation (e.g.,
print_postfix), you simply create a new function that operates on the
existing data types. This modularity makes it straightforward to expand
functionality.
2.​ Pattern Matching and Control Structures:
○​ In SML, adding a new operation typically involves adding a new pattern to the
function definitions. In C, you would simply add new if or switch cases to
handle the new operation. This flexibility allows you to enhance the behavior of
existing types without altering their structure.
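This is easy to see in a procedural style. With expressions encoded as nested tuples (one possible representation, used here only for illustration), a new operation is just one new function; nothing existing changes:

```python
# Expressions as nested tuples: an int leaf, or (operator, left, right).
def eval_exp(e):
    if isinstance(e, int):
        return e
    op, left, right = e
    if op == '+':
        return eval_exp(left) + eval_exp(right)
    if op == '*':
        return eval_exp(left) * eval_exp(right)
    raise ValueError(f"unknown operator: {op}")

# Adding a new operation: a brand-new function, no edits to eval_exp.
def size(e):
    if isinstance(e, int):
        return 1
    _, left, right = e
    return 1 + size(left) + size(right)

e = ('*', ('+', 3, 5), 2)            # (3 + 5) * 2
value, nodes = eval_exp(e), size(e)  # 16 and 5
```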

Adding New Types

1.​ Modification of Data Structures:


○​ Introducing a new type (e.g., Mod for modulus) requires modifying the existing
data structure definitions. In SML, you’d need to add a new constructor to the
datatype definition. In C, you’d have to update the struct and add new cases
in evaluation and printing functions. This can be more complex and error-prone,
especially if you have many operations depending on the expression types.
2.​ Updating All Related Functions:
○​ When adding a new type, you also need to revisit and modify all functions that
interact with the expression types (like eval, print_infix, etc.) to ensure they
handle the new type correctly. This can lead to a lot of repetitive changes and
increase the risk of inconsistencies.

Conclusion

The structure of these implementations favors the addition of operations due to their focus on
functionality and separation from data types. However, introducing new types requires more
extensive changes, making it less convenient. This reflects a common trade-off in software design, where extensibility in one area may lead to challenges in another. Your insight captures this fundamental aspect of managing complexity in programming languages!

So, an advantage of the OO approach is that it's very easy to add new
types. Is that so?
Yes, that's correct! One of the significant advantages of the object-oriented (OO) approach is the
ease with which new types (or classes) can be added, along with associated behavior. Here are
some key reasons why this is beneficial:

Advantages of Adding New Types in OO

1.​ Encapsulation:
○​ In OO design, data and behavior are bundled together within classes. When you
introduce a new type, you can define its properties and methods in one place,
making it easier to manage and understand.
2.​ Inheritance:
○​ New types can inherit from existing classes, allowing you to reuse code and
behavior. For example, if you have a base class for expressions, you can create
a new subclass for a specific type of expression, inheriting common functionality
while also extending or overriding specific methods.
3.​ Polymorphism:
○​ OO allows you to define common interfaces for different types. You can introduce
new types that implement the same interface, making it possible to treat different
types uniformly. This reduces the need to modify existing code when adding new
types.
4.​ Extensibility:
○​ Because OO design encourages the use of interfaces and abstract classes, it’s
often easier to add new types without altering existing code. You can focus on
implementing new functionality rather than modifying existing logic.
5.​ Cleaner Code:
○​ By organizing code into classes, it becomes clearer how different types interact
and what operations they support. This structure can lead to more maintainable
and understandable code as new types are introduced.
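Concretely, a new expression type such as modulo slots in without touching any existing class (the minimal base classes are repeated here so the sketch is runnable):

```python
class Expression:
    def eval(self):
        raise NotImplementedError

class BinaryExpression(Expression):
    def __init__(self, left, right):
        self.left, self.right = left, right

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def eval(self):
        return self.num

# The new type: one new class, zero edits to existing code.
class Mod(BinaryExpression):
    def eval(self):
        return self.left.eval() % self.right.eval()

r = Mod(Num(17), Num(5)).eval()  # 17 % 5 == 2
```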

Conclusion

Overall, the OO approach provides a flexible framework for adding new types while minimizing
the impact on existing code. This leads to a more modular design, making it easier to manage
complexity as software systems grow and evolve. However, adding new operations is difficult,
because that would require us changing all the subclasses of tree nodes. This tension between
extensibility of types and extensibility of operations is at the core of a software-design challenge
known as the Expression Problem. We shall talk more about it when we discuss Visitors.


Are these tree-like structures what we call an Abstract Syntax Tree (AST)?
Yes! An abstract syntax tree (AST) is a tree-like data structure used in computer science,
particularly in the context of compilers and interpreters, to represent the hierarchical structure of
code. It abstracts away surface-level syntax details and focuses on the logical structure and
relationships of the program's components.

Key Features of an AST

1.​ Abstract vs. Concrete:


○​ An AST is "abstract" because it omits certain syntax details found in the source
code, such as parentheses, commas, or formatting.
○​ In contrast, a parse tree (or concrete syntax tree) includes every token and
syntactic element in the source code.
2.​ Nodes and Structure:
○​ Each node in an AST represents a programming construct, such as an
expression, statement, or variable.
○​ Nodes are arranged hierarchically, with parent nodes representing higher-level
constructs and child nodes representing their components.
3.​ Simplification:
○​ For example, the mathematical expression 3 + (5 * 2) would be represented in an AST with + at the root, whose children are the leaf 3 and a * node with the leaves 5 and 2 as its children.
○​ This structure reflects the order of operations (multiplication before addition) without explicitly including parentheses.
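Using expression classes like the ones from earlier in this chapter (redefined so the snippet stands alone), 3 + (5 * 2) becomes a two-level tree:

```python
class Expression:
    def eval(self):
        raise NotImplementedError

class BinaryExpression(Expression):
    def __init__(self, left, right):
        self.left, self.right = left, right

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def eval(self):
        return self.num

class Add(BinaryExpression):
    def eval(self):
        return self.left.eval() + self.right.eval()

class Mul(BinaryExpression):
    def eval(self):
        return self.left.eval() * self.right.eval()

# '+' is the root; the '*' node sits below it, so it is evaluated first.
ast = Add(Num(3), Mul(Num(5), Num(2)))
v = ast.eval()  # 13
```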

Why Are ASTs Important?

1.​ Representation of Code Semantics:


○​ The AST captures the essential meaning of code, making it easier for tools to
analyze, transform, or optimize it.
2.​ Compiler Design:
○​ ASTs are a core part of compilers, bridging the gap between parsing and code
generation.


○​ They allow for optimizations (e.g., constant folding) and provide a structure for
emitting machine code.
3.​ Code Analysis and Transformation:
○​ ASTs are used in static analysis tools, refactoring tools, and formatters to inspect
or modify code.
4.​ Interpretation:
○​ Interpreters use ASTs to evaluate code directly, traversing the tree to compute
results.

How ASTs Differ from Parse Trees

●​ Parse Tree: Captures all syntax details, including parentheses, punctuation, and
keywords.
●​ AST: Focuses only on the semantic structure, omitting unnecessary syntax details.

For instance, a parse tree for 3 + 5 would include the tokens for the number 3, the + operator,
and the number 5, along with syntactic markers for precedence rules. The AST would simply
represent the operation + with 3 and 5 as its operands.

Conclusion

An AST is a powerful abstraction that allows compilers, interpreters, and other tools to work with
the logical structure of code. It simplifies complex syntax into a manageable format, enabling
various stages of processing, such as semantic analysis, optimization, and code generation.

Why is it recommended to process programs as an Abstract Syntax Tree, instead of treating them as a raw string?
In principle, it should be possible to process the program as a big, unstructured string, but that
would be difficult to evolve and maintain. Almost every language processing system that is
moderately complex will use an abstract syntax tree (or some other data structure that is
separate from the parsing phase).


In this sense, building an Abstract Syntax Tree (AST) instead of processing the program as a
string is essential for several reasons:

1. Logical Representation:

An AST represents the logical structure of a program, abstracting away its concrete syntax. This
means that details like parentheses, indentation, or specific syntactic sugar are encoded into the
tree structure rather than needing to be interpreted repeatedly during processing. For example,
the expression `(a + b) * c` would be represented as a tree where `*` is the root, with `a + b` as
one child and `c` as the other. This makes the program easier to analyze and manipulate.

2. Separation of Concerns:

Parsing converts raw source code into a structured representation (the AST), isolating syntax
recognition from the later stages of the compilation pipeline (e.g., type checking, optimization,
code generation). This separation simplifies both the parser and the subsequent processing
stages, improving maintainability and scalability.

3. Efficiency:

Processing a program as a string would require repeated parsing for tasks like type checking,
optimization, and code generation. With an AST, the program structure is parsed once, and
subsequent operations can work directly with the tree, significantly improving efficiency.

4. Error Detection and Input Validation:

Following principles from the Language-theoretic Security (LangSec) framework, input should
be fully recognized and validated before processing begins. The AST serves as proof that the input conforms to the grammar of the language. To illustrate, consider an anti-pattern like shotgun parsing (the first antipattern in "The Seven Turrets of Babel"), where parsing and
processing are mixed. This approach often leads to incomplete error checking, as errors
discovered late in processing could mean invalid input has already been partially processed,
making it difficult to recover or predict program behavior. An AST avoids this problem by fully
validating the structure of the input before any processing occurs.

5. Evolution and Extensibility:

An AST provides a structured and consistent foundation for extending a language or compiler.
New features, optimizations, or transformations can be added by working with the tree structure
rather than reworking complex string-processing logic.

Conclusion
In summary, an AST transforms the program into a clean, structured representation, facilitating
reliable, maintainable, and efficient language processing. While it’s theoretically possible to
process a program as a string, doing so would lead to inefficiency, complexity, and fragility in the
system.

Who invented this notion of an abstract syntax tree?


This question is not easy to answer. ASTs are so much a part of compiler design that their usage
is more or less folk knowledge. The abstract syntax tree (AST), as a concept, emerged from
early work in programming language design and compiler implementation. Its history is tied to
the evolution of parsers, intermediate representations, and formal methods in programming
languages. Below is an overview of the history and early implementations that featured ASTs.

1. Origins of ASTs

●​ Early Concepts (1950s–1960s):


○​ The need for abstract representations of code arose with the first high-level
programming languages, such as FORTRAN (1957) and ALGOL 60 (1960).
○​ While these languages did not explicitly use ASTs in their early implementations,
their compilers required internal data structures to represent the syntax and
semantics of code.
●​ The ALGOL Influence:
○​ ALGOL 60, one of the most influential languages of the 1960s, introduced block
structures and a clear, recursive grammar. This led to the development of
techniques like recursive descent parsing, which naturally produced hierarchical
representations akin to ASTs.
○​ Peter Naur’s work on defining ALGOL 60 inspired the idea of using trees to
represent program structures.

2. Early Use of ASTs


●​ Boehm’s Compiler for ALGOL 58 (1959):


○​ Hans Peter Boehm’s early ALGOL compiler is often credited as one of the first to
use a tree-like intermediate representation, which could be considered a
precursor to ASTs.
●​ PL/I and Vienna Definition Language (1960s):
○​ The formal specification of PL/I using the Vienna Definition Language (VDL)
introduced concepts of hierarchical structures to represent programming
constructs. While not explicitly labeled as ASTs, these structures resembled them
in purpose.
●​ LISP (1958):
○​ LISP introduced a tree-based representation of code in its syntax. In LISP,
programs are inherently structured as trees of expressions, and this structure is
preserved during evaluation. While this is not an AST in the compiler sense, it
shows an early practical use of tree-based representations.
●​ Simula (1967):
○​ As the first object-oriented language, Simula had sophisticated compiler needs.
Its implementation likely used hierarchical internal representations similar to
ASTs.

3. ASTs in Prominent Compiler Implementations

●​ FORTRAN Compilers (1960s):


○​ While early FORTRAN compilers relied more on linear representations, later
implementations adopted hierarchical structures to represent subroutine calls and
control flow.
●​ ALGOL 68 (1968):
○​ The ALGOL 68 compiler formalized the use of intermediate representations for
syntactic constructs, bringing it closer to modern ASTs.
●​ Pascal and the P-Code Machine (1970s):
○​ Niklaus Wirth’s Pascal compilers, especially those generating P-code, relied on
an intermediate representation that could be seen as an AST. This facilitated
portability and simplicity in code generation.
●​ C Compiler (1970s):
○​ Early C compilers, such as the one for Dennis Ritchie’s PDP-11, used tree-like
representations for expressions and statements during code generation and
optimization phases.

4. Explicit Recognition of ASTs

●​ The 1970s and Formal Compiler Texts:


○​ Compiler textbooks like "The Art of Compiler Design" (1971) and "Compilers:
Principles, Techniques, and Tools" (1986, "The Dragon Book") formalized
ASTs as a distinct phase in compiler design.


○​ By the 1970s, ASTs were explicitly acknowledged as a standard intermediate representation in academic and practical compiler projects.

Key Milestones in AST Development

1.​ Recursive Descent Parsing (1961):


○​ Introduced by Peter Lucas and Robert Irons, this parsing technique naturally led
to tree-based representations.
2.​ BCPL and Intermediate Representations (1960s):
○​ Martin Richards’ BCPL compiler (1966) used intermediate code that could be tied
to AST-like structures.
3.​ Interlisp (1970s):
○​ Introduced sophisticated editing and analysis tools based on tree-like program
representations, closely related to ASTs.
4.​ Modern Compilers (1980s Onward):
○​ ASTs became a standard part of compiler pipelines for languages like Ada, C++,
and Java.

Conclusion

The abstract syntax tree evolved gradually, with early influences from ALGOL, LISP, and PL/I.
While ASTs were not explicitly named or formalized in the 1950s and 1960s, tree-like
representations of code were in practical use. By the 1970s, ASTs became a recognized part of
compiler theory, and their widespread adoption in modern programming languages cemented
their importance in computer science.

So, BCPL is one of the first programming languages to feature a tree-like intermediate representation? BCPL has inspired C, right?
Yes, BCPL (Basic Combined Programming Language), designed by Martin Richards in 1966,
was a very influential language. Although this tree-like representation was already well-known at
the time it was created, BCPL is considered one of the earliest programming languages to use
concepts that align with abstract syntax trees (ASTs). While BCPL itself might not have
explicitly used the term "AST," its compiler design emphasized intermediate representations that
were tree-like, laying the groundwork for modern compiler architectures.

Why Was BCPL Important?

1.​ Precursor to C and Modern Programming:


○​ BCPL directly influenced the development of B, designed by Ken Thompson,
which in turn inspired C, one of the most widely used programming languages
today.
○​ BCPL’s syntactic and structural simplicity carried forward into B and C, leaving a
legacy that persists in the syntax and semantics of modern programming.
2.​ Portable Compiler Design:


○​ BCPL was one of the first languages to introduce the idea of portable
compilers. Its compilers used an intermediate representation known as O-code
(object code), which could be executed on different platforms with the help of an
interpreter.
○​ This approach inspired future virtual machines, including the JVM (Java Virtual
Machine) and LLVM.
3.​ Minimalism:
○​ BCPL was designed to be simple and efficient, targeting early computers with
limited resources.
○​ It demonstrated how a minimalistic language could support powerful
programming techniques, making it popular in academic and system
programming circles.
4.​ Influence on Compiler Theory:
○​ BCPL’s compiler was a practical demonstration of many theoretical compiler
concepts, including the use of an intermediate representation to separate
front-end parsing from back-end code generation.
○​ This separation is a core feature of modern compilers, often relying on ASTs to
represent the structure of the source code.
5.​ System Programming:
○​ BCPL was one of the first high-level languages to be used for system
programming. Many early operating systems and tools were written in BCPL,
such as portions of the Cambridge CAP computer and Tripos.

Did BCPL Explicitly Use ASTs?

BCPL’s compiler architecture relied heavily on a tree-like intermediate representation of programs, which resembles an AST:

●​ BCPL parsed source code into an intermediate representation, stripping away unnecessary syntax to focus on the logical structure of the program.
●​ This intermediate representation facilitated code generation and optimization, much like
modern ASTs.

While it might not have been explicitly called an "AST" in the 1960s, the concept was inherently
present.

Legacy of BCPL

1.​ Influence on Language Design:


○​ Many concepts from BCPL, such as control structures and syntax, influenced the
design of languages like B, C, and their descendants (C++, Java, etc.).
2.​ Compiler Techniques:
○​ The O-code intermediate representation in BCPL introduced ideas that are now
foundational in compiler design, including virtual machines and cross-platform
compatibility.


3.​ Portability:
○​ BCPL’s portability was revolutionary for its time, setting a standard for languages
to be easily adapted across diverse hardware platforms.

Conclusion

BCPL was indeed very important in programming language history. Its focus on simplicity,
portability, and compiler innovations had a lasting impact on computer science. While not the
first language to use tree-like structures, BCPL was among the early pioneers that influenced
how compilers represent and process programs internally, paving the way for widespread use of
ASTs.

Who was Martin Richards, the creator of BCPL?

Martin Richards: A Short Bio

Martin Richards is a renowned British computer scientist best known for designing the
programming language BCPL (Basic Combined Programming Language) in 1966. He was
born in 1940 and studied mathematics at the University of Cambridge, earning his Ph.D. under
the supervision of David Wheeler, one of the pioneers of computer science.

Richards has had a long and impactful career, primarily at the University of Cambridge, where
he worked on programming languages, operating systems, and compiler design. He has made
significant contributions to the development of computer science, particularly in the areas of
programming language design and compiler technology.

Contributions to Computer Science

1.​ Design of BCPL (1966):


○​ BCPL was a pioneering programming language that introduced several influential concepts:
■​ It was one of the first portable programming languages, designed for
writing system software like operating systems and compilers.
■​ Its minimalist syntax and efficiency inspired the development of B (Ken
Thompson) and C (Dennis Ritchie), making it a direct ancestor of these
widely used languages.
■​ BCPL introduced the idea of using a single data type, the word, which
simplified the language's implementation and was well-suited to early
hardware.
2.​ O-code: A Portable Intermediate Representation:
○​ Richards developed the concept of O-code, an intermediate representation for
BCPL programs. O-code could be interpreted on different hardware platforms,
making BCPL one of the first languages to achieve portability.
○​ This concept of a portable intermediate representation influenced later systems
like the Java Virtual Machine (JVM) and LLVM.
3.​ Tripos Operating System:
○​ Richards contributed to the development of Tripos, a portable operating system
written in BCPL. Tripos served as the basis for early versions of the AmigaOS,
which was used in the Commodore Amiga computers.
4.​ Contributions to Compiler Design:
○​ His work on BCPL demonstrated the practical use of separating compiler front
ends (parsing and analysis) from back ends (code generation), a fundamental
concept in modern compiler design.
○​ His compilers served as teaching tools, providing insights into language
implementation.
5.​ Teaching and Mentorship:
○​ As a professor at the University of Cambridge, Richards influenced generations
of computer scientists, sharing his knowledge of programming languages and
systems design.

Legacy

●​ Martin Richards is widely respected for his contributions, though he is less well-known to
the general public compared to figures like Dennis Ritchie or John Backus.
●​ His work on BCPL and its influence on C secured his place in the history of programming
languages and systems software.
●​ Richards’ emphasis on simplicity and portability in programming languages remains a
cornerstone of software development.

Publications

●​ Richards authored several academic papers and technical reports related to BCPL, programming language implementation, and operating systems. His 1981 book "BCPL: The Language and Its Compiler" (co-authored with Colin Whitby-Strevens) is considered a classic in the field.

In summary, Martin Richards is a significant, though understated, figure in computer science, whose contributions have had a lasting impact on programming languages and software engineering.


Recursive Descent Parsing


Code: https://github.com/pronesto/DCC053_Public/tree/main/3_ParsingArithExp

Last class, we wrote a data structure to represent expressions. How can we convert a string, in the prefix format, to this structure, assuming that we have the Lexer?
The function compute_prefix below parses and converts an arithmetic expression written in
prefix notation (where the operator comes before the operands) into a tree-like data structure,
with nodes representing different arithmetic operations (like addition, multiplication, etc.).

def compute_prefix(lexer):
    """
    Examples:
    >>> lexer = Lexer("+ 3 * 4 2")
    >>> e = compute_prefix(lexer)
    >>> e.eval()
    11

    >>> lexer = Lexer("+ * 3 4 2")
    >>> e = compute_prefix(lexer)
    >>> e.eval()
    14
    """
    token = lexer.next_valid_token()

    if token.kind == TokenType.NUM:
        # Base case: return the value if it's a number
        return Num(int(token.text))

    elif token.kind in Token.operators:
        # Recursive case: evaluate the operands
        a = compute_prefix(lexer)
        b = compute_prefix(lexer)

        if token.kind == TokenType.ADD:
            return Add(a, b)
        elif token.kind == TokenType.SUB:
            return Sub(a, b)
        elif token.kind == TokenType.MUL:
            return Mul(a, b)
        elif token.kind == TokenType.DIV:
            return Div(a, b)

    else:
        raise ValueError(f"Unexpected token type: {token.kind}")

This code builds an expression tree for prefix expressions by recursively parsing the input and constructing a tree of nodes for the operations. Each node represents a number or an arithmetic operation, and the recursive nature of the function allows it to handle nested operations correctly.

Structure Overview:

●​ Input: A Lexer that tokenizes an arithmetic expression.


●​ Output: A tree-like structure of nodes representing the arithmetic operations, which can
later be evaluated.

Code Explanation:

1.​ Lexer and Tokens:


○​ The Lexer is responsible for providing tokens, which are individual parts of the
input (e.g., numbers and operators). A token has two key components: its type
(kind) and its value (text).
2.​ Recursive Parsing:
○​ The function recursively builds a tree structure where each node can be an
operation (Add, Sub, Mul, Div) or a number (Num).
3.​ Base Case: Numbers:
○​ When lexer.next_valid_token() returns a token of type TokenType.NUM,
it means that we have encountered a number in the prefix expression. In this
case, the function creates a leaf node (a Num object) using the token's value and
returns it. This is the base case of the recursion, where no further operations are
required.
4.​ Recursive Case: Operators:
●​ If the token is an operator (e.g., +, -, *, /), the function recursively computes the
left and right operands by calling compute_prefix twice:
○​ a = compute_prefix(lexer) parses and builds the left operand.
○​ b = compute_prefix(lexer) parses and builds the right operand.
●​ After parsing both operands, the function wraps them in a node that represents
the operator (e.g., Add, Sub, Mul, Div). The structure of the tree grows as the
recursion unfolds, forming a binary tree for the entire expression.
5.​ Building the Operator Node:
○​ The operator token (token.kind) determines which operation to perform.
Based on this, the function returns a new node representing the operation:
●​ Add(a, b) for addition (+)
●​ Sub(a, b) for subtraction (-)
●​ Mul(a, b) for multiplication (*)
●​ Div(a, b) for division (/)
6.​ Error Handling:
○​ If an unexpected token type is encountered (something other than a number or
an operator), the function raises a ValueError indicating an invalid token.


Can you show how it handles "+ * 3 4 2"?


Consider the example "+ * 3 4 2":

1.​ The lexer reads the first token +, which is an operator.


2.​ The function calls itself recursively to compute the left operand of +:
○​ The next token is *, which is another operator.
○​ It calls itself again to compute the left operand of *:
■​ The token 3 is a number, so it returns Num(3).
○​ It calls itself again to compute the right operand of *:
■​ The token 4 is a number, so it returns Num(4).
○​ Now it has both operands for *, so it returns Mul(Num(3), Num(4)).
3.​ The function now computes the right operand of +:
○​ The token 2 is a number, so it returns Num(2).
4.​ Now the function has both operands for +, so it returns Add(Mul(Num(3), Num(4)),
Num(2)).

This builds a tree where Add is the root, and its left and right children represent the
multiplication and number operations, respectively. When you call .eval() on this tree, it
computes the value of the expression (which is 14).
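The trace above can be reproduced with a small self-contained sketch. Note that the tokenizer (a simple split on whitespace) and the tuple-based tree are simplifications for illustration, not the course's Lexer and Exp classes:

```python
def parse_prefix(tokens):
    """Parse a prefix expression (a list of tokens) into a nested tuple tree."""
    token = tokens.pop(0)
    if token.isdigit():
        return int(token)            # base case: a number is a leaf
    left = parse_prefix(tokens)      # first operand
    right = parse_prefix(tokens)     # second operand
    return (token, left, right)      # operator node

def eval_tree(node):
    """Evaluate the tuple-based expression tree."""
    if isinstance(node, int):
        return node
    op, a, b = node
    a, b = eval_tree(a), eval_tree(b)
    return {'+': a + b, '-': a - b, '*': a * b, '/': a // b}[op]

tree = parse_prefix("+ * 3 4 2".split())
print(tree)             # ('+', ('*', 3, 4), 2)
print(eval_tree(tree))  # 14
```

The nested tuples mirror the shape of the Add and Mul nodes built in the trace: the first element plays the role of the node's class, and the other two are its children.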

That was kinda easy! Why?


The fact that the strings are in prefix notation (also known as Polish notation) is the key
reason why this problem is relatively easy to solve. Let me explain why this makes things
simpler, and why it feels more like a straightforward process rather than traditional parsing. In
prefix notation, the operator comes first, followed by its operands. The major advantage of this
format is that there is no need for parentheses or precedence rules to dictate the order of
operations. This means the order of tokens in the expression directly reflects the structure of the
expression, which allows for linear, single-pass parsing. Here’s why it’s easier:

1.​ Natural left-to-right processing:


○​ When reading the expression from left to right, you always know exactly what the
next operator or operand should be.
○​ For example, when you encounter an operator like +, you know that the next two
tokens must be its operands, so you can just recursively process those.
2.​ Recursive calls mimic the tree structure:
○​ The way you process tokens matches the natural construction of an expression
tree:
■​ You see an operator, then you process its first operand, then its second
operand.
■​ This directly builds the tree-like structure without any need for
backtracking or complex parsing logic.


3.​ No need for operator precedence:


○​ In many parsing problems (like infix notation), you have to deal with operator
precedence and associativity, which makes parsing more complicated. You’d
need to figure out which operator applies to which operands based on the rules
of precedence (e.g., multiplication before addition).
○​ In prefix notation, the order of operations is explicit. There is no ambiguity
because you always process the operator first, followed by its operands, in the
exact order they appear.
4.​ Simple recursion:
○​ In infix notation, parsers often require more complex strategies like recursive
descent parsing, precedence climbing, or shunting-yard algorithms. These
methods are needed to handle the complexity of operator precedence,
parentheses, and left/right associativity.
○​ With prefix notation, however, a simple recursive approach works because the
expression naturally unfolds in the correct order as you process the tokens.

Can that code be called a "Parser"?


Yes, technically, what you've written is a parser for prefix expressions, albeit an extremely
simple one due to the nature of the input format. A parser is any function or algorithm that takes
a sequence of tokens and produces a structured output (like an abstract syntax tree or an
expression tree). In this case, your function parses the prefix expression and produces a tree
that represents the arithmetic expression. The parser is so simple due to a number of reasons:
●​ Prefix notation has a natural correspondence with how recursion works:
○​ You read the operator first, which tells you to apply it to two operands. Then you
recursively read the operands in the correct order, and no ambiguity or reordering
is required.
●​ No ambiguity:
○​ In prefix notation, you never have to worry about parentheses or operator
precedence. You just follow the order in which the tokens are presented.
●​ Single-pass processing:
○​ Since the structure is naturally embedded in the order of the tokens, a single
pass through the tokens is enough to build the entire tree.

So, in short, the expression is already structured in a way that avoids the complexity found in
other notations like infix notation. That's why you don't have to write any complicated recursive
calls or worry about precedence rules, making it a straightforward recursive descent parser.

Would you be able to extract a "grammar" from the code above?


Yes, we can consider each recursive call of the compute_prefix function as a non-terminal of the
grammar, and each token as a terminal, e.g.:

E ::= num
| + E E
| - E E
| * E E
| / E E
This grammar specifies that an expression (E) can be:

1.​ A number (num), which is a base case in the compute_prefix function.


2.​ An expression formed by an operator (+, -, *, /), followed by two sub-expressions (E
E), which is the recursive case handled by compute_prefix.

In the compute_prefix function, if the current token is a number (as recognized by token.kind == TokenType.NUM), the function returns a Num object. If the token is an operator (+, -, *, /), the function recursively evaluates two sub-expressions.

So, can I say that compute_prefix is a parser?


Yes, you can absolutely say that compute_prefix is a parser. In fact, it's a recursive
descent parser for arithmetic expressions in prefix (Polish) notation.

●​ Parsing refers to the process of analyzing a sequence of tokens (like numbers and
operators) to determine their syntactic structure based on a formal grammar.
●​ In the case of compute_prefix, it reads tokens from the lexer and constructs an
expression tree by recursively applying production rules from the grammar (as we
discussed earlier).
●​ It's a recursive descent parser because it uses recursion to handle sub-expressions.
Each recursive call processes part of the input and returns a subtree, just like a typical
recursive descent parsing algorithm would.

So, the compute_prefix function fits well within the definition of a parser: it interprets the
structure of an input (the prefix notation expression) and converts it into a meaningful
representation (the expression tree).

Then, a general methodology to write parsers is to come up with a grammar, and then write a recursive descent parser for it, right?
The general methodology for writing parsers—especially recursive descent parsers—follows
these steps:

1. Define the Grammar

●​ The first step is to describe the structure of the language (or the expressions you're
parsing) using a formal grammar, often in Backus-Naur Form (BNF) or Extended BNF.
●​ The grammar should specify how valid sequences of tokens can be formed, typically
defining rules for each type of construct (e.g., expressions, terms, factors, operators).


2. Break Down the Grammar into Functions

●​ Each rule of the grammar corresponds to a function (or method) in your recursive
descent parser.
○​ Base cases (like num) are handled directly by returning a value (or building a
corresponding AST node).
○​ Recursive cases (like operators followed by two sub-expressions) are handled by
calling the parsing function recursively.

3. Use Recursive Descent

●​ The parser function processes the input token by token. When it encounters a token that
matches a grammar rule, it processes it (and possibly calls itself recursively) to handle
sub-expressions.
●​ This approach mirrors the structure of the grammar:
○​ Each function in the parser corresponds to a rule in the grammar.
○​ Each call to a parsing function corresponds to applying a rule.

4. Return an Abstract Syntax Tree (AST)

●​ A key task of parsing is to transform the input into a more structured form, such as an
abstract syntax tree (AST).
●​ Each function in the recursive descent parser typically returns an AST node representing
the result of parsing that part of the input.

So, can we say that our expression tree is an abstract syntax tree?
Yes, you can absolutely say that the expression tree is a type of abstract syntax tree (AST). In
fact, the two terms are often used interchangeably in this context.
Expression Tree: This typically refers to a tree structure where each node represents an
operator or operand in an arithmetic expression. For example:

●​ Leaf nodes represent operands (numbers or variables).


●​ Internal nodes represent operators (e.g., +, -, *, /).

Abstract Syntax Tree (AST): This is a tree representation of the abstract syntactic structure of
some code. The "abstract" part means it doesn't include unnecessary syntactic details like
parentheses, focusing instead on the logical structure of the expressions or statements.
So, in the context of arithmetic expressions, the expression tree is an AST because it captures
the essential structure of the expression without extra syntactic details like parentheses. It's a
compact, meaningful representation that can be used for further processing (evaluation,
printing, optimization, etc.).


Can you come up with a grammar for expressions in infix notation?


First attempt:

E ::= num
| '(' E ')'
| E '+' E
| E '-' E
| E '*' E
| E '/' E

Can you use this grammar to write a recursive descent parser?


Not really! It is left-recursive, which makes it tricky to use directly in a recursive descent parser,
because left-recursion can lead to infinite recursion in such parsers. In recursive descent
parsers, when a rule calls itself as the first thing in a production (left-recursive), it leads to infinite
recursion. In that example above, a parser would first try to parse E, which immediately leads to
another attempt to parse E, and this goes on indefinitely.
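A minimal sketch makes the problem concrete. The class below is hypothetical, written only to show the loop: a method for the left-recursive rule E ::= E '+' E | num calls itself before consuming any token, so Python eventually aborts it with a RecursionError (in a language without such a guard, it would simply never terminate):

```python
class LoopingParser:
    """Naive recursive-descent parser for the left-recursive rule
    E ::= E '+' E | num. Parsing E recurses before consuming input."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def E(self):
        left = self.E()  # left recursion: recurses without consuming a token
        # The '+' and the right operand would be handled here, but
        # control never gets past the line above.
        return left

parser = LoopingParser(["1", "+", "2"])
try:
    parser.E()
except RecursionError:
    print("infinite recursion detected")
```

Because no token is consumed between one call to E and the next, the parser makes no progress, which is exactly why the grammar must be rewritten before recursive descent can be applied.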

Could you rewrite the grammar, to remove the left recursion?


Yes, here's a version of the grammar without left recursion:
E ::= num E'
| '(' E ')' E'
E' ::= '+' E E'
| '-' E E'
| '*' E E'
| '/' E E'
| ε

Is there a general algorithm to remove left recursion from grammars?


Yes, there is an algorithm. Here's how it works:
Identify left-recursive productions: Look for any rules where the left-hand side of the
production (A ::=) appears as the first symbol on the right-hand side (A α).
Rewrite the grammar: Replace the left-recursive rules with a right-recursive structure. For
every left-recursive non-terminal A, split its productions into:

●​ One set of productions where A does not appear on the right-hand side (let's call these
β).
●​ Another set where A appears as the first symbol (let's call these α).

Then, replace the left-recursive rule with a right-recursive version by introducing a new
non-terminal A' (often called the "tail" or "prime" version of A).


Example: For a production of the form:


A ::= A α | β

Convert it into:
A ::= β A'
A' ::= α A' | ε
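The transformation can be mechanized. The sketch below is a simplified illustration (it handles only immediate left recursion, and represents each production as a list of symbols, with the empty list standing for ε): it splits the productions of A into the α and β sets and builds the right-recursive rules:

```python
def remove_left_recursion(a, productions):
    """Remove immediate left recursion from the productions of nonterminal a.
    Each production is a list of symbols; [] denotes the empty string."""
    alphas = [p[1:] for p in productions if p and p[0] == a]  # A ::= A alpha
    betas = [p for p in productions if not p or p[0] != a]    # A ::= beta
    if not alphas:
        return {a: productions}  # no immediate left recursion to remove
    primed = a + "'"
    return {
        a: [beta + [primed] for beta in betas],                # A  ::= beta A'
        primed: [alpha + [primed] for alpha in alphas] + [[]], # A' ::= alpha A' | empty
    }

# E ::= E '+' E | num   becomes   E ::= num E'  and  E' ::= '+' E E' | empty
print(remove_left_recursion("E", [["E", "+", "E"], ["num"]]))
```

Running it on the rule E ::= E '+' E | num produces exactly the right-recursive shape used in the grammar above: E ::= num E' and E' ::= '+' E E' | ε.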

How would the parser be implemented using this grammar?


Let's implement a full Parser class, so that we can keep some internal state. In this case, the
internal state is the current token that must be consumed.

from Exp import *
from Lexer import *

class Parser:
    """
    This class implements a parser for infix arithmetic expressions.

    The parser uses the following grammar:

    E ::= num F
        | '(' E ')' F

    F ::= '+' E F
        | '-' E F
        | '*' E F
        | '/' E F
        | empty

    Example:
    >>> lexer = Lexer("2 + 3 + 4")
    >>> parser = Parser(lexer)
    >>> e = parser.E()
    >>> e.eval()
    9

    >>> lexer = Lexer("2 * 3 * 4")
    >>> parser = Parser(lexer)
    >>> e = parser.E()
    >>> e.eval()
    24

    >>> lexer = Lexer("(2 + 3) * 4")
    >>> parser = Parser(lexer)
    >>> e = parser.E()
    >>> e.eval()
    20

    >>> lexer = Lexer("4 * (2 + 3)")
    >>> parser = Parser(lexer)
    >>> e = parser.E()
    >>> e.eval()
    20

    The two examples below yield surprising values, because this parser
    groups operands right-to-left: "2 * 3 + 4" is parsed as 2 * (3 + 4),
    and "2 - 3 - 4" as 2 - (3 - 4).

    >>> lexer = Lexer("2 * 3 + 4")
    >>> parser = Parser(lexer)
    >>> e = parser.E()
    >>> e.eval()
    14

    >>> lexer = Lexer("2 - 3 - 4")
    >>> parser = Parser(lexer)
    >>> e = parser.E()
    >>> e.eval()
    3
    """
    def __init__(self, lexer):
        self.lexer = lexer
        self.current_token = self.lexer.next_valid_token()

    def eat(self, token_type):
        if self.current_token.kind == token_type:
            self.current_token = self.lexer.next_valid_token()
        else:
            raise ValueError(f"Unexpected token: {self.current_token.kind}")

    def E(self):
        """
        Parse the base case (numbers or parenthesized expressions).

        This case implements the productions:

        E ::= num F
            | '(' E ')' F
        """
        token = self.current_token

        if token.kind == TokenType.NUM:
            self.eat(TokenType.NUM)
            num = Num(int(token.text))
            return self.F(num)

        elif token.kind == TokenType.LPR:  # '('
            self.eat(TokenType.LPR)
            node = self.E()
            self.eat(TokenType.RPR)  # ')'
            return self.F(node)

    def F(self, left):
        """
        Parse the continuation of an expression (right recursion).

        This case implements the productions:

        F ::= '+' E F
            | '-' E F
            | '*' E F
            | '/' E F
            | empty

        Notice that we must pass the left side of the expression to it,
        otherwise we would not be able to build the expression node.
        """
        token = self.current_token

        if token.kind == TokenType.ADD:
            self.eat(TokenType.ADD)
            right = self.E()
            return self.F(Add(left, right))

        elif token.kind == TokenType.SUB:
            self.eat(TokenType.SUB)
            right = self.E()
            return self.F(Sub(left, right))

        elif token.kind == TokenType.MUL:
            self.eat(TokenType.MUL)
            right = self.E()
            return self.F(Mul(left, right))

        elif token.kind == TokenType.DIV:
            self.eat(TokenType.DIV)
            right = self.E()
            return self.F(Div(left, right))

        # If no more operators, return the current node.
        return left

The grammar provides the wrong result for "2 * 3 + 4". Why?
That happens because the grammar does not recognize any precedence between * and +. The parser groups the operands right-to-left, so 3 + 4 is resolved first: the expression is parsed as 2 * (3 + 4) = 14 instead of (2 * 3) + 4 = 10. Thus, the grammar is right associative (which is not what we want!) and it does not give the multiplicative operators precedence over the additive operators. More generally, the grammar is ambiguous, which is a big problem!

What are ambiguous grammars?


Ambiguity occurs when a grammar can produce more than one parse tree (or derivation) for the
same string. In the grammar you provided, the ambiguity comes from the fact that there are no
explicit rules for operator precedence or associativity. As a result, multiple valid parse trees
can be created for expressions with more than one operator.

Consider the following example expression: num + num * num

According to the grammar:


1.​ We can interpret the expression as:


○​ First, apply + to the first two nums, then multiply the result by the third num.
○​ This corresponds to the parse tree: (num + num) * num.
2.​ Alternatively, we can interpret the expression as:
○​ First, multiply the last two nums, and then apply + to the first num and the result of
the multiplication.
○​ This corresponds to the parse tree: num + (num * num).

Both interpretations are valid according to the grammar, leading to two different parse trees.
This ambiguity arises because the grammar does not enforce any precedence between the
operators (+, -, *, /), nor does it define associativity (whether operations should be grouped
left-to-right or right-to-left).
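Ambiguity is not just a theoretical nuisance: the two parse trees denote different values. Using nested tuples as a stand-in for parse trees (an illustration, not the course's Exp classes), the two readings of 2 + 3 * 4 evaluate differently:

```python
def eval_tree(node):
    """Evaluate a tuple-based expression tree with + and * operators."""
    if isinstance(node, int):
        return node
    op, a, b = node
    a, b = eval_tree(a), eval_tree(b)
    return a + b if op == '+' else a * b

# Two valid parse trees for "2 + 3 * 4" under the ambiguous grammar:
grouped_left = ('*', ('+', 2, 3), 4)    # (2 + 3) * 4
grouped_right = ('+', 2, ('*', 3, 4))   # 2 + (3 * 4)
print(eval_tree(grouped_left))   # 20
print(eval_tree(grouped_right))  # 14
```

Since a compiler must assign exactly one meaning to each program, an ambiguous grammar for arithmetic is unacceptable: the two trees above would make the same source text mean two different numbers.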

Could you draw two parsing trees for "num + num * num"?

Sure. See the figure below, which shows two different (non-isomorphic) parsing trees: one that groups the addition first, corresponding to (num + num) * num, and one that groups the multiplication first, corresponding to num + (num * num).


How do we know if a grammar is ambiguous? Is there a general algorithm to find that out?
Not really. The problem of determining whether a grammar is ambiguous is undecidable. In
formal language theory, it has been proven that there is no general algorithm or method that can
decide whether an arbitrary context-free grammar (CFG) is ambiguous. This undecidability
stems from the fact that the problem involves checking if there are two or more distinct parse
trees for the same string generated by the grammar, and this task cannot be automated for all
possible grammars.

In essence, while for some specific grammars it may be easy to check for ambiguity manually,
the existence of a general procedure that works for every context-free grammar is impossible.
This result is akin to other undecidability problems in computer science, such as the halting
problem.

Therefore, while there are heuristics and techniques to analyze particular grammars, the
ambiguity problem in general cannot be decided algorithmically.


How did people show that the problem of determining if a grammar is ambiguous was undecidable?
This undecidability can be understood in relation to the Post's Correspondence Problem (PCP),
which is a well-known undecidable problem in computability theory.

To elaborate, a grammar is considered ambiguous if there exists at least one string that can be
generated by the grammar in multiple distinct ways (i.e., there are two or more different parse
trees for the same string). There is no algorithm that can decide for every CFG whether it is
ambiguous or not.

The connection to the PCP arises because you can construct a reduction from PCP to the
ambiguity problem. If you could decide on ambiguity, you could also decide instances of PCP by
encoding them as grammars. Since PCP is undecidable, it follows that the problem of
determining grammar ambiguity must also be undecidable.

Thus, it is correct to say that the ambiguity of grammars is an undecidable problem, and this
conclusion leverages the properties of Post's Correspondence Problem.

But can we sometimes demonstrate that a grammar is unambiguous?


Yes, sometimes. For instance, our grammar for prefix notation is not ambiguous:

E ::= num
| + E E
| - E E
| * E E
| / E E

Since the operators are explicitly part of the productions (e.g., + E E), there is no ambiguity in
deciding which operator applies at any point, because each production begins with a unique
operator symbol, and no production yields the empty string. For any string generated by this
grammar, the first operator in the string (e.g., +, -, *, /) uniquely determines the structure of the
parse tree. Once the first operator is identified, it’s clear that the next two sub-expressions will
be parsed as E. This fixed structure ensures that no two distinct parse trees can be formed for
the same string.
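To make this concrete, here is a minimal sketch of a parser for the prefix grammar (the function and token names are hypothetical, independent of the course's Lexer and Exp modules). Notice how the first token alone selects the production, so the parse is fully determined:

```python
def parse_prefix(tokens):
    """Parse one E from the front of the token list, consuming tokens."""
    tok = tokens.pop(0)
    if tok in ('+', '-', '*', '/'):
        # Production "op E E": the operator uniquely fixes the structure.
        left = parse_prefix(tokens)   # first sub-expression
        right = parse_prefix(tokens)  # second sub-expression
        return (tok, left, right)
    # Production "num": anything else must be a number.
    return ('num', int(tok))

tree = parse_prefix("+ 1 * 2 3".split())
# tree == ('+', ('num', 1), ('*', ('num', 2), ('num', 3)))
```

Because each production begins with a distinct token, the function never has to guess or backtrack, which is exactly why the grammar is unambiguous.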

Notice, however, that the grammar below is still ambiguous, even though each non-empty
production starts with a distinct terminal symbol:

E ::= num F
| '(' E ')' F
F ::= '+' E F
| '-' E F
| '*' E F


| '/' E F
| ε

The production for F can generate the empty string, which allows expressions to stop at
arbitrary points, leading to ambiguity. The empty string production for F plays a role in the
ambiguity by allowing the parser to end a sub-expression (E) at different points. When F ::=
empty, it terminates the current expression, which can lead to different groupings of operators
and operands.

Can you modify the grammar, so that multiplication precedes addition?


To modify your grammar so that multiplication (*) and division (/) have higher precedence than
addition (+) and subtraction (-), you need to separate the handling of addition and subtraction
from that of multiplication and division. This is typically done by introducing a new non-terminal
for the higher-precedence operators (multiplication and division) and adjusting the grammar
accordingly. Here's an example:

E ::= T ('+' E | '-' E | empty)

T ::= F ('*' T | '/' T | empty)

F ::= num
| '(' E ')'

Can you implement a parser following this grammar?


from Exp import *
from Lexer import *

class Parser:
    def __init__(self, lexer):
        self.lexer = lexer
        self.current_token = self.lexer.next_valid_token()

    def eat(self, token_type):
        if self.current_token.kind == token_type:
            self.current_token = self.lexer.next_valid_token()
        else:
            raise ValueError(f"Unexpected token: {self.current_token.kind}")

    def E(self):
        exp = self.T()
        token = self.current_token

        if token.kind == TokenType.ADD:
            self.eat(TokenType.ADD)
            right = self.E()
            exp = Add(exp, right)

        elif token.kind == TokenType.SUB:
            self.eat(TokenType.SUB)
            right = self.E()
            exp = Sub(exp, right)

        return exp

    def T(self):
        exp = self.F()
        token = self.current_token

        if token.kind == TokenType.MUL:
            self.eat(TokenType.MUL)
            right = self.T()
            exp = Mul(exp, right)

        elif token.kind == TokenType.DIV:
            self.eat(TokenType.DIV)
            right = self.T()
            exp = Div(exp, right)

        return exp

    def F(self):
        token = self.current_token

        if token.kind == TokenType.NUM:
            self.eat(TokenType.NUM)
            return Num(int(token.text))

        elif token.kind == TokenType.LPR:  # '('
            self.eat(TokenType.LPR)
            node = self.E()
            self.eat(TokenType.RPR)  # ')'
            return node

        else:
            raise ValueError(f"Unexpected token: {token.kind}")

This code defines a recursive descent parser for arithmetic expressions, using a grammar that
handles addition, subtraction, multiplication, and division. The Parser class processes tokens
provided by the Lexer and generates a tree-like data structure representing the parsed
expression. The parser has three main methods: E() for handling expressions (with + and -),
T() for handling terms (with * and /), and F() for handling factors (numbers and
parenthesized expressions). The parser processes tokens recursively by consuming them with
the eat() method, and based on the current token, it either builds a subexpression or raises an
error if the token is unexpected. This structure enforces the correct precedence and
associativity rules for arithmetic operations.

Can you show how the parser handles "num * num + num"?
Let's walk through how the parser processes the string "num * num + num" step by step,
assuming that num represents a number token.


Input: "num * num + num"

1.​ Initialization:
○​ The parser is initialized with a lexer that tokenizes the string. The first token
(current_token) is num.
2.​ Call to E():
○​ The parser starts at the E() method, which processes expressions.
○​ Inside E(), it first calls T() to handle terms (multiplication and division).
3.​ Call to T():
○​ T() handles terms, which involve factors combined with * or /.
○​ Inside T(), it calls F() to process the first factor.
4.​ Call to F():
○​ F() handles factors, which are numbers or parenthesized expressions.
○​ Since the current token is num, it matches the NUM case, consumes the token
with eat(TokenType.NUM), and returns a Num node representing the first
number.
5.​ Return to T():
○​ Now, the current token is *. Since this is a multiplication operator
(TokenType.MUL), T() consumes the * token with eat(TokenType.MUL).
○​ Next, T() calls itself recursively to process the right-hand side of the
multiplication (the second factor).
6.​ Second call to F() (inside T()):
○​ F() is called again to process the second num.
○​ Just like before, it consumes the num token, returning another Num node
representing the second number.
7.​ Return to T():
○​ Now that both factors have been processed, T() constructs a Mul node with the
two Num nodes as its left and right operands. This represents the expression num
* num.
○​ T() returns this Mul node to the E() method.
8.​ Back to E():
○​ At this point, E() has received the left-hand side of the addition (num * num).
○​ The current token is now +, so E() consumes the + token with
eat(TokenType.ADD) and calls itself recursively to handle the right-hand side
of the addition.
9.​ Recursive call to E() (right-hand side of +):
○​ In this recursive call to E(), the parser calls T() to handle the term on the
right-hand side of the +.
10. Third call to F() (inside T() of the right-hand side):


○​ T() calls F() to process the final num.


○​ As before, F() consumes the num token and returns a Num node representing
the third number.
11. Return to T() (for the right-hand side):
○​ T() has no multiplication or division operators to process, so it directly returns
the Num node to E().
12. Return to E():
○​ Now, E() constructs an Add node with the Mul node (num * num) as the left
operand and the Num node as the right operand.
○​ This Add node represents the entire expression num * num + num.
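The walkthrough ends with the tree Add(Mul(num, num), num). A quick sketch (with simplified stand-ins for the Num, Mul, and Add classes; the real Exp.py may differ) confirms that this tree shape gives multiplication the higher precedence when evaluated:

```python
# Simplified stand-ins for the AST classes from Exp.py (hypothetical here).
class Num:
    def __init__(self, v): self.v = v
    def eval(self): return self.v

class Mul:
    def __init__(self, l, r): self.l, self.r = l, r
    def eval(self): return self.l.eval() * self.r.eval()

class Add:
    def __init__(self, l, r): self.l, self.r = l, r
    def eval(self): return self.l.eval() + self.r.eval()

# The tree the parser builds for "2 * 3 + 4":
tree = Add(Mul(Num(2), Num(3)), Num(4))
print(tree.eval())  # 10, i.e., (2 * 3) + 4, not 2 * (3 + 4) = 14
```

Because the Mul node sits deeper in the tree, it is evaluated first, which is exactly what the precedence rules demand.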

This last grammar has an indirect left recursion. Yet we could implement a
parser easily. When is left recursion a problem?
The recursion here is indirect: E reaches E again only through the chain E → T → F → '(' E ')'.
Left recursion is a problem when the parser can return to the same non-terminal without
consuming any input, because the corresponding function would call itself forever. In this
grammar, the only way back from F to E goes through the terminal '(', so the parser consumes a
token before re-entering E, and the recursion always makes progress.
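The failing case is easy to reproduce. Below is a deliberately broken sketch (hypothetical code, not from the course material) that transcribes the directly left-recursive rule E ::= E '+' num | num into a recursive function. Since E() calls itself before consuming a single token, it overflows the stack:

```python
def E(tokens):
    # Naive transcription of E ::= E '+' num | num.
    left = E(tokens)              # recursive call before any token is consumed
    if tokens and tokens[0] == '+':
        tokens.pop(0)
        return ('+', left, int(tokens.pop(0)))
    return left

try:
    E(['1', '+', '2'])
except RecursionError:
    print("no progress: E() recursed without consuming input")
```

Contrast this with the grammar above, where a terminal is always consumed before the parser re-enters E.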

Is there a general procedure to change a grammar like " E ::= num '+'
E | num '*' E | empty", so that the precedence of multiplication
becomes higher than addition's?
To make * have higher precedence than +, we need to separate them into different levels of
the grammar. The general rule is to create a new non-terminal for each precedence level.

1.​ Create separate non-terminals:


○​ Introduce a non-terminal for each level of precedence. For example, use T for
terms involving * and F for numbers or parenthesized expressions.
2.​ Make the higher precedence operator appear deeper in the tree:
○​ Place the higher precedence operator (*) at the lower level (i.e., in the T
production), and place the lower precedence operator (+) at the higher level (i.e.,
in the E production).

The parser fails to correctly compute "2 - 3 - 4". Why?


Our current parser produces "3", instead of "-5". This happens because the parser is
right-associative. It parses "2 - 3 - 4" as "2 - (3 - 4)", and not as "(2-3)-4". In this grammar,
recursion happens on the right side of each operator (e.g., + E, * T), so the operations are
grouped from the right, making it right-associative.
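A tiny check (using hypothetical tuple-encoded trees, not the course's Exp classes) makes the difference between the two groupings concrete:

```python
# The right-associative tree the current parser builds for "2 - 3 - 4",
# versus the left-associative tree we actually want.
def evaluate(t):
    if isinstance(t, int):
        return t
    _, left, right = t            # only '-' appears in this example
    return evaluate(left) - evaluate(right)

right_assoc = ('-', 2, ('-', 3, 4))   # 2 - (3 - 4)
left_assoc  = ('-', ('-', 2, 3), 4)   # (2 - 3) - 4

print(evaluate(right_assoc), evaluate(left_assoc))  # 3 -5
```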


Can you further modify it, so that the grammar becomes left associative?
A grammar is left-associative if operators are grouped from the left in cases of equal
precedence. This means that expressions like a - b - c are parsed as (a - b) - c (the left side is
evaluated first).

E ::= T EE

EE ::= '+' T EE
| '-' T EE
| empty

T ::= F TT

TT ::= '*' F TT
| '/' F TT
| empty

F ::= num
| '(' E ')'

EE ::= '+' T EE and EE ::= '-' T EE allow an expression like a - b - c to be
parsed by first evaluating a - b (through the accumulator passed along the recursion on EE),
and then combining the result with c. The leftmost operand is processed first. Similarly, TT
::= '*' F TT and TT ::= '/' F TT make sure that multiplication and division are also
grouped from the left. In short, when processing EE ::= '+' T EE, we only move on to EE
once we are done with T, which comes on the left of EE.

Can you use the last grammar to implement a parser?


from Exp import *
from Lexer import *

class Parser:
    def __init__(self, lexer):
        self.lexer = lexer
        self.current_token = self.lexer.next_valid_token()

    def eat(self, token_type):
        if self.current_token.kind == token_type:
            self.current_token = self.lexer.next_valid_token()
        else:
            raise ValueError(f"Unexpected token: {self.current_token.kind}")

    def E(self):
        exp = self.T()
        return self.EE(exp)

    def EE(self, left):
        token = self.current_token

        if token.kind == TokenType.ADD:
            self.eat(TokenType.ADD)
            right = self.T()
            return self.EE(Add(left, right))

        elif token.kind == TokenType.SUB:
            self.eat(TokenType.SUB)
            right = self.T()
            return self.EE(Sub(left, right))

        # If no more operators, return the current node.
        return left

    def T(self):
        exp = self.F()
        return self.TT(exp)

    def TT(self, left):
        token = self.current_token

        if token.kind == TokenType.MUL:
            self.eat(TokenType.MUL)
            right = self.F()
            return self.TT(Mul(left, right))

        elif token.kind == TokenType.DIV:
            self.eat(TokenType.DIV)
            right = self.F()
            return self.TT(Div(left, right))

        # If no more operators, return the current node.
        return left

    def F(self):
        token = self.current_token

        if token.kind == TokenType.NUM:
            self.eat(TokenType.NUM)
            return Num(int(token.text))

        elif token.kind == TokenType.LPR:  # '('
            self.eat(TokenType.LPR)
            node = self.E()
            self.eat(TokenType.RPR)  # ')'
            return node

        else:
            raise ValueError(f"Unexpected token: {token.kind}")

The code implements left-associative behavior. This is due to how the EE() and TT()
methods recursively call themselves after processing an operator. Each time an operator is
encountered, the parser creates a new binary expression (like Add or Mul) and recursively


passes this result into the next call of EE() or TT(). This causes the leftmost operators to be
evaluated first, making the expressions group to the left. For instance, when parsing a + b +
c, it will first combine a + b, then combine c with the result of a + b. Thus, the parser is
left-associative. The figure below shows how the parser builds up an expression:

Even though the recursive structure of the EE() and TT() methods might initially suggest
right-association, their design actually leads to left-associative behavior. When the parser
processes an operator, it immediately combines the current left-hand expression with the next
term, before making the recursive call. This ensures that the left-hand side is built up first,
leading to a left-associative parse.
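The tail-recursive EE()/TT() pattern can equivalently be written as a loop that folds operands into an accumulator. A self-contained sketch (over a plain token list, independent of the course's Lexer; parentheses omitted for brevity) shows the idea:

```python
def parse_expr(tokens):
    # E ::= T EE, where EE folds '+'/'-' terms into an accumulator.
    left = parse_term(tokens)
    while tokens and tokens[0] in ('+', '-'):
        op = tokens.pop(0)
        right = parse_term(tokens)
        # Combine immediately, so earlier operators end up deeper in the tree.
        left = (op, left, right)
    return left

def parse_term(tokens):
    # T ::= F TT, folding '*'/'/' the same way. Here F ::= num only.
    left = int(tokens.pop(0))
    while tokens and tokens[0] in ('*', '/'):
        op = tokens.pop(0)
        left = (op, left, int(tokens.pop(0)))
    return left

print(parse_expr("2 - 3 - 4".split()))  # ('-', ('-', 2, 3), 4)
```

The loop makes the accumulation explicit: each operator immediately combines with everything parsed so far, which is exactly the left-associative grouping.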

Is it the case that any context-free grammar can be rewritten to remove left
recursion?
Yes, any context-free grammar can be rewritten to remove left recursion, whether it is direct
or indirect.

Important Considerations:


●​ Removing left recursion changes the structure of the grammar, but the resulting
grammar remains context-free and accepts the same language.
●​ Removing left recursion can make the grammar right-recursive, which is suitable for
top-down parsing (e.g., recursive descent).
●​ The process is algorithmic and can be applied to any context-free grammar, ensuring
that any CFG can be rewritten to remove left recursion.

So, is it the case that any context-free language can be parsed by a recursive descent parser?
Alas, not really! While any context-free grammar (CFG) can be rewritten to remove left
recursion, there are other challenges that arise in recursive descent parsing, especially when it
comes to ambiguity and lookahead requirements.

Limitations of Recursive Descent Parsing:

1.​ LL(k) Grammars: Recursive descent parsers are generally limited to LL(k) grammars,
which means they are suitable for grammars that can be parsed with at most k tokens
of lookahead in a left-to-right scan of the input and leftmost derivation of the parse
tree.
○​ LL(k) grammar: A grammar where the parser can decide which production to
apply based on the next k tokens in the input.

However, not all context-free grammars are LL(k). Some grammars require more
complex parsing strategies (such as LR parsing), or they may require more than k
tokens of lookahead to resolve which production to apply.

2.​ Ambiguity: Some context-free languages are inherently ambiguous. An ambiguous


grammar is one where a given input can have more than one valid parse tree.
Recursive descent parsers can struggle with ambiguity, since they typically expect a
single, deterministic path through the grammar.
○​ Example: The classic dangling else problem in conditional expressions can lead
to ambiguity in parsing.
3.​ Non-Deterministic Grammars: Some CFGs are non-deterministic, meaning that the
parser might not be able to decide which production to use without considering multiple
options. Recursive descent parsers are deterministic, and if the parser encounters a
point where it cannot decide between two options, it might fail.
4.​ Left-Factoring: Even after removing left recursion, certain grammars may require
left-factoring to make them compatible with recursive descent parsing. Left-factoring is
the process of transforming a grammar to avoid common prefixes in different productions
of a non-terminal, so the parser can decide which production to apply based on a single
token or small lookahead.
○​ Example: A ::= α β | α γ Can be left-factored to: A ::= α A' and A' ::= β | γ
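The left-factored form can be parsed with a single token of lookahead. A small sketch (with a hypothetical grammar over the literal tokens 'a', 'b', 'c'; not from the course code) illustrates the pattern:

```python
def parse_S(tokens):
    # S ::= 'a' S'  -- the common prefix 'a' is consumed exactly once.
    if tokens.pop(0) != 'a':
        raise ValueError("expected 'a'")
    return parse_S_prime(tokens)

def parse_S_prime(tokens):
    # S' ::= 'b' | 'c'  -- one lookahead token now decides the branch.
    tok = tokens.pop(0)
    if tok in ('b', 'c'):
        return ('a', tok)
    raise ValueError(f"unexpected token: {tok}")

print(parse_S(['a', 'b']))  # ('a', 'b')
```

Before factoring, the parser would have to choose between 'a' 'b' and 'a' 'c' while seeing only the shared 'a'; after factoring, the choice is deferred until the distinguishing token arrives.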


So, what's LL(k) languages? Or should I say LL(k) parsers?


You should refer to LL(k) parsers, not LL(k) languages. LL(k) describes a class of parsers and
grammars rather than a class of languages. Here's a breakdown of what LL(k) parsers are:

What is an LL(k) Parser?

An LL(k) parser is a top-down parser for context-free grammars that processes the input
left-to-right (L) and constructs a leftmost derivation (L) of the input string. The (k) part refers
to how many tokens of lookahead the parser can use to make decisions during parsing.

●​ L: The parser reads the input from left to right.


●​ L: The parser produces a leftmost derivation of the input, meaning it constructs the
parse tree by expanding the leftmost non-terminal first.
●​ k: The parser uses up to k tokens of lookahead to decide which grammar rule to apply
at each step. For example, LL(1) parsers use 1 token of lookahead.

Characteristics of LL(k) Parsers:

1.​ Top-down Parsing: LL(k) parsers begin parsing from the start symbol of the grammar
and try to match the input string to the grammar’s productions. They expand
non-terminals by applying productions until they can match the input tokens.
2.​ Predictive Parsing: LL(k) parsers are also called predictive parsers because they try
to predict which production to use by looking ahead at the next k tokens in the input. In
an LL(1) parser, for instance, the parser makes decisions based on just one token of
lookahead.
3.​ Deterministic: LL(k) parsers are deterministic. For any input, they follow a single path of
decisions without backtracking or guessing. This determinism means that the grammar
must be unambiguous and compatible with the predictive parsing strategy.
4.​ Left-to-Right Parsing: The parser processes the input from the leftmost token to the
rightmost token, consuming each token as it builds the parse tree.

Why are LL(k) parsers important?

1.​ Efficiency: LL(k) parsers are typically very efficient, especially when k = 1, because
they make decisions based on a small, fixed number of lookahead tokens. LL(1) parsers
are often used in compiler design due to their simplicity and efficiency.
2.​ Simple to Implement: LL(k) parsers can often be implemented as recursive descent
parsers. Each non-terminal in the grammar corresponds to a function in the parser, and
parsing can be accomplished using straightforward recursive functions.
3.​ Limitations: Not all grammars are LL(k) for any finite k. In particular, LL(k) parsers have
trouble with:
○​ Left recursion: Grammars with left recursion (e.g., E ::= E + T) can't be
parsed by LL(k) parsers.


○​ Ambiguity: LL(k) parsers can’t handle ambiguous grammars (where an input


string can have more than one valid parse tree).
○​ Lookahead: Some grammars require more than k tokens of lookahead to make
decisions. In practice, most parsers use LL(1) (1 token of lookahead), but LL(2)
or higher is needed for some more complex grammars.

So, is it an if-and-only-if situation between LL(k) grammars and recursive descent parsers?
Yes, it is essentially an if-and-only-if relationship between LL(k) grammars and recursive
descent parsers. Here’s the reasoning behind both directions:

1. If a grammar is LL(k), it can be parsed by a recursive descent parser.

This is true. LL(k) grammars are specifically designed to be parsed using predictive parsing
techniques like recursive descent. Recursive descent parsers work by:

●​ Reading the input from left to right.


●​ Applying a leftmost derivation (i.e., expanding the leftmost non-terminal first).
●​ Using up to k lookahead tokens to decide which production rule to apply at each step.

Since LL(k) grammars allow for parsing decisions to be made with up to k tokens of lookahead,
recursive descent parsers can handle them effectively. Therefore, any LL(k) grammar can be
parsed by a recursive descent parser.

2. If a recursive descent parser can parse a grammar, then the grammar is LL(k).

This is also true. For a recursive descent parser to work:

●​ The grammar must be predictive, meaning that at each step, the parser must be able to
determine the correct production rule based on a bounded number of lookahead tokens.
●​ The grammar must be non-ambiguous and non-left-recursive (left recursion needs to
be eliminated to avoid infinite recursion in a recursive descent parser).

These are precisely the characteristics of an LL(k) grammar. If a recursive descent parser can
successfully parse a grammar, then by definition:

●​ The grammar allows decisions to be made based on k lookahead tokens.


●​ It follows leftmost derivation.

Thus, the grammar must be LL(k) for some k.

In Summary:


●​ If a grammar is LL(k), it can definitely be parsed by a recursive descent parser.


●​ If a grammar can be parsed by a recursive descent parser, then it must be LL(k)
(because the parser's ability to decide the next production based on bounded lookahead
is a hallmark of LL(k) grammars).

Therefore, the relationship is if-and-only-if:

●​ A grammar is LL(k) if and only if it can be parsed by a recursive descent parser.

Recursive Descent Parsers and Non-LL(k) Grammars:

While recursive descent parsers are naturally suited for LL(k) grammars, they cannot handle all
context-free grammars. For example, grammars with left recursion or ambiguities might not be
suitable for recursive descent parsing without modifications, such as removing left recursion or
transforming the grammar to fit the LL(k) form.

What would be the asymptotic complexity of LL(k) parsing?


Well, the answer to this question depends a bit on how you detail the question itself. If we
consider a fixed grammar, and a fixed k, then the complexity of parsing is O(n), where n is the
size of the input string.

Why LL(k) Parsing is O(n) if we fix the grammar and the k:

1.​ Single Pass over the Input:


○​ LL(k) parsers, including both recursive descent and table-driven variants, process
the input from left to right, making a single pass over the input string. At each
step, they:
■​ Use the current token(s) and possibly the next k lookahead tokens to
decide which production rule to apply.
■​ Consume tokens and build the parse tree incrementally.
○​ Since the parser processes each token once, the total work done is proportional
to the length of the input, resulting in O(n) time complexity.
2.​ Constant Lookahead (k):
○​ The k in LL(k) refers to the number of lookahead tokens that the parser might
need at any point to make a decision. The value of k is constant for a given LL(k)
grammar, so even though the parser may look ahead k tokens at each step, this
lookahead doesn’t depend on the size of the input. Therefore, this constant
lookahead doesn't affect the overall linear complexity.
3.​ Predictive Parsing:
○​ LL(k) parsers are predictive parsers, meaning they can predict which production
to use based on a bounded number of lookahead tokens. They don’t need to
backtrack (unlike backtracking parsers), which is one reason why they achieve
linear time complexity.


General Complexity of LL(k) Parsers:


If we consider any grammar, and any k, then these factors will contribute to the overall
asymptotic complexity. Given an arbitrary grammar, a parameter k, and an input string of length
n, the overall complexity can be affected by several factors, including both n and k. Here’s how
these can contribute:

1.​ Input:
○​ Parsing requires us going over every token in the input; hence, it is at least O(n).
2.​ Lookahead:
○​ If we don't fix the k (meaning that we want the complexity for any given k), then
the size of the lookahead will be part of the complexity; hence, it will be at least
O(k * n). However, k will be a constant if we fix the implementation of the parser.
3.​ Grammar Depth:
○​ The depth of recursion in the grammar’s structure affects the constant factor in
parsing. The time complexity, letting the grammar vary, will include terms like
O(m * d * n), where m is the number of production rules and d is the depth of
recursion. However, m and d are constants if we fix the grammar.
4. Parsing Table Size (for Table-Driven Parsers):
○ For an LL(k) parser implemented with a parsing table (non-recursive), the table
has one entry per non-terminal and per possible lookahead string. With T terminal
symbols there are up to T^k such strings, so the table can grow exponentially with
k, giving a space complexity of O(V * T^k), where V is the number of
non-terminals. This space complexity doesn't usually affect the time complexity
directly, but it is a factor in the practicality of larger k values.

Theoretical Worst-Case Complexity:

In the theoretical worst case, if you consider an arbitrary grammar and a variable k, the
complexity might look like this:

●​ Time complexity: O(k * m * d * n), where:


○​ k is the number of lookahead tokens,
○​ m is the number of production rules,
○​ d is the maximum recursion depth (which can vary depending on the grammar),
and
○​ n is the length of the input string.
● Space complexity: The space complexity is usually linear in n, but for table-driven
parsers, the size of the parsing table can grow exponentially in k, resulting in
O(V * T^k) for V non-terminals and T terminal symbols.


Do actual programming languages use recursive descent parsers?


Yes, recursive descent parsers are indeed used in the implementation of actual programming
languages, especially those with simpler or more structured grammars, like LL(1) or LL(k)
grammars, where the parsing decisions can be made easily using lookahead tokens. Recursive
descent parsers are widely favored due to their simplicity, maintainability, and flexibility.

Examples of Languages Using Recursive Descent Parsers:

1.​ Pascal:
○​ Early implementations of Pascal used recursive descent parsers, as Pascal's
grammar is relatively simple and well-suited for predictive parsing.
2.​ Java:
○​ Some early Java compilers, like Sun's javac, were built using recursive descent
parsers. However, modern Java compilers often use more sophisticated parsing
techniques to handle the more complex grammar of the language.
3.​ Modula-2 and Modula-3:
○​ Both languages, designed by Niklaus Wirth (the same creator of Pascal), use
recursive descent parsers in their implementations, as their grammars are
designed to be LL(1) or close to it.
4.​ Oberon:
○​ Another Wirth language, Oberon, was designed with recursive descent parsing
in mind and uses this technique in its implementations.
5.​ Lua:
○​ The lightweight scripting language Lua uses a hand-written recursive descent
parser, which works well for its fairly straightforward grammar.
6.​ Go:
○​ The Go programming language uses a variant of recursive descent parsing in its
compiler. The Go grammar is designed to be simple enough for efficient recursive
descent parsing.
7. Python:
○ Early versions of CPython used an LL(1) grammar and a table-driven parser
generated by the pgen tool. Since CPython 3.9 (PEP 617), the reference
implementation uses a PEG parser, which is itself a form of recursive descent
with backtracking and memoization.

I heard that clang's parser is recursive descent. Is that correct?


Yes, you are mostly correct! The Clang parser is largely based on recursive descent. Clang,
which is part of the LLVM project, is a compiler front end for C, C++, and Objective-C, and it
uses a hand-written recursive descent parser to process source code. See the discussion about
the unified parser on the Clang webpage. The parser is a complex piece of software: it
consumes the stream of tokens produced by the lexer and builds the abstract syntax tree (AST)
that later stages of the compiler use to generate machine code. The parser is implemented in
C++ as part of Clang's front-end libraries.


Here are some resources that you may find helpful:

●​ Clang AST Core library documentation: This documentation provides an overview of


the Clang AST Core library and its components, including the parser. You can find the
documentation on the Clang website: https://clang.llvm.org/docs/index.html
●​ Clang source code: The Clang source code is available on GitHub:
https://github.com/llvm/llvm-project You can browse the source code to see how the
parser is implemented.

Disclaimer: While Clang's parsing process is largely based on recursive descent principles, it's
not a straightforward, textbook implementation. Clang uses a combination of techniques,
including recursive descent and context-sensitive parsing, to handle the complexities of C, C++,
and Objective-C syntax.

Key Files and Structures:

1.​ lib/Parse/Parser.cpp: This is a central file in the parsing process. It contains the
top-level parsing functions and handles the overall structure of the parse.
2.​ lib/Parse/Parse.h: This header file declares the prototypes for the parsing functions
defined in Parser.cpp.
3.​ include/clang/Parse/Parser.h: This header file provides an interface for external
clients to access the parser.
4.​ lib/Basic/Token.h: This file defines the Token class, which represents a single
lexical token in the source code.
5.​ lib/Lex/Lexer.h: This file defines the Lexer class, which is responsible for
tokenizing the source code.

Recursive Descent Pattern:

While you might not find a direct, textbook implementation of recursive descent, you can
observe the recursive nature of the parsing functions in Parser.cpp. For example, the function
that parses a statement might recursively call functions to parse expressions, declarations, or
other statement types.

Specific Examples:

●​ Parser::parseStmt(): This function might recursively call parseCompoundStmt,


parseIfStmt, parseForStmt, and other statement-specific parsing functions.
●​ Parser::parseExpr(): This function might recursively call parseBinaryOperator,
parseUnaryOperator, parsePrimaryExpr, and other expression-specific parsing
functions.

Code Structure:


The parsing code often follows a top-down approach, starting with the top-level grammar rules
and progressively descending into more specific rules. This is consistent with the principles of
recursive descent parsing. Nevertheless, clang's parsing is highly optimized and may use
techniques like context-sensitive parsing to deal with complex language constructs. These
techniques can make the code structure less straightforward than a pure recursive descent
implementation.

Is there a simpler language that uses a recursive descent parser? Clang's parser is just too huge...
Yes, Lua uses a recursive descent parser that is much simpler! This parsing technique is
well-suited for languages with relatively simple grammars, like Lua.

Finding the Lua Parser:

The Lua parser is implemented in C and can be found in the llex.c and lparser.c files
within the Lua source code distribution.

Key Files:

●​ llex.c: This file handles lexical analysis, breaking the source code into tokens.
●​ lparser.c: This file contains the core parsing logic, implementing the recursive
descent parser.

Recursive Descent Pattern:

You'll find functions in lparser.c that correspond to the various grammar rules in Lua's
syntax. For example, there might be functions for parsing expressions, statements, blocks, and
other language constructs. These functions often call each other recursively to parse nested
structures. As an example, consider the entry point of the parser:

static void statlist (LexState *ls) {
  /* statlist -> { stat [';'] } */
  while (!block_follow(ls, 1)) {
    if (ls->t.token == TK_RETURN) {
      statement(ls);
      return;  /* 'return' must be last statement */
    }
    statement(ls);
  }
}

Here's a breakdown of what the code is doing:

1.​ statlist: This function is responsible for parsing a list of statements.


2.​ block_follow(ls, 1): This function checks if the current token is a valid token to
end a block (e.g., end, else, or elseif).
3.​ TK_RETURN: This is a Lua token representing the return keyword.
4.​ statement(ls): This function presumably parses a single statement.

Recursive Descent Pattern:

The function uses a loop to repeatedly parse statements until it encounters a token that signals the end of the statement list. The recursion comes from statement(ls), which can call back into statlist when it parses a nested block; this mutual recursion is characteristic of recursive descent parsing.

Key Points:

●​ The function's name (statlist) suggests that it's parsing a list of statements.
●​ The loop condition (!block_follow(ls, 1)) ensures that statements are parsed until
the end of the block.
●​ The special handling of return statements ensures that return is the last statement in
a block.

So, can you list the advantages and disadvantages of recursive descent
parsers?

Advantages of Recursive Descent Parsers in Real Languages:

1.​ Simplicity:
○​ Recursive descent parsers are relatively easy to implement and debug. Each
non-terminal in the grammar corresponds to a function in the parser, making the
code very readable and modular.
2.​ Flexibility:
○​ They allow for easy integration of semantic actions, custom error handling, and
more fine-grained control over the parsing process.
3.​ Maintainability:
○​ Hand-written recursive descent parsers can be modified more easily than parsers
generated by parser generators like Yacc or Bison, making them preferable in
cases where the grammar changes frequently.

Disadvantages:

Despite their advantages, recursive descent parsers have limitations:

1.​ Left Recursion:


○​ Recursive descent parsers cannot handle left-recursive grammars without
transformations to remove left recursion, which can make the grammar more
complicated to maintain.


2.​ Limited to LL(k):


○​ Recursive descent parsers work best for LL(k) grammars (where decisions can
be made with k lookahead tokens). More complex grammars, such as those
requiring arbitrary lookahead or involving left recursion, are not suitable for
recursive descent.
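The left-recursion limitation has a standard workaround: a left-recursive rule such as E ::= E + T | T is rewritten as E ::= T ('+' T)*, which trades the recursion for a loop while preserving left associativity. A minimal sketch (the names are mine, not from the course code):

```python
def parse_term(tokens):
    # In this toy grammar, a term is a single number.
    return tokens.pop(0)

def parse_expr(tokens):
    # E ::= T ('+' T)*  -- the loop stands in for the
    # left-recursive rule E ::= E + T | T.
    value = parse_term(tokens)
    while tokens and tokens[0] == '+':
        tokens.pop(0)                  # consume '+'
        value = ('+', value, parse_term(tokens))
    return value

print(parse_expr(['1', '+', '2', '+', '3']))
# ('+', ('+', '1', '2'), '3') -- left-associative, as desired
```

Coding E ::= E + T directly as a function that calls itself first would recurse forever; the loop builds the same left-leaning tree iteratively.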

And who invented Recursive Descent Parsing?


Well, that's a tough question! I'd say that either Peter Lucas or E.T. Irons is at the origin of recursive descent parsing. Both of their works were published in 1961 and describe top-down parsing techniques for ALGOL 60, making them foundational texts for recursive descent parsing. Let's explore the two contributions in detail:

E.T. Irons: "A Syntax Directed Compiler for ALGOL 60"

●​ Published in Communications of the ACM in 1961, Irons’ paper is one of the earliest to
describe syntax-directed compilation, focusing on the direct use of grammar rules to
guide parsing.
●​ Irons’ method was top-down:
○​ He demonstrated how to write a compiler for ALGOL 60 by systematically
working through grammar rules.
○​ The compiler directly translated grammatical structures into machine code as it
parsed the input.

Relation to Recursive Descent:

●​ Irons’ method essentially hardcoded grammar rules into the compiler’s structure.
●​ While not named as such, this is an early form of recursive descent parsing because
the parsing process recursively followed grammar rules to recognize syntactic structures.

Peter Lucas: "Die Strukturanalyse von Formelübersetzern" ("The Structural Analysis of Formula Translators")

●​ Published the same year as Irons’ work, Lucas’ paper is also fundamental to recursive
descent parsing.
●​ Lucas focused on structured top-down parsing and explicitly described the use of
recursive procedures for parsing grammar rules.

Relation to Recursive Descent:

●​ Lucas formalized the recursive nature of the parsing process, explicitly using recursive
functions to implement grammar rules.
●​ His approach more closely resembles modern recursive descent parsing, with each
grammar rule corresponding to a specific recursive function.

Which Is the First Description of Recursive Descent?


Both papers contribute significantly, but they approach the problem from slightly different angles:

1.​ Irons described a more general syntax-directed compilation process, which included
elements of recursive descent but wasn’t exclusively focused on parsing.
2.​ Lucas gave a more explicit and systematic description of recursive descent parsing as
we recognize it today.

For this reason:

●​ If we are looking for the first application of recursive descent parsing: Irons.
●​ If we are looking for the first explicit description of recursive descent parsing: Lucas.

Broader Context: Why Recursive Descent Was Revolutionary

Recursive descent parsing was pivotal because:

●​ It provided a simple, systematic way to implement parsers directly from a grammar.


●​ It aligned well with ALGOL 60’s rich syntax and recursive constructs, making it a natural
fit for the language.

The method’s simplicity and clarity ensured its widespread adoption and long-term influence,
even in modern parsing techniques.

Conclusion

Both Irons and Lucas played foundational roles in the development of recursive descent parsing
in 1961. Lucas’ description is often considered the first explicit description, while Irons’ work
laid the groundwork as part of a broader exploration of syntax-directed compilation. Both
deserve recognition for their contributions to this key advancement in compiler design.

Is this the same Peter Lucas who inspired the design of PL/I?
Yes, pretty much! Peter Lucas played a key role in the formal definition of PL/I through his work
on the Vienna Definition Language (VDL). Peter Lucas (born 1935 in Austria) is a pioneering
computer scientist known for his contributions to programming language theory, compiler
design, and formal methods. He studied mathematics and earned his doctorate in 1959 under
Leopold Schmetterer at the University of Vienna. Lucas was part of the team that developed
ALGOL 60 compilers and played a leading role in advancing the formal description of
programming languages.


Throughout his career, Lucas worked on both theoretical and practical aspects of programming,
becoming a key figure in early efforts to formally define programming languages. His influence
extended through his work on the Vienna Definition Language (VDL) and his contributions to
parsing techniques, particularly recursive descent parsing. Lucas held academic and research
positions in Austria and internationally, shaping the foundations of modern computing.

Contributions to Computer Science

1.​ Recursive Descent Parsing (1961):


○​ In his seminal work, Die Strukturanalyse von Formelübersetzern (1961), Lucas
introduced a systematic method for implementing parsers using recursive
procedures.
○​ This approach became known as recursive descent parsing, a technique still
widely used in compiler construction for its simplicity and alignment with the
syntax of many programming languages.
2.​ ALGOL 60 and Compiler Design:
○​ Lucas contributed to the implementation of ALGOL 60, a language that
introduced structured programming and influenced virtually every modern
programming language.
○​ His work involved designing techniques for parsing and translating ALGOL
programs into executable code.
3.​ Vienna Definition Language (VDL):
○​ As part of the Vienna group, Lucas helped develop the Vienna Definition
Language, one of the first tools for formally specifying the semantics of
programming languages.
○​ VDL was famously used to define PL/I, introducing a mathematical approach to
describing language behavior.
4.​ Formal Methods and Semantics:


○​ Lucas advanced the field of formal methods, providing rigorous frameworks for
reasoning about programming languages and their implementation.
○​ His work laid the groundwork for the modern study of programming language
semantics and influenced formal verification techniques.
5.​ Theoretical Computer Science:
○​ Lucas explored topics like syntax-directed translation, block structure in
programming languages, and hierarchical representations of programs.
○​ His theoretical insights into language design and compiler construction have had
long-lasting impacts on both academic research and practical compiler
development.

Legacy

Peter Lucas’ work bridged the gap between theory and practice in early computing. His
pioneering efforts in parsing, formal language definition, and compiler design continue to
influence the development of programming languages and compilers. While his contributions
are often associated with foundational work in the 1960s, their relevance endures in modern
computer science.


Bottom-up Parsing
Code: https://github.com/pronesto/DCC053_Public/tree/main/4_bottomUp

Write a parser for the language of balanced parentheses: S ::= ( S ) S | Empty
from Lexer import *

class Parser:
    """
    This parser implements an attempt to recognize the language of the following
    grammar:

    S ::= ( S ) S | Empty
    """
    def __init__(self, lexer):
        self.lexer = lexer
        self.current_token = self.lexer.next_valid_token()

    def consume(self, expected_type):
        if self.current_token.kind == expected_type:
            self.current_token = self.lexer.next_valid_token()
        else:
            msg = f"Expected {expected_type}, got {self.current_token.kind}"
            raise ValueError(msg)

    def S(self):
        if self.current_token.kind == TokenType.LPR:
            self.consume(TokenType.LPR)  # consume '('
            self.S()                     # parse S
            self.consume(TokenType.RPR)  # consume ')'
            self.S()                     # parse S

    def parse(self):
        self.S()
        if self.current_token.kind != TokenType.EOF:
            raise ValueError(f"Unexpected token {self.current_token.kind}")

def test_parser(input_str):
    """
    This function tests our parser of the Dyck language.

    Example:
    >>> test_parser('()()()')
    ()()() is valid.

    >>> test_parser('(()())')
    (()()) is valid.

    >>> test_parser('((()))')
    ((())) is valid.

    >>> test_parser('')
     is valid.

    >>> test_parser('(()))')
    (())) is invalid: Unexpected token TokenType.RPR

    >>> test_parser('((())')
    ((()) is invalid: Expected TokenType.RPR, got TokenType.EOF
    """
    lexer = Lexer(input_str)
    parser = Parser(lexer)
    try:
        parser.parse()
        print(f"{input_str} is valid.")
    except ValueError as e:
        print(f"{input_str} is invalid: {e}")

The parser we've implemented recognizes the language of balanced parentheses based on the
grammar "S ::= ( S ) S | Empty". It uses a recursive descent approach, where the S method is
called recursively to handle nested structures. The parser consumes tokens from the lexer and
checks for matching pairs of parentheses, raising errors when it encounters unexpected tokens
or if the structure is invalid.

In terms of classification, this parser is an LL parser because it processes the input from left to
right and constructs a leftmost derivation of the sentences in the language. It does not
backtrack; instead, it relies on the structure of the grammar to guide the parsing decisions. If it
encounters an unexpected token, it raises an error rather than attempting to find an alternative
parse path.

Write a parser for the language with no fewer left parentheses than right parentheses.
Here's the grammar:
S ::= ( S | P
P ::= ( P ) | Empty
Let's try to write a recursive descent parser. It should look like:

def S(self):
    token = self.current_token
    if token.kind != TokenType.LPR:
        raise ValueError("Unexpected token")
    # Now, do we call production "S ::= ( S", or "S ::= P"?

In fact, we can't parse this grammar with a predictive recursive descent parser... When we encounter a left parenthesis (, the parser cannot determine whether to expand S or P, because both productions begin with (. The choice between these two productions requires more context than can be provided by just looking at the next token (in this case, "(").


To correctly decide whether to use S or P, the parser would need to see additional tokens,
specifically a right parenthesis ), to clarify the structure. This dependency on future input makes
it impossible for an LL(k) parser (which makes decisions based solely on the current lookahead
tokens) to parse the grammar correctly, as it can't backtrack or re-evaluate choices based on
later tokens. So, while the grammar is not ambiguous, the inability to make a correct decision
upon seeing a left parenthesis renders it unsuitable for LL(k) parsing.
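The language itself is still recognizable if we give up prediction. The sketch below (my code, not from the course repository) tries every alternative and returns the set of positions where each non-terminal can end, which amounts to unbounded lookahead:

```python
def parse_S(s, i):
    # S ::= ( S | P  -- collect every index where S may end.
    ends = set()
    if i < len(s) and s[i] == '(':
        ends |= parse_S(s, i + 1)
    ends |= parse_P(s, i)
    return ends

def parse_P(s, i):
    # P ::= ( P ) | Empty
    ends = {i}                         # the Empty production
    if i < len(s) and s[i] == '(':
        for j in parse_P(s, i + 1):
            if j < len(s) and s[j] == ')':
                ends.add(j + 1)
    return ends

def accepts(s):
    return len(s) in parse_S(s, 0)

print(accepts('(()'))   # True: one unmatched '(' before a balanced '()'
print(accepts('())'))   # False: more ')' than '('
```

This brute-force strategy is far less efficient than predictive parsing, which is precisely why the LL(k) restriction matters in practice.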

Could you actually recap what an LL(k) parser is?


An LL(k) parser is a type of parser that reads input from left to right and constructs a leftmost
derivation of the sentence with up to k tokens of lookahead. The parser determines which
production rule to apply based on the next k tokens.

And what are LL(k) grammars?


A grammar is said to be LL(k) if it can be parsed by an LL(k) parser. Specifically, an LL(k)
grammar has the property that, for each non-terminal and each possible sequence of up to k
lookahead tokens, there is at most one production rule that the parser can apply. If a grammar is
LL(1), it means that the parser can always decide the next production rule with just one token of
lookahead.
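As a toy example of this property, take the grammar S ::= a S | b: a single token of lookahead uniquely selects a production, so an LL(1) parser can be driven by a table with at most one entry per (non-terminal, lookahead) pair. A sketch (my code, not from the course repository):

```python
# LL(1) table for S ::= 'a' S | 'b': one production per lookahead token.
TABLE = {('S', 'a'): ['a', 'S'], ('S', 'b'): ['b']}

def parse(tokens):
    stack = ['S']
    i = 0
    while stack:
        top = stack.pop()
        if top in ('a', 'b'):              # terminal: must match the input
            if i >= len(tokens) or tokens[i] != top:
                return False
            i += 1
        else:                              # non-terminal: one table lookup
            key = (top, tokens[i] if i < len(tokens) else None)
            if key not in TABLE:
                return False
            stack.extend(reversed(TABLE[key]))
    return i == len(tokens)

print(parse(list('aab')))  # True
print(parse(list('aa')))   # False: no table entry for lookahead None
```

Because each table cell holds at most one production, the parser never guesses and never backtracks; that is exactly the LL(1) condition stated above.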

So, if there are LL(k) grammars, then there should also be LR(k)
grammars?
An LR(k) grammar is a type of context-free grammar that can be parsed by an LR(k) parser.
Here’s what the terms mean:

●​ L: The parser reads the input from left to right.


●​ R: The parser constructs a rightmost derivation in reverse.
●​ k: The parser uses up to k tokens of lookahead to decide which production to use.

And what is an LR(k) parser?


An LR(k) parser is a bottom-up parser that can handle a wider range of grammars compared to
LL(k) parsers. LR parsers are particularly good at dealing with left recursion and are often used
in practice because they can parse a larger class of grammars.

In general, how is an LR parser implemented?


An LR parser is implemented as a state machine. States are of two kinds: shift or reduce. Shift
states read tokens. Reduce states apply production rules. Each state is formed by a stack that
represents the partially constructed parsing tree, plus the stream of tokens. So, for our example
grammar:

S ::= ( S | P


P ::= ( P ) | Empty

We would have the following transition table, assuming input "( ( ) )":

We have two "shift" states, 0 and 2. Why do they handle ")" differently?
The first state "knows" that there is no P on the stack. So, if it reads ")", then it must push "P )" onto the stack. In this way, either we will end up with "(...(P)", which can be reduced, or with "P)", which is an error.

Can you implement this state machine as a Python Program?


Yes, we can use the code below:

from Lexer import *

def match_and_pop(stack, pattern):
    if len(stack) < len(pattern):
        return False
    if stack[-len(pattern):] == pattern:
        stack[-len(pattern):] = []
        return True
    return False

def print_state(state, stack, token):
    print(f"{state} :: {stack} :: {token}")

def shift_0(stack, lexer):
    token = lexer.next_valid_token()
    if token.kind == TokenType.LPR:
        stack.append(TokenType.LPR)
        return 0
    elif token.kind == TokenType.RPR:
        stack.append('P')
        stack.append(TokenType.RPR)
        return 1
    elif token.kind == TokenType.EOF:
        stack.append('S')
        return 1
    else:
        raise ValueError(f"Unknown token: {token.text}")

def reduce_1(stack, _):
    if match_and_pop(stack, [TokenType.LPR, 'P', TokenType.RPR]):
        stack.append('P')
        return 2
    elif match_and_pop(stack, ['P']):
        stack.append('S')
        return 1
    elif match_and_pop(stack, [TokenType.LPR, 'S']):
        stack.append('S')
        return 1
    elif stack == ['S']:
        return -1
    else:
        raise ValueError("Reduce error")

def shift_2(stack, lexer):
    # A P is already on top of the stack, so a ')' just closes it:
    # push only the ')' and let state 1 reduce "( P )" to P.
    token = lexer.next_valid_token()
    if token.kind == TokenType.RPR:
        stack.append(TokenType.RPR)
        return 1
    elif token.kind == TokenType.EOF:
        return 1
    else:
        raise ValueError("Shift error")

state_machine = {0: shift_0, 1: reduce_1, 2: shift_2}

def test_parser(input_str):
    lexer = Lexer(input_str)
    next_state = 0
    stack = []
    try:
        while next_state >= 0:
            action = state_machine[next_state]
            next_state = action(stack, lexer)
        print(f"{input_str} is valid")
    except ValueError:
        print(f"{input_str} is invalid")

Why is the code above the implementation of a state machine?


That code represents a state machine (likely an LR parser) for recognizing parentheses, based
on the code you provided. The states and transitions in the figure correspond to the actions performed by the functions shift_0, reduce_1, and shift_2 in your code. We can see it as the following diagram:

State Machine Components:

●​ States:
○​ The figure has three labeled states: 0, 1, and 2.
○​ Each state corresponds to a specific action in the parser. For example:
■​ State 0: Shifts input into the stack (handled by shift_0 in your code).
■​ State 1: Reduces patterns in the stack (handled by reduce_1 in your
code).
■​ State 2: Handles additional input and shifts (handled by shift_2 in your
code).
●​ Transitions:
○​ The arrows between states indicate the transitions based on specific tokens.
○​ For instance:
■​ From state 0 to state 1, the transition happens if an RPR (right
parenthesis) token or EOF (end of file) is encountered. This is consistent
with the code in shift_0.
■​ From state 1 to state 2, it reduces a valid parentheses expression P ->
(P).
■​ From state 2 to state 1, the transition happens when an RPR token is
encountered.


Breakdown of Code Logic:

1.​ Initial State (0):


○​ In state 0, the parser is waiting for tokens like LPR (left parenthesis) or RPR
(right parenthesis).
○​ The parser calls shift_0 when in state 0 to either shift LPR or RPR to the stack
or to accept the input if EOF is reached.
2.​ State 1 (Reduction):
○​ When the parser transitions to state 1, it attempts to reduce the stack by applying
reduction rules in reduce_1.
○​ The rule checks for parentheses matching patterns like (P) and reduces it to a
P.
○​ If valid reductions are made, it either transitions to state 1 or goes to state 2
depending on the stack contents.
○​ If the stack contains only an S, the parser halts successfully (valid input).
3.​ State 2:
○​ In state 2, the parser can handle additional tokens like RPR or EOF.
○​ If an RPR is found, the parser transitions to state 1, trying to reduce again.

Relation to the Figure:

●​ State 0 corresponds to the leftmost state in the diagram.


○​ This is where the parser starts, handling initial shifts of parentheses or EOF.
●​ State 1 handles reductions, trying to reduce valid patterns like P -> (P) or checking if
the input is fully parsed by matching S.
●​ State 2 is reached after a valid reduction in state 1, allowing the parser to shift more
parentheses or reach the end of the input.

Summary:

The figure is a finite state machine that works with an LR parser designed to match balanced
parentheses. Each state corresponds to an action (shift or reduce) in the parser, and the
transitions between states happen based on the input tokens (LPR, RPR, EOF). The program
you've provided implements this state machine, where the states and transitions in the code
map directly to the states and arrows in the diagram.

Which kind of parsing technique did we use in the above code?


Our parser implementation appears to be a shift-reduce parser based on a state machine
approach. This classification is evident from how the parser uses states to manage transitions
between shifting tokens onto the stack and reducing them according to grammar rules.


Key Features of the Parser:

1.​ Shift and Reduce Actions: The parser uses functions (shift_0, shift_2,
reduce_1) to define actions based on the current state. It shifts tokens onto the stack or
reduces the stack contents when patterns that match the grammar are found.
2.​ State Management: The use of a state machine to dictate transitions allows for flexibility
in handling different parsing scenarios. The current state determines the action to take,
whether to shift or reduce, which is characteristic of shift-reduce parsers.
3.​ Non-Recursive: Unlike a recursive descent parser, this implementation does not rely on
recursive function calls. Instead, it manages the parsing process through an explicit
stack and state transitions, making it more iterative.
4.​ Handles Context Sensitivity: This parser can decide whether to shift or reduce based
on the current stack content and the tokens received, effectively handling the
dependencies inherent in the given grammar.

Overall, this parser effectively implements a bottom-up parsing strategy, classifying it as a shift-reduce parser (or more generally, a bottom-up parser). It does not conform to LL(k) parsing, as it handles more complex grammar dependencies through its shift-reduce mechanism.

Are Bottom-Up Parsers and LR(k) parsers the same thing?


Bottom-up parsers and LR parsers are closely related, but they are not exactly the same.

Bottom-Up Parsers:

●​ General Category: Bottom-up parsing refers to any parsing strategy that starts from the
leaves (tokens) and works up to the root (the start symbol of the grammar). This
category includes several types of parsers, such as LR parsers, SLR parsers, and LALR
parsers.
●​ Reduction: Bottom-up parsers primarily focus on reducing sequences of input tokens
into non-terminals according to the grammar rules until they eventually derive the start
symbol.

LR Parsers:

●​ Specific Type: LR parsers are a specific kind of bottom-up parser. The "LR" stands for
"Left-to-right" scanning of the input and "Rightmost derivation" in reverse.
●​ State Machine: LR parsers use a finite state machine to decide when to shift (add
tokens to the stack) and when to reduce (replace a sequence of tokens with a
non-terminal).
●​ Powerful: LR parsers can handle a larger class of grammars than many other parsing
techniques, including some that are not suitable for LL parsing.


In summary, while all LR parsers are bottom-up parsers, not all bottom-up parsers are LR
parsers. LR parsers are a particular implementation of the bottom-up approach, distinguished by
their method of processing input and managing states.

Is the parser we wrote above an LR parser?


Well, in a way, it's an LR parser; however, it was implemented by hand, so the state transition
table is not very explicit. While that parser employs a shift-reduce strategy similar to that used in
LR parsing, it lacks the formal structure and mechanisms that characterize a true LR parser.

What are the key characteristics of LR parsers?


1.​ Parsing Table: LR parsers typically use a parsing table generated from the grammar,
which defines the shift and reduce actions based on the current state and lookahead
token. Our parser uses a manually defined state machine instead.
2.​ State Management: LR parsers manage a more complex state system, incorporating
multiple states based on the grammar's structure and potential lookahead tokens. Our
parser has a simplified approach with a fixed number of states and lacks the systematic
handling of conflicts that LR parsers do.
3.​ Lookahead Handling: Full LR parsing generally supports lookahead to determine shifts
and reductions, which can help resolve ambiguities and conflicts. Our implementation, in
turn, relies solely on the current stack content and the immediate token.

How do they compare: the LL and the LR parsers?


LL(k) Parsers: These are top-down parsers that construct the parse tree from the top (starting
symbol) to the leaves. They are generally easier to implement but can struggle with
left-recursive grammars.
LR(k) Parsers: These are bottom-up parsers that build the parse tree from the leaves (input
tokens) to the top (starting symbol). They are more powerful than LL(k) parsers because they
can handle a broader set of grammars, including many left-recursive ones.

So, is there a context-free language that can be parsed with an LR parser, but not with an LL parser?
Yes! Here's one example: the language with at least as many left parentheses as right parentheses:

S ::= ( S | P
P ::= ( P ) | Empty

This grammar is not LL(k) for any k, even though it is a context-free language. Here's why: for a
language to be LL(k), the parser must be able to determine the correct production to use based
solely on the next k tokens of lookahead. However, in this grammar:


●​ The parser must decide whether to continue with S ::= ( S or switch to P. This
decision is based on whether there will eventually be a matching ) after a sequence of
(s, which could be far ahead in the input.
●​ When the parser sees a (, it doesn’t know whether this ( belongs to the S ::= ( S branch, which could be part of an arbitrary sequence of ('s, or to the P ::= ( P ) branch, which requires a matching ) later.

When we talk about LL and LR parsers, the first 'L' means that they read
the input from left to right. What about the second letter, L or R?
The second letter in LL and LR parsers refers to how the parser constructs the parse tree:

●​ LL: The second "L" stands for Leftmost derivation. An LL parser constructs a parse
tree by expanding the leftmost non-terminal first. This means it builds the parse tree from
the top down, choosing productions that match the leftmost part of the remaining input.
●​ LR: The "R" stands for Rightmost derivation in reverse. An LR parser constructs the
parse tree by reducing the rightmost non-terminal first, but it does so in reverse order.
This means it builds the parse tree from the bottom up, starting with the input symbols
and working backward to derive the start symbol of the grammar.

Can you give an example of rightmost derivation?


Sure. Consider the expression grammar that is left associative with multiplication having
precedence over addition:
E ::= E + T | T
T ::= T * F | F
F ::= ( E ) | n

Considering the sentence "n0 + n1 * n2", we would have the following derivations:
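In the rightmost derivation, the rightmost non-terminal is expanded at every step:

```
E ⇒ E + T
  ⇒ E + T * F
  ⇒ E + T * n2
  ⇒ E + F * n2
  ⇒ E + n1 * n2
  ⇒ T + n1 * n2
  ⇒ F + n1 * n2
  ⇒ n0 + n1 * n2
```

Notice how the multiplication n1 * n2 is fully derived before the left operand of the addition, reflecting the precedence encoded in the grammar.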


Can you contrast rightmost derivation and leftmost derivation?


The difference between leftmost and rightmost derivations lies in the order in which the
non-terminal symbols in a grammar are expanded during the parsing process.

Derivations:

In context-free grammars, a derivation is a sequence of steps that shows how a string (sentence) is generated from the start symbol of the grammar. At each step, a non-terminal is replaced by one of the productions of the grammar. Let's go over an example with the following grammar, where the goal is to derive the string ab:

S → A B
A → a
B → b

Leftmost Derivation:

In a leftmost derivation, at every step of the derivation, the leftmost non-terminal (the first one
from the left) is expanded (replaced by its corresponding rule). This approach expands
non-terminals starting from the left side of the string.

1. Start from the start symbol S: S → A B


2. In leftmost derivation, we expand the leftmost non-terminal first (which is A): A B → a B
3. Now we expand the next leftmost non-terminal, which is B: a B → a b

This is the leftmost derivation because we always expand the leftmost non-terminal (A first,
then B).

Rightmost Derivation:

In a rightmost derivation, at every step of the derivation, the rightmost non-terminal (the one
furthest to the right) is expanded first. This approach expands non-terminals starting from the
right side of the string.

1. Start from the start symbol S: S → A B
2. In rightmost derivation, we expand the rightmost non-terminal first (which is B): A B → A b
3. Now we expand the next rightmost non-terminal, which is A: A b → a b

This is the rightmost derivation because we always expand the rightmost non-terminal (B first,
then A).

Connection to Parsers:

●​ LL parsers perform leftmost derivation:


○​ LL parsers read the input from left to right and construct a derivation by
expanding the leftmost non-terminal first. These parsers predict what production
rule to use based on the lookahead token.
●​ LR parsers perform rightmost derivation (in reverse):
○​ LR parsers also read the input from left to right, but instead of expanding the
leftmost non-terminal, they simulate a rightmost derivation in reverse (this is
known as shift-reduce parsing). LR parsers start with the entire string and work
backward, reducing the string step by step to the start symbol by applying
productions in reverse order (starting from the rightmost non-terminal).

Are there LL(0) and LR(0) parsers, or is the smallest k equal to 1?


There are both kinds of parsers. LR(0) is more common, although of limited application. LL(0) is
even more restricted: each non-terminal of the grammar can be the left side of only one
production, e.g.:


S ::= ID <name> <CPF>

Assuming that <name> and <CPF> were already scanned by the lexer. The parser that we
have written in this class is LR(1): based on the stack, the state, and the current token, it has
no shift/reduce conflicts.

Are LR(0) parsers and SLR (Simple LR) parsers the same thing?
LR(0) parsers and SLR (Simple LR) parsers are related but not the same. Here's the difference:
●​ Lookahead: LR(0) parsers do not use any lookahead, while SLR parsers use lookahead
derived from Follow sets.
●​ Parsing Table: The construction of the parsing table is different. SLR parsers can
handle grammars that would cause conflicts in an LR(0) parser's table.
●​ Power: SLR parsers are more powerful and can parse a broader set of grammars than
LR(0) parsers.

What is the relationship between these different parsers that we saw?


There is a figure showing this relationship in Modern Compiler Implementation in C, by Andrew
Appel:


Understanding the Hierarchy in the Image

The diagram presents a hierarchy among different types of parsing strategies based on the
languages they can recognize:

1.​ LL(0) ⊆ LL(1) ⊆ LL(k):


○​ These represent top-down parsers, where LL(k) parsers read input from Left
to right and produce a Leftmost derivation, using k symbols of lookahead.
○​ LL(0) can handle only the simplest grammars, while LL(1) (single-symbol
lookahead) can handle a broader set. Increasing k allows LL(k) parsers to
handle even more languages.
2.​ LR(0) ⊆ SLR(1) ⊆ LALR(1) ⊆ LR(1) ⊆ LR(k):
○​ These represent bottom-up parsers, where LR(k) parsers read input from Left
to right and produce a Rightmost derivation in reverse, using k symbols of
lookahead.


○​ LR(0) is the simplest class, with no lookahead, while increasing the complexity to
SLR(1), LALR(1), and LR(1) enables the recognition of more complex
languages.
3.​ LR(k) covers more languages than LL(k):
○​ LR(k) parsers can recognize a broader set of languages than LL(k) parsers for
any given k, which is reflected in the image by the nesting. This means there are
grammars that an LR(k) parser can handle but that an LL(k) parser cannot.

How This Relates to Recursive Descent

The recursive-descent parser from our previous class is typically an LL(k) parser. It parses the
input in a top-down manner with a lookahead of k. So:

●​ Recursive-Descent ≈ LL(k): The two are equivalent in terms of what they can parse,
though recursive-descent parsers are usually simpler implementations of LL(k) parsing.
●​ According to the figure, LL(k) parsers are properly contained within LR(k) parsers. This
means that LR(k) parsers can recognize a strictly larger class of grammars than LL(k).

Where does the Chomsky Hierarchy sit on the above diagram?


Well, the Chomsky Hierarchy talks about languages. The diagram from Appel's book talks about
grammars. Perhaps, we could have something like the figure below, which I got from Stack
Exchange:


The figure produced by Abhishek Ghosh attempts to relate different types of grammars from the
Chomsky hierarchy with families of parsers.

Relation Between Parser Families and Language Families

The Chomsky hierarchy categorizes languages based on their complexity:

1.​ Regular languages (recognizable by finite automata)


2.​ Context-free languages (recognized by pushdown automata)
3.​ Context-sensitive languages (recognized by linear-bounded automata)
4.​ Unrestricted languages (recursively enumerable, recognized by Turing machines)

Meanwhile, parser families (like LL, LR, etc.) are tied to classes of grammars that define
specific languages. The hierarchy of parsers corresponds to the types of grammars and
languages they can recognize.

Grammar-Parser Correspondence

●​ LL(k) and LR(k) correspond to deterministic context-free grammars. These are the
grammars that can be parsed by deterministic parsers like LL and LR.


●​ Unambiguous grammars are a subset of context-free grammars that can be parsed by
deterministic parsers.
●​ Context-free grammars are recognized by pushdown automata, and parsers such as
LR(1), SLR(1), and LALR(1) recognize context-free languages that fit within specific
subclasses of grammars.
●​ Context-sensitive grammars go beyond context-free grammars and are harder to
parse using standard deterministic parsing algorithms like LR(k) or LL(k). These
grammars typically require more powerful models, like linear-bounded automata, for
parsing.
●​ Unrestricted grammars define recursively enumerable languages, which are beyond
the capabilities of any deterministic parser like LL(k) or LR(k).

Language Family Defined by Parsers

We can talk about a family of languages defined by parsers. For example:

●​ L(LL(k)) would refer to the set of languages that can be parsed by an LL(k) parser.
●​ L(LR(k)) would refer to the set of languages recognized by LR(k) parsers.

Each parser family defines a class of languages:

●​ L(LL(1)) is strictly smaller than L(LL(k)), since increasing k allows recognition of more
complex languages.
●​ L(LR(1)) is larger than L(LL(k)) for any fixed k, meaning LR(1) parsers can recognize all
LL(k) languages, but also some grammars that LL(k) parsers cannot.
●​ L(LR(k)) includes a broader class of context-free languages, and when k increases, the
set of recognizable languages also increases.

Does It Make Sense?

Yes, it makes sense to think about the hierarchy of parsers and their relation to the Chomsky
hierarchy. The figure from Stack Exchange aims to capture this idea by placing parsers like
LL(k) and LR(k) within a broader context of language classes.

So, in summary, talking about the language families recognized by different parser families
(like L(LR(k))) is meaningful. The figure shows that LR(k) parsers can handle deterministic
context-free grammars, and by extension, the more complex parsers can handle increasingly
larger subsets of context-free or even context-sensitive languages.

Are there languages that we cannot parse with LR(k) grammars?


Sure there are! Consider the C programming language, for instance. How do you parse the
following program?
int main() {
    x * y;
}
Do you parse 'x' as a type (and then 'x * y' declares a variable called y that is a pointer to x),
or do you parse it as another variable (and then 'x * y' is a multiplication)? The statement x *
y; can indeed be ambiguous in terms of parsing, but when it comes to actual compilation in C,
the compiler would reject it if x is not defined as a type or as a variable.

Key Points:

1.​ Ambiguity: In a purely syntactic sense, the statement can be interpreted in two ways: as
a declaration (x as a type) or as an expression (multiplication).
2.​ Compiler Behavior: If x is not defined as a valid type (like an int, float, or a
user-defined type) or as a valid global variable, then the compiler will reject the program
because it cannot resolve the symbol x.

From the above figures, it seems that LR parsers cannot deal with
ambiguous grammars, yet, they deal with the if-then-else/if-then commands
in languages like C and Java. How so?
LR parsers typically can't deal with ambiguous grammars directly. However, when handling
common ambiguous constructions like the dangling else problem in languages like C and
Java (where both if-then-else and if-then commands exist), LR parsers rely on
disambiguation rules embedded into the parsing process.

The Dangling else Problem:

This problem arises because, in grammars for languages like C or Java, both the if-then and
if-then-else constructs are allowed. When the parser encounters an if followed by a
nested if-then construct, it might not know which else corresponds to which if, leading to
ambiguity. Consider the following code:

if (a)
    if (b)
        foo();
    else
        bar();

There are two possible interpretations:

1. The else belongs to the first if:

if (a) {
    if (b) foo();
} else {
    bar();
}

2. The else belongs to the second if:

if (a) {
    if (b) foo();
    else bar();
}

How LR Parsers Resolve this Ambiguity:

LR parsers handle this ambiguity by introducing precedence and associativity rules during
the parsing process. Specifically, languages like C and Java follow a simple disambiguation
rule for the dangling else problem:

●​ "An else always matches the closest previous unmatched if."

This is a semantic rule that's enforced by the grammar design and the parser.

Implementation in the Grammar:

The grammar for an LR parser is written in such a way that it prioritizes matching an else with
the most recent if. Here’s a simplified grammar that demonstrates how this is handled:

stmt → if_stmt
| other_stmt

if_stmt → IF '(' cond ')' stmt
        | IF '(' cond ')' stmt ELSE stmt

In this grammar:

●​ The production IF '(' cond ')' stmt ELSE stmt ensures that when an else is
encountered, it’s matched with the nearest preceding if.
●​ The production IF '(' cond ')' stmt allows if statements without an else.

LR Parsing Tables:

During parsing, the LR parser uses shift-reduce decisions based on the parsing table it
generates from the grammar. When the parser encounters an else, the table guides it to


reduce using the rule IF '(' cond ')' stmt ELSE stmt, ensuring that the else binds to
the closest if. This decision-making process is encoded in the parser’s shift-reduce actions:

●​ When the parser is in a state where it expects an else (right after a stmt that follows
an if without else), it shifts to process the else.
●​ The parser avoids reducing the earlier if production too early, ensuring the else can
be attached to the nearest if.

Example Parsing Process:

Consider the input:

if (a)
    if (b)
        foo();
    else
        bar();

1.​ First if: The parser shifts the first if and matches it to the condition a.
2.​ Nested if: The parser shifts the nested if and matches it to the condition b.
3.​ foo(): The parser shifts the foo() statement.
4.​ else: When the parser encounters else, it shifts and reduces using the rule that
matches else to the innermost if (if (b)), ensuring the correct parsing.

This mechanism ensures the parser always handles the dangling else by matching it with the
closest preceding if.
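The closest-if rule can also be seen concretely in a hand-written parser. The sketch below is a recursive-descent parser, not an LR table, but it enforces the same disambiguation: after parsing the body of an if, it greedily grabs an else if one is present, so the else always binds to the innermost if. The token and tree formats are assumptions of this illustration.

```python
def parse(tokens):
    """Parse a flat token list into nested ("if", cond, then, else_) tuples."""
    pos = [0]

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def advance():
        tok = tokens[pos[0]]
        pos[0] += 1
        return tok

    def parse_stmt():
        if peek() == "if":
            advance()              # consume 'if'
            cond = advance()       # condition token, e.g. 'a'
            then = parse_stmt()
            if peek() == "else":   # greedy: the closest if takes the else
                advance()
                return ("if", cond, then, parse_stmt())
            return ("if", cond, then, None)
        return advance()           # a plain statement, e.g. 'foo()'

    return parse_stmt()

# "if (a) if (b) foo(); else bar();" -- the else attaches to the inner if:
tree = parse(["if", "a", "if", "b", "foo()", "else", "bar()"])
# tree == ("if", "a", ("if", "b", "foo()", "bar()"), None)
```

The greedy check `if peek() == "else"` inside parse_stmt plays the same role as the shift action in the LR table: the parser prefers to consume the else rather than return early to the outer if.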

In Summary:

●​ LR parsers can't handle ambiguity directly, but they can resolve specific ambiguous
constructs like the dangling else through disambiguation rules built into the grammar
and the parser’s shift-reduce actions.
●​ The "match else to the closest if" rule is enforced through the structure of the
grammar and the decisions made by the parser’s parsing table.

And why does the C language require us to put declarations before uses? Is
it to help the parser?
Pretty much! The requirement for declarations before use in C serves primarily to inform the
parser and compiler about the nature of symbols—whether they are user-defined variables,
functions, or types.


1.​ Type Information: Declarations provide the necessary type information that the compiler
needs to interpret symbols correctly. For example, without a declaration, the compiler
cannot determine whether x is a variable or a type, leading to ambiguity in our previous
example.
2.​ Correct Parsing: When the parser encounters a symbol, having prior declarations
allows it to resolve the symbol's meaning immediately. This avoids situations where the
parser must guess or defer the decision, which could lead to errors or incorrect
interpretations.

Are there other examples of parsing ambiguities in the C programming
language?
There are indeed several situations where the C compiler might encounter ambiguity in
determining whether a symbol is a variable or a type. Here are some additional examples:

Function Calls vs. Variable Declarations

●​ Function Call: a(b);


○​ If a is a function name, this is a function call.
○​ If a is a variable, it's a semantic error.
●​ Variable Declaration: a(b);
○​ If a is a type, this is a variable declaration.
○​ If a is a variable, it's a semantic error.

The compiler typically resolves this ambiguity by looking at the context. For example, if a is
followed by parentheses, it's more likely to be a function call.

Coercion vs. Binary Expression

●​ Coercion: (a)*b; (a)-b;


○​ If a is a type, this is a type cast.
○​ If a is a variable, this is a binary expression.
●​ Binary Expression: (a)*b; (a)-b;
○​ If a is a variable, this is a binary expression.
○​ If a is a type, it's a semantic error.

Again, the context helps the compiler decide. If a is followed by an operator, it's more likely to be
a variable.

Pointer Declaration vs. Multiplication

●​ Pointer Declaration: a * b;


○​ If a is a type, this is a pointer declaration.


○​ If a is a variable, it's a semantic error.
●​ Multiplication: a * b;
○​ If a is a variable, this is a multiplication.
○​ If a is a type, it's a syntax error.

The compiler usually relies on the context and the type of the operands to determine the correct
interpretation.

But then, how can the C compiler parse programs at all?


C compilers employ a combination of lexical analysis, parsing, and semantic analysis to
resolve these ambiguities. They consider factors such as the context of the symbol, the types of
surrounding expressions, and the rules of the C language to make accurate decisions. In C, the
requirement to declare variables and types before using them exists primarily to aid the parser
in its task of interpreting the code correctly.
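One well-known way to implement this interplay is the so-called "lexer hack" used by several C compilers: the lexer consults a symbol table of typedef names and tags each identifier as a type name or as an ordinary identifier. The sketch below is a loose illustration of the idea (the names and the string outputs are assumptions of this sketch, not any compiler's actual API):

```python
typedef_names = set()  # filled in as typedef declarations are parsed

def classify(token):
    """Tag an identifier as a TYPE_NAME or a plain IDENT."""
    return ("TYPE_NAME" if token in typedef_names else "IDENT", token)

def interpret(stmt_tokens):
    """Decide whether 'x * y ;' is a declaration or a multiplication."""
    kind, name = classify(stmt_tokens[0])
    if kind == "TYPE_NAME":
        return f"declaration: {stmt_tokens[2]} is a pointer to {name}"
    return f"expression: {name} times {stmt_tokens[2]}"

assert interpret(["x", "*", "y", ";"]) == "expression: x times y"
typedef_names.add("x")  # as if 'typedef int x;' had been parsed earlier
assert interpret(["x", "*", "y", ";"]) == "declaration: y is a pointer to x"
```

The same token sequence is interpreted differently before and after the typedef is recorded, which is exactly why declarations must precede uses.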

So, in the end, C's grammar is not context free?


You're correct that while the C language grammar is often described as context-free, it does
have context-sensitive aspects due to the need for type and symbol resolution during parsing
and compilation. Here are some points to consider:

1. Context-Free Grammar:

●​ C can be defined by a context-free grammar (CFG) in terms of its basic syntax rules.
Many elements, such as expressions and statements, can be represented with CFG
constructs.

2. Context-Sensitive Features:

●​ Type Resolution: The interpretation of symbols as types or variables depends on their
declarations, which introduces context-sensitive requirements. The parser needs to
know whether a symbol has been declared and what type it corresponds to.
●​ Ambiguity: As you noted earlier, many constructs can lead to ambiguity that a
context-free grammar alone cannot resolve. For example, distinguishing between a
function call and a variable declaration requires knowledge of previous declarations.

3. Practical Implications:

●​ While parsers for C are often built using techniques that work well with context-free
grammars (like LR or LL parsing), additional mechanisms (like symbol tables) are
needed during semantic analysis to handle context-sensitive aspects.


Conclusion

In summary, while the C grammar can be expressed as context-free for many syntactic
constructs, the language's practical parsing and compilation require context-sensitive
information to resolve ambiguities and ensure correct type handling. This is why C compilers
implement a two-phase process: parsing for syntax and semantic analysis for type and symbol
resolution.

Who invented the LL(k) and LR(k) parsers?


LR(k) parsers (bottom-up parsers) were introduced by Donald Knuth in his 1965 paper "On
the Translation of Languages from Left to Right". They parse input from left to right and
construct a rightmost derivation in reverse. The number k represents the number of tokens of
lookahead used to make parsing decisions.

LL(k) parsers (top-down parsers) were formalized shortly afterwards, by Philip Lewis and
Richard Stearns in 1968. They parse input from left to right and construct a leftmost derivation
of the sentence. The LR parser is generally more powerful, capable of handling a broader class
of grammars than LL parsers.

What about recursive-descent parsers? Were they invented even before
LL(k)?
Recursive-descent parsing is a top-down parsing technique that predates formal LL(k) and
LR(k) parsers. It was known and used in practice even before the theoretical foundations of
parsing were formalized in the 1960s. This technique is based on using recursive procedures
(or functions) to process the structure of a grammar. Each non-terminal symbol in the grammar
is represented by a function, which recursively processes its production rules.

Early recursive-descent parsers were developed without formalizing the constraints of
lookahead (k tokens) or the strict classification of grammars like LL(k). These parsers can often
handle simple, predictive grammars (which later became the LL(1) class of grammars), but they
can fail with more complex grammars that require backtracking or are ambiguous.

When Algol was invented, there was not much parsing theory around. It
seems that much of this theory and practice was invented for Algol. Is that
true?
Yes, when ALGOL (Algorithmic Language) was first developed, formal parsing theory was not
yet well-established, and the process of parsing programming languages was still largely ad
hoc. ALGOL, particularly ALGOL 60, is a significant historical milestone because it influenced
the development of formal syntax description methods, including Backus-Naur Form (BNF),
and also contributed to the emergence of formal parsing techniques.


Parsing ALGOL 60

ALGOL 60’s development began in the late 1950s, and it was first specified in 1960. At the time,
the formal understanding of parsers—such as LL(k) and LR(k)—had not yet been developed,
but ALGOL 60’s syntax played a pivotal role in shaping the future of programming languages
and parsing theory.

Context

1.​ Backus-Naur Form (BNF): One of the key innovations introduced with ALGOL 60 was
the use of BNF, which John Backus and Peter Naur developed to formally describe the
syntax of the language. This was a significant step toward a rigorous understanding of
grammar and parsing.
○​ BNF described context-free grammars, which later became the foundation for
parsing theory. However, BNF itself was not a parser but a way to formalize the
grammar in a way that was easier to implement and reason about.
2.​ Ad hoc and Recursive-Descent Parsers: Early ALGOL 60 compilers relied heavily on
recursive-descent parsing techniques, often implemented in an ad hoc manner.
Recursive-descent parsers are top-down parsers, which are hand-crafted to recognize
different productions of the language. These parsers were written directly from the
grammar, with one function per non-terminal symbol in the grammar. However,
recursive-descent parsers can struggle with certain constructs, particularly
left-recursion, which ALGOL 60's grammar possessed. This led to the need for some
clever modifications, such as:
○​ Refactoring left-recursive grammar rules into right-recursive ones to make
them work with recursive-descent parsers.
○​ Backtracking: Some early ALGOL compilers likely used backtracking in their
recursive-descent parsers when choices in the grammar were ambiguous.
3.​ Limited Parsing Theory: Since LL(k) and LR(k) parsing theory hadn’t been developed,
early ALGOL parsers were typically LL(1)-like recursive-descent parsers. They would
rely on just one token of lookahead or implement backtracking when needed. The
limitations of these parsers, such as their inability to efficiently handle certain grammars,
were tolerated or patched over by practical fixes, but they laid bare the challenges of
parsing more complex languages.
4.​ Practicality over Formal Theory: The main focus was getting a working compiler, so
developers often implemented whatever techniques worked in practice. These early
techniques were not yet formalized in the way that later parsers, such as LL(k) and
LR(k) parsers, were.

Influence on Parsing Theory

The difficulties encountered when trying to parse ALGOL 60 motivated significant advances in
parsing theory. For example:


●​ Donald Knuth developed LR parsing partly to handle the class of grammars that
ALGOL 60 fell into, as it was difficult to parse with the simpler LL(1) methods.
●​ The distinction between different grammar types (such as LL vs. LR) became clearer as
formal parser theory evolved in the 1960s, leading to better parsing techniques for
handling languages like ALGOL.

Later ALGOL Parsers

As parsing theory matured, subsequent implementations of ALGOL compilers were built on LR
parsers or LALR parsers, which could handle the language more effectively. These advances
made it possible to develop more efficient and robust parsers, avoiding the earlier pitfalls of ad
hoc recursive-descent methods.

In summary, ALGOL was initially parsed using recursive-descent parsers, often with
backtracking, at a time when formal parsing theory was still nascent. Its syntax and the
challenges it presented significantly influenced the development of BNF and more sophisticated
parsing algorithms like LR parsing, which were created in response to the difficulties of parsing
languages like ALGOL.

What is this BNF notation that is used to specify grammars?


BNF (Backus-Naur Form) is a formal notation used to describe the syntax of programming
languages, particularly context-free grammars. It was originally developed by John Backus
and Peter Naur in the late 1950s and early 1960s to describe the syntax of ALGOL 60. BNF
has since become one of the foundational tools for describing the syntax of many programming
languages and is still widely used today.

Components of BNF

BNF defines grammars in terms of productions, which describe how strings (sequences of
symbols) in a language can be generated. Each production consists of non-terminal symbols,
terminal symbols, and rules.

Here are the key components of BNF:

1.​ Non-terminal symbols:


○​ These are symbols that can be replaced by other symbols (both terminal and
non-terminal) according to the grammar rules.
○​ They usually represent syntactic categories, like expressions, statements, or
program structures.
○​ In BNF, non-terminals are often enclosed in angle brackets < >. For example:
<expression>, <term>, <identifier>.
2.​ Terminal symbols:


○​ These are the basic, indivisible symbols of the language, often corresponding to
tokens like keywords, operators, or literals (e.g., if, +, 123).
○​ Terminal symbols are not replaced further in the grammar; they appear directly in
the input program.
3.​ Production rules:
○​ A production rule specifies how a non-terminal symbol can be replaced with a
sequence of non-terminal and terminal symbols.
○​ The general form of a rule is: <non-terminal> ::= <symbol sequence>.
The ::= symbol is read as "is defined as" or "can be replaced by."
○​ Multiple alternatives for a non-terminal can be specified using the vertical bar |,
meaning "or."

Example of a Simple BNF Grammar

Here’s a simple BNF grammar for arithmetic expressions:

<expression> ::= <term> | <expression> + <term> | <expression> - <term>
<term> ::= <factor> | <term> * <factor> | <term> / <factor>
<factor> ::= <number> | <identifier> | ( <expression> )
<number> ::= [0-9]+
<identifier> ::= [a-zA-Z][a-zA-Z0-9]*

●​ <expression> can be a single <term>, or an addition or subtraction of two
<expressions>.
●​ <term> can be a multiplication or division of two <terms>, or a single <factor>.
●​ <factor> can be a number, an identifier, or a parenthesized <expression>.
●​ <number> represents a sequence of digits.
●​ <identifier> is a sequence of alphanumeric characters, starting with a letter.

How BNF Describes Syntax

BNF grammars describe the valid structure of sentences (or programs) in a language by
recursively defining how higher-level structures (like expressions or statements) can be
constructed from simpler ones. These definitions allow a parser to validate whether a given
input conforms to the language's syntax.

Example: Parsing an Arithmetic Expression

Consider the expression:​


x + 3 * (y - 2)

According to the BNF grammar for arithmetic expressions:


1. The top-level structure is an <expression>, which matches <expression> ::=
<expression> + <term>.
2. The left <expression> is a <term>, which is simply the identifier x.
3. The right <term> is <term> * <factor>, where the <term> is the number 3, and the
<factor> is a parenthesized expression.
4. Inside the parentheses is another <expression>, which is y - 2.

This recursive process continues until all the input is matched to the grammar rules.
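This recursive matching can be turned into a small evaluator. The sketch below is a recursive-descent interpreter for the grammar above, with one function per non-terminal; the left-recursive rules such as <expression> ::= <expression> + <term> are rewritten as loops, the standard transformation for top-down parsing. The function names and the environment parameter are assumptions of this sketch.

```python
import re

def tokenize(src):
    # Numbers, identifiers, operators, and parentheses, as in the grammar.
    return re.findall(r"\d+|[a-zA-Z][a-zA-Z0-9]*|[-+*/()]", src)

def evaluate(src, env):
    toks, pos = tokenize(src), [0]

    def peek():
        return toks[pos[0]] if pos[0] < len(toks) else None

    def advance():
        pos[0] += 1
        return toks[pos[0] - 1]

    def expression():  # <expression> ::= <term> {(+|-) <term>}
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if advance() == "+" else value - term()
        return value

    def term():        # <term> ::= <factor> {(*|/) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if advance() == "*" else value / factor()
        return value

    def factor():      # <factor> ::= <number> | <identifier> | ( <expression> )
        tok = advance()
        if tok == "(":
            value = expression()
            advance()  # consume ')'
            return value
        return int(tok) if tok.isdigit() else env[tok]

    return expression()

# With x = 1 and y = 4, "x + 3 * (y - 2)" evaluates to 1 + 3 * 2 = 7:
assert evaluate("x + 3 * (y - 2)", {"x": 1, "y": 4}) == 7
```

Because <term> calls <factor> and <expression> calls <term>, multiplication and division automatically bind tighter than addition and subtraction, exactly as the grammar's layering dictates.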

Variations of BNF

There are a few variations and extensions of BNF that are commonly used:

1.​ EBNF (Extended Backus-Naur Form): This version includes additional notations to
make grammars more concise and readable. For example, it includes operators like:
○​ [] to indicate optional elements.
○​ {} to indicate repetition (zero or more occurrences).
○​ () to group alternatives.
2.​ ABNF (Augmented BNF): This is a variation of BNF used in defining internet standards
(such as in RFCs). It is used in protocols like HTTP and SIP.

BNF’s Historical Significance

When ALGOL 60 was developed, there was a need for a precise way to define its syntax. John
Backus first proposed BNF as part of the ALGOL 60 report, and Peter Naur refined it. This was
one of the earliest formal methods for describing the syntax of a programming language, and it
had a profound impact on subsequent languages, parsing techniques, and compiler design.

BNF laid the groundwork for the formalization of context-free grammars and contributed directly
to the development of modern parsing techniques, including LL and LR parsing.

Summary of BNF

BNF is a formal grammar notation that defines how strings (programs) in a language are
generated by a set of production rules. It’s crucial for understanding and designing programming
languages, and it's a key tool for specifying syntax in language documentation, as well as for
implementing parsers.


Grammars were invented by Noam Chomsky, right? But he did not seem to
have worked on the design of any programming language. So, what's his
actual contribution?
Noam Chomsky's work on formal grammars, especially his classification of grammars (now
known as the Chomsky hierarchy), had its roots in linguistics, not computer science. His
primary goal was to formalize how natural languages could be structured using rules, which led
to the development of formal language theory in the 1950s.

However, it was later that computer scientists saw the potential of Chomsky's work in defining
the syntax of programming languages. When people began designing high-level programming
languages like ALGOL in the late 1950s and early 1960s, they needed a formal way to describe
the syntax of these languages. Chomsky's grammars, particularly context-free grammars,
provided the ideal tool for this purpose.

How Chomsky’s Work Was Applied to Programming Languages:

1.​ Context-Free Grammars (CFGs):


○​ Chomsky’s context-free grammars (CFGs) are particularly useful for describing
the syntax of programming languages. A CFG allows recursive rules and is
powerful enough to describe constructs like loops, conditionals, and block
structures in programming languages.
○​ This was adopted by John Backus and Peter Naur in the development of
ALGOL 60, one of the earliest attempts to create a formal description of a
programming language's syntax. They used Backus-Naur Form (BNF), which is
essentially a notation for context-free grammars, directly borrowing from
Chomsky’s ideas.
2.​ Parsing Theory:
○​ Chomsky’s grammars allowed for the formal definition of languages, and soon
after, researchers started working on algorithms to parse these languages—that
is, automatically analyze and verify whether a string of text (a program) belongs
to a language defined by a certain grammar.
○​ This led to the development of parsing algorithms like LL and LR, and the
differentiation between top-down (LL) and bottom-up (LR) parsers. These
parsers are rooted in the work of Chomsky’s context-free languages.

Computer Science’s Borrowing of Chomsky's Work:

While Chomsky himself didn’t work directly on any programming language projects, his formal
language theory provided the foundation for the entire field of syntax analysis in computer
science, especially in the development of compilers and interpreters. His classification of
grammars became central to understanding which languages could be recognized and how to
construct parsers for programming languages.

Key contributions borrowed from Chomsky:

●​ Context-Free Grammars: Describing programming language syntax.


●​ Pushdown Automata: Formally underpinning parsing algorithms for CFGs.
●​ Hierarchy of languages: Used to classify the complexity of programming language
constructs.

Summary:

Chomsky’s work on grammars was originally aimed at understanding natural languages, but
computer scientists adapted his ideas to solve practical problems in programming language
design and parsing. His theoretical framework gave rise to the formal methods used to define
and parse the syntax of programming languages, thereby laying the groundwork for modern
compiler design and parsing theory.


Parser Generators and Parser Combinators


Code: https://github.com/pronesto/DCC053_Public/tree/main/5_PracticalParsing

Let's recap: what is parsing?


Parsing is the process of analyzing a sequence of symbols (often text) to determine its structure
according to a set of rules, typically defined by a grammar. In programming, parsing takes
source code (in the ASCII format or similar) or data formats like JSON or XML and converts
them into a structured format, such as an Abstract Syntax Tree (AST), which represents the
hierarchical relationship of components. Parsing is a key step in tasks like compiling code,
interpreting commands, or reading configuration files, turning raw input into something that can
be understood and processed by a computer.

Where do we encounter parsers?


Parsers are encountered in many areas of computing:

1.​ Compilers and Interpreters: They parse source code written in programming
languages to generate intermediate representations, such as ASTs, which are then used
for further processing, optimization, or execution.
2.​ Data Serialization and Deserialization: Parsers are used to read and write data
formats like JSON, XML, and YAML, converting them between text and structured data
formats for software applications.
3.​ Configuration Files: Many applications use parsers to read configuration files, allowing
users to specify settings in a readable format that the software can interpret.
4.​ Natural Language Processing: Parsers analyze human language text to extract
meaningful information, such as in search engines, chatbots, and translation systems.
5.​ Web Browsers: They parse HTML and CSS to render web pages, turning markup and
style sheets into the visual content displayed on screens.
6.​ Data Query Languages: Parsers are used in database systems to interpret SQL
queries, translating them into actions that retrieve or modify data.

What's the difference between parsing and lexing?


Parsing and lexing are two distinct phases in processing text, often used together in compiler
design and data processing:

●​ Lexing: Also known as lexical analysis, lexing is the process of breaking down a
sequence of text into tokens. Tokens are meaningful units such as keywords, identifiers,
operators, and symbols. The lexer (or tokenizer) identifies these units based on
predefined patterns, often using regular expressions. For example, in the code x = 5 +
3, the lexer would produce tokens for x, =, 5, +, and 3.
●​ Parsing: Parsing takes the sequence of tokens produced by the lexer and analyzes their
syntactic structure according to grammatical rules. The parser constructs a hierarchical
structure, such as an Abstract Syntax Tree (AST), that reflects the relationships and
nesting of these tokens. Continuing with the example, the parser would build a tree that
shows the addition operation and the assignment.

In summary, lexing focuses on breaking text into tokens, while parsing focuses on interpreting
the syntactic structure of those tokens.
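To make the distinction concrete, here is a small sketch of a lexer built with Python regular expressions. The token names and patterns below are illustrative choices for this example, not taken from any particular tool:

```python
import re

# Illustrative token patterns, combined into one master regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT",  r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(text):
    """Break text into (kind, lexeme) pairs, skipping whitespace."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(text)
            if m.lastgroup != "SKIP"]

print(lex("x = 5 + 3"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '5'), ('OP', '+'), ('NUMBER', '3')]
```

A parser would then consume this token list to build a tree; the lexer itself knows nothing about the structure of the expression.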

Why are context-free grammars so important for parsing?


Context-Free Grammars (CFGs) are important for parsing because they provide a formal way to
describe the syntax of programming languages and other structured text formats. Here’s why
CFGs are so important:

1.​ Formalism: CFGs offer a precise and well-defined set of rules for specifying the syntax
of a language. This formality ensures that parsers can be designed to correctly and
consistently interpret the structure of the text.
2.​ Hierarchical Structure: CFGs allow for the definition of nested structures, which are
common in programming languages and data formats. For instance, CFGs can describe
how expressions are nested within statements or how blocks of code are nested within
functions.
3.​ Ease of Parsing: Many parsing algorithms are designed to work specifically with CFGs.
For example, algorithms like LL(1) and LR(1) parsers are tailored to handle CFGs
efficiently. CFGs make it possible to use these algorithms to parse and process text
systematically.
4.​ Ambiguity Handling: CFGs can express different levels of grammar complexity,
including handling ambiguities through various techniques. While CFGs themselves can
be ambiguous, there are well-established methods for disambiguating grammars or
transforming them into a form suitable for parsing.
5.​ Flexibility: CFGs are expressive enough to model the syntax of many programming
languages, from simple expressions to complex language constructs. This flexibility
makes CFGs applicable to a wide range of applications beyond just programming
languages, such as configuration files and data serialization formats.

In summary, CFGs are essential because they provide a robust framework for defining and
understanding the syntax of languages, enabling the creation of effective and reliable parsers.
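As a small illustration of point 2, the language of balanced parentheses can be described by the CFG S → '(' S ')' S | ε, but by no regular expression. A minimal recursive recognizer for it, written as a sketch rather than production code, looks like this:

```python
def balanced(s):
    """Recognizer for the CFG S -> '(' S ')' S | epsilon,
    i.e. the language of balanced parentheses (not a regular language)."""
    def parse(i):
        # Try S -> '(' S ')' S as long as an opening parenthesis is ahead.
        while i < len(s) and s[i] == '(':
            i = parse(i + 1)          # parse the inner S
            if i >= len(s) or s[i] != ')':
                return len(s) + 1     # signal failure by going past the end
            i += 1                    # consume ')'
        return i                      # S -> epsilon
    return parse(0) == len(s)

print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```

The recursive call mirrors the self-reference in the grammar rule; it is exactly this kind of nesting that finite automata (and hence regular grammars) cannot track.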

So, while context-free grammars are the key theory for parsing, regular
grammars are the key theory for lexing. Is that correct?
Yes, that's correct! Regular grammars are the key theory for lexing, while context-free grammars
are used for parsing. Here’s a brief overview of their roles:

●​ Regular Grammars: These are used for defining the patterns of tokens in lexing.
Regular grammars are less complex than context-free grammars and are well-suited for
specifying regular languages, which include the basic patterns that lexers need to
recognize, such as keywords, identifiers, literals, and symbols. Lexers use regular
expressions, which are equivalent to regular grammars, to identify and tokenize these
patterns in the text.
●​ Context-Free Grammars: These are used for parsing and defining the hierarchical
structure of a language. CFGs can handle nested and recursive structures, making them
ideal for specifying the syntax of programming languages and other complex data
formats. Parsers use CFGs to build structures like Abstract Syntax Trees (ASTs) based
on the tokens produced by the lexer.

In summary, regular grammars are used for the initial stage of text processing (lexing), where
the goal is to break the text into meaningful tokens, while context-free grammars are used for
the subsequent stage (parsing), where the goal is to interpret the syntactic structure of those
tokens.

Why can't Context-Free Grammars be used for parsing and lexing, if they
are strictly more powerful than Regular Grammars?
Context-Free Grammars (CFGs) are indeed more powerful than Regular Grammars, but their
increased expressiveness comes with complexities that make them less practical for lexing.
Here’s why CFGs are not typically used for lexing:

1.​ Complexity and Performance: Regular grammars, which are equivalent to regular
expressions, are simpler and can be processed efficiently with finite state machines
(FSMs). Lexing involves identifying tokens quickly and efficiently, which regular
expressions and FSMs handle well due to their linear time complexity. In contrast, CFGs
are more complex and often require algorithms with higher time and space complexity,
making them less suitable for the fast, lightweight processing required during lexing.
2.​ Different Requirements: Lexers and parsers have different goals. Lexers need to
recognize and extract discrete tokens from a stream of text, which often involves simple
pattern matching. CFGs, however, are designed to handle hierarchical structures and
nested patterns, which are not typically necessary for the tokenization phase. CFGs
excel at capturing the syntactic structure of a language but are overkill for the token
extraction process.
3.​ Finite State Machines vs. Parsing Algorithms: Lexing can be efficiently implemented
using finite state machines (FSMs) or regular expression engines, which are well-suited
for processing regular languages. Parsers, on the other hand, use more sophisticated
algorithms like LL, LR, or other parsing strategies that can handle the hierarchical and
recursive nature of CFGs. The complexity of these parsing algorithms is justified in the
parsing phase but is unnecessary for simple token recognition.
4.​ Token Boundaries: In lexing, tokens are typically bounded by simple delimiters or
patterns (e.g., whitespace, punctuation), which regular expressions handle effectively.
CFGs are better suited to handle nested and recursive constructs (e.g., nested
parentheses), which are not usually a concern in the tokenization phase.


In summary, while CFGs are more powerful and can theoretically be used for lexing, the
practical requirements of lexing—such as speed, efficiency, and simplicity—make regular
grammars and finite state machines a more appropriate choice. CFGs are reserved for parsing,
where their ability to handle complex, nested structures is essential.
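The efficiency argument in point 1 can be made concrete: a regular token pattern such as NUMBER = [0-9]+ compiles to a finite state machine that scans each character exactly once. A hand-coded sketch of such a table-driven DFA (the states and table layout here are chosen for illustration):

```python
# Transition table of a DFA for the regular language [0-9]+.
# States: 0 = start, 1 = inside a number (accepting); a missing entry = stuck.
def char_class(ch):
    return "digit" if ch.isdigit() else "other"

TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
}
ACCEPTING = {1}

def matches_number(s):
    """Scan s once, character by character, through the DFA."""
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING

print(matches_number("12345"))  # True
print(matches_number("12a45"))  # False
```

Each input character triggers exactly one table lookup, which is why lexers built this way run in linear time regardless of how complicated the token patterns are.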

In our last class, we wrote an LR(1) parser. That was very complicated: we
had to design states and transitions by hand. Aren't there alternatives?
Manually implementing an LR(1) parser can be quite complex and involved, especially when
designing states and transitions. Fortunately, there are easier alternatives and tools available
today that can handle this complexity for you. Here are some common approaches and tools
used for parser generation:

1. Parser Generators

Parser generators automatically produce parser code based on a high-level description of the
grammar. This abstracts away much of the complexity involved in manually creating parsers.
Some popular parser generators include:

●​ ANTLR (Another Tool for Language Recognition): ANTLR is a powerful tool that
generates parsers, lexers, and tree parsers for a variety of programming languages. It
uses a high-level grammar definition and provides a range of features for building
complex language parsers.
●​ Bison: An extension of Yacc (Yet Another Compiler Compiler), Bison is a well-known
parser generator for C/C++ that handles LALR(1) grammars. It is widely used for
generating parsers and can be combined with Flex for lexing.
●​ PEG.js: For JavaScript, PEG.js generates parsers using Parsing Expression Grammars
(PEGs). It simplifies grammar definitions and provides a JavaScript parser generator.

2. Parser Combinators

Parser combinators provide a more flexible and modular approach to building parsers. Instead
of defining a complete grammar in one go, you combine simpler parsers to build more complex
ones:

●​ Parsec (Haskell): A library for writing parsers in Haskell using combinators. It provides a
high-level way to build parsers by combining smaller parsers.
●​ PyParsing (Python): A library for constructing grammars and parsers in Python using a
combinator-based approach. It allows for easy and intuitive grammar definitions.
●​ FastParse (Scala): A parser combinator library for Scala, offering a functional approach
to building parsers.
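Independently of these libraries, the core idea of parser combinators can be sketched in a few lines of plain Python: a parser is a function from (text, position) to (value, new position) or None on failure, and combinators like seq and choice build bigger parsers from smaller ones. All names below are made up for this example and do not belong to any of the libraries above:

```python
# A parser is a function: (text, pos) -> (value, new_pos), or None on failure.

def char(c):
    """Parser that matches a single literal character."""
    def p(text, pos):
        if pos < len(text) and text[pos] == c:
            return (c, pos + 1)
        return None
    return p

def number(text, pos):
    """Parser for one or more digits, returning their integer value."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return (int(text[pos:end]), end) if end > pos else None

def seq(*parsers):
    """Combinator: run parsers in sequence, collecting their values."""
    def p(text, pos):
        values = []
        for q in parsers:
            r = q(text, pos)
            if r is None:
                return None
            v, pos = r
            values.append(v)
        return (values, pos)
    return p

def choice(*parsers):
    """Combinator: try alternatives in order, return the first success."""
    def p(text, pos):
        for q in parsers:
            r = q(text, pos)
            if r is not None:
                return r
        return None
    return p

# "NUMBER '+' NUMBER", or a lone NUMBER, assembled from the pieces above.
addition = choice(seq(number, char('+'), number), seq(number))

print(addition("2+3", 0))  # ([2, '+', 3], 3)
print(addition("42", 0))   # ([42], 2)
```

The appeal of this style is that the grammar and the parser are the same artifact: each production rule becomes an ordinary value in the host language, so parsers can be composed, passed around, and tested like any other function.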

3. Parser Libraries


There are libraries available for various languages that provide built-in parsing capabilities for
common grammar types:

●​ Tree-sitter: An incremental parser used in many modern text editors to provide syntax
highlighting and code navigation.
●​ ANTLR4 Runtime Libraries: For various programming languages, including Java, C++,
Python, and more, making it easier to integrate ANTLR-generated parsers into your
projects.

Summary

While manually implementing an LR(1) parser is an educational exercise and useful for
understanding the underlying mechanisms, in practice, using parser generators, parser
combinators, or specialized libraries is much more efficient and manageable. These tools
abstract away the low-level details and allow developers to focus on defining and using
grammars without dealing with the complexities of state management and transition design.

What is a Parser Generator?


A parser generator is a software tool that takes a formal grammar specification as input and
produces source code for a parser. The generated parser can then be used to analyze and
process input text based on the grammar rules. Parser generators typically handle the following
tasks:

1.​ Lexical Analysis (Tokenization): Some parser generators include tools for generating
lexical analyzers (or lexers) that tokenize the input text. This is often done with a
separate tool or integrated with the parser generator.
2.​ Parsing: The core functionality of a parser generator is to produce code that can parse
the input according to the grammar rules. This involves creating state machines or tables
that represent the parsing logic.
3.​ Error Handling: Parser generators often include mechanisms for handling syntax errors,
providing useful feedback if the input does not conform to the grammar.
4.​ Tree Construction: Many parser generators build data structures like Abstract Syntax
Trees (ASTs) during parsing, which represent the hierarchical structure of the input.

How does a parser generator work?


1.​ Define the Grammar: You start by writing a formal grammar specification that defines
the syntax rules for the language or format you want to parse. This grammar is typically
written in a specific format supported by the parser generator.
2.​ Generate the Parser: The parser generator reads the grammar specification and
produces code for the parser. This code is often written in a programming language like
C, C++, Java, or Python, depending on the tool.


3.​ Integrate and Use: You integrate the generated parser code into your application. You
can then use it to process input text, analyze it according to the grammar, and perform
actions based on the results.

Can you summarize the benefits of using parser generators?


●​ Efficiency: They automate the complex task of writing parsing code, saving time and
reducing errors.
●​ Maintainability: Changes to the grammar only require updating the grammar file, not the
parser code itself.
●​ Consistency: They ensure that the generated parser adheres to the specified grammar
rules and handles syntax correctly.

In summary, parser generators streamline the process of creating parsers by automating the
generation of parsing code based on formal grammar specifications. This allows developers to
focus on defining the language syntax rather than dealing with low-level parsing mechanics.

Let's demonstrate how parser generators work using Antlr. Have you ever
heard of Antlr before?
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator used to read,
process, execute, or translate structured text or binary files. It's widely used for building
compilers, interpreters, and data processors. ANTLR takes a grammar that defines the syntax of
a language and automatically generates a parser (and optionally a lexer) that can recognize and
process text according to that grammar.

ANTLR supports multiple languages, including Java, Python, C#, and JavaScript, making it
flexible for various development environments. Its grammars are highly expressive, supporting
context-free grammars, and it can automatically generate code to handle tokens, parse trees,
and syntax analysis.

Key features of ANTLR include:

●​ Lexer and Parser Generation: Automatically creates a lexer (to recognize tokens) and
a parser (to build the syntax tree) from a given grammar.
●​ Listener and Visitor Patterns: It generates base classes that you can extend to define
custom behaviors during the parsing process, making it easier to build syntax-directed
interpreters and translators.

By using ANTLR, developers can focus on defining the language structure through grammars,
and ANTLR takes care of the low-level details of writing parsers and lexers.


How do we install and use Antlr?


I took these commands from this website. Execute the following two commands to download the
antlr-4.13.1-complete.jar into the /usr/local/lib directory. You can store the .jar anywhere you
want, but then you need to adjust the next commands accordingly.

$ cd /usr/local/lib
$ curl -O http://www.antlr.org/download/antlr-4.13.1-complete.jar

Add that path to the CLASSPATH, so that Java can find that .jar, i.e., the ANTLR tool.

$ export CLASSPATH="/usr/local/lib/antlr-4.13.1-complete.jar:$CLASSPATH"

Check if everything works by doing:

$ java org.antlr.v4.Tool
ANTLR Parser Generator Version 4.13.1
...

We shall be using the Python port of Antlr. To install the Python3 runtime, just do:

$ pip install antlr4-python3-runtime

How can we define the grammar that recognizes arithmetic expressions using the Antlr grammar language?

This grammar defines a simple structure for arithmetic expressions involving addition (+),
multiplication (*), parentheses, and numbers. It is written in ANTLR syntax and can be used to
generate a parser for these expressions. Let’s break down each part:

grammar Expr;

expr   : expr '+' term | term;
term   : term '*' factor | factor;
factor : '(' expr ')' | NUMBER;

NUMBER : [0-9]+;
WS     : [ \t\r\n]+ -> skip;

The expr rule defines an expr (expression) as either:

●​ An expression followed by a + operator and a term (recursive definition), which
represents addition.
●​ Or simply a term on its own (base case).

This allows parsing of expressions like 3 + 5, 2 + 4 + 6, etc.

The term rule defines a term as either:

●​ A term followed by a * operator and a factor (recursive definition), representing
multiplication.
●​ Or simply a factor (base case).

This allows parsing of terms like 3 * 5, 2 * 4 * 6, etc.

A factor can either be:

●​ A subexpression enclosed in parentheses, such as (3 + 4), which forces the
expression within the parentheses to be evaluated first.
●​ Or a NUMBER, which is a terminal representing a number.

The remaining rules define tokens:

●​ The NUMBER token is a lexer rule that matches one or more digits ([0-9]+), effectively
recognizing numbers like 3, 42, 100, etc.
●​ The WS rule matches any sequence of whitespace characters (spaces, tabs, and
newlines) and skips them. The -> skip directive tells the lexer to ignore these
characters when parsing input. This ensures that spaces or newlines between tokens
don’t affect the parsing process.

How do we generate the parser, given that we have the .g4 file?
If you have a .g4 file containing a grammar like the one we just described, then you can
generate all the files that implement the parser with the command:

$ antlr4 -Dlanguage=Python3 Expr.g4

You will see that this command creates several python files, which implement the parser.

And how can we use these files to write a program that computes the value
of arithmetic expressions?
We can extend one of the classes produced by Antlr, the ExprListener. To compute the value
of an arithmetic expression using the ExprListener, we can extend it by implementing the
exitExpr, exitTerm, and exitFactor methods in ExprEval. We'll use these methods to
evaluate the arithmetic expressions as the parser traverses the tree. Specifically, we'll store
intermediate results and perform operations when we exit the relevant parse tree nodes.


Can you show the implementation of a program that extends ExprListener?
Sure. Here it is:

from ExprListener import ExprListener
from ExprParser import ExprParser

class ExprEval(ExprListener):
    def __init__(self):
        # Stack to store intermediate values during traversal
        self.stack = []

    def exitExpr(self, ctx: ExprParser.ExprContext):
        # Check if the expression is of the form expr + term
        if ctx.getChildCount() == 3:
            right = self.stack.pop()
            left = self.stack.pop()
            if ctx.getChild(1).getText() == '+':
                self.stack.append(left + right)
            else:
                raise ValueError(f"Unexpected operation {ctx.getChild(1).getText()}")

    def exitTerm(self, ctx: ExprParser.TermContext):
        # Check if the term is of the form term * factor
        if ctx.getChildCount() == 3:
            right = self.stack.pop()
            left = self.stack.pop()
            if ctx.getChild(1).getText() == '*':
                self.stack.append(left * right)
            else:
                raise ValueError(f"Unexpected operation {ctx.getChild(1).getText()}")

    def exitFactor(self, ctx: ExprParser.FactorContext):
        # Handle parentheses and numbers
        if ctx.getChildCount() != 3 and ctx.NUMBER() is not None:
            # It's a number
            self.stack.append(int(ctx.NUMBER().getText()))

    def getResult(self):
        # After parsing is complete, the result will be the last value in the stack
        return self.stack.pop()

Explanation:

●​ exitExpr: If the current expression is of the form expr + term, it pops two values
from the stack (the left and right sub-expressions) and pushes their sum.
●​ exitTerm: If the current term is of the form term * factor, it pops two values (left
and right sub-terms) and pushes their product.
●​ exitFactor: This handles parentheses or numbers. For parentheses, it simply leaves
the evaluated expression inside on the stack. For numbers, it converts the token text to
an integer and pushes it onto the stack.
●​ getResult: Returns the final result after the parsing process.
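The stack discipline used by the listener can be seen in isolation: a postorder walk of an expression tree pushes operands and combines them on "exit", exactly as the exitExpr and exitTerm methods do. The sketch below is independent of ANTLR's generated classes; the tuple-based tree encoding is invented for this example:

```python
# Each node is ("num", value) or (op, left, right) with op in {"+", "*"}.
def eval_postorder(node, stack):
    """Postorder walk: visit children first, then combine on 'exit',
    mirroring the exitExpr/exitTerm listener methods."""
    if node[0] == "num":
        stack.append(node[1])
    else:
        op, left, right = node
        eval_postorder(left, stack)
        eval_postorder(right, stack)
        r = stack.pop()
        l = stack.pop()
        stack.append(l + r if op == "+" else l * r)

stack = []
# 2 + 3 * 4, with '*' nested deeper so it is combined first.
eval_postorder(("+", ("num", 2), ("*", ("num", 3), ("num", 4))), stack)
print(stack[-1])  # 14
```

Because children are always processed before their parent, the two operands of any operator are guaranteed to sit on top of the stack when the parent is "exited", which is why the listener can safely pop twice.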

We still need a driver, e.g., a program to run the instance of ExprEval. Can
you do it?
You can use the following code to run the parser and apply the ExprEval listener:

import sys
from antlr4 import *
from ExprLexer import ExprLexer
from ExprParser import ExprParser
from ExprEval import ExprEval

def main(input_stream):
    # Create the lexer and parser
    lexer = ExprLexer(InputStream(input_stream))
    stream = CommonTokenStream(lexer)
    parser = ExprParser(stream)

    # Parse the input and create the parse tree
    tree = parser.expr()

    # Create the evaluator and walk the tree
    eval_listener = ExprEval()
    walker = ParseTreeWalker()
    walker.walk(eval_listener, tree)

    # Get the final result
    result = eval_listener.getResult()
    print(f"Result: {result}")

if __name__ == '__main__':
    input_expr = input("Enter an expression: ")
    main(input_expr)

This implementation walks through the tree produced by ANTLR and evaluates the arithmetic
expression based on the operations defined in the grammar. For instance:

% python3 main.py
Enter an expression: 2 + 3 * 4
Result: 14


But the implementation of ExprEval is still complicated. How is it that Antlr simplifies parsing?
ANTLR simplifies the process of writing parsers by handling many of the complex, low-level
tasks that would otherwise need to be implemented manually. While the ExprEval.py might
still seem intricate, the real power of ANTLR lies in how it automates the process of tokenizing
and parsing input, and how it generates a structured framework for interpreting grammar-based
input.

Here’s how ANTLR makes parser development easier:

1. Automated Lexer and Parser Generation:

●​ Without ANTLR: If you were to write a parser manually, you would need to create your
own lexer (to identify tokens like numbers and operators) and design the entire parser by
handling token sequences, managing recursion, and ensuring correct grammar rule
application.
●​ With ANTLR: You only need to define your grammar in a high-level format (like the one
you provided), and ANTLR automatically generates both the lexer (which identifies
tokens) and the parser (which builds the parse tree and enforces grammar rules). This
means you don't need to manually write code to handle recursion, state transitions, or
token recognition—ANTLR does it all.

2. Listener/Visitor Framework:

●​ Without ANTLR: If you wrote a parser manually, you would need to create your own
logic to traverse the parse tree, identify which grammar rules were applied, and figure
out what to do when you encounter certain expressions (like addition, multiplication, or
parentheses).
●​ With ANTLR: ANTLR generates listener and visitor patterns, which allow you to easily
respond to parsing events. For example, in the ExprEval.py, the enter and exit
methods of ExprListener are invoked automatically during parsing. You simply need
to override these methods to execute your desired logic (like evaluating expressions).

3. Error Handling and Reporting:

●​ Without ANTLR: You would need to manually implement error handling to deal with
syntax errors or malformed input, which can be tricky.
●​ With ANTLR: ANTLR automatically provides mechanisms for syntax error detection and
reporting. You can customize error handling if needed, but the basic error checking is
already built into the generated parser.

4. Complex Grammar Support:


●​ Without ANTLR: Designing a parser for more complex grammars (e.g., involving
precedence, associativity, or recursive expressions) can be error-prone and tedious.
Manually handling operator precedence, for example, can become very complicated.
●​ With ANTLR: ANTLR handles precedence, associativity, and recursion for you. The
recursive nature of grammar rules (as in your expr rule) is managed seamlessly by
ANTLR, making it much easier to handle complex expressions like 3 + 4 * 5 where
multiplication has higher precedence than addition.

5. Multi-Language Support:

●​ Without ANTLR: Writing a parser in multiple languages (e.g., Python, Java, C#,
JavaScript) would require developing and maintaining separate implementations for
each language.
●​ With ANTLR: You can define the grammar once and generate parsers for different
languages simply by specifying the target language (-Dlanguage=Python3 for Python,
for example).

6. Parse Trees and Abstract Syntax Trees (AST):

●​ Without ANTLR: You would need to manually build and traverse parse trees or ASTs,
which requires extra effort.
●​ With ANTLR: The generated parser automatically builds a parse tree as it processes
input, and you can traverse this tree easily using listeners or visitors. This makes it
simpler to evaluate expressions, generate code, or perform other tasks based on the
input.

That's a lot of stuff! Can we summarize all of that?


ANTLR simplifies parser development by allowing you to focus on the what (the grammar)
rather than the how (the implementation). Once you define the grammar, ANTLR automates the
complex details of building and traversing parse trees, handling precedence, and managing
recursion. This drastically reduces the effort and complexity compared to writing a parser from
scratch. Here's a conceptual comparison of writing a parser without ANTLR versus with
ANTLR, task by task:

●​ Tokenizing Input (Lexing). Without ANTLR: manually implement logic to identify and
extract tokens. With ANTLR: the lexer is automatically generated based on the grammar.
●​ Recursive Parsing Logic. Without ANTLR: write recursive functions for each grammar
rule manually. With ANTLR: the recursive parsing logic for grammar rules is generated.
●​ Operator Precedence Handling. Without ANTLR: manually write code to enforce
operator precedence and associativity. With ANTLR: precedence and associativity are
handled automatically through the grammar definition.
●​ Error Handling. Without ANTLR: implement custom error detection and reporting for
syntax issues. With ANTLR: built-in syntax error handling and reporting.
●​ Tree Building (Parse Tree/AST). Without ANTLR: manually construct a parse tree or
AST during parsing. With ANTLR: the parse tree is automatically built during parsing.
●​ Tree Traversal. Without ANTLR: write custom logic to walk through the tree and
perform actions. With ANTLR: listeners/visitors are generated to facilitate easy traversal.
●​ Multi-language Support. Without ANTLR: write separate parsers for each target
language (e.g., Java, Python). With ANTLR: write the grammar once and generate
parsers for multiple languages.
●​ Precedence and Associativity. Without ANTLR: handle grammar precedence and
associativity manually in the code. With ANTLR: define precedence and associativity
directly in the grammar; ANTLR handles the rest.
●​ Focus. Without ANTLR: focus on low-level implementation details (tokenization,
recursion, etc.). With ANTLR: focus on grammar design and high-level logic
(evaluation, etc.).

What if I wanted to extend the expression grammar with subtraction and division? What would it look like?
The new grammar would contain two extra production rules, e.g.:

grammar Expr;

expr   : expr '-' term | expr '+' term | term;
term   : term '*' factor | term '/' factor | factor;
factor : '(' expr ')' | NUMBER;

NUMBER : [0-9]+;
WS     : [ \t\r\n]+ -> skip;

And what would be the corresponding implementation of ExprEval?


We would have to add two extra cases to the code, one for each new production rule:


class ExprEval(ExprListener):
    ...

    def exitExpr(self, ctx: ExprParser.ExprContext):
        if ctx.getChildCount() == 3:
            right = self.stack.pop()
            left = self.stack.pop()
            if ctx.getChild(1).getText() == '+':
                self.stack.append(left + right)
            elif ctx.getChild(1).getText() == '-':
                self.stack.append(left - right)
            else:
                raise ValueError(f"Unexpected operation {ctx.getChild(1).getText()}")

    def exitTerm(self, ctx: ExprParser.TermContext):
        if ctx.getChildCount() == 3:
            right = self.stack.pop()
            left = self.stack.pop()
            if ctx.getChild(1).getText() == '*':
                self.stack.append(left * right)
            elif ctx.getChild(1).getText() == '/':
                self.stack.append(left / right)
            else:
                raise ValueError(f"Unexpected operation {ctx.getChild(1).getText()}")

    ...

Which kind of parser does Antlr produce?


ANTLR produces LL(*) parsers. Specifically, it generates parsers that are adaptive,
recursive-descent LL(*).

Here’s what this means:

1.​ LL: ANTLR parsers are top-down parsers that process input from left to right and build a
leftmost derivation of the input string. The "LL" refers to this left-to-right, leftmost
derivation process.
2.​ (*): The * means that ANTLR can look ahead at an arbitrary number of tokens to make
parsing decisions. Unlike traditional LL(k) parsers, which use a fixed amount of
lookahead (k), ANTLR’s LL(*) parser dynamically determines how much lookahead is
needed at each decision point.
3.​ Adaptive LL(*): ANTLR doesn't hard-code how many tokens it needs to look ahead.
Instead, it adapts the amount of lookahead based on the input it’s parsing, making it
more powerful than LL(k) parsers and able to handle complex grammars, including
certain ambiguous or nondeterministic constructs that would be difficult or impossible for
traditional LL parsers to manage.
4.​ Recursive-descent: The parsers that ANTLR generates are recursive-descent parsers,
meaning they use recursive functions to process each grammar rule, which is typical for
LL parsers.


Key points about ANTLR’s LL(*) parser:

●​ Direct Left Recursion: ANTLR can handle direct left recursion through an internal
transformation of the grammar rules.
●​ Backtracking: While ANTLR typically avoids backtracking due to its efficiency, LL(*)
parsing enables it to backtrack and explore multiple lookahead options if needed.
●​ Error Handling: ANTLR parsers include robust error recovery strategies, which are
well-suited for practical programming languages and tools.

In summary, ANTLR generates an adaptive LL(*) recursive-descent parser, which allows it to be more flexible and capable of handling a broader range of grammars than traditional fixed-lookahead LL(k) parsers.

But LL cannot deal with left recursion, right? And our example grammar is
left recursive. How come?
You are right: classical LL parsers (including LL(k)) cannot handle left recursion, because they would enter an infinite loop when encountering left-recursive rules. However, ANTLR can handle direct left recursion through a specific transformation technique.

How ANTLR Handles Left Recursion

ANTLR rewrites left-recursive rules to make them non-left-recursive during parser generation. It
does this automatically through a process known as left-recursion elimination. The generated
parser no longer follows the standard LL(*) behavior for these rules but instead uses a more
specialized parsing strategy for handling recursion.

For example, consider the following left-recursive rule:

expr : expr '+' term
     | term;

This rule is left-recursive because expr appears on the left-hand side of its own production.
Normally, this would cause infinite recursion in an LL parser. However, ANTLR rewrites this
internally into a non-left-recursive form that looks more like this:

expr : term ( '+' term )* ;

This transformation eliminates the direct left recursion by converting it into a loop that can be
handled by the LL(*) parsing strategy.
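To make the rewrite concrete, here is a minimal hand-written sketch (plain Python, not ANTLR output; the token list and helper names are invented for illustration) of how `expr : term ('+' term)*` replaces left recursion with a loop:

```python
# Recursive-descent sketch of: expr : term ('+' term)* ;
# Tokens arrive pre-split; a "term" is just an integer literal here.
def parse_expr(tokens):
    pos = 0

    def parse_term():
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    value = parse_term()          # the mandatory leading term
    while pos < len(tokens) and tokens[pos] == '+':
        pos += 1                  # consume '+'
        value += parse_term()     # the ( '+' term )* loop
    return value

print(parse_expr(['1', '+', '2', '+', '3']))  # 6
```

Because the loop consumes at least one token per iteration, the parser always advances, which is exactly what the left-recursive form failed to guarantee.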

Indirect Left Recursion


ANTLR does not automatically handle indirect left recursion, where recursion occurs across
multiple rules. In such cases, you need to manually refactor the grammar to remove the
recursion or redesign it to avoid the issue.

Summary

●​ ANTLR produces LL(*) parsers, which extend LL(k) by allowing arbitrary lookahead.
●​ ANTLR automatically eliminates direct left recursion by transforming the grammar
during parser generation.
●​ It cannot automatically handle indirect left recursion, which needs to be addressed
manually.

This flexibility is one of the reasons ANTLR can handle a wide range of grammars, including
those that would traditionally be problematic for LL parsers.

So, I understand that another way to produce parsers is through parser combinators. What's that?
Parser combinators are a powerful and flexible approach to building parsers by combining
small, reusable parsing functions or "combinators" to construct more complex parsers. They
offer a different paradigm from traditional parser generators like ANTLR. Fundamentally, parser
combinators are higher-order functions that accept parsers as input and return new parsers as
output. By combining these combinators, you can build complex parsers from simpler ones.

How Do Parser Combinators Work?


1.​ Basic Parsers: Start with basic parsers that can handle simple tasks, like parsing
individual characters or fixed sequences of characters.
2.​ Combinators: Use combinators to combine these basic parsers into more complex
ones. Combinators can sequence parsers, choose between them, or apply
transformations to their results.
3.​ Composition: The idea is to compose parsers in a modular fashion, enabling you to
build and extend parsers easily.
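As a concrete illustration of "higher-order functions that accept parsers and return parsers", here is a toy sketch in plain Python (not any particular library; all names are invented). A parser is modeled as a function from an input string to either a (result, remaining input) pair or None on failure:

```python
# A parser maps a string to (result, rest) on success, or None on failure.
def char(c):
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def then(p1, p2):  # sequencing combinator: p1 followed by p2
    def parse(s):
        r1 = p1(s)
        if r1 is None:
            return None
        v1, rest1 = r1
        r2 = p2(rest1)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return parse

def either(p1, p2):  # choice combinator: try p1, fall back to p2
    def parse(s):
        return p1(s) or p2(s)
    return parse

ab = then(char('a'), char('b'))
a_or_b = either(char('a'), char('b'))
print(ab('abc'))     # (('a', 'b'), 'c')
print(a_or_b('b!'))  # ('b', '!')
```

Real libraries such as parsy follow the same idea, with richer result types and error reporting layered on top.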

Can you give me an example showing how a parser combinator works?


Sure! Let's write a simple parser to read and print dates in the YYYY-MM-DD format using
parsy. Here's the code:

import sys
from datetime import date
from parsy import regex, seq, string, ParseError

def build_parser():
    year = regex("[0-9]{4}").map(int).desc("4 digit year")
    month = regex("[0-9]{2}").map(int).desc("2 digit month")
    day = regex("[0-9]{2}").map(int).desc("2 digit day")
    dash = string("-")

    fulldate = seq(
        year=year << dash,
        month=month << dash,
        day=day,
    ).combine_dict(date)

    return fulldate

def parse_dates():
    fulldate = build_parser()
    try:
        for line in sys.stdin:
            parsed_date = fulldate.parse(line.strip())
            print(parsed_date)
    except ParseError:
        print("Done parsing.")

parse_dates()

The program reads a string in the format YYYY-MM-DD and parses it into a Python date object.
If the input is in the correct format, it prints the resulting date object; otherwise, it raises a
parsing error.

This approach simplifies parsing by breaking the task down into modular components (year,
month, day) and using clear and concise syntax to combine these components into a working
parser. Here's an explanation of the key points of your code using Parsy:

1. Imports:

●​ sys: Used to read input from standard input (sys.stdin).


●​ datetime.date: Represents a calendar date (year, month, and day).
●​ parsy.regex, parsy.seq, parsy.string: These are the core building blocks from
the Parsy library for creating parsers:
○​ regex: Parses text matching a regular expression.
○​ seq: Chains multiple parsers together sequentially.
○​ string: Parses exact string matches (like "-").
●​ parsy.ParseError: Exception type raised if parsing fails.

2. The build_parser function:

●​ This function defines a parser for dates formatted as YYYY-MM-DD:


●​ year:
○​ regex("[0-9]{4}"): Parses a sequence of exactly four digits (the year).
○​ .map(int): Converts the string of digits into an integer.


○​ .desc("4 digit year"): Adds a descriptive label for debugging or error reporting.
●​ month:
○​ Similar to year, but parses a two-digit month.
●​ day:
○​ Parses a two-digit day in the same manner.
●​ dash:
○​ string("-"): Parses exactly the string "-", used as the delimiter between the
year, month, and day.
●​ Combining the pieces with seq:
○​ seq(year=year << dash, month=month << dash, day=day):
■​ This creates a sequence of parsers: it first matches the year followed by
a dash (<< dash), then matches the month followed by another dash,
and finally the day.
■​ seq returns a dictionary-like structure where keys (year, month, day)
map to the parsed values.
○​ .combine_dict(date): This step takes the parsed year, month, and day
and combines them into a date object from the datetime module.
●​ Return value:
○​ The function returns the complete parser for a full date in the format
YYYY-MM-DD.

3. The parse_dates function:

●​ It calls the build_parser function to get the date parser (fulldate).


●​ Reading input:
○​ sys.stdin: Reads lines of input from the standard input (can be used in
interactive or file input contexts).
○​ For each line, it strips surrounding whitespace with line.strip() and passes
it to the parser with fulldate.parse().
●​ Error handling:
○​ If parsing fails, a ParseError is raised, and the function prints "Done
parsing.".

4. Key points about Parsy:

●​ Parser Composition: Parsy allows you to break down complex parsing tasks into
smaller, composable pieces. Each part of the date (year, month, day) has its own parser,
and these are combined sequentially with seq.
●​ Chaining Parsers: The << operator is used to chain parsers together, enforcing the
presence of delimiters like dashes ("-").


●​ Mapping and Transforming: .map(int) is used to transform the parsed string into an
integer. Parsy makes it easy to transform input data while parsing.
●​ Combining into a Result: .combine_dict(date) takes the parsed dictionary of
values and uses them to create a date object, which is returned after successful
parsing.

Can we use combine_dict to produce objects other than date?

The combine_dict method in Parsy can be used to combine parsed values into any kind of
object, not just date objects. The key feature of combine_dict is that it collects the parsed
results into a dictionary, where each key corresponds to a named part of the parsing result, and
then it passes this dictionary to a function or constructor that you provide. The function can
produce any kind of object, depending on what you want.

In your code, combine_dict(date) is being used to take the parsed year, month, and day
from the input (as dictionary keys) and then call the date constructor from the datetime
module to create a date object.

Here’s a breakdown of what’s happening:

1.​ The parser extracts the year, month, and day from the input string and maps them to
integers.
2.​ These values are labeled as year, month, and day (because of the named arguments
in seq(year=..., month=..., day=...)).
3.​ combine_dict takes these values and calls the date constructor like
date(year=..., month=..., day=...), producing a date object.
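The dictionary-to-constructor step is ordinary Python keyword-argument unpacking; nothing parsy-specific is involved. A standalone sketch of what happens after parsing:

```python
from datetime import date

# The parsed fields sit in a dict; the target constructor receives
# them as keyword arguments, which is the calling convention
# combine_dict uses.
parsed = {'year': 1980, 'month': 1, 'day': 7}
d = date(**parsed)  # same as date(year=1980, month=1, day=7)
print(d)  # 1980-01-07
```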

However, you can pass any other function or constructor to combine_dict. For example, if
you wanted to produce a dictionary or some custom object instead of a date object, you could
define a different function or class.

Example: Producing a dictionary instead of a date object

If you wanted to simply return the parsed values as a dictionary:

def build_parser():
    year = regex("[0-9]{4}").map(int).desc("4 digit year")
    month = regex("[0-9]{2}").map(int).desc("2 digit month")
    day = regex("[0-9]{2}").map(int).desc("2 digit day")
    dash = string("-")

    fulldate = seq(
        year=year << dash,
        month=month << dash,
        day=day,
    ).combine_dict(lambda **d: d)  # combine_dict passes keyword args; collect them back into a dict

    return fulldate

This would output a dictionary like:


{'year': 2023, 'month': 10, 'day': 17}

Example: Creating a custom class

You could also pass a constructor for a custom class:

class CustomDate:
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    def __repr__(self):
        return f"CustomDate(year={self.year}, month={self.month}, day={self.day})"

def build_parser():
    year = regex("[0-9]{4}").map(int).desc("4 digit year")
    month = regex("[0-9]{2}").map(int).desc("2 digit month")
    day = regex("[0-9]{2}").map(int).desc("2 digit day")
    dash = string("-")

    fulldate = seq(
        year=year << dash,
        month=month << dash,
        day=day,
    ).combine_dict(CustomDate)  # Uses the CustomDate constructor

    return fulldate

This would produce a CustomDate object like:


CustomDate(year=2023, month=10, day=17)

So, combine_dict is flexible—it can produce any object as long as you provide the
appropriate function or constructor.

Can you modify the implementation a bit, to print the number of days from
the given date to the current date?
To calculate the number of days between each parsed date and the current date, you can
modify the parse_dates function to use Python's datetime.date object and calculate the
difference between the parsed date and the current date using subtraction.

Here’s the modified version of your code:


import sys
from datetime import date
from parsy import regex, seq, string, ParseError

def build_parser():
    year = regex("[0-9]{4}").map(int).desc("4 digit year")
    month = regex("[0-9]{2}").map(int).desc("2 digit month")
    day = regex("[0-9]{2}").map(int).desc("2 digit day")
    dash = string("-")

    fulldate = seq(
        year=year << dash,
        month=month << dash,
        day=day,
    ).combine_dict(date)

    return fulldate

def parse_dates():
    fulldate = build_parser()
    try:
        for line in sys.stdin:
            parsed_date = fulldate.parse(line.strip())
            today = date.today()
            delta = today - parsed_date
            print(f"Parsed Date: {parsed_date}, Days until today: {delta.days}")
    except ParseError as e:
        print(f"Parse error: {e}")

parse_dates()

Changes:

1.​ Difference Calculation: After parsing each date, I calculate the difference between the
parsed date and the current date using today = date.today() and delta =
today - parsed_date.
2.​ Printing the Result: For each line, the script now prints both the parsed date and the
number of days from that date to today.
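The date arithmetic itself needs nothing from parsy: subtracting one `date` from another yields a `timedelta`, whose `days` attribute is the number printed above. A standalone check with fixed dates (so the result does not depend on when it runs):

```python
from datetime import date

birth = date(1980, 1, 7)
reference = date(1980, 1, 17)  # fixed stand-in for date.today()
delta = reference - birth      # date - date yields a timedelta
print(delta.days)  # 10
```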

Could you further modify the parser, so that we could accept dates like
2017-01-02, or 2017-1-02, or 2017-01-2 or 2017-1-2?
Sure! To make the parser more lenient and accept dates in formats such as 2017-01-02,
2017-1-02, 2017-01-2, or 2017-1-2, you need to adjust the regular expressions for months
and days to allow for one or two digits.

Here’s the modified build_parser function:

from parsy import regex, seq, string
from datetime import date

def build_parser():
    year = regex("[0-9]{4}").map(int).desc("4 digit year")
    # Allow one or two digits for month and day
    month = regex("[0-9]{1,2}").map(int).desc("1 or 2 digit month")
    day = regex("[0-9]{1,2}").map(int).desc("1 or 2 digit day")
    dash = string("-")

    fulldate = seq(
        year=year << dash,
        month=month << dash,
        day=day,
    ).combine_dict(date)

    return fulldate

Explanation:

1.​ Month and Day Changes: I changed the regular expressions for month and day from
"[0-9]{2}" to "[0-9]{1,2}", allowing either one or two digits for both.
2.​ Functionality: This parser will now accept any date in the format YYYY-MM-DD, where
the month and day can be written with one or two digits.
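The only change is the quantifier: `{1,2}` accepts one or two digits where `{2}` demanded exactly two. The effect can be checked with the standard `re` module, independently of parsy (the anchored pattern below mirrors the parser's pieces):

```python
import re

# Year stays 4 digits; month and day now allow 1 or 2 digits.
lenient = re.compile(r"^([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})$")

for text in ["2017-01-02", "2017-1-02", "2017-01-2", "2017-1-2"]:
    year, month, day = (int(g) for g in lenient.match(text).groups())
    print(text, "->", (year, month, day))  # every variant gives (2017, 1, 2)
```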

I would like to modify the parser to read the name of a person before the
date, as in "Fernando Pereira: 1980-01-07". Could you do it?
To handle this new input format, we need to modify the parser to recognize the person's name followed by the date. The name can be a combination of letters and spaces, and we'll separate the name from the date by a colon and a space (: ). Then we'll combine the name with the date parsing to calculate the number of days and print the appropriate message. Here's the modified code:

import sys
from datetime import date, datetime
from parsy import regex, seq, string, ParseError

def build_parser():
    # A name can contain letters and spaces
    name = regex("[A-Za-z ]+").desc("person's name")
    year = regex("[0-9]{4}").map(int).desc("4 digit year")
    month = regex("[0-9]{1,2}").map(int).desc("1 or 2 digit month")
    day = regex("[0-9]{1,2}").map(int).desc("1 or 2 digit day")
    dash = string("-")
    colon_space = string(": ").desc("colon and space after name")

    # Full date parsing remains the same
    fulldate = seq(
        year=year << dash,
        month=month << dash,
        day=day,
    ).combine_dict(date)

    fullparser = seq(
        name=name << colon_space,
        birthdate=fulldate
    )

    return fullparser

def parse_dates():
    fullparser = build_parser()
    today = datetime.today().date()

    try:
        for line in sys.stdin:
            parsed_data = fullparser.parse(line.strip())
            person_name = parsed_data['name']
            birthdate = parsed_data['birthdate']
            delta = today - birthdate
            print(f"{person_name} lived {delta.days} days.")
    except ParseError as e:
        print(f"Error parsing input: {e}")

parse_dates()

Changes:

1.​ Name Parsing: I added a name parser using the regular expression "[A-Za-z ]+" to
match a sequence of letters and spaces.
2.​ Colon Separator: I added a colon_space parser to match the : separating the name
and the date.
3.​ Full Parsing: The fullparser now combines the name and birthdate using seq to
create a parser that expects both parts in the correct order.
4.​ Output Format: The function now prints how many days each person has lived, based
on the parsed input.
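One subtle point worth noting: the name pattern stops exactly where it should. Since `:` is not in the `[A-Za-z ]+` character class, the greedy match halts right before the separator instead of swallowing it. A quick check with the standard `re` module:

```python
import re

# The greedy "[A-Za-z ]+" cannot cross the ':' because ':' is not in
# the character class, so the name ends exactly at the separator.
m = re.match(r"([A-Za-z ]+): ", "Fernando Pereira: 1980-01-07")
print(m.group(1))  # Fernando Pereira
```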

What are the advantages and disadvantages of parser combinators?


Parser combinators are a flexible and modular approach to parsing where small, reusable
parsers are combined to create more complex parsers. They offer a high degree of flexibility and
ease of use but might have performance and error handling trade-offs compared to traditional
parser generators.

Advantages of Parser Combinators

1.​ Modularity: Parser combinators promote modular design. You build small parsers for
individual components and then combine them to handle more complex structures.
2.​ Flexibility: You can easily extend or modify parsers by adding or changing combinators
without altering existing parsers significantly.


3.​ Ease of Use: They allow you to write parsers directly in the programming language of
your choice, often making them more intuitive and easier to integrate with the rest of
your code.
4.​ Declarative Style: Parser combinators often lead to a more declarative style of
programming, where you describe what you want to parse rather than how to parse it.

Disadvantages

1.​ Performance: Parser combinators can sometimes be less efficient compared to


specialized parser generators like ANTLR, especially for very complex grammars or
large inputs.
2.​ Error Reporting: While flexible, error reporting can be less sophisticated compared to
tools that generate dedicated error handling code.

Let's compare parsy with Antlr. How would the days-since-birth app be
implemented in Antlr?
Let’s walk through how you can implement a similar parser in ANTLR.

Step 1: Define the Grammar File (DateParser.g4)

You can define the grammar for parsing names and dates in ANTLR in a file called
DateParser.g4:

grammar DateParser;

// Root rule
entry: person_name COLON_SPACE date EOF;

// Name parsing rule: allows letters and spaces
person_name: NAME (SPACE NAME)*;

// Date parsing rule: YYYY-MM-DD
date: year DASH month DASH day;

// Tokens for year, month, day
year: DIGIT DIGIT DIGIT DIGIT;
month: DIGIT DIGIT?;
day: DIGIT DIGIT?;

// Lexer rules
NAME: [A-Za-z]+;
SPACE: ' ';
DASH: '-';
COLON_SPACE: ': ';
DIGIT: [0-9];
WS: [ \t\r\n]+ -> skip; // Skipping whitespace


●​ entry: This is the root rule that matches a person’s name, followed by a colon and
space (COLON_SPACE), and then the date.
●​ person_name: Matches the name, which consists of one or more words (NAME),
separated by spaces.
●​ date: Matches a date in the format YYYY-MM-DD.
●​ year, month, day: These rules ensure proper date format.

Step 2: Generate the ANTLR Parser

To generate the Python parser from this grammar, you need to run ANTLR.

1.​ Download ANTLR from the ANTLR website (antlr.org).
2.​ Run the following command to generate the Python parser:

antlr4 -Dlanguage=Python3 DateParser.g4

This will generate files such as DateParserLexer.py, DateParserParser.py, etc.

Step 3: Implement the Python Script

Now you can write a Python script to use the generated parser and process the parsed data.
Below is the Python code for parsing input and calculating the number of days since the birth
date:

import sys
from datetime import date, datetime
from antlr4 import *
from DateParserLexer import DateParserLexer
from DateParserParser import DateParserParser
from DateParserListener import DateParserListener
from antlr4.error.Errors import ParseCancellationException

class DateListener(DateParserListener):
    def __init__(self):
        self.name = None
        self.year = None
        self.month = None
        self.day = None

    def exitPerson_name(self, ctx: DateParserParser.Person_nameContext):
        # Combine all parts of the name into a single string
        self.name = ' '.join(ctx.getText().split())

    def exitDate(self, ctx: DateParserParser.DateContext):
        # Extract year, month, day from the context and store them
        self.year = int(ctx.year().getText())
        self.month = int(ctx.month().getText())
        self.day = int(ctx.day().getText())

def main():
    today = datetime.today().date()

    try:
        for line in sys.stdin:
            # Create a new input stream and lexer/parser
            input_stream = InputStream(line.strip())
            lexer = DateParserLexer(input_stream)
            stream = CommonTokenStream(lexer)
            parser = DateParserParser(stream)

            # Add an error listener to handle parse errors
            parser.removeErrorListeners()
            parser.addErrorListener(DiagnosticErrorListener())

            # Parse the input and walk the parse tree
            tree = parser.entry()
            listener = DateListener()
            walker = ParseTreeWalker()
            walker.walk(listener, tree)

            # Calculate days from birthdate
            birthdate = date(listener.year, listener.month, listener.day)
            delta = today - birthdate
            print(f"{listener.name} lived {delta.days} days.")

    except ParseCancellationException as e:
        print(f"Error parsing input: {e}")

if __name__ == '__main__':
    main()

Explanation:

1.​ DateListener: This class extends the ParseTreeListener from ANTLR. It listens for
exit events when the parser has finished processing a person’s name and date, and
stores the parsed information (name, year, month, day).
2.​ main():
○​ For each line of input, an ANTLR input stream, lexer, and parser are created.
○​ The DateListener walks the parse tree, collecting information.
○​ Once the listener has gathered the name and date, the script calculates how
many days have passed since the birthdate and prints the result.
3.​ Error Handling: The script catches ParseCancellationException to handle parse
errors gracefully.

Step 4: Running the Script

You can run the script like this:

python3 DateListener.py < dates1.txt


Can you summarize the differences between Antlr and Parsy using a table?
ANTLR is generally more powerful and suitable for larger and more complex grammars. It
provides a clearer separation between lexing and parsing phases and supports advanced
features such as error recovery. However, for small and simple parsers like date parsing, Parsy
offers a quicker and more Pythonic approach without the need to generate code from a
grammar file. Here's a summary of differences:

| Feature           | Parsy                                    | ANTLR                                        |
|-------------------|------------------------------------------|----------------------------------------------|
| Parser Definition | Uses Python code and regex-based parsers | Uses grammar files to define parsing rules   |
| Composability     | Easy to compose parsers in Python        | More declarative, generates a full parser    |
| Lexer             | Built-in with the regex parser           | Separate lexer defined in grammar            |
| Error Handling    | Exception-based with ParseError          | More advanced error handling with listeners  |
| Use Case          | Simple, small parsing tasks              | Large, complex language grammars             |
| Customization     | Straightforward in Python                | Modifiable via listeners and custom actions  |

Could you list and summarize some of the parser generators that exist out
there?
Here's a brief overview of some popular parser generators that one should consider. These
tools differ in terms of flexibility, ease of use, and performance. Depending on the complexity of
the grammar and the environment, different parser generators might be more appropriate:

1. ANTLR (ANother Tool for Language Recognition)

●​ Languages: Supports multiple languages including Java, Python, C#, JavaScript, etc.
●​ Overview: ANTLR is a powerful tool for generating parsers for any language, and it
provides built-in support for generating parsers, lexers, and even listeners and visitors
for tree traversal. It's widely used in both academia and industry for creating interpreters,
compilers, and DSLs (domain-specific languages).
●​ Example Use: It’s excellent for building parsers with well-defined grammars for complex
languages like SQL, programming languages, or configuration files.

2. Bison


●​ Languages: Primarily used with C and C++.


●​ Overview: Bison is a general-purpose parser generator developed by the GNU Project.
It is a successor of Yacc and is used to convert a formal grammar description for a
language into C/C++ code that can parse that language.
●​ Example Use: Often used in systems-level projects, for developing compilers and
interpreters for various programming languages.

3. PEG.js

●​ Languages: JavaScript
●​ Overview: PEG.js is a simple parser generator for JavaScript that uses Parsing
Expression Grammar (PEG) instead of context-free grammars like ANTLR. It’s
lightweight and easy to use, making it suitable for web-based applications and quick
parsing tasks.
●​ Example Use: Useful for building parsers for web-based DSLs or small configuration file
formats.

4. Lemon

●​ Languages: C
●​ Overview: Lemon is a parser generator developed by D. Richard Hipp (the author of
SQLite). It is designed to be simpler and more efficient than Yacc/Bison. Lemon
generates parsers for LALR grammars.
●​ Example Use: SQLite uses Lemon to handle SQL parsing. It's efficient and well-suited
for embedded systems or software requiring high performance.

5. Parboiled

●​ Languages: Java, Scala


●​ Overview: Parboiled is a lightweight parser generator for Java and Scala. Unlike
traditional parser generators, it allows you to define the grammar directly in code, making
it more readable and easily modifiable.
●​ Example Use: Parboiled is often used in Scala projects where DSLs or simpler
language parsers are needed.

6. PLY (Python Lex-Yacc)

●​ Languages: Python
●​ Overview: PLY is a Python implementation of the traditional Lex and Yacc tools,
providing a way to write lexers and parsers in Python. It’s particularly useful for Python
developers who need to quickly create parsers using Python.
●​ Example Use: Useful for Python-based compilers, interpreters, or even parsers for
specific file formats or protocols.


7. Ragel

●​ Languages: Multiple, including C, C++, and Java


●​ Overview: Ragel is a state machine compiler that allows you to define complex lexical
analyzers and parsers using finite state machines. It’s a lower-level tool compared to
others like ANTLR but provides fine control over state machines.
●​ Example Use: Ideal for performance-sensitive applications, such as parsing network
protocols or other binary data formats.

8. Spirit (Boost)

●​ Languages: C++
●​ Overview: Spirit is a library in the Boost C++ Libraries that allows for writing parsers
directly in C++ using template metaprogramming. It provides a very expressive and
powerful way to write recursive-descent parsers.
●​ Example Use: Suitable for C++ developers looking to create complex parsers with full
integration in C++ projects.

9. Flex & Bison

●​ Languages: C/C++
●​ Overview: Flex is a fast lexical analyzer generator, often used with Bison. Flex is used to
recognize tokens from the input, and Bison generates the parser.
●​ Example Use: Commonly used together for creating compilers, interpreters, or any tool
that needs to understand a language’s structure.

Cool! What about parser combinators?


Here's a summary of popular libraries and frameworks that implement parser combinators:

1. Parsy

●​ Languages: Python
●​ Overview: Parsy is a simple and lightweight parser combinator library for Python. It
allows you to construct complex parsers by combining small, reusable parsers. It’s
particularly easy to use and integrate into Python projects.
●​ Example Use: Great for parsing simple DSLs, date formats, and custom file formats
within Python applications.
●​ Why Use It: Easy to learn, integrates well with Python code, and is great for lightweight
parsing tasks.

2. Parsec

●​ Languages: Haskell


●​ Overview: Parsec is one of the most well-known parser combinator libraries, developed
in Haskell. It allows you to build parsers for complex languages in a modular and
readable way, where parsers are defined as first-class values.
●​ Example Use: Ideal for parsing complex languages, building interpreters, or working
with DSLs in Haskell.
●​ Why Use It: Parsec is highly expressive and can handle complex, recursive grammars
with ease. It’s used extensively in Haskell for language and data parsers.

3. Scala Parser Combinators

●​ Languages: Scala
●​ Overview: This library is a parser combinator framework in the Scala standard library. It
allows developers to write parsers as Scala code, composing smaller parsers into larger
ones, making it ideal for DSLs and other parsing tasks.
●​ Example Use: Useful for creating parsers for DSLs in Scala, as well as processing
complex structured data.
●​ Why Use It: Seamlessly integrates with Scala’s functional programming style, making
parsers highly compositional and expressive.

4. Attoparsec

●​ Languages: Haskell
●​ Overview: Attoparsec is a fast, efficient parser combinator library in Haskell, designed
for parsing binary and text data. It emphasizes high performance and efficiency,
especially for network protocols or large data streams.
●​ Example Use: Parsing large text files or binary protocols, such as JSON or HTTP.
●​ Why Use It: Offers better performance compared to Parsec in some scenarios,
especially when working with large data streams.

5. Elm Parser

●​ Languages: Elm
●​ Overview: Elm’s parser library provides parser combinators for building parsers in the
Elm programming language. It follows a similar approach as Parsec, allowing complex
grammars to be built using small parsers.
●​ Example Use: Building custom parsers for Elm-based applications, such as parsing
JSON or custom markup languages.
●​ Why Use It: Works natively in Elm, a functional language for building web applications,
making it a good fit for web-based parsing tasks.

6. FastParse

●​ Languages: Scala


●​ Overview: FastParse is a fast and efficient parser combinator library for Scala. It’s
designed to be both easy to use and high-performance, making it suitable for
production-level parsing tasks.
●​ Example Use: Parsing complex text-based formats like programming languages or
configuration files.
●​ Why Use It: Combines ease of use with performance, and it’s especially good for
situations where you need to build a custom parser quickly but also need good runtime
efficiency.

7. Parsimonious

●​ Languages: Python
●​ Overview: Parsimonious is another Python library that uses a parsing expression
grammar (PEG) approach, which is closely related to parser combinators. It allows for
creating simple, elegant parsers with Pythonic syntax.
●​ Example Use: Parsing structured data formats or defining grammars for custom DSLs in
Python.
●​ Why Use It: It provides a concise syntax for grammar definitions and integrates well with
Python projects.

8. FParsec

●​ Languages: F#
●​ Overview: FParsec is a parser combinator library for F#, designed for building parsers in
a functional style. Like Parsec in Haskell, it is highly modular and allows for defining
parsers by combining smaller ones.
●​ Example Use: Useful for writing parsers for DSLs, programming languages, or
configuration files in F#.
●​ Why Use It: Highly expressive and works well with F#’s functional features, making it a
popular choice for F# developers working on parsing tasks.

9. Megaparsec

●​ Languages: Haskell
●​ Overview: Megaparsec is a modern and flexible parser combinator library in Haskell. It
is similar to Parsec but includes more features, better error reporting, and additional
combinators for more complex parsing needs.
●​ Example Use: Parsing complex grammars for languages and DSLs in Haskell, while
benefiting from rich error reporting.
●​ Why Use It: It’s more feature-rich than Parsec, providing enhanced error handling and
better support for advanced parsing scenarios.

10. Kombu


●​ Languages: Python
●​ Overview: Kombu is a simple parser combinator library for Python. It allows you to
create combinators for parsing data structures or languages in a way that is similar to
Parsec, but for Python.
●​ Example Use: Small-to-medium-scale parsers for structured data formats.
●​ Why Use It: Lightweight and easy to use in Python, making it good for simpler parsing
tasks.

Ok, so, can you again briefly compare parser generators and parser
combinators in terms of advantages and challenges?
Here’s a more detailed expansion on the comparison between Parser Generators and Parser
Combinators, touching on their strengths, challenges, and applications:

Parser Generators

●​ How They Work:


○​ With parser generators, you write a formal grammar specification (often in a file
separate from your code). This grammar defines the syntax rules of the language
or data format you're parsing.
○​ The tool (e.g., ANTLR, Yacc, Bison) takes this grammar and generates code for
the lexer (which breaks the input into tokens) and the parser (which organizes
these tokens according to the grammar). This generated code is then used in
your project to perform the actual parsing.
●​ Advantages:
○​ Efficiency: Generated parsers are often highly optimized, offering better
performance, especially for large-scale or complex grammars. This makes them
suitable for high-performance needs, such as parsing programming languages or
large data formats.
○​ Handling Complex Grammars: Parser generators excel at handling complex,
recursive, and ambiguous grammars. Tools like ANTLR, for example, can work
with context-sensitive grammars and provide mechanisms like backtracking or
lookahead to resolve ambiguities.
○​ Error Handling: Many parser generators provide detailed error reporting and
recovery mechanisms out of the box, making it easier to identify problems in the
input or grammar.
●​ Challenges:
○​ Separate Grammar Definition: Having the grammar in a separate file can
sometimes feel disconnected from the main codebase, making it harder to
integrate with the rest of the logic.
○​ Debugging and Adaptability: Generated parsers can be more difficult to debug
because you’re working with code that is auto-generated. Adapting or modifying
the parser can require changes to the grammar file, followed by regenerating the
parser. This is usually not as flexible or incremental as parser combinators.


●​ Examples:
○​ ANTLR: Used for defining grammars for large-scale languages, including many
programming languages (e.g., SQL, Java).
○​ Yacc/Bison: Classic parser generators that are often used for implementing
language compilers and interpreters.

Parser Combinators

●​ How They Work:


○​ Parser combinators are a functional programming approach to building parsers.
You write small parsers directly in code and compose them to handle more
complex parsing tasks. These small parsers are first-class values, meaning you
can pass them around and combine them like any other value in your code.
○​ Instead of generating code, you construct the parsing logic directly in the
language you're using (e.g., Haskell, Scala, Python).
●​ Advantages:
○​ Flexibility: Since parsers are written directly in the programming language, you
have more control and can easily modify, extend, or adapt them incrementally.
This allows for very dynamic parser construction, ideal for tasks like parsing
user-defined data formats or Domain-Specific Languages (DSLs).
○​ Easier to Debug: Since parser combinators are written in code, it’s often easier
to debug them directly using the same tools you would for the rest of your
application. This allows for interactive development and fine-tuning.
○​ Modularity: Parser combinators allow you to create small, reusable components,
making the code more modular and easier to maintain for simpler grammars.
●​ Challenges:
○​ Performance: While parser combinators offer great flexibility, they generally
aren’t as performant as parser generators. For large or highly complex
grammars, combinator-based parsers might be slower.
○​ Complex Grammars: Handling highly complex or ambiguous grammars with
parser combinators can become cumbersome. As grammars grow more intricate,
it becomes harder to manage them in a combinator-based approach without
introducing inefficiencies or overly complicated code.
●​ Examples:
○​ Parsec/Megaparsec (Haskell): Well-known parser combinator libraries in
Haskell, used for constructing language parsers or small DSLs.
○​ Parsy (Python): A lightweight parser combinator library for simple to moderately
complex parsers in Python.
○​ FParsec (F#): A powerful parser combinator library for F#, good for building
complex parsers while staying within the functional programming paradigm.
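
To make the combinator style concrete, here is a minimal sketch of the idea in Python. This is not the API of any library listed above; every name in it (char, many, seq, alt) is invented for illustration. A parser is simply a function that takes an input string and returns either a pair (value, remaining input) or None on failure:

```python
# A parser is a function: str -> (value, rest) or None on failure.

def char(c):
    """Parser that matches one expected character at the start of the input."""
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def many(p):
    """Apply parser p zero or more times, collecting the results."""
    def parse(s):
        results = []
        while (r := p(s)) is not None:
            value, s = r
            results.append(value)
        return (results, s)
    return parse

def seq(p, q):
    """Run p, then q on the remaining input; pair up their results."""
    def parse(s):
        r = p(s)
        if r is None:
            return None
        v1, rest = r
        r2 = q(rest)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return parse

def alt(p, q):
    """Try p; if it fails, try q on the same input."""
    return lambda s: p(s) if p(s) is not None else q(s)

# Compose small parsers into a bigger one: a sequence of binary digits.
digit = alt(char('0'), char('1'))
number = many(digit)

print(number("0110rest"))  # (['0', '1', '1', '0'], 'rest')
```

Because parsers are ordinary values here, they can be stored, passed around, and composed, which is exactly what libraries like Parsec, FastParse, or FParsec do at a much larger scale.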


How do these approaches compare in terms of complex grammars, error
handling, debugging, etc.?
Handling Complex Grammars:

●​ Parser Generators: Better suited for complex, recursive grammars, especially when
there are ambiguities. Tools like ANTLR or Bison come with built-in capabilities to handle
such complexities, using techniques like backtracking, lookahead, or left-factoring.
●​ Parser Combinators: More challenging to handle complex or ambiguous grammars,
especially as grammars grow larger. While possible, it often leads to verbose and less
efficient code.

Error Handling:

●​ Parser Generators: Typically provide robust error handling and recovery mechanisms
automatically, which is crucial when dealing with user inputs or complex language
constructs.
●​ Parser Combinators: Error handling is more manual, but tools like Parsec and
Megaparsec provide good support for custom error messages and error recovery. Still, it
requires more explicit effort from the developer.

Debugging:

●​ Parser Generators: Debugging generated code can be tricky because you’re working
with auto-generated code that might not be as readable or understandable as
hand-written code.
●​ Parser Combinators: Easier to debug as parsers are written directly in your codebase,
making it straightforward to test, trace, and modify.

Incremental Development:

●​ Parser Generators: Less flexible for incremental development. Changing the grammar
typically requires re-generating the lexer and parser, which can be a heavier process,
especially if the grammar is large.
●​ Parser Combinators: Much more suited for incremental development. Parsers can be
extended and modified directly in the code, allowing for more agile iterations.

Performance:

●​ Parser Generators: Typically more performant because they generate optimized
parsing code tailored to the grammar. This makes them ideal for applications requiring
high-speed parsing, such as compilers.
●​ Parser Combinators: While flexible, parser combinators tend to be slower for
large-scale grammars due to the overhead introduced by composing functions and
performing recursive parsing in code.


Final Comparison

●​ For complex grammars: Parser generators (ANTLR, Bison, Yacc) are better because
they efficiently handle ambiguity, recursion, and performance. They are more suited for
parsing formal languages like programming languages or network protocols.
●​ For simpler or DSL grammars: Parser combinators offer more flexibility, making them
easier to use and modify for lightweight parsing tasks. They shine in functional
programming ecosystems where incremental, compositional parsing is needed.
●​ For error handling: Parser generators usually have the upper hand, as they provide
automatic, detailed error reporting and recovery.
●​ For development ease: Parser combinators are generally easier to use in development
because they don’t require an external toolchain, and parsers can be incrementally
updated. This makes them ideal for quick prototyping and smaller grammars.

Each approach has its strengths, and the choice depends on the complexity of the grammar,
performance needs, and how much flexibility is required in the parsing process.

And which tools are used to build actual programming languages?


Parser Generators are the go-to choice for implementing real-world programming languages
due to their efficiency and ability to handle complex grammars.

●​ ANTLR: Used in languages like C#, Python, SQL dialects, and even for parts of Java.
●​ Bison/Yacc: Frequently used for building language compilers, such as C/C++ compilers.
●​ PEG-based tools like Pest and PEG.js are increasingly used for languages that need
more flexibility in their syntax parsing.

Parser Combinators, while not as commonly used in full-scale programming languages, are
excellent for creating DSLs or embedded languages within other systems. For example:

●​ Parsec is popular for embedded DSLs and languages within Haskell.


●​ FParsec is used to create interpreters or parsers for custom languages in F#.

Is there a place where I can see the grammar of actual programming
languages written in Antlr?
Absolutely! For instance, the grammars-v4 repository on GitHub is an extensive collection of
ANTLR grammars for various programming languages, data formats, and protocols. For
students interested in deepening their understanding of language syntax or developing tools for
analyzing and processing code, this repository is an invaluable resource.

It offers grammars for a wide range of popular languages—like Java 9, C, Python 3.12, and
SQL—as well as more niche or domain-specific languages and formats. By studying these
grammars, you can gain insights into how different languages are structured and how their
syntax rules are formalized. This can be especially useful for comparing the syntax of different


languages or exploring advanced grammar features such as precedence and associativity
handling.

Beyond exploration, this repository can help you jumpstart your own projects. Whether you are
building a compiler, code analyzer, interpreter, or a custom tool that needs to process a specific
language, you can leverage the ready-made grammars to avoid the time-consuming task of
writing one from scratch. You can modify the grammars as needed to suit your specific use case
and combine them with ANTLR’s powerful parsing capabilities to create efficient
language-processing tools.

How to Use the Repository

1.​ Choose a Language: Select the grammar for the language you're interested in from the
repository.
2.​ Generate a Parser: Use ANTLR to generate a parser from the grammar.
3.​ Parse Code: Feed the parser some code written in the chosen language. The parser will
analyze the code and produce an Abstract Syntax Tree (AST), which represents the
structure of the code.

Comparing Languages

By examining the grammars for different languages, you can gain insights into their similarities
and differences. For example, you might notice that some languages have more complex syntax
than others, or that they use different control flow structures.

Creating Tools

Once you have a parser for a language, you can use it to create various tools, such as:

●​ Code Analyzers: Tools that can identify potential issues in code, such as syntax errors
or performance problems.
●​ Code Generators: Tools that can automatically generate code based on a given
specification.
●​ Translators: Tools that can translate code from one language to another.

What's the history of these parser generators? Did it start with yacc?
Well, not really. It might have started a bit before that. While Yacc (Yet Another
Compiler-Compiler), developed by Stephen C. Johnson in 1975, is one of the earliest and most
influential parser generators, it was not the first ever invented. The history of parser generators
goes back further, as part of the broader development of compiler technology in the mid-20th
century. Here's an overview:

Early Developments (1950s-1960s)

1.​ Context-Free Grammars:


○​ The theoretical foundation of parser generators was laid in the 1950s by Noam
Chomsky's work on formal languages and grammars.
○​ John Backus introduced a practical way to define programming languages
through what became the Backus-Naur Form (BNF) in the 1960s for the ALGOL
language.
2.​ Recursive Descent Parsers:
○​ Early parsers were handwritten, often using recursive descent methods, based
on the structure of context-free grammars.
3.​ Automated Parsing Research:
○​ The 1960s saw the first research into automating the process of generating
parsers, particularly for languages defined by regular and context-free grammars.

The Advent of Parser Generators

1.​ Alfred V. Aho and Donald Knuth:


○​ Donald Knuth's work on LR parsing (1965) and Alfred V. Aho's refinements
(notably LALR parsing) were critical steps in the automation of parsing.
○​ These algorithms laid the groundwork for tools like Yacc.
2.​ Early Tools Before Yacc:
○​ The first practical parser generators likely appeared in the late 1960s and early
1970s, often developed as research prototypes or for specific compiler projects.
○​ One notable early tool was the TWS System by T. H. van der Meulen and R. M.
McKeag in the late 1960s.
3.​ Yacc (1975):
○​ Yacc popularized parser generation by making it widely available as part of the
UNIX operating system.
○​ It used LALR(1) parsing, a compromise between the power of LR parsing and the
efficiency of simpler algorithms like LL(1).
○​ Yacc's design inspired many subsequent tools (e.g., GNU Bison).

Post-Yacc Evolution

1.​ GNU Bison:


○​ An open-source replacement for Yacc, providing additional features and
compatibility.
2.​ Other Innovations:
○​ Antlr (1989): Based on LL(k) parsing, Antlr introduced a more user-friendly
approach to defining grammars and handling errors.
○​ JavaCC: Focused on LL parsing, particularly for Java applications.
○​ PEG (Parsing Expression Grammars): Introduced in the 2000s, PEG-based tools
like Packrat parsers are an alternative to traditional CFG-based tools.

Summary


While Yacc wasn't the first parser generator, it was a seminal tool that made parser generation
accessible and practical for a broad audience. It became the foundation for many compilers and
inspired a lineage of tools that expanded on its capabilities. The concept of automated parser
generation predates Yacc, but its introduction marked a turning point in the widespread adoption
of such tools.

Who invented yacc?


Yacc was implemented by Stephen C. Johnson at Bell Labs in the early 1970s. Johnson
developed Yacc as part of the UNIX operating system project, which was spearheaded by Bell
Labs.

Key Details About Stephen C. Johnson's Contribution:

●​ Johnson's Yacc was designed to simplify compiler development by automating the
generation of parsers for context-free grammars.
●​ It used the LALR(1) parsing algorithm, which balanced the power of full LR parsing with
the efficiency needed for practical use.
●​ Yacc became a critical tool for developing many early UNIX utilities and compilers.

Why Yacc Was Significant:

●​ It was included in the UNIX operating system's standard distribution, which made it
widely available to researchers and developers.
●​ Yacc's design was extensible and influenced by earlier theoretical work, including
Donald Knuth's introduction of LR parsing and Alfred Aho's refinements.

Legacy:

●​ Yacc's influence persists in modern parser generators like GNU Bison (a free software
implementation of Yacc) and many other tools inspired by its design principles.
●​ Stephen C. Johnson’s work extended beyond Yacc; he contributed broadly to UNIX
development, including the creation of tools like lint for static code analysis.

What else did Stephen Johnson do for Computer Science?


Oh, he did a lot! Stephen Curtis Johnson is an influential computer scientist best known for his
pioneering work in compiler construction and his contributions to the UNIX operating system
during his tenure at Bell Labs. He earned his Ph.D. in mathematics from MIT in 1967, focusing
on group theory, before transitioning into computer science. At Bell Labs, he became a key
figure in the development of UNIX tools and systems that have shaped modern computing.


Contributions to Computer Science

1. Yacc (Yet Another Compiler-Compiler)

●​ Johnson created Yacc, a groundbreaking parser generator that automates the
construction of parsers based on context-free grammars.
●​ Yacc's integration with UNIX made it widely adopted for developing compilers and
interpreters.
●​ It popularized the use of LALR(1) parsing and influenced subsequent tools like GNU
Bison.

2. Lint (Static Code Analysis Tool)

●​ Johnson developed lint, one of the first tools for static code analysis, aimed at identifying
bugs, inefficiencies, and portability issues in C programs.
●​ Lint set the stage for modern static analysis tools, fostering the development of safer and
more robust software.

3. Contributions to the UNIX Operating System

●​ He contributed to the early development of UNIX, particularly through the creation of
tools and utilities that extended the system's capabilities.
●​ His work on Yacc and lint integrated seamlessly into UNIX workflows, reinforcing the
system's reputation for productivity and innovation.

4. Compiler Research and Education

●​ Johnson’s work significantly advanced compiler design, providing tools and
methodologies that made compiler construction accessible to more developers.
●​ His contributions are documented in influential papers and presentations, many of which
remain core references in compiler theory and practice.


5. Influence on Software Portability

●​ Johnson was deeply involved in improving the portability of software across different
hardware platforms, a challenge during the 1970s and 1980s.
●​ His tools, like lint, helped identify portability issues, aiding the widespread adoption of C
and UNIX across platforms.

Legacy

Stephen C. Johnson's work remains foundational in computer science, particularly in
programming tools, compiler design, and software engineering. His innovations in UNIX and
compiler construction laid the groundwork for many modern software development practices
and tools.


Variables and Bindings


Code: https://github.com/pronesto/DCC053_Public/tree/main/6_LetBinding

What is a variable in a program?


A variable in a program is a symbolic name that represents a storage location in the computer’s
memory, used to hold a value. It allows programmers to store, retrieve, and manipulate data by
referring to that name rather than the actual data itself.

Key Features:

●​ Name: A variable has a unique identifier, or name, chosen by the programmer, which is
used to access its stored value.
●​ Value: The value of a variable is the data it currently holds, which can be of different
types (like integers, strings, etc.).
●​ Binding: A variable is bound to its value, meaning the name is associated with a value
or object in memory.

So, when we write `x = 10` in Python, x is a variable that is bound to the value 10. The
name x acts as a reference to the location in memory where the value 10 is stored.

How do you create variable bindings in SML/NJ?


Let blocks are the primary mechanism for creating variable bindings in SML/NJ. Here's a basic
syntax for a let block:

fun days2ms days =
  let
    val hours = days * 24.0
    val minutes = hours * 60.0
    val seconds = minutes * 60.0
  in
    seconds * 1000.0
  end;

We can create blocks within blocks too, e.g.:

fun days2ms days =
  let
    val hours = days * 24.0
  in
    let
      val minutes = hours * 60.0
    in
      let
        val seconds = minutes * 60.0
      in
        seconds * 1000.0
      end
    end
  end;

How do we create variables in C?


Easy!

#include <stdio.h>

int main() {
    int x = 0;
    {
        int x = 1;
        printf("%d\n", x);
    }
    printf("%d\n", x);
}

And what is the difference between variables in C and in SML/NJ?


●​ C: Variables in C can be mutable, meaning their values can be modified after they are
declared.
●​ SML/NJ: Variables are immutable by default, meaning their values cannot be changed
once they are initialized.

Which techniques could you use to explain this difference more formally?
We could use operational semantics! Operational semantics is a formal way to describe how
programs execute, specifying the behavior of a programming language by defining how each
statement or expression in the language changes the state of the program step by step. It does
so by providing rules that explain how the execution of a particular language construct (like
assignment, conditionals, loops, etc.) affects variables, memory, and control flow.

Key Concepts:

1.​ State: The state of a program typically refers to its variables and the values they hold at
any given point in time, as well as other runtime information like the program counter.
2.​ Transition Rules: Operational semantics uses transition rules to show how the
execution of a statement or expression transforms the program state from one step to
the next. For example:
○​ If x := 5 is executed, the state changes by binding x to the value 5.
○​ If x := x + 1 is executed, it evaluates x, increments its value, and updates the
binding of x in the state.

Can you give me examples of transition rules? Let's assume we don't have
variables yet.
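
The rules themselves appear as a figure in the original notes and are not reproduced here. From the rule names cited in the discussion that follows ([IfTrue], [NotTrue]), they presumably resemble the following reconstruction:

e0 → v0    e1 → v1
------------------ [Add]
e0 + e1 → v0 + v1

e0 → true
-------------------------- [IfTrue]
if e0 then e1 else e2 → e1

e0 → false
-------------------------- [IfFalse]
if e0 then e1 else e2 → e2

e → true
------------- [NotTrue]
not e → false

e → false
------------- [NotFalse]
not e → true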


Why do these rules look like fractions? What do they mean?
They look like fractions because they are written as implication rules in logic, with the top part
(premises) representing conditions that must be met, and the bottom part (conclusion)
representing the result. Let's consider the rule for addition, for instance:

e0 → v0 e1 → v1
----------------------------------
e0 + e1 → v0 + v1

Premises (top): e0 → v0 and e1 → v1 mean that expressions e0 and e1 evaluate to values
v0 and v1, respectively.
Conclusion (bottom): The expression e0 + e1 evaluates to v0 + v1. In other words, after
evaluating both e0 and e1, their values are added.

The top part of the "fraction" (premises) must hold for the bottom part (conclusion) to be valid.
Each arrow (→) represents evaluation: on the left, we have expressions, and on the right, we
have the values they evaluate to. This kind of structure provides a formal way to reason about
the behavior of expressions in a programming language.

Why is operational semantics useful? What can we do with it?


Operational semantics is a method of defining the meaning of programming languages by
describing the effects of executing their constructs. It provides a formal framework for
understanding and reasoning about how programs execute. Here are some key benefits and
uses of operational semantics:

1.​ Formal Specification: It provides a rigorous way to specify the behavior of a
programming language. This helps in creating precise and unambiguous descriptions of
language features and constructs.
2.​ Program Verification: By defining the operational rules of a language, we can reason
about the correctness of programs. It allows for the formal verification of properties such
as safety, security, and correctness of programs based on their operational behavior.


3.​ Compiler Construction: It aids in the design and implementation of compilers.


Understanding how different constructs execute helps in translating high-level code into
machine code while preserving the intended behavior.
4.​ Debugging and Testing: Operational semantics helps in debugging by providing a clear
understanding of what each construct in a program does. This can be used to create test
cases that cover various execution paths and ensure correctness.
5.​ Language Design: It helps in designing new programming languages or modifying
existing ones by providing a clear framework for understanding the implications of
language features and constructs.
6.​ Performance Analysis: Understanding the operational semantics can aid in
performance analysis by revealing how different constructs affect execution time and
resource usage.
7.​ Educational Tool: It serves as a useful educational tool for teaching programming
language concepts and the fundamentals of how programming languages are executed.

Can you give me an example of a formal proof using operational
semantics?
Sure! Consider these rules below:

Prove that "if e0 then e1 else e2" = "if not e0 then e2 else e1":
To show that if e0 then e1 else e2 is equivalent to if not e0 then e2 else e1
using the given operational semantics rules, we need to demonstrate that both expressions
yield the same result for any evaluation of e0, e1, and e2. We will prove this by considering all
possible values of e0 (there are two cases: true or false) and showing that in each case, both
expressions evaluate to the same result.

Case 1: e0 evaluates to true

●​ Evaluate if e0 then e1 else e2: According to [IfTrue], if e0 then e1 else
e2 evaluates to e1.


●​ Evaluate if not e0 then e2 else e1:
○​ First, evaluate not e0. According to the [NotTrue] rule, not e0 evaluates to
false.
○​ Now evaluate if not e0 then e2 else e1. Since not e0 is false, if
not e0 then e2 else e1 evaluates to e1.

In this case, both if e0 then e1 else e2 and if not e0 then e2 else e1 evaluate
to e1. Case 2 is similar.

Is it possible to implement the semantic rules as a Python program?


Absolutely! Here's an example:
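
(In the original notes the example appears as a figure.) A sketch consistent with the rules discussed in this section might look like the code below; the class names (Num, Bool, Add, Not, IfThenElse) are illustrative and may differ from those in the figure:

```python
# Each expression class implements one semantic rule in its eval method.

class Num:
    def __init__(self, num):
        self.num = num
    def eval(self):
        return self.num  # a constant evaluates to itself

class Bool:
    def __init__(self, b):
        self.b = b
    def eval(self):
        return self.b

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        # Premises: evaluate e0 and e1; conclusion: add the two values.
        return self.left.eval() + self.right.eval()

class Not:
    def __init__(self, exp):
        self.exp = exp
    def eval(self):
        # [NotTrue]/[NotFalse]: negate the value of the sub-expression.
        return not self.exp.eval()

class IfThenElse:
    def __init__(self, cond, then, otherwise):
        self.cond, self.then, self.otherwise = cond, then, otherwise
    def eval(self):
        # [IfTrue]/[IfFalse]: evaluate the condition, then one branch.
        return self.then.eval() if self.cond.eval() else self.otherwise.eval()

# The equivalence proved above, checked on one concrete example:
a = IfThenElse(Bool(True), Num(1), Num(2)).eval()
b = IfThenElse(Not(Bool(True)), Num(2), Num(1)).eval()
print(a, b, a == b)  # 1 1 True
```

Each transition rule becomes an eval method: the premises are the recursive calls on sub-expressions, and the conclusion is the value returned.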


How can we add immutable variable names to our semantics?


To add variables to our language of expressions, we shall need a "context"; that is, a table that
associates variable names with values. To add this new construct to our programming language,
we will have to follow these steps:
1.​ Add the new syntax of let expressions.
2.​ Implement the semantic rules.
Find below the new semantic rules:
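
The rules appear as a figure in the original notes; based on the explanation that follows, the two new rules (variable lookup and let-binding) can be reconstructed roughly as:

C[name] = v
------------- [Var]
C ⊢ name → v

C ⊢ e0 → v0    C + {name: v0} ⊢ e1 → v1
---------------------------------------- [Let]
C ⊢ let name = e0 in e1 end → v1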

Can you explain how the let binding rule works?


The new rules include a context C in the judgments, which adds a layer of complexity to the
operational semantics. Let me explain both the context and the new let-binding rule in this file.
The context C refers to an environment or symbol table that maps variable names to their
values (or bindings). The environment keeps track of the current bindings of variables during the
evaluation of expressions. In each rule, C evolves as the program executes, and the rules
describe how expressions are evaluated within this environment.

For example:

●​ C ⊢ e0 → v0 means: "In the context C, expression e0 evaluates to value v0."


●​ C + {name: v0} means: "The context C is updated to include the binding of name to
the value v0."

So, to obtain the value of a let-in-end block, we must perform the following steps:
●​ Evaluate the expression e0: The first step evaluates the expression e0 to a value v0 in
the current context C. This part binds the result of e0 to v0.
●​ Extend the context: The context is extended with a new binding {name: v0},
associating the variable name with the result of e0.


●​ Evaluate e1 in the new context: Using the extended context (C + {name: v0}), the
body expression e1 is evaluated, which results in the value v1.
●​ Return the value of e1: The value v1 is the final result of the entire let expression.

Step-by-step evaluation:
The tree below shows how we can compute the value of let x = 12 in 21 + x end:
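
(The tree appears as a figure in the original notes; rendered in text form, the derivation is roughly:)

                   {x: 12} ⊢ 21 → 21    {x: 12} ⊢ x → 12
{} ⊢ 12 → 12       -------------------------------------
                   {x: 12} ⊢ 21 + x → 33
--------------------------------------------------------
{} ⊢ let x = 12 in 21 + x end → 33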

How could you implement these rules in Python?


The figure at the end of this section implements the semantics for us. Let's break down the
components and explain the operational semantics and their Python implementation:

1. Expression Class

This abstract base class (ABC) defines the interface for all expression types. Every subclass
must implement the eval method, which takes an environment C and evaluates the expression
based on that environment.

2. Var (Variable) Class

This class represents a variable expression. It stores the variable name (self.name) and looks
up the corresponding value in the environment C.

●​ Operational Semantic Rule: If C[name] exists, return the value. If the variable is not
found, it raises an error.

3. Num (Number) Class


This class represents a constant numeric value. The eval method simply returns the number.

●​ Operational Semantic Rule: A constant always evaluates to itself.

4. BinaryExpression and Its Subclasses

This class serves as a base class for binary operations (like addition, subtraction, etc.). It has
two fields: left and right, representing the two sub-expressions of the binary operation. The
actual evaluation logic is provided by subclasses.

Addition (Add)

●​ Operational Semantic Rule: Evaluate both sub-expressions, then return their sum.


5. Let Class

The Let expression binds a variable to a value for the evaluation of another expression.

●​ Operational Semantic Rule:


1.​ Evaluate the definition (exp_def).
2.​ Extend the environment by binding the variable to the evaluated value.
3.​ Evaluate the body of the let expression (exp_body) in the extended
environment.


Summary

The figure combines formal operational semantics and their Python implementation for
evaluating expressions in a simple language. The key point is that each expression type
(variable, number, binary operation, and let-binding) is implemented with its own evaluation
logic, and the rules on the right formalize how these expressions are evaluated step-by-step.
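The figure itself is not reproduced in this text. As a stand-in, here is a minimal sketch of the interpreter it describes, restricted to variables, numbers, addition and let (the complete code appears later in these notes):

```python
import sys
from abc import ABC, abstractmethod

class Expression(ABC):
    @abstractmethod
    def eval(self, env):
        raise NotImplementedError

class Var(Expression):
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        # Rule for variables: look up the binding in the context C.
        if self.name in env:
            return env[self.name]
        sys.exit(f"Undefined variable {self.name}")

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def eval(self, env):
        # Rule for constants: a number evaluates to itself.
        return self.num

class Add(Expression):
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def eval(self, env):
        # Rule for addition: evaluate both sub-expressions and sum them.
        return self.left.eval(env) + self.right.eval(env)

class Let(Expression):
    def __init__(self, name, exp_def, exp_body):
        self.name = name
        self.exp_def = exp_def
        self.exp_body = exp_body
    def eval(self, env):
        # Rule for let: evaluate the definition, extend the context,
        # then evaluate the body in the extended context.
        v0 = self.exp_def.eval(env)
        new_env = dict(env)
        new_env[self.name] = v0
        return self.exp_body.eval(new_env)

# let x = 12 in 21 + x end
e = Let('x', Num(12), Add(Num(21), Var('x')))
print(e.eval({}))  # 33
```

Each eval method mirrors one of the rules: constants return themselves, variables are looked up in the context, and let extends the context before evaluating its body.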

According to these rules, our variables have immutable semantics. How can we see that by looking into the rules?
We can observe this property by examining how the rules handle variable bindings and the
absence of any "assignment" operation that would modify an existing variable's value.
1.​ No Rule for Re-assignment:
●​ In these rules, there is no mechanism for changing the value of a variable once it
has been bound. For example, we do not see any rule like C + {name: v0} →
C + {name: v1} that would update an existing variable binding from one value
(v0) to another (v1).
●​ Every time a new binding is created (e.g., in the let-binding rule), a fresh variable
is introduced, but once a variable is assigned a value, that value does not
change.
2.​ The Let-Binding Rule:
a.​ This rule shows that a new binding (name = v0) is added to the environment (C
+ {name: v0}) when evaluating the let expression, but it never modifies any
existing binding in the context C.
b.​ The context C remains unchanged outside the scope of the let expression. If
name was already bound in C, the new binding {name: v0} shadows the old
one within the scope of the let, but the original binding remains intact outside
that scope.
3.​ Evaluation is based on fixed bindings:
○​ When an expression e0 evaluates to v0, this result is stored in the context as a
permanent association until that context is discarded or overridden by a new,
independent let block. There is no operation to overwrite the value of name
once it's been bound.
4.​ Absence of mutation operations:
○​ In languages with mutable variables (like C or Python), we would expect to see
rules involving assignment statements (e.g., name = new_value). However,
these operational semantics lack any such rule, meaning that variables, once
assigned, cannot be changed.

Thus, immutability is enforced by:

●​ Context updates in a purely additive manner: When a new binding is created, it only
affects the current scope without altering previous bindings.


●​ Lack of mutable operations: The rules don't include any that would change the value
of a bound variable.

But, if we have something like let name = 1 in let name = 2 in name end end, are we not changing name?
It seems like we're "changing" the variable name, but what you're observing here is shadowing
rather than mutation. In immutable semantics, each let introduces a new binding for the
variable within its own scope, temporarily hiding or shadowing the previous binding without
actually changing it.

Let’s break down the example:

let name = 1 in
let name = 2 in
name
end
end

Step-by-step evaluation:

1.​ Outer let name = 1:


○​ This binding introduces name = 1 in the outer context. The environment is now
C = {name: 1}.
○​ Inside this let, we have another let expression to evaluate.
2.​ Inner let name = 2:
○​ Within the body of the outer let, a new let introduces name = 2.
○​ This does not modify the outer name = 1 binding. Instead, it creates a new
binding for name that shadows the previous one, so within the scope of this
inner let, name refers to 2.
○​ The environment inside this inner scope is C + {name: 2}.
3.​ Evaluating the inner name:
○​ The expression name in this inner block refers to the most recent binding, which
is name = 2. Therefore, the result of this inner let expression is 2.
4.​ Back to the outer scope:
○​ After the inner let finishes, we go back to the outer scope, where the original
binding name = 1 still exists. The inner binding of name = 2 is now discarded,
and the environment returns to C = {name: 1}.
○​ The final name in the outer scope now refers to 1.

Key Insight: No Mutation, Only Shadowing


This example shows shadowing rather than mutation. The name in the outer let is never
changed. Each let creates a new, independent binding that exists only within its own scope.
Once you exit that scope, the previous binding remains intact.

In languages with mutable variables, this situation would involve actual modification (mutation)
of the name variable. However, in languages with immutable semantics like SML/NJ, variable
bindings are fixed once they are made. New bindings can temporarily shadow old ones, but they
never modify them.
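Since the interpreter in these notes represents the context as a Python dictionary, the additive nature of shadowing is easy to demonstrate directly:

```python
# Outer context: name = 1
outer = {'name': 1}

# Entering the inner let extends the context with a *new* binding
# that shadows the old one; the outer dictionary is never modified.
inner = {**outer, 'name': 2}

print(inner['name'])  # 2: the inner binding shadows the outer one
print(outer['name'])  # 1: the outer binding survives untouched
```

Once the inner scope is left behind (the inner dictionary is discarded), evaluation continues with the outer context, where name is still bound to 1.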

So, what would the semantics with actual mutable variables look like?
In this case, we'd need rules that allow a variable's value to be updated (mutated) after its initial
binding. With immutable variables, we create new bindings in new scopes (shadowing).
With mutable variables, we directly change the value associated with a variable in the current
environment. In mutable semantics, we'd introduce a rule for assignment. This rule would
update the value of a variable that has already been declared, unlike shadowing where a new
binding is created. Let’s go through what this would look like.

General setup:

●​ Let’s represent a mutable context as a mapping from variable names to locations in memory, and each location maps to a value. For simplicity, we will treat locations like storage cells (or references) that can be updated.

The environment C will map variable names to locations (ℓ), and we’ll maintain a separate
store S that maps these locations to values.

For example:

●​ C(name) = ℓ1: Variable name is bound to location ℓ1.


●​ S(ℓ1) = 5: The value stored at location ℓ1 is 5.

We could implement this semantics with the rules below:
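The figure with these rules is an image in the original notes and is not reproduced here. Based on the explanation that follows, the three rules can be sketched in LaTeX roughly as:

```latex
\frac{C, S_0 \vdash e_0 \rightarrow (S_1, v_0) \quad loc\ \text{is fresh} \quad
      C + \{name{:}\ loc\},\; S_1 + \{loc{:}\ v_0\} \vdash e_1 \rightarrow (S_2, v_1)}
     {C, S_0 \vdash \texttt{let}\ name = e_0\ \texttt{in}\ e_1\ \texttt{end} \rightarrow (S_2, v_1)}

\frac{C[name] = loc \qquad S[loc] = v}
     {C, S \vdash name \rightarrow (S, v)}

\frac{C, S_0 \vdash e \rightarrow (S_1, v) \qquad C[name] = loc}
     {C, S_0 \vdash name := e \rightarrow (S_1 + \{loc{:}\ v\}, v)}
```

This is a reconstruction, not the original figure: the let rule allocates a fresh location, the variable rule reads through the environment into the store, and the assignment rule overwrites the store at an existing location.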


Can you explain the rules above?


State S:

●​ These rules introduce a state S, which maps memory locations (e.g., loc) to values (v).
The state evolves as expressions are evaluated, and this reflects the mutable nature of
variables.

Let Binding with Mutability:

●​ The let rule for bindings shows how a new variable is bound to a location (loc) in
memory (loc is fresh), and how the value of the expression e0 is stored in that
memory location. The state S1 is updated with {loc: v0}. This allows the value
associated with a variable to be changed later by modifying the memory location.

Variable Lookup (name Rule):

●​ The rule C[name] = loc, S[loc] = v shows that a variable name is evaluated by
first looking up its memory location in the context C and then retrieving its value from the
state S.

Assignment (:= Rule):

●​ This is where mutability is clear. The assignment rule for C, S₀ ⊢ name := e → (S₁,
v) updates the value of name by changing the state S₁, associating the memory location
loc with the new value v. This reflects how variables are mutable: we can change their
values in memory.


Why are new stores created during the evaluation of expressions? For
instance, when evaluating addition, we go from S0 to S1 and then S2.
The changes in the store across different steps in the operational semantics, as depicted in the
image, can indeed be described as side effects.

In programming language semantics:

●​ Side effect refers to any change in the state of the program that persists beyond the
local scope of an expression or function. This includes things like modifying variables,
writing to memory, printing to the console, or updating files.
●​ Pure expressions (without side effects) always return the same value for the same
inputs and don't alter any external state. In contrast, an expression that has side effects
may return a different result or alter the state of the system (like memory or variables)
each time it is evaluated.

In your operational semantics:

●​ Evaluating an expression (like a sum) produces new stores at each step (e.g.,
S0→S1→S2). This shows that the evaluation of expressions can modify the store, which
is the state of the program. Such modifications are side effects because the state (store)
is globally affected by the operation.
●​ For example, in an assignment operation (name := e), evaluating e changes the store
by updating the memory location bound to name, producing a new store. This
modification is a side effect of the assignment.

These transformations of the store are side effects, since they involve changes to the global
state of the program, and those changes persist and can influence the future execution of the
program.
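The threading of stores (S0 → S1 → S2) can be mimicked with plain Python functions; the names below are invented for this illustration:

```python
# A pure evaluation: same input, same output, and the store that comes
# out is the store that went in.
def eval_num(n, store):
    return n, store  # store returned unchanged

# An evaluation with a side effect: the resulting store S1 differs
# from the incoming store S0.
def eval_assign(loc, value, store):
    new_store = dict(store)  # S0 -> S1
    new_store[loc] = value
    return value, new_store

s0 = {0: 5}
v, s1 = eval_assign(0, 10, s0)
print(s0, s1)  # {0: 5} {0: 10}
```

Here each evaluation step returns a (value, store) pair, so the store changes are explicit in the types, just as they are explicit in the rules.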

This seems to be the default semantics of C. Does C have the notion of immutable variable names?
C does not explicitly distinguish between mutable and immutable variables in the same way
languages like Kotlin, Scala, or SML do because of its historical roots and design philosophy.

Here’s why:

1. C’s Historical Context and Simplicity

C was designed in the early 1970s as a systems programming language to be simple, fast, and
efficient. The primary goal was to give programmers direct control over the hardware,
particularly in writing operating systems like Unix. This led to a focus on:

●​ Low-level memory management.


●​ Flexibility in controlling data storage and manipulation.

In this context, immutability was not a priority. Instead, C focused on mutable variables and
pointers, allowing programmers to change values as needed for system-level operations. The
idea of distinguishing variables by immutability vs mutability (as a language-level construct)
wasn't as important as providing low-level access to memory.

2. Pointer-Based Mutability

C relies heavily on pointers, which allow direct manipulation of memory. In C, a variable can be
reassigned or modified freely unless explicitly declared as const. The use of pointers and
direct memory manipulation provides a level of mutability that was seen as more important than
strict immutability.

For example:

int x = 10;
x = 20; // mutable by default

Here, x is mutable by default, and nothing prevents it from being reassigned.

3. The const Keyword for Immutability

While C does not enforce immutability by default, it does provide the const keyword, which
allows a variable to be declared as immutable (read-only). However, the use of const is
optional and not as strongly enforced or idiomatic as in languages like Kotlin, Scala, or SML.

For example, in C:

const int y = 10;
y = 20; // Error: assignment of read-only variable ‘y’

This is C's way of providing immutability, but it’s something the programmer has to opt into,
rather than the language making it a fundamental distinction between val and var like Kotlin or
Scala.

4. Flexibility and Performance Focus

C’s design philosophy centers on giving developers flexibility and control over their programs.
Adding immutability at the core language level would introduce constraints that go against C’s
ethos of letting the programmer decide how memory and values should be handled. Because C
is used in performance-critical applications (e.g., OS development, embedded systems),
mutability by default aligns with the need for efficiency and control.


5. Modern Languages Evolve from C

Languages like Kotlin, Scala, Rust, and even SML have evolved over time, building upon ideas
like immutability that became more prominent in software design for reliability, safety, and
concurrency. These newer languages recognize the benefits of distinguishing between mutable
and immutable variables at the language level:

●​ Concurrency: Immutable data structures prevent race conditions and make parallel
processing easier.
●​ Safety: Immutable variables reduce unintended side effects, making code easier to
reason about and less error-prone.

C's design predated many of these concerns, and while it's possible to simulate immutability
with const, it's not enforced in the same manner as more modern languages.

Summary

C doesn’t make a strict distinction between mutable and immutable variables because:

●​ It was designed with simplicity and flexibility in mind, focusing on performance and
low-level control.
●​ Immutability wasn’t a key concern during its development.
●​ It uses the const keyword for immutability, but this is optional and not central to the
language.

Newer languages evolved with the goal of safety, concurrency, and immutability, thus making
mutability vs. immutability a central part of their design.

Can you give me examples of languages that actually distinguish between mutable and immutable variable names?
Sure! In Kotlin, you can declare variables as either mutable or immutable:

●​ val is used to declare immutable variables, which means their value cannot be changed
once assigned.
●​ var is used to declare mutable variables, which allows their value to be reassigned.

Here’s a simple Kotlin program demonstrating the difference:

fun main() {
// Immutable variable (val)
val immutableVar: String = "I am immutable"
// Trying to reassign a value to immutableVar would cause a compile-time error
// immutableVar = "New value" // Uncommenting this line would cause an error

// Mutable variable (var)


var mutableVar: String = "I am mutable"
println(mutableVar) // Output: I am mutable

// Changing the value of mutableVar
mutableVar = "I can be changed"
println(mutableVar) // Output: I can be changed
}

Explanation:

●​ immutableVar is declared with val, meaning once assigned, its value cannot be
changed.
●​ mutableVar is declared with var, allowing its value to be reassigned.

If you try to reassign a value to immutableVar, the compiler will throw an error, ensuring
immutability.

In Scala, the distinction between mutable and immutable variables is similar to Kotlin:

●​ val is used to declare immutable variables (their value cannot be reassigned).


●​ var is used to declare mutable variables (their value can be reassigned).

Here's an equivalent program in Scala:

object Main extends App {


// Immutable variable (val)
val immutableVar: String = "I am immutable"
// Trying to reassign a value to immutableVar would cause a compile-time error
// immutableVar = "New value" // Uncommenting this line would cause an error

// Mutable variable (var)
var mutableVar: String = "I am mutable"
println(mutableVar) // Output: I am mutable

// Changing the value of mutableVar
mutableVar = "I can be changed"
println(mutableVar) // Output: I can be changed
}

Explanation:

●​ immutableVar is declared with val, making it immutable. Any attempt to reassign it would result in a compilation error.
●​ mutableVar is declared with var, so its value can be reassigned, and the change will
reflect in the output.

This behavior in Scala is very similar to Kotlin in terms of the distinction between immutability
(val) and mutability (var).


In SML, the concept of immutability is a bit more pronounced because all variables are
immutable by default. However, you can use references (ref) to create mutable variables.

Here’s an equivalent example in SML:

(* Immutable variable *)
val immutableVar = "I am immutable";
(* Trying to reassign would result in an error *)
(* immutableVar := "New value"; (* Uncommenting this would cause an error *) *)

(* Mutable variable using ref *)
val mutableVar = ref "I am mutable";
(* Dereferencing the ref to print the current value *)
print (!mutableVar ^ "\n"); (* Output: I am mutable *)

(* Reassigning a new value to the mutable variable *)
mutableVar := "I can be changed";
(* Printing the updated value *)
print (!mutableVar ^ "\n"); (* Output: I can be changed *)

Explanation:

●​ immutableVar: Declared using val, which means it is immutable. Once assigned, it cannot be changed. SML does not allow reassignment of values to variables declared with val.
●​ mutableVar: Created using ref, which is how you declare mutable variables in SML. A
ref creates a reference (or a pointer) to a mutable memory location.
○​ To get the value of a reference, you use the dereference operator !.
○​ To update the value of a reference, you use the assignment operator :=.

Thus, while immutability is the default in SML, you can simulate mutability with references
(ref), allowing reassignment of values.

Can you implement a Python interpreter for the language with mutable
variables?

Let's implement these semantics in our existing Python code, which currently handles let
bindings but not mutable variables.

Here's an overview of the steps needed to extend your code:

1.​ Introduce a Store class to manage memory locations (store) that map locations to
values.
2.​ Modify the Let construct to allocate memory locations for variables and add them to
the store.


3.​ Introduce an Assign class for the assignment operation name := e.
4.​ Update the eval methods to manage both the environment (which maps variables to memory locations) and the store (which holds the actual values in memory).

Implementation Plan:

1. Store Class:

The store will hold the mutable state (like a dictionary) that maps memory locations to values.
Each variable will get a fresh memory location when it’s introduced.

class Store:
def __init__(self):
self.store = {}
self.next_loc = 0

def allocate(self, value):


loc = self.next_loc
self.store[loc] = value
self.next_loc += 1
return loc

def update(self, loc, value):


if loc in self.store:
self.store[loc] = value
else:
raise KeyError(f"Location {loc} not found in store.")

def lookup(self, loc):


return self.store[loc]

def __repr__(self):
return str(self.store)

2. Modify the Let Expression:

When creating a new variable, we will allocate a fresh memory location for it, store its initial
value, and update the environment to map the variable name to this location.

class Let(Expression):
def __init__(self, name, exp_def, exp_body):
self.name = name
self.exp_def = exp_def
self.exp_body = exp_body


def eval(self, env, store):


# Evaluate the expression to get the initial value
e0_val = self.exp_def.eval(env, store)
# Allocate a fresh location in the store for the variable
loc = store.allocate(e0_val)
# Create a new environment where 'name' maps to this location
new_env = dict(env)
new_env[self.name] = loc
# Evaluate the body of the let expression in the new env:
return self.exp_body.eval(new_env, store)

3. Add the Assign Class:

This new class will handle assignments (name := e). It looks up the variable’s memory
location from the environment, evaluates the expression to get a new value, and updates the
store at that location.

class Assign(Expression):
def __init__(self, name, exp):
self.name = name
self.exp = exp

def eval(self, env, store):


# Look up the location associated with 'name'
if self.name in env:
loc = env[self.name]
# Evaluate the expression to get the new value
new_val = self.exp.eval(env, store)
# Update the store at this location
store.update(loc, new_val)
return new_val
else:
sys.exit(f"Undefined variable {self.name}")

4. Modify Var and Num Classes:

We need to modify the Var class to look up the value from the store using the memory location
found in the environment.

class Var(Expression):
def __init__(self, name):
self.name = name

def eval(self, env, store):


# Check if the variable exists in the environment


if self.name in env:
loc = env[self.name]
# Look up the value from the store
return store.lookup(loc)
else:
sys.exit(f"Undefined variable {self.name}")

class Num(Expression):
def __init__(self, num):
self.num = num

def eval(self, env, store):


return self.num

Example Usage:

Let’s write a simple example to show how mutable variables work:

if __name__ == "__main__":
# Create an empty environment and store
env = {}
store = Store()

# let x = 5 in x := 10 end
let_exp = Let("x", Num(5), Assign("x", Num(10)))

# Evaluate the let expression


result = let_exp.eval(env, store)

# Print the store after the assignment


print(f"Final store: {store}")

Breakdown:

●​ The Store class manages memory locations and stores variable values.
●​ The Let expression allocates a memory location for a variable and updates the
environment and store.
●​ The Assign expression updates the value of a variable by modifying the store.

This implementation follows the operational semantics that we wrote, where:

●​ The environment C holds variable names mapped to memory locations.


●​ The store S holds values at those memory locations.
●​ Assignments update values in the store, while variable lookups retrieve values from it.
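The language above has no way to read a variable after assigning to it, because there is no sequencing construct. A hypothetical Seq class (not part of the original code) makes the mutation observable; the sketch below repeats the minimal pieces of the interpreter so that it runs on its own:

```python
from abc import ABC, abstractmethod

class Expression(ABC):
    @abstractmethod
    def eval(self, env, store):
        raise NotImplementedError

class Store:
    def __init__(self):
        self.store = {}
        self.next_loc = 0
    def allocate(self, value):
        loc = self.next_loc
        self.store[loc] = value
        self.next_loc += 1
        return loc
    def update(self, loc, value):
        self.store[loc] = value
    def lookup(self, loc):
        return self.store[loc]

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def eval(self, env, store):
        return self.num

class Var(Expression):
    def __init__(self, name):
        self.name = name
    def eval(self, env, store):
        # The environment maps the name to a location; the store holds the value.
        return store.lookup(env[self.name])

class Assign(Expression):
    def __init__(self, name, exp):
        self.name = name
        self.exp = exp
    def eval(self, env, store):
        new_val = self.exp.eval(env, store)
        store.update(env[self.name], new_val)
        return new_val

class Let(Expression):
    def __init__(self, name, exp_def, exp_body):
        self.name = name
        self.exp_def = exp_def
        self.exp_body = exp_body
    def eval(self, env, store):
        loc = store.allocate(self.exp_def.eval(env, store))
        new_env = dict(env)
        new_env[self.name] = loc
        return self.exp_body.eval(new_env, store)

class Seq(Expression):
    """Hypothetical sequencing: run e0 for its side effects, return e1."""
    def __init__(self, e0, e1):
        self.e0 = e0
        self.e1 = e1
    def eval(self, env, store):
        self.e0.eval(env, store)  # side effect on the store
        return self.e1.eval(env, store)

# let x = 5 in (x := 10; x) end  =>  10
prog = Let("x", Num(5), Seq(Assign("x", Num(10)), Var("x")))
print(prog.eval({}, Store()))  # 10
```

The read of x after the assignment returns 10, not 5, which is exactly the mutable behavior the store-based semantics describes.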


Who invented operational semantics?


Operational semantics, a fundamental approach to formally defining the behavior of
programming languages, was invented by Gordon D. Plotkin in the early 1980s. His seminal
paper, "A Structural Approach to Operational Semantics" (1981), laid the foundation for the
systematic and structured study of operational semantics as we know it today.

Gordon Plotkin's Role

●​ "A Structural Approach to Operational Semantics" (1981):


○​ Plotkin introduced the notion of defining the semantics of programming
languages by specifying how individual language constructs execute.
○​ The framework focuses on small-step semantics (describing the individual
steps of computation) and big-step semantics (describing how expressions are
evaluated to values in a single step).
○​ He emphasized a structured and rule-based approach, using inference rules
and inductively defined judgments, which became a standard in the field.

Precursors and Related Work

While Plotkin formalized operational semantics, earlier influences contributed to its development:

1.​ John McCarthy (1960s):​

○​ Developed formal semantics for Lisp in terms of recursive functions, providing an early example of describing program behavior in mathematical terms.
2.​ Peter Landin (1960s):​

○​ Introduced ideas of operational reasoning in his work on abstract machines and the SECD machine, a precursor to Plotkin's structured operational approach.
3.​ Dana Scott and Christopher Strachey (1960s–1970s):​

○​ While primarily known for denotational semantics, their work influenced the
formalization of programming languages and inspired alternative semantic
frameworks.

So, Plotkin is like the "father" of operational semantics? What else did he
do for Computer Science?

Gordon D. Plotkin: A Short Bio

Gordon David Plotkin is a prominent computer scientist and logician known for his
groundbreaking work in programming language theory, operational semantics, and formal


methods. Born in Scotland in 1946, Plotkin earned his Ph.D. from the University of Edinburgh in
1972 under the supervision of Rod Burstall. His dissertation focused on lambda calculus and
its applications in programming languages, laying the foundation for his later work in
semantics.

Plotkin became a professor at the University of Edinburgh, where he remained a central figure
in the School of Informatics. His influence extends beyond academia, as his research has
deeply shaped the understanding of programming languages and computational logic. Plotkin
has received numerous accolades, including a fellowship from the Royal Society, recognizing
his outstanding contributions to theoretical computer science.

Contributions to Computer Science

1. Structural Operational Semantics (SOS)

●​ Plotkin’s "A Structural Approach to Operational Semantics" (1981) introduced a systematic, rule-based framework for defining programming language behavior.
●​ SOS became a cornerstone in programming language theory, influencing how
researchers and practitioners define and analyze language semantics.

2. Denotational Semantics and Domain Theory

●​ Plotkin made significant contributions to denotational semantics, working on ways to mathematically model the meaning of programs.
●​ He explored the connections between domain theory (developed by Dana Scott) and
operational reasoning, deepening the understanding of how programs compute.

3. Full Abstraction

●​ Plotkin worked on the concept of full abstraction, particularly for programming languages like PCF (Programming Computable Functions). Full abstraction ensures that the semantic model captures all observable behaviors of a program.

4. Logic in Computer Science


●​ He contributed to the development of logical frameworks and their applications in computer science, including reasoning about programs and formal verification.

5. Abstract Machines and Lambda Calculus

●​ Plotkin analyzed the lambda calculus and its relation to programming languages,
particularly regarding call-by-name and call-by-value evaluation strategies.
●​ His work informed the design of abstract machines and the operational reasoning used
in compilers and interpreters.

6. Bisimulation and Process Calculi

●​ Plotkin collaborated on research involving bisimulation, a key concept in process calculi and concurrency theory. This work helped formalize notions of equivalence between computational processes.

7. Probabilistic and Quantum Computing

●​ Later in his career, Plotkin explored probabilistic and quantum models of computation,
contributing to the understanding of how these paradigms relate to traditional
computation.

8. Influence on Programming Languages

●​ His research influenced the design and semantics of programming languages like ML,
Haskell, and others, particularly in areas like type systems, evaluation strategies, and
compiler correctness.

Legacy

Gordon Plotkin's work has shaped the theoretical foundations of programming languages and
computational systems. His pioneering contributions, particularly in operational semantics,
remain a cornerstone of computer science education and research, making him one of the most
influential figures in the field.


The Visitor Design Pattern


Code: https://github.com/pronesto/DCC053_Public/tree/main/7_Visitors

The size of an expression is 1 plus the size of its subparts. Add the size
capability to our representation of expressions.
import sys
from abc import ABC, abstractmethod

class Expression(ABC):
@abstractmethod
def eval(self, env):
raise NotImplementedError

class Var(Expression):
def __init__(self, identifier):
self.identifier = identifier
def eval(self, env):
if self.identifier in env:
return env[self.identifier]
else:
sys.exit(f"Undefined variable: {self.identifier}")
def size(self):
return 1

class Num(Expression):
def __init__(self, num):
self.num = num
def eval(self, _):
return self.num
def size(self):
return 1

class BinaryExpression(Expression):
def __init__(self, left, right):
self.left = left
self.right = right

@abstractmethod
def eval(self, env):
raise NotImplementedError
def size(self):
return 1 + self.left.size() + self.right.size()

class Add(BinaryExpression):
def eval(self, env):
return self.left.eval(env) + self.right.eval(env)

class Sub(BinaryExpression):
def eval(self, env):
return self.left.eval(env) - self.right.eval(env)


class Mul(BinaryExpression):
def eval(self, env):
return self.left.eval(env) * self.right.eval(env)

class Div(BinaryExpression):
def eval(self, env):
return self.left.eval(env) // self.right.eval(env)

class Let(Expression):
def __init__(self, identifier, exp_def, exp_body):
self.identifier = identifier
self.exp_def = exp_def
self.exp_body = exp_body
def eval(self, env):
e0_val = self.exp_def.eval(env)
new_env = dict(env)
new_env[self.identifier]= e0_val
return self.exp_body.eval(new_env)
def size(self):
return 2 + self.exp_def.size() + self.exp_body.size()
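A quick sanity check of the size method. The classes below are trimmed copies of the ones above, just enough to run the computation (note that Let counts as 2, accounting for the bound name as a subpart):

```python
from abc import ABC

class Expression(ABC):
    pass

class Var(Expression):
    def __init__(self, identifier):
        self.identifier = identifier
    def size(self):
        return 1

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def size(self):
        return 1

class BinaryExpression(Expression):
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def size(self):
        return 1 + self.left.size() + self.right.size()

class Add(BinaryExpression):
    pass

class Let(Expression):
    def __init__(self, identifier, exp_def, exp_body):
        self.identifier = identifier
        self.exp_def = exp_def
        self.exp_body = exp_body
    def size(self):
        return 2 + self.exp_def.size() + self.exp_body.size()

# size(let x = 5 in x + 2 end)
# = 2 (let + bound name) + size(5) + size(x + 2)
# = 2 + 1 + (1 + 1 + 1) = 6
e = Let('x', Num(5), Add(Var('x'), Num(2)))
print(e.size())  # 6
```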

Now, add some functionality to pretty print expressions.


import sys
from abc import ABC, abstractmethod

class Expression(ABC):
@abstractmethod
def eval(self, env):
raise NotImplementedError

class Var(Expression):
def __init__(self, identifier):
self.identifier = identifier
def __str__(self):
"""
Example:
>>> e = Var('var')
>>> print(e)
var
"""
return self.identifier

class Num(Expression):
def __init__(self, num):
self.num = num
def __str__(self):
"""
Example:
>>> e = Num(3)
>>> print(e)
3
"""
return str(self.num)


class BinaryExpression(Expression):
def __init__(self, left, right):
self.left = left
self.right = right
def size(self):
"""
Example:
>>> e = Add(Var('x'), Num(2))
>>> e.size()
3
"""
return 1 + self.left.size() + self.right.size()

class Add(BinaryExpression):
def __str__(self):
"""
>>> n1 = Num(3)
>>> n2 = Num(4)
>>> e = Add(n1, n2)
>>> print(e)
(3 + 4)
"""
return f"({str(self.left)} + {str(self.right)})"

class Sub(BinaryExpression):
def __str__(self):
"""
>>> n1 = Num(3)
>>> n2 = Num(4)
>>> e = Sub(n1, n2)
>>> print(e)
(3 - 4)
"""
return f"({str(self.left)} - {str(self.right)})"

class Mul(BinaryExpression):
def __str__(self):
"""
>>> n1 = Num(3)
>>> n2 = Num(4)
>>> e = Mul(n1, n2)
>>> print(e)
(3 * 4)
"""
return f"({str(self.left)} * {str(self.right)})"

class Div(BinaryExpression):
    def __str__(self):
        """
        >>> n1 = Num(28)
        >>> n2 = Num(4)
        >>> e = Div(n1, n2)
        >>> print(e)
        (28 / 4)
        """
        return f"({str(self.left)} / {str(self.right)})"

class Let(Expression):
    def __init__(self, identifier, exp_def, exp_body):
        self.identifier = identifier
        self.exp_def = exp_def
        self.exp_body = exp_body
    def __str__(self):
        """
        >>> e = Let('v', Num(42), Var('v'))
        >>> print(e)
        let v = 42 in v end

        >>> e = Let('v', Num(40), Let('w', Num(2), Add(Var('v'), Var('w'))))
        >>> print(e)
        let v = 40 in let w = 2 in (v + w) end end

        >>> e = Let('v', Add(Num(40), Num(2)), Mul(Var('v'), Var('v')))
        >>> print(e)
        let v = (40 + 2) in (v * v) end
        """
        str_def = str(self.exp_def)
        str_body = str(self.exp_body)
        return f"let {self.identifier} = {str_def} in {str_body} end"

There is a problem with this design. Whenever we need to add a new functionality, we have to
modify all the subclasses of Expression. Which software principle is broken in this case?
This design breaks the Open/Closed Principle. The Open/Closed Principle (OCP) states that a
class should be open for extension but closed for modification. In other words, you should
be able to add new functionality (like printing, sizing, verifying, etc) without modifying the
existing classes. This problem—having to change all subclasses when adding a new
method—suggests a design that breaks this principle.

How could you change your design, to avoid having to modify classes
whenever you need a new functionality?
One solution would be to use type inspection to find out the type of each expression, and handle
them accordingly. See one solution below:

def to_str(e):
    """
    Pretty print an expression based on its type.

    Example:
    >>> e = Let('v', Num(42), Var('v'))
    >>> print(to_str(e))
    let v = 42 in v end

    >>> e = Let('v', Num(40), Let('w', Num(2), Add(Var('v'), Var('w'))))
    >>> print(to_str(e))
    let v = 40 in let w = 2 in (v + w) end end

    >>> e = Let('v', Add(Num(40), Num(2)), Mul(Var('v'), Var('v')))
    >>> print(to_str(e))
    let v = (40 + 2) in (v * v) end
    """
    if isinstance(e, Var):
        return e.identifier
    elif isinstance(e, Num):
        return str(e.num)
    elif isinstance(e, Add):
        return f"({to_str(e.left)} + {to_str(e.right)})"
    elif isinstance(e, Sub):
        return f"({to_str(e.left)} - {to_str(e.right)})"
    elif isinstance(e, Mul):
        return f"({to_str(e.left)} * {to_str(e.right)})"
    elif isinstance(e, Div):
        return f"({to_str(e.left)} / {to_str(e.right)})"
    elif isinstance(e, Let):
        return f"let {e.identifier} = {to_str(e.exp_def)} in {to_str(e.exp_body)} end"
    else:
        raise ValueError("Unknown expression")

But now we have another design problem. Which one?


The current design of the to_str function violates the Encapsulation principle. Encapsulation
is a fundamental software development principle that promotes the hiding of implementation
details within objects and modules, exposing only necessary interfaces.

The issue we're facing relates to breaking the Law of Demeter (also known as the Principle of
Least Knowledge). This principle states that an object should have limited knowledge of the
internal details of other objects. In your to_str function, you're directly accessing fields like
e.identifier, e.left, e.right, etc., which exposes the internal structure of your
Expression subclasses (e.g., Var, Add, Let).

By directly accessing internal details like e.identifier, the function tightly couples itself to
the specific implementations of the expression types. This makes the code less maintainable
and more prone to errors if the internal structure of these types changes. If we ever change the
implementation of the Var class, we would need to change the implementation of to_str too.

To adhere to the Law of Demeter, you could push the responsibility of converting an expression
to a string onto the subclasses themselves, either by defining a __str__ method in each class
or by adding a specific method like to_str in each subclass. But then we are back into the
Open/Closed issue...

Is there a way to solve this issue?


We are facing a classic trade-off between the Open/Closed Principle and the Law of Demeter.

1. Breaking the Law of Demeter:
○ This happens when you tightly couple an external function (to_str) to the
internal structure of multiple classes.
2. Breaking the Open/Closed Principle:
○ If you add new methods (like to_str) to each class, the design becomes harder
to extend without modifying the base code.

One way of solving this issue is through double dispatch!

Double dispatch? What's that?


Double dispatch is a programming technique that allows a method to be selected based on the
runtime types of two objects involved in the operation. As an example, consider the Julia
programming language: its multiple-dispatch mechanism supports double dispatch natively!

Well, Python doesn't have that. How do we implement double dispatch?


Double dispatch is often simulated through the Visitor Pattern, and it can help solve this
dilemma by allowing you to add new functionality (like to_str) without modifying your
Expression subclasses, while also avoiding the need to access their internal details directly.
The Visitor Pattern is a behavioral design pattern that allows you to separate an algorithm from
the objects it operates on. It lets you add new operations to a group of objects (often in a
hierarchy, like an AST for expressions) without modifying the objects themselves.

Key Concepts:

1.​ Visitor: A class that defines operations to be performed on different object types in the
hierarchy (e.g., visit_add, visit_var).
2.​ Element: The object structure (like Expression subclasses) that accepts a visitor and
delegates the operation to it via the accept method.

How It Works:

●​ Each object in the hierarchy (e.g., Var, Add, Mul) implements an accept(visitor)
method.
●​ The accept method calls the appropriate method on the visitor, depending on the
object's type (visit_var, visit_add, etc.).
●​ The logic for new operations (like pretty-printing or evaluation) is added to the Visitor,
not the objects themselves.

So, how to implement a Visitor to print our expressions?


There are two changes that we must implement in our code:
●​ Visitor Class: This class will encapsulate the to_str functionality (or any other future
operations) without modifying the expression classes themselves.
●​ Expression Classes: Each subclass of Expression will have an accept method that
takes a visitor object and delegates the operation to it. This way, the logic for printing the
expressions is handled externally without violating the Open/Closed Principle.
So, here's the Visitor:

class VisitorStr:
    """
    Pretty print an expression based on its type.

    Example:
    >>> e = Let('v', Num(42), Var('v'))
    >>> v = VisitorStr()
    >>> print(e.accept(v, None))
    let v = 42 in v end

    >>> e = Let('v', Num(40), Let('w', Num(2), Add(Var('v'), Var('w'))))
    >>> v = VisitorStr()
    >>> print(e.accept(v, None))
    let v = 40 in let w = 2 in (v + w) end end

    >>> e = Let('v', Add(Num(40), Num(2)), Mul(Var('v'), Var('v')))
    >>> v = VisitorStr()
    >>> print(e.accept(v, None))
    let v = (40 + 2) in (v * v) end
    """
    def visit_var(self, var, arg):
        return var.identifier

    def visit_num(self, num, arg):
        return str(num.num)

    def visit_add(self, add, arg):
        return f"({add.left.accept(self, arg)} + {add.right.accept(self, arg)})"

    def visit_sub(self, sub, arg):
        return f"({sub.left.accept(self, arg)} - {sub.right.accept(self, arg)})"

    def visit_mul(self, mul, arg):
        return f"({mul.left.accept(self, arg)} * {mul.right.accept(self, arg)})"

    def visit_div(self, div, arg):
        return f"({div.left.accept(self, arg)} / {div.right.accept(self, arg)})"

    def visit_let(self, let, arg):
        str_def = let.exp_def.accept(self, arg)
        str_body = let.exp_body.accept(self, arg)
        return f"let {let.identifier} = {str_def} in {str_body} end"

And here's the new implementation of Expressions:

import sys
from abc import ABC, abstractmethod

class Expression(ABC):
    @abstractmethod
    def accept(self, visitor, arg):
        pass

class Var(Expression):
def __init__(self, identifier):
self.identifier = identifier
def accept(self, visitor, arg):
return visitor.visit_var(self, arg)

class Num(Expression):
def __init__(self, num):
self.num = num
def accept(self, visitor, arg):
return visitor.visit_num(self, arg)

class BinaryExpression(Expression):
def __init__(self, left, right):
self.left = left
self.right = right

class Add(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_add(self, arg)


class Sub(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_sub(self, arg)

class Mul(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_mul(self, arg)

class Div(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_div(self, arg)

class Let(Expression):
def __init__(self, identifier, exp_def, exp_body):
self.identifier = identifier
self.exp_def = exp_def
self.exp_body = exp_body
def accept(self, visitor, arg):
return visitor.visit_let(self, arg)

Could we evaluate expressions through a Visitor?


Sure! Here's the visitor:

class VisitorEval:
    """
    Evaluate an expression based on its type.

    Example:
    >>> e = Let('v', Num(42), Var('v'))
    >>> v = VisitorEval()
    >>> print(e.accept(v, {}))
    42

    >>> e = Let('v', Num(40), Let('w', Num(2), Add(Var('v'), Var('w'))))
    >>> v = VisitorEval()
    >>> print(e.accept(v, {}))
    42

    >>> e = Let('v', Add(Num(40), Num(2)), Mul(Var('v'), Var('v')))
    >>> v = VisitorEval()
    >>> print(e.accept(v, {}))
    1764
    """
    def visit_var(self, var, env):
        if var.identifier in env:
            return env[var.identifier]
        else:
            sys.exit(f"Undefined variable {var.identifier}")

    def visit_num(self, num, env):
        return num.num

    def visit_add(self, add, env):
        return add.left.accept(self, env) + add.right.accept(self, env)

    def visit_sub(self, sub, env):
        return sub.left.accept(self, env) - sub.right.accept(self, env)

    def visit_mul(self, mul, env):
        return mul.left.accept(self, env) * mul.right.accept(self, env)

    def visit_div(self, div, env):
        return div.left.accept(self, env) // div.right.accept(self, env)

    def visit_let(self, let, env):
        e0_val = let.exp_def.accept(self, env)
        new_env = dict(env)
        new_env[let.identifier] = e0_val
        return let.exp_body.accept(self, new_env)

Have we achieved the Open/Closed Principle while preserving the Law of Demeter?
The Visitor Design Pattern indeed addresses both the Law of Demeter and the Open/Closed
Principle in its design.

1.​ Law of Demeter: This principle, also known as the principle of least knowledge,
suggests that a module should not know about the internal details of the objects it
manipulates. The Visitor Pattern helps adhere to this principle by allowing operations to
be performed on elements of an object structure without the need for those elements to
expose their internal details to the visitor. Essentially, the pattern enables you to add new
operations to existing object structures without modifying their classes, thereby
minimizing the need for deep coupling.
2.​ Open/Closed Principle: This principle states that software entities (classes, modules,
functions, etc.) should be open for extension but closed for modification. The Visitor
Pattern aligns with this principle by allowing you to add new operations to a set of
objects without changing the classes of the objects themselves. Instead of modifying the
classes to add new behavior, you create new visitor classes that implement the
additional operations, thereby extending the functionality while keeping the existing code
closed for modification.

In summary, the Visitor Pattern helps maintain a clean separation of concerns, promoting both
the Law of Demeter by reducing the dependency on internal object details and the Open/Closed
Principle by facilitating the addition of new functionalities without altering existing code.

So, how does our implementation adhere to the Law of Demeter?


This implementation of a visitor pattern for expressions adheres to the Law of Demeter in
several ways:


1.​ Encapsulation of Behavior: Each expression type (Var, Num, Add, Sub, Mul, Div,
Let) does not need to know the specific details of the other expression types or their
implementations. The VisitorEval class encapsulates the logic for evaluating
expressions, so each expression type (Var, Num, etc.) only needs to interact with the
visitor interface. This minimizes the need for expressions to interact directly with other
expressions, thus adhering to the Law of Demeter.
2.​ Decoupling Operations from Data Structures: The accept method in the
Expression class allows an expression object to pass itself to a visitor without
exposing its internal details. The visitor (VisitorEval) performs operations on the
expression objects without needing to understand the internal implementation of these
objects. For example, the visit_add method in VisitorEval interacts with Add
objects by calling accept on their left and right components. This avoids direct
manipulation or knowledge of the internal structure of the Add object.
3.​ Layered Interaction: When the VisitorEval methods like visit_add or
visit_let handle expressions, they delegate the evaluation of sub-expressions (like
add.left.accept(self, env)) to the sub-expressions themselves. This layered
approach ensures that the visitor does not need to know the details of how left and
right are implemented internally; it only needs to know how to interact with them
through their accept methods.
4.​ Avoiding Deep Coupling: Since the VisitorEval methods do not directly access or
manipulate the internal data of expressions but rather rely on the accept method to
traverse and evaluate the expressions, it avoids deep coupling between the visitor and
the expression classes. This adheres to the Law of Demeter by reducing the
dependencies between components.

In summary, by using the visitor pattern, this implementation ensures that the VisitorEval
class performs operations on expression objects without needing to understand their internal
workings, thus following the Law of Demeter. Each expression type interacts with the visitor
through a well-defined interface (accept), minimizing direct dependencies and ensuring that
changes to expression types do not affect the visitor’s logic.

And how does the implementation preserve the Open/Closed Principle?


The implementation of the visitor pattern for expressions adheres to the Open/Closed Principle
(OCP) in the following ways:

1.​ Extensibility without Modifying Existing Code: The VisitorEval class can be
extended to add new operations or evaluations without modifying the existing
Expression classes. For instance, if you want to add a new type of operation, such as
Exponentiation, you can create a new visitor class that implements the necessary
logic for Exponentiation without changing the Var, Num, Add, Sub, Mul, Div, or Let
classes.
2.​ Adding New Visitors: To introduce new functionalities, you create additional visitor
classes. Each new visitor class implements the visit_* methods for the different
expression types. This approach allows you to extend the behavior of the expression
evaluation system without altering the existing expression classes. For example, if you
wanted to add a visitor for pretty-printing or optimizing expressions, you could define a
new visitor class like VisitorPrettyPrint or VisitorOptimize, each
implementing its own visit_* methods.
3.​ Expression Classes Remain Closed for Modification: The expression classes (Var,
Num, Add, etc.) do not need to be changed when new operations are introduced. They
are designed to be open for extension (through new visitors) but closed for modification.
Their implementation of the accept method remains the same, and they do not need to
be altered to support new visitors.
4.​ Consistent Interface for Extensions: The accept method in the Expression class
provides a consistent interface for all visitors. This means that regardless of the type of
visitor or the specific operations it performs, the expression classes remain unchanged.
New visitor classes can be added as needed, demonstrating adherence to the OCP by
enabling extensions through new visitors rather than modifications to existing classes.
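As a concrete illustration of point 2, here is a brand-new operation, node counting, added as a visitor. The sketch is self-contained: the minimal Num and Add classes stand in for the full hierarchy above.

```python
class Num:
    def __init__(self, num):
        self.num = num
    def accept(self, visitor, arg):
        return visitor.visit_num(self, arg)

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def accept(self, visitor, arg):
        return visitor.visit_add(self, arg)

class VisitorSize:
    """A new operation: count AST nodes. Num and Add are left untouched."""
    def visit_num(self, num, arg):
        return 1
    def visit_add(self, add, arg):
        return 1 + add.left.accept(self, arg) + add.right.accept(self, arg)

e = Add(Num(40), Add(Num(1), Num(1)))
print(e.accept(VisitorSize(), None))  # prints 5
```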

Can you write a visitor that optimizes expressions by statically resolving those that involve only numbers?
You can create a new visitor class, VisitorOptimize, which might transform expressions to a
more efficient form. This class will implement the visit_* methods to traverse and transform
expressions without modifying the Expression classes themselves:

class VisitorOptimize:
    """
    This visitor optimizes expressions, folding operations that involve only
    numbers.

    Example:
    >>> e0 = Add(Num(40), Num(2))
    >>> optimizer = VisitorOptimize()
    >>> e1 = e0.accept(optimizer, {})
    >>> printer = VisitorStr()
    >>> print(e1.accept(printer, None))
    42

    >>> e0 = Let('v', Add(Num(1), Num(1)), Add(Num(40), Var('v')))
    >>> optimizer = VisitorOptimize()
    >>> e1 = e0.accept(optimizer, {})
    >>> printer = VisitorStr()
    >>> print(e1.accept(printer, None))
    let v = 2 in (40 + v) end
    """
    def visit_var(self, var, env):
        return var

    def visit_num(self, num, env):
        return num

    def visit_add(self, add, env):
        left = add.left.accept(self, env)
        right = add.right.accept(self, env)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.num + right.num)
        return Add(left, right)

    def visit_sub(self, sub, env):
        left = sub.left.accept(self, env)
        right = sub.right.accept(self, env)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.num - right.num)
        return Sub(left, right)

    def visit_mul(self, mul, env):
        left = mul.left.accept(self, env)
        right = mul.right.accept(self, env)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.num * right.num)
        return Mul(left, right)

    def visit_div(self, div, env):
        left = div.left.accept(self, env)
        right = div.right.accept(self, env)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.num // right.num)
        return Div(left, right)

    def visit_let(self, let, env):
        new_def = let.exp_def.accept(self, env)
        new_body = let.exp_body.accept(self, env)
        return Let(let.identifier, new_def, new_body)
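Because evaluation and optimization are both visitors, it is straightforward to check that constant folding preserves semantics. Below is a self-contained sanity check; the minimal Num, Add, Eval and Optimize classes are simplified stand-ins for the versions above.

```python
class Num:
    def __init__(self, num):
        self.num = num
    def accept(self, visitor, env):
        return visitor.visit_num(self, env)

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def accept(self, visitor, env):
        return visitor.visit_add(self, env)

class Eval:
    def visit_num(self, num, env):
        return num.num
    def visit_add(self, add, env):
        return add.left.accept(self, env) + add.right.accept(self, env)

class Optimize:
    def visit_num(self, num, env):
        return num
    def visit_add(self, add, env):
        left = add.left.accept(self, env)
        right = add.right.accept(self, env)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.num + right.num)
        return Add(left, right)

e = Add(Add(Num(1), Num(2)), Num(39))
folded = e.accept(Optimize(), {})
assert isinstance(folded, Num)             # folded all the way to a literal
assert folded.num == e.accept(Eval(), {})  # and the value is unchanged: 42
```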

Have you ever heard of an issue in software design called the Expression
Problem?
The expression problem is a challenging problem in programming languages that concerns the
extensibility and modularity of statically typed data abstractions. The goal is to define a data
abstraction that is extensible both in its representations and its behaviors, where one can add
new representations and new behaviors to the data abstraction, without recompiling existing
code, and while retaining static type safety (e.g., no casts). The statement of the problem
exposes deficiencies in programming paradigms and programming languages.


How do Visitors deal with the expression problem?


The Visitor pattern addresses part of the Expression Problem, particularly in enabling
behavioral extensibility (i.e., adding new operations on existing data types) in statically typed
languages. However, it struggles with representational extensibility (i.e., adding new data
types or variants) in a type-safe manner, especially in languages with strict static typing.

Here's a breakdown of how the Visitor pattern interacts with the Expression Problem:

1. Behavioral Extensibility (Adding New Operations)

The Visitor pattern excels at adding new behaviors (operations) to existing data types without
modifying their definitions. By defining a new visitor class, you can extend the functionality of
your data structure without changing the underlying structure (such as expression classes in the
example above).

In the context of the Expression Problem, this solves part of the challenge. For example, you
can easily add new visitors to evaluate expressions, pretty-print them, optimize them, etc., by
simply adding new Visitor implementations. This satisfies the need for behavioral
extensibility.

However, it does so with a compromise:

●​ You need to modify the visitor interface every time you add a new data variant (e.g.,
adding an Exponentiation operation). This limits representational extensibility.

2. Representational Extensibility (Adding New Variants)

While the Visitor pattern is good at adding new behaviors, it struggles with adding new data
representations (i.e., new variants of data types) without recompiling or modifying existing code.
In the case of statically typed languages like Java, C++, or Python with type hinting, adding a
new type (e.g., a new expression class like Exponentiation) requires modifying the Visitor
interface and the existing visitor classes to handle the new variant.

This breaks the Open/Closed Principle in terms of representational extensibility. In concrete terms:

●​ If you add a new expression type (e.g., Exponentiation), you would need to update
the Visitor interface to include a visit_exponentiation method.
●​ You also need to modify all existing visitor implementations (VisitorEval,
VisitorOptimize, etc.) to implement this new method, thus breaking the "closed for
modification" aspect of the Open/Closed Principle.

Example:


Suppose you want to add an Exponentiation expression to the existing set of Add, Sub,
Mul, and Div. You would need to:

● Add a new Exponentiation class.
● Modify the Visitor interface to include a visit_exponentiation method.
● Update every visitor class to include logic for handling Exponentiation.

This violates representational extensibility because you are forced to touch multiple parts of the
code when adding new data types.
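The ripple effect can be demonstrated in a few lines. In this self-contained sketch, an old visitor that predates the hypothetical Exponentiation variant fails as soon as it meets one:

```python
class Num:
    def __init__(self, num):
        self.num = num
    def accept(self, visitor, arg):
        return visitor.visit_num(self, arg)

class Exponentiation:
    """The new variant, added after VisitorStr was written."""
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def accept(self, visitor, arg):
        return visitor.visit_exponentiation(self, arg)

class VisitorStr:
    """An old visitor: it only knows about Num."""
    def visit_num(self, num, arg):
        return str(num.num)

e = Exponentiation(Num(2), Num(10))
try:
    e.accept(VisitorStr(), None)
except AttributeError:
    print("VisitorStr must be extended with visit_exponentiation")
```

One common mitigation is to give visitors a generic_visit fallback that handles unknown node types, which is exactly what Python's ast.NodeVisitor does.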

Is the Visitor Design Pattern actually used in the design of real-world compilers?
Yes, the Visitor pattern is used in many real-world compilers, particularly in those written using
object-oriented programming (OOP) languages. The Visitor pattern is often employed to
navigate Abstract Syntax Trees (ASTs) and apply operations such as type checking,
optimization passes, and code generation. Here are some examples of how it's used:

1.​ Clang/LLVM: Although LLVM itself is not strictly object-oriented in the traditional sense,
Clang, the front-end for C, C++, and Objective-C, is written in C++ and uses visitor-like
patterns to traverse ASTs. For instance, Clang has AST visitors that traverse the syntax
tree to perform semantic analysis, including checking types and symbols.
2.​ Java Compiler (javac): The javac compiler, written in Java, uses visitors for AST
processing. In this case, the Visitor pattern is useful for separating the operations (such
as type inference, bytecode generation, or optimizations) from the structure of the AST
nodes, making the code easier to extend without modifying the AST classes themselves.
3.​ GCC (GNU Compiler Collection): While GCC is largely written in C, the design of its
AST traversal shares similar principles with the Visitor pattern. It uses function pointers
or switch-based approaches to traverse different kinds of nodes, which is conceptually
similar to how Visitor pattern works in OOP contexts.
4.​ Scala Compiler (scalac): In the Scala compiler, the Visitor pattern is used in the
implementation of different phases of the compiler, like tree rewriting. Scala’s
pattern-matching features also help manage tree traversals, which serves a similar
purpose to the Visitor pattern in managing AST nodes.

In practice, the Visitor pattern's main benefit in compilers is that it allows different kinds of
operations (e.g., interpretation, compilation, optimization) to be added without altering the AST
node classes. It makes the code more modular and easier to extend, which is especially useful
in large and complex systems like compilers.
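Closer to home, Python's standard library ships a Visitor as well: ast.NodeVisitor dispatches visit calls on the class name of each AST node. The small tool below counts function definitions, much like the Clang example discussed next:

```python
import ast

class FuncCounter(ast.NodeVisitor):
    def __init__(self):
        self.count = 0
    def visit_FunctionDef(self, node):
        # Called once per 'def'; generic_visit keeps walking nested defs.
        self.count += 1
        self.generic_visit(node)

src = """
def foo(): pass
def bar(x): return x + 1
"""
counter = FuncCounter()
counter.visit(ast.parse(src))
print(counter.count)  # prints 2
```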


Could you show me an example of some analysis that we could implement in Clang by overriding methods of a visitor?
Let's implement a simple visitor in Clang that analyzes the number of function declarations in a
C++ program. We'll walk through how to create an AST visitor that traverses the source code
and counts function declarations.

Here’s a step-by-step guide:

Step 1: Set Up Clang Tools

Before we get started, we need to have Clang and LLVM installed, along with the Clang Tooling
infrastructure. We'll be working with Clang’s libTooling.

Step 2: Create the Visitor

We’ll create a visitor that looks for function declarations and counts them.

Here’s the code for a minimal Clang tool using the visitor pattern:

#include "clang/AST/AST.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Frontend/ASTConsumers.h"
#include "clang/Frontend/FrontendAction.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Tooling/Tooling.h"

using namespace clang;

// A simple RecursiveASTVisitor to visit function declarations.
class FunctionDeclVisitor : public RecursiveASTVisitor<FunctionDeclVisitor> {
public:
  explicit FunctionDeclVisitor(ASTContext *Context)
      : Context(Context) {}

  bool VisitFunctionDecl(FunctionDecl *FuncDecl) {
    // Count and print the name of each function declaration.
    ++NumFunctions;
    llvm::outs() << "Found function: "
                 << FuncDecl->getNameInfo().getName().getAsString() << "\n";
    return true;
  }

  unsigned getNumFunctions() const { return NumFunctions; }

private:
  ASTContext *Context;
  unsigned NumFunctions = 0;
};

// An ASTConsumer to use the visitor
class FunctionDeclConsumer : public ASTConsumer {
public:
  explicit FunctionDeclConsumer(ASTContext *Context)
      : Visitor(Context) {}

  virtual void HandleTranslationUnit(ASTContext &Context) {
    // Use the visitor to traverse the AST, then report the total.
    Visitor.TraverseDecl(Context.getTranslationUnitDecl());
    llvm::outs() << "Total functions: " << Visitor.getNumFunctions() << "\n";
  }

private:
  FunctionDeclVisitor Visitor;
};

// A FrontendAction to run our visitor
class FunctionDeclFrontendAction : public ASTFrontendAction {
public:
std::unique_ptr<ASTConsumer>
CreateASTConsumer(CompilerInstance &CI, StringRef) override {
return std::make_unique<FunctionDeclConsumer>(&CI.getASTContext());
}
};

// A minimal tool execution that doesn't require CommonOptionsParser.
int main(int argc, const char **argv) {
if (argc > 1) {
// Create a ClangTool that will run our FrontendAction.
clang::tooling::runToolOnCode(std::make_unique<FunctionDeclFrontendAction>(),
argv[1]);
} else {
llvm::outs() << "Please provide code as an argument.\n";
}
return 0;
}

Step 3: Explanation

● FunctionDeclVisitor: This is our custom visitor that inherits from
RecursiveASTVisitor. It overrides the method VisitFunctionDecl, which is
automatically called whenever a FunctionDecl (a function declaration in the source
code) is encountered in the AST. Inside this method, we increment a counter and print
the name of each function.
●​ FunctionDeclConsumer: This is the class responsible for using the
FunctionDeclVisitor. The HandleTranslationUnit method is called when the
entire AST has been parsed, and we use the visitor to traverse the AST starting from the
root (TranslationUnitDecl).
●​ FunctionDeclFrontendAction: This class ties the visitor and consumer into Clang's
frontend. It specifies that we want to run our visitor when Clang processes the source
code.

Step 4: Running the Tool


To run this tool, you would compile it using Clang and run it on some C++ source code. It will
print the names of all functions declared in the code, as well as the total number of functions
found.

Here’s an example of input source code:

#include <iostream>

void foo() {
std::cout << "Hello, World!" << std::endl;
}

int bar(int x) {
return x + 1;
}

When running the tool on this code, the output would be:

Found function: foo
Found function: bar
Total functions: 2

Step 5: Building the Tool

To build the tool, assuming an LLVM/Clang build tree under $HOME/llvm-project/build:

% clang++ -std=c++14 -o func_counter func_counter.cpp \
    -I$HOME/llvm-project/clang/include \
    -I$HOME/llvm-project/build/include \
    -I$HOME/llvm-project/build/tools/clang/include \
    -I$HOME/llvm-project/llvm/include \
    -L$HOME/llvm-project/build/lib \
    -lclangTooling -lclangBasic -lclangAST -lLLVM

You can then run the tool, passing the source code to be analyzed as a string:

./func_counter "$(cat file.cpp)"

Conclusion

This is a simple example of how you can implement an analysis using the Visitor pattern in
Clang. You can extend this pattern to perform more complex analyses, such as finding variable
declarations, counting loops, or checking coding style.


Do you know who invented the Visitor Design Pattern?


Alas, that would be very difficult to know for sure: many people were using similar techniques
around the same time. The Visitor design pattern is a key structural design pattern in
object-oriented programming. Its invention is not attributed to a single individual, but it became
well-known and formalized in the context of the Gang of Four (GoF) book, Design Patterns:
Elements of Reusable Object-Oriented Software (1994), written by Erich Gamma, Richard
Helm, Ralph Johnson, and John Vlissides.

Early Mentions and Theoretical Origins

While the GoF book is credited with popularizing the Visitor pattern, the concept is rooted in
earlier work on object-oriented programming and double dispatch. Here are some notable
influences and precursors:

1.​ Kristen Nygaard and Ole-Johan Dahl: The creators of Simula (1960s), the first object-oriented programming language, laid the groundwork for ideas about polymorphism and dynamic dispatch, which are central to the Visitor pattern.
2.​ Bjarne Stroustrup: In the development of C++ (early 1980s), Stroustrup explored techniques for separating algorithms from the data structures they operate on, an idea that aligns with the Visitor pattern.
3.​ Computer Graphics and Traversals: Early examples of patterns resembling Visitor appeared in areas like tree traversal algorithms, used in compiler design and computer graphics. These problems often required separating traversal logic from the structure being traversed.

Visitor Pattern in the GoF Book

●​ The GoF formalized the Visitor pattern as a way to add operations to classes without
modifying them, using double dispatch to resolve method calls based on both the
visitor and the element being visited.
●​ In the GoF's classification, Visitor is primarily used when:
○​ The operations need to work across a variety of classes with distinct interfaces.
○​ You want to keep the element classes unchanged when adding new functionality.
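The double-dispatch mechanism the GoF describe can be sketched in a few lines of Python. The example below is only illustrative (the class names are made up): the element's accept method dispatches on the element's class, and the call it forwards dispatches on the visitor's class, so the operation that runs is chosen by both objects together.

```python
# A minimal sketch of double dispatch, the mechanism behind Visitor.
class Num:
    def accept(self, visitor):
        # First dispatch: the method called depends on the element's class.
        return visitor.visit_num(self)

class Add:
    def accept(self, visitor):
        return visitor.visit_add(self)

class NamePrinter:
    # Second dispatch: the behavior depends on the visitor's class.
    def visit_num(self, node):
        return "Num"
    def visit_add(self, node):
        return "Add"

print(Num().accept(NamePrinter()))  # Num
print(Add().accept(NamePrinter()))  # Add
```

New operations are added by writing a new visitor class; the element classes (Num, Add) stay unchanged.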

Key Figures Who Popularized It


●​ Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides: While not the
inventors, their presentation of Visitor in the GoF book made it one of the most
recognized design patterns in object-oriented programming.

If you’re interested in the pattern’s historical applications, compiler theory and tree traversal
algorithms in languages like Lisp or Simula are excellent places to explore its conceptual roots!

John Vlissides passed away in 2005. He was such an endearing person, right? And he did a lot for Computer Science!

John Vlissides: A Short Bio

Yes, John Matthew Vlissides (August 2, 1961 – November 24, 2005) was a very important and
endearing person. He was an influential computer scientist, best known as one of the "Gang of
Four" authors of Design Patterns: Elements of Reusable Object-Oriented Software. Vlissides
was born in Michigan, USA, and displayed an early aptitude for mathematics and science. He
pursued his education at the University of Virginia, earning a bachelor's degree in electrical
engineering. He later completed his Ph.D. in computer science at Stanford University in 1988,
where his thesis focused on object-oriented user interface design.

Throughout his career, Vlissides worked at IBM’s Thomas J. Watson Research Center, where
he made significant contributions to software design and reuse. His passion for patterns, design,
and usability in software extended beyond his research, as he actively shared his knowledge
through publications, talks, and mentoring.

He passed away in 2005 at the age of 44 after a battle with a brain tumor. Vlissides is
remembered not only for his professional achievements but also for his warm personality,
intellectual curiosity, and dedication to the software engineering community.

Contributions to Computer Science

1. Co-Authoring the Design Patterns Book


●​ As one of the authors of Design Patterns: Elements of Reusable Object-Oriented Software (1994), Vlissides helped codify 23 fundamental patterns that provide reusable solutions to common design problems in software engineering.
●​ The book introduced many developers to the concept of patterns, revolutionizing how software architecture was taught and practiced.

2. Advocate for Practical Design Patterns

●​ Vlissides was passionate about making design patterns accessible to practitioners, emphasizing their practicality in real-world software engineering projects.
●​ He worked to bridge the gap between academic research and industry application, inspiring countless engineers to adopt object-oriented design principles.

3. Contributions to Object-Oriented Programming

●​ Vlissides’s early work focused on object-oriented user interface frameworks and design,
which laid the groundwork for several modern principles in GUI and framework
development.
●​ His research explored ways to improve the reusability and maintainability of software
systems.

4. “Pattern Hatching” Column

●​ Vlissides authored a series of columns called “Pattern Hatching”, published in C++ Report. These essays explored the nuances of design patterns and their application, providing insights to a wide audience of developers.
●​ His engaging writing style made complex concepts approachable and practical.

5. Work at IBM Research

●​ At IBM, Vlissides contributed to projects related to software architecture, reuse, and user
interface design.
●​ He played a pivotal role in advancing IBM's efforts to incorporate design patterns into its
tools and methodologies.

6. Influence on Software Reuse

●​ Vlissides was a strong proponent of software reuse and modularity, advocating for
practices that reduced redundancy and improved the efficiency of development
processes.

Legacy

John Vlissides’s work has left an indelible mark on the field of software engineering. While his
life was tragically short, his contributions to design patterns and his mentorship have inspired
generations of software developers. His colleagues and friends remember him as a kind,


thoughtful, and generous individual, making his legacy one of both professional and personal
excellence.


Type Systems
Code: https://github.com/pronesto/DCC053_Public/tree/main/8_TypeVerification

Do you know any modern programming language that does not have
types?
Nearly all modern programming languages use types in some form, though the way they
handle types varies. While some languages don't require explicit type annotations from the
programmer, every language that manipulates data needs some concept of types, whether
implicit or explicit.

Languages that might appear not to use types typically employ dynamic typing where the
types are inferred and checked at runtime, rather than at compile-time. Examples of these are:

●​ JavaScript: JavaScript doesn't require the programmer to declare types explicitly, but
the language internally keeps track of types during runtime.
●​ Python: Like JavaScript, Python does not require the explicit declaration of types, but
every object in Python still has a type that the interpreter checks at runtime.

Even in languages without explicit type systems, the underlying implementation must manage
types for the sake of operations like arithmetic, memory management, and function calls.

In summary, while there are languages that do not require explicit types (dynamic languages),
there are no modern languages that completely avoid the concept of types. Implicit types still
exist, even if they are not directly visible to the programmer.

I heard that BCPL didn't have types. Is that true?


Yes, it is true that BCPL (Basic Combined Programming Language) did not have a type system
in the way modern programming languages do. BCPL, developed in the 1960s by Martin
Richards, is a typeless language where everything is treated as a single, generic type: a
machine word.

Key Features of BCPL:

●​ Typeless Nature: In BCPL, all data is treated as a word of memory, typically a fixed-size
unit that could represent anything: an integer, a pointer, or even part of a string. There
was no distinction between different data types, and it was up to the programmer to
interpret what the bits represented.
●​ Manual Interpretation: Since BCPL lacked type safety or type enforcement, it was the
programmer’s responsibility to ensure that the right operations were applied to the right
kind of data. For example, adding two values was allowed regardless of whether they
were integers or memory addresses.


●​ Influence on C: BCPL is historically significant because it influenced the design of B, which in turn influenced the development of C, a language that introduced a basic type system.

Why did BCPL not have types?


BCPL was designed to be a simple and efficient systems programming language for writing
compilers and operating systems, so it abstracted away much of the complexity by treating data
uniformly. Its lack of types made it flexible and allowed direct manipulation of memory, which
was useful for low-level programming tasks, but also error-prone.

Here's a simple example of a BCPL program that highlights its typeless nature. This program
swaps two values in memory and prints them, demonstrating how BCPL treats everything as
raw machine words without regard for types like integers, pointers, or characters.

GET "libhdr"

LET start() = VALOF
{
  // Define two variables (treated as memory words)
  LET a = 10
  LET b = 20

  // Print original values
  writef("Before swap: a = %n, b = %n\n", a, b)

  // Swap the values of a and b
  LET temp = a
  a := b
  b := temp

  // Print swapped values
  writef("After swap: a = %n, b = %n\n", a, b)

  RESULTIS 0
}

Key Points:

1.​ No Explicit Types: There are no type declarations for a and b. They are just treated as
memory locations that store machine words. Here, they are used as integers, but they
could just as easily store pointers or other data.
2.​ Generic Assignments: The swap operation (a := b) is completely typeless. The
program doesn't care whether a and b are integers, pointers, or any other kind of data. It
simply moves raw machine words around.
3.​ Printing Values: The writef function is used to print the values. %n is used to indicate
a numeric value, but BCPL doesn't internally enforce that a and b are numbers. The %n
is more about interpreting the memory word as a number during output.


So, what are the advantages/disadvantages of the BCPL approach?

●​ No Type Safety: You can assign anything to a or b, and there’s no type checking. In
modern languages, swapping two variables would require that they have compatible
types. In BCPL, however, everything is just a word of memory.
●​ Flexibility: This flexibility allows you to perform operations directly on memory, but at the
cost of potential errors, since there is no protection against assigning the "wrong type" of
data (e.g., trying to add a pointer to an integer).

BCPL’s typeless nature made it very powerful for low-level programming, but also error-prone,
as the programmer had to manually ensure that memory was used correctly.

Does the Turing Machine have types?


A Turing Machine is a mathematical model of computation introduced by Alan Turing in 1936. It
consists of an infinite tape, a head that reads and writes symbols on the tape, and a set of
states that dictate how the machine operates.
●​ No Types: In its original formulation, the Turing Machine doesn't have a notion of types.
The symbols written on the tape are just abstract symbols, and the machine's behavior
depends on its current state and the symbol it reads. The model is purely concerned with
symbol manipulation rather than data types like integers or strings.
●​ Typeless Nature: The tape and the machine's operations are all abstract and generic. A
symbol on the tape could represent anything depending on the context of the problem
being modeled. There is no notion of "type safety" or type checking in a Turing Machine.

What about the Lambda Calculus?


Lambda Calculus is a formal system for describing functions and computation, introduced by
Alonzo Church in the 1930s. It forms the theoretical foundation of functional programming
languages. It consists of three basic elements: variables, abstraction (functions), and application
(function calls).
●​ No Types in Untyped Lambda Calculus: In its original form, Lambda Calculus is typeless. The terms (expressions) in Lambda Calculus are not assigned types, and there are no restrictions on what kind of argument a function can take or what it can return. A function can take any expression and return any expression. For example, in untyped lambda calculus, you can write λx. x x. This expression applies the function x to itself, and there is no type restriction to prevent this, even though applying a function to itself could cause logical issues (as in the famous self-application paradox).
●​ Typed Lambda Calculus: Over time, typed versions of Lambda Calculus were
introduced (e.g., Simply Typed Lambda Calculus) to prevent certain paradoxes and
provide more structure. In typed variants, every term must adhere to a specific type,
such as a function from integers to integers, or a function from a boolean to an integer.
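The self-application term above is easy to reproduce in Python, which, being dynamically typed, accepts it just as the untyped lambda calculus does. This is only an illustration: applying the term to itself (the famous Ω combinator) would loop forever, so the sketch applies it to the identity function instead.

```python
# λx. x x written as a Python lambda: no type stops a function
# from being applied to itself.
self_app = lambda x: x(x)

# Applying it to the identity λy. y is harmless:
# (λx. x x)(λy. y) reduces to (λy. y)(λy. y), which is λy. y.
identity = lambda y: y
print(self_app(identity)(42))  # 42

# In a typed setting, self_app(self_app) -- the Ω combinator -- would be
# rejected by the type checker; here it would recurse until the stack overflows.
```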


While both the Turing Machine and Lambda Calculus are extremely powerful and can
theoretically compute anything computable (per the Church-Turing thesis), they do not deal with
types in their original forms. Their typeless nature reflects their role as abstract models of
computation rather than practical programming languages that deal with real-world constraints
like memory safety or data type compatibility.

So, every modern programming language has a type system. Why are
types so important?
Types are important in modern programming languages because they enhance safety,
readability, performance, and correctness. They allow programmers to reason about
programs more effectively, avoid common errors, and ensure efficient and reliable execution.
While early languages (like BCPL or the Turing Machine) were typeless, types have become a
core part of programming practice as languages have evolved to handle increasingly complex
systems. Here's why types are so important:

1. Error Prevention and Safety

●​ Type errors (e.g., adding a number to a string) can cause programs to behave
unexpectedly or even crash. A type system helps detect these errors early—either at
compile time (in statically typed languages) or at runtime (in dynamically typed
languages). This prevents bugs and reduces the likelihood of unpredictable behavior.
●​ Memory safety: Types can enforce rules that prevent common errors like accessing
invalid memory locations (e.g., null pointer dereferences), buffer overflows, or
out-of-bounds errors.

2. Clarity and Maintainability

●​ Types document the program: By specifying the types of variables, functions, and data
structures, types act as a form of documentation. This makes it easier to understand
what the program does, especially when working in large codebases or teams.
●​ Explicit contracts: Types serve as a contract that describes the expected input and
output of functions, making programs more readable and reducing ambiguity.

3. Optimization and Performance

●​ Compiler optimizations: Knowing the types of variables allows the compiler to make
assumptions that lead to more efficient code. For instance, knowing that a variable is an
integer rather than a floating-point number allows the compiler to use more efficient
machine instructions.
●​ Memory management: Types help the compiler allocate memory efficiently. For
example, a compiler knows exactly how much space to reserve for an int versus a
float or a custom data structure.


Example: In languages like C or Rust, knowing the size and layout of types at compile time
allows for stack allocation and optimized memory usage.

4. Preventing Ambiguity and Undefined Behavior

●​ Avoiding unintended behavior: Without types, it’s easy to make mistakes like applying
inappropriate operations on data. Types prevent this by enforcing strict rules on how
data can be used.
●​ Defined behavior: Languages with strong type systems (like Haskell or Rust) often
guarantee that well-typed programs cannot crash due to type-related issues. This avoids
many runtime surprises.

Example: In Rust, the type system ensures memory safety through ownership and borrowing,
preventing common issues like dangling pointers or race conditions without needing a garbage
collector.

5. Support for Abstraction and Polymorphism

●​ Generics and polymorphism: Modern type systems support generic programming, where algorithms are written once and can operate on different types. For instance, in languages like Java or Haskell, you can write a function that works with any data type, as long as certain type constraints are met.
●​ Higher-level abstractions: Types allow programmers to work with high-level concepts (e.g., collections, functions) while the compiler handles low-level details.
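Python's optional type hints can express this kind of generic constraint. The sketch below (the function name is illustrative) defines one function that works for lists of any element type T; a checker such as mypy could verify the annotation statically, while plain Python simply runs it.

```python
from typing import TypeVar, List

T = TypeVar("T")  # a type variable: the function is generic in T

def first(items: List[T]) -> T:
    # Written once, usable with lists of any element type.
    return items[0]

print(first([10, 20, 30]))   # 10
print(first(["a", "b"]))     # a
```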

6. Modularity and Code Reuse

●​ Types support building reusable components. Strong type systems encourage


modularity, allowing developers to build reusable, type-safe libraries that can work
across a range of applications.
●​ With interfaces or traits, you can define abstract behaviors that can be implemented by
different types, leading to more flexible and reusable code structures.

7. Enabling Type-Driven Development

●​ Type-driven design: In languages with expressive type systems like Haskell or Idris,
the type system can guide program design. You can define complex behaviors and
invariants directly in the type system, ensuring correctness by construction.
●​ Type-level programming: Some advanced languages use types as a powerful
abstraction tool. In dependent types, for instance, types can encode and enforce
arbitrary properties about data, guaranteeing correctness beyond traditional type safety.


Can you show an example of errors that types prevent?


Consider the Python program below:

>>> x = 5
>>> y = "four"
>>> x + y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

This example shows a simple and effective way to illustrate how types prevent common programming errors by catching them early. In this case:

●​ x = 5: x is initialized as an integer, so it can hold numeric values.
●​ y = "four": y is initialized as a string, so it can hold textual data.
●​ x + y: This line tries to add a number (int) to a string (str), which doesn't make sense logically. The Python runtime recognizes this type mismatch and raises a TypeError before the program runs any further.

Can you show an example of how types prevent accessing invalid memory?
C allows direct memory manipulation, and if you’re not careful, you can access invalid memory
locations, leading to undefined behavior. Here's an example of accessing an invalid memory
location due to a dangling pointer:

#include <stdio.h>
#include <stdlib.h>

int* createArray() {
int arr[5] = {1, 2, 3, 4, 5}; // Local array (on stack)
return arr; // Returns pointer to a local array
}

int main() {
// ptr now points to invalid memory (stack frame has been destroyed)
int* ptr = createArray();

// Accessing invalid memory location


printf("%d\n", ptr[0]); // Undefined behavior: reading from a dangling pointer
return 0;
}

The function createArray creates an array on the stack. When the function returns, the stack
frame is destroyed, and the pointer ptr refers to invalid memory. Accessing ptr[0] can lead to
undefined behavior, including crashes or corrupt data, because the program is trying to read
from an invalid memory location.


Now, if you were to write a program that tries to return a reference to local memory (like the C
example), Rust’s compiler would give an error:

fn create_array() -> &Vec<i32> { // error: missing lifetime specifier
    let arr = vec![1, 2, 3, 4, 5];
    &arr // cannot return a reference to the local variable `arr`
}

Concluding:
●​ In C, the lack of strong type safety allows for dangerous operations, such as accessing
invalid memory locations through dangling pointers, leading to undefined behavior.
●​ In Rust, the type system enforces strict rules about ownership and borrowing,
preventing such errors by ensuring memory is valid whenever accessed. Rust’s
ownership system and borrow checker help guarantee memory safety at compile
time, preventing issues like dangling pointers, buffer overflows, and other
memory-related bugs common in languages like C.

What is a type system?


A type system in a programming language is a set of rules that assigns types to various
components of a program, such as variables, expressions, functions, and modules. The primary
purpose of a type system is to ensure that programs behave correctly by preventing certain
kinds of errors, such as trying to add a number to a string or calling a function with the wrong
number of arguments.

Do you remember our language of arithmetic expressions? Let's augment it with boolean types.
Here you can find a set of semantic rules that add boolean types to our language of
expressions.
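The original figure with these rules does not reproduce well in text, so here is a sketch of typing rules of the usual kind for such a language (standard notation, not a verbatim copy of the figure). Read Γ ⊢ e : τ as "expression e has type τ in environment Γ":

```latex
\frac{}{\Gamma \vdash n : \mathtt{int}} \qquad
\frac{}{\Gamma \vdash b : \mathtt{bool}} \qquad
\frac{\Gamma \vdash e_0 : \mathtt{int} \quad \Gamma \vdash e_1 : \mathtt{int}}
     {\Gamma \vdash e_0 + e_1 : \mathtt{int}}

\frac{\Gamma \vdash e_0 : \mathtt{int} \quad \Gamma \vdash e_1 : \mathtt{int}}
     {\Gamma \vdash e_0 < e_1 : \mathtt{bool}} \qquad
\frac{\Gamma \vdash e_0 : \mathtt{bool} \quad \Gamma \vdash e_1 : \mathtt{bool}}
     {\Gamma \vdash e_0 \;\mathtt{and}\; e_1 : \mathtt{bool}}

\frac{\Gamma \vdash c : \mathtt{bool} \quad \Gamma \vdash e_0 : \tau \quad \Gamma \vdash e_1 : \tau}
     {\Gamma \vdash \mathtt{if}\ c\ \mathtt{then}\ e_0\ \mathtt{else}\ e_1 : \tau}
```

The same rules apply analogously to subtraction, multiplication, and division; these are the checks that the type-safe visitor later in this chapter enforces at runtime.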


Can you implement these rules in our Python interpreter? Start by defining
classes for the new kinds of expressions.
Find below an implementation of classes for the different types of expressions.

import sys
from abc import ABC, abstractmethod

class Expression(ABC):
@abstractmethod
def accept(self, visitor):
pass

class Var(Expression):
def __init__(self, identifier):
self.identifier = identifier
def accept(self, visitor, arg):
return visitor.visit_var(self, arg)

class Num(Expression):
def __init__(self, num):
self.num = num
def accept(self, visitor, arg):
return visitor.visit_num(self, arg)

class Bln(Expression):
def __init__(self, bln):
self.bln = bln
def accept(self, visitor, arg):
return visitor.visit_bln(self, arg)


class BinaryExpression(Expression):
def __init__(self, left, right):
self.left = left
self.right = right

class Add(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_add(self, arg)

class Sub(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_sub(self, arg)

class Mul(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_mul(self, arg)

class Div(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_div(self, arg)

class And(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_and(self, arg)

class Lth(BinaryExpression):
def accept(self, visitor, arg):
return visitor.visit_lth(self, arg)

class Let(Expression):
def __init__(self, identifier, exp_def, exp_body):
self.identifier = identifier
self.exp_def = exp_def
self.exp_body = exp_body
def accept(self, visitor, arg):
return visitor.visit_let(self, arg)

class IfThenElse(Expression):
def __init__(self, cond, e0, e1):
self.cond = cond
self.e0 = e0
self.e1 = e1
def accept(self, visitor, arg):
return visitor.visit_ifThenElse(self, arg)

Now, implement a visitor that evaluates expressions.


class VisitorEval:
    def visit_var(self, var, env):
        if var.identifier in env:
            return env[var.identifier]
        else:
            sys.exit(f"Variable not found {var.identifier}")

    def visit_num(self, num, env):
        return num.num

    def visit_bln(self, exp, env):
        return exp.bln

    def visit_add(self, add, env):
        v0 = add.left.accept(self, env)
        v1 = add.right.accept(self, env)
        return v0 + v1

    def visit_sub(self, sub, env):
        v0 = sub.left.accept(self, env)
        v1 = sub.right.accept(self, env)
        return v0 - v1

    def visit_mul(self, mul, env):
        v0 = mul.left.accept(self, env)
        v1 = mul.right.accept(self, env)
        return v0 * v1

    def visit_div(self, div, env):
        v0 = div.left.accept(self, env)
        v1 = div.right.accept(self, env)
        return v0 // v1

    def visit_and(self, exp, env):
        v0 = exp.left.accept(self, env)
        if v0:
            return exp.right.accept(self, env)
        return False

    def visit_lth(self, exp, env):
        v0 = exp.left.accept(self, env)
        v1 = exp.right.accept(self, env)
        return v0 < v1

    def visit_ifThenElse(self, exp, env):
        cond = exp.cond.accept(self, env)
        if cond:
            return exp.e0.accept(self, env)
        else:
            return exp.e1.accept(self, env)

    def visit_let(self, let, env):
        v0 = let.exp_def.accept(self, env)
        new_env = dict(env)
        new_env[let.identifier] = v0
        return let.exp_body.accept(self, new_env)


The visit of and expressions above implements short-circuit semantics. What's this?
The visit_and method in your VisitorEval class indeed implements short-circuit
semantics for the and operation. Short-circuit evaluation is a programming concept used in
Boolean logic (e.g., and, or) where the second operand of a logical operation is evaluated only
if necessary. The key idea is that once the result of the logical operation is determined from the
first operand, there's no need to evaluate the second operand. This behavior helps avoid
unnecessary computations and can prevent errors if the second operand could cause issues
(like division by zero, null dereference, etc.).

def visit_and(self, exp, env):
    v0 = exp.left.accept(self, env)  # Evaluate the left side of the 'and'
    if v0:  # If the left side is True
        return exp.right.accept(self, env)  # Only then evaluate the right side
    return False  # If the left side is False, return False immediately

1.​ First Operand (v0): The left side (exp.left.accept(self, env)) is evaluated first.
2.​ If v0 is False: The method immediately returns False without evaluating the right side.
This is because, in an and operation, if one operand is False, the whole expression is
False regardless of the other operand.
3.​ If v0 is True: Only then does it proceed to evaluate the right side
(exp.right.accept(self, env)) because the result of the and operation now
depends on the right-hand side.

Why is short-circuit semantics useful?


1.​ Efficiency: Short-circuiting saves computational resources by not evaluating the second
operand when it isn't necessary.
2.​ Prevention of Errors: It can prevent runtime errors. For example, if the second operand
involves a risky computation (e.g., division by zero or accessing a null object),
short-circuiting can prevent such errors.
For example:

if x != 0 and (10 / x) > 1:
    # Safe to divide by x because the first condition prevents x from being zero.

In this case, the division by x only occurs if x != 0. Without short-circuit evaluation, you could
end up dividing by zero and causing a runtime error.

In your visitor pattern, the visit_and method is a classic example of short-circuit evaluation
applied in the context of evaluating boolean expressions.
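Python's built-in and behaves the same way, which makes the effect easy to see in isolation: below, the right operand would divide by zero, yet no error is raised because it is never evaluated.

```python
# Short-circuit evaluation with Python's own 'and':
# 1 // 0 would raise ZeroDivisionError, but the left operand is
# already False, so the right operand is never touched.
result = False and (1 // 0)
print(result)  # False
```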


What happens if we use values of incompatible types in our interpreter?


We might end up observing results that don't really make sense. Examples:

>>> e = Lth(Num(3), Bln(False))
>>> v = VisitorEval()
>>> e.accept(v, {})
False

>>> e = And(Bln(True), Num(4))
>>> e.accept(v, {})
4

In the current implementation of the visitor, the methods assume that the operands are of the
correct types but do no type checking. Here are the problems in the specific examples you
provided:

1.​ Example 1: Lth(Num(3), Bln(False))
   a.​ Lth is a comparison operator (<), expecting both operands to be numbers.
   b.​ Here, however, the left operand is a number (Num(3)), while the right operand is a boolean (Bln(False)).
   c.​ In the current visit_lth method, no type check occurs, so it tries to evaluate 3 < False. In Python, this results in False (since False is implicitly treated as 0), but this is illogical behavior for mixed types.
2.​ Example 2: And(Bln(True), Num(4))
   a.​ The And operation expects both operands to be booleans.
   b.​ Here, the left operand is True, but the right operand is a number (4).
   c.​ Since the visit_and method only checks the truth value of the left operand (True), it evaluates and returns 4 as the result, which makes no sense for a logical and operation.

How can we fix these problems?


To make the visitor type-safe, we need to perform explicit type checking in each method to
ensure that the types of the operands are valid for the operation being performed. You can add
a helper function to check types and raise an error when an invalid type is encountered. Now,
when you try to run the same expressions that previously led to incorrect results, you'll get
explicit error messages. Something like:

>>> type_safe_eval(And(Bln(True), Num(4)))
Type error in And: expected bool, got int

>>> type_safe_eval(And(Bln(True), Num(False)))
False

>>> type_safe_eval(Lth(Bln(True), Num(4)))
Type error in Less Than: expected int, got bool

>>> type_safe_eval(Lth(Bln(1), Num(4)))
True

Can you implement a VisitorTypeSafeEval that crashes gracefully?


class VisitorTypeSafeEval:
    def visit_var(self, var, env):
        if var.identifier in env:
            return env[var.identifier]
        else:
            sys.exit(f"Variable not found {var.identifier}")

    def visit_num(self, num, env):
        return num.num

    def visit_bln(self, exp, env):
        return exp.bln

    def visit_add(self, add, env):
        v0 = add.left.accept(self, env)
        v1 = add.right.accept(self, env)
        self.ensure_type(v0, int, "Addition")
        self.ensure_type(v1, int, "Addition")
        return v0 + v1

    def visit_sub(self, sub, env):
        v0 = sub.left.accept(self, env)
        v1 = sub.right.accept(self, env)
        self.ensure_type(v0, int, "Subtraction")
        self.ensure_type(v1, int, "Subtraction")
        return v0 - v1

    def visit_mul(self, mul, env):
        v0 = mul.left.accept(self, env)
        v1 = mul.right.accept(self, env)
        self.ensure_type(v0, int, "Multiplication")
        self.ensure_type(v1, int, "Multiplication")
        return v0 * v1

    def visit_div(self, div, env):
        v0 = div.left.accept(self, env)
        v1 = div.right.accept(self, env)
        self.ensure_type(v0, int, "Division")
        self.ensure_type(v1, int, "Division")
        return v0 // v1

    def visit_and(self, exp, env):
        v0 = exp.left.accept(self, env)
        self.ensure_type(v0, bool, "And")
        if v0:
            v1 = exp.right.accept(self, env)
            self.ensure_type(v1, bool, "And")
            return v1
        return False

    def visit_lth(self, exp, env):
        v0 = exp.left.accept(self, env)
        v1 = exp.right.accept(self, env)
        self.ensure_type(v0, int, "Less Than")
        self.ensure_type(v1, int, "Less Than")
        return v0 < v1

    def visit_ifThenElse(self, exp, env):
        cond = exp.cond.accept(self, env)
        self.ensure_type(cond, bool, "If-Then-Else Condition")
        if cond:
            return exp.e0.accept(self, env)
        else:
            return exp.e1.accept(self, env)

    def visit_let(self, let, env):
        v0 = let.exp_def.accept(self, env)
        new_env = dict(env)
        new_env[let.identifier] = v0
        return let.exp_body.accept(self, new_env)

    def ensure_type(self, value, expected_type, operation_name):
        if not type(value) == expected_type:
            op_str = f"Type error in {operation_name}"
            ex_str = f"expected {expected_type.__name__}"
            gt_str = f"got {type(value).__name__}"
            raise TypeError(f"{op_str}: {ex_str}, {gt_str}")

def type_safe_eval(e):
    v = VisitorTypeSafeEval()
    try:
        return e.accept(v, {})
    except TypeError as tp_error:
        print(tp_error)

Would this code work if your implementation of ensure_type were using isinstance?

Let's imagine that we had used the following implementation:

def ensure_type(self, value, expected_type, operation_name):
    if not isinstance(value, expected_type):
        op_str = f"Type error in {operation_name}"
        ex_str = f"expected {expected_type.__name__}"
        gt_str = f"got {type(value).__name__}"
        raise TypeError(f"{op_str}: {ex_str}, {gt_str}")


Using isinstance(value, int) would allow booleans to pass through since, in Python,
bool is a subclass of int. This can cause unintended behavior, especially when strict type
checking is desired, such as when you're distinguishing between integers and booleans.

By switching to type(value) == expected_type, we ensure strict type matching without
allowing implicit conversions. This prevents bool values from being treated as int, which is
what you want for a type-safe evaluation.

Here's an example:

v = VisitorTypeSafeEval()

# Old implementation (isinstance):
e = Lth(Num(3), Bln(True))  # Passes, because bool is a subclass of int
e.accept(v, {})             # Gives False instead of a type error

# New strict check (type(value) == expected_type):
e = Lth(Num(3), Bln(True))
e.accept(v, {})             # Raises a TypeError
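The subtlety can be checked directly in a Python shell, independently of the interpreter. This standalone snippet demonstrates why isinstance lets booleans slip through an int check while the strict comparison does not:

```python
# bool is a subclass of int in Python, so isinstance-based checks
# let booleans masquerade as integers:
print(isinstance(True, int))  # True: a bool passes an isinstance int check
print(issubclass(bool, int))  # True: bool really is a subclass of int
print(type(True) == int)      # False: the strict check tells them apart
print(3 < True)               # False: True behaves as the integer 1 here
```

This quirk is historical: Python's bool type was added long after int, and was made a subclass of int for backward compatibility.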

We have implemented a type-safe dynamically typed language. Why is that the case?
The way we've structured the VisitorTypeSafeEval class with the ensure_type checks resembles
how interpreters for dynamically typed but type-safe languages operate.

Here's why:

1.​ Dynamic Typing: In dynamically typed languages like Python, JavaScript, or Ruby,
types are determined at runtime rather than at compile time. Similarly, in our interpreter,
we don't know the types of expressions until they are evaluated. The type-checking is
deferred until execution time, where the ensure_type function verifies whether the
value matches the expected type.
2.​ Type Safety: Despite being dynamically typed, our interpreter enforces type safety.
Type safety means that the program will not perform operations on data of incompatible
types. For instance, if we attempt to compare a number with a boolean, or add a string to
an integer, our interpreter throws a TypeError, similar to how dynamically typed,
type-safe languages handle such cases.​
Many dynamically typed languages provide runtime checks to prevent type errors:
○​ In Python, adding an integer to a string (1 + "hello") raises a TypeError.
○​ In Ruby, trying to add a string and integer ("1" + 1) will also result in an error.
3.​ Run-time Errors: Unlike statically typed languages (e.g., Java, C#), which catch type
errors at compile time, our interpreter (like dynamically typed languages) detects type
mismatches during execution. But since we're ensuring type correctness at runtime, we
maintain a form of dynamic type safety.
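Python itself behaves exactly this way, and the behavior is easy to reproduce. This small standalone snippet (not part of the course interpreter) shows the runtime check in action:

```python
# Python defers type checking to run time: the error surfaces only
# when the offending operation actually executes.
def add(a, b):
    return a + b  # no compile-time guarantee that a and b are compatible

print(add(1, 2))  # fine: 3
try:
    add(1, "hello")  # the type mismatch is detected only at this call
except TypeError as err:
    print(f"caught at run time: {err}")
```

Note that the function definition itself is accepted without complaint; the error belongs to a particular execution, not to the program text.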


How does our interpreter implement Dynamic Typing with Type Safety?
Here’s how the flow works in our interpreter:

●​ Type inference at runtime: When the interpreter evaluates an expression, the type of
each value is determined at the moment it’s executed, much like how dynamically typed
languages work.
●​ Runtime checks: The ensure_type method acts like a type checker that runs during
execution, ensuring that operations (like +, <, and) are only performed on compatible
types.

Example Comparison:

●​ Python (a dynamically typed language):

x = 5
y = "hello"
x + y # TypeError: unsupported operand type(s) for +: 'int' and 'str'

●​ Our Interpreter:

e = Add(Num(5), Var("y"))  # Suppose y is bound to the value "hello"
v = VisitorTypeSafeEval()
env = {"y": "hello"}
e.accept(v, env)  # Raises a TypeError due to incompatible types

Both examples demonstrate runtime detection of type errors, ensuring type safety despite the
dynamic typing. The interpreter we've created behaves like an interpreter for a dynamically
typed language with runtime type safety. It allows flexibility in how expressions are typed but
enforces correctness during execution, preventing invalid operations from proceeding, which is
the hallmark of type-safe dynamic languages!

So, the more "dynamic" a language is, the more checks happen at run time? In this
case, are some languages more or less "dynamic" than others?
Yes, that makes some sense. You're describing an important aspect of the distinction between
dynamically and statically typed languages. In dynamically typed languages, type information is
not known at compile time, so type checks must be performed at runtime, as in your example of
the ensure_type method. This results in a lot of conditional checks being required to ensure
type safety during execution. In contrast, statically typed languages check and enforce types at
compile time, meaning that such runtime checks are usually unnecessary.

This difference can be viewed as a spectrum:

●​ Very static languages (like C, Rust, or Haskell) rely heavily on compile-time type
checking. Once compiled, these languages assume that the types are correct, and no
additional type checks are performed during execution. This leads to more efficient code,
but it also means the programmer has to provide all the necessary type information
upfront.
●​ Very dynamic languages (like Python, Ruby, or JavaScript) allow you to write code
without declaring types explicitly, but this flexibility requires more runtime checks,
especially when operations depend on specific types (like division requiring integer
operands). These checks add some overhead and potential for runtime type errors.
●​ Languages in between (like Java or C#) have a combination of both compile-time and
runtime checks. For example, Java uses a static type system, but due to features like
reflection or casting, there can still be situations where runtime type checks are
necessary, leading to a mix of static and dynamic behavior.

So, yes, you can think of a spectrum based on how much a language relies on runtime type
checking versus compile-time type enforcement. The more dynamic a language is, the more
frequently runtime checks like ensure_type are needed to maintain safety and correctness.
Conversely, the more static a language, the fewer such checks occur at runtime, because the
compiler ensures type correctness ahead of time.
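To make the contrast concrete, here is a minimal sketch of the static end of the spectrum. The class and function names below are illustrative stand-ins, not taken from the course code: a static checker computes the type of an expression from the tree alone, before any evaluation, so a program it accepts needs no ensure_type checks while running:

```python
# Illustrative AST nodes (simplified stand-ins for the course classes):
class Num:
    def __init__(self, num):
        self.num = num

class Bln:
    def __init__(self, bln):
        self.bln = bln

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

def static_type_of(e):
    """Compute the type of an expression without evaluating it."""
    if isinstance(e, Num):
        return int
    if isinstance(e, Bln):
        return bool
    if isinstance(e, Add):
        if static_type_of(e.left) is int and static_type_of(e.right) is int:
            return int
        raise TypeError("Addition expects two int operands")
    raise TypeError(f"Unknown expression: {e}")

print(static_type_of(Add(Num(1), Num(2))).__name__)  # int: accepted before running
try:
    static_type_of(Add(Num(1), Bln(True)))  # rejected without evaluating anything
except TypeError as err:
    print(err)
```

A checker in this style, run once over the whole program, is what lets statically typed languages drop the per-operation runtime checks.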

It seems that there are two categories of programming languages: type-safe and
type-unsafe. What's the difference between them?
Programming languages can be categorized into type-safe and type-unsafe languages, based
on how they handle types and prevent errors. The key difference lies in whether or not the
language prevents operations that misuse types, ensuring program correctness.

Type-Safe Languages

Type-safe languages ensure that operations are only performed on data of compatible types,
either at compile time or runtime. This prevents certain classes of errors, such as trying to add a
string to an integer or accessing invalid memory.

Features of Type-Safe Languages:

1.​ Prevention of Type Errors: Type-safe languages guarantee that you cannot perform
illegal operations on types (like adding a string and an integer, or accessing memory
outside the bounds of an array).
2.​ Static or Dynamic Type Checking:
○​ Statically Typed: In statically typed type-safe languages, like Java or Haskell,
type errors are caught at compile time. The program won’t run if there's a type
mismatch.
○​ Dynamically Typed: In dynamically typed type-safe languages, like Python or
Ruby, type checks happen at runtime. The program will raise an error if it
encounters a type mismatch during execution.


3.​ Memory Safety: Type-safe languages often provide memory safety by ensuring that
data is accessed in a controlled manner, preventing issues like buffer overflows, invalid
pointer dereferencing, or use-after-free errors.
4.​ Examples:
○​ Statically typed and type-safe: Java, C#, Haskell
○​ Dynamically typed and type-safe: Python, Ruby, JavaScript (these raise
runtime errors for invalid type operations)

Benefits:

●​ Fewer runtime errors: The type system catches many potential errors either at compile
time or runtime.
●​ Clearer code: Types act as documentation, helping developers understand the kind of
data being manipulated.
●​ Memory protection: Prevents common memory-related bugs such as accessing
unallocated memory or buffer overflows.

Type-Unsafe Languages

Type-unsafe languages allow operations on incompatible types without necessarily preventing
or detecting the error. This can lead to undefined behavior, incorrect results, or even crashes,
especially when dealing with low-level memory operations.

Features of Type-Unsafe Languages:

1.​ No Strong Type Checking: In type-unsafe languages, you can perform operations on
incompatible types without any guarantee that the types are correct. The language won't
prevent you from adding incompatible types or accessing invalid memory.
2.​ Direct Memory Access: Type-unsafe languages often allow direct manipulation of
memory, which can lead to security vulnerabilities like buffer overflows, memory leaks, or
crashes. This is common in low-level languages where performance and control over
memory are prioritized.
3.​ Pointer Arithmetic: Some type-unsafe languages, like C or assembly, allow pointer
arithmetic and raw memory manipulation, which can lead to accessing uninitialized or
invalid memory locations.
4.​ Examples:
○​ C: While C has types, it doesn’t enforce strong type checking in all scenarios. For
example, casting pointers between incompatible types can lead to undefined
behavior.
○​ Assembly language: Type-safety is absent, as operations directly manipulate
memory addresses, with no distinction between different types of data.

Risks:


●​ Undefined behavior: Performing invalid operations may result in unpredictable
behavior, such as crashes or incorrect results.
●​ Memory issues: Errors like buffer overflows, segmentation faults, and memory leaks
can occur due to uncontrolled memory access.

Example in C (Type-Unsafe):

int x = 5;
char* ptr = (char*)&x;
*ptr = 'A'; // Modifying the memory where 'x' is stored, treating it as a char

This code treats the memory of an int variable as a char, which can result in undefined
behavior, a type of problem that would be prevented in a type-safe language.

Key Differences:

Type-Safe Languages                              | Type-Unsafe Languages
-------------------------------------------------+------------------------------------------
Prevent operations on incompatible types         | Allow operations on incompatible types
Enforce type checking at compile time or runtime | No strict type enforcement
Help ensure memory safety (no buffer overflows,  | Allow direct memory manipulation and
no invalid accesses)                             | pointer arithmetic
Raise errors when type mismatches occur          | Undefined behavior or incorrect results
Example: Java, Python, Haskell                   | Example: C, Assembly

Why does Type Safety matter?


Type safety is important because it prevents bugs and improves program reliability. In
type-unsafe languages, unchecked type mismatches can lead to hard-to-find bugs, memory
corruption, and security vulnerabilities. By enforcing type safety, languages reduce the likelihood
of these issues, making software more robust and secure.

However, type-unsafe languages are still widely used in systems programming (like C), where
low-level control over memory and performance is essential. These languages trade type safety
for finer control, but they require more careful programming and thorough testing to avoid errors.

Conclusion:

The primary distinction is that type-safe languages catch type mismatches and prevent
undefined behavior, while type-unsafe languages allow greater flexibility but at the cost of
potential bugs and unpredictable outcomes. Type safety is a crucial concept that helps
maintain program correctness and security.


But there should be advantages in type-unsafe languages also, right?


While type-unsafe languages, like C, come with risks, they also offer distinct advantages,
especially in terms of performance, flexibility, and low-level control. Here's a breakdown of
the key advantages of type-unsafe languages and why they remain widely used, especially in
systems programming:

1. Performance Efficiency

One of the biggest advantages of type-unsafe languages is runtime performance. Since they
don’t enforce type checks or other safety mechanisms at runtime, they can execute faster.
Every additional type check, dynamic dispatch, or safety guarantee in type-safe languages adds
some overhead.

●​ C's Efficiency: For instance, C doesn’t check whether an array index is within bounds or
whether pointers point to valid memory locations. This lack of overhead makes C highly
efficient, which is why it’s commonly used for systems programming (like operating
systems or embedded systems) where performance is critical.

2. Low-Level Memory Control

Type-unsafe languages give you direct access to memory, enabling fine-grained control over
how data is stored and manipulated. This control is particularly valuable when writing code that
interacts with hardware or system resources, such as:

●​ Pointer Arithmetic: In C, pointers can be directly manipulated to control memory
locations, making it easier to implement data structures like linked lists and trees, or
even to perform memory-mapped I/O.
●​ Memory Management: Without the overhead of garbage collection or automatic
memory management, programmers in C can allocate and free memory directly, leading
to more efficient memory usage in resource-constrained environments (e.g., embedded
systems).

3. Flexibility and Power

In type-unsafe languages, you can bend the rules in ways that type-safe languages would
prevent. This flexibility allows you to write code that may be highly optimized or implement
complex data structures more easily. For example:

●​ Type Casting: In C, you can cast variables of one type to another, even if they are not
related. While this can lead to errors if misused, it also offers the ability to work with raw
data or hardware interfaces more flexibly.
●​ Union Types: C allows the use of unions, where multiple data types can share the
same memory space. This is useful in situations where you need to pack different types
of data into a smaller memory footprint.


4. Bare-Metal Programming

Type-unsafe languages like C are commonly used in embedded systems and real-time
systems because they allow you to write bare-metal code — code that runs directly on the
hardware without an intervening layer of abstraction like a virtual machine or runtime system.
This is important for:

●​ Writing Device Drivers: You can directly interact with hardware registers,
memory-mapped I/O, and interrupt handlers, which would be hard to achieve in
type-safe languages.
●​ Operating System Kernels: Operating systems like Linux and Windows are often
written in C because it provides the control needed for tasks like memory management,
scheduling, and hardware control.

5. No Type System Limitations

Type-safe languages can sometimes impose restrictions that make certain programs hard or
impossible to write. Type-unsafe languages give you the freedom to write programs without
being blocked by the limitations of a strict type system. This can be beneficial when you want to:

●​ Write Low-Level Optimizations: In C, you can optimize code in ways that would be
difficult or impossible in languages with a strict type system. For example, casting
between types or manipulating memory directly allows you to fine-tune performance.
●​ Handle Generic Data: In C, you can write generic code by using void* pointers or
unions to work with data of any type, without needing to use templates or type-safe
generics.

Trade-offs and Risks

While these advantages make type-unsafe languages like C appealing in many contexts, they
come with serious trade-offs:

●​ Safety Risks: The lack of runtime checks can lead to dangerous bugs (e.g., buffer
overflows, memory corruption, null pointer dereferencing), which are harder to detect
and can cause security vulnerabilities.
●​ Complex Debugging: Without type-safety guarantees, debugging errors caused by
incorrect memory access or type mismatches can be difficult and time-consuming.
●​ Manual Memory Management: The absence of garbage collection in C means that
developers need to manage memory manually. While this can lead to efficient memory
usage, it also increases the risk of memory leaks and dangling pointers.

Balancing Flexibility and Safety

There is always a balance between performance/flexibility and safety/abstraction. Type-safe
languages, while sometimes limiting, protect developers from many common errors. However,
type-unsafe languages offer power and control that is essential for certain low-level
programming tasks, especially in performance-critical or hardware-specific applications.

Conclusion

Type-unsafe languages like C continue to be widely used because they offer:

1.​ Better performance due to the lack of runtime checks.
2.​ Greater control over memory and low-level operations.
3.​ Flexibility in dealing with various data types and system resources.

However, the risks of type-unsafe languages, such as memory errors and security
vulnerabilities, require careful programming practices. Type-safe languages may restrict certain
operations, but they offer robustness and security, reducing the likelihood of runtime errors.

So, while both types of languages have their place, the choice of language depends on the
trade-offs between performance needs and safety guarantees.

Can you show me some real-world implementations of runtime type verification rules?
Yes! Let's take a look at the implementation of the Lua Virtual Machine, which is publicly
available in the Lua repository. To get an idea of how types can be enforced at run time,
try to find the implementation of these two methods:
1. luaV_objlen:

This function handles the # (length) operator for different data types.

●​ Table check: It first checks if rb (the right operand) is a table.
●​ Metamethod check for custom length: If it is a table, it tries to find a metamethod
(TM_LEN) that defines custom behavior for getting the length.
●​ Error handling for non-supported types: If it's not a table or a string (which has a
built-in length), and there's no custom metamethod, it calls luaG_typeerror to report
an error ("get length of"). This highlights how Lua raises a type error if an operation is not
supported for the given type.

2. luaV_arith:

This function handles arithmetic operations (+, -, etc.).

●​ Attempt to convert to numbers: It attempts to convert both operands (rb and rc) to
numbers using luaV_tonumber. If successful, it performs the arithmetic operation
using luaO_arith.


●​ Metamethod fallback: If conversion fails (operands are not numbers), it tries to call a
binary metamethod (based on the operation) using call_binTM.
●​ Error handling for invalid types: If neither conversion nor a metamethod works, it calls
luaG_aritherror to report an error.

Key Takeaways:

●​ Both examples showcase how Lua prioritizes type conversion: it first attempts to convert
values to expected types (e.g., numbers for arithmetic).
●​ These functions showcase the two primary ways Lua handles type mismatches at
runtime:
○​ Raising errors: When a type mismatch is encountered and no alternative action
is available (e.g., luaG_typeerror, luaG_aritherror).
○​ Metamethod fallback: Allowing custom behavior through metamethods for
specific types (e.g., call_binTM).

Further Exploration:

●​ Look at the implementations of luaG_typeerror and luaG_aritherror to
understand how they generate the actual error messages.
●​ Explore other functions within lvm.c like luaV_concat (string concatenation) to see
how different operations handle type checks.
●​ Remember that these are just a few examples. Many other functions within the Lua VM
enforce type safety in similar ways.

Do you know which popular programming languages started using types?
Well, Fortran, introduced in the 1950s, had a primitive form of types, but its type system was
rudimentary by modern standards. Fortran's types were essentially tied to the sizes of numeric
values (e.g., INTEGER, REAL) and were designed for efficiency on the hardware of the time,
with implicit typing based on variable naming conventions (I-N for integers, others for reals).
While it introduced the notion of types to programming languages, it didn’t establish the
foundations for the rich type systems we think of today.

Algol’s Contributions to Type Systems

Algol 60 (introduced in 1960) was a major step forward in formalizing types. It introduced:

●​ Explicit type declarations, requiring variables to have types like INTEGER or REAL,
which improved clarity and safety.
●​ Scope rules that connected types to block-structured programming, influencing how
types interact with the language's scoping and control structures.
●​ The idea of type ascription for parameters, making type checking at least partially
formalized.


While Algol didn’t introduce a fully general type system, it laid the groundwork for structured
programming and type-related ideas that heavily influenced later languages. Algol's design was
instrumental in showing how types could contribute to both the design of programming
languages and their formal semantics.

The Language That Really Advanced Types: ML

The programming language ML (Meta Language), developed in the 1970s by Robin Milner,
was arguably the most influential language in shaping modern type systems. ML introduced:

1.​ Hindley-Milner Type Inference:
○​ ML's type system automatically inferred the types of expressions, enabling static
typing without requiring explicit type annotations.
○​ This approach made type systems both expressive and user-friendly, setting a
precedent for many modern statically typed functional programming languages,
like Haskell.
2.​ Parametric Polymorphism:
○​ ML allowed generic definitions (e.g., of functions and data structures), a concept
foundational to modern programming languages.

Other Key Languages Influencing Types

1.​ Simula 67:
○​ Often regarded as the first object-oriented programming language, Simula
introduced the idea of classes and objects with types tied to these constructs.
This influenced type systems in later OOP languages, like Smalltalk, C++, and
Java.
2.​ Ada:
○​ Ada, developed in the 1970s and 1980s, had a rich and highly structured type
system designed for correctness in mission-critical applications (e.g., avionics). It
emphasized strong typing and type safety, influencing later languages like Pascal
and Modula.
3.​ Smalltalk:
○​ Introduced dynamic typing as a core feature of object-oriented languages,
providing a contrasting perspective to static type systems. It wasn’t focused on
compile-time type safety but emphasized runtime flexibility.
4.​ Pascal:
○​ Pascal, introduced by Niklaus Wirth in the 1970s, brought structured type
definitions, record types, and strong typing, heavily influencing educational
programming and laying the foundation for languages like Modula and Ada.

Modern Perspective on Type Systems

Today’s type systems combine ideas from multiple languages:


●​ Static type inference (e.g., in Haskell, Rust, and Kotlin) owes much to ML.
●​ Dynamic typing (e.g., in Python, Ruby, JavaScript) builds on ideas explored in
Smalltalk.
●​ Type polymorphism and advanced features like dependent types (e.g., in Coq and
Idris) push type theory into formal verification and proof systems.

Was Pascal the language that introduced this notion of "Strong Typing"?
Oh, no, not really! While Pascal is often associated with "strong typing," the term itself can be
ambiguous and context-dependent. Let me elaborate on Pascal's role and clarify whether it truly
"introduced" strong typing.

Pascal and Strong Typing

Introduced by Niklaus Wirth in 1970, Pascal was designed to teach structured programming
and good software engineering principles. Its type system emphasized:

1.​ Explicit type declarations: Variables and function parameters required clear type
annotations.
2.​ Strict type compatibility: Operations were only allowed on values of the same type
unless explicit casting was used.
3.​ User-defined types: Pascal allowed programmers to define their own types, such as
enumerated types (type Color = (Red, Green, Blue);) or records (akin to
structs), enabling more precise modeling of problems.
4.​ Type safety in arrays: Array bounds were enforced (at least in theory; some
implementations didn’t), catching errors at compile time or runtime.

These features certainly advanced the concept of strong typing, making Pascal a significant
milestone in type system design. However, Pascal did not "invent" strong typing, as similar
ideas existed in earlier languages like Algol 60.

Did Pascal Introduce Strong Typing?

Not quite. Here's why:

1.​ Algol 60’s Influence: Algol 60 already required explicit type declarations and enforced
type-checking at compile time. Pascal inherited much of its type system philosophy from
Algol 60, so it didn’t introduce strong typing in a vacuum.
2.​ Fortran's Rudimentary Typing: Fortran had basic type safety rules (e.g., distinguishing
between integers and floating-point numbers), but its type system allowed implicit type
conversions, making it weaker than Algol’s.
3.​ Ada and Later Innovations: Ada, which came after Pascal, took strong typing even
further, introducing more rigorous type constraints, such as type ranges and modular
types.


What Made Pascal Special?

While Pascal didn’t invent strong typing, it did popularize and refine the concept by:

●​ Introducing type safety into a language explicitly designed for teaching and structured
programming.
●​ Formalizing user-defined types, which made it easier for programmers to model
real-world problems with custom, precise types.

In practice, Pascal's strong typing was strict enough to prevent common errors while being
simple enough to teach beginners. This balance contributed to its lasting reputation as a
"strongly typed" language.

Conclusion

Pascal didn’t "introduce" strong typing but played a crucial role in popularizing and advancing
the practical application of strong typing. Its influence can still be seen in languages like
Ada, Modula, and even modern languages like Python (which enforces types dynamically) and
Rust (which enforces types strictly and safely at compile time). Would you like to discuss other
historical aspects of strong typing or Pascal’s legacy in programming?

Niklaus Wirth... he passed away last January... He's a programming language legend!
Do you know what he did for computer science?
That's right. Niklaus Wirth, a pioneering computer scientist, made profound contributions to
programming languages, software engineering, and education. Born in 1934 in Switzerland, he
is best known for designing influential programming languages such as Pascal, Modula, and
Oberon. Wirth was awarded the Turing Award in 1984 for his groundbreaking work in
programming languages and algorithms. Wirth passed away on January 1, 2024, at the age of
89, leaving a legacy that continues to influence computer science and programming education.


Niklaus Wirth made numerous contributions to computer science that have had a lasting impact
on the field. Here is a summary of his key contributions:

1.​ Programming Languages:
○​ Pascal (1970): A language designed for teaching structured programming and
widely adopted in academia and industry.
○​ Modula and Modula-2 (1975, 1978): Built on the modular programming
paradigm, influential in systems programming.
○​ Oberon (1987): Introduced simplicity and efficiency in programming,
accompanied by an operating system also named Oberon.
○​ Earlier designs: Euler, PL360, and Algol-W, which expanded on and improved
existing paradigms of the time.
2.​ Software Engineering:
○​ Pioneered Stepwise Refinement, a top-down approach to software
development, through his seminal 1971 paper Program Development by
Stepwise Refinement. This methodology is foundational in software engineering.
3.​ Education:
○​ Authored influential textbooks, such as Algorithms + Data Structures = Programs,
which combined theoretical and practical aspects of programming. His books are
renowned for their clarity and integration of core concepts.
4.​ Operating Systems:
○​ Co-developed the Oberon Operating System, an innovative system designed
with efficiency and simplicity in mind. It introduced a tightly coupled programming
environment with a minimalistic philosophy.
5.​ Hardware Design:
○​ Designed Lola, a tool for digital circuit design and simulation, bridging the gap
between software and hardware development.
6.​ Thought Leadership:


○​ Formulated Wirth's Law, which states, "Software is getting slower more rapidly
than hardware becomes faster," advocating for lean and efficient software design.
7.​ Recognition and Impact:
○​ Won the Turing Award in 1984 for his contributions to programming languages
and algorithms.
○​ His work influenced generations of programmers and shaped modern software
engineering and language design.

Wirth’s contributions span education, research, and practice, making him one of the most
influential figures in the history of computer science.

Type Checking
Code: https://github.com/pronesto/DCC053_Public/tree/main/9_TypeChecking

In the last class we talked about dynamic and static typing. Do you
remember what that is?
In programming languages, static and dynamic type systems refer to how and when types are
checked in a program.

Static Type Systems

In a static type system, types are checked at compile time, before the program runs. The type
of every variable, function, or expression is known and fixed before the program executes. This
system aims to catch type errors early, preventing type-related bugs before execution.

●​ Examples: Java, C, C++, Rust


●​ Advantages:
○​ Type errors are caught early during compilation, improving reliability.
○​ Better optimization opportunities since the compiler knows types in advance.
●​ Disadvantages:
○​ Code can be more rigid since you must specify types upfront.
○​ Requires explicit type declarations (though some statically typed languages like
Haskell or Rust use type inference to deduce types automatically).

Dynamic Type Systems

In a dynamic type system, types are checked at runtime. Variables do not have fixed types,
and types can change as the program runs. This system is more flexible but shifts the
responsibility of type safety to runtime.

●​ Examples: Python, JavaScript, Ruby


●​ Advantages:
○​ More flexible and easier to write in certain situations, especially during rapid
prototyping.
○​ You don’t need to specify types upfront, making the code shorter and simpler.
●​ Disadvantages:
○​ Type errors are only caught at runtime, potentially leading to runtime crashes.
○​ Typically less performant since type checks are done during execution.

To summarize:

●​ Static typing ensures type correctness at compile time but may require more upfront
type information.

●​ Dynamic typing offers flexibility by allowing type changes at runtime but increases the
risk of runtime errors.

It seems that there are more valid programs in a dynamically typed language. Does that make sense?
Well, more or less. In both cases, we have an infinite number of valid programs, assuming a
recursive grammar. Yet, that makes some sense :) Your assumption captures an interesting
aspect of dynamic typing. With dynamic typing, there are fewer constraints on the types of
expressions, so more programs are considered valid compared to statically typed languages
where type-checking happens at compile-time.

For instance, a program like "3.14 if True else [1, 2, 3]" is valid in a dynamically
typed language because type compatibility is checked at runtime. This means that as long as
the code can be executed (e.g., as part of a control flow statement), it is considered valid. In
contrast, a statically typed language would require the program to pass a type-checking phase
before execution, rejecting programs that don't meet strict type requirements.

So, if you imagine fixing a length for all programs and then checking how many of those
programs are "valid," the dynamic typing system would indeed have a higher density of valid
programs because it imposes fewer constraints. Static typing reduces the set of valid programs
by enforcing stricter rules at the type level, making certain programs invalid that would run in a
dynamically typed setting.

This comparison holds when you define the "density" in terms of valid programs out of all
possible programs of a given length, as dynamic typing allows for more flexibility and fewer
rejections.

Our interpreter was dynamically typed, right?


The interpreter we implemented in the last class using the visitor pattern resembles a
dynamically typed programming language because it checks types at runtime rather than at
compile-time. In dynamically typed languages, values have types, but variables do not;
operations on values are type-checked when executed. Similarly, our interpreter evaluates
expressions and verifies types during the execution of each operation, allowing for flexibility but
introducing the possibility of type errors at runtime if incompatible types are encountered. This
runtime type checking mirrors the behavior of dynamically typed languages like Python or
JavaScript.
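A minimal sketch of that runtime behavior (the Num and Add classes and the evaluate function below are illustrative stand-ins, not the course's exact visitor code):

```python
from dataclasses import dataclass

# Illustrative AST nodes (names are hypothetical).
@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: object
    right: object

def evaluate(exp):
    """Evaluate an expression, checking types only when an operation runs."""
    if isinstance(exp, Num):
        return exp.value
    if isinstance(exp, Add):
        left, right = evaluate(exp.left), evaluate(exp.right)
        # Dynamic check: performed at runtime, on the values themselves.
        if not isinstance(left, int) or not isinstance(right, int):
            raise TypeError("'+' expects two integers")
        return left + right
    raise ValueError("unknown expression")

print(evaluate(Add(Num(2), Num(3))))  # 5
```

Notice that the type test happens while the program runs: an ill-typed subexpression only triggers an error if it is actually evaluated.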

Could you show me an example of a program that illustrates at least one advantage of dynamic typing?
A good way to illustrate the advantage of dynamic typing is through a simple example where the
flexibility of not specifying types upfront allows you to write more concise and adaptable code.
Let's look at an example in Python, a dynamically typed language, where we can easily define
a function that works with multiple types without needing to define different types or overload
functions.

Example: Dynamic Typing in Python


def add(a, b):
    return a + b

# This function works with integers
print(add(2, 3))  # Output: 5

# It also works with floating-point numbers
print(add(2.5, 3.7))  # Output: 6.2

# And even strings (concatenation)
print(add("Hello, ", "World!"))  # Output: Hello, World!

# Or lists (concatenation)
print(add([1, 2, 3], [4, 5]))  # Output: [1, 2, 3, 4, 5]

Explanation:

1.​ Flexibility: The add() function works for integers, floats, strings, and lists. Python’s
dynamic typing allows the same function to handle different types without modification. In
a statically typed language, you would either:
○​ Have to write multiple versions of the function (e.g., using function overloading).
○​ Use more complex constructs such as generics or templates to achieve similar
behavior.
2.​ Concise Code: You don’t need to specify the types of a and b. Python automatically
handles the correct operation based on the types passed in at runtime.

How this is advantageous:

●​ Rapid Prototyping: You can quickly write a generic function that works across various
types without worrying about type annotations or specific implementations.
●​ Code Simplicity: No need to declare types explicitly, making the code cleaner and more
focused on functionality rather than type constraints.

This is one of the core strengths of dynamically typed languages, especially in scenarios where
flexibility and rapid development are prioritized over the strict type safety provided by static
typing.

But static typing has its advantages too, right?


Absolutely! Static typing comes with several important advantages, especially in larger, more
complex systems. Here are some key benefits:

1. Early Error Detection

In statically typed languages, type errors are caught at compile time, before the program runs.
This prevents many runtime errors related to type mismatches.

●​ Example: In a statically typed language like Java, if you try to store the result of combining a number and a string into an int variable, the compiler will throw an error. This prevents potential crashes at runtime, making the system more reliable.

// Java (statically typed language)
public class Main {
    public static void main(String[] args) {
        int a = 5;
        String b = "hello";
        // This will not compile: a + b is a String, not an int
        int c = a + b;
    }
}

2. Performance Optimization

Statically typed languages allow compilers to optimize code more efficiently. Since the types of
all variables and functions are known ahead of time, the compiler can generate optimized
machine code for those types.

●​ Example: C++ and Rust are known for their high performance partly because they are
statically typed, allowing the compiler to produce highly efficient executables with
minimal runtime overhead.

3. Better Tooling and Autocompletion

IDEs and editors can provide better autocomplete suggestions, refactoring tools, and code
navigation (e.g., "jump to definition") in statically typed languages, as the type information is
available during development.

●​ Example: In statically typed languages like Java or C#, when you start typing the name
of a variable, method, or class, the IDE can automatically suggest options that match the
correct types, leading to faster and more accurate coding.

4. Improved Documentation and Readability

Static typing acts as self-documentation. By looking at the function signature or variable declaration, you immediately know the types of inputs and outputs, making the code easier to read and understand.

●​ Example:

// Java function with explicit types
public int add(int a, int b) {
    return a + b;
}
●​ In this case, anyone reading the code knows that add() takes two integers and returns
an integer. There is no ambiguity about the expected types.

5. Refactoring Safety

Refactoring (changing existing code) is safer in statically typed languages because the compiler
will catch any inconsistencies in types across the codebase. When you modify a function's
signature or change the type of a variable, all incorrect uses will be flagged during compilation.

6. Predictable Behavior

Since types are enforced and known at compile time, you can have more confidence that your
code behaves as expected, reducing unexpected type coercion or runtime errors from type
mismatches.

That's a lot of stuff. So, when should static typing be beneficial?


Static typing excels in situations where early error detection, performance, maintainability, and
tooling support are essential, whereas dynamic typing offers flexibility and simplicity, making it
suitable for rapid development and prototyping.
●​ Large Projects: As projects grow in size, static typing helps catch more errors during
development, making it easier to maintain and scale the code.
●​ Performance-Critical Applications: Systems that require high performance (e.g.,
embedded systems, games, financial applications) often benefit from the optimizations
possible in statically typed languages.
●​ Long-Term Maintenance: Code that's maintained over long periods benefits from the
explicit types, as they act as documentation and help ensure correctness.

But, how can static typing work? Knowing if a variable will ever be used as
an integer should be undecidable, as per Rice's Theorem, isn't it?
While Rice's Theorem proves that we can't decide all non-trivial properties of programs (like
whether a variable is always used as an integer), static typing works by:

1.​ Limiting the scope of the problem: It checks the structure of the code and uses local
information rather than trying to analyze all possible executions.
2.​ Conservatively approximating the type behavior of the program, potentially rejecting
valid programs for the sake of soundness.
3.​ Imposing restrictions that make type checking decidable, avoiding the general case
where undecidability applies.

By focusing on syntactic structure and conservative type rules, static typing provides a
decidable, sound system that catches many common errors at compile time, without needing to
solve the general undecidability of arbitrary program properties.

So, if Rice's Theorem still applies, then there are programs that are type
safe, but that we would never know, right?
Yes, Rice's Theorem still applies, and the undecidability that it implies means there are
programs where it's impossible to know at compile time whether they are truly "type safe" in all
cases. However, static type systems bypass this issue by making conservative checks and not
analyzing certain complex or dynamic behaviors.

Can you give me an example of such a program?


To illustrate this, we can look at an example of a program that is type safe, but where
determining that it's type safe could involve a potentially undecidable problem. Let's use
Goldbach's Conjecture to demonstrate why it's difficult to demonstrate some properties in a
type system while also relating it to type safety.

Goldbach's Conjecture claims that every even natural number greater than 2 can be
expressed as the sum of two primes. This is a mathematical statement that hasn't been
proven or disproven yet, which makes it a great analogy for undecidable problems in computer
science. In this sense, consider the program below:

#include <stdio.h>
#include <stdbool.h>

// Hypothetical function that checks Goldbach's Conjecture
bool violates_goldbach(int n) {
    // Here, we'd theoretically have some logic to check if 'n'
    // violates Goldbach's conjecture. But since the conjecture
    // is unproven, let's assume this function can never definitively
    // return 'true' for all 'n'.
    return false; // Placeholder for undecidable logic
}

int zero_is_int() {
    int x = 0;

    // Hypothetical check for an even number that violates Goldbach's Conjecture
    for (int n = 4; n < 1000000; n += 2) {
        if (violates_goldbach(n)) {
            x = (int) "zero"; // Attempt to assign a string to an integer variable
            break;
        }
    }

    return x; // If Goldbach's conjecture holds, this returns 0 safely
}

int main() {
    int result = zero_is_int();
    printf("Result: %d\n", result);
    return 0;
}

What's Happening Here?

1.​ zero_is_int() starts with an integer x initialized to 0.


2.​ The function iterates over even numbers starting from 4, hypothetically checking if any of
them violate Goldbach's Conjecture using the violates_goldbach() function.
○​ If a violation is found, the function attempts to assign the string "zero" to the
integer variable x. This is clearly a type mismatch.
○​ If no violation is found (which aligns with the current assumption that Goldbach's
Conjecture holds), the program returns x = 0 safely, with no type errors.
3.​ The program is safe as long as Goldbach’s Conjecture holds, but if ever proven false,
it would introduce a type error at runtime due to the assignment of a string to an integer.

Why Is This Example Interesting?

●​ Type Safety Depends on an Unknown Condition: The assignment of "zero" to an integer is only attempted if Goldbach's Conjecture is violated. Since we don't know whether any even number greater than 2 cannot be expressed as the sum of two primes, it's undecidable whether the program is always type-safe.
○​ If the conjecture holds true, the type system will never see the type violation at
runtime.
○​ If the conjecture is false, there would be a runtime type error, which can't be
caught statically.
●​ Undecidability and Static Typing: In a statically typed language, the compiler can’t
determine whether x = "zero" will ever happen, because that would require solving
Goldbach’s Conjecture, which is currently beyond the reach of computation.
○​ The static type checker would reject this program unless you take extra
measures to handle this type mismatch in a way that the type system can reason
about.
○​ This illustrates how static type systems enforce safe programming by disallowing
assignments like x = "zero" before execution.

Generalizing to Type Systems

This example mirrors the problem of undecidable properties in type systems:

●​ Rice's Theorem tells us that non-trivial properties of programs (e.g., "will this program
ever assign a string to an integer?") are undecidable.
●​ In this case, determining whether x = "zero" will ever execute requires solving a
conjecture that hasn't been proven or disproven.
●​ The compiler can't know the behavior of all possible inputs (or mathematical truths)
ahead of time, so static type systems rely on conservative rules that catch these
mismatches before runtime, forcing the programmer to handle them explicitly.

This example demonstrates how static type systems prevent undecidable runtime behaviors by
rejecting programs that violate type safety—even when that violation depends on an
undecidable property, like the truth of Goldbach’s Conjecture. The type system is inherently
conservative, preferring to prevent unsafe behavior (even if the condition under which it
happens is undecidable).

So, how can static typing work?


Rice's Theorem makes it impossible to generally decide non-trivial properties of programs, such
as whether a variable will always be used as an integer in all possible executions. However,
static typing is possible because it doesn't attempt to fully solve this undecidable problem in
the general sense. Instead, static type systems operate by using conservative
approximations and restrictive rules that can be checked at compile time. Here's how it
works:

1. Conservative Approximation

A static type system assigns types based on the structure of the program without needing to
evaluate it. The type checker will use the declared types (or inferred types in some cases) and
validate that operations between them are safe according to the type rules of the language.

●​ Static type checkers don't need to evaluate the full behavior of the program or explore
every possible execution path (which would indeed be undecidable).
●​ They instead use type annotations (or inferred types) to conservatively reason about
how values and expressions can be used.

For example, in Java, when you declare a variable of type int, the static type checker will
ensure that this variable is only used in ways compatible with integers throughout the code. If it
detects an attempt to use it in an incompatible way (such as adding it to a string without proper
conversion), it flags it as a type error.

2. Type Soundness

A sound type system guarantees that if the program type-checks successfully, certain errors
(like type mismatches) will not occur during runtime. However, to achieve this, type systems
impose restrictions that ensure correctness without attempting to solve every possible edge
case.

●​ The type system doesn't need to account for all possible runtime behaviors—just
whether the program, as written, adheres to its type rules.
●​ This ensures a safe subset of the program’s behavior can be guaranteed, while other
behaviors that the type system cannot handle may either result in compile-time errors or
be prohibited altogether.

3. Decidable Subset of Type Checking

Static typing defines a subset of rules that are decidable, meaning that even if we can't predict
the full runtime behavior of the program (as per Rice's Theorem), we can still enforce rules on
the structure of the code.

●​ For instance, in statically typed languages like Java or C++, you can define that integers
are only added to other integers and arrays are indexed with integers, and the compiler
will ensure this is true based on how you’ve written the code.
●​ While this may limit some flexibility (since it can't reason about all possible runtime
conditions), it guarantees that, within its constraints, the program will be type-safe.

4. Partial Soundness and Type Annotations

To work around the limitations imposed by undecidability, many languages use explicit type
annotations (or allow type inference) to limit what kinds of values a variable can hold or
operations it can participate in.

●​ Explicit annotations: Programmers explicitly declare the types, which makes type
checking a local, syntactic property. The compiler does not need to analyze all possible
uses of a variable; it just ensures that every use matches the declared type.
●​ Type inference: Some languages (e.g., Haskell or Rust) use sophisticated inference
algorithms to deduce the type of expressions without explicit annotations, but these
systems still work within the boundaries of decidable, sound rules.

5. Imposed Restrictions

Static type systems also make conservative assumptions that can sometimes reject valid
programs. This happens because they are designed to err on the side of safety. Since we can't
decide all possible program behaviors, the type system will flag some programs as type errors
even though they would never fail at runtime. This trade-off is made to ensure the soundness of
the system.

●​ For example, in Java, if you declare a method to return an int, the compiler won't allow
it to return a string, even if that particular code path might never be executed.
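The same idea can be sketched with Python type hints (the function names below are invented, and the claim about rejection assumes a typical static checker such as mypy):

```python
def some_flag() -> bool:
    # Always False at runtime, but a checker cannot rely on that.
    return False

def get_number() -> int:
    if some_flag():
        return "zero"  # a static checker flags this: str is not int
    return 0

print(get_number())  # 0 -- the offending branch never executes
```

The program runs without error, yet the conservative checker rejects it: soundness is bought at the price of refusing some programs that would never fail.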

How is static typing implemented then?


Type checking and type inference are two different mechanisms that can both be used to
implement static typing, but they function in distinct ways.

1. Type Checking

Type checking is the process where the compiler (or interpreter) verifies that the types of
variables, functions, and expressions are used consistently according to the rules defined in the
programming language.

●​ Explicit Types: In many statically typed languages, you explicitly declare the types of
variables and functions. The compiler then checks that these declared types are used
consistently throughout the program.
●​ Example: In languages like Java, C, or C++, you explicitly declare variable types:

int x = 10;  // x is explicitly declared as an integer
x = "hello"; // Compile-time error: type mismatch

●​ Role of Type Checking: In this case, type checking ensures that the value assigned to
x matches its declared type (int in this example). Any violations (such as attempting to
assign a string to an integer variable) will result in compile-time errors.
●​ Proactive Error Checking: Since the types are declared up front, errors related to type
mismatches are caught early, often during compilation.

2. Type Inference

Type inference, on the other hand, allows the compiler to deduce the types of variables and
expressions based on the context in which they are used. In this system, you don't always have
to declare types explicitly; the compiler infers them for you.

●​ Implicit Types: In languages with type inference, you may not need to explicitly declare
variable types. Instead, the compiler looks at how a variable is used and assigns a type
accordingly.
●​ Example: In languages like Haskell, ML, or Rust, type inference is commonly used:

x = 10 -- Compiler infers x is an integer

●​ Role of Type Inference: Here, the compiler looks at the assignment x = 10 and knows
from the value 10 that x must be an integer. The type is inferred without the programmer
needing to specify it. The type checking still happens in the background, ensuring
consistency, but the programmer doesn’t need to write explicit types.
●​ Reduction of Boilerplate: Type inference reduces the need for repetitive type
declarations while still ensuring type safety. The programmer can write less code, but the
type system remains just as strict.
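A toy sketch of forward inference over literals and '+' (the tuple-based expression encoding and the rules below are invented for illustration):

```python
def infer(exp):
    """Infer the type of a tiny expression language encoded as Python values.
    Literals carry their own type; '+' requires both operands to agree."""
    if isinstance(exp, bool):      # check bool before int: bool is an int subtype
        return bool
    if isinstance(exp, int):
        return int
    if isinstance(exp, str):
        return str
    if isinstance(exp, tuple) and exp[0] == "+":
        left, right = infer(exp[1]), infer(exp[2])
        if left is right and left in (int, str):
            return left            # int + int -> int, str + str -> str
        raise TypeError("operands of '+' must both be int or both be str")
    raise ValueError("unknown expression")

print(infer(("+", 1, ("+", 2, 3))))  # <class 'int'>
print(infer(("+", "a", "b")))        # <class 'str'>
```

No annotation appears anywhere, yet every expression still receives exactly one type, which the checker can then enforce.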

Differences Between Type Checking and Type Inference


Aspect            | Type Checking                                           | Type Inference
------------------|---------------------------------------------------------|--------------------------------------------------------
Type declaration  | Types are explicitly declared by the programmer.        | Types are inferred automatically by the compiler.
Programmer effort | Requires more effort to manually annotate types.        | Less effort, as types are inferred without annotations.
Common languages  | Java, C, C++, C#                                        | Haskell, Rust, OCaml, Swift, Python (with type hints)
Flexibility       | Explicit but rigid.                                     | Flexible but maintains the same level of type safety.
Error detection   | Errors are caught when declared types are inconsistent. | Errors are caught when inferred types are inconsistent.

Complementary Role in Static Typing

●​ Type checking and type inference are not mutually exclusive. In fact, many modern
languages with static typing use a combination of both.
○​ For example, Rust and Scala use type inference to reduce verbosity while
ensuring that every variable still has a well-defined type at compile time.
○​ In Java, some variation of type inference (forward only) has been introduced in
recent versions (e.g., with var in Java 10) to reduce the amount of boilerplate
code, though Java still performs strict type checking based on the inferred or
declared types.

Both type checking and type inference are mechanisms to implement static typing. Type
checking typically involves explicit type declarations, ensuring correctness via compile-time
verification. Type inference deduces types from the context, offering more flexibility without
sacrificing type safety. They complement each other in modern statically typed languages,
enabling a balance between strictness and convenience.

Is it possible to combine type checking and type inference?


Yes, it is indeed possible to combine type checking and type inference, and many modern
statically typed languages do exactly that. Combining these two approaches provides the best
of both worlds: the rigor of static typing (with compile-time guarantees of correctness) and
the convenience of less boilerplate code (where types are automatically inferred by the
compiler). When a language supports both type checking and type inference:

●​ Type inference allows the compiler to deduce types when they are not explicitly stated,
reducing the need for verbose type declarations.
●​ Type checking ensures that inferred types, along with any explicitly declared types, are
used consistently and correctly throughout the program.

Example: Combining Type Checking and Inference (Rust)

In Rust, for instance, type inference is used extensively, but the type checker verifies their
consistency. Let’s expand on the previous example:

fn main() {
    let x = 10;       // Type inference: x is inferred to be an integer (i32)
    let y: f64 = 5.5; // Explicit type declaration: y is explicitly declared as f64

    // Type inference in functions
    let z = add_one(x); // Compiler infers z to be i32 since add_one returns i32
    println!("x: {}, y: {}, z: {}", x, y, z);
}

// A function with explicit type annotations
fn add_one(num: i32) -> i32 {
    num + 1
}

In this example:

1.​ x: The type of x is inferred to be i32 because it is assigned the integer literal 10, which
in Rust defaults to i32.
2.​ y: The type of y is explicitly declared as f64 (a floating-point number).
3.​ z: The type of z is inferred based on the return type of the add_one function, which
explicitly declares its parameter and return type as i32.

Even though some variables (like x and z) do not have explicit type annotations, type checking
ensures that all types are used consistently. For example, the type checker will prevent us from
accidentally assigning y (a f64) to x (an inferred i32).

That's a lot of info. Can you summarize why one benefits from combining
type checking and inference?
●​ Reducing Boilerplate: In many statically typed languages, requiring type annotations
everywhere can be cumbersome, especially for types that can easily be inferred from
context. Type inference reduces the need for these redundant annotations.
●​ Early Error Detection: Despite not needing explicit type annotations everywhere, the
compiler knows the type of every expression at compile-time due to inference. This
enables early error detection, just like in traditional static type checking.
●​ Flexibility and Convenience: Type inference offers flexibility for programmers, allowing
them to write cleaner, more concise code without losing the benefits of static typing.

For type checking to work, the user needs to have a way to specify types.
How is this done?
For type checking to work, the user needs to have a way to specify types in the program. This
can be done either explicitly (where the programmer manually declares the types) or implicitly
(through type inference, where the compiler deduces the types automatically), but in both cases,
the type system must have a way to assign and enforce types.

Here’s a more detailed breakdown:

Explicit Type Annotations

In many statically typed languages, the programmer explicitly specifies the types of variables,
function parameters, and return values. These declarations provide clear instructions for the
type checker to ensure that variables and expressions are used consistently according to their
declared types.

●​ Example (Java):

int x = 10;           // Programmer explicitly declares x as an integer
String name = "John"; // Programmer explicitly declares name as a string

●​ In this case, the types (int, String, etc.) are explicitly stated, and the type checker can
verify at compile time that all operations involving these variables are type-safe.

Implicit Type Specification (Type Inference)

Even if types are not explicitly written, the language must have some mechanism (e.g., type
inference) to infer the types from the context of the code. The user may not need to write types
explicitly, but the compiler still deduces types behind the scenes, so the type system can
enforce type safety.

●​ Example (Rust):

let x = 10;        // The compiler infers that x is of type i32 (integer)
let name = "John"; // The compiler infers that name is of type &str (string slice)

●​ Although the programmer didn’t explicitly state the types of x or name, the compiler
infers the types, and the type checker can then verify that the code uses these types
correctly.

How Types Can Be Specified

For type checking to function correctly, the type system must know the types of:

1.​ Variables: Whether explicitly declared or inferred, the compiler must know what type a
variable holds to perform type checking.
2.​ Function Signatures: The types of function parameters and return values are crucial for
ensuring that functions are called and used correctly. In some languages, function types
can be inferred from the body of the function.
3.​ Expressions and Operations: The type checker verifies that the types involved in
operations (e.g., arithmetic, comparisons, method calls) are compatible.

Example: Combining Explicit and Inferred Types

In many modern languages, you can mix both explicit and implicit types:

●​ Explicit types for function signatures and complex logic where clarity is important.
●​ Inferred types for local variables where the type is obvious from context.

func add(a: Int, b: Int) -> Int { // Type annotations for parameters and return type
let sum = a + b // Type of local variable sum is inferred as Int
return sum
}
let result = add(a: 5, b: 3) // Type of result is inferred as Int

Here, the user specifies types explicitly for the function parameters (a, b) and return value
(Int), while relying on inferred types within the function body (the compiler infers that sum is
also an Int).

What Happens if No Types Are Specified?

If a language doesn't have a way to specify types (whether explicitly or implicitly), it would be a
dynamically typed language. In such cases, the language doesn't enforce type safety at
compile time. Instead, types are checked at runtime, meaning errors related to type mismatches
only occur when the program is running, not during compilation.

●​ Example (Python):

x = 10 # No type declaration; x is dynamically typed
x < "hello" # TypeError at runtime: '<' not supported between int and str

In dynamically typed languages, there’s no compile-time type checking, so the user doesn’t
have to (and typically cannot) specify types in a formal way. However, this can lead to runtime
errors when incompatible types are used.

Can you modify our language to include type annotations?


Yes. We can change the Let expression to include type annotations. The new rules would be as
follows:

Notice that the only difference from our language of expressions thus far is that let
bindings now have an extra parameter: the type of the bound variable:

Ok, but how is type checking implemented?


To demonstrate how that works, let's try writing a simple type checking system for our language
with booleans. We can use the following type checking rules:

Let us analyze each one of these rules in detail:

1.​ TAdd (Addition): It checks that both expressions e0 and e1 are of type int, and the
result of their addition is also int.
2.​ TLth (Less-than comparison): It ensures that both operands of the less-than
comparison are integers, and the result is a boolean.
3.​ TAnd (Logical AND): It checks that both expressions e0 and e1 are booleans, and the
result of their logical conjunction is also a boolean.
4.​ TIf (Conditional expression): The condition e0 must be a boolean, and both branches
(e1 and e2) must have the same type t, which becomes the type of the whole
conditional expression.
5.​ TLet (Let-binding): The expression e0 should have type t0, and within the scope of the
let, name is assigned type t0. The body of the let (expression e1) should have type
t1, and this becomes the overall type.
6.​ TVar (Variable typing): It states that if the variable name is in the context Σ and has type
t, then name can be used as type t.

Can you implement these rules as a Python Visitor? What kinds of tests
would you have for your implementation?
We can start with the tests. For instance, consider the tests below:

# Example Usage
env = {'x': int, 'y': bool}
checker = TypeChecker()

# Example expressions:
expr = Add(Var('x'), Num(5)) # x + 5
print(expr.accept(checker, env)) # Should return int

expr = And(Var('y'), Bln(True)) # y and True
print(expr.accept(checker, env)) # Should return bool

expr2 = IfThenElse(Var('y'), Num(1), Num(0)) # if y then 1 else 0
print(expr2.accept(checker, env)) # Should return int

expr3 = Let('z', int, Num(3), Add(Var('z'), Num(2))) # let z = 3 in z + 2
print(expr3.accept(checker, env)) # Should return int

Can you implement a visitor that passes the tests?


To implement the type checking rules using the visitor pattern in Python, we can follow the
structure of the typing rules you've shown. We'll create a TypeChecker class that implements
the visitor for each expression type (Var, Num, Bln, binary expressions like Add, Lth, And,
etc.), and for constructs like Let and IfThenElse.

We'll assume a context (env) which maps variable names to types (similar to the Σ in your
typing rules).

Here's the Python implementation of the TypeChecker:

class TypeChecker:
    def visit_var(self, var_exp, env):
        # Look up the variable's type in the environment
        if var_exp.identifier in env:
            return env[var_exp.identifier]
        else:
            raise TypeError(f"Variable {var_exp.identifier} is not defined")

    def visit_num(self, num_exp, env):
        # Numbers are always of type int
        return int

    def visit_bln(self, bln_exp, env):
        # Boolean literals are always of type bool
        return bool

    def visit_add(self, add_exp, env):
        # Both operands must be of type int
        left_type = add_exp.left.accept(self, env)
        right_type = add_exp.right.accept(self, env)
        if left_type == int and right_type == int:
            return int
        else:
            raise TypeError(f"Type error: {left_type} + {right_type}")

    def visit_and(self, and_exp, env):
        # Both operands must be of type bool
        left_type = and_exp.left.accept(self, env)
        right_type = and_exp.right.accept(self, env)
        if left_type == bool and right_type == bool:
            return bool
        else:
            raise TypeError(f"Type error: {left_type} and {right_type}")

    def visit_lth(self, lth_exp, env):
        # Both operands must be of type int; the result is of type bool
        left_type = lth_exp.left.accept(self, env)
        right_type = lth_exp.right.accept(self, env)
        if left_type == int and right_type == int:
            return bool
        else:
            raise TypeError(f"Type error: {left_type} < {right_type}")

    def visit_let(self, let_exp, env):
        # Type-check the definition and compare it with the annotation
        def_type = let_exp.exp_def.accept(self, env)
        if def_type == let_exp.type_identifier:
            # Extend a copy of the environment with the new variable, so
            # that the binding is visible only within the let body
            new_env = dict(env)
            new_env[let_exp.identifier] = let_exp.type_identifier
            return let_exp.exp_body.accept(self, new_env)
        else:
            epct_str = f"expected {let_exp.type_identifier}"
            raise TypeError(f"Type error in let: {epct_str} but got {def_type}")

    def visit_ifThenElse(self, if_exp, env):
        # The condition must be of type bool, and both branches must have
        # the same type
        cond_type = if_exp.cond.accept(self, env)
        if cond_type != bool:
            raise TypeError(f"Type error: expected bool but got {cond_type}")
        then_type = if_exp.e0.accept(self, env)
        else_type = if_exp.e1.accept(self, env)
        if then_type == else_type:
            return then_type
        else:
            then_str = f"then branch is {then_type}"
            else_str = f"else branch is {else_type}"
            raise TypeError(f"Type error: {then_str}, {else_str}")

Explanation:

●​ TypeChecker: Implements the type checking logic using visitor methods for each
expression type.
●​ Environment (env): This is a dictionary that keeps track of variable bindings and their
associated types.
●​ Visitor Methods: Each visitor method checks the types of sub-expressions and ensures
that they conform to the expected types (e.g., addition requires two ints, if requires a
bool condition, etc.).

This setup should allow you to type-check the expressions in your language according to the
rules you provided.

Evaluation is not too different from type checking. Can you see how similar
these visitors are?
The structure for program evaluation and type checking is indeed quite similar. Both
approaches:

1.​ Traverse the abstract syntax tree (AST) recursively using a visitor pattern.
2.​ Visit each node type (like Var, Num, Add, IfThenElse, etc.) to apply specific logic.
3.​ Rely on an environment (env) to track variable bindings, whether it's values (for
evaluation) or types (for type checking).

Similarities:

●​ AST Traversal: Both implementations walk the AST and process expressions in the
same order.
●​ Environment Handling: In both cases, variables are looked up in an environment
(env)—during evaluation, this contains variable values; during type checking, it contains
types.
●​ Binary Operations: For binary operations like Add or And, both implementations ensure
that the two operands conform to specific rules—either they must be integers/booleans
for evaluation or types must match for type checking.
●​ Conditionals: In IfThenElse, both visitors check the condition and then process the
branches, although the type checker ensures type consistency while the evaluator
computes the actual value.

Some programs still run fine under the dynamic type verifier, but are ruled
out by the static type verifier. Can you give examples?
See this comparison below:

In the code below, the dynamic evaluation runs fine because the interpreter is focused on
whether the if condition is True and simply selects the appropriate branch without caring that
the branches have different types.

e = IfThenElse(Bln(True), Num(0), Bln(False))
dynamically_type_safe_eval(e) # 0

However, in this new example, the static checker fails because it enforces that both branches of
the conditional must have the same type, ruling this program out due to the mismatch between
the int and bool branches.

e = IfThenElse(Bln(True), Num(0), Bln(False))
statically_type_safe_eval(e)
# Type error: then branch is <class 'int'>, else branch is <class 'bool'>

This discrepancy occurs because static type checking aims to prevent type errors in any
execution path, while dynamic type checking only raises errors when a particular path is actually
followed. This is why some programs pass the dynamic check but fail the static one.
●​ Dynamic type checker: Evaluates expressions at runtime and checks types as needed.
If a branch is never executed, its type mismatch won't cause issues (e.g., IfThenElse
with mismatched types in different branches).
●​ Static type checker: Enforces type rules at compile time, meaning it checks both
branches of an IfThenElse even if one is never executed. This makes the static type
checker stricter, as it ensures type correctness for all possible execution paths, even if
some are never taken.

So, is that why you said that dynamic typing allows more programs to run
than static typing?
Yes, that is it! Dynamic typing allows more programs to run because it performs type checks at
runtime, evaluating only the code that is actually executed. This means that type mismatches in
parts of the code that aren't run do not cause issues, which can lead to greater flexibility.

For example, in a dynamically-typed language or checker, the following program would run fine
as long as the branch with the type error isn't executed:

if condition:
x = 5 # int
else:
x = "string" # str

If condition is True, the program never evaluates the second branch, so the type mismatch
doesn't matter. The dynamic checker only cares about the types when the values are actually
used.

On the other hand, static typing checks all possible branches of code at compile time,
regardless of which branches are executed. This ensures stricter guarantees about type
correctness, but it also means that some programs that are valid in a dynamic system would be
rejected by a static type checker. In the example above, the static type checker would reject the
program due to the type mismatch in the else branch, even if that branch is never executed.

Summary:

●​ Dynamic typing: More flexible and allows more programs to run because it only checks
types when they are needed at runtime. This can sometimes allow type inconsistencies
that never manifest in execution.
●​ Static typing: Stricter and catches more potential type errors at compile time. This
ensures type safety for all execution paths but also prevents some programs from
running due to type mismatches, even if those mismatches are not actually problematic
during execution.

Dynamic typing's flexibility comes at the cost of potentially discovering errors only at runtime,
whereas static typing's rigidity prevents certain kinds of programs from running at all.

What does it mean when we say that a type system is sound?


That means that the type system adheres to the properties of progress and preservation. These
two properties are essential to ensure that a program behaves as expected at runtime if it
successfully type-checks at compile-time.

Here’s a breakdown of each property:

1.​ Progress: This property ensures that a well-typed program is never "stuck" — it can
always make progress in its computation. This means that, for any well-typed term
(piece of code), it is either a value (a final result that requires no further computation) or it can
take a step forward in its evaluation (i.e., reduce to another term). This property
guarantees that, as long as a program type-checks, it won’t encounter runtime errors
due to unhandled operations (like trying to apply a function to something that isn’t a
function).
2.​ Preservation: Also known as the "type preservation" or "type safety" property,
preservation ensures that if a well-typed term evaluates to another term, the resulting
term is still well-typed and has the same type as the original. In other words, during
evaluation, types are preserved — types don’t change unexpectedly as the program
executes. This property makes sure that types stay consistent throughout the program’s
execution, so if a function returns an integer, it will keep returning an integer through
every evaluation step.

Together, progress and preservation form the foundation of a sound type system. They ensure
that type-checked programs can execute safely without certain kinds of runtime errors, providing
guarantees about both the program’s ability to continue running and its consistency in type
behavior. The figure below states these two theorems, together with examples of evaluation and
typing rules:
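Stated in the notation used above, the two theorems read roughly as follows (a sketch of the statements in the figure):

```latex
\text{Progress:}\quad \text{if } \Sigma \vdash e : t,\; \text{then } e \text{ is a value, or } \exists\, e'.\; e \to e'.

\text{Preservation:}\quad \text{if } \Sigma \vdash e : t \text{ and } e \to e',\; \text{then } \Sigma \vdash e' : t.
```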

Is it possible to prove that a type system meets the Progress property?


Yes, that's a big part of programming language theory! Usually, these proofs proceed by
induction. Consider, for instance, an if-then-else expression. Here's how the proof goes:

1.​ Goal (Progress Statement): The goal is to show that if Σ⊢e:t, then either e is a value or
there exists a v such that e→v. For the if-then-else expression, this means proving
that if the expression type-checks, it can always take a step forward in evaluation.
2.​ Inductive Hypothesis: Since this proof relies on induction, the dashed arrows labeled
"Induction" suggest applying the inductive hypothesis to sub-expressions:
○​ By induction, we assume that the sub-expressions e0, e1, and e2 satisfy
progress. This means each one can either evaluate to a value or take a step
forward.
3.​ Case Analysis: The evaluation rules (IfTrue and IfFalse) for the if-then-else
construct imply that:
○​ If e0 evaluates to true, then the entire expression reduces to e1.
○​ If e0 evaluates to false, then it reduces to e2.
4.​ The proof diagram suggests that if we apply progress to e0, we can show that it must
eventually evaluate to either true or false (since it’s a boolean). In either case, the
whole if-then-else expression can take a step forward by reducing to e1 or e2.
5.​ Conclusion: Once e0 is evaluated, the entire if-then-else expression can proceed
according to the IfTrue or IfFalse rules, achieving progress.
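For reference, the two evaluation rules mentioned in step 5 are typically written as follows (a sketch; a full semantics would also include a congruence rule allowing e0 itself to take a step):

```latex
\frac{}{\texttt{if true then } e_1 \texttt{ else } e_2 \;\to\; e_1}\;(\textsc{IfTrue})
\qquad
\frac{}{\texttt{if false then } e_1 \texttt{ else } e_2 \;\to\; e_2}\;(\textsc{IfFalse})
```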

What's this proof technique called?


This type of proof is often called induction on the derivation tree or structural induction on typing
derivations.

In this approach, you use the structure of the derivation tree, which is formed by applying typing
rules, to prove a property (like progress) for each expression that type-checks. Here’s a more
detailed breakdown of what this means in the context of your proof:

1.​ Derivation Tree: The derivation tree is constructed by applying typing rules to build up a
proof that a given expression has a certain type. For an if-then-else expression, the
tree would start with the Tif typing rule, which in turn relies on the types of the
sub-expressions e0, e1, and e2.
2.​ Induction on the Tree Structure: To prove the progress property, you proceed by
induction on the structure of this derivation tree. Each node in the tree corresponds to a
different typing rule, and you handle each rule as a separate case in the proof. For
example:
○​ If the derivation ends with the Tif rule, then you examine the sub-expressions
e0​, e1, and e2.
○​ You then apply the inductive hypothesis to these sub-expressions, assuming
progress holds for each one individually.
3.​ Using Typing Rules as Cases: Each rule applied in the derivation tree corresponds to a
case in the proof. By handling each rule and using induction on the derivation of
sub-expressions, you show that if the overall expression is well-typed, it can either take a
step in evaluation or is a value.

In summary, induction on the derivation tree is a method of proof where you use the structure of
the type derivation (proof tree) to inductively prove a property for all well-typed expressions. It’s
a standard technique in proving properties like progress and preservation in type systems.

What about preservation? What's the proof like?


Sure! See below a graphic representation of one of the cases for preservation:

The diagram outlines a proof of preservation for the if-then-else expression. Here’s an
explanation:

The Goal of the Preservation Proof

Preservation is a property that says: if an expression e has a type t (written Σ⊢e:t), and e
evaluates to v (written e→v), then the resulting value v should have the type t as well (i.e.,
Σ⊢v:t). This ensures that evaluation doesn’t “break” the type of an expression—any steps taken
by well-typed expressions produce new expressions of the same type.

Proof Structure

To prove preservation, we do induction on the derivation tree of the typing judgment. The proof
involves a case analysis on the last typing rule used in the derivation. In this case, we’re
focusing on the if-then-else expression, specifically the typing rule Tif.

Step-by-Step Breakdown of the Diagram

1.​ Assumption (Top Left):


○​ We start with the assumption that we have a well-typed if-then-else
expression: Σ⊢if e0 then e1 else e2:t.
○​ This assumption comes from the typing rule Tif, which tells us that:
■​ e0 has type bool (i.e., Σ⊢e0:bool),
■​ e1 has type t (i.e., Σ⊢e1:t), and
■​ e2 also has type t (i.e., Σ⊢e2:t).
2.​ Case Analysis on Evaluation (Right Side):
○​ The preservation proof now considers how the if-then-else expression might
take a step in evaluation.
○​ There are two main cases based on the possible values of e0:
■​ Case 1: e0 evaluates to true. In this case, the if-then-else
expression reduces to e1.
■​ Case 2: e0 evaluates to false. In this case, the if-then-else
expression reduces to e2.
3.​ Induction Application (Dashed Arrows):
○​ We apply the induction hypothesis to e1 and e2, assuming that if e1 or e2 can
take a step in evaluation, they will preserve their types.
○​ For Case 1: If e0→true, the expression reduces to e1​, and by the induction
hypothesis, since Σ⊢e1:t, we conclude that the resulting expression e1 has type t
as required.
○​ For Case 2: If e0→false, the expression reduces to e2, and similarly, since
Σ⊢e2:t, the resulting expression e2 has type t as required.
4.​ Conclusion (Bottom):
○​ In both cases, the result of evaluating if-then-else has the same type t.

○​ Therefore, we conclude that if a well-typed if-then-else expression takes a


step in evaluation, the resulting expression is still well-typed with the same type t.

Summary

The diagram walks through a preservation proof for the if-then-else construct by:

●​ Starting with the typing rule (Tif) for if-then-else.


●​ Analyzing the two possible evaluation outcomes based on e0 evaluating to true
or false.
●​ Applying the induction hypothesis to show that each outcome preserves the type.

This approach confirms that well-typed if-then-else expressions maintain their type after
evaluation, satisfying preservation. Great work on structuring this!

So, is it correct to say that a strongly typed language is a language whose
type system meets progress and preservation?
Well, if you ask two different people what strong and weak type systems mean, you probably get
two different answers :) So, whereas safety (soundness) is a well-defined term, the distinction
between strong and weak typing is not so much. The terms "strongly typed" and "weakly typed"
are indeed quite subjective and often lack precise definitions. People use them to express
different intuitions about type strictness or implicit conversions for instance, but they don’t have
a formal grounding in type theory like soundness or safety do.

In contrast, type safety (which often implies soundness, i.e., progress and preservation) has a
clear, formal meaning. A type-safe language prevents certain kinds of runtime errors by
ensuring that programs respect their type annotations throughout execution. This is a property
that can be precisely stated and proven, making it a more useful term in formal discussions
about type systems.

So, is it true that C does not meet these properties of progress and
preservation?
Yes, it’s generally true that C does not meet either the progress or preservation properties in
the same way that type-safe languages do. Here’s why:

Progress

●​ Progress says that a well-typed program is either a value or can take a step in
evaluation without getting "stuck" (i.e., encountering an error that halts execution
unexpectedly).

●​ In C, a program can be well-typed (according to C’s type system) but still get "stuck" at
runtime due to issues like null pointer dereferencing, buffer overflows, uninitialized
variables, or out-of-bounds memory access. These issues often lead to undefined
behavior, crashes, or segmentation faults, which violate the progress property.

Preservation

●​ Preservation means that if a program is well-typed and it takes a step in evaluation, the
result is still well-typed.
●​ C does not strictly maintain preservation because of unsafe type casts and pointer
arithmetic. For example, a void* pointer can be cast to any other type of pointer, and
C’s type system will not prevent you from treating incompatible data as if it were a
different type. This can lead to type inconsistencies during execution, which violate
preservation.

Why C Doesn’t Aim for Soundness

●​ C was designed with a focus on performance and low-level memory control rather than
type safety. This means that the language allows many operations that sacrifice safety
(and soundness) in favor of giving the programmer more direct control over the
hardware.
●​ These design choices allow C to be extremely powerful and efficient, but they also mean
that the language doesn’t enforce the same guarantees about types and memory safety
that sound type systems do.

In summary, C’s type system is intentionally permissive and does not enforce progress or
preservation in the way that a sound type system does. This is why C is often described as an
unsafe language—its type system doesn’t prevent certain kinds of runtime errors that sound
type systems are designed to avoid.

And are there languages whose type system has been formally proven to
be sound?
In certain languages like Standard ML (specifically, implementations like SML/NJ) and Haskell,
the type system has been formally proven to satisfy progress and preservation. These
properties make these languages type-safe and provide strong guarantees that well-typed
programs will not encounter certain runtime errors.

Here’s why languages like Standard ML have these properties:

Formal Proof of Type Safety

Researchers have formally defined the semantics of languages like Standard ML and proved
the type safety theorems for their type systems. This involves constructing formal models that
capture the language's syntax, type rules, and evaluation rules. Using these models, they can
prove that if a program type-checks, it will always satisfy progress and preservation:

1.​ Progress: A well-typed Standard ML program will always either be a value (i.e., fully
evaluated) or be able to take a well-defined step in its execution without getting "stuck."
2.​ Preservation: During evaluation, a well-typed Standard ML program maintains its type
correctness. Each step of evaluation preserves the types, ensuring that the program
remains well-typed throughout its execution.

Why Some Languages Aim for Type Safety

Languages like Standard ML and Haskell were designed with type safety as a core goal. Their
type systems are expressive but restrictive enough to prevent unsafe operations, like arbitrary
pointer manipulation or unchecked casting. This is one of the reasons these languages can
guarantee certain kinds of safety at runtime.

Benefits of Proven Type Safety

●​ Reliability: These guarantees mean that many classes of errors (such as null pointer
dereferences or type mismatches) simply cannot occur in well-typed programs.
●​ Optimization: Because of these strict guarantees, compilers for type-safe languages
can make certain optimizations that wouldn’t be safe in languages with weaker type
systems.
●​ High-Assurance Systems: In areas where reliability is critical (e.g., in financial software
or formal verification), languages with proven type safety can offer strong assurances to
developers and users alike.

Examples of Type-Safe Languages

In addition to Standard ML (SML/NJ) and Haskell, other languages like Coq, Idris, F#, and
OCaml also have robust type systems that support various forms of type safety, often extending
these guarantees through additional features like dependent types (in the case of Coq and
Idris).

Summary

Yes, it’s true that certain languages—especially those rooted in formal logic and type
theory—have been shown to meet the progress and preservation properties through formal
proofs. Standard ML and its implementation SML/NJ are classic examples, and their formally
proven type safety is a significant reason for their reliability in handling type-related errors.

How do people prove these properties formally?


Proving properties like progress and preservation for a type system requires formal reasoning,
and Coq (along with similar proof assistants like Agda, Lean, and Isabelle/HOL) is a powerful
tool for this.

How Formal Proofs Work

The process involves creating a formal model of the programming language. This model
includes:

1.​ Syntax: Defining the structure of expressions, types, and terms.


2.​ Typing Rules: Defining how types are assigned to expressions (e.g., a rule that says if
e0 then e1 else e2 is well-typed if e0 has type bool and e1 and e2 have the same
type).
3.​ Operational Semantics: Defining the evaluation rules, which describe how expressions
reduce or step to other expressions.
4.​ Properties: Formally stating properties like progress and preservation that we aim to
prove.

Once the language is defined, we use induction and structural reasoning on the syntax and
evaluation rules to prove properties like progress and preservation. This can be very tedious by
hand, so proof assistants help by checking each logical step and ensuring there are no
mistakes.

How Coq Helps

Coq is particularly useful because it allows you to:

●​ Define the syntax and typing rules: Coq has a built-in language to define types and
functions, so you can model the type system and evaluation rules of a language
precisely.
●​ Express formal properties: You can formally state the properties you want to prove
(e.g., "for all expressions e of type t, either e is a value or it can take a step").
●​ Automate proofs with tactics: Coq provides tactics (proof strategies) that help
automate some parts of the proof, especially those that involve routine reasoning or
induction.
●​ Ensure correctness: Coq’s type-checking ensures that only correct proofs are
accepted. If there’s an error or a gap in the reasoning, Coq will reject the proof.

Example of Proving Progress and Preservation

In a proof of progress, you might:

1.​ Use induction on the structure of the derivation tree for Σ ⊢ e : t.

2.​ Show that each possible form of e can either take a step or is already a value,
depending on the evaluation rules for that expression.

For preservation, you might:

1.​ Use induction on the evaluation step e → e'.


2.​ Show that if Σ ⊢ e : t, and e → e', then Σ ⊢ e' : t.

Proofs like this are broken down into cases, and Coq can verify each case individually.

Example Project: Software Foundations

A well-known example of this approach is the Software Foundations series, which includes
formal proofs of type safety for a small functional language. The proofs in this book are
developed entirely in Coq, and they cover progress and preservation among other properties.
This resource is widely used for learning how to formalize and prove properties of programming
languages using Coq.

Why This Matters

Using Coq to prove properties of type systems and programming languages is beneficial for:

●​ Type-Safe Language Design: Ensuring that a new or experimental language will
behave predictably and prevent certain kinds of errors.
●​ Compiler Verification: Proving that a compiler preserves types across transformations,
which is crucial for safety-critical applications.
●​ Formal Verification of Programs: Beyond languages, Coq can be used to formally
verify algorithms and even full programs, giving strong guarantees about their behavior.

In short, Coq provides a framework where you can model a language precisely, express formal
properties, and rigorously prove those properties with machine-checked accuracy.

Do you know how this notion of "Soundness" of a type system ended up


associated with these two notions of progress and preservation?
The terminology and framework of progress and preservation as central components of type
soundness originated in a formalized way with Andrew K. Wright and Matthias Felleisen in
their landmark paper, A Syntactic Approach to Type Soundness (1992). This paper is often
credited with popularizing the structured proof technique that ties type soundness to progress
and preservation.

Historical Background

1.​ Pre-Wright & Felleisen Era (Pre-1992):

○​ The foundational ideas behind type soundness and the relationship between
types and program behavior can be traced back to early studies of typed lambda
calculi, such as Alonzo Church’s work in the 1930s on the simply typed lambda
calculus.
○​ Later, logicians like Haskell Curry and William Howard (of the Curry-Howard
correspondence fame) explored the correspondence between types, proofs, and
program correctness.
○​ Robin Milner’s work in the 1970s on polymorphic type systems (notably in ML)
also emphasized the role of types in preventing runtime errors, laying
groundwork for the notion of type safety.
2.​ Wright & Felleisen (1992):
○​ Wright and Felleisen’s syntactic approach departed from earlier, more
semantic-based techniques for proving type soundness (such as those using
denotational semantics).
○​ They introduced the progress and preservation theorems as distinct, separable
components of proving soundness.
■​ Progress handles the dynamic behavior of terms, ensuring evaluation
doesn't get stuck.
■​ Preservation formalizes the static guarantees of types through
evaluation.
○​ The key insight in their paper was to show that type soundness could be reduced
to these two components in a way that was simple, modular, and accessible.
3.​ Impact on the Field:
○​ Wright and Felleisen’s paper is one of the most cited works in programming
languages because it provided a unified proof methodology that could be
applied across a wide variety of type systems, from simple typed lambda calculi
to more complex type systems (e.g., polymorphism, subtyping).
○​ Their framework became a cornerstone for the study of type theory in
programming languages and influenced numerous textbooks, including Pierce's
"Types and Programming Languages", which solidified progress and
preservation as the standard terminology.

Why Progress and Preservation?

●​ Wright and Felleisen likely chose these terms to align with the conceptual essence of
what the proofs represent:
○​ Progress captures the forward motion of evaluation (i.e., programs keep
"progressing" towards completion).
○​ Preservation emphasizes the idea of maintaining type-correctness (i.e., type
invariants are "preserved" through execution).

Before their formalization, there wasn't a canonical terminology or proof framework for type
soundness, though similar ideas were implicit in earlier works on operational semantics and
type systems.

Influence and Modern Usage

The success of Wright and Felleisen’s approach lies in its syntactic nature:

●​ It is well-suited to the operational semantics of modern programming languages.


●​ It has since been extended to more advanced type systems, including dependently
typed languages, effect systems, and gradual typing.

This Felleisen guy did quite a lot for computer science, especially for
programming languages, right?
Oh, yes! Matthias Felleisen is a highly influential figure in computer science, particularly in the
fields of programming languages, software engineering, and education. His work spans a wide
range of topics, from foundational research in programming language theory to the development
of tools and methodologies that have had a practical impact on how programming is taught and
practiced.

Born in Lübeck, Germany, in 1958, he completed his undergraduate studies in computer science in Germany before earning a Ph.D. from Indiana University in 1987 under Daniel P. Friedman. Currently a professor at Northeastern University, Felleisen is known for his pioneering research in programming language theory, including the syntactic approach to type soundness, operational semantics, and continuations.

Key Contributions

1.​ Theoretical Foundations of Programming Languages:
○​ Syntactic Approach to Type Soundness: His collaboration with Andrew K. Wright (1992) introduced the widely adopted framework of progress and preservation for proving type soundness.

○​ Operational Semantics: Felleisen developed techniques for reasoning about programs based on operational semantics, making significant advances in understanding programming language behavior.
2.​ Programming Language Design:
○​ Racket (formerly PLT Scheme): Felleisen was the driving force behind the
creation of Racket, a descendant of Scheme, which serves as both a practical
programming language and a platform for designing new languages. Racket is
notable for its extensibility and is widely used in education and research.
○​ He also contributed to the development of tools for language-oriented
programming, allowing programmers to create domain-specific languages.
3.​ Program Verification and Contracts:
○​ Felleisen has made foundational contributions to program contracts (lightweight
formal methods for runtime correctness checks), which have influenced how
software reliability is approached.
4.​ Educational Contributions:
○​ How to Design Programs (HtDP): Felleisen co-authored this groundbreaking
textbook, which introduced a structured and systematic approach to teaching
programming to beginners. The book emphasizes design principles and
problem-solving strategies over rote syntax, reshaping programming education.
○​ TeachScheme! Project: He spearheaded the TeachScheme! initiative, a
curriculum that uses functional programming as a foundation for teaching
computational thinking in high schools and universities.
5.​ Reduction and Continuations:
○​ Felleisen’s work on continuations has been highly influential in understanding
control flow in programming languages. His contributions to reduction
semantics provided a framework for reasoning about program transformations
and optimizations.
6.​ Contributions to Research on Effects:
○​ He studied how programming languages can manage side effects, such as
mutable state, in a principled way. This research has influenced modern
language designs.
7.​ Awards and Recognitions:
○​ ACM Fellow: Felleisen was named an ACM Fellow for his contributions to
programming languages and education.
○​ SIGPLAN Distinguished Achievement Award (2012): Recognized for his
lasting contributions to the field of programming languages.
○​ Numerous other awards for his educational initiatives and academic
contributions.

Legacy

Matthias Felleisen is celebrated for:

●​ Bridging the gap between theoretical computer science and practical software
development.
●​ His ability to innovate in both theory and practice, making his work accessible and
impactful to students, educators, researchers, and practitioners.
●​ His commitment to teaching and mentoring, which has inspired generations of
computer scientists.

Type Inference
Code: https://github.com/pronesto/DCC053_Public/tree/main/10_TypeInference

In SML/NJ, we don't write types. Does it use the same type system as
Python?
In SML/NJ (Standard ML of New Jersey), type inference is used, which is why you don’t need
to write type annotations, yet the language is still statically typed. This means that types are
determined at compile time, without requiring you to explicitly annotate every variable or
function.

How SML/NJ works:

●​ Type inference automatically deduces the types of variables, functions, and expressions
during compilation.
●​ The compiler analyzes how you use each value in the program to infer its type and
ensures type correctness without you specifying it.

How is this different from Python?

●​ Python uses dynamic typing. In Python, types are associated with values at runtime,
and you can freely change the type of a variable by assigning it a different value.
●​ In contrast, SML/NJ is statically typed through type inference, which means that types
are determined and checked at compile time. Once a type is inferred, it’s fixed for that
variable or function throughout its scope.

Example in SML/NJ:
fun add(x, y) = x + y;

●​ You don't need to specify that x and y are integers.
●​ SML/NJ will infer that x and y must be of type int, because the + operator in SML is overloaded only over numeric types and defaults to int in the absence of other constraints.
●​ If you try to call add with non-integers, the compiler will catch it before the program runs.

In summary:

●​ SML/NJ uses type inference to automatically deduce types but still enforces static
typing (types are fixed at compile time).
●​ Python is dynamically typed, meaning types are determined at runtime and can change
throughout the program’s execution.

So yes, SML/NJ is not like Python in this sense—it's statically typed, but thanks to type
inference, you don't need to write explicit type annotations!

Can you give me an example that shows the difference between SML/NJ
and Python?
Let’s use a conditional expression example to highlight the difference between Python (which
allows dynamic typing) and SML/NJ (which enforces static typing with type inference).

Python Example (Valid):


def dynamic_example(cond):
if cond:
return 10 # Integer
else:
return "ten" # String

print(dynamic_example(True)) # Outputs: 10
print(dynamic_example(False)) # Outputs: ten

In Python:

●​ The dynamic_example function returns either an integer (10) or a string ("ten") depending on the condition cond.
●​ Python allows this because it doesn’t enforce a single type for the return value; the type is determined at runtime.

SML/NJ Example (Invalid):


fun static_example(cond) =
if cond then 10 (* Integer *)
else "ten"; (* String *)

In SML/NJ:

●​ This code would fail to compile. Why? Because SML/NJ enforces that both branches of
the conditional must return the same type.
●​ Here, the if branch returns an int (10), while the else branch returns a string
("ten"), leading to a type mismatch error.
●​ SML/NJ performs static type checking and will not allow this mixing of types.

Error Message in SML/NJ:

If you tried to compile the SML/NJ code above, you might see an error like:

Error: types of then and else branches do not agree [int tycon mismatch]
then branch: int
else branch: string

Key Difference:

●​ Python allows the return type to be dynamically determined at runtime. Each function
call can return different types, and the language doesn't enforce consistency.
●​ SML/NJ requires that the types be consistent at compile time, ensuring that both the
then and else branches return the same type (either both int or both string).

This is a demo of how dynamic typing in Python allows more flexibility, while static typing
with type inference in SML/NJ enforces stricter type rules, even though you don't need to
explicitly write the types!

So, how is type inference different from type checking?

Type Checking:

●​ What it does: Verifies that the types of variables and expressions in the program are
correct according to a set of rules.
●​ When it happens: Can be done at compile time (static type checking) or runtime
(dynamic type checking).
●​ Input: Requires explicit type annotations from the programmer or runtime values.
●​ Example:

def add(x: int, y: int) -> int:
    return x + y

●​ The type checker will ensure that x and y are integers before the program runs or while
it's running.

Type Inference:

●​ What it does: Automatically deduces the types of variables and expressions without
explicit type annotations.
●​ When it happens: Typically at compile time, but sometimes can occur during
interpretation (e.g., in dynamically-typed languages).
●​ Input: No need for explicit type annotations—types are inferred based on how the
variables are used.
●​ Example:

def add(x, y):
    return x + y

●​ In a language with type inference (e.g., Haskell or SML/NJ), the types of x and y would
automatically be inferred as integers based on how they are used (addition).

Key Difference:

●​ Type checking validates correctness given explicit types.
●​ Type inference figures out the types without needing explicit annotations, then checks for correctness.

Is the auto keyword doing type inference in C++?


Well, in a way. The auto keyword, introduced in C++11, is indeed a form of type inference, albeit a very simple one. It can be very useful to deal with the complex syntax of templated types in C++. For instance:

#include <iostream>
#include <fstream>
#include <string>
#include <map>

int main(int argc, char** argv) {
    std::ifstream file("input.txt");
    if (!file) {
        std::cerr << "Error: Unable to open file.\n";
        return 1;
    }
    std::map<std::string, int> frequency_map;
    std::string word;
    while (file >> word) {
        ++frequency_map[word];
    }
    if (argc > 1) {
        for (const auto& [word, count] : frequency_map) {
            std::cout << word << ": " << count << '\n';
        }
    } else {
        for (const std::pair<const std::string, int>& entry : frequency_map) {
            std::cout << entry.first << ": " << entry.second << '\n';
        }
    }
    return 0;
}

When you use auto in C++, the compiler deduces the type of the variable based on the value
or expression used to initialize it. At first glance, that looks similar to how languages with type
inference, like SML/NJ or Haskell, automatically infer types without requiring explicit type
annotations from the programmer.

Example:
auto x = 10; // x is deduced to be an int
auto y = 3.14; // y is deduced to be a double
auto z = "hello"; // z is deduced to be a const char*

In this example, C++ determines the type of x to be int, y to be double, and z to be const
char*, based on the values assigned to them. The programmer doesn’t need to explicitly write
out the type, and the compiler does the work of deducing it.

Key Points:

●​ Type inference in C++ with auto is local, meaning it works only within the context of
the specific variable initialization.
●​ It does not mean C++ has full-blown type inference like in some functional languages
(e.g., SML, Haskell), where types are inferred globally in functions or across the
program.
●​ Type safety is still guaranteed in C++ even with auto. Once the type is deduced, it's
treated exactly as if you had written it explicitly.

So, while auto in C++ gives some of the convenience of type inference, C++ remains a
statically-typed language where the type must be determinable at compile time.

So, what's the difference between auto in C++ and type inference in
SML/NJ?
The auto keyword in C++ and the type inference system in SML/NJ (Standard ML of New
Jersey) represent different approaches to type inference in programming languages. Here's a
comparison highlighting their differences:

Type Inference in C++ (auto Keyword)

●​ Scope and Usage: The auto keyword in C++ simplifies the type declaration for
variables. It allows the compiler to deduce the type based on the initializer. For example,
auto x = 42; makes x of type int because 42 is an int.
●​ Complexity: C++'s type inference with auto is relatively simple. It does not infer types
beyond what can be determined from a single initializer. The auto keyword is limited in
scope, mostly applied to variable declarations.
●​ Flexibility: While it reduces verbosity, auto does not provide deep or contextual type
inference. It relies on explicit initializers and does not support advanced features like
polymorphism or higher-order types.

Type Inference in SML/NJ

●​ Scope and Usage: SML/NJ uses a more sophisticated type inference system that can
infer types in a broader context, including function signatures and complex expressions.
This system allows for more advanced features like polymorphic types and higher-order
functions without explicit type annotations.
●​ Complexity: SML/NJ employs the Hindley-Milner type inference algorithm, which is a
powerful and general algorithm used in many functional languages. It can infer the most
general type of a given expression, even when multiple type variables and constraints
are involved.
●​ Flexibility: The Hindley-Milner algorithm supports type polymorphism, allowing functions
to be written generically and applied to different types without needing explicit type
annotations. This system can infer types in complex scenarios involving multiple
functions, type constraints, and higher-order functions.

Key Differences

●​ Scope: C++'s auto is limited to variable declarations with straightforward initialization. SML/NJ’s type inference applies more broadly to function definitions, expressions, and complex type constraints.
●​ Complexity: SML/NJ's type inference is more complex and capable, handling
polymorphism and higher-order functions. C++'s auto provides a simpler form of type
inference, primarily for convenience.
●​ Application: SML/NJ’s type inference allows for a higher degree of abstraction and
flexibility in functional programming, while C++’s auto focuses on reducing boilerplate
code for variable declarations.

In summary, SML/NJ’s type inference is a more comprehensive system that supports advanced
features and more complex type scenarios compared to the simpler auto keyword in C++.

Could you implement a visitor that propagates types, similarly to what auto
does in C++?
To implement a visitor that infers the type of all variables created through let bindings, you
need to enhance your current visitor pattern. This new visitor should be able to traverse the
expression tree and infer the types based on the expressions used in let bindings.

Here's how you can implement such a visitor:

1.​ Choose a Type Representation: First, settle on a representation for the basic types you expect. The code below simply reuses Python's own int and bool classes as type tags.
2.​ Implement the Type Propagation Visitor: Create a VisitorTypePropagator class that will traverse the expression tree and infer types.

Here's an implementation of the VisitorTypePropagator class:

class VisitorTypePropagator:
    def visit_var(self, var, env):
        if var.identifier in env:
            return env[var.identifier]
        else:
            raise TypeError(f"Undefined variable {var.identifier}")

    def visit_num(self, num, env):
        return int

    def visit_bln(self, exp, env):
        return bool

    def visit_add(self, add, env):
        left_type = add.left.accept(self, env)
        right_type = add.right.accept(self, env)
        if left_type == int and right_type == int:
            return int
        raise TypeError(f"Type error: {left_type} + {right_type}")

    def visit_lth(self, exp, env):
        left_type = exp.left.accept(self, env)
        right_type = exp.right.accept(self, env)
        if left_type == int and right_type == int:
            return bool
        raise TypeError(f"Type error: {left_type} < {right_type}")

    def visit_and(self, exp, env):
        left_type = exp.left.accept(self, env)
        right_type = exp.right.accept(self, env)
        if left_type == bool and right_type == bool:
            return bool
        raise TypeError(f"Type error: {left_type} and {right_type}")

    def visit_ifThenElse(self, exp, env):
        cond_type = exp.cond.accept(self, env)
        e0_type = exp.e0.accept(self, env)
        e1_type = exp.e1.accept(self, env)
        if cond_type == bool and e0_type == e1_type:
            return e0_type
        else:
            then_str = f"then branch is {e0_type}"
            else_str = f"else branch is {e1_type}"
            raise TypeError(f"Type error: {then_str}, {else_str}")

    def visit_let(self, let, env):
        exp_def_type = let.exp_def.accept(self, env)
        new_env = dict(env)
        new_env[let.identifier] = exp_def_type
        return let.exp_body.accept(self, new_env)

1.​ Visitor Methods:
○​ visit_var: Retrieves the type of a variable from the environment, raising a TypeError for undefined variables.
○​ visit_num: Numbers are always of type int.
○​ visit_bln: Booleans are always of type bool.
○​ visit_add: Addition requires both operands to be int, resulting in int.
○​ visit_lth: Less-than comparison requires both operands to be int and results in bool.
○​ visit_and: Conjunction requires both operands to be bool, resulting in bool.
○​ visit_ifThenElse: The condition must be bool, and both branches of the if must have the same type.
○​ visit_let:
i.​ Evaluates the type of the definition expression (exp_def).
ii.​ Updates the environment with the new variable and its inferred type.
iii.​ Infers the type of the body expression (exp_body) using the updated environment.

This visitor will traverse the expressions and infer types based on the given rules. If it encounters type mismatches, unsupported operations, or undefined variables, it raises a TypeError.

Can you use this visitor to implement a find_type function that returns
the type of an expression?
Here's the implementation of one such function:

def find_type(e):
    """
    Finds the type of an expression via type propagation.

    Example:
    >>> e0 = Let('w', Num(2), Add(Var('v'), Var('w')))
    >>> e1 = Let('v', Num(40), e0)
    >>> find_type(e1)
    <class 'int'>

    >>> e0 = Let('w', Bln(True), And(Var('v'), Var('w')))
    >>> e1 = Let('v', Bln(False), e0)
    >>> find_type(e1)
    <class 'bool'>

    >>> e = IfThenElse(Bln(True), Num(0), Num(42))
    >>> find_type(e)
    <class 'int'>
    """
    v = VisitorTypePropagator()
    return e.accept(v, {})

It seems that type inference via auto depends on the ordering of program
constructs. Does that make sense?
Yes, that makes sense. The auto keyword in C++ relies on contextual information to deduce
types. Here's how your observation breaks down:

1. Type Deduction Requires Order:

Starting with Base Types: For auto to deduce a type, it must begin with expressions where
the type is directly apparent—literals or expressions with fully known types. For example:

auto x = 5; // 'x' is deduced as an 'int' because 5 is an 'int'.

Here, auto can deduce x as an int because the literal 5 has an evident, fixed type (int).
Dependent Constructs: Once the type of a base construct is inferred, it can be propagated to
dependent constructs:

auto y = x + 3; // 'y' is deduced as 'int' because 'x' is 'int' and '3' is 'int'.

The deduction of y depends on the already inferred type of x (int). This creates an order of
inference, where types are resolved in a step-by-step manner.

2. Free Variables Are Problematic:

No Free Variables: If a variable used in an expression is free (not previously defined or deduced), then auto cannot work on that variable unless its type is explicitly annotated or deduced elsewhere. For example:

auto z = w + 2; // Error: 'w' is not declared, so 'z' cannot be deduced.

Since w is not declared or defined in the code, the type of z cannot be inferred. The inference
process must have complete information for all involved constructs—no missing types or free
variables.

Annotated or Inferred Variables: If free variables are annotated or inferred separately, type
deduction can proceed:

int w = 4;
// Now 'z' can be deduced as 'int', because 'w' is explicitly an 'int'.
auto z = w + 2;

So, in short: type inference (like auto) depends on the ordering of program constructs. The
process starts from constructs whose types are directly knowable (like literals or explicit
declarations), and it moves to more complex expressions that depend on already-inferred parts

of the program. Free variables, without proper annotations or separate inference, can halt this
process.

It seems that this forward style of type inference is just type checking like in
the last class. What's the difference?
Yes, type checking also assumes a proper ordering of statements, and it also propagates types forwardly. The key difference appears in the implementation of let bindings.

The essential difference between type checking and forward type propagation lies in their approach to verifying and assigning types within code, especially in the context of explicit versus implicit type information.

Type Checking

In the type-checking method, types are explicitly declared by the programmer. The primary
role of the method is to verify that these declared types match the actual types of expressions.
Here’s how it works:

●​ The type of the expression on the right side of the let binding (let_exp.exp_def) is
computed.
●​ This computed type (tp_var) is then compared to the type explicitly declared in the
code (let_exp.tp_var).
●​ If they match, the environment is updated with this type information, and the body of the
let expression is type-checked with this updated environment.
●​ If they don’t match, a TypeError is raised, indicating a mismatch between the declared
and actual types.

Thus, type checking is about validating that the types explicitly stated by the programmer
align with the types determined by analyzing expressions. It relies on an upfront declaration of
types and verifies correctness against those declarations.

Forward Type Propagation

In forward type propagation, no explicit type is declared for let bindings. Instead, the method
infers the type of the expression:

●​ The type of the let_exp.exp_def expression is computed, as in type checking.
●​ However, rather than comparing this inferred type to an explicit declaration, it directly adds this type to the environment for the let variable.
●​ This inferred type is then used when analyzing the body of the let expression, allowing
subsequent code to rely on this automatically deduced type.

Forward type propagation is thus about automatically inferring and propagating types without
requiring explicit type annotations. It computes types based on the values of expressions and
allows them to flow through the program.
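The contrast can be made concrete with a small, self-contained sketch in Python. The class and field names below (Num, Var, exp_def, exp_body, tp_var) mirror the visitor examples used in these notes, but this miniature is an illustration, not the course implementation:

```python
class Num:
    """Integer literal: its type is directly apparent."""
    def __init__(self, num):
        self.num = num
    def type_of(self, env):
        return int

class Var:
    """Variable reference: its type comes from the environment."""
    def __init__(self, identifier):
        self.identifier = identifier
    def type_of(self, env):
        return env[self.identifier]

class LetChecked:
    """let x : tp_var = exp_def in exp_body -- the type-checking style."""
    def __init__(self, identifier, tp_var, exp_def, exp_body):
        self.identifier, self.tp_var = identifier, tp_var
        self.exp_def, self.exp_body = exp_def, exp_body
    def type_of(self, env):
        tp_def = self.exp_def.type_of(env)
        if tp_def != self.tp_var:  # verify against the declared type
            raise TypeError(f"declared {self.tp_var}, found {tp_def}")
        return self.exp_body.type_of({**env, self.identifier: self.tp_var})

class LetInferred:
    """let x = exp_def in exp_body -- the forward-propagation style."""
    def __init__(self, identifier, exp_def, exp_body):
        self.identifier = identifier
        self.exp_def, self.exp_body = exp_def, exp_body
    def type_of(self, env):
        tp_def = self.exp_def.type_of(env)  # no declaration to compare with
        return self.exp_body.type_of({**env, self.identifier: tp_def})

# Checking validates a declared type; propagation just records the inferred one.
print(LetChecked('v', int, Num(1), Var('v')).type_of({}))  # <class 'int'>
print(LetInferred('v', Num(1), Var('v')).type_of({}))      # <class 'int'>
```

The only difference between the two type_of methods is the comparison against tp_var: checking verifies a programmer-supplied type, while propagation merely records the computed one.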

In Summary

●​ Type Checking: Validates that declared types match actual expression types, enforcing
correctness based on programmer-provided types.
●​ Forward Type Propagation: Deduces and assigns types based on expression values,
enabling implicit type information to propagate through the code.

Both are forms of static analysis, but type checking is more about verification, while forward
type propagation is about inference and automatic type deduction.

The SML/NJ approach to type inference, however, is more powerful than C++'s auto inference. What can it do that auto does not do?
The type inference system used in languages like SML and Haskell, known as Hindley-Milner
(HM), is indeed more powerful than the type propagation system behind C++'s auto. Here's
how Hindley-Milner provides more flexibility:

1. No Explicit Ordering Required:

In C++'s auto, type deduction happens locally and immediately—you must have enough
information at each point in the program to infer the type. Hindley-Milner, however, can infer
types globally, meaning it doesn't require the explicit ordering of constructs. In HM, types of
variables and expressions can be inferred in any order, and type constraints are solved
together across the entire program. Here's how it works:

●​ Simultaneous Inference: HM generates type constraints for all expressions and variables. It can infer types even if they depend on each other (e.g., functions and their parameters), postponing resolution until enough information is available to deduce all types at once.
●​ Example:

fun f(x) = x + 1
val y = f(true) (* Error: type mismatch *)

●​ Here, even though the type of f is not explicitly declared, HM can infer that f must take
an integer as a parameter because it's being used with + 1. The system also knows that
true is a boolean, so it can catch the type mismatch at compile time.

2. Type Generalization (Polymorphism):

One of the key features of Hindley-Milner is type generalization (or polymorphism), where the
type inference system can infer generic types (polymorphic types) for functions and
expressions. This allows it to infer more general types that can work with multiple types of data.

●​ Example of Polymorphism in SML:

fun identity x = x

Here, HM infers that identity is polymorphic—it can take any type as an argument and
return that same type. Its inferred type is:

identity : 'a -> 'a (* For any type 'a, it returns 'a *)

●​ This generality is much more powerful than C++'s auto, which only infers specific types
based on the context of each expression.

3. Postponing Resolution:

In HM-based systems, the inference process can postpone resolution until more information is
available. Here's why this is powerful:

●​ Unification: Hindley-Milner uses a process called unification, which solves type constraints across the entire program. If two expressions share an unknown type, the system introduces type variables (like 'a, 'b), and it can unify these type variables later based on how they are used in the program. This means the system can infer types even if not all the information is available upfront. For instance:

fun f g x = g x

The type of f is inferred as:


f : ('a -> 'b) -> 'a -> 'b

●​ This means f takes a function g that maps some type 'a to 'b, and an argument of type
'a, and returns a result of type 'b. Even though f doesn’t specify the exact types of g

and x, HM is able to infer a generic, polymorphic type that works for any function g and
any type of x.

4. Inference Across Functions:

Hindley-Milner can infer types across function definitions without requiring explicit type
annotations. This means that you can have mutually recursive functions or references to
functions defined later, and the system will still be able to infer types. Example:

fun even 0 = true
  | even n = odd (n - 1)
and odd 0 = false
  | odd n = even (n - 1)

●​ HM will infer that both even and odd work on integers, even though they refer to each
other. The type system doesn’t require you to define them in a strict order or annotate
them.

5. Handling Free Variables:

Hindley-Milner can infer types for free variables in functions by introducing type variables and
later resolving them. This makes it capable of handling more complex inference scenarios
where types are not immediately apparent. Example:

fun apply f x = f x

HM infers the type of apply as:

apply : ('a -> 'b) -> 'a -> 'b

Summary of Differences

Feature: Type Deduction Scope
●​ C++ (auto): local type deduction based on immediate context.
●​ Hindley-Milner (SML, Haskell): global type inference that works across the entire program.

Feature: Ordering of Constructs
●​ C++ (auto): requires explicit ordering of constructs (types must be evident at each point).
●​ Hindley-Milner: can infer types without explicit ordering of constructs (postpones resolution).

Feature: Polymorphism
●​ C++ (auto): does not handle polymorphism (inference is specific to each case).
●​ Hindley-Milner: supports polymorphic types (generalization).

Feature: Free Variables
●​ C++ (auto): cannot infer types for free variables without explicit annotations.
●​ Hindley-Milner: can infer types for free variables using type variables and unification.

Hindley-Milner is more powerful because it can infer types without needing immediate
resolution or explicit ordering of constructs. It can infer generic types through polymorphism
and resolve types globally across the program, even when the relationships between variables
and functions are complex. This gives it a broader range of inference capabilities compared to
the local and immediate nature of C++'s auto.

How is type inference implemented in SML/NJ then?


Type inference in SML/NJ is implemented via an algorithm called Hindley-Milner (HM) type
inference. This algorithm can be thought of as a two-stage process:

1.​ Constraint Generation (Inference Stage):


○​ In this stage, we traverse the program and infer type constraints based on the
structure of the code. The output of this stage is a set of constraints between type
variables and types.
○​ For example, if you have an expression like x + 1, you infer that x must be of
type Int because + operates on integers.
○​ Each expression generates constraints between type variables and types. These
constraints are relations that describe what the types of variables and
expressions should be.
○​ Example: When traversing a program, we encounter variables, function
applications, and operations. For each, we generate constraints:
●​ Variable: If we see a variable x, we don't know its type, so we assign it a
fresh type variable, say T_x.
●​ Function application: If we apply a function f to an argument x, we
generate the constraint that f's type must be of the form T_x -> T_res
(where T_x is the argument type, and T_res is the result type).
●​ Operations: For x + 1, we generate a constraint that x must be an Int
because + works on integers.
2.​ Unification (Solving Stage):
○​ After generating all the constraints, we use unification to solve these constraints
and deduce concrete types.
○​ The unification algorithm tries to make all types consistent by matching type
variables with specific types (like Int, Bool, or function types), or by unifying
two type variables with each other.

This separation between constraint generation and unification is a key feature of HM inference.


How does unification work?


In the unification stage, we take the set of constraints generated in the first stage and try to
solve them. This is done by finding substitutions that make all the constraints consistent. The
unification algorithm:

●​ Unifies type variables: If we encounter two type variables T1 and T2, we unify them by
making them the same type.
●​ Unifies types: If we see a type variable T1 and a specific type Int, we unify T1 with
Int.
●​ Handles polymorphism: When we generalize a function (i.e., infer a polymorphic type),
we ensure that the types can adapt to different input types without conflict.
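The bullets above can be made concrete with a minimal, substitution-based sketch of unification over simple type terms. This is only an illustration (the helper names unify_types, resolve, and is_var are ours, and the implementation developed later in this chapter takes a different, set-based route): type variables are strings like "'a", concrete types are plain names like "int", and function types are tuples of the form ("->", arg, res).

```python
def is_var(t):
    # Type variables are strings that start with a tick, e.g. "'a".
    return isinstance(t, str) and t.startswith("'")

def resolve(t, subst):
    # Follow bindings in the substitution until we reach an unbound
    # variable or a concrete type.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify_types(t0, t1, subst):
    t0, t1 = resolve(t0, subst), resolve(t1, subst)
    if t0 == t1:
        return subst                      # trivially consistent
    if is_var(t0):
        return {**subst, t0: t1}          # bind the type variable
    if is_var(t1):
        return {**subst, t1: t0}
    if isinstance(t0, tuple) and isinstance(t1, tuple) and len(t0) == len(t1):
        for a, b in zip(t0, t1):          # unify component-wise
            subst = unify_types(a, b, subst)
        return subst
    raise TypeError(f"Cannot unify {t0} and {t1}")

# id : 'a -> 'a applied to 42: the call site demands a type of the
# form int -> 'r, so unification forces both 'a and 'r to be int.
s = unify_types(("->", "'a", "'a"), ("->", "int", "'r"), {})
print(resolve("'a", s), resolve("'r", s))  # int int
```

Notice how unifying the two function types proceeds component-wise, exactly as the polymorphism bullet describes: the generic 'a -> 'a adapts to the concrete call without conflict.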

Can you show how it works for "let id = fn x => x in id 42"?

Step 1: Generate Constraints

●​ For fn x => x, we assign a fresh type variable to x, say T_x. Since the function just
returns x, the return type is also T_x. Therefore, the type of id is T_x -> T_x.
●​ For id 42, we generate the constraint that id is a function that takes an Int as input,
which gives us the constraint T_x = Int.

Step 2: Unification

●​ We have the constraint T_x = Int from the function application, so we unify T_x with
Int.
●​ This gives us the final inferred type for id: Int -> Int.

Recap: Two-Stage Inference

●​ Stage 1 (Constraint Generation): We traverse the program and infer relations between
types using type variables. Each operation or function application introduces constraints
that need to be satisfied.
●​ Stage 2 (Unification): We solve these constraints using unification, substituting type
variables with concrete types and generalizing where needed (e.g., for polymorphic
functions).

This separation makes the algorithm powerful because it can handle complex expressions and
polymorphism efficiently.


Can you implement constraint generation as a visitor in Python?


The key idea is that for each expression type (like variables, numbers, booleans, or compound
expressions like Add or And), you are generating pairs (constraints) that describe relationships
between types. See a full implementation below:

class CtrGenVisitor(Visitor):
    """
    Examples:
    >>> e = Let('v', Num(40), Let('w', Num(2), Add(Var('v'), Var('w'))))
    >>> ev = CtrGenVisitor()
    >>> sorted([str(ct) for ct in e.accept(ev, ev.fresh_type_var())])
    ["('TV_1', 'TV_2')", "('TV_2', 'TV_3')", "('v', <class 'int'>)", "('w', <class 'int'>)", "(<class 'int'>, 'TV_3')", "(<class 'int'>, 'v')", "(<class 'int'>, 'w')"]
    """

    def __init__(self):
        self.fresh_type_counter = 0

    def fresh_type_var(self):
        """
        Example:
        >>> ev = CtrGenVisitor()
        >>> [ev.fresh_type_var(), ev.fresh_type_var()]
        ['TV_1', 'TV_2']
        """
        self.fresh_type_counter += 1
        return f"TV_{self.fresh_type_counter}"

    def visit_var(self, exp, type_var):
        """
        Example:
        >>> e = Var('v')
        >>> ev = CtrGenVisitor()
        >>> e.accept(ev, ev.fresh_type_var())
        {('v', 'TV_1')}
        """
        return {(exp.identifier, type_var)}

    def visit_bln(self, exp, type_var):
        """
        Example:
        >>> e = Bln(True)
        >>> ev = CtrGenVisitor()
        >>> e.accept(ev, ev.fresh_type_var())
        {(<class 'bool'>, 'TV_1')}
        """
        return {(type(True), type_var)}

    def visit_num(self, exp, type_var):
        """
        Example:
        >>> e = Num(1)
        >>> ev = CtrGenVisitor()
        >>> e.accept(ev, ev.fresh_type_var())
        {(<class 'int'>, 'TV_1')}
        """
        return {(type(1), type_var)}

    def visit_and(self, exp, type_var):
        """
        Example:
        >>> e = And(Bln(False), Bln(True))
        >>> ev = CtrGenVisitor()
        >>> sorted([str(ct) for ct in e.accept(ev, ev.fresh_type_var())])
        ["(<class 'bool'>, 'TV_1')", "(<class 'bool'>, <class 'bool'>)"]

        In the above example, notice that we ended up getting a trivial
        constraint, e.g.: (<class 'bool'>, <class 'bool'>). That's alright:
        don't worry about these trivial constraints at this point. We can
        remove them from the set of constraints later on, when we try to
        solve them.
        """
        ctrs_left = exp.left.accept(self, type(True))
        ctrs_right = exp.right.accept(self, type(True))
        return ctrs_left | ctrs_right | {(type(True), type_var)}

    def visit_add(self, exp, type_var):
        """
        Example:
        >>> e = Add(Num(1), Num(2))
        >>> ev = CtrGenVisitor()
        >>> sorted([str(ct) for ct in e.accept(ev, ev.fresh_type_var())])
        ["(<class 'int'>, 'TV_1')", "(<class 'int'>, <class 'int'>)"]
        """
        ctrs_left = exp.left.accept(self, type(1))
        ctrs_right = exp.right.accept(self, type(1))
        return ctrs_left | ctrs_right | {(type(1), type_var)}

    def visit_lth(self, exp, type_var):
        """
        Example:
        >>> e = Lth(Num(1), Num(2))
        >>> ev = CtrGenVisitor()
        >>> sorted([str(ct) for ct in e.accept(ev, ev.fresh_type_var())])
        ["(<class 'bool'>, 'TV_1')", "(<class 'int'>, <class 'int'>)"]
        """
        ctrs_left = exp.left.accept(self, type(1))
        ctrs_right = exp.right.accept(self, type(1))
        return ctrs_left | ctrs_right | {(type(True), type_var)}

    def visit_let(self, exp, type_var):
        """
        Example:
        >>> e = Let('v', Num(42), Var('v'))
        >>> ev = CtrGenVisitor()
        >>> sorted([str(ct) for ct in e.accept(ev, ev.fresh_type_var())])
        ["('TV_1', 'TV_2')", "('v', 'TV_2')", "(<class 'int'>, 'v')"]
        """
        ctrs_def = exp.exp_def.accept(self, exp.identifier)
        fresh_type_var = self.fresh_type_var()
        ctrs_body = exp.exp_body.accept(self, fresh_type_var)
        return ctrs_def | ctrs_body | {(type_var, fresh_type_var)}

    def visit_ifThenElse(self, exp, type_var):
        """
        Example:
        >>> e = IfThenElse(Bln(True), Num(42), Num(30))
        >>> ev = CtrGenVisitor()
        >>> sorted([str(ct) for ct in e.accept(ev, ev.fresh_type_var())])
        ["('TV_1', 'TV_2')", "(<class 'bool'>, <class 'bool'>)", "(<class 'int'>, 'TV_2')"]
        """
        fresh_type_var = self.fresh_type_var()
        ctrs_e0 = exp.e0.accept(self, fresh_type_var)
        ctrs_e1 = exp.e1.accept(self, fresh_type_var)
        ctrs_cond = exp.cond.accept(self, type(True))
        return ctrs_e0 | ctrs_e1 | ctrs_cond | {(type_var, fresh_type_var)}

Here are a few points about the implementation of constraint generation:

1.​ Fresh Type Variables: You are generating fresh type variables using
fresh_type_var() whenever a new variable or function is encountered. This ensures
that different variables get different types unless constraints force them to be the same.
2.​ Handling Expressions: For each expression, you recursively visit its subexpressions
(e.g., for Add, you visit both the left and right-hand sides). You are imposing constraints
that specify both sides of an operation must have the same type, and you're doing this
for arithmetic (Add) and logical (And) operations, which aligns with HM.


3.​ Unification Through Pairs: The output of each visitor method is a set of pairs (or
constraints) of the form (type0, type1). These pairs capture relations like:
○​ A variable must have the same type as its use (e.g., in visit_var).
○​ Both sides of an addition must be int (in visit_add).
○​ Logical operations must involve bool types (in visit_and).
4.​ Let Binding: In the visit_let method, you are generating constraints for the definition
(exp_def) and ensuring that the type of the variable (exp.identifier) is consistent
with its use in the body. The fresh type variable for the body (fresh_type_var())
captures the idea that the result of the entire let expression could be a new type,
distinct from the type of the bound variable.
5.​ Conditionals: The visit_ifThenElse correctly ensures that the conditional
expression must have a bool type and that the types of the two branches must match.

Can you show the pairs that would be produced for "let val v = 3 * 7 in v + v end"?

Sure. Assuming multiplication generates the same constraints as addition, the visitor produces:
(int, 'v') from the definition of v; ('v', int) from each use of v; (int, 'TV_2') from the
addition in the body; ('TV_1', 'TV_2') linking the whole let expression to its body; plus the
trivial pair (int, int) from the literals.
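The pairs can also be computed with a small standalone sketch. The classes below are simplified stand-ins for the AST and the visitor developed in this chapter (the names are ours), and, since the toy language has no multiplication, 3 * 7 is approximated by 3 + 7, which yields the same shape of constraints:

```python
class Num:
    def __init__(self, num):
        self.num = num

    def accept(self, visitor, type_var):
        return visitor.visit_num(self, type_var)


class Var:
    def __init__(self, identifier):
        self.identifier = identifier

    def accept(self, visitor, type_var):
        return visitor.visit_var(self, type_var)


class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def accept(self, visitor, type_var):
        return visitor.visit_add(self, type_var)


class Let:
    def __init__(self, identifier, exp_def, exp_body):
        self.identifier = identifier
        self.exp_def = exp_def
        self.exp_body = exp_body

    def accept(self, visitor, type_var):
        return visitor.visit_let(self, type_var)


class CtrGen:
    def __init__(self):
        self.counter = 0

    def fresh_type_var(self):
        self.counter += 1
        return f"TV_{self.counter}"

    def visit_num(self, exp, type_var):
        return {(int, type_var)}

    def visit_var(self, exp, type_var):
        return {(exp.identifier, type_var)}

    def visit_add(self, exp, type_var):
        # Both operands must be int; the whole expression is int too.
        ctrs = exp.left.accept(self, int) | exp.right.accept(self, int)
        return ctrs | {(int, type_var)}

    def visit_let(self, exp, type_var):
        # The definition constrains the bound name; the body gets a fresh var.
        ctrs_def = exp.exp_def.accept(self, exp.identifier)
        fresh = self.fresh_type_var()
        ctrs_body = exp.exp_body.accept(self, fresh)
        return ctrs_def | ctrs_body | {(type_var, fresh)}


# let val v = 3 + 7 in v + v end:
e = Let('v', Add(Num(3), Num(7)), Add(Var('v'), Var('v')))
ev = CtrGen()
constraints = e.accept(ev, ev.fresh_type_var())
for ct in sorted(str(c) for c in constraints):
    print(ct)
```

Besides the trivial pair (int, int), the output contains ('TV_1', 'TV_2'), ('v', <class 'int'>), (<class 'int'>, 'TV_2'), and (<class 'int'>, 'v').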


How could we extend our implementation of HM to a language with functions and polymorphism?
There are several potential directions along which our implementation could be extended:
●​ Removing Trivial Constraints: Remember that trivial constraints (like (bool, bool))
will be removed later, which is a good idea. These don't add any meaningful information
to the type inference process.
●​ Handling Functions: Eventually, if you're working with more complex expressions like
function applications or higher-order functions, you'll want to extend this to handle those
as well. This involves generating function types and ensuring that the argument and
return types are consistent.
●​ Generalization: In HM type inference, generalization happens at let bindings, where
polymorphic types are inferred for functions. You could extend your implementation to
handle this by generalizing type variables in visit_let if needed.

Could you implement a unification procedure to unify the pairs of relations in the constraints?
We need to iterate through the list of constraints. For each pair (t0, t1), we attempt to unify
them by merging their sets:

def unify(constraints, sets):
    """
    This function unifies all the type variables in the list of constraints;
    thus, producing a set of unifiers.

    Example:
    >>> sets = unify([('a', type(1))], {})
    >>> integers = sets[type(1)] - {type(1)}
    >>> sorted(integers)
    ['a']

    >>> sets = unify([(type(1), 'b'), ('a', type(1))], {})
    >>> integers = sets[type(1)] - {type(1)}
    >>> sorted(integers)
    ['a', 'b']

    >>> sets = unify([(type(True), 'b'), ('a', type(1))], {})
    >>> booleans = sets[type(True)] - {type(True)}
    >>> sorted(booleans)
    ['b']

    >>> sets = unify([(type(True), 'b'), ('a', type(1))], {})
    >>> integers = sets[type(1)] - {type(1)}
    >>> sorted(integers)
    ['a']

    Notice that at this stage, we still allow sets with invalid types. For
    instance, the set associated with 'b' in the example below will contain
    four elements, namely: {<class 'bool'>, <class 'int'>, 'b', 'a'}:

    >>> sets = unify([(type(True), 'b'), ('a', type(1)), ('a', 'b')], {})
    >>> len(sets['b'])
    4
    """
    if not constraints:
        return sets
    else:
        t0, t1 = constraints[0]
        rest = constraints[1:]
        if t0 != t1:
            s0 = sets.setdefault(t0, set())
            s1 = sets.setdefault(t1, set())
            new_set = s0 | s1 | {t0, t1}
            for type_name in new_set:
                sets[type_name] = new_set
        return unify(rest, sets)

So, how does the code work?

●​ Set management: We maintain a dictionary where each key points to a set of type
variables, essentially grouping equivalent types.
●​ Union of sets: Whenever we encounter two distinct type variables, we combine their
associated sets, ensuring that the variables are unified.
●​ Recursion: The recursion is properly managed with the rest of the constraints being
passed for further unification.

As an example, consider solving the query unify([('c', 'd'), (int, 'b'), (bool, 'c'),
('a', int), ('a', 'b')], {}). The constraints are processed one by one, merging sets as
they go; in the end, the unifier contains two groups: {'a', 'b', int} and {'c', 'd', bool}.


This algorithm seems very inefficient. Surely, that's not how unification is
implemented in practice, right?
In practice we could use union-find to implement unification. The inefficiency in our
implementation of unification arises from the repeated merging of sets. In each recursive call,
when two elements t0 and t1 are found not to be equal, it merges the sets associated with
each element by creating a union of s0, s1, t0, and t1. Then, it updates every element in
this new set to refer to the merged set. This approach has two main problems:

1.​ Repeated Set Reassociation: Each time two sets are merged, every element in both
sets must be updated to point to the newly formed merged set. As the number of sets
grows, these updates become increasingly expensive, especially when the sets overlap
significantly. This can lead to quadratic or even worse time complexity, depending on the
number and structure of constraints.
2.​ Lack of Efficient Lookup and Merging: The implementation lacks an efficient way to
determine which set a particular element belongs to. It manually merges sets by iterating
through all members, which is inefficient for large sets or numerous constraints.


How the Union-Find Algorithm Improves Efficiency

The union-find (or disjoint-set) algorithm addresses these inefficiencies by providing a more
efficient way to manage and merge sets. Union-find is specifically designed to handle dynamic
connectivity problems, where the task is to quickly determine which elements are in the same
set and to efficiently merge sets when necessary. It has two main components: find and union,
each optimized with techniques that greatly reduce the time complexity.

1.​ Find with Path Compression: In union-find, each element initially points to itself as its
own "parent." When determining which set an element belongs to, the find operation
follows parent pointers until it reaches the root of the set. To improve efficiency, path
compression is used to make each element in the path point directly to the root. This
flattens the structure, so future lookups for elements in the same set become very fast.
2.​ Union with Union by Rank: When two sets need to be merged, the union operation
links the root of one set to the root of the other. Union by rank (or union by size)
ensures that the smaller tree is always attached under the root of the larger tree. This
keeps the overall structure balanced, minimizing the height of the trees.

These optimizations—path compression and union by rank—make union-find very efficient,


with nearly constant time complexity for both operations, specifically amortized O(α(n)), where
α is the inverse Ackermann function, which grows extremely slowly.
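As an illustration, here is a compact sketch of union-find with both optimizations; the class and method names are ours, not part of the chapter's code base:

```python
class UnionFind:
    """Sketch of union-find with path compression and union by rank."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, x):
        # Create a singleton set the first time we see x.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0
        # Path compression: make x point directly at its root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shorter tree under the taller one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


uf = UnionFind()
for t0, t1 in [('a', int), (int, 'b'), ('c', bool)]:
    uf.union(t0, t1)
print(uf.find('a') == uf.find('b'))  # True: 'a', 'b' and int share a root
print(uf.find('a') == uf.find('c'))  # False: the int and bool groups stay apart
```

With this structure, the unifier would call union once per constraint and find to locate the equivalence class of any type variable, instead of rewriting whole sets.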

Summary of Improvements

By using union-find, we avoid:

●​ Repeated full-set updates, since each element only needs to point to a root, not to
every other element in the set.
●​ Inefficient set merges, as union-find directly merges roots instead of handling all
individual elements in the sets.

Overall, union-find significantly speeds up the unification process by eliminating redundant work
and providing a more scalable way to manage dynamic set connections, making it well-suited
for unification tasks with potentially large numbers of constraints.

We need to associate each set produced after unification with a canonical type. This process is called canonicalization. Can you explain how it will happen?
The term canonicalization refers to the process of converting data into a canonical form, which
is a standardized or normalized representation. In the context of your unification algorithm,
canonicalization ensures that every type variable or set of types is mapped to a single, unique
representative—its "canonical" type.

Purpose of Canonicalization in Type Unification


When unifying type constraints, multiple variables might represent the same type, but initially,
they are distinct. After unification, we want to treat all equivalent types as a single entity, which
simplifies reasoning about them. This is where canonicalization comes into play: it selects a
single "canonical" type for each group of unified types.

Why is it Necessary?

During unification, several type variables or concrete types might end up in the same set. For
example, if we unify the type variable 'a' with int, then 'a' should now be considered
equivalent to int. The purpose of canonicalization is to:

1.​ Ensure consistency: Instead of having a set of equivalent types (e.g., {'a', 'b',
int}), we map all type variables in the set to a single canonical representative, such as
int.
2.​ Error detection: If a type set contains multiple concrete types (e.g., int and bool), this
indicates a type error (as a variable cannot be both an int and a bool).
Canonicalization helps detect and handle such cases by checking the size of the set of
types.
3.​ Simplify further processing: Once the canonical form is established, subsequent type
checks or inferences can simply refer to the canonical type instead of having to reason
about a set of potential types.

Why It's Called "Canonicalization"

The term canonical comes from mathematics and computer science, where it refers to a
standard, normalized, or simplified form of an object. In type systems:

●​ A canonical type is the "final" or "standard" type after unification, such as int, bool,
etc.
●​ Canonicalization is the process of reducing a set of equivalent types (or type variables)
to one standard representative type.

In your name_sets function, canonicalization works by:

●​ Removing non-concrete types (like type variables 'a', 'b') from the set.
●​ Ensuring the set contains only one concrete type (e.g., int, bool).
●​ If there's exactly one concrete type, it's chosen as the canonical representative.
●​ If the set contains multiple concrete types or no concrete type at all, a "Type error" is
raised, signaling ambiguity or failure in unification.

Example of Canonicalization

Suppose the unification process results in the set:


{'a', 'b', int}

Canonicalization will:

●​ Ignore the type variables 'a' and 'b' (since they are not concrete types).
●​ Recognize that the only concrete type left is int.
●​ Set 'a' and 'b' both to refer to int as their canonical type.

In this way, 'a' and 'b' are now treated as equivalent to int everywhere, which simplifies
further type reasoning.

Why is canonicalization important for unification?


Without canonicalization, the output of the unification process would be a collection of sets of
types, which is harder to reason about. By mapping all type variables to canonical types, we:

●​ Make type checking more efficient.
●​ Prevent inconsistent or ambiguous type inferences.
●​ Ensure that type variables always map to a single, well-defined type.

Thus, canonicalization is a crucial step in ensuring that the unified types are logically consistent
and usable in further stages of type inference.

Can you implement it in Python?

Here's an implementation:

import sys

def name_sets(sets):
    """
    This method replaces type sets with "canonical type names". A canonical
    type name is the name of a type set. For instance, the type set
    {'a', 'b', int} has the canonical name int.

    Notice that this method produces two types of error messages:

    * Polymorphic type: if any canonical type set is empty
    * Ambiguous type: if any canonical type set contains more than one element.

    In both cases, if any of these errors happen, the program should stop with
    the following error message: 'Type error'

    Example:
    >>> sets = name_sets({'a': {'a', 'b', type(1)}, 'b': {'a', 'b', type(1)}})
    >>> [sets['a'], sets['b']]
    [<class 'int'>, <class 'int'>]

    >>> sets = name_sets({'a': {'a', type(1)}, 'b': {'b', type(True)}})
    >>> [sets['a'], sets['b']]
    [<class 'int'>, <class 'bool'>]
    """
    def canonicalize(s):
        aux = set(s)
        for e in aux:
            if not isinstance(e, type):
                s.remove(e)
        if len(s) == 0 or len(s) > 1:
            sys.exit("Type error")

    for type_set in sets.values():
        canonicalize(type_set)
    return {v: list(e)[0] for v, e in sets.items()}

Here are a few takeaways from the implementation:

●​ Correctness of canonicalization: You check if the canonical type sets contain zero or
more than one type (which would be an ambiguous or polymorphic type), and if so, you
exit with a "Type error." This approach ensures type safety.
●​ Type extraction: The logic for removing non-class entries from the sets (if not
isinstance(e, type)) is a good way to ensure only concrete types remain in the
sets.
●​ Final dictionary: You correctly return a new dictionary where each type variable is
mapped to its canonical type. This is the expected output for a unification algorithm.

As an example, consider the name associated with each type set in the query below:

set_bool = {'d', 'c', bool}
set_int = {'a', 'b', int}
y = name_sets({'c': set_bool, 'd': set_bool, int: set_int, 'b': set_int,
               bool: set_bool, 'a': set_int})

Here, 'a', 'b', and int are all named int, while 'c', 'd', and bool are named bool.

Good! Now, as a final step, can you implement a type inference engine that
uses all your functions?

Sure, here it is:


def infer_types(expression):
    """
    This method maps all the program variables to type names.

    Example:
    >>> e = Let('v', Num(42), Var('v'))
    >>> type_names = infer_types(e)
    >>> type_names['v']
    <class 'int'>

    >>> e = Let('v', Num(1), Let('y', Var('v'), Var('y')))
    >>> type_names = infer_types(e)
    >>> [type_names['v'], type_names['y']]
    [<class 'int'>, <class 'int'>]

    >>> e0 = Let('v', Num(1), Let('y', Var('v'), Var('y')))
    >>> e1 = IfThenElse(Lth(e0, Num(2)), Bln(True), Bln(False))
    >>> e2 = Let('w', e1, And(Var('w'), Var('w')))
    >>> type_names = infer_types(e2)
    >>> [type_names['v'], type_names['w'], type_names['y']]
    [<class 'int'>, <class 'bool'>, <class 'int'>]
    """
    ev = CtrGenVisitor()
    constraints = list(expression.accept(ev, ev.fresh_type_var()))
    type_sets = unify(constraints, {})
    return name_sets(type_sets)

Unification is also an algorithm used in Prolog. Is that related to unification in the HM Type Inference algorithm?
Yes, unification in Prolog and in the Hindley-Milner (HM) algorithm are closely related! Both
refer to the process of finding a consistent assignment of variables that makes two expressions
identical. The key difference lies in the contexts and specific use cases of unification in Prolog
and the HM type inference algorithm.

Unification in Prolog

In Prolog, unification is the core operation that drives the logic programming language. It is the
process by which the interpreter matches terms (such as variables, atoms, or structures) when
attempting to satisfy logical queries. Prolog's unification:

●​ Matches terms: Variables can be bound to values (other variables, constants, or
structures) in such a way that the terms being unified become identical.
●​ Bidirectional: Unification in Prolog works symmetrically, meaning X = Y and Y = X are
equivalent.
●​ Occurs at runtime: Prolog uses unification during query resolution to determine whether
a fact or rule matches the given query.

For example, in Prolog:


?- X = foo(Y), foo(bar) = X.

Prolog will unify X with foo(Y) and foo(bar), resulting in the binding Y = bar.
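The query above can be replayed with a small Python sketch of Prolog-style term unification. The encoding is ours, not standard: strings that start with an uppercase letter stand for Prolog variables, and tuples stand for compound terms such as foo(Y).

```python
def walk(t, subst):
    # Chase variable bindings until an unbound variable or a non-variable term.
    while isinstance(t, str) and t[:1].isupper() and t in subst:
        t = subst[t]
    return t

def unify_terms(t0, t1, subst):
    t0, t1 = walk(t0, subst), walk(t1, subst)
    if t0 == t1:
        return subst
    if isinstance(t0, str) and t0[:1].isupper():
        return {**subst, t0: t1}          # bind the Prolog variable
    if isinstance(t1, str) and t1[:1].isupper():
        return {**subst, t1: t0}
    if isinstance(t0, tuple) and isinstance(t1, tuple) and len(t0) == len(t1):
        for a, b in zip(t0, t1):          # unify arguments pairwise
            subst = unify_terms(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                            # unification failure

# ?- X = foo(Y), foo(bar) = X.
subst = unify_terms('X', ('foo', 'Y'), {})
subst = unify_terms(('foo', 'bar'), 'X', subst)
print(walk('Y', subst))  # bar
```

Note how the same machinery serves both worlds: replace Prolog variables with type variables and compound terms with function types, and this becomes the unifier of a Hindley-Milner checker.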

Unification in the Hindley-Milner Algorithm

In the HM type inference algorithm, unification is used to ensure consistent types for
expressions in a typed functional language (e.g., ML, Haskell). It:

●​ Matches type variables with concrete types or other type variables to ensure type
consistency.
●​ Is directional in most cases, since unification proceeds as part of type checking.
●​ Occurs at compile-time: In the HM algorithm, unification is part of static type inference,
helping to deduce the types of expressions without explicit type annotations.

For example, if you have the expression f(x), where the type of f is inferred to be a -> b and
the type of x is a, unification ensures that the input type of f and the type of x are compatible by
solving constraints like a = int.

Commonalities

1.​ Variable Binding: Both in Prolog and HM, unification binds variables to values (in HM,
these are type variables and types).
2.​ Finding a Substitution: In both systems, unification finds a substitution (or solution) that
makes two terms (or types) identical.
3.​ Failure Handling: In both Prolog and HM, unification can fail if no consistent substitution
is found (e.g., trying to unify an int with a bool in HM, or an incompatible structure in
Prolog).

Differences

1.​ Purpose: In Prolog, unification is used for logical reasoning and query resolution, while
in HM, it is used for type inference to ensure type consistency.
2.​ Symmetry: Prolog's unification is symmetric, while HM unification is often directional
because types are inferred in a structured way (from expressions to types).
3.​ Application Domain: Prolog uses unification in the context of logic programming and
deductive reasoning, while HM unification is about static typing and type inference in
functional languages.

Relation Between the Two

Both forms of unification share the same theoretical basis: they are both about solving
equations between terms or types by finding a substitution that makes them identical. In that
sense, they are different applications of the same fundamental concept.


●​ Prolog uses unification for term matching in a logic programming context.
●​ Hindley-Milner uses unification for type checking in a statically typed functional
programming context.

Conclusion

Although unification in Prolog and in HM type inference serve different purposes and have
distinct characteristics, they are based on the same underlying principle of matching structures
through substitution. The term "unification" is thus used in both cases because both involve
solving the same type of problem—making two things equivalent by finding suitable variable
assignments.

Can you show me where I can find a real-world implementation of a type inference engine?
I'd recommend taking a look at "Type inference as constraint solving: how GHC's type
inference engine actually works" (https://simon.peytonjones.org/type-inference/) by Simon
Peyton Jones (one of the authors of the GHC compiler). The blog post provides a high-level
overview of GHC's type inference process.

Now that we've seen HM Type Inference, can you contrast it with C++11's auto?

Type Deduction with auto

In languages like C++ and Rust, the auto keyword (or similar mechanisms) allows the compiler
to deduce the type of a variable based on the expression assigned to it. This deduction
happens as the compiler processes the code in a single pass, propagating type information
forward based on the context where the variable is used. The important characteristics of this
process include:

●​ Deterministic: The compiler deduces the type without any backtracking. Once a type is
inferred, it is fixed.
●​ Context-sensitive: Type deduction relies on the immediate context of the variable, such
as the right-hand side of an assignment or the return type of a function.
●​ Ordered: Type deduction happens in a linear, top-to-bottom fashion, following the order
in which the code is written.

This process doesn't involve solving complex constraints like in the Hindley-Milner (HM)
system. Instead, the compiler simply propagates type information through the code, from the
point where the type is obvious (e.g., from a literal or function return type) to the places where it
needs to be inferred.

324
This material is freely available and must not be sold or used for commercial purposes.
For typos or suggestions, please contact Fernando Pereira at [email protected].

Type Propagation (as in C++'s auto)

1.​ Type Information Flows Forward: The type deduced for an expression (e.g., a function
call) is propagated to variables or further expressions, without the need to solve
bidirectional constraints like in HM or Prolog.
2.​ No Backtracking: Once the type is propagated, it remains fixed. There's no need to
revisit earlier parts of the code to adjust types based on future information, which is a
key distinction from unification-based systems.
3.​ Sequential Nature: The deduction happens in the order that the program constructs are
encountered, and types are propagated forward through the code in a single pass.
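As a rough illustration of these three points, here is a sketch in Python of forward-only deduction over a hypothetical tuple-based AST (these node names are illustrative and are not the interpreter developed in these notes). Each binding's type is fixed once, from its initializer, in a single recursive pass:

```python
def deduce(expr, env):
    """One forward pass: each binding's type is fixed once and never revisited."""
    kind = expr[0]
    if kind == 'num':                  # ('num', 42)
        return 'int'
    if kind == 'bln':                  # ('bln', True)
        return 'bool'
    if kind == 'var':                  # ('var', 'x'): the type must already be known
        return env[expr[1]]
    if kind == 'add':                  # ('add', e0, e1)
        if deduce(expr[1], env) == 'int' and deduce(expr[2], env) == 'int':
            return 'int'
        raise TypeError("'+' expects two ints")
    if kind == 'auto':                 # ('auto', name, init, body), like auto x = init;
        new_env = dict(env)
        new_env[expr[1]] = deduce(expr[2], env)  # deduced from the initializer alone
        return deduce(expr[3], new_env)
    raise ValueError(f"unknown node: {kind}")
```

Here, deduce(('auto', 'x', ('num', 1), ('add', ('var', 'x'), ('num', 2))), {}) returns 'int' without ever generating or solving a constraint system.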

Comparison to Hindley-Milner (HM) Unification

●​ Hindley-Milner: The HM system is more complex because it must handle type variables and their unification across potentially recursive and polymorphic structures. HM involves solving constraints generated from expressions and may require backtracking or solving type equations to find consistent type assignments.
●​ Type Deduction with auto: In contrast, auto-style type deduction is more
straightforward. It’s local to specific expressions and propagates known types forward,
without the need for complex unification or backtracking.

Why Call It "Type Propagation"?

Calling it type propagation captures the following aspects:

●​ Directional Flow: Types are deduced and propagated through the program as the
compiler encounters each construct.
●​ No Backtracking: The process doesn't involve backtracking or revisiting earlier points in
the code to revise types, as unification might.
●​ Simple Deduction: The types are deduced from well-defined rules and available
information at the point of assignment, rather than solving a system of constraints as in
HM.

Conclusion

Yes, you can indeed call this process type propagation, as it aptly describes how type
information is forwarded through the program constructs in a linear, deterministic manner. This
differentiates it from unification-based type inference systems like HM, where types are
resolved through a constraint-solving process that may involve backtracking or complex
interactions between type variables.


It seems that Robin Milner did much for computer science. Could you tell
me some of his contributions?
Yes, he did quite a lot! Robin Milner (1934–2010) was a British computer scientist who
profoundly influenced programming languages, concurrency theory, and formal methods. He
was recognized with the Turing Award in 1991 for his foundational contributions to computer
science.

Here is a summary of his key contributions:

1. Programming Languages and Type Systems

●​ ML (MetaLanguage): Milner developed ML, one of the first programming languages to feature a polymorphic type system, which was an early implementation of type inference. This made programming both safer and more expressive by detecting type errors at compile time.
●​ Hindley-Milner Type System: He helped formalize this influential type system, which
underpins the type inference mechanisms of functional programming languages like
Haskell, OCaml, and Scala.

2. Concurrency Theory

●​ Calculus of Communicating Systems (CCS): Milner created CCS to model and reason about concurrent systems. CCS provided a mathematical framework for understanding how independent processes interact, laying the foundation for much of modern concurrency theory.


●​ π-Calculus: Milner co-developed the π-calculus, an extension of CCS, which enabled modeling of systems with dynamic process creation and communication, making it a cornerstone of mobile and distributed computing.

3. Automated Theorem Proving

●​ LCF (Logic for Computable Functions): Milner worked on this early interactive
theorem prover, which helped formalize proofs in logic and computation. It also
introduced innovations like tactics and tacticals, which have influenced modern proof
assistants.

4. Semantics of Programming

●​ Milner's work emphasized the need for precise semantics in programming languages,
helping bridge the gap between theoretical models and practical programming.

5. Awards and Legacy

●​ In addition to the Turing Award, Milner received numerous honors, including election to
the Royal Society and the US National Academy of Sciences. His influence extends
across programming language design, software verification, and theoretical computer
science.

Robin Milner's work remains a cornerstone of computer science, influencing areas as diverse as
functional programming, distributed systems, and formal verification methods.


Anonymous Functions and Closures


Code: https://github.com/pronesto/DCC053_Public/tree/main/11_AnonymousFunctions

Have you ever heard of anonymous functions? Which programming
languages feature them?
Anonymous functions, also known as lambda functions, are a fundamental concept in
programming and are featured in many languages. Here’s a brief overview:

What are Anonymous Functions?

Anonymous functions are functions that do not have a name. They are often used for short-lived
operations, passed as arguments to other functions, or used in functional programming
contexts. They are useful for creating concise, inline functions and can be defined and used in a
single line of code.

Programming Languages Featuring Anonymous Functions

1.​ Functional Languages:
○​ Haskell: Uses lambda expressions to define anonymous functions: \x -> x + 1
○​ Lisp (and dialects like Scheme): Lambda expressions are central to Lisp: (lambda (x) (+ x 1))
2.​ Imperative Languages:
○​ Python: Supports lambda functions: add = lambda x, y: x + y
○​ JavaScript: Uses arrow functions: const add = (x, y) => x + y;
○​ Ruby: Provides lambda and proc objects: add = ->(x, y) { x + y }
3.​ Object-Oriented Languages:
○​ Java: Supports anonymous inner classes and, since Java 8, lambda expressions: (x, y) -> x + y
○​ C#: Uses lambda expressions: Func<int, int, int> add = (x, y) => x + y;
○​ Swift: Features closures, which are similar to anonymous functions: let add = { (x: Int, y: Int) -> Int in return x + y }
○​ Scala: Supports anonymous functions with a concise syntax: (x: Int, y: Int) => x + y
○​ Kotlin: Allows lambda expressions: val add = { x: Int, y: Int -> x + y }
4.​ Scripting Languages:


○​ Perl: Supports anonymous subroutines: my $add = sub { $_[0] + $_[1] };
5.​ Others:
○​ MATLAB: Anonymous functions are supported: add = @(x, y) x + y;

How do we use anonymous functions in SML/NJ?

In Standard ML of New Jersey (SML/NJ), anonymous functions are defined using the fn
keyword. Here are some examples to illustrate how anonymous functions work in SML/NJ:

1. Simple Anonymous Function

This example defines an anonymous function that adds two numbers:

val add = fn (x, y) => x + y;

Here, fn (x, y) => x + y is an anonymous function that takes two arguments and returns
their sum.

2. Using Anonymous Functions Inline

You can use anonymous functions directly in expressions without assigning them to variables:

val result = (fn x => x * x) 5; (* result is 25 *)

In this case, (fn x => x * x) defines an anonymous function that squares its argument, and
5 is the argument provided to this function.

3. Higher-Order Functions with Anonymous Functions

Anonymous functions are often used as arguments to other functions. For example, using
List.map to apply a function to each element of a list:

val nums = [1, 2, 3, 4];
val squared = List.map (fn x => x * x) nums; (* squared is [1, 4, 9, 16] *)

Here, fn x => x * x is an anonymous function used to square each element in the list nums.

4. Function as Return Value

Anonymous functions can also be returned from other functions. For instance:

fun makeAdder n = fn x => x + n;
val add5 = makeAdder 5;


val result = add5 10; (* result is 15 *)

In this example, makeAdder returns an anonymous function that adds a given number n to its
argument. add5 is an instance of this function with n set to 5.

5. Function with Multiple Parameters

In SML/NJ, every function takes in just one argument; however, functions can still receive
multiple parameters via tuples:

val multiply = fn (x, y) => x * y;
val product = multiply (3, 4); (* product is 12 *)

Here, fn (x, y) => x * y defines an anonymous function that multiplies two numbers.

6. Using Anonymous Functions in Pattern Matching

Anonymous functions can be used in pattern matching:

val print = fn x => case x of
1 => (fn _ => "One") ()
| 2 => (fn _ => "Two") ()
| _ => (fn _ => "Other") ();

In this case, different anonymous functions are used based on the pattern matched by the case
expression.

Some languages allow us to implement closures with these anonymous
functions. Do you know what a closure is?
A closure is a programming concept where a function retains access to its lexical environment,
even after that function has been executed outside its original context. This means that a
closure can remember and access variables from the scope in which it was created, even when
it is executed elsewhere.

Key Concepts of Closures

1.​ Function with Environment: A closure is a function that has access to variables from
its creation context, known as its environment. This environment includes any local
variables that were in scope when the function was created.
2.​ Encapsulation: Closures allow functions to encapsulate state. This means that a
closure can hold on to the state of variables across function calls, which can be
particularly useful for creating objects with private state.


3.​ Function and Data Together: A closure consists of the function itself and the
environment in which it was defined. This means that the function can operate on the
variables from its creation environment, even after that environment has gone out of
scope.
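These three ingredients can be seen in a short Python example (Python functions are lexically scoped, so the inner function keeps access to its defining environment):

```python
def make_counter():
    # 'count' is a local variable of make_counter. The nested function
    # captures it, so the state survives after make_counter returns.
    count = 0
    def increment():
        nonlocal count   # write to the captured variable, not a new local
        count += 1
        return count
    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2 -- the closure remembered the state between calls
```

Each call to make_counter produces an independent closure with its own captured count, which is exactly the "function plus environment" pairing described above.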

If I have a function like fun add x y = x + y and do val inc = add
1, then can I say that inc is a closure?
Yes, inc is a closure in this case. A closure is a function that captures variables from its outer
lexical scope and can access and modify them even after the outer function has returned. In
your example, inc captures the value 1 from the outer function add and can use it to increment
any number passed to it. Here's a breakdown of why inc is a closure:

●​ inc is a function defined within the scope of add.


●​ inc references the variable x from add.
●​ When inc is called, it uses the value of x that was captured at the time it was defined.
●​ Even if inc has returned, inc can still access and use the value of x.

This behavior is characteristic of closures, and it allows for creating functions that are more
flexible and reusable. Here's how we could use inc:

val result1 = inc 5; (* result1 is 6 *)
val result2 = inc 10; (* result2 is 11 *)

In this case, inc is a closure because it captures and retains the value of x from the
environment where it was defined. This behavior—where a function retains access to its
environment even after its outer function has returned—is the defining feature of a closure.

Let's try to describe the semantics of anonymous functions. We shall start
simple. Could you draft some simple semantic rules?
Below we have evaluation rules for our language of arithmetic and logic expressions augmented
with the evaluation of functions:


Here's a breakdown and assessment of each rule:

1.​ EAdd: This rule handles addition of two expressions. It says:
○​ If e0 evaluates to v0 and e1 evaluates to v1, then e0 + e1 evaluates to v0 + v1.
2.​ EAnF & EAnT: These two rules handle logical conjunction (and):
○​ EAnF: If e0 evaluates to false, then e0 and e1 evaluates to false,
regardless of e1. This is valid because false and anything is always
false.
○​ EAnT: If e0 evaluates to true and e1 evaluates to v1, then e0 and e1
evaluates to v1. This makes sense as well since true and v1 results in v1.
3.​ ELtT & ELtF: These rules handle less-than comparisons:
○​ ELtT: If e0 evaluates to v0, e1 evaluates to v1, and v0 < v1, then e0 < e1
evaluates to true. This reflects the logic of less-than comparisons.
○​ ELtF: Similarly, if v0 is not less than v1, then e0 < e1 evaluates to false.
4.​ EIfT & EIfF: These rules evaluate conditional expressions (if):
○​ EIfT: If the condition e0 evaluates to true, then the entire if expression
evaluates to e1's value, v. This is consistent with standard conditional evaluation.
○​ EIfF: If e0 evaluates to false, then the if expression evaluates to the result of
e2.
5.​ EVar: This rule handles variable lookup:


○​ If C[name] equals v, then the variable name evaluates to v. This rule is standard
for looking up values in the context C.
6.​ ELet: This rule handles let expressions:
○​ If e0 evaluates to v0, and e1 evaluates to v1 in the extended context C +
{name:v0}, then let name = e0 in e1 end evaluates to v1. This is how
we evaluate let expressions, where a new variable binding is introduced.
7.​ EFn: This rule defines the creation of an anonymous function:
○​ The rule says fn x => e evaluates to the function value fn(x, e): the function body e is stored along with its formal parameter x. Notice that this rule does not record the environment in which the function was defined.
8.​ EApp: This rule handles function application:
○​ If e0 evaluates to a function fn(x, e), and e1 evaluates to v1, then applying
e0 to e1 results in evaluating e in the context extended with the binding {x:v1}.
This rule correctly represents the evaluation of function applications, where the
function body is evaluated with the argument substituted for the parameter.
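As a worked instance of the last two rules, consider evaluating (fn x => x + 1) 41 in an empty context C:

```
(fn x => x + 1) 41
  fn x => x + 1            evaluates to  fn(x, x + 1)     (by EFn)
  41                       evaluates to  41
  x + 1 in C + {x: 41}     evaluates to  42               (by EAdd and EVar)
  hence (fn x => x + 1) 41 evaluates to  42               (by EApp)
```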

Now, we have three kinds of values: integers, booleans and functions. Is
that correct?
Yes, that makes perfect sense! In fact, this is a key insight about how introducing functions
changes the set of possible values in a programming language.

Before Functions:

In languages without first-class functions, values typically belong to a limited set of primitive
types, such as:

●​ Integers: Values like 1, 42, etc.


●​ Booleans: Values like true, false.

After Introducing Functions:

Now, in addition to integers and booleans, functions themselves become values. This is a
fundamental feature of languages that treat functions as first-class citizens. So, you have a third
category of values:

●​ Function Values: These are represented as closures (e.g., fn(x, x + 1)), which
consist of the function body along with any variables captured from the surrounding
environment (i.e., its lexical scope).

In operational semantics, once you introduce anonymous functions, you expand the notion of
what constitutes a "value." A function like fn(x, x+1) is a legitimate value, just like 1 or true.


It can be passed around as a parameter, stored in data structures, or returned as the result of
other functions.

This change is a hallmark of higher-order languages, where functions can be treated just like
any other value type.

Implications:

1.​ Functions as Values: In languages that support closures, functions are treated as
values that can be created dynamically, passed as arguments, returned from other
functions, and stored in variables.
2.​ Evaluation: In your operational semantics, this is captured by the rule EFn, where a
function expression evaluates to a closure value (e.g., fn(x, e)).
3.​ Extended Type System: The type system needs to recognize this new category of
values. For example, if you had a static type system, the set of types would now include
function types like int -> int in addition to primitive types like int or bool.
4.​ Higher-Order Programming: Now that functions are first-class values, you can write
higher-order functions, which are functions that take other functions as arguments or
return functions as results.

Example:

For instance, after defining val inc = fn(x) => x + 1, inc is now a value in the
program's environment. Just like you can refer to 1 or true, you can refer to inc and pass it
around, apply it to arguments, etc.

This idea fits perfectly with what you are noticing in the semantic rules: when functions are
first-class citizens, the set of values must naturally be extended to include function values.

Could you augment our interpreter of expressions by adding classes to
represent functions and the application of functions?
Sure. Here's the implementation:

import sys
from abc import ABC, abstractmethod

class Expression(ABC):
    @abstractmethod
    def accept(self, visitor, arg):
        pass

class Var(Expression):
    def __init__(self, identifier):
        self.identifier = identifier


    def accept(self, visitor, arg):
        return visitor.visit_var(self, arg)

class Num(Expression):
    def __init__(self, num):
        self.num = num

    def accept(self, visitor, arg):
        return visitor.visit_num(self, arg)

class Bln(Expression):
    def __init__(self, bln):
        self.bln = bln

    def accept(self, visitor, arg):
        return visitor.visit_bln(self, arg)

class BinaryExpression(Expression):
    def __init__(self, left, right):
        self.left = left
        self.right = right

class And(BinaryExpression):
    def accept(self, visitor, arg):
        return visitor.visit_and(self, arg)

class Add(BinaryExpression):
    def accept(self, visitor, arg):
        return visitor.visit_add(self, arg)

class Lth(BinaryExpression):
    def accept(self, visitor, arg):
        return visitor.visit_lth(self, arg)

class Let(Expression):
    def __init__(self, identifier, exp_def, exp_body):
        self.identifier = identifier
        self.exp_def = exp_def
        self.exp_body = exp_body

    def accept(self, visitor, arg):
        return visitor.visit_let(self, arg)

class IfThenElse(Expression):
    def __init__(self, cond, e0, e1):
        self.cond = cond
        self.e0 = e0
        self.e1 = e1

    def accept(self, visitor, arg):
        return visitor.visit_ifThenElse(self, arg)

class Fn(Expression):
    def __init__(self, formal, body):
        self.formal = formal


        self.body = body

    def accept(self, visitor, arg):
        return visitor.visit_fn(self, arg)

class App(Expression):
    def __init__(self, function, actual):
        self.function = function
        self.actual = actual

    def accept(self, visitor, arg):
        return visitor.visit_app(self, arg)

The only changes from the language of arithmetic expressions we had seen in the last class are
the addition of two new classes: Fn and App, which represent the implementation of anonymous
functions and the application of functions. The actual evaluation of these functions will happen
through a visitor.

Before we implement the evaluation of functions, we will need a way to
represent functions as values. How can we do it?
We can define a new class that embodies functions. Instances of this class can be returned
from expressions that evaluate to functions. Here's an implementation:

class Function:
    """
    This is the class that represents functions. This class lets us distinguish
    the three types that now exist in the language: numbers, booleans and
    functions. Notice that the evaluation of an expression can now be a
    function. For instance:

    Example:
    >>> f = Function('v', Add(Var('v'), Var('v')))
    >>> print(str(f))
    Fn(v)
    """

    def __init__(self, formal, body):
        self.formal = formal
        self.body = body

    def __str__(self):
        return f"Fn({self.formal})"

So, can you implement this VisitorEval that evaluates the language
augmented with functions?
Here's the implementation:

class VisitorEval:


"""
Pretty print an expression based on its type.

Example:
>>> f = Fn('v', Add(Var('v'), Var('v')))
>>> e = App(f, Add(Num(40), Num(2)))
>>> ev = VisitorEval()
>>> e.accept(ev, {})
84

>>> f = Fn('v', Add(Var('v'), Var('w')))


>>> e = Let('w', Num(3), App(f, Num(2)))
>>> ev = VisitorEval()
>>> e.accept(ev, {})
5

>>> e = Let('f', Fn('x', Add(Var('x'), Num(1))), App(Var('f'), Num(1)))


>>> ev = VisitorEval()
>>> e.accept(ev, {})
2

>>> e0 = Let('w', Num(3), App(Var('f'), Num(1)))


>>> e1 = Let('f', Fn('v', Add(Var('v'), Var('w'))), e0)
>>> e2 = Let('w', Num(2), e1)
>>> ev = VisitorEval()
>>> e2.accept(ev, {})
3
"""

def visit_var(self, var, env):


if var.identifier in env:
return env[var.identifier]
else:
sys.exit(f"Variavel inexistente {var.identifier}")

def visit_num(self, num, env):


return num.num

def visit_bln(self, exp, env):


return exp.bln

def visit_add(self, add, env):


return add.left.accept(self, env) + add.right.accept(self, env)

def visit_and(self, exp, env):


val_left = exp.left.accept(self, env)
if val_left:
return exp.right.accept(self, env)
return False

def visit_lth(self, exp, env):


val_left = exp.left.accept(self, env)
val_right = exp.right.accept(self, env)
return val_left < val_right


    def visit_ifThenElse(self, exp, env):
        cond = exp.cond.accept(self, env)
        if cond:
            return exp.e0.accept(self, env)
        else:
            return exp.e1.accept(self, env)

    def visit_let(self, let, env):
        e0_val = let.exp_def.accept(self, env)
        new_env = dict(env)
        new_env[let.identifier] = e0_val
        return let.exp_body.accept(self, new_env)

    def visit_fn(self, exp, env):
        return Function(exp.formal, exp.body)

    def visit_app(self, exp, env):
        fval = exp.function.accept(self, env)
        if not isinstance(fval, Function):
            sys.exit("Type error")
        pval = exp.actual.accept(self, env)
        new_env = dict(env)
        new_env[fval.formal] = pval
        return fval.body.accept(self, new_env)

The VisitorEval class is an evaluator for a language that includes basic arithmetic, boolean
logic, variable bindings, and anonymous functions. It works by traversing an Abstract Syntax
Tree (AST) that represents the program, evaluating each node and producing a result. The
evaluation is driven by the Visitor Pattern, where each node type in the AST has its own
corresponding visit method.

The class accepts two arguments in its visit methods:

1.​ exp: The current expression node being evaluated.


2.​ env: The environment, which is a dictionary mapping variable names to their values.

Key Operations in VisitorEval

Variable and Number Evaluation

●​ visit_var(self, var, env): Looks up a variable's value in the current environment (env). If the variable isn't found, it exits with an error.
●​ visit_num(self, num, env): Simply returns the numerical value of the node.

Boolean and Arithmetic Operations

●​ visit_add(self, add, env): Recursively evaluates both the left and right sides of
an addition and returns their sum.


●​ visit_and(self, exp, env): Implements short-circuit evaluation for boolean and. It evaluates the left operand first; if it's false, it skips evaluating the right operand and returns False.
●​ visit_lth(self, exp, env): Compares two values for the less-than relation after
evaluating both sides.

Control Flow: If-Then-Else

●​ visit_ifThenElse(self, exp, env): Evaluates a conditional expression (if e0 then e1 else e2). If the condition evaluates to true, it returns the result of e1; otherwise, it returns the result of e2.

Let Bindings

●​ visit_let(self, let, env): Evaluates a let expression, which introduces a new variable binding. It first evaluates the expression defining the variable, then adds the result to a new environment, and finally evaluates the body of the let in this updated environment.

Function Evaluation

●​ visit_fn(self, exp, env): When evaluating an anonymous function (fn x => body), it returns a Function object. This object encapsulates the function's formal parameter and body.
●​ visit_app(self, exp, env): Handles function application (f(x)). It evaluates the
function expression and ensures that it results in a Function object. Then, it evaluates
the argument (the actual parameter), binds the result to the function's formal parameter,
and finally evaluates the function body in this updated environment.

Example of Evaluation Flow:

1.​ Variables are evaluated by looking up their values in the environment (visit_var).
2.​ Addition is evaluated by recursively evaluating both operands (visit_add).
3.​ Anonymous functions are evaluated into Function objects, but are not immediately
executed (visit_fn).
4.​ Function application evaluates the function and argument, then executes the function
body with the argument bound to the formal parameter (visit_app).

Summary

●​ This class implements an interpreter for a language supporting basic arithmetic, boolean
logic, let bindings, and first-class anonymous functions.


●​ The environment (env) is used to manage variable bindings, and it is updated appropriately when entering new scopes, such as in let bindings and function applications.
●​ Function values are first-class and are treated just like other values in the environment.
When a function is applied, a new environment is created where the function parameter
is bound to the argument, and the function body is then evaluated in that environment.

What is the value of the program below, according to your interpreter?

let
  w = 2
in
  let
    f = fn v => v + w
  in
    let
      w = 3
    in
      f 1
    end
  end
end

To write this program in the Python representation used by your VisitorEval class, you need
to represent the nested let expressions, the function f, and the function application f 1.
Here's the Python representation of the program, followed by an explanation of how it works
with your visitor.

Program Structure (in Python)


# Expression:
# let w = 2 in let f = fn v => v + w in let w = 3 in f 1 end end end

# Step 1: The innermost expression is the application of function f to 1.
app_f_1 = App(Var('f'), Num(1))

# Step 2: Let expression with w = 3 and body app_f_1.
let_w_3 = Let('w', Num(3), app_f_1)

# Step 3: Define function f = fn v => v + w.
fn_f = Fn('v', Add(Var('v'), Var('w')))

# Step 4: Let expression with f = fn_f and body let_w_3.
let_f_fn = Let('f', fn_f, let_w_3)

# Step 5: Outer let expression with w = 2 and body let_f_fn.


program = Let('w', Num(2), let_f_fn)

# Initialize the visitor evaluator.
ev = VisitorEval()

# Evaluate the program.
result = program.accept(ev, {})

# Output the result.
print(result) # Expected output: 3

Explanation

1.​ Step 1: We create the expression f 1 using App(Var('f'), Num(1)), which represents the application of the function f to the argument 1.
2.​ Step 2: We represent let w = 3 in f 1 end using Let('w', Num(3),
app_f_1). This binds w to 3 in the scope of the expression f 1.
3.​ Step 3: We define the function f = fn v => v + w using Fn('v',
Add(Var('v'), Var('w'))). This is an anonymous function that takes one
argument v and returns the result of v + w.
4.​ Step 4: We represent let f = fn v => v + w in ... end using Let('f',
fn_f, let_w_3), which binds the function f to fn v => v + w in the scope of the
inner let w = 3.
5.​ Step 5: Finally, we wrap the entire expression in an outer let w = 2 in ... end
using Let('w', Num(2), let_f_fn).

Evaluation

When we evaluate the program using the visitor ev with an empty environment ({}):

●​ The outermost let w = 2 sets w to 2.


●​ The next let f = fn v => v + w defines the function f with w = 2 in its closure.
●​ The inner let w = 3 temporarily shadows w, but this does not affect the value of w
captured by the closure of f.
●​ When f 1 is applied, the function body v + w is evaluated with v = 1 and the w from
the closure, which is still 2. Therefore, the result is 1 + 2 = 3.

We were expecting the result of 3, but then we got 4. Why?

The reason you're getting the result 4 instead of 3 has to do with the scoping of variables,
specifically how w is handled in the function's closure.


Key Insight: Variable Shadowing and Closure Behavior


●​ Outer let w = 2: The variable w is initially bound to 2.
●​ f = fn v => v + w: When defining f, it captures the current value of w, which is 2 at
this point. So, f will use this w (from the outer scope) when it's called later. This is the
key part of the closure: it "closes over" the environment where w = 2.

However, in the next part:

●​ Inner let w = 3: This let introduces a new variable w, which shadows the outer w =
2 within the body of this inner let. So within this block, w is 3.

When you apply f 1, even though f was defined when w = 2, it is evaluated in the current
environment where w = 3 is bound. Since f contains a free variable w, this variable is looked
up in the environment where f is applied (not the environment where f was defined). This
means w = 3 is used, not w = 2 from the closure.

Explanation of the Evaluation:

●​ First, w = 2 is bound, and f is defined as fn v => v + w.


●​ Next, w = 3 is bound in the inner let.
●​ When you call f 1, it evaluates the expression v + w with v = 1 and the current value
of w = 3.
●​ This results in 1 + 3 = 4.

Thus, the result is 4, not 3.

Why?

In your implementation, it seems that variables are resolved dynamically, based on the
environment at the time of function application. This behavior is called dynamic scoping, where
variables are resolved based on the current context of execution, rather than lexical scoping
(where variables are resolved based on where the function was defined).

Conclusion:

The surprising result occurs because w is resolved dynamically at the time of function
application. In this case, the inner w = 3 shadows the outer w = 2, so the function uses the
inner w when calculating v + w, leading to 1 + 3 = 4.
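This behavior can be reproduced with a minimal, self-contained sketch of a dynamically scoped evaluator. The class names below mirror the ones used throughout these notes, but the details are illustrative and may differ from the actual repository code; the key point is that visit_app extends the calling environment rather than one captured at definition time:

```python
# A tiny dynamically scoped interpreter: visit_app extends the CALLING
# environment, so free variables in a function body see the bindings
# in effect at the call site.
class Num:
    def __init__(self, num): self.num = num
    def accept(self, v, env): return v.visit_num(self, env)

class Var:
    def __init__(self, name): self.name = name
    def accept(self, v, env): return v.visit_var(self, env)

class Add:
    def __init__(self, left, right): self.left, self.right = left, right
    def accept(self, v, env): return v.visit_add(self, env)

class Let:
    def __init__(self, name, exp_def, exp_body):
        self.name, self.exp_def, self.exp_body = name, exp_def, exp_body
    def accept(self, v, env): return v.visit_let(self, env)

class Fn:
    def __init__(self, formal, body): self.formal, self.body = formal, body
    def accept(self, v, env): return v.visit_fn(self, env)

class App:
    def __init__(self, function, actual):
        self.function, self.actual = function, actual
    def accept(self, v, env): return v.visit_app(self, env)

class Function:  # a function value WITHOUT a captured environment
    def __init__(self, formal, body): self.formal, self.body = formal, body

class VisitorDynEval:
    def visit_num(self, exp, env): return exp.num
    def visit_var(self, exp, env): return env[exp.name]
    def visit_add(self, exp, env):
        return exp.left.accept(self, env) + exp.right.accept(self, env)
    def visit_let(self, exp, env):
        new_env = dict(env)
        new_env[exp.name] = exp.exp_def.accept(self, env)
        return exp.exp_body.accept(self, new_env)
    def visit_fn(self, exp, env):
        return Function(exp.formal, exp.body)  # no environment captured!
    def visit_app(self, exp, env):
        fval = exp.function.accept(self, env)
        pval = exp.actual.accept(self, env)
        new_env = dict(env)            # dynamic scoping: the CALL-site env
        new_env[fval.formal] = pval
        return fval.body.accept(self, new_env)

# let w = 2 in let f = fn v => v + w in let w = 3 in f 1 end end end
prog = Let('w', Num(2),
           Let('f', Fn('v', Add(Var('v'), Var('w'))),
               Let('w', Num(3), App(Var('f'), Num(1)))))
print(prog.accept(VisitorDynEval(), {}))  # prints 4, not 3
```

Because the function value carries no environment of its own, the free variable w is resolved in the environment where f is applied, where w is bound to 3.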


So, our interpreter is actually implementing dynamic scoping. Not static scoping. What's dynamic scoping, and what's static scoping?
Dynamic scoping and static (or lexical) scoping are two different ways of resolving variable
names in programming languages. The distinction is important because it affects how functions
behave when they reference variables that are not defined within the function itself but are
accessible from their surrounding environment.

Dynamic Scoping

In dynamic scoping, variables are resolved based on the current execution context (the
environment at the time the function is called). When the function needs to access a variable, it
looks for the most recent definition of that variable in the call stack at runtime.

●​ How it works: When a function is called, the interpreter searches for variable bindings in
the current environment, and if not found, it continues to search in calling environments
further up the call stack. This means the function’s behavior can change based on where
and when it’s called.
●​ Example:

def f():
    return x + 1

def g():
    x = 10
    return f()

x = 5
print(g())  # If dynamically scoped, the value of x used in f would be 10

●​ Under dynamic scoping, when f() is called inside g(), x would refer to the value 10,
even though f() was defined when x = 5. The function takes the environment where it
is called.
●​ Languages that use dynamic scoping:
○​ Early versions of Lisp (before Scheme introduced lexical scoping).
○​ Some shells like bash (variables in functions are dynamically scoped).
○​ Some older languages like Emacs Lisp (although it now supports lexical scoping
as well).
●​ Dynamic scoping is rare in modern programming languages, as it can lead to
unpredictable and hard-to-debug code.

Static (Lexical) Scoping

In static scoping (also called lexical scoping), variables are resolved based on the
environment in which the function is defined, regardless of where the function is called. The


function carries with it a "snapshot" of the surrounding environment at the time it was defined
(this is typically implemented via closures).

●​ How it works: When a function is defined, the interpreter/compiler determines where each variable name refers to by looking at the surrounding scopes (the environment
where the function was defined). These references are then fixed, so even if the function
is called from a different scope, the variables it references come from where the function
was defined.
●​ Example:

def f():
    return x + 1

def g():
    x = 10
    return f()

x = 5
print(g())  # If statically scoped, the value of x used in f would be 5

●​ Under static scoping, f() uses the value of x that was in scope when f() was defined,
so it will return 5 + 1 = 6, regardless of the value of x inside g().
●​ Languages that use static (lexical) scoping:
○​ Python.
○​ JavaScript.
○​ Scheme and modern Lisp.
○​ ML and OCaml.
○​ C, C++, Java, and most modern languages.

Lexical scoping is more intuitive and predictable because the behavior of a function depends on
where it is defined, not where it is called.

Dynamic vs. Static Scoping: Key Differences

●​ Variable lookup in dynamic scoping: The function looks for variable bindings in the
current execution environment (where the function is called).
●​ Variable lookup in static (lexical) scoping: The function looks for variable bindings in
the environment where it was defined (the environment at the time the function was
created).

Why is static (lexical) scoping more common?


Static scoping (also called lexical scoping) is more common in modern programming languages
because it offers predictability, clarity, and easier debugging. Here’s why:


1.​ Predictability: With static scoping, you can always tell where a variable will come from
just by looking at where the function is defined. The behavior of functions doesn't change
depending on where or when they are called, making it easier to understand the code.
2.​ Clarity: Static scoping ensures that the meaning of variables is fixed at the time of
function definition. This prevents confusion, as you don’t need to trace through the
execution context or call stack to figure out what a variable refers to.
3.​ Easier Debugging: Since variable references are resolved in the environment where the
function was written, debugging is more straightforward. The variable bindings stay
consistent, reducing unexpected behavior or side effects from other parts of the
program.

For these reasons, languages like Python, JavaScript, and C use static scoping, making it the
preferred choice in most modern programming environments.

And how can we implement static scoping in our interpreter?


To implement static scoping, we need to ensure that each function or expression resolves
variables based on the environment in which the function was defined, not the one where it is
called. Here are the key ideas:

1.​ Lexical Environment Capture: When a function is defined, it should capture the
environment (variables and their bindings) that exists at that moment. This is often
referred to as a closure.
2.​ Environment Chain: Each function keeps a reference to the environment in which it
was created. When looking up a variable, we first check the local environment (within the
function) and then move outward to the environment where the function was defined,
and so on, following this chain until we find the variable.
3.​ Scope Determination at Compile Time: The compiler or interpreter determines the
scope of variables based on the structure of the program at the time the function is
written. This means variable resolution is based on the function’s lexical scope.
4.​ Consistent Binding: When the function is invoked later, it uses the environment
captured during its definition, ensuring that variables reference the correct values even if
new variables with the same name exist in the calling environment.

These principles ensure that variables inside a function are always resolved according to their
static scope, which is fixed based on where they were defined in the code.
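The interpreter in these notes represents environments as flat Python dictionaries that are copied on each binding, but the "environment chain" of item 2 can also be realized explicitly as linked frames. The sketch below is illustrative (the Env class is our own naming, not part of the course code):

```python
class Env:
    """A chain of scopes: each frame points to its enclosing frame."""
    def __init__(self, bindings, parent=None):
        self.bindings = bindings  # dict of local names
        self.parent = parent      # enclosing (defining) environment

    def lookup(self, name):
        # Search the local frame first, then walk outward along the chain
        # toward the environment where the function was defined.
        if name in self.bindings:
            return self.bindings[name]
        if self.parent is not None:
            return self.parent.lookup(name)
        raise KeyError(f"undefined variable: {name}")

# The defining environment of fn v => v + w, where w = 2:
def_env = Env({'w': 2})
# The frame created when the function is applied to 1:
call_frame = Env({'v': 1}, parent=def_env)
print(call_frame.lookup('v') + call_frame.lookup('w'))  # 1 + 2 = 3
```

Because the chain is rooted at the definition site, a later binding of w at the call site never enters the lookup path, which is exactly the static-scoping guarantee.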

Can you give me the semantic rules that implement static scoping?
See the new rules below for the evaluation of function definitions and applications:
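In the judgment notation used throughout these notes, C ⊢ e ⇒ v means that expression e evaluates to value v in context C. The original figure with the rules is not reproduced here; based on the description that follows, they can be reconstructed roughly as:

```latex
\frac{}{C \vdash \mathtt{fn}\;x \Rightarrow e
        \;\;\Rightarrow\;\; \mathit{fn}(x, e, C)}
\;\textsc{(EFn)}
\qquad
\frac{C \vdash e_0 \Rightarrow \mathit{fn}(x, e, C')
      \quad C \vdash e_1 \Rightarrow v_1
      \quad C' + \{x : v_1\} \vdash e \Rightarrow v}
     {C \vdash e_0\; e_1 \Rightarrow v}
\;\textsc{(EApp)}
```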
●​ In the EFn rule (for function definition), the function fn(x, e) captures the current
context C at the moment the function is defined. This ensures that the function
remembers the environment where it was created, which is crucial for static scoping.
●​ In the EApp rule (for function application), when the function is applied, the saved
context C' is used to evaluate the body of the function e, augmented with the parameter


binding {x:v1}. This shows that the function body is evaluated in the context in which
the function was defined, not where it was called.

Thus, the rules ensure that variables are looked up in the context of their definition (static
scoping), rather than in the calling context (which would be dynamic scoping).

Can you implement these rules as a new visitor?


We will have to change our original implementation in two ways. First, we must modify the
implementation of Function values, so that these values can carry the context where they were
defined:

class Function:
    def __init__(self, formal, body, env):
        self.formal = formal
        self.body = body
        self.env = env

    def __str__(self):
        return f"Fn({self.formal})"

Second, we need to modify two of the visit methods:

class VisitorEval:
    ...

    def visit_fn(self, exp, env):
        return Function(exp.formal, exp.body, env)

    def visit_app(self, exp, env):
        fval = exp.function.accept(self, env)
        if not isinstance(fval, Function):
            sys.exit("Type error")
        pval = exp.actual.accept(self, env)
        new_env = dict(fval.env)
        new_env[fval.formal] = pval
        return fval.body.accept(self, new_env)

With these interventions, we are addressing the two key changes required for implementing
static scoping:


1.​ Function Values Carry Their Definition Context:


○​ By modifying the Function class to carry env, you ensure that each function
value now remembers the environment (context) where it was defined. This is
crucial because static scoping means that variables are resolved based on where
they were defined, not where they are called.
2.​ Function Application Uses the Saved Context:
○​ In the visit_fn, you're correctly saving the current environment (env) when
creating the function.
○​ In the visit_app, instead of using the current env, you're retrieving the
environment (fval.env) that was saved when the function was defined. This
ensures that the function body is evaluated in the context where the function was
defined, which is the essence of static scoping. You also extend this saved
environment with the new binding for the function's parameter (fval.formal).

These changes will correctly implement static scoping in your interpreter.

Can you show some code that demonstrates the creation of lexical
closures on your current implementation?
Sure! Consider the program below:

e0 = Fn('x', Fn('y', Add(Var('x'), Var('y'))))
e1 = Let('f', e0, App(Var('f'), Num(2)))
e2 = Let('g', e1, App(Var('g'), Num(1)))
ev = VisitorEval()
print(e2.accept(ev, {}))

This program is equivalent to let g = let f = fn x => fn y => x + y in f 2 end in g 1 end. Thus, we are first creating a closure on the environment {x:2}, and then using this
closure to compute x + y on an environment {x:2, y:1}. The figure below shows how the lexical
closure is represented: a pointer to code and a pointer to a table with bindings. The figure
represents how closures are created and used in your language of expressions. Let's break
down the program and the visual representation:


Program Analysis

The program is composed of three expressions, e0, e1, and e2:

1.​ e0 = Fn('x', Fn('y', Add(Var('x'), Var('y'))))


○​ This defines a function e0 with parameter x. The body of e0 is another function
that takes a parameter y and returns the sum of x and y.
○​ This function is a closure because it "closes over" the variable x. Even though x
is not in the scope of the inner function directly, it will be available to the inner
function when e0 is applied.
2.​ e1 = Let('f', e0, App(Var('f'), Num(2)))
○​ Here, e1 binds f to the function e0 and then applies f to the number 2.
○​ This will result in an instance of the inner function with x bound to 2.
3.​ e2 = Let('g', e1, App(Var('g'), Num(1)))


○​ In e2, g is bound to the result of e1, and then g is applied to the number 1.
○​ This application will result in adding x = 2 (from the closure created by f) to y
= 1 (the argument to g).

Figure Explanation

The figure visualizes this sequence of expressions, focusing on how closures capture
environments:

1.​ Function and Closure Structure


○​ On the left side, you see a fn node with parameter y and body e, which
represents the inner function Fn('y', Add(Var('x'), Var('y'))) inside
e0.
○​ The closure on the right, with {x: 2}, represents the captured environment
when e0 is applied with x = 2. This closure allows the function to use the value
of x even though x is outside the immediate scope of y.
2.​ Dashed Lines
○​ The dashed arrows from app (application) nodes show how the variables link to
their environments.
○​ For instance, one dashed line points to {x: 2}, meaning that when f (bound to
e0) is applied to 2, a closure is created that binds x to 2.
3.​ Inner Function Body
○​ Within the red-shaded box, you see the inner add node with e0 and e1 as
children representing Var('x') and Var('y').
○​ This structure shows how the expression Add(Var('x'), Var('y')) is
evaluated. Here, x is resolved through the closure (it’s 2), and y is provided as
the argument to g (which becomes 1).

It seems that dynamic scoping is kind of bad. Why are there programming
languages that use it?
Dynamic scoping is less common today, but it was historically used in some languages because
of a few specific advantages:

1.​ Simplicity in Implementation:


○​ Dynamic scoping is easier to implement in interpreters because it doesn't require
carrying around a separate, saved environment for each function definition. The
interpreter simply uses the current environment at the time of the function call,
which can make the scoping rules simpler to reason about (from an interpreter
perspective).
2.​ Flexibility for Certain Use Cases:


○​ In certain cases, dynamic scoping can offer a form of flexibility. For example, it
allows functions to automatically access variables defined outside their local
scope, even if they were not available when the function was originally written.
This can sometimes simplify certain kinds of programs, especially in small
scripting languages where the overhead of managing scopes explicitly is
undesirable.

However, this flexibility often turns into a source of confusion because it becomes hard to
understand which variables a function will refer to when it's called, leading to errors that are
hard to track down.

Is there any modern programming language that uses dynamic scoping?


Yes, Bash uses dynamic scoping for its variables. When you define a variable in a Bash script,
the scope is determined dynamically, which means any function that runs can access and
modify variables from the calling function or from the global environment. Here's an example to
illustrate:

#!/bin/bash

x=5

foo() {
  echo $x
}

bar() {
  local x=10
  foo
}

bar

In this script:

●​ bar sets x to 10 (in its local scope).


●​ foo echoes x. Since Bash uses dynamic scoping, foo will print the value of x from
bar's scope, which is 10.

However, if you didn't declare x as local in bar, the assignment x=10 would overwrite the global variable itself, so foo would still print 10, and the new value would persist even after bar returns.

This dynamic nature of Bash variables is sometimes useful in shell scripting, but it can also lead
to subtle bugs when variables are inadvertently overwritten. That's why experienced Bash
programmers often explicitly use local variables to limit the scope where necessary.


Why Some Languages Still Use Dynamic Scoping

Languages like Bash (and historically Lisp) may use dynamic scoping for the following reasons:

1.​ Legacy: Older languages and scripting environments often started with dynamic scoping
simply because it was easier to implement and made sense for quick-and-dirty scripts.
Over time, even though static scoping became more common, some languages retained
dynamic scoping for backward compatibility.
2.​ Scripting Flexibility: Dynamic scoping is useful for small, simple scripts where it's
helpful to have easy access to the global state. For example, shell scripts frequently
manipulate environment variables dynamically, and dynamic scoping allows this to
happen with minimal boilerplate.
3.​ Niche Use Cases: Certain domain-specific languages or configurations may still favor
dynamic scoping when it fits the particular use case better (e.g., when it's important to
propagate context automatically).

In summary, while dynamic scoping is generally harder to reason about and can lead to
confusing bugs, it can be useful in limited contexts, especially in small scripting languages
where ease of use is prioritized over strict program structure.

Last thing: do you know who coined this term, "Closure"?


The term "closure" in the context of programming languages was coined by Peter Landin, a
British computer scientist known for his pioneering work in programming language theory. He
introduced the concept of closures in his seminal paper, The Mechanical Evaluation of
Expressions, published in 1964.

In that paper, Landin explored ways to model the evaluation of expressions using what he called
the "SECD machine" (an abstract machine he invented). Closures, in his formulation, were a
way to capture both a function and the environment in which it was defined, allowing functions
to be treated as first-class entities that could be passed around with their associated variables
intact.

Closures became a fundamental concept in functional programming and eventually in many


other programming languages, particularly as the ideas of lexical scope and functional
abstraction gained traction.

Who's this Landin guy? What else did he do for compilers and
programming languages?
Peter Landin was a British computer scientist and one of the foundational figures in
programming language theory, especially known for his work in functional programming and the
development of key theoretical concepts that are widely used today. He was born in 1930 and
became active in the field during the 1950s and 1960s, a formative period for computer science
as an academic discipline.


Landin's contributions were groundbreaking, and he is perhaps best remembered for:

1.​ Introducing Functional Programming Concepts: Landin was among the first to apply
formal mathematical logic to programming languages, especially by advocating for the
use of lambda calculus as a model for computation. This laid a theoretical foundation for
functional programming languages.
2.​ The SECD Machine: In The Mechanical Evaluation of Expressions (1964), Landin
described the SECD machine, a virtual machine that could interpret lambda calculus
expressions. This work provided a model for implementing functional languages,
influencing many later language designs.
3.​ Coining the Term "Closure": As you noted, Landin introduced the term "closure" to
describe a function along with the lexical environment it was created in. This concept
became fundamental for understanding and implementing lexical scoping in
programming languages.
4.​ ISWIM: He proposed the hypothetical language ISWIM (If You See What I Mean) as a
model to demonstrate his ideas. ISWIM introduced several ideas that have since
become commonplace, like nested functions, lexical scoping, and higher-order functions.
5.​ Advocating for Higher-Order Functions: Landin’s work highlighted the power of
higher-order functions, functions that can take other functions as arguments or return
them as results, an idea essential to modern functional programming languages like
Lisp, Scheme, Haskell, and many parts of JavaScript.

Landin was an academic at Queen Mary College, London, and he influenced generations of computer scientists, exchanging ideas with contemporaries such as John McCarthy (the inventor of Lisp) and others who pioneered functional programming. Despite his enormous impact, Landin himself was modest and was
known for his philosophical approach to computer science, often questioning the ethical and
societal implications of technology. His insights laid the groundwork for many later


developments, making him a key figure in shaping the languages and computational models we
use today.


Recursive Functions
Code: https://github.com/pronesto/DCC053_Public/tree/main/12_RecFun

In our last interpreter with anonymous functions, could we define recursive functions?
That implementation of the interpreter cannot handle recursive functions. The core reason for
this limitation lies in how the interpreter handles the environment when defining functions.

Why Recursion Doesn't Work in the Current Setup

To explain why, let's break down how function definitions and applications are evaluated:

1.​ Function Definition (visit_fn): When a function is defined (e.g., fn x => e), the
interpreter creates a Function object with the body e, the formal parameter x, and the
current environment at the time of the function's definition.

def visit_fn(self, exp, env):
    return Function(exp.formal, exp.body, env)

This environment (env) is stored inside the Function object, and the function is returned to be
used later.
2. Function Application (visit_app): When the function is applied, the interpreter:

●​ Evaluates the function expression to get the Function object.


●​ Evaluates the argument.
●​ Creates a new environment where the formal parameter of the function is bound to the
actual argument.
●​ Then it evaluates the body of the function in this new environment.

def visit_app(self, exp, env):
    fval = exp.function.accept(self, env)
    if not isinstance(fval, Function):
        sys.exit("Type error")
    pval = exp.actual.accept(self, env)
    new_env = dict(fval.env)
    new_env[fval.formal] = pval
    return fval.body.accept(self, new_env)

3. The problem: When defining a recursive function, the function needs to refer to itself by
name within its body (like f in f(x) = f(x-1) + 1). However, in the current
implementation, the function’s name is not added to the environment when the function


is first created. So, when you try to call the function recursively inside its body, the
function itself is not available in the environment.

Why This Fails for Recursive Functions

In a recursive function, you need the function to be accessible inside its own body. However,
since the function's environment is captured when it is defined, and at that point the function
itself is not yet in the environment, any recursive call will fail because the function is not bound
to its own name.

For example, consider the following function that attempts recursion:

let
    f = fn x => if x == 0 then 0 else f(x - 1)
in
    f(3)
end

When the interpreter evaluates this:

●​ It creates a function f that captures the current environment, but at the time f is defined,
f itself is not yet in the environment.
●​ When the function f tries to call itself inside its body (f(x - 1)), the interpreter will look
for f in the environment and won't find it, leading to an error.

How can we add recursive functions to our programming language?


There are different ways of doing that. Let's follow the SML/NJ way: we shall add a new
keyword to our language: fun. With this keyword, we can define recursive functions. Find below
some examples of recursive functions:


Why is it necessary to have a new keyword, fun, to implement recursive functions? Why can't we simply use fn as before?

The reason we introduce a new keyword, such as fun, to handle recursive functions is primarily
due to how functions are defined and evaluated in relation to their environment. Let’s
break down why we need this distinction:

1. Handling Self-Reference in Recursive Functions

When you define a function with fn in languages with static scoping (like Python or most
functional languages), the function captures its environment (the context in which it was
defined). However, this environment does not automatically include a reference to the function
itself.

For recursion, the function needs to refer to itself by name in its body. With the fn keyword (as
we implemented it originally), the function is just an anonymous value — it doesn’t have its own
name in the environment where it's defined. Therefore, recursive calls would be impossible
because the function wouldn’t know its own name.

2. How fun Works: Recursively Adding the Function to Its Own Environment

The fun keyword introduces a named function (e.g., fun f x => e), meaning that when we
define the function, we explicitly bind the function’s name (f) in the environment. This allows the
function to refer to itself within its body, enabling recursive calls.

●​ In the case of fun, the function f is immediately available in its own environment. This is
done through special handling in the interpreter or semantic rules, where f is bound to a
reference of the function itself inside the environment it captures. So, when the
function executes, it has access to its own name, allowing it to call itself recursively.

3. Why fn Alone Doesn’t Suffice

If we simply use fn (which creates anonymous functions without names), there’s no mechanism
to bind the function to a name in its own environment. The function doesn't know how to refer to
itself unless you pass it explicitly, which is not how most recursive functions are written.

●​ In contrast, with fun, the interpreter knows that the function must carry its own name
along with the body and the captured environment, ensuring that the function has access
to itself for recursion.

4. Why Introduce a Separate Keyword?


By introducing fun, you distinguish between anonymous functions (fn), which don’t need to
be recursive, and recursive named functions (fun), which need to carry a reference to their
own name. This distinction is necessary to avoid always binding the function to a name, which
might not be desired in the case of simple anonymous functions.

In Summary:

●​ fn: Defines an anonymous function that does not have a name in its own environment. It
cannot refer to itself unless explicitly passed as an argument.
●​ fun: Defines a named function where the function’s name is available in its own
environment, enabling recursion. The name binding is crucial for recursive functions,
which is why fun was introduced.

This distinction helps keep the language clear and allows the flexibility to define both
non-recursive and recursive functions effectively.

Can you show the evaluation rules of recursive functions?


With the fun keyword, we can use the following evaluation rules to implement recursion:
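The original figure with the rules is not reproduced here; following the breakdown below, they can be reconstructed roughly as (same C ⊢ e ⇒ v notation as before):

```latex
\frac{}{C \vdash \mathtt{fun}\;f\;x \Rightarrow e
        \;\;\Rightarrow\;\; \mathit{rec\_fn}(f, x, e, C)}
\;\textsc{(EFun)}

\frac{C \vdash e_0 \Rightarrow \mathit{rec\_fn}(f, x, e, C')
      \quad C \vdash e_1 \Rightarrow v_1
      \quad C' + \{x : v_1\} + \{f : \mathit{rec\_fn}(f, x, e, C')\}
            \vdash e \Rightarrow v}
     {C \vdash e_0\; e_1 \Rightarrow v}
\;\textsc{(ERec)}
```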

Let's break down these rules:

1. EFun - Declaration of Recursive Functions

This rule deals with the declaration of recursive functions using the fun keyword.

●​ Syntax: C ⊢ fun f x => e


●​ Semantic Action: This leads to the creation of a special function object called
rec_fn(f, x, e, C).
○​ f: the function's name.
○​ x: the formal parameter.


○​ e: the function body.


○​ C: the environment at the point of function declaration.

The key here is that the rec_fn object is designed to carry its own name along with its
definition. This self-referencing is crucial for enabling recursion, as it allows the function to call
itself by name within its body.

2. ERec - Applying Recursive Functions

This rule specifies how a recursive function is applied.

●​ Initial Condition: C ⊢ e0 ⇒ rec_fn(f, x, e, C')


○​ This condition checks if the expression e0 evaluates to a recursive function
object rec_fn.
●​ Parameter Evaluation: C ⊢ e1 ⇒ v1
○​ Evaluates the argument e1 to the value v1.
●​ Environment Augmentation: C' + {x:v1} + {f:rec_fn(f, x, e, C')}
○​ A new environment is created from C' (the environment captured at the
function's definition).
○​ It adds a binding for the formal parameter x to the actual parameter value v1.
○​ Crucially, it also adds a binding for the function's name f to the function itself.
This is what enables the recursion, as calls to f inside the function body can now
refer back to the function.

How It Works

The combination of these rules allows functions defined with fun to refer to themselves by their
own names. When the function f is called within its body, the environment already includes a
reference to f, thus enabling recursive calls.

●​ When defining the function, rec_fn captures the function's own definition along with its
environment, which includes a reference to itself.
●​ When applying the function, the environment in which the function body is evaluated
includes a reference to the function itself, hence supporting recursion.

This approach, inspired by languages like Standard ML, effectively addresses the limitations of
the earlier non-recursive function definitions by explicitly incorporating the function's own
reference into its scope at the point of definition and during its application.

So, now we need two different ways to represent functions. How can we
implement that in our Python interpreter?


We could use two Python classes, like so:

class Fn(Expression):
    """
    This class represents an anonymous function.

    >>> f = Fn('v', Add(Var('v'), Var('v')))
    >>> ev = VisitorEval()
    >>> print(f.accept(ev, {}))
    Fn(v)
    """

    def __init__(self, formal, body):
        self.formal = formal
        self.body = body

    def accept(self, visitor, arg):
        return visitor.visit_fn(self, arg)

class Fun(Fn):
    """
    This class represents a named function. Named functions can be invoked
    recursively.

    >>> f = Fun('f', 'v', Add(Var('v'), Var('v')))
    >>> ev = VisitorEval()
    >>> print(f.accept(ev, {}))
    Fun f(v)
    """

    def __init__(self, name, formal, body):
        super().__init__(formal, body)
        self.name = name

    def accept(self, visitor, arg):
        return visitor.visit_fun(self, arg)

Class Hierarchy for Functions (Fn and Fun)

●​ Anonymous Functions (Fn): Our Fn class represents anonymous functions. It captures the formal parameter and the body of the function and uses the accept method to follow the visitor pattern, allowing easy extension and modularity.
●​ Named Functions (Fun): Extending Fn with the Fun class is also a good idea. Named
functions need an extra name field to allow recursive calls, and by using inheritance, we
keep most of the functionality in Fn and only add the extra name behavior in Fun. This
avoids duplication of code and ensures a clean design.

Evaluation:


●​ Both classes are consistent with the design pattern you're using (the visitor pattern), and
they clearly distinguish between anonymous and named functions. Recursive function
handling with Fun becomes easier this way.

We now have four kinds of "values": Fn, Fun, Bln and Num. But what is
really a value?
In our language, based on the classes you've implemented (Fn, Fun, Num, Bln), a value is a
data structure that represents a final result that the interpreter will produce after evaluating an
expression. This means that any of these four classes (Fn, Fun, Num, Bln) can represent
values. So, every value is an expression; however, not every expression is a value.

Typical Examples of Values vs. Non-Values

Here's a breakdown of examples in your interpreter:

●​ Values:
○​ Num(5): A number, which is a value.
○​ Bln(True): A boolean value.
○​ Fn('x', Add(Var('x'), Num(2))): A function itself is a value.
○​ Fun('f', 'x', Add(Var('x'), Num(2))): A named function is also a
value.
●​ Non-Values (Expressions):
○​ Add(Num(3), Num(2)): An addition expression, which can be evaluated to
produce a value (Num(5)).
○​ Var('x'): A variable reference, which must be evaluated by looking it up in the
environment.
○​ App(Fn('x', Add(Var('x'), Num(2))), Num(5)): A function
application, which must be evaluated by applying the function to the argument.

How the Interpreter Works with Values and Expressions

In your interpreter:

●​ When you evaluate an expression, it will eventually reduce (or evaluate) to a value. For
example:
○​ Evaluating the expression Add(Num(2), Num(3)) results in the value Num(5).
○​ Evaluating the expression Fn('x', Add(Var('x'), Num(2))) results in the
function value Fn('x', Add(Var('x'), Num(2))).

The distinction is important because the evaluator (or interpreter) needs to distinguish
between an expression that needs further evaluation and a value that is already fully evaluated.


For instance, when you encounter a Var('x') expression, the interpreter knows it needs to
look up x in the environment to find its value, while encountering Num(5) means there is
nothing more to do — this is already a value.

Conclusion:

To summarize, in your interpreter, values are those things that do not require further evaluation
— like Fn, Fun, Num, and Bln. In contrast, expressions are things that may require further
evaluation, such as operations or function applications. This distinction ensures that your
interpreter knows when to stop evaluating and return a result.
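One way to make this distinction concrete is a small predicate that tests whether an AST node is a value. The classes below are minimal stand-ins for the ones in the interpreter (same names, simplified fields), so the sketch is self-contained:

```python
# Minimal stand-ins for the interpreter's AST classes:
class Expression: pass
class Num(Expression):
    def __init__(self, num): self.num = num
class Bln(Expression):
    def __init__(self, bln): self.bln = bln
class Fn(Expression):
    def __init__(self, formal, body): self.formal, self.body = formal, body
class Fun(Fn):
    def __init__(self, name, formal, body):
        super().__init__(formal, body)
        self.name = name
class Add(Expression):
    def __init__(self, left, right): self.left, self.right = left, right
class Var(Expression):
    def __init__(self, identifier): self.identifier = identifier

def is_value(exp):
    # A value needs no further evaluation: numbers, booleans and functions.
    # Fun is a subclass of Fn, so one isinstance check covers both.
    return isinstance(exp, (Num, Bln, Fn))

print(is_value(Num(5)))               # True
print(is_value(Add(Num(3), Num(2))))  # False: still needs evaluation
```

The helper `is_value` is illustrative; the interpreter itself never needs it explicitly, because evaluation simply stops once it produces one of these objects.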

Now that we have so many different values, type checking becomes even
more important. Can we define type checking rules for functions?
The figure below defines type checking rules for functions, assuming that the parameters are
annotated with user-given types:

In this figure, there are two rules labeled TFn and TApp. These are the type-checking rules for
functions in a statically typed language. Let’s go over each rule to understand them.

TFn Rule (Typing a Function Definition)

This rule helps us determine the type of a function definition. Let’s break down what it says:

1.​ Context Assumption for Parameters:


○​ In the rule, Σ (called the "typing context") represents all the information we know
about types in the current scope.
○​ When type-checking a function definition, we start by assuming that the function
parameter, say x, has a specific type T1. This is expressed as Σ, x : T1,
which means our context Σ now includes the fact that x has type T1.
2.​ Type of the Function Body:
○​ With the assumption that x has type T1, we check the type of the function body,
which is an expression e.


○​ If, under this assumption, the body e has type T2, then we can conclude that the
entire function has type T1 → T2.
3.​ Final Conclusion:
○​ The rule tells us that a function written as fn x => e has type T1 → T2,
meaning it takes an argument of type T1 and produces a result of type T2.

In simpler terms, the TFn rule is used to assign a function type to a function definition. If we
can show that the function body has type T2 when we assume the parameter has type T1, then
the whole function has type T1 → T2.

TApp Rule (Typing a Function Application)

The TApp rule describes how to type-check a function application, where a function is applied
to an argument.

1.​ Types of the Function and Argument:


○​ We have an expression e1 e2, which represents applying the function e1 to the
argument e2.
○​ For this to type-check, we need e1 to have a function type, specifically T1 → T2.
○​ We also need e2 to have the type T1, which matches the input type expected by
e1.
2.​ Conclusion of the Application:
○​ If e1 has type T1 → T2 and e2 has type T1, then the result of applying e1 to e2
(written as e1 e2) will have type T2.

So, the TApp rule is used to type-check function applications. It confirms that:

●​ The argument type matches the function's expected input type, and
●​ The resulting expression will have the function’s output type.

Putting It All Together

●​ TFn (Function Definition): This rule is used to assign a type to a function. We assume
the parameter has a certain type and verify that the body type-checks with this
assumption. If it does, we conclude that the function has a type from the parameter type
to the result type.
●​ TApp (Function Application): This rule is used to type-check the application of a
function to an argument. It ensures the argument’s type matches the function's input
type and that the result will have the function's output type.


These rules ensure that functions are used correctly in the language, with arguments and return
types that match as expected. They are essential for type safety, allowing the compiler to catch
errors where functions are used with the wrong types of arguments.
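The two rules can be sketched as small helper functions. This is a simplified model, not part of the interpreter built in this chapter: base types are the strings 'int'/'bool', a function type T1 → T2 is the tuple (t1, t2), and the typing context Σ is a plain dict from names to types.

```python
def type_fn(sigma, param, param_type, body_type_of):
    """TFn: if Sigma, x:T1 |- e : T2, then Sigma |- fn x => e : T1 -> T2."""
    new_sigma = dict(sigma)
    new_sigma[param] = param_type     # assume x : T1
    t2 = body_type_of(new_sigma)      # check the body under that assumption
    return (param_type, t2)           # conclude: T1 -> T2

def type_app(fun_type, arg_type):
    """TApp: if e1 : T1 -> T2 and e2 : T1, then e1 e2 : T2."""
    if not isinstance(fun_type, tuple):
        raise TypeError("application of a non-function")
    t1, t2 = fun_type
    if arg_type != t1:
        raise TypeError("argument type does not match the parameter type")
    return t2

# fn x => x + 1 has type int -> int; applying it to an int yields an int:
f_type = type_fn({}, 'x', 'int', lambda sigma: 'int')
print(f_type)                    # ('int', 'int')
print(type_app(f_type, 'int'))   # int
```

Notice how `type_fn` extends the context before checking the body, exactly as the premise Σ, x : T1 in the rule demands, while `type_app` only compares already-computed types.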

What about the TFun rule?

The TFun rule is designed to handle recursive functions, allowing a function to refer to itself
within its own body. This aspect distinguishes TFun from TFn, which is typically used for
non-recursive functions. Let’s go over how TFun supports recursion and why it's necessary.

How TFun Supports Recursive Functions

When defining a recursive function, we need to make sure that the function can refer to itself.
Without TFun, the function would not "know" its own type while we’re in the process of defining
it. The TFun rule allows us to solve this problem by adding the function itself to the typing
context.

Here’s how it works:

1.​ Context Assumption with the Function’s Name:


○​ In TFun, we expand the context Σ to include a type assumption for the function
itself, effectively allowing the function to call itself recursively.
○​ This means if we are defining a function named f with a type T1 → T2, we
assume within the body of f that f has type T1 → T2. This self-referential
assumption allows recursive calls to f to type-check correctly.
2.​ Typing the Body with Recursion:
○​ With this assumption in place, we can check that the body e of the function f is
valid.
○​ If e type-checks under the assumption that f has type T1 → T2 (and any other
variables in Σ), then we know that f can safely be called within its own body with
arguments of type T1.
3.​ Final Conclusion:
○​ If the body e type-checks to type T2, then the entire function definition f : T1 →
T2 is valid.

Example

Consider a recursive function that calculates the factorial of a number:

fun fact(n : int) : int = if n = 0 then 1 else n * fact(n - 1)

●​ In this case, the TFun rule allows fact to refer to itself within its own body.


●​ The rule lets us assume fact : int → int in the typing context while we check the
body of fact. This assumption is essential to type-check the recursive call fact(n -
1).

Summary

●​ TFun is necessary for recursive functions because it allows a function to assume its
own type in its body.
●​ This rule allows recursive calls within the function body to type-check correctly, ensuring
that recursive functions can be defined safely and consistently.
●​ Without TFun, recursive functions wouldn’t be possible in this type system, as they
would have no way of referring to themselves within their own definition.
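The self-referential assumption at the heart of TFun can also be sketched as a helper, under the same simplified model as before (function types as tuples, the context as a dict; the helper names are illustrative). The key move is adding the function's own declared type to the context before checking its body:

```python
def type_fun(sigma, name, param, param_type, ret_type, body_type_of):
    """TFun: check 'fun name param = body', assuming name : T1 -> T2 while
    checking the body, so recursive occurrences of name are well-typed."""
    fun_type = (param_type, ret_type)    # stands for T1 -> T2
    new_sigma = dict(sigma)
    new_sigma[name] = fun_type           # the self-referential assumption
    new_sigma[param] = param_type        # the usual parameter assumption
    body_type = body_type_of(new_sigma)  # type the body under both
    if body_type != ret_type:
        raise TypeError("body type does not match the declared return type")
    return fun_type

# For fact : int -> int, a recursive call in the body looks up 'fact' in the
# context; thanks to the TFun assumption, that lookup succeeds:
t = type_fun({}, 'fact', 'n', 'int', 'int',
             lambda sigma: sigma['fact'][1])  # body type = fact's return type
print(t)  # ('int', 'int')
```

Without the `new_sigma[name] = fun_type` line, the lookup of `fact` inside the body would fail, which is precisely why TFn alone cannot type recursive definitions.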

We will need to represent Fun and Fn values in our interpreter (the


VisitorEval). How can we do it?
We can use, again, two Python classes. Something like:

class Function:
    """
    This is the class that represents functions. This class lets us
    distinguish the three types that now exist in the language: numbers,
    booleans and functions. Notice that the evaluation of an expression can
    now be a function. For instance:

    >>> f = Function('v', Add(Var('v'), Var('v')), {})
    >>> print(str(f))
    Fn(v)
    """
    def __init__(self, formal, body, env):
        self.formal = formal
        self.body = body
        self.env = env

    def __str__(self):
        return f"Fn({self.formal})"

class RecFunction(Function):
    """
    This is the class that represents named functions. The key difference
    between named and anonymous functions is exactly the "name" :)

    >>> f = RecFunction('f', 'v', Add(Var('v'), Var('v')), {})
    >>> print(str(f))
    Fun f(v)
    """
    def __init__(self, name, formal, body, env):
        super().__init__(formal, body, env)
        self.name = name

    def __str__(self):
        return f"Fun {self.name}({self.formal})"

Why do we need to pass the environment (env) to these objects?


The key reason you need to store the environment in your Function and RecFunction
classes is to correctly implement static scoping (also known as lexical scoping). In static
scoping, when a function is created, it "captures" the environment where it was defined, and this
environment will be used later when the function is called.

Why Store the Environment?

1.​ Capturing the Lexical Context:


○​ When a function is defined, it exists within a specific environment where variables
are bound to certain values. This environment represents the lexical context or
the surrounding scope in which the function was created.
○​ If a function refers to variables from its enclosing scope, it must remember the
bindings for those variables at the time of its definition, even if the function is later
called in a different environment.
2.​ Static (Lexical) Scoping:
○​ Static scoping means that when you evaluate a variable inside a function, you
look for the value of that variable in the environment where the function was
defined, not in the environment where the function is called.
○​ By storing the environment in the function object, you're preserving the context in
which the function was defined so that you can correctly resolve variable bindings
when the function is executed later.
3.​ Avoiding Variable Shadowing:
○​ Consider a scenario where a variable is defined both inside the function and
outside it. Storing the environment ensures that the function always refers to the
correct bindings, avoiding potential issues where variables in different scopes
might "shadow" each other.

Example

Let’s clarify this with an example:

outer_env = {'x': 5}

# Define a function that captures this environment
f = Function('y', Add(Var('x'), Var('y')), outer_env)

# Later, we call f in a different environment
call_env = {'x': 10, 'y': 2}


Without storing the original environment (outer_env), you might end up looking for x in the
call_env, leading to a result based on x = 10. But, since the function f was defined in an
environment where x = 5, the correct behavior under static scoping is to use the x from the
definition environment (outer_env).

When you store the environment in the function object, it ensures that when f is evaluated, it
looks up x in the stored environment, not the current environment where the function is called.

Handling Named and Anonymous Functions

●​ Anonymous Functions (Fn): For anonymous functions, storing the environment allows
the function to capture and reference variables in its lexical context.
●​ Recursive Functions (Fun): In the case of recursive functions, the function can
reference itself by name inside its body. Storing the environment is critical to ensure the
recursive call happens in the correct context, even if the function is passed around and
invoked in different scopes.

Example with Recursion:

Consider this recursive function:

outer_env = {}

# Define a recursive function in the outer environment
rec_f = RecFunction('f', 'n',
    If(Eq(Var('n'), Num(0)), Num(1),
       Mul(Var('n'), App(Var('f'), Sub(Var('n'), Num(1))))),
    outer_env)

# Later, we call rec_f
call_env = {'n': 5}

●​ When the function rec_f calls itself recursively (App(Var('f'), Sub(Var('n'),


Num(1)))), it needs to find the binding for f. Thanks to static scoping and storing the
original environment, the function will correctly resolve f to itself, even if it is passed
around or called in a different scope.

Summary

Storing the environment in Function and RecFunction is essential for:

●​ Lexical Scoping: Ensuring that variables are resolved based on the environment where
the function was defined.
●​ Handling Closures: A function can close over variables from its definition environment
and carry them along even when called elsewhere.
●​ Enabling Recursion: Recursive functions must reference themselves, and storing the
environment helps ensure that recursive calls work correctly.


By storing the environment, you ensure that the interpreter behaves consistently with static
scoping rules, where a function’s behavior depends on the lexical context in which it was
defined, not the context in which it is called.
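Python's own closures obey the same static-scoping rule, which gives a quick sanity check of the design: the body of a returned function sees the bindings from its definition environment, not from the caller's scope.

```python
def define_f():
    x = 5                    # binding at the definition site
    return lambda y: x + y   # the lambda captures x = 5

f = define_f()

def call_f():
    x = 10                   # a different x at the call site
    return f(2)              # still resolves x to the captured 5

print(call_f())  # 7, not 12: x is resolved where f was defined
```

This is exactly the behavior the stored `env` field gives our interpreted functions.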

Let's then implement our evaluation rules in our VisitorEval class. Can
you do it?
Sure! Find below an implementation:

class VisitorEval:
    """
    Evaluates an expression with recursive functions.
    """
    def visit_var(self, var, env):
        if var.identifier in env:
            return env[var.identifier]
        else:
            sys.exit(f"Undefined variable {var.identifier}")

    def visit_num(self, num, env):
        return num.num

    def visit_bln(self, exp, env):
        return exp.bln

    def visit_add(self, add, env):
        return add.left.accept(self, env) + add.right.accept(self, env)

    def visit_and(self, exp, env):
        val_left = exp.left.accept(self, env)
        if val_left:
            return exp.right.accept(self, env)
        return False

    def visit_lth(self, exp, env):
        val_left = exp.left.accept(self, env)
        val_right = exp.right.accept(self, env)
        return val_left < val_right

    def visit_ifThenElse(self, exp, env):
        cond = exp.cond.accept(self, env)
        if cond:
            return exp.e0.accept(self, env)
        else:
            return exp.e1.accept(self, env)

    def visit_let(self, let, env):
        e0_val = let.exp_def.accept(self, env)
        new_env = dict(env)
        new_env[let.identifier] = e0_val
        return let.exp_body.accept(self, new_env)

    def visit_fn(self, exp, env):
        return Function(exp.formal, exp.body, env)

    def visit_fun(self, exp, env):
        return RecFunction(exp.name, exp.formal, exp.body, env)

    def visit_app(self, exp, env):
        """
        The application of a function to an actual parameter must contain two
        cases:
        1. An anonymous function is applied: (fn x => x + 1) 2
        2. A named function is applied: f 2, where f is fun f a = a + a
        The only difference between these two cases is that in the second we
        must augment the environment with the name of the named function.

        Example:
        >>> f = Fun('f', 'v', Add(Var('v'), Var('v')))
        >>> e0 = Let('f', f, App(Var('f'), Num(2)))
        >>> ev = VisitorEval()
        >>> e0.accept(ev, {})
        4
        """
        fval = exp.function.accept(self, env)
        if isinstance(fval, Function):
            pval = exp.actual.accept(self, env)
            new_env = dict(fval.env)
            new_env[fval.formal] = pval
            if isinstance(fval, RecFunction):
                new_env[fval.name] = fval
            return fval.body.accept(self, new_env)
        else:
            sys.exit("Type error")

Here's an explanation for each of the visit_fn, visit_fun, and visit_app methods, which
together handle the evaluation of anonymous functions (fn), recursive functions (fun), and
function application in the interpreter:

1. visit_fn(self, exp, env):

This method evaluates an anonymous function expression. When we encounter a fn


expression (i.e., a function without a name), we need to create a Function object that
encapsulates:

●​ The formal parameter (exp.formal).


●​ The function body (exp.body).
●​ The current environment (env) where the function was defined.

Storing the environment in the function is crucial because it enables lexical scoping. If the
function refers to any variables outside of its body, it will be able to resolve them using the
environment in which it was created.


Key points:

●​ Input: An anonymous function (fn x => x + 1).


●​ Output: A Function object, which stores the function's formal parameter, body, and the
environment.

Example:

fn x => x + 1

In this case, x is the formal parameter, x + 1 is the body, and the function is closed over the
current environment.

2. visit_fun(self, exp, env):

This method evaluates a recursive function expression. In contrast to visit_fn, this method
handles fun expressions, which are named functions that can call themselves recursively.

When evaluating a fun expression, we create a RecFunction object, which extends the
Function class and additionally stores the name of the function (exp.name). Like visit_fn,
this method also captures the current environment to enable lexical scoping. Storing the
function's name is essential for supporting recursion: within the function body, recursive calls
refer to this name.

Key points:

●​ Input: A named recursive function (fun f x => x + f(x - 1)).


●​ Output: A RecFunction object, which stores the function's name, formal parameter,
body, and environment.

Example:

fun f x => if x == 0 then 1 else x * f(x - 1)

Here, f is the name, x is the formal parameter, and the body is a recursive factorial
computation.

3. visit_app(self, exp, env):

This method handles function application—that is, applying a function to an actual parameter.
It covers two cases:

1.​ Applying an anonymous function.


2.​ Applying a named (recursive) function.


The general process is the same for both cases, but with one additional step for recursive
functions.

●​ First, we evaluate the function expression (exp.function.accept(self, env)) to


get its value (fval). This value will be either a Function or RecFunction object,
depending on whether the function is anonymous or recursive.
●​ Then, we evaluate the actual argument (exp.actual.accept(self, env)) to get its
value (pval).
●​ We create a new environment (new_env) based on the environment where the
function was defined (fval.env). In this new environment, we bind the formal
parameter of the function to the actual parameter (new_env[fval.formal] = pval).

If the function is a recursive function (i.e., a RecFunction), we must also bind the function’s
name to the function itself in the new environment (new_env[fval.name] = fval). This step
allows the function to call itself recursively.

Finally, we evaluate the function body in the newly constructed environment


(fval.body.accept(self, new_env)).

If the function value (fval) is not a Function or RecFunction, we raise a type error.

Key points:

●​ Input: A function application (e.g., f(2)).


●​ Output: The result of applying the function to the actual parameter.

Example 1: Anonymous function application

(fn x => x + 1) 2

1.​ We evaluate the anonymous function (fn x => x + 1), which gives us a Function
object.
2.​ We evaluate the actual parameter 2.
3.​ We create a new environment where x is bound to 2, and then evaluate the function
body x + 1, which results in 3.

Example 2: Recursive function application

fun f x => if x == 0 then 1 else x * f(x - 1)


f(2)

1.​ We evaluate the recursive function (fun f x => ...), which gives us a
RecFunction object.


2.​ We evaluate the actual parameter 2.


3.​ We create a new environment where x is bound to 2 and f is bound to the recursive
function itself.
4.​ We evaluate the function body, which will involve a recursive call to f(x - 1).

Summary:

●​ The visit_fn method evaluates an anonymous function, creating a Function object


that stores the formal parameter, function body, and the environment in which the
function was defined. This allows the function to access variables in its defining scope
(lexical scoping).
●​ The visit_fun method evaluates a recursive function, creating a RecFunction
object, which is similar to Function but also includes the function's name. This allows
the function to call itself recursively, with the current environment captured for static
scoping.
●​ The visit_app method handles function application, where a function is applied to an
actual parameter. It constructs a new environment where the function's formal parameter
is bound to the actual parameter. For recursive functions, the environment is augmented
with a binding from the function's name to the function itself, enabling recursion. The
function body is then evaluated in this new environment.

This approach ensures that the interpreter correctly implements static scoping and supports
both anonymous and recursive functions, maintaining the proper scope when functions are
called.

So, can we still create closures with fun declarations?


Sure you can! Consider this program below, for instance:

App(App(Let("g", Fun("f", "x", Fn("y", Add(Var("y"), Var("x")))),
         Var("g")), Num(3)), Num(4))

It simulates the program (fun f x = fn y => y + x) 3 4, which, by the way, is not valid SML/NJ
syntax. You will create a closure (fn y => y + x, {x: 3}). The figure below shows this closure.
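The closure in this example can be mimicked directly with a (body, env) pair, mirroring the Function objects above. This is a plain-Python sketch, not part of the interpreter: applying the outer function to 3 yields a closure whose environment binds x to 3; applying that closure to 4 then evaluates y + x in the extended environment.

```python
def apply_outer(x_val):
    env = {'x': x_val}                 # environment captured at creation time
    body = lambda e: e['y'] + e['x']   # stands for the body y + x
    return (body, env)                 # the closure (fn y => y + x, {x: 3})

def apply_closure(closure, y_val):
    body, env = closure
    new_env = dict(env)
    new_env['y'] = y_val               # bind the formal parameter
    return body(new_env)

clo = apply_outer(3)
print(apply_closure(clo, 4))  # 7
```

The pair makes visible what the interpreter's `Function` object stores implicitly: the body and the environment travel together.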


Let's write some cool examples, to see if we can evaluate them. Can you
translate the following function to an Expression tree?

let
fun range n0 = fn n1 =>
if n0 < n1
then n0 + range (n0 + 1) n1
else 0
in
range 2 7
end

Here's how we can translate the given program into a sequence of Expression objects using
the existing expression types: Var, Num, Bln, And, Add, Lth, Let, IfThenElse, Fn, Fun, and
App. I'll represent this in Python-like pseudocode based on your class structure.


1. Breakdown of the program:

The program defines a recursive function range, which takes two parameters n0 and n1. The
function checks if n0 is less than n1, and if so, it adds n0 to a recursive call to range with n0 +
1 and n1. Otherwise, it returns 0. The program then calls range 2 7.

The equivalent expression tree for the program can be built step by step.

2. Expression Tree:
def create_arithmetic_sum(init_value, end_value):
    """
    Example:
    >>> program = create_arithmetic_sum(2, 7)
    >>> v = VisitorEval()
    >>> program.accept(v, {})
    20

    >>> program = create_arithmetic_sum(1, 8)
    >>> v = VisitorEval()
    >>> program.accept(v, {})
    28
    """
    # The inner body of the function:
    # if n0 < n1 then n0 + range (n0 + 1) n1 else 0
    body = IfThenElse(
        Lth(Var('n0'), Var('n1')),                      # Condition: n0 < n1
        Add(Var('n0'), App(                             # Then: n0 + range (n0 + 1) n1
            App(Var('range'), Add(Var('n0'), Num(1))),  # range (n0 + 1)
            Var('n1')
        )),
        Num(0)                                          # Else: 0
    )

    # Anonymous function: fn n1 => ...
    fn_n1 = Fn('n1', body)

    # Recursive function: fun range n0 = fn n1 => ...
    fun_range = Fun('range', 'n0', fn_n1)

    # Application: range init_value end_value
    app_range = App(
        App(Var('range'), Num(init_value)),  # range init_value
        Num(end_value)                       # range init_value end_value
    )

    # Whole program:
    return Let('range', fun_range, app_range)

3. Explanation of the Translation:


●​ Body of the recursive function:


○​ We first create the body of the function, which is an IfThenElse expression. It
checks whether n0 is less than n1 (Lth(Var('n0'), Var('n1'))).
○​ If n0 < n1 is true, we compute the sum of n0 and the result of recursively
calling range (n0 + 1) n1. This is represented by Add(Var('n0'),
App(App(Var('range'), Add(Var('n0'), Num(1))), Var('n1'))).
○​ If n0 < n1 is false, the function returns 0 (Num(0)).
●​ Anonymous function:
○​ The function body created above is encapsulated in an anonymous function that
takes the parameter n1: Fn('n1', body).
●​ Recursive function:
○​ The range function itself is a recursive function (Fun('range', 'n0',
fn_n1)), where n0 is the first parameter, and it returns an anonymous function
(fn n1 => ...) that takes the second parameter n1.
●​ Application of range 2 7:
○​ This represents applying the range function to 2 and 7:
App(App(Var('range'), Num(2)), Num(7)).
●​ Let expression:
○​ The final program is encapsulated in a Let expression, where range is bound to
the recursive function, and we apply range 2 7 within the in block:
Let('range', fun_range, app_range).
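The expression tree above should behave like this direct Python transcription of the SML program, which is a quick way to double-check the expected doctest values (the sum of the integers from n0 up to, but not including, n1):

```python
def rng(n0):
    # Curried, like the interpreted version: rng(n0) returns a function of n1.
    def inner(n1):
        if n0 < n1:
            return n0 + rng(n0 + 1)(n1)
        else:
            return 0
    return inner

print(rng(2)(7))  # 20 = 2 + 3 + 4 + 5 + 6
print(rng(1)(8))  # 28 = 1 + 2 + ... + 7
```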

Can you write a new function for the following program? Notice that loop
receives another function; hence, it is a higher-order function.
let
fun loop n = fn f => fn a =>
if n = 1
then a
else loop (n-1) f (f a)
in
loop 10 (fn x => x + 1) 2
end

Here's the translation of the loop function program into Expression objects.

def create_loop(num_iterations):
    """
    Example:
    >>> program = create_loop(10)
    >>> v = VisitorEval()
    >>> program.accept(v, {})
    11
    """
    # The body of the recursive function:
    # if n = 1 then a else loop (n - 1) f (f a)
    body = IfThenElse(
        Lth(Var('n'), Num(2)),  # Condition: n < 2 (i.e., n = 1 for n >= 1)
        Var('a'),               # Then: a
        App(                    # Else: loop (n - 1) f (f a)
            App(App(Var('loop'), Add(Var('n'), Num(-1))),  # loop (n - 1)
                Var('f')),
            App(Var('f'), Var('a'))                        # f a
        )
    )

    # Anonymous function fn a => ...
    fn_a = Fn('a', body)

    # Anonymous function fn f => fn a => ...
    fn_f = Fn('f', fn_a)

    # Recursive function fun loop n = fn f => fn a => ...
    fun_loop = Fun('loop', 'n', fn_f)

    # Application: loop num_iterations (fn x => x + 1) 2
    app_loop = App(
        App(App(Var('loop'), Num(num_iterations)),  # loop num_iterations
            Fn('x', Add(Var('x'), Num(1)))),        # (fn x => x + 1)
        Num(2)                                      # 2
    )

    # Whole program:
    return Let('loop', fun_loop, app_loop)

1. Breakdown of the program:

The function loop is a recursive function that takes three parameters: n, f, and a. If n = 1, it
returns a. Otherwise, it recursively calls loop (n-1) f (f a).

2. General Highlights:

●​ Body of loop: The body is an IfThenElse checking if n = 1. If true, it returns a; if


false, it recursively applies loop to n-1, f, and f a.
●​ Multiple anonymous functions (fn f and fn a): These are nested inside each other
as Fn objects.
●​ Application of loop: The final program applies loop num_iterations to the
function fn x => x + 1 and the value 2.
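A direct Python transcription of the loop program is handy for checking the doctest value. The curried structure mirrors the interpreted version: `loop` applies f to a a total of (n - 1) times.

```python
def loop(n):
    def take_f(f):
        def take_a(a):
            if n < 2:                      # same stopping test as n = 1
                return a
            return loop(n - 1)(f)(f(a))    # loop (n - 1) f (f a)
        return take_a
    return take_f

print(loop(10)(lambda x: x + 1)(2))  # 11: nine increments applied to 2
```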


Could you modify the loop example so that it works as a for loop?
We can create a customizable loop structure using higher-order functions and recursion. This
allows you to not only iterate over a range of values but also to apply a custom function (the
"body" of the loop) at each step of the iteration. Here's the implementation:

def create_for_loop(begin, end, function):
    """
    This function parameterizes our loops with the begin and end of the
    iterations, plus a higher-order function that simulates the body of the
    loop.

    Example:
    >>> program = create_for_loop(2, 10, Fn('x', Add(Var('x'), Num(1))))
    >>> v = VisitorEval()
    >>> program.accept(v, {})
    7

    >>> double = Fn('x', Add(Var('x'), Var('x')))
    >>> program = create_for_loop(2, 10, double)
    >>> v = VisitorEval()
    >>> program.accept(v, {})
    16
    """
    # The body of the recursive function:
    # if n < a then a else loop (n - 1) f (f a)
    body = IfThenElse(
        Lth(Var('n'), Var('a')),  # Condition: n < a
        Var('a'),                 # Then: a
        App(                      # Else: loop (n - 1) f (f a)
            App(App(Var('loop'), Add(Var('n'), Num(-1))),  # loop (n - 1)
                Var('f')),
            App(Var('f'), Var('a'))                        # f a
        )
    )

    # Anonymous function fn a => ...
    fn_a = Fn('a', body)

    # Anonymous function fn f => fn a => ...
    fn_f = Fn('f', fn_a)

    # Recursive function fun loop n = fn f => fn a => ...
    fun_loop = Fun('loop', 'n', fn_f)

    # Application: loop end function begin
    app_loop = App(
        App(App(Var('loop'), Num(end)), function),
        Num(begin)
    )

    # Whole program:
    return Let('loop', fun_loop, app_loop)


Highlights of the Design:

1.​ Higher-Order Function as Loop Body:


○​ The function argument acts as the "body" of the loop, and it can be any
function that operates on the current value (like adding, doubling, or even more
complex operations).
○​ This makes your loop adaptable to many scenarios, allowing users to define
custom operations for each iteration.
2.​ Recursive Structure:
○​ The loop is implemented as a recursive function (Fun) named loop. It
decrements the loop variable (n) at each step, applies the given function (f) to
the current value (a), and passes it to the next iteration.
○​ The base case stops recursion when n becomes less than 2.
3.​ Customizable Start and End Points:
○​ The loop starts at begin and continues until it reaches end. These values are
parameterized, allowing for great flexibility in how the loop behaves.
4.​ Examples of Usage:
○​ Example 1: When you pass Fn('x', Add(Var('x'), Num(1))), the loop
behaves like a classic incrementing loop, adding 1 at each step.
○​ Example 2: When you pass a "doubling" function (Fn('x', Add(Var('x'),
Var('x')))) to the loop, it will double the value on each iteration, creating an
exponential growth pattern.

How It Works:

●​ Recursive Loop (Fun) Construction: This structure mirrors the logic of a for loop by
having the recursive function (loop) handle the increment or update of the loop variable
(n), and apply the given function to the accumulator (a).
●​ End of Loop Condition (IfThenElse): The condition checks whether the loop has
reached its end (when n < 2). If so, it returns the current value (a). Otherwise, it
continues the recursion.
●​ Loop Application: After defining the recursive function loop, the actual application
applies it with the start value (begin) and the user-defined function.
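To make the recursive structure concrete, here is a short plain-Python sketch of the combinator that the AST above encodes. The function name loop and the n < 2 base case mirror the description; the concrete arguments in the usage lines are illustrative assumptions, not values from the notes:

```python
def loop(n, f, a):
    # Base case: stop when the counter drops below 2 and return the
    # accumulated value, just like the IfThenElse in the AST.
    if n < 2:
        return a
    # Otherwise, apply the body once and recurse with n - 1.
    return loop(n - 1, f, f(a))

# Incrementing body, like Fn('x', Add(Var('x'), Num(1))):
print(loop(4, lambda x: x + 1, 0))  # three applications: 3

# Doubling body, like Fn('x', Add(Var('x'), Var('x'))):
print(loop(4, lambda x: x + x, 1))  # 1 -> 2 -> 4 -> 8
```

Note that with this base case the body runs n - 1 times, which matches the "stops when n becomes less than 2" behavior described above.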

Our for loop simulates the use of higher-order functions. What is this concept?
Higher-order functions (HOFs) are functions that can take other functions as arguments
and/or return functions as their result. This concept allows for flexible and reusable code,
making it possible to create generic patterns like your for loop, where the "body" of the loop is
passed in as a function.


Key Aspects of Higher-Order Functions:

1.​ Functions as First-Class Citizens:


○​ In programming languages that support higher-order functions, functions are
first-class citizens. This means they can be treated like any other data type:
assigned to variables, passed as arguments, and returned from other functions.
2.​ Parameterizing Behavior:
○​ By accepting functions as arguments, higher-order functions allow you to
parameterize behavior. For example, instead of writing a specific loop that
always adds numbers, you can pass in a function that describes what operation
should be performed during each iteration (e.g., add, multiply, print).
3.​ Returning Functions:
○​ Higher-order functions can also return functions as their result, enabling the
creation of function factories or closures. These functions can capture and
maintain their own state, which is useful in many programming patterns, like
memoization or currying.
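As a small illustration of point 3, here is a hedged Python sketch of a function factory. The name make_adder is a hypothetical helper chosen for this example, not something from the notes' interpreter:

```python
def make_adder(k):
    # The returned function closes over k: it remembers the value of k
    # even after make_adder has returned.
    def add_k(x):
        return x + k
    return add_k

add5 = make_adder(5)
add1 = make_adder(1)
print(add5(10))  # 15
print(add1(10))  # 11
```

Each call to make_adder produces an independent closure, which is exactly the "captured state" mentioned above.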

The for Loop as a Higher-Order Function:

In our implementation, the create_for_loop function is parameterized by:

●​ The starting and ending values (begin and end).


●​ A function (function) that represents the "body" of the loop.

This design allows you to reuse the create_for_loop function with different behaviors,
simply by passing in a different higher-order function. For example:

●​ To simulate a loop that increments a value, you pass a function that adds 1.
●​ To simulate a loop that doubles a value, you pass a function that multiplies the input by
2.

By separating the loop logic (iteration) from the operation (body of the loop), you've effectively
generalized the loop structure using higher-order functions.

Can you give me some examples of higher-order functions in typical
functional programs?
Here are some examples of well-known higher-order functions:

map: Applies a given function to each item in a list (or any iterable) and returns a new list of
results.
def square(x):
    return x * x


numbers = [1, 2, 3, 4]
result = list(map(square, numbers))  # Applies `square` to each element (list() forces the lazy map object).

filter: Takes a function that returns a boolean and filters a list based on that function.

def is_even(x):
    return x % 2 == 0

# Filters the list to keep only even numbers:
evens = list(filter(is_even, numbers))

reduce: Applies a binary function (a function that takes two arguments) cumulatively to the
items of a list, reducing it to a single value.

from functools import reduce

def add(x, y):
    return x + y

sum_total = reduce(add, numbers)  # Sums up the list.

Why are higher-order functions useful?


Higher-order functions allow you to create flexible and reusable code by passing and returning
functions. In your for loop example, you've implemented a powerful abstraction where the
operation performed during each loop iteration is defined by a function passed as an argument.
This concept is central to functional programming and provides great flexibility when writing
adaptable and concise code. Here's a summary of the benefits they bring in:
●​ Abstraction and Reusability: HOFs allow you to abstract common patterns (like
looping) and reuse them across different contexts by simply changing the function
passed in.
●​ Functional Composition: You can build complex functionality by composing simple
functions. For example, you could combine multiple HOFs like map, filter, and
reduce to create complex data transformations in a concise and readable way.
●​ Flexibility: Higher-order functions provide flexibility in designing algorithms. Instead of
hard-coding specific behavior, you allow the caller to define the exact behavior by
passing in different functions.
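These three benefits show up together when HOFs are composed. The sketch below, with illustrative data of our own choosing, filters, maps, and then reduces a list in one pipeline:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Keep the even numbers, square each of them, then sum the squares:
total = reduce(lambda acc, x: acc + x,
               map(lambda x: x * x,
                   filter(lambda x: x % 2 == 0, numbers)),
               0)
print(total)  # 4 + 16 + 36 = 56
```

No loop is written by hand: each stage of the computation is described by a small function handed to a generic combinator.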

Do you know what was the first programming language ever to become
popular that featured recursive functions?
Lisp probably was the first programming language to ever become popular while featuring
recursive functions. Lisp, developed by John McCarthy in 1958, was one of the earliest
languages explicitly designed to support recursion as a fundamental feature. This was

groundbreaking at the time, as recursion was not only syntactically supported but also naturally
aligned with Lisp’s structure, allowing functions to call themselves and process complex
recursive data structures like lists and trees.

Recursive functions in Lisp were integral to its functional programming paradigm, distinguishing
it from other early languages like Fortran, which did not initially support recursion. Lisp’s
popularity in artificial intelligence research further helped establish recursion as a powerful
programming technique, influencing the design of many later programming languages.

The Algol family introduced recursion a bit later. Algol 58, the initial version, did not support
recursion. However, Algol 60 (published in 1960) explicitly added recursion as a supported
feature, influenced in part by the concepts emerging in Lisp. Algol 60 introduced block structure
and a formalized way to handle recursive functions, which influenced many later languages like
Pascal, C, and beyond.

To sum up, Lisp was the first widely used language to feature recursion, while Algol 60 was the
first major procedural language to include recursion formally, spreading the concept to a broader
programming audience.

So, in many ways, John McCarthy, the creator of Lisp, is one of the
forefathers of recursion in programming languages, right?
Yes, John McCarthy was indeed a pioneering figure in computer science, especially known for
his foundational work in artificial intelligence (AI) and programming languages.


Here’s an overview of his key contributions:

1.​ Artificial Intelligence (AI) – McCarthy is often credited with coining the term artificial
intelligence in 1956, during the famous Dartmouth Conference, which he organized
alongside Marvin Minsky, Claude Shannon, and Nathaniel Rochester. This conference is
generally regarded as the birthplace of AI as an academic field, setting the stage for
decades of research and progress.
2.​ Lisp Programming Language – In 1958, McCarthy invented Lisp (short for LISt
Processing), which became the second oldest high-level programming language after
Fortran. Lisp introduced key concepts like recursion, symbolic computation, and
automatic memory management (garbage collection), which have since become
essential in many programming languages. Lisp’s design made it ideal for AI research,
as its syntax and structure supported symbolic manipulation and reasoning well-suited to
AI tasks.
3.​ Concept of Time-Sharing – McCarthy was instrumental in developing the concept of
time-sharing, which allowed multiple users to interact with a computer simultaneously.
This was revolutionary at the time, as early computers typically ran one job at a time.
Time-sharing paved the way for modern operating systems and laid the groundwork for
personal computing and interactive software.
4.​ Work on Formal Logic and the Theory of Computation – McCarthy made important
contributions to formal logic and the theory of computation, including his work on
formalizing common-sense reasoning in AI. He developed influential theories like the
situation calculus, a mathematical approach to representing dynamic systems in logic.
His work on formalizing knowledge and reasoning processes helped shape early AI
logic-based systems.
5.​ Advice Taker and Early AI Theories – McCarthy proposed the concept of the Advice
Taker, an early framework for reasoning programs that could process human advice or
commands to accomplish tasks. Though it wasn’t implemented at the time, the Advice
Taker introduced concepts related to knowledge representation, which are core to
modern AI systems.
6.​ Advocacy for Open Research and AI Ethics – McCarthy was ahead of his time in
promoting open research and discussing AI’s societal impact. He believed in the
collaborative nature of research and shared his ideas widely, encouraging the
development of a global AI community. He also thought deeply about AI’s potential
ethical implications and was one of the early voices advocating for responsible AI.

Overall, McCarthy's work laid much of the foundation for AI and influenced a broad array of
fields, including programming languages, computational logic, and human-computer interaction.
He received numerous accolades, including the Turing Award in 1971 for his contributions to the
field.


Introduction to Code Generation


Code: https://github.com/pronesto/DCC053_Public/tree/main/14_CodeGeneration

Which kind of code actually runs on the CPU of your computer?


My computer runs "Machine Code". Machine code is the lowest-level representation of a
program that a computer's CPU can directly execute. It consists of binary instructions made up
of sequences of 0s and 1s. These instructions tell the CPU to perform fundamental operations,
such as arithmetic computations, data movement, and branching (making decisions in the
program flow). Machine code is architecture-specific, meaning that the binary instructions
vary depending on the type of processor (x86, ARM, RISC-V, etc.).

Types of Machine Code

There are several types of machine code, each tailored to a specific CPU architecture. Here
are a few examples:

1.​ x86 Machine Code (CISC Architecture)


○​ Found on: Intel and AMD processors.
○​ Format: Variable-length instructions (some instructions are one byte, while
others can be several bytes long).
○​ Example Instructions:
■​ 0x89 D8 → This binary sequence translates to the assembly instruction
mov eax, ebx (move the value in the ebx register to the eax register).
○​ Where it’s found: Desktop and server environments (Windows, Linux on
Intel/AMD processors).
2.​ ARM Machine Code (RISC Architecture)
○​ Found on: Mobile devices, embedded systems, and increasingly on servers
(Apple’s M1 chip, many Android phones).
○​ Format: Fixed-length instructions (usually 4 bytes per instruction).
○​ Example Instructions:
■​ 0xE3A00001 → This binary sequence translates to the assembly
instruction mov r0, #1 (move the value 1 into register r0).
○​ Where it’s found: Smartphones, tablets, embedded systems, and some laptops
(MacBooks with Apple Silicon).
3.​ RISC-V Machine Code (RISC Architecture)
○​ Found on: Academic environments, embedded systems, and open-source
projects.
○​ Format: Fixed-length instructions, similar to ARM (typically 4 bytes per
instruction).
○​ Example Instructions:
■​ 0x00000013 → This translates to addi x0, x0, 0 (add immediate
value 0 to register x0 and store the result in x0, effectively a no-op).


○ Where it's found: Embedded systems, research projects, and open-source hardware.
4.​ MIPS Machine Code (RISC Architecture)
○​ Found on: Early embedded systems, some game consoles (PlayStation,
routers).
○​ Format: Fixed-length instructions (4 bytes per instruction).
○​ Example Instructions:
■ 0x20080001 → This binary sequence translates to addi $t0, $zero, 1 (add the immediate value 1 to register $zero and store the result in register $t0).
○​ Where it’s found: Routers, older embedded systems, and some video game
consoles.
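A useful way to internalize that machine code is "just bytes" is to look at one encoding directly. The sketch below packs the ARM example from item 2 above (mov r0, #1, encoded as the 32-bit word 0xE3A00001) into the byte sequence that would actually sit in memory on a little-endian machine:

```python
import struct

# mov r0, #1 in 32-bit ARM is encoded as the word 0xE3A00001.
word = 0xE3A00001

# ARM systems are usually little-endian, so the least significant byte
# of the instruction word comes first in memory.
in_memory = struct.pack('<I', word)
print(in_memory.hex())  # 0100a0e3
```

Those four bytes, 01 00 a0 e3, are exactly what a disassembler would read back and print as mov r0, #1.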

Where Machine Code Can Be Found

●​ In Compiled Binaries:
○​ Programs compiled from high-level languages (C, C++, Rust) into executables.
For example, on Linux, compiled binaries like /bin/ls contain machine code
that the processor directly executes.
●​ Firmware:
○​ The embedded software running on hardware devices like network routers, IoT
devices, and microcontrollers. Firmware is machine code that controls the
device's low-level operation.
●​ Operating System Kernels:
○​ Kernels like Linux or Windows contain large amounts of machine code designed
for specific CPU architectures (e.g., x86, ARM).
●​ In BIOS/UEFI:
○​ The firmware used to initialize hardware during the boot process of a computer is
also written in machine code tailored to specific processors.

Key Takeaways

●​ Machine code is the most fundamental layer of a program, made up of binary


instructions executed directly by the CPU.
●​ Different processor architectures (like x86, ARM, RISC-V) have distinct sets of machine
codes.
●​ Machine code is found in all compiled executables, firmware, and other low-level
software components that interact directly with hardware.

There are low-level representations that are not really machine code, but
that we can still interpret, right?
Not all low-level code is machine code. There are other forms of low-level code, such as
bytecode and intermediate representations (IR), which are not directly executed by the

hardware. Instead, these forms are interpreted or further compiled before execution. Let's dive
into the differences.

Examples of Low-Level Representations That Are Not Machine Code

1.​ Bytecode
○​ What it is: Bytecode is a compact, intermediate form of code that is typically
produced by compilers for virtual machines (VMs). It’s not tied to a specific
hardware architecture but is designed to be interpreted by a virtual machine or
further compiled at runtime (JIT compilation).
○​ Key Feature: Bytecode is portable across different hardware platforms since it's
not directly tied to any specific CPU. The virtual machine (e.g., Java Virtual
Machine) abstracts away the hardware specifics.
○​ Example: Java Bytecode
■​ When you compile a Java program, it doesn't produce machine code right
away. Instead, it generates Java bytecode, which is executed by the
Java Virtual Machine (JVM).
■​ Example Bytecode Instruction: 0x3C → istore_1 (store an integer into
local variable 1).
■​ The JVM interprets or compiles this bytecode to machine code at runtime.
○​ Where it’s found:
■​ Java programs: .class files contain bytecode.
■​ Android apps: Dalvik/ART (Android Runtime) uses a different form of
bytecode to run applications on smartphones.
○​ How it’s executed: The JVM reads the bytecode and either interprets it or
compiles it just-in-time (JIT) to machine code suitable for the host CPU.
2.​ Intermediate Representation (IR)
○​ What it is: IR is a more abstract, platform-independent representation of a
program used during the compilation process. It acts as a bridge between
high-level languages and machine code. IR is not tied to any specific hardware,
and like bytecode, it's not directly executable by the CPU.
○​ Key Feature: Designed to be transformed and optimized by compilers before
being translated into machine code for the target architecture.
○​ Example: LLVM Intermediate Representation (LLVM IR)
■​ The LLVM compiler uses an intermediate representation (LLVM IR),
which is a lower-level, platform-independent representation of the
program.
■​ Example IR Instruction:
■​ %add = add i32 %a, %b → This is an LLVM IR instruction that
adds two 32-bit integers (%a and %b).
■​ Where it’s found: LLVM IR is commonly used in compilers such as
Clang, which translates C/C++ programs into LLVM IR before converting
them into machine code.
○​ How it’s used:

■​ The LLVM toolchain takes the IR and performs various optimizations on


it before passing it through a backend (e.g., llc) to generate machine
code for a specific target (x86, ARM, etc.).
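The examples above use Java bytecode, but Python itself also compiles functions to bytecode for its own virtual machine, and the standard dis module lets us inspect it. The exact opcodes vary across CPython versions, so treat the listing as illustrative:

```python
import dis

def inc(x):
    return x + 1

# Disassemble `inc`: each line shows a bytecode offset, an opcode name,
# and its argument, much like the Java bytecode example above.
dis.dis(inc)
```

Running this prints a small table of instructions (loading the argument, adding the constant 1, and returning the result), which is what the CPython interpreter actually executes.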

If bytecodes and IR are not machine code, can they still be executed?
Yes, it is still possible to execute programs in these formats, typically via interpretation or just-in-time compilation.
Bytecode Execution:

●​ Bytecode is interpreted by a virtual machine (VM) such as the JVM or Dalvik VM.
●​ In some cases, bytecode can be Just-In-Time (JIT) compiled into machine code at
runtime. This is common in Java's HotSpot JVM, which converts frequently executed
bytecode into highly optimized machine code.

IR Execution:

●​ Intermediate Representations like LLVM IR are further processed by a compiler


backend.
●​ The LLVM toolchain uses tools like llc to convert the IR into machine code for the
target architecture.
●​ IR is not meant to be executed directly; it serves as an intermediate step for
optimizations and code transformations during compilation. Nevertheless, LLVM still
provides an interpreter for its IR: the lli tool.

What are the differences between machine codes and these other low-level
representations?
There are a few differences between these different forms of low-level code:

●​ Machine code: The binary instructions that a CPU can execute directly, tied to a specific
architecture (x86, ARM, RISC-V).
●​ Bytecode: A platform-independent, compact code interpreted or compiled at runtime by
a virtual machine (e.g., Java bytecode).
●​ Intermediate Representation (IR): A low-level, platform-independent code
representation used internally by compilers (e.g., LLVM IR), further compiled into
machine code for the target architecture.

We can summarize these differences in a table:

Aspect                | Machine Code                                    | Bytecode                                             | IR (Intermediate Representation)
----------------------|-------------------------------------------------|------------------------------------------------------|----------------------------------------------------------
Execution             | Directly by the CPU                             | Interpreted or JIT-compiled by a virtual machine     | Not directly executable; further compiled to machine code
Architecture-specific | Yes (tied to a specific CPU architecture)       | No (interpreted on any platform with a compatible VM)| No (platform-independent)
Portability           | Low (requires recompilation for different CPUs) | High (portable between systems running the same VM)  | High (used internally by compilers for multiple targets)
Example               | x86, ARM, RISC-V machine code                   | Java bytecode, Android's Dalvik bytecode             | LLVM IR
Where it's found      | Compiled binaries, firmware                     | .class files, Android apps                           | Inside compilers (LLVM, GCC)

What is the path that a program will traverse to become machine code?
This path depends heavily on the compiler. As an example, it suffices to take a new look into the
cover of "Crafting Interpreters". But, if we talk about a full-fledged compiler, such as clang or
gcc, then we would find a path more or less like so (using the figure below):

1.​ Source Code (.c): The top section shows a simple C program with a main function that
prints "Hello" using printf. This is the high-level source code, which is
human-readable.
2.​ Intermediate Representation (.ll): The next section displays the LLVM IR, generated
from the C code using clang -S -emit-llvm. This is an intermediate representation
that is platform-independent and still readable, although more abstract than the original
C code. It shows the call to printf, which has been transformed into lower-level
operations.
3.​ Assembly Instructions (.s): The third section displays the assembly code generated by
llc, targeting ARM (-march=arm). This representation is specific to the ARM
architecture and shows instructions like push, mov, and bl (branch link). This is closer
to machine-level operations and is used by the CPU.
4.​ Machine Code (.o): The final section shows the machine code, which is the binary
representation of the instructions, ready to be executed by the processor. Each
hexadecimal number corresponds to an ARM instruction or data value.

Each step reduces the abstraction level, moving from human-readable C code to
architecture-specific machine code that runs on the hardware.


Let's try to see the different representations of a larger program. Can you
write a simple program and convert it to the LLVM IR?
Let's go through a simple C program that includes some control flow, then generate its LLVM IR
representation using clang, and finally use llc to produce machine code for different
architectures.

Step 1: Write a Simple C Program

Here’s a simple C program that includes a control flow statement (an if statement):

// example.c
#include <stdio.h>

int main() {
    int a = 5;
    int b = 10;
    int sum;

    if (a < b) {
        sum = a + b;
    } else {
        sum = a - b;
    }

    printf("Result: %d\n", sum);
    return 0;
}

Step 2: Generate LLVM IR (.ll) File

To generate the LLVM IR file from the C program, you can use the clang compiler with the -S
and -emit-llvm flags. Open a terminal and run the following command:

clang -S -emit-llvm example.c -o example.ll

This command creates a file named example.ll, which contains the LLVM IR representation
of the program.

Step 3: Examine the LLVM IR

You can view the contents of example.ll using a text editor or by running cat on the
file. If you do so, you should see something like this (simplified) representation:

; ModuleID = 'example.c'
source_filename = "example.c"
target datalayout = "..."
target triple = "x86_64-unknown-linux-gnu"

define i32 @main() {
entry:
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  %sum = alloca i32, align 4
  store i32 5, i32* %a, align 4
  store i32 10, i32* %b, align 4
  %0 = load i32, i32* %a, align 4
  %1 = load i32, i32* %b, align 4
  %cmp = icmp slt i32 %0, %1
  br i1 %cmp, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  %2 = load i32, i32* %a, align 4
  %3 = load i32, i32* %b, align 4
  %add = add i32 %2, %3
  store i32 %add, i32* %sum, align 4
  br label %if.end

if.else:                                          ; preds = %entry
  %4 = load i32, i32* %a, align 4
  %5 = load i32, i32* %b, align 4
  %sub = sub i32 %4, %5
  store i32 %sub, i32* %sum, align 4
  br label %if.end

if.end:                                           ; preds = %if.then, %if.else
  %6 = load i32, i32* %sum, align 4
  %7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x i8], [14 x i8]* @.str, i32 0, i32 0), i32 %6)
  ret i32 0
}

Step 4: Generate Assembly Code Using llc

Now, let's use llc to generate assembly code for different architectures; the machine code itself will come from the assembler in the next step. We can generate code for x86_64 and RISC-V as examples.

Generate x86_64 Assembly Code

llc -march=x86-64 example.ll -o example.x86

Generate RISC-V Assembly Code

llc -march=riscv64 example.ll -o example.riscv

Step 5: Generate Machine Code Using as


You can translate the assembly code to machine code using an assembler such as the GNU
Assembler (as):

as example.x86 -o example.o

Step 6: Review the Generated Machine Code

You can examine the generated object files using objdump or similar tools.

For x86_64:

objdump -d example.o

In this case, you should get something like the listing below (notice that this particular dump appears to have been produced on an Apple Silicon machine: the section name is Mach-O's __TEXT,__text, and the instructions are AArch64, not x86_64):

Disassembly of section __TEXT,__text:

0000000000000000 <ltmp0>:
0: ff c3 00 d1 sub sp, sp, #48
4: fd 7b 02 a9 stp x29, x30, [sp, #32]
8: fd 83 00 91 add x29, sp, #32
c: bf c3 1f b8 stur wzr, [x29, #-4]
10: a8 00 80 52 mov w8, #5
14: a8 83 1f b8 stur w8, [x29, #-8]
18: 48 01 80 52 mov w8, #10
1c: a8 43 1f b8 stur w8, [x29, #-12]
20: a8 83 5f b8 ldur w8, [x29, #-8]
24: a9 43 5f b8 ldur w9, [x29, #-12]
28: 08 01 09 6b subs w8, w8, w9
2c: ca 00 00 54 b.ge 0x44 <ltmp0+0x44>
30: a8 83 5f b8 ldur w8, [x29, #-8]
34: a9 43 5f b8 ldur w9, [x29, #-12]
38: 08 01 09 0b add w8, w8, w9
3c: e8 13 00 b9 str w8, [sp, #16]
40: 05 00 00 14 b 0x54 <ltmp0+0x54>
44: a8 83 5f b8 ldur w8, [x29, #-8]
48: a9 43 5f b8 ldur w9, [x29, #-12]
4c: 08 01 09 6b subs w8, w8, w9
50: e8 13 00 b9 str w8, [sp, #16]
54: e8 13 40 b9 ldr w8, [sp, #16]
58: e0 03 08 aa mov x0, x8
5c: 09 00 00 90 adrp x9, #0
60: 29 01 00 91 add x9, x9, #0
64: e0 07 00 f9 str x0, [sp, #8]
68: e0 03 09 aa mov x0, x9

6c: e9 03 00 91 mov x9, sp
70: ea 07 40 f9 ldr x10, [sp, #8]
74: 2a 01 00 f9 str x10, [x9]
78: 00 00 00 94 bl 0x78 <ltmp0+0x78>
7c: 08 00 80 52 mov w8, #0
80: e0 03 08 aa mov x0, x8
84: fd 7b 42 a9 ldp x29, x30, [sp, #32]
88: ff c3 00 91 add sp, sp, #48
8c: c0 03 5f d6 ret

Summary

In this process, we:

1.​ Created a simple C program with control flow.


2.​ Used clang to generate LLVM IR.
3. Used llc (and the as assembler) to produce assembly and then machine code for different architectures.

This workflow highlights how high-level code is transformed into lower-level representations
suitable for different CPU architectures. Let me know if you need further details or explanations!

This step from C to LLVM IR is typically what we call "Code Generation".
Can you explain this process of program transformation?
The step from C to LLVM IR, often called Code Generation (or IR Generation by purists), is an
important phase in the compilation process where high-level language constructs (like C) are
transformed into an Intermediate Representation (IR), in this case, LLVM IR. This phase
serves as a bridge between high-level source code and machine-specific optimizations, allowing
for portability and further program analysis. Here's what happens and why it’s interesting:

1. Translation of High-Level Constructs:

●​ During this phase, high-level constructs such as loops, function calls, conditional
statements, and arithmetic operations are mapped to lower-level abstractions that can
be easily manipulated by the compiler.
●​ For example, a for loop in C might be transformed into basic blocks with control flow
edges, representing the loop initialization, condition checking, and update statements.
●​ Interesting part: High-level features, such as complex data types, are simplified. For
instance, structures and arrays are flattened into memory addresses and offsets.
Function calls become simple jumps, while C’s undefined behavior rules can allow
optimizations to discard certain checks.

2. Platform Independence:


●​ LLVM IR is platform-independent. It represents the program in a form that doesn’t yet


assume details about the target architecture, making it a universal intermediate
language that can be optimized once and then translated into machine code for many
different hardware platforms (x86, ARM, RISC-V, etc.).
●​ Interesting part: This abstraction enables LLVM-based compilers to target multiple
architectures without needing to reimplement the front-end for each one. It also enables
optimizations that are independent of the machine architecture.

3. Optimization Opportunity:

●​ The IR generation step sets the stage for optimizations. LLVM IR is designed to
facilitate various optimization passes. For example:
○​ Dead code elimination: Removing code that will never be executed.
○​ Constant propagation: Replacing variables that are constant with their actual
values.
○​ Loop optimizations: Simplifying or unrolling loops to reduce runtime overhead.
●​ Interesting part: Because IR is more abstract than assembly, the compiler can reason
about the program's behavior without worrying about architecture-specific details,
enabling complex, high-level optimizations.
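To give a flavor of what such a pass does, here is a hedged Python sketch of constant folding, a simple relative of constant propagation, over a tiny expression AST. The Num/Var/Add classes imitate the style used earlier in these notes, but this is an illustrative toy, not LLVM's actual implementation:

```python
class Num:
    def __init__(self, value):
        self.value = value

class Var:
    def __init__(self, name):
        self.name = name

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

def fold(e):
    # Recursively rewrite Add(Num, Num) into a single Num node;
    # anything involving a variable is left untouched.
    if isinstance(e, Add):
        l, r = fold(e.left), fold(e.right)
        if isinstance(l, Num) and isinstance(r, Num):
            return Num(l.value + r.value)
        return Add(l, r)
    return e

# x + (2 + 3) is simplified to x + 5:
expr = fold(Add(Var('x'), Add(Num(2), Num(3))))
print(expr.right.value)  # 5
```

Real compilers run dozens of passes like this one over the IR, each exploiting the fact that the representation is uniform and easy to traverse.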

4. Simplification:

●​ During this transformation, complex control flow and expressions in the source code are
simplified into basic operations and basic blocks (a straight-line sequence of
instructions with no branches, except at the entry and exit points). This simplification
allows the later stages of the compiler to reason more easily about program behavior.
●​ For example, a complex if-else statement might be transformed into several basic
blocks with conditional branches between them.
●​ Interesting part: The control flow graph (CFG) of the program becomes clearer in IR. By
breaking down control flow into branches and blocks, it becomes easier for optimization
passes to analyze dependencies and execution paths.

5. Abstract Data Types and Memory Model:

●​ The IR abstracts data types and memory accesses. For instance, local variables may
still be represented in IR, but they will eventually be lowered to stack or register
accesses depending on optimizations.
●​ Interesting part: This abstraction simplifies memory management. Variables are either
SSA values (Static Single Assignment) or memory locations, making it easier to track
where values are modified and to optimize away unnecessary memory operations.

6. Function Representation:


●​ Functions in LLVM IR are represented with basic blocks, which are sequences of
instructions with a clear entry and exit point, and are often laid out in Static Single
Assignment (SSA) form. SSA ensures that every variable is assigned exactly once,
which simplifies dependency analysis and allows for better optimizations.
●​ Interesting part: SSA form makes dependency analysis (e.g., which variables depend
on which other variables) much simpler, enabling more powerful and safe optimizations.
It also allows easier reasoning about control flow and variable lifetimes.

7. Intrinsic Functions:

●​ LLVM IR introduces intrinsic functions, which are special functions that represent
low-level operations not directly available in high-level languages. For example, there
might be intrinsics for vector operations, atomic operations, or memory barriers. These
provide hooks for optimizations and fine-tuning of platform-specific operations.
●​ Interesting part: Intrinsic functions allow the compiler to introduce hardware-specific
instructions without exposing them at the source level, keeping IR platform-neutral while
still offering the potential for fine-tuned performance on different architectures.

Summary:

The transformation from C to LLVM IR is fascinating because it involves translating a rich,
high-level program into a much simpler, abstract form that the compiler can manipulate. This
process provides opportunities for platform-agnostic optimizations, enables code reuse across
multiple architectures, and paves the way for advanced program analysis. The modular, simple
structure of LLVM IR helps in applying transformations, optimizing memory, and control-flow
logic before the code is tied to any specific hardware. In essence, it’s a powerful intermediate
step that retains the high-level logic of the source code but provides enough low-level detail to
make the eventual generation of machine code efficient and highly optimized.

Let's go over the LLVM IR. What are interesting aspects of this intermediate
representation?
The figure below is a control-flow graph: a way to show the low-level representation of
programs. This CFG was produced for the following program:

#include <stdio.h>

int main(int argc, char *argv[]) {
  if (argc < 2) {
    printf("Usage: %s <input_string>\n", argv[0]);
    return 1;
  }

  char *input = argv[1];
  int count = 0;

  while (input[count] != '\0') {
    count++;
  }

  printf("The input string '%s' has %d characters.\n", input, count);

  return 0;
}

We can produce the Control-Flow Graph (CFG) of this program with the following commands in
clang:

clang -Xclang -disable-O0-optnone -S -emit-llvm count_char.c
opt -dot-cfg count_char.ll
dot -Tpdf .main.dot -o main.pdf

Here's the program's CFG:


The control-flow graph we've produced for the C program is a visual representation of the flow
of execution through the program in LLVM Intermediate Representation (IR). Let's break down
the interesting aspects of this representation.

1. Basic Blocks:

●​ Each rectangular box represents a basic block. A basic block is a sequence of instructions that are executed sequentially. Once control enters a block, it will execute all the instructions inside it before potentially branching to another block.
●​ Blocks are labeled (%2, %4, %7, %10, etc.), and you can see the instructions within each
one.

2. Typed Instructions:


●​ Yes, LLVM IR is strongly typed. You can see that every instruction has types
associated with the values they operate on. For instance, i32 stands for a 32-bit integer,
i8 for an 8-bit integer (a byte), and pointers are indicated by i8* or i32*.
●​ These types ensure that the operations performed on the data are correct. For example,
you can’t add an integer to a pointer without explicit casting, enforcing type safety at this
low level.

3. Phi Functions (phi):

●​ The phi functions (phi instructions) are specific to Static Single Assignment (SSA) form, which is the form used by LLVM IR.
●​ A phi function is used to choose a value based on the control flow. For example, in block
%10, there’s a phi function (%11 = phi i64 [%15, %10], [0, %7]). This
instruction selects the value of %15 if control comes from block %10, or 0 if control
comes from block %7.
●​ This is necessary in SSA because variables are immutable, meaning each variable can
only be assigned once. Phi functions allow merging different values depending on the
incoming path in the CFG.

4. Control Flow (Edges):

●​ The edges between the blocks represent the possible control flow paths between basic
blocks. These are determined by conditional and unconditional branch instructions (br in
LLVM IR).
●​ For instance, in block %2, there’s a conditional branch (br i1 %3, label %4,
label %7). This means that based on the condition %3 (a comparison), control either
flows to block %4 (T) or block %7 (F).
●​ The control flow ensures that certain code is only executed under specific conditions, like
loops or if-else structures.

5. Loops:

●​ The red-colored block %10 is part of a loop. You can see that this block branches back
to itself, creating a loop structure. The instruction %14 = icmp eq i8 %13, 0 checks
whether a condition is met (whether the current character in the string is '\0'). If not,
the loop continues.
●​ LLVM adds metadata for loops (!llvm.loop), which can be useful for loop
optimizations.

6. Function Calls:


●​ You can spot calls to functions, such as the printf function in blocks %4 and %16. The
LLVM IR includes calls to external functions using the call instruction. The
getelementptr (GEP) is used here to compute the memory addresses for the
arguments passed to printf.

7. Branching Logic:

●​ In the program, branching happens based on comparisons (icmp). For example, in block %2, the program compares whether the argument count is less than 2. Based on this, it either exits (%4) or proceeds to process the string (%7).
●​ The branching and control flow illustrate how the program determines which path to
follow during execution.

8. SSA Form:

●​ LLVM IR is in Static Single Assignment form, meaning each variable (or register) is
assigned exactly once. This simplifies many analyses and transformations in the
compiler. To deal with multiple assignments to a variable, the phi functions are used at
control-flow merges.

Summary:

●​ Basic Blocks: Represent sequences of instructions.
●​ Typed Instructions: Enforce type safety at the LLVM IR level.
●​ Phi Functions: Merge different values based on control flow in SSA form.
●​ Control Flow Edges: Show possible execution paths, branching based on conditions.
●​ Loops: Illustrated by blocks that point back to themselves.
●​ Function Calls and Branching: Show external function invocations and control
decisions.

This CFG allows us to visualize the flow of execution in a program and understand how control
moves from one block to another, based on the values and conditions evaluated during
execution. It’s especially useful when optimizing and transforming code at the compiler level!

Why does LLVM use an infinite number of register names? Is this not too
high level? When will these names be mapped to actual locations?
LLVM uses an infinite number of virtual registers in its intermediate representation (IR) for
two main reasons:

1. Simplification of Code Representation


●​ SSA Form (Static Single Assignment): LLVM IR is in SSA form, meaning each
variable (or "register") is assigned exactly once. This simplifies analysis and optimization
because:
○​ There is no need to worry about multiple definitions of the same variable.
○​ Data flow analysis becomes straightforward, as each value has a single, clear
origin.
●​ Virtual registers behave like variables in a high-level language, making transformations
and optimizations easier for the compiler developer.

2. Independence from Physical Hardware

●​ Virtual registers are not constrained by the limited number of physical registers on
the target architecture. This enables optimizations without worrying about hardware
constraints during the IR stage.
●​ This abstraction allows LLVM to perform optimizations independently of the target
machine, making the IR more portable.

When Are Virtual Registers Mapped to Actual Locations?

This mapping occurs during the backend stage of the compilation pipeline, specifically in the
register allocation phase. Here’s how the process works:

1.​ Lowering the SSA Form:


○​ Before register allocation, the SSA form is "lowered."
○​ For example, phi nodes (used for merging values in SSA) are converted into
explicit moves and control-flow instructions.
2.​ Register Allocation:
○​ The compiler assigns physical registers from the target architecture to the virtual
registers. This process considers:
■​ The limited number of physical registers.
■​ The need to minimize spills (storing values in memory when registers are
insufficient).
○​ Algorithms like graph coloring or linear scan are often used for register
allocation.
3.​ Spilling:
○​ If there are more virtual registers than available physical registers, some virtual
registers are spilled. This means their values are stored in memory and loaded
back as needed.
4.​ Code Emission:
○​ The LLVM backend generates machine code where all virtual registers are
replaced by either physical registers or memory locations.


Why Is This Design Useful?

●​ Optimization-Friendly: An infinite register model eliminates the need to handle register pressure during optimization. Compiler optimizations (e.g., constant propagation, loop unrolling) are easier to implement when there's no risk of running out of registers.
●​ Portability: LLVM IR can remain the same regardless of the target hardware. The
details of how many registers the hardware has are handled later, during code
generation.

Example: Mapping Virtual to Physical Registers

Consider a simple LLVM IR snippet:

%1 = add i32 %a, %b
%2 = mul i32 %1, 2
ret i32 %2

In SSA form:

●​ %1 and %2 are virtual registers that store intermediate results.

During register allocation for a target with physical registers R0, R1, and so on:

●​ %a and %b might already be mapped to R0 and R1.
●​ %1 might be assigned to R2.
●​ %2 might reuse R1 if R1 is no longer needed for %b.

The resulting assembly might look like:

ADD R2, R0, R1   ; R2 = R0 + R1
MUL R1, R2, #2   ; R1 = R2 * 2
RET R1           ; Return R1

If there are not enough registers, some of these intermediate results might be spilled to memory.
For instance:

ADD R2, R0, R1     ; R2 = R0 + R1
STR R2, [SP, #-4]! ; Spill R2 to the stack
LDR R3, [SP], #4   ; Load R2 from the stack into R3
MUL R1, R3, #2     ; R1 = R3 * 2
RET R1             ; Return R1


This separation of virtual and physical registers allows LLVM to be both powerful and portable,
catering to a wide range of optimizations and target architectures.

It seems that all the info necessary to run the program is embedded in the
LLVM IR. Can we interpret this IR to see the results of the execution?
Yes, we can! For instance, to run the program above, you can do:

$> lli count_char.ll some string here
The input string 'some' has 4 characters.

The LLVM Intermediate Representation (IR) is primarily designed as a portable, low-level representation for performing sophisticated compiler optimizations and facilitating code
generation across multiple architectures. Its main purpose is not to serve as a direct execution
format, but LLVM's design allows it to be interpreted or executed using tools like lli, which is
the LLVM interpreter.

Here's a breakdown:

1.​ Primary Purpose: Optimization and Code Generation​


LLVM IR is structured to be machine-independent, enabling sophisticated optimizations
that are applicable across multiple target architectures. These optimizations occur in the
front-end (high-level code to LLVM IR) and back-end (LLVM IR to machine code) phases
of the compilation process.
2.​ Interpretation via lli​
The lli tool provides a way to interpret LLVM IR by executing it directly using the LLVM
Just-In-Time (JIT) compiler or an interpreter:
○​ When run with JIT support, lli translates the LLVM IR into native machine code
at runtime and executes it.
○​ In pure interpretation mode, it simulates the execution of LLVM IR instructions
without translating them to machine code.
3.​ Is Interpretation a Goal of LLVM IR?​
Interpretation is more of a feature than a core goal:
○​ The ability to interpret LLVM IR using lli is a practical way to test and debug IR
code without requiring it to be compiled to native machine code.
○​ It helps in scenarios where a lightweight execution environment is desired (e.g.,
for research or prototyping).

Conclusion

LLVM IR is not designed specifically for interpretation, but its architecture allows it to be
interpreted or executed via tools like lli. This flexibility is a byproduct of its design for
optimization and portability, making LLVM IR versatile for tasks beyond its primary role.


So, the LLVM IR is a "Virtual Instruction Set". But the Java Bytecodes are
also a "Virtual Instruction Set". How are they different?
LLVM IR and Java bytecode are both virtual instruction sets, but they serve different purposes,
have different designs, and are intended for different execution environments. Here's a
comparison between them:

1. Purpose and Ecosystem

●​ LLVM IR (Intermediate Representation):


○​ Purpose: LLVM IR is used primarily as an intermediate representation (IR) in the
LLVM compiler framework, which allows for optimizations and transformations
before generating machine code for multiple architectures (e.g., x86, ARM,
RISC-V).
○​ Ecosystem: LLVM IR is designed to support many programming languages
(e.g., C, C++, Rust, Swift). It acts as a portable, machine-independent code
representation that can be compiled into various target architectures.
○​ Execution: LLVM IR is not usually directly executed; it's compiled further into
native machine code via the LLVM backend or interpreted by tools like the LLVM
interpreter.
●​ Java Bytecode:
○​ Purpose: Java bytecode is designed to be executed on the Java Virtual Machine
(JVM). It is a portable format specifically for Java programs (and other JVM
languages like Kotlin, Scala, and Groovy).
○​ Ecosystem: Java bytecode is tailored to the Java ecosystem. It allows Java
programs to run on any platform with a compatible JVM, making the code "write
once, run anywhere."
○​ Execution: Java bytecode is executed on the JVM, either through interpretation
or Just-In-Time (JIT) compilation to machine code at runtime.

2. Instruction Set Design

●​ LLVM IR:
○​ Low-level Representation: LLVM IR is closer to assembly language but still
abstract enough to be platform-independent. It includes instructions for primitive
operations (arithmetic, memory access, branching) and supports features like
type information, SSA (Static Single Assignment) form, and control flow.
○​ Three-Address Code: LLVM IR instructions typically operate in a three-address
format, meaning instructions take a destination and two source operands (e.g.,
%1 = add i32 %2, %3).
○​ Type System: LLVM IR has a rich type system, supporting a wide range of
primitive and complex types, including pointers, arrays, structures, and vectors. It
allows fine-grained control over data representation.


○​ Target Independence: LLVM IR is designed to be optimized at a high level and then lowered into architecture-specific machine code. This abstraction makes it very powerful for cross-compilation and code optimization.
●​ Java Bytecode:
○​ Higher-level Abstraction: Java bytecode is higher-level compared to LLVM IR.
It is designed for object-oriented execution, supporting classes, objects, and
methods, along with instructions specific to object creation, method invocation,
and exception handling.
○​ Stack-based Instructions: Java bytecode is stack-based, meaning most
instructions push operands onto a stack and pop the results. For example, iadd
pops two integers off the stack, adds them, and pushes the result back onto the
stack.
○​ Simpler Type System: Java bytecode uses a more simplified type system
focused on JVM primitives like integers, floats, references, and arrays. It does not
deal with low-level types like pointers or memory management directly (since the
JVM handles memory).
○​ JVM Specific: Java bytecode is meant to run in the JVM environment, which
abstracts away many details like memory management (handled by garbage
collection), exception handling, and thread management. It is optimized for the
JVM's execution model.

3. Compilation and Execution

●​ LLVM IR:
○​ Compilation Stages: Source code (e.g., in C, C++, Rust) is first compiled into
LLVM IR, which can then be further optimized using the LLVM toolchain. Finally, it
is compiled into native machine code for specific architectures.
○​ JIT Compilation: LLVM also supports JIT compilation, where LLVM IR can be
compiled into machine code at runtime, making it flexible for use in dynamic
languages or environments where runtime code generation is needed.
●​ Java Bytecode:
○​ Bytecode Interpretation and JIT: Java bytecode is typically interpreted by the
JVM at runtime. However, JVMs also employ JIT compilation, where frequently
used bytecode is compiled into native machine code for performance
improvements.
○​ Write Once, Run Anywhere: Since Java bytecode is platform-independent, it
can run on any system with a compatible JVM. This contrasts with LLVM IR,
which is more of a middle step in a compilation process and not directly tied to a
specific runtime environment like the JVM.

4. Optimization and Flexibility

●​ LLVM IR:


○​ Optimization Focus: LLVM IR is designed to be highly optimizable. The LLVM toolchain provides numerous optimization passes that operate on LLVM IR,
including dead code elimination, loop unrolling, constant propagation, and
inlining. These optimizations are performed before the final machine code is
generated.
○​ Flexibility: LLVM IR is used in a variety of contexts, from general-purpose
compilers to specialized tools like those for graphics, cryptography, or machine
learning. It's highly flexible and can support various programming paradigms and
domains.
●​ Java Bytecode:
○​ JVM Optimizations: The JVM provides runtime optimizations such as method
inlining, hotspot detection, and garbage collection. However, these optimizations
are at a higher level than those in LLVM IR.
○​ Portability Focus: Java bytecode is more focused on portability and seamless
execution across multiple platforms, with the JVM providing most of the
underlying optimizations and system-specific details.

5. Portability

●​ LLVM IR:
○​ While LLVM IR is portable in terms of the machine code it can generate, it is not
designed for direct portability in the way Java bytecode is. LLVM IR is a
compilation step, not an execution format.
●​ Java Bytecode:
○​ Java bytecode is portable across any platform that supports the JVM. This allows
developers to compile once and run their programs on any system without
recompilation, making it extremely portable.

Summary Table

| Feature            | LLVM IR                                         | Java Bytecode                   |
|--------------------|-------------------------------------------------|---------------------------------|
| Primary Use        | Intermediate step in compilation                | Execution on JVM                |
| Architecture       | Closer to machine code (assembly-like)          | Higher-level, JVM-specific      |
| Instruction Design | Three-address code, SSA form                    | Stack-based execution           |
| Type System        | Rich, with pointers, structs, etc.              | Simplified, JVM-focused types   |
| Compilation        | Compiled into native machine code               | Executed on JVM or JIT compiled |
| Execution Model    | Cross-compilation and optimization              | Write once, run anywhere        |
| Optimizations      | LLVM-specific, low-level optimizations          | JVM runtime optimizations       |
| Portability        | Portable across architectures (via compilation) | Fully portable on any JVM       |

In summary, LLVM IR is designed for optimization and portability during the compilation
process, while Java bytecode is intended for portability and execution on the JVM.

Can you show examples of Java bytecodes compared with a similar
program written in the LLVM intermediate representation?
Sure, let's look at a non-recursive version of factorial:

The figure compares Java bytecode and LLVM IR by showing equivalent factorial
implementations in Java and C, along with their compiled representations. Key differences
include:

●​ Java Bytecode: It is a stack-based instruction set focused on higher-level operations, such as loading variables and simple operations like imul (integer multiplication), working closely with the JVM's stack and supporting portability.


●​ LLVM IR: It uses a three-address code style with more detailed control flow (using labels
and phi nodes) and explicit type information (i32 for integers). LLVM IR is designed for
optimizations and machine code generation across different architectures, while Java
bytecode is optimized for execution on the JVM.

In contrast to the LLVM IR, Java bytecodes were designed with
interpretation in mind. Do you agree?
Yes, Java Bytecode was explicitly designed for interpretation and execution within the JVM,
making it a natural fit for platform-independent program execution. In contrast, LLVM IR was
designed as a compiler infrastructure tool for optimization and code generation, with
interpretation being an incidental feature rather than a core goal. The design of Java Bytecode
and LLVM IR reflects their intended purposes, and several features make Java Bytecode
well-suited for interpretation, while LLVM IR is less ideal for this. Here’s a comparison
highlighting the key differences:

Features of Java Bytecode That Make It Good for Interpretation

1.​ Stack-Based Architecture


○​ Java Bytecode uses a stack-based execution model, where most operations
involve pushing operands onto the stack and popping results off it. This makes
interpretation straightforward, as the interpreter doesn’t need to manage registers
or complex instruction dependencies.
○​ Example: iload, iadd, and istore instructions push and pop values directly,
avoiding the need for register allocation.
2.​ Compact and Simple Instruction Set
○​ Java Bytecode has a small, fixed-size instruction set that is easy to decode
and execute. Each instruction performs a simple, well-defined operation (e.g.,
load, store, add, invoke), making it easy to map directly to interpretation logic.
○​ Many instructions are tightly coupled with the JVM's runtime environment, like
object creation (new), method invocation (invokevirtual), and exception
handling (athrow).
3.​ Designed for Virtual Machine Integration
○​ Java Bytecode assumes it will always run within the JVM, so its features are
tightly integrated with the JVM runtime, including garbage collection, exception
handling, and class loading. These high-level abstractions simplify interpretation.
○​ The constant pool (a table of symbolic references) enables runtime
resolution of classes, methods, and fields, making interpretation dynamic and
flexible.

Features of LLVM IR That Make It Less Suitable for Interpretation


1.​ Register-Based Model


○​ LLVM IR uses a register-based execution model, with an infinite number of
virtual registers. While this is excellent for optimizations and mapping to real
hardware, it complicates interpretation because interpreters must simulate the
register state and dependencies.
2.​ Lack of Built-In Runtime Support
○​ LLVM IR is not tied to a specific runtime environment. Features like exception
handling, object management, and memory allocation are implemented as part of
the front-end or runtime library, not built into the IR. This requires interpreters to
deal with more infrastructure to execute programs.
3.​ More Complex Instructions
○​ LLVM IR instructions can be quite complex, supporting advanced features like
vectorized operations, intrinsics, and undefined behavior (undef values). This
complexity makes interpretation harder and slower compared to Java Bytecode's
simpler operations.
4.​ Machine-Specific Optimizations
○​ LLVM IR includes optimizations and abstractions that target specific machine
architectures, such as explicit control over calling conventions and low-level bit
manipulations. These features make LLVM IR efficient for compilation but
burdensome for interpretation.

Illustrative Example: Addition

Below we show how addition is represented in Java Bytecode:

iload_1  ; Load integer from local variable 1 onto the stack
iload_2  ; Load integer from local variable 2 onto the stack
iadd     ; Pop two integers, add them, and push the result
istore_3 ; Store result into local variable 3

The interpreter just follows the stack operations directly. Now, see how addition is represented in
the LLVM IR:

%1 = load i32, i32* %a     ; Load integer from memory
%2 = load i32, i32* %b     ; Load integer from memory
%3 = add i32 %1, %2        ; Add integers
store i32 %3, i32* %result ; Store result in memory

The interpreter must manage registers (%1, %2, %3), memory locations, and types explicitly.


The process of transforming LLVM IR to machine code is what we call
"Machine Code Generation". Could you explain this process?
The process of Machine Code Generation (also known as LLVM IR to Assembly lowering) in
the context of the LLVM compiler involves multiple sophisticated steps, such as instruction
selection, instruction scheduling, and register allocation. Each of these plays a critical role
in translating the high-level, architecture-independent LLVM Intermediate Representation (IR)
into the low-level, architecture-specific machine code. Here’s an overview of this process, the
phases involved, and the tools used in LLVM.

1. Instruction Selection:

●​ Instruction selection is the phase where high-level, target-independent LLVM IR instructions are transformed into target-specific machine instructions.
●​ For instance, an LLVM IR instruction like add i32 %a, %b is abstract and can be
mapped to different machine instructions depending on the target architecture (e.g., ADD
on x86 or ADDI on RISC-V).
●​ In LLVM, this is handled by TableGen, a tool used to describe the instruction set of a
target architecture. TableGen provides pattern matching rules that specify how LLVM IR
operations are translated into machine instructions for a specific target.
○​ These patterns are often called SelectionDAG patterns, which form a Directed
Acyclic Graph of instructions that map to the machine instructions.

Tool: TableGen, along with SelectionDAG ISel (Instruction Selector), plays a significant role in
this phase.

2. Instruction Scheduling:

●​ After instruction selection, the generated machine instructions may need to be rearranged to optimize performance.
●​ Instruction scheduling ensures that instructions are executed in an order that
maximizes the efficiency of the hardware. This step typically involves resolving
dependencies between instructions (e.g., ensuring that an instruction that writes to a
register completes before a subsequent instruction reads from that register).
●​ Instruction scheduling also considers hardware constraints, such as pipeline stages and
available functional units, to avoid delays (e.g., stalls) in the CPU pipeline.
●​ LLVM uses MachineScheduler or PostRA MachineScheduler (after register allocation)
to reorder instructions based on a target-specific scheduling model. This scheduling
model is described using TableGen to model the architecture’s latency, throughput, and
resource constraints.

Tool: LLVM’s MachineScheduler (part of the LLVM codegen backend) and TableGen for
target-specific scheduling models.


3. Register Allocation:

●​ In LLVM IR, variables (or SSA values) are assumed to have an infinite number of
registers, but real hardware has a limited number of physical registers. Register
allocation is the phase where these virtual registers are mapped to the physical
registers of the target architecture.
●​ Register allocation involves two key tasks:
○​ Register assignment: Mapping virtual registers to physical registers.
○​ Spilling: If there aren’t enough physical registers to hold all values, some values
must be spilled to memory (typically the stack), and load/store instructions are
inserted to handle these spills.
●​ LLVM provides several register allocators, such as:
○​ Linear Scan Register Allocator: A simpler, faster allocator, though not as
optimal in terms of minimizing spills.
○​ Greedy Register Allocator: A more sophisticated allocator that considers live
ranges and aims to minimize the number of spills and reloads.

Tool: LLVM’s Greedy or Linear Scan Register Allocators.

4. Additional Phases in Machine Code Generation:

After instruction selection, scheduling, and register allocation, several other steps take place to
fully prepare the machine code for execution:

●​ Prolog/Epilog Insertion: This step involves inserting code for function entry and exit,
such as saving/restoring registers to/from the stack, setting up the stack frame, etc.
●​ Peephole Optimization: This phase performs small, local optimizations to further
simplify and streamline the machine code, such as removing redundant instructions or
simplifying instruction sequences.
●​ Code Emission: The final step, where the machine instructions are emitted as assembly
code or directly as object code. This is where the tool llc comes into play.

5. LLVM Tools Involved in Machine Code Generation:

●​ llc (LLVM static compiler): This tool is responsible for converting LLVM IR into
machine-specific assembly or object code. It leverages the results of instruction
selection, scheduling, and register allocation to produce optimized machine code.
○​ Example: llc input.ll -o output.s generates assembly code
(output.s) from the LLVM IR file (input.ll).
●​ TableGen: This is a meta-programming tool in LLVM that helps define target-specific
details, such as instruction sets, register classes, instruction scheduling models, etc. It
generates C++ code for instruction selection, register allocation, and instruction
scheduling.


○​ Example: TableGen files (*.td) describe the mapping of LLVM IR instructions to specific machine instructions, the target’s calling convention, and register sets.
●​ LLVM’s Backend: The backend includes target-specific code for generating machine
code for various architectures. Each target has its own implementation for instruction
selection, scheduling, register allocation, and more.

6. Process Flow of Machine Code Generation in LLVM:

1.​ Instruction Selection: LLVM IR instructions are mapped to target-specific machine instructions using pattern matching rules defined in TableGen. This happens in the SelectionDAG phase.
2.​ Instruction Scheduling: The instructions are reordered based on the CPU's pipeline
model to minimize stalls and maximize parallel execution. This is done by the
MachineScheduler.
3.​ Register Allocation: Virtual registers are mapped to physical registers. Spills to memory
are inserted if there aren’t enough registers.
4.​ Prolog/Epilog Insertion: The prologue and epilogue for functions are generated,
handling the function’s stack frame and register saving/restoring.
5.​ Code Emission: Finally, the machine instructions are emitted as either assembly code
or object code.
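To give a feel for how these five steps fit together, here is a deliberately oversimplified toy pipeline in Python. Every name in it (select, schedule, allocate, emit, PATTERNS) is an illustrative invention for this sketch, not an actual LLVM API:

```python
# 1. Instruction selection: map IR operations to machine opcodes.
PATTERNS = {"add": "ADDrr", "mul": "MULrr"}

def select(ir):
    return [(PATTERNS[op], dst, a, b) for (op, dst, a, b) in ir]

# 2. Instruction scheduling: a no-op here; a real scheduler would
# reorder instructions to fit the target's pipeline model.
def schedule(insts):
    return list(insts)

# 3. Register allocation: map each virtual register to a physical one.
# With no register pressure, just hand out a fresh physical register
# per virtual register (no spilling needed in this toy).
def allocate(insts):
    mapping = {}
    def phys(v):
        return mapping.setdefault(v, f"x{len(mapping)}")
    return [(op, phys(dst), phys(a), phys(b)) for (op, dst, a, b) in insts]

# 4. Prolog/epilog insertion is skipped in this toy.
# 5. Code emission: render the instructions in assembly-like syntax.
def emit(insts):
    return [f"{op} {dst}, {a}, {b}" for (op, dst, a, b) in insts]

def backend(ir):
    return emit(allocate(schedule(select(ir))))

asm = backend([("add", "v2", "v0", "v1"), ("mul", "v3", "v2", "v0")])
print("\n".join(asm))
```

Running the toy on two IR tuples prints `ADDrr x0, x1, x2` and `MULrr x3, x0, x1`: the virtual names v0..v3 have been replaced by physical registers, which is the essence of what the real (vastly more sophisticated) LLVM pipeline does.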

Summary:

●​ Machine Code Generation involves transforming the target-independent LLVM IR into optimized, target-specific assembly or machine code.
●​ Instruction Selection (TableGen, SelectionDAG) maps LLVM IR to machine
instructions.
●​ Instruction Scheduling (MachineScheduler) reorders instructions to optimize CPU
pipeline usage.
●​ Register Allocation (Greedy/Linear Scan Register Allocators) maps virtual registers to
physical ones and handles spilling.
●​ llc is the tool that brings all of these steps together and emits the final assembly or
object code.

This process allows LLVM to efficiently compile programs for a wide variety of architectures and
is at the heart of LLVM’s portability and optimization capabilities.

Let's use objdump to inspect the binary code that is produced by clang.
Can you go over the binary, and figure it out?
To produce the binary file and inspect it with objdump, do the following:

$ clang count_char.c -o count_char.exe
$ objdump -d count_char.exe


When using objdump -d to disassemble the binary generated by your program, you're looking
at its low-level machine code. This output shows the ARM64 assembly code for different
sections of your program, primarily focusing on the __TEXT,__text section (where your actual
program's machine code resides) and the __stubs and __stub_helper sections (which are
involved in dynamic linking).

Here are some interesting aspects of this disassembly:

1. Function Prologue and Epilogue

●​ Prologue: The sequence at the beginning of _main is a typical ARM64 function prologue. It reserves space on the stack for local variables, saves the frame pointer (x29) and the return address (x30), and adjusts the stack pointer (sp).

100003e84: ff 43 01 d1 sub sp, sp, #80
100003e88: fd 7b 04 a9 stp x29, x30, [sp, #64]
100003e8c: fd 03 01 91 add x29, sp, #64

●​ Stack allocation: The sub sp, sp, #80 instruction reserves 80 bytes on the
stack.
●​ Save frame pointer and return address: The stp x29, x30, [sp, #64]
stores the old frame pointer and return address.

●​ Epilogue: Similarly, at the end of the function, the saved frame pointer and return
address are restored, and the stack is deallocated. This returns control to the caller by
restoring the stack and returning with ret:

100003f38: fd 7b 44 a9 ldp x29, x30, [sp, #64]
100003f3c: ff 43 01 91 add sp, sp, #80
100003f40: c0 03 5f d6 ret

2. Stack Usage and Argument Handling

ARM64 uses the stack to manage local variables and function arguments:

100003e98: a1 03 1f f8 stur x1, [x29, #-16]
100003e9c: a8 83 5f b8 ldur w8, [x29, #-8]

These instructions manipulate data on the stack, such as storing and loading function
arguments or local variables.

For instance:


●​ stur x1, [x29, #-16]: Stores the value of x1 at an offset of 16 bytes below the
frame pointer.
●​ ldur w8, [x29, #-8]: Loads a 32-bit value from 8 bytes below the frame pointer
into register w8.

3. Conditional Branching

Conditional branches, such as:

100003ea0: 1f 09 00 71 cmp w8, #2
100003ea4: 6a 01 00 54 b.ge 0x100003ed0 <_main+0x4c>

Here, the cmp w8, #2 compares the value in w8 with 2, and b.ge branches if the comparison
results in greater-than or equal (signed comparison).

4. Dynamic Linking and Stubs

Sections __stubs and __stub_helper are related to dynamic linking, which is used to
resolve external functions at runtime (like library functions). This is managed by the
dyld_stub_binder, a part of the dynamic linker that resolves addresses of external functions.

●​ The __stubs section has a branch through register (indirect jump) that takes the
program to the dynamically linked function:

100003f48: d0 05 02 58 ldr x16, 0x100008000 <dyld_stub_binder+0x100008000>
100003f4c: 00 02 1f d6 br x16

●​ The __stub_helper section is involved in setting up the address for the external
function and managing stack frames.

5. Calling External Functions

In the main section, you see a call to bl (branch and link), which is an instruction used to call
external functions. The address of dyld_stub_binder is resolved and used to perform
dynamic linking:

100003ec0: 21 00 00 94 bl 0x100003f44 <dyld_stub_binder+0x100003f44>

6. Instruction Formats

●​ ARM64 instructions are 32 bits wide and use a fixed instruction format. You see a variety
of data manipulation instructions such as ldr (load register), str (store register), and
branch instructions like b and bl.


What's actually objdump, and what else can it do?

objdump is a powerful tool used to inspect and analyze binary files. In this case, you are using
it to disassemble your binary and inspect the generated assembly code. Beyond disassembly,
objdump can also:

●​ Display headers of object files, sections, and symbols.
●​ Disassemble specific sections of a binary.
●​ Show the raw bytecode of machine instructions.

Some useful options with objdump include:

●​ -d: Disassemble the text section (executable code).
●​ -x: Display all headers, including section headers and symbol information.
●​ -s: Display the contents of a section as a hex dump.

In summary, you're seeing the machine-level translation of your program, with key features such
as function prologues/epilogues, stack usage, control flow (branching), and dynamic linking
mechanisms, all captured by objdump.

Can you explain to me the output of objdump -x?


Sure. Let's go over the example below, a section-contents dump obtained with objdump -s a.out:


The output of objdump -s a.out shows the contents of different sections in the compiled
binary a.out. These sections organize various parts of the program's data and instructions.
Here's an explanation of each section in the output:

Sections Breakdown

1.​ __text
○​ Purpose: Contains the program's machine code (compiled instructions).
○​ Details:
■​ The hexadecimal numbers are ARM64 instructions (based on the format).
■​ This section represents the executable logic of your program, including
the main function and any other compiled functions.
2.​ __stubs
○​ Purpose: Holds stubs (placeholders) for dynamically linked functions like
printf.
○​ Details:
■​ When a dynamically linked function is called for the first time, the stub
redirects control to the dynamic linker to resolve the function's address.
3.​ __stub_helper
○​ Purpose: Assists in resolving symbols during dynamic linking by providing helper
code.
○​ Details:
■​ This section contains code that helps with lazy symbol resolution.
4.​ __cstring
○​ Purpose: Stores constant strings used in the program.
○​ Includes:
■​ Usage: %s <input_string>
■​ The input string '%s' has %d characters.
5.​ __unwind_info
○​ Purpose: Contains metadata for stack unwinding, used in exception handling
and debugging.
○​ Details:
■​ This metadata helps restore the state of the program if an exception
occurs or during stack tracing for debugging purposes.
6.​ __got (Global Offset Table)
○​ Purpose: Holds addresses of dynamically resolved symbols (functions or
variables).
○​ Details:
■​ Initially, the entries may be zeros or placeholders.
■​ At runtime, the dynamic linker populates this table with actual addresses.
7.​ __la_symbol_ptr


○​ Purpose: Contains pointers for lazy binding of symbols.
○​ Details:
■​ Lazy binding defers resolving a symbol's address until it is first used,
improving program startup time.
8.​ __data
○​ Purpose: Stores initialized global and static variables.
○​ Details:
■​ In your program, this section appears empty because you haven’t defined
global or static variables that require initialization.

Highlighted Features in the Output

1.​ __text Section:
○​ Contains the actual compiled machine instructions for your program.
○​ Disassembling it (using objdump -d a.out) would reveal human-readable
assembly code.
2.​ __cstring Section:
○​ Clearly shows the constant strings used in your program.
○​ These strings are directly referenced in the compiled code, typically through
pointers.
3.​ Dynamic Linking Sections (__stubs, __stub_helper, __got, and
__la_symbol_ptr):
○​ These sections work together to handle dynamically linked functions like
printf.
○​ Instead of embedding the code for printf in your binary, the compiler includes
placeholders and pointers to resolve printf's address at runtime.

Why This Happens

The binary file you compiled is dynamically linked and uses the default build configuration. This
results in additional sections to support runtime features like:

●​ Dynamic linking: Resolves external symbols like printf at runtime.
●​ Debugging and exception handling: Provides metadata for stack unwinding.
●​ Efficient string storage: Centralizes constant strings in the __cstring section.

If you want a more streamlined binary:


●​ Static linking: Use gcc -static -o a.out program.c to embed all dependencies
into the binary.
●​ Stripping symbols: Use strip a.out to remove debugging and linking metadata.

Tips for Exploration

1.​ Disassemble the Code: Run objdump -d a.out to see the assembly code
corresponding to the __text section.
2.​ Extract Strings: Use strings a.out to list all readable strings, like the ones in
__cstring.
3.​ Analyze Dynamic Linking: Use ldd a.out to see the shared libraries your binary
depends on.

There are many different kinds of machine instructions. Some of them follow a philosophy called RISC. Others follow the CISC philosophy. Have you ever heard of that?
Let me explain the general differences between RISC and CISC architectures, and how they are
typically represented:

RISC (Reduced Instruction Set Computer):

●​ Simpler Instructions: RISC architecture uses simple, fixed-length instructions, generally performing one operation per instruction.
●​ Load/Store Architecture: In RISC, instructions typically either load/store data to/from
memory or perform arithmetic/logic operations, but not both in one instruction.
●​ Fixed Instruction Format: The instruction format in RISC is often uniform, meaning all
instructions are the same size (e.g., 32-bit) and are divided into fields like opcode,
source registers, destination registers, and immediate values.
●​ More Registers: RISC architectures usually have a larger set of general-purpose
registers to minimize memory accesses.

CISC (Complex Instruction Set Computer):

●​ Complex Instructions: CISC instructions are often variable-length and can perform
multiple operations in a single instruction (e.g., load from memory and perform arithmetic
in one go).
●​ Memory-to-Memory Instructions: CISC instructions can work directly on memory,
unlike RISC, where memory is accessed only via load/store instructions.


●​ Variable Instruction Format: CISC architectures have variable-length instructions, which can be more difficult to decode. An instruction might be as short as one byte or span multiple bytes.
span multiple bytes.
●​ Fewer Registers: Since CISC allows for more complex instructions that work directly on
memory, the number of registers tends to be fewer.
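The contrast can be made concrete with a small simulation. In this sketch, memory and registers are plain Python dictionaries, and the "instructions" in the comments are schematic illustrations, not any real ISA's syntax:

```python
# Memory and registers modeled as dictionaries; the addresses and
# register names are made up for illustration.
mem = {0x100: 5, 0x200: 5}
regs = {"r1": 7}

# CISC style: one instruction reads memory, adds, and writes memory back:
#   ADD [0x100], r1
mem[0x100] = mem[0x100] + regs["r1"]

# RISC style: the same effect takes three simple instructions, because
# memory is touched only through explicit loads and stores:
regs["r2"] = mem[0x200]                 # lw   r2, 0(0x200)
regs["r2"] = regs["r2"] + regs["r1"]    # add  r2, r2, r1
mem[0x200] = regs["r2"]                 # sw   r2, 0(0x200)

print(mem[0x100], mem[0x200])  # both cells end up holding 12
```

Both styles compute the same result; the difference is in how many instructions the programmer (or compiler) writes, and how much work each one does.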

Can you give me examples of CISC and RISC instruction sets?


Here are some examples of CISC and RISC instruction sets to help illustrate the differences:

CISC Instruction Sets

CISC architectures tend to have more complex instructions that may execute multiple
operations. Examples of CISC instruction sets include:

1.​ x86 (Intel and AMD processors):
○​ One of the most well-known CISC architectures.
○​ Instructions can be variable in length, from 1 to 15 bytes.
○​ Example Instruction:
■​ MOV AX, [1234] — This instruction loads data from memory location
1234 directly into the AX register.
■​ LOOP — This instruction automatically decrements a counter and jumps to
a specific memory location.
○​ x86 processors are designed to support backward compatibility, which has led to
a large number of instructions and complex instruction formats.
2.​ VAX (Digital Equipment Corporation):
○​ Known for its extremely rich and complex instruction set.
○​ Example Instruction:
■​ ADDL3 R1, R2, R3 — Adds the contents of R1 to R2 and stores the
result in R3.
■​ VAX instructions could operate on memory and registers, and they had
many addressing modes.
3.​ IBM System/360 and successors (mainframe computers):
○​ Another example of a CISC architecture.
○​ Instructions could perform tasks like loading and storing multiple values in
memory, performing arithmetic directly on memory locations, and even executing
instructions conditionally based on memory contents.

RISC Instruction Sets

RISC architectures aim to simplify instructions, focusing on instructions that execute in a single
clock cycle. Examples of RISC instruction sets include:

1.​ ARM (Advanced RISC Machines):


○​ One of the most widely used RISC architectures, especially in mobile devices.
○​ ARM uses a fixed instruction length (typically 32 bits in older versions, 64 bits in
ARM64).
○​ Example Instruction:
■​ ADD R0, R1, R2 — Adds the contents of R1 and R2 and stores the
result in R0.
■​ LDR R0, [R1] — Loads data from memory (address held in R1) into R0.
2.​ MIPS (Microprocessor without Interlocked Pipeline Stages):
○​ A clean RISC architecture that focuses on simplicity and speed.
○​ Example Instruction:
■​ ADD $t0, $t1, $t2 — Adds registers $t1 and $t2, stores the result
in $t0.
■​ LW $t0, 4($t1) — Loads a word from memory address 4 + $t1 into
register $t0.
3.​ RISC-V:
○​ A newer open-source RISC instruction set architecture.
○​ Example Instruction:
■​ ADD x1, x2, x3 — Adds registers x2 and x3, stores the result in x1.
■​ LW x1, 0(x2) — Loads a word from the memory address stored in x2
into x1.
4.​ SPARC (Scalable Processor Architecture):
○​ Developed by Sun Microsystems.
○​ Example Instruction:
■​ ADD %r1, %r2, %r3 — Adds the contents of %r1 and %r2, stores the
result in %r3.
■​ LD [%r1+4], %r2 — Loads a word from memory address %r1 + 4
into register %r2.

Key Differences Between CISC and RISC Examples:

●​ CISC (e.g., x86, VAX) instructions often combine multiple operations (e.g., load from
memory and perform arithmetic in one step).
●​ RISC (e.g., ARM, MIPS, RISC-V) instructions are simpler, with separate instructions for
each step (e.g., first load from memory, then perform arithmetic).

CISC instructions tend to have more complex operations but may take multiple cycles to
execute, while RISC instructions are designed to be simple and execute in a single clock cycle.


In this course's homework, students will generate code using RISC-V. What is RISC-V, and which of its instructions shall we be using?
RISC-V is an open-source instruction set architecture (ISA) based on the principles of reduced
instruction set computing (RISC). It provides a simple and flexible framework for designing
processors that range from low-power microcontrollers to high-performance computers.
RISC-V’s modularity allows for extensions (like multiplication or floating-point support) without
unnecessary complexity, making it ideal for educational use and research.

In your course, students will work with 14 key RISC-V instructions that cover arithmetic, memory
access, and control flow. Here's a brief explanation of each:

1.​ add rd, rs1, rs2: Performs an addition of the values in registers rs1 and rs2 and
stores the result in rd. Example: add x3, x1, x2 adds x1 and x2 and stores the
result in x3.
2.​ addi rd, rs1, imm: Adds an immediate value (constant) imm to the value in register
rs1 and stores the result in rd. Example: addi x3, x1, 5 adds 5 to the value in x1
and stores the result in x3.
3.​ mul rd, rs1, rs2: Multiplies the values in rs1 and rs2, storing the product in rd.
Example: mul x3, x1, x2 multiplies x1 by x2 and stores the result in x3.
4.​ sub rd, rs1, rs2: Subtracts the value in rs2 from rs1 and stores the result in rd.
Example: sub x3, x1, x2 computes x1 - x2 and stores the result in x3.
5.​ xor rd, rs1, rs2: Performs a bitwise XOR (exclusive OR) of rs1 and rs2, storing
the result in rd. Example: xor x3, x1, x2 computes x1 ^ x2 and stores the result
in x3.
6.​ xori rd, rs1, imm: Performs a bitwise XOR between rs1 and an immediate value
imm, storing the result in rd. Example: xori x3, x1, 5 XORs x1 with 5 and stores
the result in x3.
7.​ div rd, rs1, rs2: Divides the value in rs1 by the value in rs2 (signed integer
division) and stores the quotient in rd. Example: div x3, x1, x2 computes x1 / x2
and stores the quotient in x3.
8.​ slt rd, rs1, rs2: Sets rd to 1 if rs1 is less than rs2 (signed comparison),
otherwise sets rd to 0. Example: slt x3, x1, x2 checks if x1 < x2 and stores 1 or
0 in x3.
9.​ slti rd, rs1, imm: Sets rd to 1 if rs1 is less than the immediate value imm (signed
comparison), otherwise sets rd to 0. Example: slti x3, x1, 5 checks if x1 < 5
and stores 1 or 0 in x3.
10.​beq rs1, rs2, lab: Branches to the label lab if the values in rs1 and rs2 are
equal; otherwise, the program counter increments to the next instruction. Example: beq
x1, x2, label branches to label if x1 == x2.


11.​jal rd, lab: Jumps to the label lab and stores the address of the next instruction
(current program counter + 1) in rd. Example: jal x1, label jumps to label and
stores the return address in x1.
12.​jalr rd, rs1, offset: Jumps to the address in rs1 + offset and stores the
return address (current program counter + 1) in rd. Example: jalr x1, x2, 4 jumps
to the address x2 + 4 and stores the return address in x1.
13.​sw reg, offset(rs1): Stores the value in register reg to memory at the address
rs1 + offset. Example: sw x3, 4(x1) stores the value in x3 to memory at the
address x1 + 4.
14.​lw reg, offset(rs1): Loads a value from memory at the address rs1 + offset
into register reg. Example: lw x3, 4(x1) loads a value from memory at x1 + 4 into
x3.

These instructions form the basis for building more complex operations in RISC-V, focusing on
simplicity and efficiency, typical of RISC architectures. In the rest of these classes, RISC-V will
be our target representation! However, we shall assume one simplification: we will have an
infinite surplus of variable names, just like in the LLVM Intermediate representation.

Who invented this notion of control-flow graphs?


Frances E. Allen is widely credited with pioneering the concept of the control-flow graph (CFG)
and introducing many foundational ideas in program optimization and analysis. Her
contributions to computer science, especially in the field of compiler design and
high-performance computing, are significant and transformative.


Frances Allen's Contributions to Computer Science

1. Control-Flow Graph (CFG)

●​ Innovation: Frances Allen formalized the notion of the control-flow graph as a representation of a program’s structure, focusing on the flow of control through basic blocks (straight-line code sequences with a single entry and exit).
●​ Impact: CFGs are now fundamental to program analysis, enabling optimizations like
dead code elimination, loop unrolling, and data-flow analysis.

2. Program Optimization

●​ Allen was a trailblazer in developing program optimization techniques, especially for high-performance computing.
●​ Her work helped define data-flow analysis, which determines how data values
propagate through a program.
●​ She introduced concepts like dominators in a control-flow graph, which are essential for
optimization and SSA (Static Single Assignment) form.

3. Parallelization and High-Performance Computing

●​ Allen worked extensively on automatic parallelization techniques, enabling compilers to transform sequential programs into efficient parallel implementations.
●​ She was instrumental in designing tools for supercomputers, like IBM’s Stretch-Harvest
project, which tackled some of the earliest large-scale computing problems.


4. Interprocedural Optimization

●​ Allen contributed to interprocedural analysis, enabling optimizations across function boundaries, such as inlining and more accurate data-flow analysis.

5. Leadership and Legacy

●​ Allen mentored many researchers and established herself as a leader in computer science.
●​ She co-authored seminal papers that shaped compiler theory and practical compiler
implementations.

Key Achievements and Recognition

●​ First Woman to Win the Turing Award (2006):
○​ Awarded for her pioneering contributions to the theory and practice of optimizing compilers.
●​ Developed FORTRAN Compiler Improvements:
○​ Played a key role in improving FORTRAN, the first widely-used high-level
programming language.
●​ Work on Parallelism:
○​ Her work laid the foundation for modern compiler optimizations that leverage
multi-core and distributed computing architectures.

Broader Impact

Frances Allen's contributions extended beyond her technical achievements:

●​ Breaking Gender Barriers: As the first woman to win the Turing Award and a trailblazer
in a male-dominated field, she inspired generations of women in computing.
●​ Education and Advocacy: Allen actively promoted the importance of advancing
compiler technology for scientific and industrial applications.

Her groundbreaking contributions continue to influence programming languages, compilers, and systems research today. If you'd like, I can elaborate on specific aspects of her work, like her role in data-flow analysis or parallelization!


Code Generation for Expressions


In the last class we saw 14 different RISC-V instructions. Could you write
an operational semantics for these instructions?
Find below a set of operational semantic rules for RISC-V instructions:


Each of these rules operates within a state transition system, where the state (S) and registers
(R) are modified by the instructions, and the program counter (pc) advances. The rules describe
what happens to these components when a given instruction is executed. They specify:

●​ How values are loaded from registers (R[rs1], R[rs2]).


●​ How new values are stored into registers or memory.
●​ How the program counter (pc) is updated after each instruction.

Here goes an explanation of each rule:

1. BEQ (Branch if Equal, taken case)

●​ Compare the values in registers rs1 and rs2. If they are equal, the branch is taken: the program counter (pc) is updated by adding the immediate value imm.

2. BEQF (Branch if Equal, fall-through case)

●​ If R[rs1] is not equal to R[rs2], the branch is not taken, and the program counter simply advances to the next instruction.

3. JAL (Jump and Link)

●​ Save the address of the next instruction (pc + 1) in register rd and jump to the address
calculated by adding the immediate value imm to the program counter.

4. JALR (Jump and Link Register)

●​ Save the address of the next instruction (pc + 1) in register rd, then jump to the sum of
the value in register rs1 and the immediate value imm.

5. SW (Store Word)

●​ Store the value in register rs2 at the memory address calculated by adding the
immediate value imm to the value in register rs1.

6. LW (Load Word)

●​ Load the value from memory at the address calculated by adding the immediate value
imm to the value in register rs1, and store it in register rd.


7. ADD (Add)

●​ Add the values in registers rs1 and rs2 and store the result in register rd.

8. ADDI (Add Immediate)

●​ Add the immediate value imm to the value in register rs1 and store the result in register
rd.

9. MUL (Multiply)

●​ Multiply the values in registers rs1 and rs2, and store the result in register rd.

10. SUB (Subtract)

●​ Subtract the value in register rs2 from the value in register rs1, and store the result in
register rd.

11. XOR (Bitwise XOR)

●​ Perform a bitwise XOR of the values in registers rs1 and rs2, and store the result in
register rd.

12. XORI (Bitwise XOR Immediate)

●​ Perform a bitwise XOR between the value in register rs1 and the immediate value imm,
and store the result in register rd.

13. DIV (Divide)

●​ Divide the value in register rs1 by the value in register rs2 (signed integer division),
and store the quotient in register rd.

14. SLT (Set Less Than)

●​ If the value in register rs1 is less than the value in register rs2, set register rd to 1.
Otherwise, set it to 0.
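Since the figure with the rules themselves is not reproduced in this text version, here is one plausible rendering of the Add rule (rule 7) in inference-rule notation, following the transition-system conventions described above; the exact symbols may differ from the original figure:

```latex
% If the instruction at pc is "add rd, rs1, rs2", the register bank R
% gains a binding of rd to R[rs1] + R[rs2], the store S is unchanged,
% and the program counter advances by one.
\[
\frac{Prog[pc] = \mathtt{add}\ rd,\, rs1,\, rs2}
     {(S,\ R,\ pc) \rightarrow (S,\ R[rd \mapsto R[rs1] + R[rs2]],\ pc + 1)}
\quad (\textsc{Add})
\]
```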


That's the semantics of individual instructions. What would be the semantics of an entire program?
Below, we have transition rules for an entire program. These rules describe how a RISC-V
program is interpreted step by step, with each instruction being fetched and executed until the
program either completes (through EBase) or continues processing further instructions (via
EIndc):

Let's go over the semantic rules for full programs:

1. EIndc (Instruction Execution)

●​ This rule applies when the current instruction at pc is a valid instruction (i.e., Prog[pc]
= inst).
●​ First, the instruction is fetched from the program counter (pc) and executed, producing a
new state (S', R', and pc').
●​ After executing this instruction, the program continues with the updated state (S', R',
pc'), fetching and executing the next instruction.
●​ The process repeats until the full program is executed, transitioning through different
states (S'', R'', pc''), as indicated by multiple steps in the rule.

2. EBase (End of Program)

●​ This rule applies when the program counter (pc) reaches a location in the program
where there is no valid instruction (i.e., Prog[pc] = ⊥).
●​ At this point, the program has no more instructions to execute, and the final state
remains as it is (S, R, pc).
●​ This rule essentially signifies the end of the program's execution.
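A plausible rendering of these two rules in inference-rule notation, reconstructed from the descriptions above (so the exact symbols may differ from the original figure), is:

```latex
% EBase stops execution when there is no instruction at pc; EIndc
% executes one instruction and then continues from the resulting state.
\[
\frac{Prog[pc] = \bot}
     {(S,\ R,\ pc) \rightarrow^{*} (S,\ R,\ pc)}
\quad (\textsc{EBase})
\]
\[
\frac{Prog[pc] = inst
      \qquad (S,\ R,\ pc) \xrightarrow{\;inst\;} (S',\ R',\ pc')
      \qquad (S',\ R',\ pc') \rightarrow^{*} (S'',\ R'',\ pc'')}
     {(S,\ R,\ pc) \rightarrow^{*} (S'',\ R'',\ pc'')}
\quad (\textsc{EIndc})
\]
```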


Imagine that you had to write the rules above in Python. What would they
look like?
Here's an idea of how you could translate the operational semantics rules for RISC-V programs,
specifically EIndc and EBase, into Python-like pseudocode. This will give you a sense of how
the rules would be implemented programmatically.

EIndc Rule in Python

def execute_instruction(store, registers, pc, program):
    # Fetch the instruction at the current PC (⊥ once pc runs past the end)
    instruction = program[pc] if pc < len(program) else None

    # Check if the instruction exists (not end of program)
    if instruction is None:  # EBase
        # End of program, return final state (handled by EBase)
        return store, registers, pc
    else:  # EIndc
        # Execute the instruction and update the state (S', R', pc')
        new_store, new_registers, new_pc = execute(instruction, store, registers, pc)
        # Recursive call to continue executing the next instruction
        return execute_instruction(new_store, new_registers, new_pc, program)

# Auxiliary function to execute a specific instruction
def execute(instruction, store, registers, pc):
    # This would contain the logic to execute a single RISC-V instruction
    # and update the store, registers, and program counter accordingly
    new_store = dict(store)          # just an example; logic depends on the instruction
    new_registers = dict(registers)  # update registers based on the instruction
    new_pc = pc + 1                  # usually increment the PC, unless it is a jump
    return new_store, new_registers, new_pc

Key Ideas:

●​ EIndc: The function execute_instruction fetches the current instruction, executes it, updates the program state, and recursively calls itself to execute the next instruction. This simulates the step-by-step execution of the program.
●​ EBase: If the instruction is None, the program has reached its end (⊥), and the current
state is returned without further execution.

This Python-like pseudocode gives you an idea of how the recursive execution of a RISC-V
program could look in code form, corresponding to the rules you're working with!
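To make the sketch executable, here is one possible way to fill in the execute function for three of the instructions from the previous class (add, addi, and beq), plus a small driver loop. The tuple encoding of instructions and the iterative (rather than recursive) loop are assumptions of this sketch:

```python
def execute(instruction, store, registers, pc):
    # Executes a single instruction, returning the new (store, registers, pc).
    # Instructions are modeled as tuples, e.g. ("add", "x3", "x1", "x2").
    op = instruction[0]
    regs = dict(registers)
    if op == "add":                       # add rd, rs1, rs2
        _, rd, rs1, rs2 = instruction
        regs[rd] = regs[rs1] + regs[rs2]
        return store, regs, pc + 1
    elif op == "addi":                    # addi rd, rs1, imm
        _, rd, rs1, imm = instruction
        regs[rd] = regs[rs1] + imm
        return store, regs, pc + 1
    elif op == "beq":                     # beq rs1, rs2, imm
        _, rs1, rs2, imm = instruction
        if regs[rs1] == regs[rs2]:
            return store, regs, pc + imm  # branch taken
        return store, regs, pc + 1        # fall through
    raise ValueError(f"unknown instruction: {op}")

def run(program, registers):
    # Prog[pc] = ⊥ once pc runs past the end of the instruction list (EBase).
    store, regs, pc = {}, dict(registers), 0
    while pc < len(program):
        store, regs, pc = execute(program[pc], store, regs, pc)
    return regs

# Example: x2 = x0 + x1; skip the addi only when x2 equals x3 (which is 0).
program = [
    ("add", "x2", "x0", "x1"),
    ("beq", "x2", "x3", 2),
    ("addi", "x2", "x2", 10),
]
final = run(program, {"x0": 3, "x1": 4, "x2": 0, "x3": 0})
print(final["x2"])  # 3 + 4 = 7; branch not taken; 7 + 10 = 17
```

Note that pc here indexes instructions one by one (pc + 1), matching the simplified semantics above; on real hardware the program counter advances in bytes.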


Let's try to write a visitor to generate code for different programming language constructs. We receive as input the tree-like representation of programs, and output a sequence of instructions. Can you highlight the general ideas for doing this code generation?
The approach we are using to implement code generation follows a structured process in
compiler design, and it involves transforming a program represented as an Abstract Syntax Tree
(AST) into low-level RISC-V instructions. We can initially assume that we have an infinite
surplus of registers. Whenever we need a new register, we just create a new register name.
Here's how the general idea works:

1. Abstract Syntax Tree (AST)

●​ The program is first represented as an AST, where each node represents a specific
construct in the programming language. For instance, variables, numbers, and arithmetic
operations like addition are all represented as nodes.
●​ These nodes form a hierarchical structure, where more complex expressions are
composed of simpler ones.

2. Visitor Pattern for Code Generation

●​ The Visitor Pattern is used to traverse the AST. The "visitor" visits each node of the
AST and generates the corresponding machine instructions.
●​ Each type of node in the AST (like a variable, number, or addition) has its own method in
the visitor, which handles how to generate code for that specific node type.

3. Code Generation Process

●​ For each node of the AST, the visitor generates the necessary instructions to evaluate
that expression. For example:
○​ A variable node fetches the value of the variable from memory or a register.
○​ A number node directly creates a load instruction to store the number into a
register.
○​ An addition node generates instructions to add the results of its two
sub-expressions.

4. Registers and Variables

●​ During code generation, the visitor uses registers to hold intermediate results. Since
you're assuming an infinite supply of registers (see last class), each step in the code
generation process returns a new register that holds the result of the computation for
that node.
●​ The generated instructions reference these registers, and each register corresponds to a
variable holding some value at runtime.


5. Program Representation

●​ The program being generated is stored in a structure (let's call it prog, which contains
methods to insert and retrieve instructions). This program consists of a sequence of
instructions.
●​ The visitor adds instructions to prog as it traverses the AST, ensuring that the final
sequence of instructions correctly evaluates the entire expression.

6. Returning the Register with the Result

●​ After visiting a node, the visitor method returns the name of the register that holds the
result of the code generated for that node. For example:
○​ If visiting a number like 13, the visitor generates an instruction to load 13 into a
register and returns the register name.
○​ If visiting an addition like 13 + 10, the visitor generates instructions to add two
registers and returns the name of the register holding the result.

By using the visitor pattern and an AST, this approach ensures that the code generation process
is modular, with each type of node responsible for generating its own corresponding machine
instructions. This method is efficient and scalable, making it easy to extend and support more
complex constructs later on.
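The six ideas above can be condensed into a self-contained sketch. Everything in it — the two AST classes, the tuple-shaped instructions, and the tiny evaluator — is illustrative scaffolding, not the course's actual AsmModule:

```python
import itertools

# Miniature AST (illustrative): each node knows how to accept a visitor.
class Num:
    def __init__(self, num):
        self.num = num
    def accept(self, visitor, prog):
        return visitor.visit_num(self, prog)

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def accept(self, visitor, prog):
        return visitor.visit_add(self, prog)

class GenVisitor:
    def __init__(self):
        self._fresh = itertools.count()
    def next_var_name(self):
        # Infinite register supply: just mint a fresh name.
        return f"v{next(self._fresh)}"
    def visit_num(self, exp, prog):
        dst = self.next_var_name()
        prog.append(("addi", dst, "x0", exp.num))  # dst = 0 + num
        return dst
    def visit_add(self, exp, prog):
        left = exp.left.accept(self, prog)    # code for the left operand
        right = exp.right.accept(self, prog)  # code for the right operand
        dst = self.next_var_name()
        prog.append(("add", dst, left, right))
        return dst

def run(prog):
    # Tiny evaluator: x0 is hard-wired to zero.
    env = {"x0": 0}
    for op, dst, a, b in prog:
        env[dst] = env[a] + (b if op == "addi" else env[b])
    return env

prog = []
result = Add(Num(3), Add(Num(4), Num(5))).accept(GenVisitor(), prog)
print(run(prog)[result])  # 12
```

Notice that the generated program is a flat list of three-address instructions, and each visit method returns the register that holds its result.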

What's the name of what we just described?


The process we described is related to instruction selection but is more broadly part of code
generation. Let’s clarify the distinction and place instruction selection in context:

Code Generation Phases:

1.​ Instruction Selection:


○​ This is the process of translating the high-level constructs (like arithmetic
operations, control flow, etc.) into specific machine instructions (assembly or
machine code) for a target architecture.
○​ In a compiler, there may be many ways to implement a high-level operation (like
addition), and instruction selection is responsible for choosing the appropriate
machine instruction(s) for each part of the program based on the target
architecture (e.g., RISC-V, x86).
○​ For example, when translating an arithmetic expression like 13 + 10, instruction
selection might choose an ADD instruction on a RISC-V system or something like
ADDQ on an x86-64 system. The goal is to pick the right instruction for the
architecture.
2.​ Register Allocation:


○​ After instruction selection, the next phase is register allocation. This is the
process of assigning physical registers to variables or intermediate values.
○​ In your case, you’re assuming an infinite number of registers for simplicity.
However, in real-world scenarios, processors have a limited number of registers,
and the compiler must decide how to efficiently use them.
3.​ Instruction Scheduling:
○​ This phase involves reordering instructions to optimize execution, taking into
account the latency of certain instructions or dependencies between instructions.
This ensures that the CPU pipeline is used efficiently.

Instruction Selection in Our Example:

When generating code for an arithmetic operation (like addition or multiplication), instruction
selection chooses the appropriate assembly instruction based on the operation. For example:

●​ For a simple number, it might use a "load immediate" instruction like Addi in RISC-V to
put a constant value in a register.
●​ For an Add operation, it would select the Add instruction.

Instruction selection takes place when you're deciding how to express a high-level operation in
terms of low-level machine instructions. For instance:

●​ A node representing a number (Num(13)) is converted into an instruction like Addi to load the number into a register.
●​ A node representing an addition (Add) is converted into an Add instruction that adds the
results of two registers.

So, what's Instruction Selection then?


In a more general sense, instruction selection is the process within the code generation phase
where the compiler selects the machine instructions that will be used to implement each
operation in the source code. The selection depends on factors like:

●​ The target hardware architecture (e.g., RISC-V, ARM, x86).


●​ The specific operations that need to be performed.
●​ The cost (in terms of execution time or other factors) of different possible instructions.

Summary of Instruction Selection:

●​ Instruction selection is a part of the code generation process.


●​ It maps high-level language constructs (like arithmetic operations or variable
assignments) to specific machine instructions based on the target architecture.
●​ This process aims to produce efficient, correct code by selecting the best instructions for
the job, depending on what the processor supports and its architecture's characteristics.


In our example, when the visitor generates an Addi instruction for a number or an Add
instruction for an addition, that is instruction selection in action. However, instruction selection
is only one part of the overall code generation phase, which includes additional steps like
register allocation and scheduling.
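As a concrete (and purely illustrative) instance of instruction selection, consider multiplying a register by a constant: a selector can pick between a generic multiply and a cheaper left shift when the constant happens to be a power of two. The instruction names and the implied costs below are assumptions about a toy machine, not part of the course code:

```python
# Toy selector (illustrative): choose between two ways of implementing
# "multiply register src by constant c into dst".
def select_mul_by_const(dst, src, c):
    # A left shift is legal only when c is a positive power of two,
    # and is typically cheaper than a full multiply.
    if c > 0 and (c & (c - 1)) == 0:
        shift = c.bit_length() - 1
        return ("slli", dst, src, shift)   # dst = src << shift
    return ("mul", dst, src, c)            # fall back to a real multiply

print(select_mul_by_const("t0", "t1", 8))   # ('slli', 't0', 't1', 3)
print(select_mul_by_const("t0", "t1", 10))  # ('mul', 't0', 't1', 10)
```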

Let's get back to our code generator. How would you generate code for a
variable?
We can use a simple visit method that only returns the variable name. Remember, we are not
doing register allocation yet. Our visit method would be something like:

class GenVisitor(Visitor):
    """
    The GenVisitor class compiles arithmetic expressions into a low-level
    language.
    """
    ...

    def visit_var(self, exp, prog):
        """
        Usage:
            >>> e = Var('x')
            >>> p = AsmModule.Program({"x": 1}, [])
            >>> g = GenVisitor()
            >>> v = e.accept(g, p)
            >>> p.eval()
            >>> p.get_val(v)
            1
        """
        return exp.identifier

This visit_var method correctly generates code for variables in a specific way because it
relies on a key assumption: the variable's value is already stored or accessible. Here's why
this approach works and why it doesn't generate any instructions:

1. What Does the Method Do?

●​ The method is responsible for generating the code corresponding to a variable (like x in
the example).
●​ Instead of generating any new instructions, it simply returns the variable's name
(exp.identifier), which is the same name used to reference it in the program.

2. Why No Instruction is Generated?

●​ Variables are already associated with a value: The assumption here is that the
variable's value has been previously initialized or loaded into a register (or is in memory).


Therefore, there’s no need to generate instructions to calculate or assign the value of the
variable.
○​ The variable's identifier (exp.identifier) acts as a reference to this
pre-existing value.
○​ For example, if x is stored in a register or memory, the code generation process
can simply refer to it by its name ('x'), and no further computation is required.
●​ The program state already contains the variable: The AsmModule.Program (as
seen in the Usage section) already knows the value associated with the variable x (in
this case, x=1 is passed in). Since the value is known, no instructions are needed to
perform any computation or loading at this point.

3. Efficiency and Simplicity:

●​ By not generating unnecessary instructions, this method avoids redundant work. If the value of the variable x is already in a register, generating another instruction to load x would waste time and resources.
○​ Instead, the visitor simply returns the name (or "location") of the variable,
assuming that future instructions (in the program) can use this identifier to
reference the value of x when needed.

4. How this Differs from Other Visit Methods:

●​ Once we write visit_num, compare code generation for variables with that other
method, where an instruction (e.g., Addi) is generated to load a number into a register.
The reason visit_num generates an instruction is that the number is a constant that
needs to be placed somewhere in a register to be used later.
●​ Variables, on the other hand, are already named entities that map to a value in memory
or a register. So, there's no need to load them—we already know where they are.

5. What Happens Later?

●​ Once this method returns the variable name, later instructions generated by other
visit_ methods (e.g., for addition, subtraction, etc.) will use this variable name to
perform operations.
○​ For instance, when the visitor processes an arithmetic expression like x + 1, it
will refer to the name x (from visit_var) to get the value associated with x in
the register or memory and then generate the appropriate instructions for the
addition.
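That hand-off can be traced with tuple-shaped instructions. The register names and the initial binding of x below are illustrative: compiling x + 1 emits code only for the literal and the addition, while the variable contributes just its name.

```python
# Illustrative trace of compiling x + 1 (register names are made up).
prog = []
x_reg = "x"                              # visit_var: no instruction emitted
prog.append(("addi", "v0", "x0", 1))     # visit_num(1): load the constant
prog.append(("add", "v1", x_reg, "v0"))  # visit_add: combine both results

env = {"x0": 0, "x": 5}                  # the program state already binds x
for op, dst, a, b in prog:
    env[dst] = env[a] + (b if op == "addi" else env[b])
print(env["v1"])  # 6
```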

In Summary:


●​ The visit_var method doesn’t generate instructions because the value of the variable
is already assumed to exist in the program state (a register or memory).
●​ Instead, it returns the variable’s name (exp.identifier), which can be used by other
parts of the program to access its value.
●​ This approach optimizes the code generation process by avoiding unnecessary
instructions for variables that have already been initialized.

In this case, how would you generate code for number literals?
We can generate code for numbers via the simple visit_num method below:

class GenVisitor(Visitor):
    ...
    def visit_num(self, exp, prog):
        """
        Usage:
            >>> e = Num(13)
            >>> p = AsmModule.Program({}, [])
            >>> g = GenVisitor()
            >>> v = e.accept(g, p)
            >>> p.eval()
            >>> p.get_val(v)
            13
        """
        var_name = self.next_var_name()
        prog.add_inst(AsmModule.Addi(var_name, "x0", exp.num))
        return var_name

You might find it weird: we use an addition to put a constant into a register. But that's common!
Let's break down why code generation for numbers uses an addition instruction (Addi) to
load a constant into a register instead of simply assigning the number directly, as in var_name
= exp.num.

1. Registers vs Variables in Assembly Programming:

●​ In higher-level languages (like Python, C, etc.), you can directly assign values to
variables using statements like var_name = 13. These languages abstract away the
machine-level details.
●​ In assembly or low-level programming, however, the hardware does not have
"variables" in the same way. Instead, it has registers. Registers are small storage units
inside the CPU where you must load data before you can use it.
○​ So, to make the number 13 usable in assembly, you need to load it into a
register.
○​ This is done via specific instructions that interact with the CPU’s registers.


2. Why Use an Addition (Addi) Instruction?:

●​ The reason an addition instruction (Addi) is used is that, in many assembly languages (like RISC-V), there isn't always a specific instruction to just "put a number" into a register. Instead, you use instructions like Addi to load values into registers indirectly. Here's how it works:
○​ Addi stands for add immediate, meaning it adds a constant (the immediate
value) to the value in a register and stores the result in another register.
○​ In this case, the instruction is written as: Addi var_name, "x0", exp.num.
■​ What this does is:
●​ var_name is the destination register where we want to store the
value of exp.num (the number from the expression).
●​ "x0" is a special register in RISC-V that always contains the
value 0.
●​ exp.num is the constant number we want to load.
■​ The effect is equivalent to setting var_name = 0 + exp.num, which
just stores exp.num in var_name by adding it to zero.
○​ Since "x0" is always 0, the instruction Addi var_name, "x0", exp.num
effectively loads the constant exp.num directly into var_name.

3. Why Not Just Say var_name = exp.num?:

●​ In higher-level languages, it's easy to just say var_name = exp.num because they
automatically handle things like memory management and register allocation for you.
●​ But at the assembly level, the program needs to manually move data between memory
and registers, or between registers themselves. The assembly language does not have
an equivalent to Python's variable assignment directly.
○​ You must use explicit instructions to move data into registers before performing
any operations.

4. Why This Approach is Common in Assembly:

●​ Assembly languages usually use instructions like Addi because they’re simple and
efficient.
●​ Some architectures have special load instructions (like Load Immediate), but they
might not always be available, or they could be more complicated to implement.
○​ RISC-V (the architecture we’re targeting) focuses on keeping the instruction set
simple, so it uses existing instructions like Addi for tasks like loading constants.

5. How It Fits into the Code Generator:


●​ visit_num generates an instruction (Addi) to load a constant into a register.


●​ The name of the register is generated using self.next_var_name(), which creates a
fresh register name for each number.
●​ The instruction is added to the program (prog.add_inst(...)), and the register
holding the number is returned for later use by other instructions.

Summary:

●​ In assembly code, you can't directly assign a value to a "variable" as you would in
higher-level languages. You need to move the value into a register using specific
instructions.
●​ The Addi instruction is used here to load the constant into a register because assembly
languages require explicit instructions to move data around.
●​ While Addi is technically an addition instruction, it’s used here as a way to load a
constant into a register by adding it to the always-zero register "x0".

Is it not wasteful to have a register x0 that always holds zero?

This design decision actually provides several benefits in RISC (Reduced Instruction Set
Computer) architectures like RISC-V, and it’s not unique to RISC-V. Let's explore why it's
useful, how other architectures like ARM and x86 handle similar scenarios, and why using
literals directly isn’t always preferable.

1. Why Have a Register Like x0 in RISC-V?

In RISC-V, x0 is a special register that always holds the value 0, and its contents cannot be
changed. This might seem wasteful at first glance, but it serves several important purposes that
actually enhance simplicity and efficiency:

a) Simplified Instruction Set:

●​ RISC architectures, like RISC-V, focus on simplicity and minimizing the number of
instructions. Instead of having many special-purpose instructions for operations involving
zero, a zero register allows a variety of operations to leverage the same set of
instructions.
○​ For example, rather than needing a dedicated instruction to move a constant zero
into a register, you can use existing instructions that involve x0. E.g., Addi with
x0 is effectively a way to load a constant without needing a separate "load
immediate" instruction.

b) Optimized Hardware Implementation:


●​ From a hardware perspective, having a dedicated zero register simplifies the design of
the processor.
○​ Since x0 always holds 0, there's no need for additional circuitry to check or
modify its value. This consistency simplifies data paths and control logic within
the processor.

c) Efficient Operations:

●​ Many common operations involve zero, and having a dedicated register for zero reduces
the number of literal values that need to be encoded directly into instructions.
○​ Instead of using up bits in every instruction to represent the literal 0, instructions
that refer to x0 can use that space for other purposes, allowing for more efficient
use of instruction encoding.

2. Why Not Use Literal Zero Directly in Instructions?

In RISC-V, you could theoretically use immediate values (literals) directly in some instructions,
but there are trade-offs that make using x0 beneficial:

a) Instruction Length Limitations:

●​ In a RISC architecture, instructions are usually fixed-length (e.g., 32 bits in RISC-V). If you have a literal in the instruction, you need space to encode it.
○​ By using x0, you can avoid wasting bits to encode the literal 0. This helps keep
instructions compact, which is especially important for RISC architectures that
prioritize fixed-length instructions for simplicity and performance.

b) Uniform Handling of Values:

●​ Using x0 for zero means the processor doesn’t need special cases for handling
operations involving zero. It can treat zero just like any other register value, leading to
more uniform instruction behavior.
○​ This is particularly useful for operations like conditional branches, where you
might want to check if a register is zero.

3. Benefits of Using a Zero Register in RISC Architectures:

●​ Efficiency: Using a zero register can make instruction encoding more efficient because
you don’t need to repeatedly encode the literal 0 in instructions. You just reference x0,
which is represented compactly in the instruction.
●​ Simplicity: It reduces the need for special cases in instruction sets and hardware logic.
Instead of having different instructions or data paths for zero, you can handle zero like
any other register value.


●​ Uniformity: Instructions become more uniform because you can apply the same
operations to zero as you would to any other value. This makes the design of both the
compiler and the processor simpler.

Conclusion:

Having a dedicated zero register like x0 in RISC-V (or xzr in ARM) is not a waste. Instead, it
simplifies both the processor’s hardware design and the instruction set architecture. It allows
zero to be handled like any other value, reducing the need for special instructions and improving
overall efficiency. In contrast, architectures like x86, which don’t have a zero register, rely on
their more complex instruction encoding and handling of literals, but they still manage to
accomplish the same kinds of operations—albeit with a different design philosophy.
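A simulator makes the contract of x0 easy to state: reads always return zero, and writes are silently discarded. The class below is an assumption about how one might model this, not the course's actual Program class:

```python
# Sketch of how a simulator might model RISC-V's hard-wired zero register.
class RegisterFile:
    def __init__(self):
        self._regs = {}

    def read(self, name):
        # x0 always reads as zero, no matter what was "written" to it.
        return 0 if name == "x0" else self._regs.get(name, 0)

    def write(self, name, value):
        if name != "x0":          # writes to x0 are silently discarded
            self._regs[name] = value

rf = RegisterFile()
rf.write("x0", 42)                   # no effect
rf.write("t0", rf.read("x0") + 7)    # Addi t0, x0, 7
print(rf.read("x0"), rf.read("t0"))  # 0 7
```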

Do other architectures like ARM and x86 also have an "always-zero" register?
ARM: ARM does not have a dedicated zero register like RISC-V's x0, but it achieves similar
functionality in a different way.

●​ In AArch64 (64-bit ARM), when certain instructions specify the xzr register as the
source or destination, it acts like a zero register. For example, writing to xzr discards the
result, and reading from xzr always returns zero.
●​ Like x0 in RISC-V, this design helps avoid needing special instructions for zero-based
operations and keeps the instruction set simple.

x86: x86 does not have a dedicated zero register like RISC-V or ARM.

●​ Instead, x86 typically uses literal encoding for immediate values, including 0. The
instructions can directly operate on the literal zero, but this can make encoding slightly
more complex, and there’s a trade-off in terms of flexibility and uniformity.
●​ x86 compensates for the lack of a dedicated zero register by using its more complex
instruction set, which can handle a wider variety of operations directly on literals.

How would you generate code to represent boolean literals, such as true and false?
We could generate code for booleans using the following visit method:

class GenVisitor(Visitor):
    ...
    def visit_bln(self, exp, prog):
        """
        Usage:
            >>> e = Bln(True)
            >>> p = AsmModule.Program({}, [])
            >>> g = GenVisitor()
            >>> v = e.accept(g, p)
            >>> p.eval()
            >>> p.get_val(v)
            1

            >>> e = Bln(False)
            >>> p = AsmModule.Program({}, [])
            >>> g = GenVisitor()
            >>> v = e.accept(g, p)
            >>> p.eval()
            >>> p.get_val(v)
            0
        """
        if exp.bln:
            var_name = self.next_var_name()
            prog.add_inst(AsmModule.Addi(var_name, "x0", 1))
            return var_name
        else:
            return "x0"

This visit_bln method handles the generation of code for boolean literals (True and False)
in a way that is consistent with RISC-V's handling of constants, using the x0 register and
immediate values. Let’s break down the code step by step to understand its logic and why it
works this way.

1. Boolean Expression (exp.bln):

The argument exp is an instance of a class Bln that presumably stores a boolean value (True
or False). The method checks if exp.bln is True or False:

●​ If exp.bln is True, the method proceeds to the first branch.
●​ If exp.bln is False, it returns "x0" directly, without generating any new instructions.

2. Handling True (Non-zero Value):

When the boolean literal is True, the method generates code that places the value 1 into a new
register:

var_name = self.next_var_name()
prog.add_inst(AsmModule.Addi(var_name, "x0", 1))
return var_name


●​ Step 1: A new variable (register) name is generated using self.next_var_name(). This is important because each variable (or temporary register) used in the generated assembly code needs to have a unique name.
●​ Step 2: The method then adds an instruction to the program (prog.add_inst), which
loads the constant value 1 into this newly created register using the Addi instruction.
○​ Addi is the RISC-V instruction for "Add Immediate," which adds a constant
(immediate value) to a register. In this case, the constant 1 is added to the value
in the x0 register (which is always 0). This is effectively loading the value 1 into
var_name.
●​ Step 3: Finally, the method returns the name of this register (var_name), which can later
be used in subsequent instructions.

3. Handling False (Zero Value):

When the boolean literal is False, the method doesn’t create a new variable or register.
Instead, it returns "x0" directly:

return "x0"

●​ Why return "x0"?: The register x0 always holds the value 0, which represents False
in the context of booleans. By returning "x0", the method avoids the need to explicitly
generate instructions to load the value 0 into a register. This is efficient because it
leverages the fact that x0 is always zero.
●​ This is one of the benefits of having a zero register: we can directly use it to represent
False without consuming additional instructions or registers.

Why Use Addi for True?

You might wonder why the method uses the Addi instruction to load 1 for True, instead of
something like var_name = exp.num, as you mentioned before.

●​ RISC-V doesn’t have a dedicated "load immediate" instruction for small constants.
Instead, it uses instructions like Addi to load small constants by adding them to a base
register (x0 in this case, which is always zero).
●​ Using Addi with x0 as the source register effectively loads the immediate value (in this
case 1) directly into the target register (var_name).

Efficiency Considerations:

●​ For True: The code generates a single Addi instruction to load the value 1 into a
register, which is a straightforward and efficient way to handle the constant.


●​ For False: It returns the zero register (x0) directly, which is highly efficient because no
additional instructions or registers are needed to represent the value 0.

Summary:

●​ When the boolean literal is True, the method generates a new variable and uses the
Addi instruction to load the value 1 into a register, since there’s no direct "load
immediate" in RISC-V for small constants.
●​ When the boolean literal is False, the method returns the x0 register directly, as it
always holds the value 0, making the code more efficient by avoiding unnecessary
instructions.
●​ This approach leverages RISC-V’s design features, like the x0 register, to generate
simple and efficient code for boolean literals.
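The asymmetry between the two cases shows up when we count instructions. The helper below is a stripped-down, illustrative version of the visit_bln logic, with the fresh-name generator passed in as a parameter:

```python
# Illustrative: True costs one Addi into a fresh register; False costs
# nothing, because x0 already holds the right value.
def gen_bool(bln, prog, next_var_name):
    if bln:
        dst = next_var_name()
        prog.append(("addi", dst, "x0", 1))  # dst = 0 + 1
        return dst
    return "x0"                              # zero register stands for False

prog = []
r_true = gen_bool(True, prog, lambda: "v0")
r_false = gen_bool(False, prog, lambda: "v1")
print(r_true, r_false, len(prog))  # v0 x0 1
```

Only one instruction was emitted for both literals together: the False case is free.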

How would you generate code to represent an addition, e.g., Add(e0, e1)?
To generate code for an addition expression like Add(left, right), the approach would
follow similar patterns to those you've already seen for numbers, variables, and booleans. The
goal is to generate assembly instructions that compute the sum of the two sub-expressions
(left and right) and store the result in a new register.

Here’s a step-by-step approach to implementing the visit_add method for generating code for
an addition expression:

Key Steps to Generate Code for Addition:

1.​ Generate Code for the Left Operand (left):


○​ The left operand can be an expression, which could be a variable, a number, or
even another operation. To generate code for it, you need to recursively visit the
left expression.
○​ The result of visiting left will be the name of the register or variable where the
value of left is stored.
2.​ Generate Code for the Right Operand (right):
○​ Similarly, generate code for the right operand by visiting the right expression.
○​ The result of visiting right will be the name of the register or variable where the
value of right is stored.
3.​ Create a New Register for the Result:
○​ After evaluating the left and right operands, create a new register to hold the
result of the addition.
4.​ Generate the Addition Instruction:
○​ Use the RISC-V Add instruction to add the values in the two registers (the results
of left and right).


○​ Store the result of this addition in the new register.

Example Code for visit_add:


def visit_add(self, exp, prog):
    """
    Usage:
        >>> e = Add(Num(3), Num(4))
        >>> p = AsmModule.Program({}, [])
        >>> g = GenVisitor()
        >>> v = e.accept(g, p)
        >>> p.eval()
        >>> p.get_val(v)
        7
    """
    # Step 1: Generate code for the left operand
    left_reg = exp.left.accept(self, prog)

    # Step 2: Generate code for the right operand
    right_reg = exp.right.accept(self, prog)

    # Step 3: Create a new register to store the result.
    # Remember: we have an infinite supply of register names!
    result_reg = self.next_var_name()

    # Step 4: Generate the addition instruction
    prog.add_inst(AsmModule.Add(result_reg, left_reg, right_reg))

    # Step 5: Return the result register
    return result_reg

Breakdown of Each Step:

1.​ Generate Code for the Left Operand:


a.​ The accept method recursively calls the appropriate visit method for the left
sub-expression (which could be a number, variable, or another operation). This
method returns the name of the register or variable holding the result of
evaluating left.
2.​ Generate Code for the Right Operand:
a.​ Similarly, generate code for the right sub-expression. This will store the result
of evaluating right in a register or variable.
3.​ Create a New Register:
a.​ A new register is allocated to store the result of adding left and right. This
ensures that the result can be used in subsequent operations.


4.​ Generate the Addition Instruction:


a.​ The Add instruction is used to add the values in left_reg and right_reg, and
the result is stored in result_reg.
b.​ The AsmModule.Add is equivalent to the RISC-V Add instruction, which
performs a register-register addition: result_reg = left_reg +
right_reg.
5.​ Return the Result:
a.​ The name of the register holding the result of the addition is returned, so it can be
used in subsequent instructions or as the final result.

Why This Works:

●​ Recursion on Sub-expressions: The method recursively generates code for each operand (left and right), allowing it to handle complex expressions like Add(Add(Num(1), Num(2)), Var('x')).
●​ Result Register: By creating a new register (result_reg), the code generator ensures
that the result of the addition is safely stored without overwriting any intermediate values.
●​ Efficient Code Generation: Only one Add instruction is needed, and the code generator
reuses the code that was already implemented for handling variables, numbers, and
booleans.

Why You Can't Directly Use left + right:

●​ You can’t directly do something like result = left + right because:


○​ The operands (left and right) are expressions that could be variables,
constants, or other operations. You need to evaluate these expressions first, and
the result of evaluating an expression is stored in a register.
○​ The code must adhere to the target assembly language (e.g., RISC-V), which
operates on registers, not high-level language variables.

Conclusion:

The visit_add method uses the same principles seen in earlier examples. It recursively
generates code for the left and right operands, then combines the results with an Add
instruction, storing the sum in a new register. This ensures that code for complex arithmetic
expressions is correctly generated and efficiently executed.


So, essentially, we assume that we can generate code for subparts of an expression, and then find a way to generate code for the whole expression?
Yes, exactly! That's the essence of code generation in most compilers, and it's called recursive
code generation. The idea is to break down an expression into smaller, manageable parts
(often sub-expressions), generate code for each of these parts, and then combine the results to
form the complete code for the whole expression. This technique is based on the
compositionality of expressions: the assumption that if you can generate correct code for the
components, you can combine that code to get correct code for the larger expression.

Here’s a breakdown of how this process works conceptually:

1. Recursive Structure of Expressions

Expressions in most programming languages (and intermediate representations) have a tree-like structure. For example:

●​ Add(Num(3), Num(4)) is an expression where Add is the root node, and Num(3) and
Num(4) are the child nodes.
●​ The same applies to more complex expressions, like Add(Var('x'), Add(Num(2),
Num(3))), which forms a nested tree.

The recursive structure of expressions naturally leads to a recursive approach for code
generation.

2. Base Cases and Recursive Cases

When generating code, each type of expression has its own rule:

●​ Base Cases: For simple expressions like Num(3) or Var('x'), the code generation is
straightforward. You generate an instruction that loads a constant or retrieves the value
of a variable.
○​ Example: For Num(3), you generate Addi(result, x0, 3), which loads 3
into a register.
○​ Example: For Var('x'), you just return the variable's name (or the register it
maps to).
●​ Recursive Cases: For compound expressions like Add(left, right), you generate
code for the left and right sub-expressions first (this is the recursive part), and then
combine the results (in this case, by adding them using an Add instruction).
○​ The code generation for Add(left, right) calls the code generation for both
left and right, producing code for each sub-expression before generating the
final Add instruction.


3. Register Management

Each sub-expression typically has its own result stored in a temporary register. By recursively
generating code for each sub-expression, the compiler keeps track of these intermediate
results, storing them in temporary registers or variables.

●​ In the case of Add(left, right), after generating code for left and right, their
results are stored in two registers, say r1 and r2.
●​ The Add instruction then combines the values in r1 and r2 and stores the result in a
new register, say r3.

4. Combining the Results

Once you have the results of the sub-expressions, you can generate instructions that combine
those results into a final value. This is the "glue" that puts the sub-expression results together:

●​ For an Add(left, right), assuming both operands are constants, the generated code might look like this:

Addi r1, x0, left_value   ; Load the value of the left expression
Addi r2, x0, right_value  ; Load the value of the right expression
Add  r3, r1, r2           ; Add the two values and store in r3

●​ This combines the results of the left and right expressions into r3.

5. Final Result Propagation

After generating code for an expression, you return the result (often the name of a register) to
be used by higher-level expressions or the final evaluation. This is crucial for ensuring that the
result of one expression can be correctly used as an input to another.
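The five steps above can be condensed into a toy emitter. The sketch below is hypothetical — the class names (Num, Var, Add, Emitter) and the textual instruction strings are invented for illustration, not the course's actual AsmModule API — but it shows the base cases, the recursive case, and the register bookkeeping working together:

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: object
    right: object

class Emitter:
    def __init__(self):
        self.counter = 0   # used to create fresh temporary registers
        self.insts = []    # the generated instruction list

    def fresh_reg(self):
        self.counter += 1
        return f"r{self.counter}"

    def gen(self, exp):
        """Recursively emits code for exp; returns the register holding its result."""
        if isinstance(exp, Num):               # base case: constant
            reg = self.fresh_reg()
            self.insts.append(f"addi {reg}, x0, {exp.value}")
            return reg
        if isinstance(exp, Var):               # base case: variable (name doubles as register)
            return exp.name
        if isinstance(exp, Add):               # recursive case
            left = self.gen(exp.left)          # code for the left subtree
            right = self.gen(exp.right)        # code for the right subtree
            reg = self.fresh_reg()
            self.insts.append(f"add {reg}, {left}, {right}")
            return reg
        raise TypeError(f"unknown expression: {exp}")

e = Emitter()
result = e.gen(Add(Var("x"), Add(Num(2), Num(3))))
print(result)   # -> "r4", the register holding the final sum
print(e.insts)
```

For Add(Var('x'), Add(Num(2), Num(3))), the emitter produces four instructions and returns r4: the two constants are loaded first, the inner sum lands in r3, and the outer sum combines x with r3.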

Can you show how the code for Add(Num(21), Add(Var("x"), Bln(True))) would look?
We can use the tree below to show how the code would be produced for this example.


Given that everything boils down to integers, is a program like Add(Num(1), Bln(True)) valid in Assembly?
Well, more or less... You've identified a key distinction between static type systems and the
low-level operations performed at the assembly or machine level. Let's break this down:

1. At the Assembly Level, Everything is Just Data (Often Integers)

●​ At the assembly level, values like integers, booleans, and even characters are often
represented as simple binary data in registers or memory.
●​ For instance, a boolean True might be represented as 1, and False as 0. Therefore,
an addition like Add(Num(1), Bln(True)) is perfectly valid at the assembly level
because it’s just adding two integers (1 + 1 = 2).
●​ In assembly, there is no notion of "types" like booleans or integers—everything is just
a sequence of bits or integers. The operations (like Add, Mul, etc.) are agnostic to the
meaning behind those bits.

2. The Role of the Type System in High-Level Languages

●​ High-level languages enforce type rules to ensure programs behave correctly and
meaningfully. For example, adding a boolean to an integer would be nonsensical in
many high-level languages, and such an operation would typically be caught during type
checking.
●​ The type system ensures that expressions like Add(Num(1), Bln(True)) are not
allowed at the high level. The type checker would raise an error before the program
reaches the code generation phase, because adding a boolean and an integer would
violate type rules (e.g., in statically typed languages like Java or Haskell).


3. Type Checking Prevents Invalid Code from Reaching Code Generation

●​ In most modern compilers, type checking is a phase that happens before code
generation. If the program passes type checking, the compiler guarantees that the
operations being performed are valid according to the language's type rules.
●​ For instance, if the program tries to perform Add(Num(1), Bln(True)), the type
checker would reject the program before it gets to the code generation phase. As a
result, code generation would never encounter this problematic operation, because the
high-level program itself would be considered invalid.

4. Why This Distinction is Important

●​ Assembly code is untyped, meaning operations like addition can happen between any
values as long as the hardware supports them (e.g., adding two 32-bit registers).
●​ High-level programs are typed, meaning operations must follow the rules of the
language’s type system. This ensures that nonsensical operations (like adding booleans
and integers) are caught early, preventing them from being translated into assembly
code.

Example: Type Checking in Action

Imagine you have a high-level program like this, written in Python with type annotations:

def add(a: int, b: int) -> int:
    return a + b

add(1, True)  # A static type checker flags this call as an error

●​ Here, add(1, True) would be flagged as a type error during static type checking (if
you use tools like mypy). Note that plain Python would actually accept the call at
runtime, because bool is a subclass of int, so 1 + True evaluates to 2.
●​ In languages with static type systems (e.g., Haskell, Java), such code would be rejected
before it is ever compiled to machine code or assembly.
●​ If the type system allows the program to pass type checking, the compiler then
guarantees that the code being generated will be valid and meaningful at runtime,
preventing unexpected behavior.

5. Conclusion: The Type System Protects the Code Generation Process

●​ Even though at the assembly level you can perform operations like Add(Num(1),
Bln(True)) (since they are just binary values), the type system in the high-level
language prevents this from happening.
●​ The type checker ensures that only valid operations are passed to the code generation
phase, meaning that nonsensical or invalid operations are caught early and do not make
it into the final assembly code.


Can you implement code generation for let bindings?


Let's recap: our let-binding has the following form: let v = exp_def in exp_body, where:
●​ exp_def: The expression that defines the value of the variable v. For example, let v
= 2.
●​ exp_body: The expression where v is used. For example, let v = 2 in v + 3.

In the generated code, we want to:

1.​ Evaluate the expression (exp_def) that defines the variable.


2.​ Bind the result of that expression to the variable name (v).
3.​ Evaluate the body (exp_body) where the variable is used, replacing occurrences of v
with the result of exp_def.

Explanation of visit_let:
def visit_let(self, exp, prog):
    """
    Handles code generation for let-bindings.
    """
    # Step 1: Evaluate the defining expression (exp_def)
    d_name = exp.exp_def.accept(self, prog)

    # Step 2: Bind the result to the variable (identifier)
    prog.add_inst(AsmModule.Addi(exp.identifier, d_name, 0))

    # Step 3: Evaluate the body (exp_body) using the variable binding
    v_name = exp.exp_body.accept(self, prog)

    # Return the result of evaluating the body expression
    return v_name

Step 1: Evaluate the Defining Expression (exp_def)


●​ This step evaluates the expression that defines the value of the variable (exp_def). The
method recursively calls accept on exp.exp_def, which evaluates the definition.
●​ The result is stored in a register (or a variable), and d_name holds the name of this
register.
●​ For example, if exp_def is Num(2), the method will generate code to load 2 into a
register (like r1), and d_name will hold the reference to this register.

Step 2: Bind the Result to the Variable (exp.identifier)


●​ This step binds the result of exp_def (held in d_name) to the variable
exp.identifier. The variable's identifier (e.g., "v") becomes an alias for the register
holding the result of the defining expression.
●​ The Addi instruction is used to copy the value from d_name into the variable's name
(exp.identifier). This operation doesn't change the value—it's effectively an
assignment operation. In RISC-V, this is done using Addi with an immediate value of 0,
meaning it's simply copying the value.
○​ Example: If d_name is r1 and exp.identifier is "v", this will effectively bind
the value of r1 (e.g., 2) to "v".

Step 3: Evaluate the Body (exp_body)


●​ After binding the variable to its defining expression, the body of the let-expression
(exp_body) is evaluated. The accept method is called recursively on exp_body to
generate code for this part.
●​ Within this body, the variable (v) is now bound to the result of exp_def, meaning any
occurrence of v in the body will use the value computed in Step 1.
○​ Example: If exp_body is Add(Var('v'), Num(3)), the code generation will
correctly refer to the value of v as 2, and then add 3 to it.

Return the Result:


Finally, the method returns the name of the register that holds the result of evaluating
exp_body. This value is the final result of the entire let expression.
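To make the three steps concrete, here is a small, self-contained sketch in the same spirit (the gen function, the textual instruction format, and the register-naming scheme are invented for illustration, not the actual visitor API):

```python
import itertools
from dataclasses import dataclass

@dataclass
class Num:  value: int
@dataclass
class Var:  name: str
@dataclass
class Add:  left: object; right: object
@dataclass
class Let:  identifier: str; exp_def: object; exp_body: object

fresh = itertools.count(1)  # supplies fresh register numbers

def gen(exp, insts):
    """Emits textual RISC-V-style instructions for exp; returns the result register."""
    if isinstance(exp, Num):
        reg = f"r{next(fresh)}"
        insts.append(f"addi {reg}, x0, {exp.value}")
        return reg
    if isinstance(exp, Var):
        return exp.name                                  # the name doubles as a register
    if isinstance(exp, Add):
        l = gen(exp.left, insts)
        r = gen(exp.right, insts)
        reg = f"r{next(fresh)}"
        insts.append(f"add {reg}, {l}, {r}")
        return reg
    if isinstance(exp, Let):
        d = gen(exp.exp_def, insts)                      # Step 1: evaluate exp_def
        insts.append(f"addi {exp.identifier}, {d}, 0")   # Step 2: bind (register copy)
        return gen(exp.exp_body, insts)                  # Step 3: evaluate exp_body
    raise TypeError(exp)

insts = []
# let v = 2 in v + 3
res = gen(Let("v", Num(2), Add(Var("v"), Num(3))), insts)
print(insts)
print(res)
```

Running this for let v = 2 in v + 3 emits the constant load, the Addi copy that binds v, then the code for the body, whose result register is returned for the whole let.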

Can you show how code would be generated for let x <- 12 in 21 +
(x + true) end?
The figure below shows the code generation process for this particular program. A few things to
notice:
●​ The visit_let method generates code for let-bindings by first evaluating the defining
expression, binding its result to the variable, and then evaluating the body expression
where the variable is used.
●​ It uses an Addi instruction to bind the value of the defining expression to the variable
and allows recursive evaluation of nested let-expressions.
●​ This approach ensures that values are correctly tracked and referenced throughout the
program, even in the presence of nested or complex let-bindings.


But this approach to code generation assumes that variables have different
names, right? Otherwise, what could go wrong?
That's correct. This approach assumes that every let binding defines a variable with a different
name. Otherwise, imagine that we have something like:

let x <- 12 in let x <- 13 in x end + x end

This code should return 25. But our generator would mix the two definitions of variable x. Thus,
the result would be 26. The issue we're describing arises because the current implementation
for generating code for let-bindings doesn't account for variable shadowing. In functional
languages (and many others), variables declared in nested scopes can shadow variables with
the same name from outer scopes. The current approach uses the same variable name for both
bindings of x, leading to the incorrect result of 26 instead of 25.

How could you solve this issue?


There are many ways to handle this problem. One approach would be to preprocess the
program, renaming all the variables to ensure that they have different names. Or else we could
rename them during code generation. This can be achieved by introducing scoping in the code
generation. You can use a stack-based mechanism to keep track of variable names, and every
time you enter a new scope (such as inside a nested let), you generate a new name for
variables to avoid collisions.


Revised Approach

1.​ Introduce a variable name stack: Use a stack or a similar data structure to keep track
of variable names for different scopes. When entering a new let binding, push the new
variable name onto the stack. When exiting the scope, pop the variable name off the
stack.
2.​ Generate unique names: When processing a let binding, generate a unique internal
name for the variable (e.g., by appending a counter or some identifier) while maintaining
the original name in the stack to correctly reference it.

Revised visit_let Code Example:

class CodeGenVisitor:
    def __init__(self):
        # A dictionary to map variable names to a stack of their unique versions
        self.var_scope = {}

    def visit_let(self, exp, prog):
        """
        Handles code generation for let-bindings with support for shadowing.
        """
        # Step 1: Evaluate the defining expression (exp_def). Do this *before*
        # pushing the new name, so that any occurrence of the variable inside
        # exp_def still refers to the enclosing scope.
        d_name = exp.exp_def.accept(self, prog)

        # Step 2: Generate a unique name for the variable and bind the result to it
        unique_name = self._generate_unique_name(exp.identifier)
        prog.add_inst(AsmModule.Addi(unique_name, d_name, 0))

        # Step 3: Evaluate the body (exp_body) using the variable binding
        v_name = exp.exp_body.accept(self, prog)

        # Step 4: Remove the unique name from the scope when leaving the let-binding
        self._pop_variable(exp.identifier)

        # Return the result of evaluating the body expression
        return v_name

    def _generate_unique_name(self, var_name):
        """
        Generates a unique name for the variable to avoid shadowing.
        """
        if var_name not in self.var_scope:
            self.var_scope[var_name] = []

        # Create a unique name by appending a counter or unique suffix
        unique_name = f"{var_name}_{len(self.var_scope[var_name])}"

        # Push the unique name onto the stack for this variable
        self.var_scope[var_name].append(unique_name)

        return unique_name


    def _pop_variable(self, var_name):
        """
        Pops the variable name off the stack when exiting a scope.
        """
        if var_name in self.var_scope and self.var_scope[var_name]:
            self.var_scope[var_name].pop()

    def _get_current_var_name(self, var_name):
        """
        Returns the current variable name (the one on top of the stack).
        """
        if var_name in self.var_scope and self.var_scope[var_name]:
            return self.var_scope[var_name][-1]
        return var_name  # If no stack entry, return the original name

Explanation:

1.​ _generate_unique_name: When a let binding is encountered, this method generates


a unique name for the variable by appending a counter or a suffix to avoid name
collisions. The new name is pushed onto the stack associated with that variable name.
2.​ _pop_variable: Once the body of the let expression is evaluated, the unique name is
popped from the stack, restoring the previous scope's variable name.
3.​ _get_current_var_name: This helper method retrieves the current variable name from
the stack, which ensures that you always reference the correct variable when generating
code.

Example of Usage:

For the expression:

let x <- 12 in let x <- 13 in x end + x end

This code would generate something like:

x_0 = 12
x_1 = 13
result = x_1 + x_0

Thus, the inner x would use x_1, while the outer x remains x_0, resulting in 13 + 12 = 25,
which is the correct behavior.
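The stack-based renaming can be exercised on its own. The Scopes class below is a hypothetical, standalone distillation of the three helper methods; it shows the inner let resolving x to x_1 while the outer body still sees x_0:

```python
class Scopes:
    """Stack-based tracking of unique names, one stack per source-level name."""
    def __init__(self):
        self.var_scope = {}

    def push(self, name):
        # Entering a new let: mint a unique name and push it on this name's stack.
        stack = self.var_scope.setdefault(name, [])
        unique = f"{name}_{len(stack)}"
        stack.append(unique)
        return unique

    def pop(self, name):
        # Leaving a let: restore the enclosing scope's binding.
        self.var_scope[name].pop()

    def current(self, name):
        # A variable use resolves to whatever is on top of the stack.
        stack = self.var_scope.get(name)
        return stack[-1] if stack else name

s = Scopes()
outer = s.push("x")          # enter: let x <- 12  -> "x_0"
inner = s.push("x")          # enter: let x <- 13  -> "x_1"
inner_use = s.current("x")   # the inner body's 'x' resolves to "x_1"
s.pop("x")                   # leave the inner let
outer_use = s.current("x")   # the outer body's 'x' resolves to "x_0"
s.pop("x")
print(outer, inner, inner_use, outer_use)
```

Note one design consequence: deriving the suffix from the stack length lets sibling scopes reuse a name (two consecutive lets both produce x_0); a globally increasing counter would avoid even that reuse.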

How would you generate code for an if-then-else expression?


To generate code for a conditional expression (i.e., an if-then-else statement), we need
to evaluate the condition and then either execute the "then" branch or the "else" branch based
on whether the condition is true or false. This is a classic use of branching instructions in
assembly languages like RISC-V.


Let's break down how to generate code for this type of expression step-by-step, following the
structure of the IfThenElse class you’ve provided.

General Idea:

1.​ Evaluate the Condition (cond):


○​ First, generate code to evaluate the condition (B). The result will be stored in a
register, which holds a boolean value (True as 1 and False as 0).
2.​ Branch Based on the Condition:
○​ Use a conditional branch instruction to jump to either the "then" branch (e0) or
the "else" branch (e1) based on the value of the condition.
○​ If the condition is True, jump to the "then" block.
○​ If the condition is False, jump to the "else" block.
3.​ Generate Code for the Selected Branch:
○​ Only one of the two branches (e0 or e1) should be evaluated. The other should
be skipped entirely.
4.​ Combine the Results:
○​ Once the selected branch is executed, return the result of the evaluation. This
requires jumping over the code of the other branch once one branch has finished
executing.

Assembly Code Considerations:

●​ Branching: Use a conditional branch instruction (beq or bne) to check if the condition is
true or false.
●​ Labels: You need labels for branching to different parts of the code. One label for the
start of the "else" branch and one for jumping over the "else" branch after the "then"
branch is executed.

Code Example: visit_ifThenElse


def visit_ifThenElse(self, exp, prog):
    """
    Generates code for an if-then-else expression.
    """
    # Step 1: Evaluate the condition (exp.cond)
    cond_reg = exp.cond.accept(self, prog)

    # Create labels for the 'then' and 'else' branches
    then_label = prog.new_label("then")
    else_label = prog.new_label("else")
    end_label = prog.new_label("end_if")

    # Step 2: Branch to the 'else' block if the condition is False (cond_reg == 0)
    prog.add_inst(AsmModule.Beq(cond_reg, "x0", else_label))


    # Allocate one register that both branches will write their result to
    result_reg = self.next_var_name()

    # Step 3: Generate code for the 'then' block (exp.e0)
    prog.add_label(then_label)
    then_result = exp.e0.accept(self, prog)  # Evaluate the 'then' expression
    # Copy the 'then' result into the shared register, then jump over 'else'
    prog.add_inst(AsmModule.Addi(result_reg, then_result, 0))
    prog.add_inst(AsmModule.Jump(end_label))

    # Step 4: Generate code for the 'else' block (exp.e1)
    prog.add_label(else_label)
    else_result = exp.e1.accept(self, prog)  # Evaluate the 'else' expression
    prog.add_inst(AsmModule.Addi(result_reg, else_result, 0))

    # Step 5: End label (after either branch)
    prog.add_label(end_label)

    # Both branches leave their value in result_reg, so that register holds
    # the value of the whole expression. (Returning then_result or else_result
    # directly would be wrong: they name two different registers, and only one
    # branch runs at execution time.)
    return result_reg

Conceptual Overview:

An if-then-else expression involves:

1.​ Evaluating a condition (cond).


2.​ Based on the result of the condition, either:
○​ Evaluating the "then" branch (e0) if the condition is True.
○​ Evaluating the "else" branch (e1) if the condition is False.
3.​ Only one branch (e0 or e1) is evaluated, and the result of the evaluated branch is
returned.

Steps to Generate Code:

Step 1: Evaluate the Condition (cond)

The first step is to generate code to evaluate the condition, which is an expression that results in
a boolean value (True or False). This value is stored in a register.

●​ If the condition evaluates to True, it will be represented by 1.


●​ If the condition evaluates to False, it will be represented by 0.

The result of evaluating the condition is needed for the next step, where we decide which
branch to take.

Step 2: Conditional Branching

Once the condition is evaluated, we need to generate a conditional branch that determines
which block of code to execute:

●​ If the condition is True, we execute the "then" branch.


●​ If the condition is False, we execute the "else" branch.


To achieve this, we generate a branching instruction that checks whether the result of the
condition is False (i.e., 0). If it’s False, the program jumps to the "else" branch. If the condition
is True, the program continues to the "then" branch.

Step 3: Generate Code for the "Then" Block (e0)

If the condition is True, we generate code for the "then" block (e0). The result of evaluating the
"then" expression is stored in a register. Once the "then" block has been executed, the program
must skip the "else" block and jump to the end of the if-then-else construct.

Step 4: Generate Code for the "Else" Block (e1)

If the condition is False, the program jumps to the "else" block (e1). Here, we generate code
for the "else" expression, and the result is stored in a register. Once the "else" block has been
evaluated, the program proceeds to the next part of the overall execution.

Step 5: End of the If-Then-Else Expression

At the end of the if-then-else expression, after either the "then" or the "else" block has been
executed, we return the result of the block that was executed.

To manage the control flow between the "then" and "else" blocks, we use labels and jump
instructions:

●​ We place labels at the start of the "then" block, the "else" block, and the end of the
if-then-else expression.
●​ After the "then" block is executed, the program jumps over the "else" block to the end
label.
●​ If the "else" block is executed, the program continues naturally to the end of the
expression.

Why This Works:

●​ Conditional branching ensures that only one of the two branches is executed,
depending on the value of the condition.
●​ Labels are used to mark the locations of the "then" and "else" blocks, and jump
instructions control the flow between these blocks.
●​ The final result of the if-then-else expression is the value computed by whichever
block was executed, ensuring that the program behaves correctly according to the
conditional logic.

This approach effectively mirrors the behavior of conditional expressions in high-level languages, where control flow branches based on the evaluation of a condition and only one branch is executed.


What would the assembly code for IfThenElse(Bln(True), Num(42), Num(30)) look like?
For an expression like IfThenElse(Bln(True), Num(42), Num(30)), the generated
assembly might look like this:

# Evaluate condition (Bln(True) -> 1)
addi cond_reg, x0, 1          # Load 1 (True) into cond_reg

# Conditional branch
beq cond_reg, x0, else_label  # If cond_reg == 0, jump to else

# 'Then' block (Num(42))
then_label:
addi result_reg, x0, 42       # Load 42 into result_reg
j end_if                      # Jump to end

# 'Else' block (Num(30))
else_label:
addi result_reg, x0, 30       # Load 30 into result_reg

# End of the if statement
end_if:

Summary:

●​ The condition is evaluated first, and based on its result, the program either jumps to the
"then" or "else" block.
●​ Only one of the branches is executed, which aligns with the semantics of conditional
expressions.
●​ The result of the if-then-else expression is the value from either the "then" or "else"
branch, depending on the condition.
●​ Labels and branching instructions are key to implementing conditional logic at the
assembly level, where flow control is explicit.

But this unconditional jump instruction is not part of our set of 12 instructions. I don't think it is standard. How would you simulate it?
You are correct that the j (jump) instruction is not part of the standard RISC-V instruction set.
Instead, RISC-V provides a set of branch and jump instructions, such as:

●​ jal (jump and link): Used to jump to a target address and optionally save the return
address in a register.
●​ jalr (jump and link register): Similar to jal, but allows the target address to be
computed dynamically.


Simulating j Using Standard Instructions

The j instruction, as in j end_if, performs an unconditional jump to a target address without saving the return address. This can be implemented in RISC-V using jal with the return address (rd) set to x0, which discards the return address:

jal x0, end_if

Here:

●​ x0 is the zero register in RISC-V, and writing to it has no effect.


●​ end_if is the label representing the target address.

Explanation

●​ j end_if: Non-standard instruction that unconditionally jumps to the address labeled
end_if.
●​ Equivalent Standard Instruction: jal x0, end_if, where jal computes the target
address relative to the current program counter (PC) and jumps to it without saving the
return address (since rd = x0).

This approach aligns with the standard RISC-V encoding and avoids relying on any custom
instruction like j.
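Assemblers usually handle this by expanding pseudo-instructions into real ones before encoding. The sketch below illustrates such a pass (the tuple-based instruction representation is an assumption made here); the expansions themselves — j label into jal x0, label, and the bitwise not rd, rs into xori rd, rs, -1 — are the standard RISC-V ones:

```python
def expand_pseudo(inst):
    """Rewrites a pseudo-instruction tuple into a standard RV32I instruction."""
    op = inst[0]
    if op == "j":                       # j label      ->  jal x0, label
        return ("jal", "x0", inst[1])
    if op == "not":                     # not rd, rs   ->  xori rd, rs, -1 (bitwise NOT)
        return ("xori", inst[1], inst[2], -1)
    return inst                         # already a real instruction; leave unchanged

program = [("addi", "t0", "x0", 1), ("j", "end_if"), ("add", "t1", "t0", "t0")]
expanded = [expand_pseudo(i) for i in program]
print(expanded)
```

This keeps the rest of the code generator free to emit the convenient pseudo form, while the final output only contains instructions the hardware actually encodes.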

Not every instruction is available in RISC-V. For instance, how would you
implement logical negation?
High-Level Operation:

In high-level languages, the logical negation (NOT) operation flips the value of a boolean:

●​ NOT True becomes False


●​ NOT False becomes True

In RISC-V, however, there isn’t a direct logical negation instruction like NOT. To perform this
operation, we need to combine a few instructions.

How Logical Negation Can Be Implemented:

●​ In RISC-V, True is typically represented by 1 and False by 0.


●​ To negate a boolean value in RISC-V, we can use:
○​ The XOR instruction (XORI) with the value 1. This works because:
■​ 1 XOR 1 = 0 (negating True gives False)


■​ 0 XOR 1 = 1 (negating False gives True)

Thus, we can use the XOR operation with the value 1 to implement logical negation.
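The truth table can be checked directly in Python, where ^ is the bitwise XOR operator:

```python
def logical_not(b):
    """Negates a boolean encoded as 0/1, the way 'xori reg, reg, 1' would."""
    return b ^ 1

print(logical_not(1))  # True  -> 0 (False)
print(logical_not(0))  # False -> 1 (True)

# Caveat: this only behaves like logical negation for operands 0 and 1.
print(logical_not(3))  # 3 XOR 1 = 2, not a valid boolean
```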

Code Generation for Logical Negation (NOT):

def visit_not(self, exp, prog):
    """
    Generates code for the logical negation of a boolean expression.
    """
    # Step 1: Evaluate the expression (which is a boolean)
    value_reg = exp.operand.accept(self, prog)

    # Step 2: Negate the boolean value using XORI with 1
    result_reg = self.next_var_name()
    prog.add_inst(AsmModule.Xori(result_reg, value_reg, 1))

    # Step 3: Return the result of the negation
    return result_reg

Explanation:

Step 1: Evaluate the Boolean Expression

●​ The method first evaluates the operand (the boolean value that we want to negate). The
result of this evaluation is stored in value_reg, which will hold either 1 (for True) or 0
(for False).

Step 2: Negate the Boolean Value Using XOR

●​ To negate the boolean value, we use the XORI (bitwise XOR with an immediate value)
instruction.
○​ XORI with 1 effectively negates the value:
■​ 0 XOR 1 = 1 (negating False results in True)
■​ 1 XOR 1 = 0 (negating True results in False)
●​ The result is stored in result_reg.

Step 3: Return the Result

●​ Finally, we return the result stored in result_reg, which now contains the negated
boolean value.


This implementation of logical negation assumes a well-typed program, right? What would happen with an ill-typed program?
The implementation of logical negation that we've seen does indeed assume that the high-level
program is well-typed, meaning the operand (exp.operand) is guaranteed to be a boolean
value, which would be either 0 (for False) or 1 (for True). This is a key assumption of the code
generation process, and it relies on the type system of the high-level language to ensure that
only valid boolean values reach the negation operation.

Why This Assumption is Important:

1.​ Well-Typed Programs:


○​ In a well-typed program, the type system ensures that exp.operand will always
be a boolean value (either 0 or 1 in binary). This guarantees that the XORI
instruction will behave as expected:
■​ 1 XOR 1 = 0 (negating True yields False)
■​ 0 XOR 1 = 1 (negating False yields True)
2.​ Ill-Typed Programs:
○​ In an ill-typed program, where a non-boolean value (e.g., 3, which is 11 in
binary) is passed to the NOT operation, the result would indeed be incorrect. For
instance:
■​ If exp.operand = 3 (which is 11 in binary), applying XORI 3, 1 would
produce 3 XOR 1 = 2, which is incorrect for a boolean negation.
■​ This demonstrates that logical negation via XORI only works correctly
when the input is either 0 or 1. For other values, the result will be
undefined or incorrect.

Type System’s Role:

●​ Type Checking: In a typical compiler, the type system ensures that only valid boolean
values are passed to the NOT operation. During type checking (a phase that happens
before code generation), the compiler verifies that operations like NOT are only applied to
boolean expressions.
●​ Prevents Incorrect Inputs: This type checking prevents cases where non-boolean
values, like 3 or 11, are passed to boolean operations, ensuring that ill-typed programs
are caught before they reach the code generation phase.

What Would Happen in an Incorrectly Typed Program:

If the type checker failed to catch a mistake (or if no type checker exists), and a non-boolean
value (e.g., 3) was passed to the NOT operation, the XORI instruction would still execute but
produce a result that doesn't match the expected logical negation behavior:


●​ Input: 3 (binary 11) → XORI 3, 1 would result in 2 (binary 10), which is not a valid
boolean result.

This illustrates how important the type system is in ensuring that only valid, well-typed
expressions are passed to the code generation phase.

Example of ill-Typed Behavior:

Let’s consider an ill-typed program where the operand passed to NOT is 3 instead of 0 or 1:

# Suppose exp.operand evaluates to 3
exp.operand = 3

# XORI operation (assuming 3 is passed)
result = 3 ^ 1  # This results in 2, which is not a valid boolean value

Summary:

●​ The implementation of logical negation using XORI assumes that the input is well-typed
and limited to 0 or 1, representing boolean values.
●​ In an ill-typed program, where the operand is not a valid boolean, the result of using
XORI would be incorrect, as it would not produce the expected logical negation behavior.
●​ Type checking in high-level languages ensures that only valid boolean values are
passed to the NOT operation, preventing such issues from reaching the code generation
phase.

Would it be possible to have a "type-safe" implementation of logical negation, so that it produces zero whenever the operand is non-zero, and vice-versa?
Yes, it is possible to implement a type-safe version of logical negation where Not(e)
behaves correctly for any integer value of e, ensuring that:

●​ Not(e) returns 0 if e is non-zero (i.e., True).


●​ Not(e) returns 1 if e is zero (i.e., False).

This can be achieved using a combination of RISC-V instructions that effectively normalize any
non-zero value to 1 and then invert it using logical operations.

Idea:

The goal is to ensure that for any non-zero e, the result of Not(e) will be 0, and for e == 0,
the result will be 1. The key to achieving this is:


1.​ Normalize the value of e so that any non-zero value becomes 1.


2.​ Apply logical negation to this normalized result.

Steps to Achieve Type-Safe Negation:

1.​ Map the Operand to a Boolean:
○​ We can use the comparison instruction SLTIU (Set Less Than Immediate
Unsigned) with the immediate value 1. The instruction tests whether e < 1
holds as an unsigned comparison, which is true only when e == 0:
■​ If e == 0, the result will be 1.
■​ If e != 0, the result will be 0.
2.​ No Further Inversion Needed:
○​ This single comparison already computes the negation: every non-zero
value is mapped to 0 and zero is mapped to 1, so no additional XOR
operation is required.

Type-Safe Logical Negation Code Generation:


def visit_not_but_type_safe(self, exp, prog):
    """
    Generates code for type-safe logical negation.
    'Not(e)' returns 1 if e == 0; otherwise it returns 0 for any non-zero e.
    """
    # Step 1: Evaluate the operand expression (which may not be a boolean)
    value_reg = exp.operand.accept(self, prog)

    # Step 2: Use SLTIU to compute the negation directly:
    # 1 if value_reg == 0, 0 if value_reg != 0
    result_reg = self.next_var_name()
    prog.add_inst(AsmModule.Sltiu(result_reg, value_reg, 1))

    # Step 3: Return the negated result
    return result_reg

Explanation:

Step 1: Evaluate the Operand

●​ The operand exp.operand is evaluated first, and its result is stored in value_reg.
The operand could be any expression that evaluates to an integer (not necessarily a
boolean).

Step 2: Normalize Using SLTIU

●​ We use the SLTIU instruction, which stands for Set Less Than Immediate Unsigned.
This instruction compares the value in value_reg with the immediate value 1:
○​ If value_reg == 0, the result is 1 (indicating True, i.e., Not(False)).


○​ If value_reg != 0, the result is 0 (indicating False, i.e., Not(True)).


●​ In this way, any non-zero value is mapped to 0 and zero is mapped to 1, which is
exactly the behavior of logical negation.

Step 3: Return the Result

●​ The result of this normalized and negated operation is stored in result_reg, which is
returned as the final value of the negation.

Example:

Let’s say we evaluate Not(3) and Not(0) using the above code generation approach.

●​ Not(3):
○​ SLTIU compares 3 with 1 and sets the result to 0 (since 3 != 0). The result of
Not(3) is 0.
●​ Not(0):
○​ SLTIU compares 0 with 1 and sets the result to 1 (since 0 == 0). The result of
Not(0) is 1.
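The behavior of SLTIU-based negation can be sketched in Python (the function name sltiu_not is invented for illustration; the 32-bit mask mimics how RISC-V would interpret the register as an unsigned value):

```python
# A Python sketch of what "SLTIU rd, rs, 1" computes: an unsigned
# comparison of the operand against 1, which holds only when rs == 0.
def sltiu_not(e):
    # Interpret e as a 32-bit unsigned value, as the hardware would.
    return 1 if (e & 0xFFFFFFFF) < 1 else 0

print(sltiu_not(3))   # 0
print(sltiu_not(0))   # 1
print(sltiu_not(-1))  # 0 (0xFFFFFFFF is non-zero)
```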

How This Works:

●​ Single-Instruction Negation: The SLTIU instruction maps any non-zero value to 0
and zero to 1, allowing us to safely implement logical negation without making any
assumptions about the operand being well-typed.
●​ Type-Safe: This implementation works regardless of whether the operand is a boolean
or an integer, ensuring that Not(e) always behaves correctly for any value of e.

Why SLTIU Works for This Case:

●​ Unsigned Comparison: SLTIU performs an unsigned comparison, and the only
unsigned value less than 1 is 0. Hence, comparing the operand with 1 works
correctly for any value of e: if e == 0, SLTIU returns 1, and for all other values,
it returns 0.

Summary:

●​ By using the SLTIU instruction, we can implement a type-safe version of logical negation
where any non-zero value is treated as True and zero is treated as False. This
approach guarantees that Not(e) is always 1 if e == 0 and 0 if e != 0, regardless of
the original value of e.
●​ This method ensures that the code generator can handle ill-typed inputs and prevents
incorrect behavior during logical negation.


Who invented this RISC-V architecture that we shall be using as our target
for code generation?
The RISC-V architecture was invented by a team of researchers at the University of California,
Berkeley, in 2010. The key individuals involved in the development of RISC-V include:

●​ Krste Asanović: A professor at UC Berkeley and one of the principal architects of


RISC-V. He has been a significant advocate for open instruction set architectures (ISAs)
and led the design effort.
●​ David A. Patterson: A professor emeritus at UC Berkeley, who is renowned for his
contributions to the development of the original RISC (Reduced Instruction Set
Computing) concept in the 1980s. He co-authored the seminal textbook Computer
Organization and Design, which now includes content on RISC-V.
●​ Andrew Waterman: A PhD student at the time of RISC-V’s inception, Waterman played
a critical role in implementing and designing the early versions of RISC-V.

The team was motivated by the need for a clean-slate, open, and extensible instruction set
architecture that could be used for academic research and industrial applications without
licensing restrictions. RISC-V was intentionally designed to be simple yet scalable, enabling its
adoption in a wide range of devices, from small embedded systems to powerful
supercomputers.

Is that the same David Patterson of the Computer Architecture Books?


Yes! David A. Patterson is a highly influential figure in computer science and has made
numerous groundbreaking contributions. He is widely regarded as one of the pioneers of
modern computing, particularly in computer architecture.

Who is David A. Patterson?

●​ Profession: Professor Emeritus at the University of California, Berkeley.


●​ Field: Computer Architecture, Systems, and Education.


●​ Recognitions:
○​ Turing Award (2017), often referred to as the "Nobel Prize of Computing," which
he shared with John L. Hennessy for their work on RISC architectures.
○​ Fellow of the ACM, IEEE, and National Academy of Engineering.

Major Contributions to Computer Science

1.​ RISC (Reduced Instruction Set Computing):


○​ Alongside John L. Hennessy, Patterson introduced and championed the RISC
design philosophy in the 1980s. This approach simplified processor instruction
sets, leading to faster and more efficient CPUs.
○​ RISC architectures have become foundational in modern computing, influencing
designs like ARM, RISC-V, and MIPS.
2.​ RISC-V:
○​ Patterson has been a strong advocate for open instruction set architectures.
Although the development of RISC-V is primarily credited to Krste Asanović and
his team, Patterson's mentorship and RISC expertise deeply influenced its
philosophy and success.
3.​ RAID (Redundant Array of Independent Disks):
○​ Patterson co-developed the RAID concept in 1987, which revolutionized data
storage by using multiple inexpensive disks to provide redundancy and improve
performance. RAID remains a cornerstone of modern storage systems.
4.​ Computer Science Education:
○​ Patterson co-authored several foundational textbooks, such as:
■​ Computer Organization and Design (with John L. Hennessy): A widely
used undergraduate textbook on computer architecture.
■​ Computer Architecture: A Quantitative Approach (with Hennessy): A
seminal graduate-level book that has guided research and development
in the field for decades.
5.​ SPARC Architecture:
○​ Patterson contributed to the development of the SPARC (Scalable Processor
Architecture), a RISC architecture used in Sun Microsystems’ processors.
6.​ Berkeley Projects:
○​ He led various impactful projects at UC Berkeley, including the Berkeley RISC
project, which laid the foundation for the RISC revolution, and later projects in
parallel computing and machine learning.
7.​ Advocate for Open-Source Hardware:
○​ Patterson is a vocal supporter of open-source hardware initiatives, such as
RISC-V, and emphasizes the importance of democratizing access to high-quality
hardware designs.

Legacy


David Patterson’s contributions have profoundly shaped computer architecture and systems
design, making computing faster, more efficient, and more accessible. His influence extends
through his research, writings, and mentorship of many students who have gone on to make
their own mark in the field.


Code Generation for Statements


All the code that we have produced in the last class was for constructs that
always return a value (expressions). But imperative programs also have
statements. Do you know what a "statement" is?
In imperative programming languages like C, constructs like if-then-else and while loops are
statements rather than expressions. This means they control the flow of the program but do
not return a value. When generating code for these constructs, the focus shifts from computing
and returning values to manipulating the program's control flow.

Let’s break down how code generation changes for these control-flow constructs:

Key Differences Between Expressions and Statements:

●​ Expressions: Always return a value. Examples include arithmetic operations, boolean


operations, and function calls. Code generation for expressions focuses on evaluating
sub-expressions and returning results via registers.
●​ Statements: Do not return values; instead, they alter the program's control flow (e.g.,
jumping to different code blocks). Examples include if-then-else, while loops, and
assignment statements. Code generation for statements focuses on managing the
program counter and generating branches and jumps.

Can you give me semantic rules for typical statements, e.g., if-then-else,
while, etc?
I'll provide an overview of operational semantics rules for typical imperative constructs like
if-then-else, while loops, and assignment. As we had previously discussed, in operational
semantics, we describe the execution of programs as transitions between configurations
(which include the state of memory, control flow, etc.). For statements, we typically express how
the program moves from one configuration to the next. See below examples of small-step
operational semantics, where the execution is broken into atomic steps:


1. Assignment Statement: x := e

The assignment statement x := e evaluates the expression e and stores the result in the
variable x. The semantics specify how the state is updated.

Explanation:


●​ Premise: The expression e evaluates to a value v in the current state s.


●​ Conclusion: The program state is updated so that the variable x is now mapped to the
value v. The notation s[x ↦ v] means that the state s is modified such that x
now holds v.

2. If-Then-Else Statement: if b then S1 else S2

The if-then-else statement evaluates the boolean expression b. If b evaluates to true, it


executes the first statement (S1); otherwise, it executes the second statement (S2).

Explanation:

●​ True Condition: If b evaluates to true in state s, the execution proceeds with the
statement S1 in the same state s.
●​ False Condition: If b evaluates to false, the execution proceeds with S2 in the same
state s.

3. While Loop: while b do S

The while loop evaluates the boolean expression b. If b is true, it executes the body S and
then repeats the process. If b is false, it terminates the loop.

Explanation:

●​ True Condition: If b evaluates to true, the loop body S is executed, and after that, the
entire while loop is executed again (hence the transition to S; while b do S).
●​ False Condition: If b evaluates to false, the loop terminates, and the state s remains
unchanged (the program just proceeds with whatever comes after the loop).

4. Sequence of Statements: S1; S2

The sequence S1; S2 means that S1 is executed first, and then S2 is executed.

Explanation:

●​ The first rule handles the case where S1 takes one step, becoming an intermediate
statement S1', so the sequence continues with S1' followed by S2.
●​ The second rule handles the case where S1 has finished, and the execution continues
with S2.

5. Skip Statement: skip


The skip statement does nothing and simply returns the current state.

Explanation:

●​ The skip statement doesn’t alter the state, so the program proceeds with no changes.
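One conventional way to write the rules described above in small-step notation is as follows (a standard presentation, sketched here in LaTeX; the exact symbols may differ from the original figures):

```latex
% Assignment
\frac{\langle e, s \rangle \Downarrow v}
     {\langle x := e,\; s \rangle \rightarrow \langle \mathtt{skip},\; s[x \mapsto v] \rangle}

% If-then-else
\frac{\langle b, s \rangle \Downarrow \mathtt{true}}
     {\langle \mathtt{if}\; b\; \mathtt{then}\; S_1\; \mathtt{else}\; S_2,\; s \rangle \rightarrow \langle S_1, s \rangle}
\qquad
\frac{\langle b, s \rangle \Downarrow \mathtt{false}}
     {\langle \mathtt{if}\; b\; \mathtt{then}\; S_1\; \mathtt{else}\; S_2,\; s \rangle \rightarrow \langle S_2, s \rangle}

% While
\frac{\langle b, s \rangle \Downarrow \mathtt{true}}
     {\langle \mathtt{while}\; b\; \mathtt{do}\; S,\; s \rangle \rightarrow \langle S;\; \mathtt{while}\; b\; \mathtt{do}\; S,\; s \rangle}
\qquad
\frac{\langle b, s \rangle \Downarrow \mathtt{false}}
     {\langle \mathtt{while}\; b\; \mathtt{do}\; S,\; s \rangle \rightarrow \langle \mathtt{skip},\; s \rangle}

% Sequence
\frac{\langle S_1, s \rangle \rightarrow \langle S_1', s' \rangle}
     {\langle S_1; S_2,\; s \rangle \rightarrow \langle S_1'; S_2,\; s' \rangle}
\qquad
\frac{}{\langle \mathtt{skip}; S_2,\; s \rangle \rightarrow \langle S_2,\; s \rangle}
```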

Putting It All Together:

With these rules, we can now formally describe how typical imperative constructs work in terms
of transitions between states. Here’s how they fit into code generation and execution:

●​ Assignments change the values stored in variables.


●​ If-then-else branches execution based on a condition but doesn’t return values like the
expression-based if does.
●​ While loops repeatedly check a condition and execute a block of code.
●​ Sequence connects multiple statements into one, ensuring they run in order.

These rules form the basis of operational semantics for imperative programs and dictate how
control flows through a program. They help in defining how constructs behave, how the state is
updated, and how execution proceeds from one step to the next.
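These rules translate almost directly into a tiny interpreter. The sketch below is a simplification (the tuple encoding of statements and the run function are invented for illustration) that executes statements over a state represented as a Python dictionary:

```python
# Minimal AST: statements are tuples; expressions are functions over the state.
# ("assign", x, e) | ("if", b, S1, S2) | ("while", b, S) | ("seq", S1, S2) | ("skip",)

def run(stmt, state):
    """Executes 'stmt', mutating and returning 'state', following the rules above."""
    tag = stmt[0]
    if tag == "skip":
        return state                     # skip: the state is unchanged
    if tag == "assign":
        _, x, e = stmt
        state[x] = e(state)              # premise: e evaluates to a value v
        return state                     # conclusion: s[x -> v]
    if tag == "seq":
        _, s1, s2 = stmt
        return run(s2, run(s1, state))   # run S1 to completion, then S2
    if tag == "if":
        _, b, s1, s2 = stmt
        return run(s1 if b(state) else s2, state)
    if tag == "while":
        _, b, body = stmt
        while b(state):                  # while b do S  ==  S; while b do S
            run(body, state)
        return state

# Example: x := 0; while x < 3 do x := x + 1
program = ("seq",
           ("assign", "x", lambda s: 0),
           ("while", lambda s: s["x"] < 3,
                     ("assign", "x", lambda s: s["x"] + 1)))
print(run(program, {}))  # {'x': 3}
```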

How would you generate code for an if-then-else statement?

Unlike the if-then-else expression, the C-style if-then-else statement doesn't return a
value. Its purpose is to execute one block of code if a condition is true and another block of
code if the condition is false. After either block, the program continues to the next statement.

Code Generation Strategy:

1.​ Evaluate the condition: Generate code to evaluate the condition.


2.​ Branch based on the condition:
○​ If the condition is true, jump to the "then" block.
○​ If the condition is false, jump to the "else" block (if it exists).
3.​ Continue execution: After either the "then" or "else" block is executed, the program
continues with the next statement.

Code Example for if-then-else Statement:


def visit_ifThenElseStmt(self, stmt, prog):
    """
    Generates code for a C-style if-then-else statement.
    """
    # Step 1: Evaluate the condition
    cond_reg = stmt.cond.accept(self, prog)

    # Step 2: Generate labels for the 'else' block and end of the if-then-else
    else_label = prog.new_label("else")
    end_label = prog.new_label("end_if")

    # Step 3: Branch to the 'else' block if the condition is false (cond_reg == 0)
    prog.add_inst(AsmModule.Beq(cond_reg, "x0", else_label))

    # Step 4: Generate code for the 'then' block
    stmt.then_block.accept(self, prog)        # Execute 'then' block code
    prog.add_inst(AsmModule.Jump(end_label))  # Skip 'else' block after 'then' block

    # Step 5: Generate code for the 'else' block (if it exists)
    prog.add_label(else_label)
    if stmt.else_block:
        stmt.else_block.accept(self, prog)    # Execute 'else' block code

    # Step 6: End of the if-then-else statement
    prog.add_label(end_label)

Explanation:

●​ No Value Return: Unlike an expression, this if-then-else statement doesn’t return a


value, so we don't need to manage registers for results.
●​ Branching for Control Flow: The generated code uses conditional branches (Beq) and
unconditional jumps (Jump) to control which block (then/else) gets executed.
●​ Labels: Labels are used to mark the "else" block and the end of the entire
if-then-else statement so that the program can jump over blocks as needed.
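To make the shape of the emitted code concrete, here is a hypothetical mini-emitter (the Prog class and its methods below are stand-ins, not the book's AsmModule) that follows the same six steps for a statement like if (x < 10) y = 1; else y = 2;, with x in a0 and y in a1:

```python
class Prog:
    """A toy assembly buffer: collects instructions and labels as strings."""
    def __init__(self):
        self.lines, self.counter = [], 0
    def new_label(self, prefix):
        self.counter += 1
        return f"{prefix}_{self.counter}"
    def emit(self, text):
        self.lines.append(text)

prog = Prog()
else_label = prog.new_label("else")
end_label = prog.new_label("end_if")

prog.emit("slti t0, a0, 10")            # cond_reg := (x < 10)
prog.emit(f"beq t0, x0, {else_label}")  # Step 3: branch if the condition is false
prog.emit("addi a1, x0, 1")             # then-block: y = 1
prog.emit(f"j {end_label}")             # Step 4: skip the else block
prog.emit(f"{else_label}:")             # Step 5: else block
prog.emit("addi a1, x0, 2")             # y = 2
prog.emit(f"{end_label}:")              # Step 6: end of the statement

print("\n".join(prog.lines))
```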

Can you "draw" the code that will be produced for an if-then-else
statement?
Sure. Depending on the existence of the else branch, we might have two different graphs
showing how control flows throughout the program:


In the above figures, the term "fall through" refers to the control flow when no jump is taken after
executing a conditional block, and the program proceeds to the next block of code. Here’s how
the fall through works in the two examples:

1.​ If cond then then_block (left side):


○​ The conditional expression (stmt.cond) is evaluated, and the result is stored in
cond_reg.
○​ If the condition is false (i.e., cond_reg == 0), a branch is taken to
else_label.
○​ If the condition is true (i.e., cond_reg != 0), the program "falls through" to
execute the then_block, and there is no jump.
○​ After the then_block is executed, the program jumps to end_label, which
represents the point after the entire conditional construct.
2.​ If cond then then_block else else_block (right side):
○​ Similarly, the condition is evaluated, and a branch to else_label occurs if the
condition is false.
○​ If the condition is true, the program "falls through" to execute the then_block.
○​ After the then_block is executed, a jump is made to end_label to skip the
else_block.
○​ If the condition was false, the program jumps to else_label, executes
else_block, and then "falls through" to end_label.

The dashed lines represent this "fall through" behavior, indicating where the execution continues
without an explicit branch instruction. It’s essentially the default execution flow when no
condition causes a jump.


And how would you generate code for a C-Style "while" loop?
A C-style while loop repeatedly executes a block of code as long as a condition holds true.
Like the if-then-else statement, it does not return a value, but it controls the program's flow.

Code Generation Strategy:

1.​ Loop Condition: Evaluate the loop's condition at the beginning of each iteration.
2.​ Branch on Condition: If the condition is true, execute the loop body; otherwise, jump to
the end of the loop.
3.​ Repeat: After executing the body, jump back to evaluate the condition again.

Code Example for while Loop:


def visit_whileStmt(self, stmt, prog):
    """
    Generates code for a C-style while loop.
    """
    # Step 1: Generate labels for the start of the loop and the end of the loop
    loop_start = prog.new_label("loop_start")
    loop_end = prog.new_label("loop_end")

    # Step 2: Start of the loop
    prog.add_label(loop_start)

    # Step 3: Evaluate the condition
    cond_reg = stmt.cond.accept(self, prog)

    # Step 4: Exit the loop if the condition is false
    prog.add_inst(AsmModule.Beq(cond_reg, "x0", loop_end))

    # Step 5: Generate code for the loop body
    stmt.body.accept(self, prog)  # Execute loop body

    # Step 6: Jump back to the start of the loop to recheck the condition
    prog.add_inst(AsmModule.Jump(loop_start))

    # Step 7: End of the loop
    prog.add_label(loop_end)

Explanation:

●​ No Value Return: The loop does not return any value; its sole purpose is controlling the
flow of the program by repeatedly executing the loop body as long as the condition
holds.
●​ Labels for Repetition: The loop has two key labels: one for the start of the loop
(loop_start) and one for the end (loop_end). The program jumps between these
labels to repeat the loop body or exit the loop when the condition is false.


●​ Conditional Branching: The condition is checked using a Beq instruction. If the


condition evaluates to False (0), the program jumps to loop_end and exits the loop. If
True, it continues executing the loop body.
●​ Unconditional Jump: After executing the body, the program jumps back to
loop_start to re-evaluate the condition.

Can you draw the code produced for a while?


See the control-flow graph below:

In this diagram, there is only one fall through—from the condition block (loop_start) to the
loop body when the condition is true. The control flow diagram now makes it clear that after
executing the loop body, there is an unconditional jump (jump loop_start) back to the
condition check, which means there's no fall through edge there.

So, the correct interpretation is:

●​ One fall through: From the loop_start block to the loop body if the condition is true.

The transition from the body to the condition check (loop_start) is handled by the jump, not a
fall through.

Our while contains two jumps: a conditional test and an unconditional one.
Would you be able to generate the while with only one dynamic jump?
Yes! Typical compilers use a common optimization called a loop inversion (also called
do-while conversion) to reduce the number of jumps in a loop, which improves efficiency by
removing the unconditional jump back to the condition.


How Loop Inversion Works:

Instead of placing the condition check at the start of the loop, typical compilers such as GCC
place it at the end of the loop body. This allows the loop to execute the first iteration without a
jump to the condition check, avoiding the unconditional jump altogether after the first iteration.

Here's how it works:

1.​ Initial Jump to Body:


○​ At the start of the loop, there's a branch that skips the loop body if the condition is
false.
2.​ Single Conditional Check in Each Iteration:
○​ After each iteration of the loop body, the condition is checked at the end.
○​ If the condition is still true, the control jumps back to the start of the loop body.

This method removes the need for an unconditional jump at the end of each iteration and
requires only one conditional branch per iteration.

Could you show an example of loop inversion?


If you apply this to the control flow diagram, it would look something like this:

1.​ Initial conditional jump to the loop body (if cond is true).
2.​ After each loop body execution, check cond again (branch if true, exit if false).

This avoids needing to unconditionally jump to the condition after every iteration. Below we see
the new control-flow graph:

Why is the inverted loop faster?


The inverted loop has two advantages over the original code:


●​ No Unconditional Jump: In the traditional while loop, you'd need a jump to the
condition check after every iteration. By placing the condition at the end of the body, you
eliminate the unconditional jump.
●​ Reduced Branch Misprediction: With fewer jumps and branches, the processor can
better predict the branch directions, reducing the chances of branch misprediction.

It seems that the inverted loop is a C-style do-while. Is that the case?
Yes, you're absolutely right! The loop inversion optimization transforms a while loop into
something that behaves like a do-while loop in C.

In a do-while loop, the loop body is executed at least once before the condition is checked.
This is exactly what happens in the inverted loop generated by GCC. Here's a breakdown:

While Loop in C:
while (cond) {
// loop body
}

●​ The condition is checked before the loop body executes.


●​ If the condition is false initially, the loop body is skipped entirely.

Do-While Loop in C:
do {
// loop body
} while (cond);

●​ The loop body is executed at least once, regardless of the initial value of cond.
●​ The condition is checked after the loop body executes.

Inverted While Loop ≈ Do-While Loop

When a typical compiler like GCC applies the loop inversion optimization, it transforms the
traditional while loop into an inverted form that behaves like a do-while loop:

1.​ Initial check: The condition is checked once before the first iteration (just like a while
loop), but this is done with an initial branch that skips the loop entirely if the condition is
false.
2.​ Loop body execution: The loop body is executed at least once (like a do-while loop).
3.​ Condition after body: After the loop body, the condition is checked again, and if true,
the loop continues. This is just like the condition in a do-while loop being evaluated
after each iteration.


Conclusion:

The optimized version does behave like a do-while loop, but the key difference is that the
initial condition is still checked before the first iteration to ensure the loop doesn't run if the
condition is initially false. In contrast, a pure do-while loop in C always runs the loop body at
least once without any prior condition check. However, for subsequent iterations, the condition is
checked after the loop body, making the control flow similar to a do-while.
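The equivalence can be checked with a small Python experiment (the helper names below are invented) that runs the same loop under both shapes and confirms they produce the same result:

```python
def plain_while(cond, body):
    # while (cond) { body }
    while cond():
        body()

def inverted_while(cond, body):
    # if (cond) { do { body } while (cond); }  -- the loop-inversion shape
    if cond():
        while True:
            body()
            if not cond():
                break

def trial(loop):
    # Count to 4 using the given loop shape, starting from 0.
    state = {"i": 0}
    loop(lambda: state["i"] < 4, lambda: state.update(i=state["i"] + 1))
    return state["i"]

print(trial(plain_while), trial(inverted_while))  # 4 4
```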

Can you give me a visitor that produces an inverted loop?


To invert the while loop into something like a do-while loop, we need to change the order of
the loop's condition check and the body execution. In the inverted loop structure:

1.​ The body of the loop will execute first.


2.​ After executing the body, the condition will be checked.
3.​ If the condition is true, the loop continues; otherwise, it exits.

Here's a new version of visit_whileStmt that implements this structure:

def visit_whileStmt(self, stmt, prog):
    """
    Generates code for an inverted while loop with an initial condition check.
    """
    # Step 1: Generate labels for the body of the loop and the end of the loop
    loop_body = prog.new_label("loop_body")
    loop_end = prog.new_label("loop_end")

    # Step 2: Initial condition check (if the condition is false, skip the loop)
    cond_reg = stmt.cond.accept(self, prog)
    prog.add_inst(AsmModule.Beq(cond_reg, "x0", loop_end))

    # Step 3: Start of the loop body
    prog.add_label(loop_body)

    # Step 4: Generate code for the loop body
    stmt.body.accept(self, prog)  # Execute loop body

    # Step 5: Condition check after the body is executed (inverted loop check)
    cond_reg = stmt.cond.accept(self, prog)
    prog.add_inst(AsmModule.Bne(cond_reg, "x0", loop_body))

    # Step 6: End of the loop (when the condition is false)
    prog.add_label(loop_end)

Notice that for a do-while-like loop, we must still check the condition before entering the loop
to avoid running the body when the condition is false right from the start. Without that, the loop
would always execute the body at least once, which doesn't match the behavior of a regular
while loop in C.


Explanation:

1.​ Initial Condition Check:​


Before entering the loop, we evaluate the condition. If the condition is false (cond ==
0), we immediately jump to loop_end, bypassing the body of the loop entirely. This is
similar to how a regular while loop works.
2.​ Loop Body Execution:​
If the condition is true at the start, we enter the loop body.
3.​ Inverted Loop:​
After executing the body, we check the condition again. If it remains true, we jump back
to the start of the body (loop_body). Otherwise, we fall through and exit the loop.

This way, the code behaves like a normal while loop but uses the more efficient inverted loop
structure inside. The surrounding condition ensures that we only enter the loop if the condition is
true initially, avoiding the do-while behavior where the body would always execute at least
once.

We have been calling these graphs we draw "Control-Flow Graphs". Is


there a formal definition for these graphs?
A Control-Flow Graph (CFG) is a fundamental concept in computer science, particularly in the
fields of compiler design, program analysis, and optimization. It is a directed graph that
represents the flow of control through a program during its execution.

Key Components of a Control-Flow Graph:

1.​ Nodes (Basic Blocks):​
In a CFG, each node represents a basic block. A basic block is a straight-line sequence
of instructions with:
○​ One entry point: the first instruction is the only point where control can enter the
block.
○​ One exit point: control can leave the block only at the last instruction (except for
conditional branches at the end).
A basic block doesn't contain any control-flow instructions (like jumps or branches)
except at the end.
2.​ Edges (Control Flow):​
The edges between nodes represent the possible control flow between the basic
blocks. An edge from block A to block B indicates that after executing the instructions in
block A, the program might transfer control to block B. This happens when:
○​ There's a conditional branch at the end of block A that may lead to block B.
○​ Block A ends with an unconditional jump to block B.
○​ Block A naturally falls through into block B (in the absence of jumps or branches).


CFG in Optimizations:

When optimizing loops (e.g., using loop inversion, as we discussed), the CFG changes.
Instead of checking the condition first, the compiler may rearrange the flow to execute the loop
body before the condition (similar to a do-while loop). The CFG would reflect this
transformation by adjusting the edges and nodes to reflect the modified control flow.

Summary:

A Control-Flow Graph is a graphical representation of all the possible paths a program might
take during its execution. It is a vital tool in program analysis and optimization, helping compilers
and developers understand how control moves through a program, where optimizations can be
applied, and what parts of the code might be unreachable or need more attention.

Why are CFGs important?


Control-flow graphs are important for a number of reasons:

1.​ Program Analysis and Optimization:​


Compilers use CFGs to perform various analyses and optimizations, such as:
○​ Dead code elimination: Removing code that can never be executed based on
the flow.
○​ Loop optimizations: Analyzing loops to improve performance, such as through
loop unrolling or invariant code motion.
○​ Data-flow analysis: Determining the possible values that variables can take at
different points in the program.
2.​ Pathfinding:​
CFGs allow compilers to find paths through the program. For instance, it helps determine
whether a certain block of code can be reached or whether an infinite loop might exist.
3.​ Code Generation:​
When generating machine code, the compiler can traverse the CFG to produce more
efficient jump and branch instructions.
4.​ Testing and Verification:​
CFGs are also useful in software testing, where they help ensure that every possible
path through the program is tested (a concept known as path coverage).

Could you give me a visitor that produces code for a C-style for loop?

Here's how you would structure a visitor for a typical for loop. A for loop has four parts:

1. Initialization: This is executed once at the beginning of the loop.
2. Condition: Checked before each iteration to decide whether to continue looping.
3. Increment: Executed after each iteration of the loop.
4. Body: The code inside the loop that is executed for each iteration.

476
This material is freely available and must not be sold or used for commercial purposes.
For typos or suggestions, please contact Fernando Pereira at [email protected].

Let's create a visitor that generates assembly code for this structure:

def visit_forStmt(self, stmt, prog):
    """
    Generates code for a typical C-style for loop.
    """
    # Step 1: Generate labels for the loop condition, body, and end
    loop_cond = prog.new_label("loop_cond")
    loop_body = prog.new_label("loop_body")
    loop_end = prog.new_label("loop_end")

    # Step 2: Initialization (executed once before the loop starts)
    stmt.init.accept(self, prog)  # Generate code for the initialization statement

    # Step 3: Jump to the loop condition
    # (to avoid executing the body without checking the condition first)
    prog.add_inst(AsmModule.Jump(loop_cond))

    # Step 4: Loop body
    prog.add_label(loop_body)
    stmt.body.accept(self, prog)  # Generate code for the body of the loop

    # Step 5: Increment (executed after the body for each iteration)
    stmt.incr.accept(self, prog)  # Generate code for the increment expression

    # Step 6: Condition (executed before each iteration
    # to check if we should continue looping)
    prog.add_label(loop_cond)
    cond_reg = stmt.cond.accept(self, prog)
    # If cond != 0, jump to loop_body
    prog.add_inst(AsmModule.Bne(cond_reg, "x0", loop_body))

    # Step 7: End of the loop
    prog.add_label(loop_end)

This structure captures the typical behavior of a for loop in languages like C, ensuring the
initialization is done once, the condition is checked at each iteration, and the increment is
executed after the body on each loop iteration.
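To see the layout this visitor emits, here is a hypothetical, minimal stand-in for `prog` that only records labels and instructions as strings; the real `AsmModule` and AST classes from these notes are not needed for this sketch, and the `accept` calls for init, body, and increment are elided:

```python
class MockProg:
    """Hypothetical stand-in for the program object: records emitted lines."""
    def __init__(self):
        self.lines = []
        self.counter = 0

    def new_label(self, prefix):
        self.counter += 1
        return f"{prefix}_{self.counter}"

    def add_label(self, label):
        self.lines.append(f"{label}:")

    def add_inst(self, inst):
        self.lines.append(f"    {inst}")

# Emit the same skeleton visit_forStmt produces, with the AST calls elided:
prog = MockProg()
loop_cond = prog.new_label("loop_cond")
loop_body = prog.new_label("loop_body")
loop_end = prog.new_label("loop_end")
prog.add_inst(f"j {loop_cond}")              # after init: check condition first
prog.add_label(loop_body)                    # body comes next in the layout
prog.add_label(loop_cond)                    # then increment, then the condition
prog.add_inst(f"bne cond, x0, {loop_body}")  # loop back while cond != 0
prog.add_label(loop_end)

print("\n".join(prog.lines))
```

Note how the condition block is placed after the body in the final layout, so each iteration needs only one taken branch.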


What would the CFG of a for loop look like?


The figure below shows the CFG of a for loop:

Explanation:

1.​ Initialization Block:


○​ This part runs once, at the very beginning of the loop. It's generated using
stmt.init.accept(self, prog).
2.​ Condition Check:
○​ Before entering the loop body, we check the condition. If the condition is false
(cond == 0), the loop exits and we jump to loop_end.
3.​ Loop Body:
○​ If the condition is true, we execute the loop body.
4.​ Increment Block:
○​ After the body is executed, the increment step is performed before checking the
condition again.
5.​ Loop Flow:
○​ After performing the increment step, the code jumps back to the condition check
label (loop_cond), creating a cycle until the condition is false.
6.​ End Label:
○​ When the condition becomes false, control falls through to the loop_end label,
exiting the loop.

How can we model a "break" statement that exits the surrounding loop?
Well, let's first discuss how to model an AST node for a break statement and then how to
implement the visit_breakStmt to generate the appropriate assembly code for it.

Modeling the AST Node for a break Statement


A break statement is used to exit the nearest enclosing loop prematurely. To represent this
premature exit in the AST, we could create a simple node with no specific fields because the
break statement doesn't depend on any values or conditions; it just transfers control to the
statement after the loop. Here's a possible structure for a BreakStmt class in your AST:

class BreakStmt:
    def __init__(self):
        # No fields are necessary since break has no arguments
        pass

    def accept(self, visitor, prog):
        return visitor.visit_breakStmt(self, prog)

This BreakStmt class has an accept method that will call the visitor's visit_breakStmt
method, passing in the BreakStmt node and the current program object (prog).

How would code generation work for the break statement?


In code generation, when we encounter a break statement, we need to jump to the end of the
loop, which means the loop must have a known exit point (label). To do this, we need a way to
pass the loop’s exit_label to the break statement.

Storing Loop Context

One way to track the current loop's exit_label is by using a stack of labels inside the visitor.
Each time we enter a loop, we push its exit_label onto this stack. When generating code for
a break, we can use the label at the top of the stack to generate a jump to the end of the loop.

Here's an example of how you can modify the visitor to handle this:

1.​ Add a Stack for Exit Labels: The visitor will need a stack to track the exit labels of
loops.

class CodeGenerator:
def __init__(self):
# Stack for loop exit labels
self.exit_label_stack = []

2.​ Modify the visit_whileStmt and visit_forStmt Methods: Before generating the
body of the loop, push the exit_label onto the stack. After generating the body, pop
the label off the stack. For the while loop, see the code below (for the for loop, you
would do something similar with the loop_end label):

def visit_whileStmt(self, stmt, prog):
    loop_start = prog.new_label("loop_start")
    loop_end = prog.new_label("loop_end")

    # Push the exit label onto the stack
    self.exit_label_stack.append(loop_end)

    prog.add_label(loop_start)
    cond_reg = stmt.cond.accept(self, prog)
    prog.add_inst(AsmModule.Beq(cond_reg, "x0", loop_end))
    stmt.body.accept(self, prog)
    prog.add_inst(AsmModule.Jump(loop_start))

    # Pop the exit label off the stack
    self.exit_label_stack.pop()

    prog.add_label(loop_end)

3.​ Implement visit_breakStmt: When visiting a break statement, generate a jump to


the label at the top of the exit_label_stack (which is the end of the current loop).

def visit_breakStmt(self, stmt, prog):
    # The break should jump to the current loop's exit label
    if not self.exit_label_stack:
        raise Exception("Break statement not inside a loop")

    # Get the label for the current loop's end
    exit_label = self.exit_label_stack[-1]
    prog.add_inst(AsmModule.Jump(exit_label))

Explanation:

1.​ visit_whileStmt and visit_forStmt: These methods push the loop_end label
onto the exit_label_stack before generating the loop body. When the loop ends,
they pop the label off the stack.
2.​ visit_breakStmt: This method generates a jump to the label at the top of the
exit_label_stack, effectively breaking out of the loop.
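The interaction between the label stack and nested loops can be seen in isolation with a small sketch; the label names below are made up:

```python
# Sketch of the exit-label stack on its own, outside the visitor, showing how
# `break` always resolves to the innermost enclosing loop.
exit_label_stack = []

def gen_break():
    """Return the jump instruction a break statement would emit."""
    if not exit_label_stack:
        raise Exception("Break statement not inside a loop")
    return f"j {exit_label_stack[-1]}"  # jump to the innermost exit label

exit_label_stack.append("outer_end")   # entering the outer loop
exit_label_stack.append("inner_end")   # entering the inner loop
inner_jump = gen_break()               # break inside the inner loop
exit_label_stack.pop()                 # leaving the inner loop
outer_jump = gen_break()               # break now targets the outer loop
exit_label_stack.pop()

print(inner_jump, outer_jump)
```

Because the stack is pushed on loop entry and popped on exit, the top of the stack always names the nearest enclosing loop, which is exactly the C semantics of break.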

Summary:

●​ The BreakStmt AST node is simple because break doesn't take arguments.
●​ The visitor keeps track of the nearest loop's exit label using a stack
(exit_label_stack), allowing nested loops to work correctly.
●​ The visit_breakStmt generates a jump to the current loop’s exit_label, making
the break statement functional.

The figure below summarizes the code that we've just produced:


The figure below shows an example of a control-flow graph representing a for loop with a break
statement guarded by a conditional statement. The numbers show the order in which basic
blocks are aligned in the final assembly representation.


How would code generation work for the continue statement?


Just like the break statement, the continue statement also affects the flow inside loops, but
instead of exiting the loop, continue skips the rest of the current iteration and jumps back to
the loop's condition check (or increment in the case of a for loop).

To generate code for the continue statement, we'll need to jump to the point where the loop
condition or increment (in a for loop) is re-evaluated. This means we will need to track a
continue label for each loop, in addition to the exit label we used for break. The declaration of
the visitor will become like the following piece of code:

class CodeGenerator:
    def __init__(self):
        # Stack for loop exit labels (for break statements)
        self.exit_label_stack = []
        # Stack for loop continue labels (for continue statements)
        self.continue_label_stack = []

    def visit_whileStmt(self, stmt, prog):
        ...

    def visit_forStmt(self, stmt, prog):
        ...

    def visit_breakStmt(self, stmt, prog):
        ...

    def visit_continueStmt(self, stmt, prog):
        ...

Plan:

●​ We will create a ContinueStmt AST node, similar to BreakStmt.


●​ We will add a stack of continue labels to track the loop's condition check or increment.
●​ We will write a visit_continueStmt method that generates the correct jump to the
continue label.

Step 1: Modeling the ContinueStmt AST Node

We need to model the continue statement with a simple AST node, similar to BreakStmt.
Here's an example:

class ContinueStmt:
    def __init__(self):
        # No fields needed since continue has no arguments
        pass

    def accept(self, visitor, prog):
        return visitor.visit_continueStmt(self, prog)

Step 2: Handling Continue Labels in the Visitor

We now modify the code generator to handle continue statements. We will introduce a
continue label stack in the same way we did with the exit_label_stack for break
statements.

Modifying visit_whileStmt and visit_forStmt

Just like we track exit labels for break, we will track continue labels for continue. For a
while loop, the continue label should be the loop condition check. For a for loop, it should be
the increment step. In a while loop, we jump back to the condition check when encountering a
continue statement:

def visit_whileStmt(self, stmt, prog):
    loop_start = prog.new_label("loop_start")
    loop_end = prog.new_label("loop_end")

    # Push the exit and continue labels onto the stack
    self.exit_label_stack.append(loop_end)
    # In a while loop, we recheck the condition at the start of the loop
    self.continue_label_stack.append(loop_start)

    prog.add_label(loop_start)
    cond_reg = stmt.cond.accept(self, prog)
    prog.add_inst(AsmModule.Beq(cond_reg, "x0", loop_end))
    stmt.body.accept(self, prog)
    prog.add_inst(AsmModule.Jump(loop_start))

    # Pop the labels off the stack after the loop
    self.continue_label_stack.pop()
    self.exit_label_stack.pop()

    prog.add_label(loop_end)

In a for loop, we jump to the increment step when encountering a continue statement.

def visit_forStmt(self, stmt, prog):
    loop_cond = prog.new_label("loop_cond")
    loop_body = prog.new_label("loop_body")
    loop_end = prog.new_label("loop_end")
    loop_incr = prog.new_label("loop_incr")  # Increment step for the loop

    # Push the exit and continue labels onto the stack
    self.exit_label_stack.append(loop_end)
    self.continue_label_stack.append(loop_incr)

    # Initialization
    stmt.init.accept(self, prog)

    # Jump to the condition check first
    prog.add_inst(AsmModule.Jump(loop_cond))

    # Loop body
    prog.add_label(loop_body)
    stmt.body.accept(self, prog)

    # Increment step
    prog.add_label(loop_incr)
    stmt.incr.accept(self, prog)

    # Condition check
    prog.add_label(loop_cond)
    cond_reg = stmt.cond.accept(self, prog)
    prog.add_inst(AsmModule.Bne(cond_reg, "x0", loop_body))

    # Pop the labels off the stack after the loop
    self.continue_label_stack.pop()
    self.exit_label_stack.pop()
    prog.add_label(loop_end)

Step 3: Implementing visit_continueStmt

The visit_continueStmt method will generate a jump to the current loop’s continue label,
which is the label at the top of the continue_label_stack.

def visit_continueStmt(self, stmt, prog):
    # The continue should jump to the current loop's continue label
    if not self.continue_label_stack:
        raise Exception("Continue statement not inside a loop")

    # Get the label for the current loop's continue point
    continue_label = self.continue_label_stack[-1]
    prog.add_inst(AsmModule.Jump(continue_label))

Summary:

●​ BreakStmt jumps to the exit_label of the nearest loop to exit the loop.
●​ ContinueStmt jumps to the continue_label of the nearest loop, skipping the rest of
the loop body and re-evaluating the loop condition or increment.


●​ We use two separate stacks, one for exit_labels (for break) and one for
continue_labels (for continue), to keep track of nested loops.

This setup will allow you to correctly handle both break and continue statements inside
nested loops in your code generation.
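As an aside, interpreters (as opposed to compilers) often model break and continue differently: as signal exceptions that unwind to the enclosing loop construct instead of jumps to labels. A minimal sketch of that alternative, with a made-up loop driver:

```python
# break/continue modeled as exceptions caught by the loop construct itself.
class BreakSignal(Exception):
    pass

class ContinueSignal(Exception):
    pass

def run_while(cond, body):
    """Interpret a while loop whose body may raise break/continue signals."""
    while cond():
        try:
            body()
        except ContinueSignal:
            continue  # skip the rest of this iteration
        except BreakSignal:
            break     # exit the loop entirely

out = []
state = {"i": 0}

def cond():
    return state["i"] < 10

def body():
    state["i"] += 1
    if state["i"] == 2:
        raise ContinueSignal()  # skip the append for i == 2
    if state["i"] == 4:
        raise BreakSignal()     # leave the loop at i == 4
    out.append(state["i"])

run_while(cond, body)
print(out)
```

Exceptions make the "nearest enclosing loop" rule automatic, since the innermost `try` catches the signal first; the compiled approach above achieves the same effect with the label stacks.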

How would you generate code for a switch statement?

To generate code for a switch statement in RISC-V, we first need to define the structure of the
switch statement in the AST. Below is the representation of the SwitchStmt AST node,
followed by the implementation of the visit_switchStmt method for code generation.

Representation of the SwitchStmt AST Node


class SwitchStmt:
    def __init__(self, switch_expr, cases, default_case=None):
        """
        Represents a C-style switch statement.

        :param switch_expr: The expression to evaluate for the switch.
        :param cases: A list of tuples (case_value, case_body), where:
            - case_value is the value for this case.
            - case_body is the block of code for this case.
        :param default_case: The block of code for the default case.
        """
        # Expression whose value determines the branch
        self.switch_expr = switch_expr
        # List of (case_value, case_body) pairs
        self.cases = cases
        # Block of code for the default case
        self.default_case = default_case

    def accept(self, visitor, prog):
        """
        Accept a visitor for code generation.
        """
        return visitor.visit_switchStmt(self, prog)

Implementation of visit_switchStmt
def visit_switchStmt(self, stmt, prog):
    """
    Generates code for a C-style switch statement with proper fall-through handling.
    """
    # Step 1: Evaluate the switch expression and store the result in a register
    switch_reg = stmt.switch_expr.accept(self, prog)

    # Step 2: Generate labels for each case and the default case
    case_labels = [prog.new_label(f"case_{i}") for i in range(len(stmt.cases))]
    default_label = prog.new_label("default") if stmt.default_case else None
    end_label = prog.new_label("end_switch")

    # Step 3: Store the target of a break unconditional jump, if any
    self.exit_label_stack.append(end_label)

    # Step 4: Generate comparison and branch instructions for each case
    for i, (case_value, _) in enumerate(stmt.cases):
        case_label = case_labels[i]

        # Allocate a temporary register and load the case value into it
        temp_reg = self.next_var_name()
        prog.add_inst(AsmModule.Addi(temp_reg, "x0", case_value))

        # Compare the switch expression with the case value and branch if equal
        prog.add_inst(AsmModule.Beq(switch_reg, temp_reg, case_label))

    # Step 5: Jump to the default case if no matches are found (if default exists)
    if default_label:
        prog.add_inst(AsmModule.Jump(default_label))
    else:
        prog.add_inst(AsmModule.Jump(end_label))

    # Step 6: Generate code for each case
    for i, (_, case_body) in enumerate(stmt.cases):
        prog.add_label(case_labels[i])  # Label for the case
        case_body.accept(self, prog)    # Generate code for the case body
        # Fall-through to the next case unless break is used

    # Step 7: Generate code for the default case (if it exists)
    if stmt.default_case:
        prog.add_label(default_label)
        stmt.default_case.accept(self, prog)

    # Step 8: Pop the exit label off the stack after the switch
    self.exit_label_stack.pop()

    # Step 9: End of the switch statement
    prog.add_label(end_label)

Explanation

1.​ Switch Expression Evaluation:


○​ The switch expression is evaluated, and its value is stored in a register
(switch_reg).
2.​ Case Labels:
○​ Each case is assigned a unique label for branching.
3.​ Branching Logic:
○​ For each case, the generated code compares the value of the switch_reg with
the case value. If they are equal, execution jumps to the corresponding case
label.


4.​ Default Case:


○​ If no cases match, the code jumps to the default case label (if it exists). If there's
no default case, it jumps to the end of the switch.
5. Case Body Execution:
○ The body of each case is executed. Since no jump to the end of the switch is emitted here, control falls through into the next case unless a break occurs, matching C semantics.
6.​ End Label:
○​ Marks the end of the switch statement, ensuring that execution resumes normally
after the switch.

Handling the break statement within a switch


We handle break statements within switches pretty much as we do in loops: by using the
exit_label_stack to manage the break target for the switch statement, we streamline
handling control flow, particularly in cases where break statements are used. Here's how it
works:

1.​ Consistency with Loops: Using the exit_label_stack ensures that break behaves
uniformly across all constructs (while, for, switch).
2.​ Fall-Through Logic: By omitting the unconditional jump to end_label in each case,
we naturally allow for fall-through behavior unless explicitly overridden by a break
statement. This aligns with C-style switch semantics.
3.​ Simplified Break Handling: Since the target of break is stored in
exit_label_stack, the logic for generating break instructions remains unchanged
across constructs. It simply jumps to the top of the stack.

Step-by-Step RISC-V Assembly Translation

1.​ Evaluate the switch expression:


○​ The value of x is already in register x5.
2.​ Branch to case labels:
○​ Compare x5 with the values for each case. Branch to the corresponding label if
the value matches.
3.​ Execute case body and jump to the end:
○​ Each case body will execute its respective instructions and then jump to the end
of the switch statement.
4.​ Default case handling:
○​ If no case matches, jump to the default case.
5.​ End of the switch statement:
○​ Resume execution after the switch.

The figure below illustrates how code is produced for a typical switch statement:


Explanation of the Assembly Code

1.​ Immediate Value Loading (addi):


○​ The addi instruction loads the immediate value for each case (1 and 2) into a
temporary register (t0, t1, etc).
2.​ Branch if Equal (beq):
○​ The beq instruction checks if x5 (the value of x) matches the immediate value in
t0. If true, it jumps to the corresponding case label.
3.​ Unconditional Jump (j):
○​ After executing the code for a specific case, an unconditional jump ensures that
execution skips the remaining cases and the default case.
4.​ Function Calls (call):
○​ The call instruction represents a function call to do_something,
do_something_else, or handle_default.
5.​ Default Case Handling:
○​ If no match is found, the program jumps directly to the default_case label.
6.​ End of the switch:
○​ The end_switch label marks the point where the program continues after the
switch statement.


What is the complexity of running this implementation of the switch statement?

The current implementation of the switch statement uses a sequence of comparisons (branches) to determine the matching case, resulting in O(n) complexity, where n is the number of cases.

Explanation of O(n) Complexity

1.​ Sequence of Branches: Each case requires a comparison between the switch
expression value and the case value.
2.​ Linear Scan: The program evaluates cases sequentially until a match is found or all
cases are exhausted. This linear scan means the time complexity grows with the number
of cases.

Trade-offs and Limitations

●​ Small Case Sets: This approach is simple and works efficiently when the number of
cases is small.
●​ Dense Case Values: If the case values are dense and numeric (e.g., 1, 2, 3, ...), more
optimized approaches like a jump table could be more efficient.
●​ Sparse Case Values: For non-contiguous or non-numeric cases, hash tables or decision
trees might be better alternatives.

Is it possible to implement faster switch statements?

If you want to improve performance, consider the following:

1. Jump Table

● Complexity: O(1) for dense, contiguous case values.
● How It Works: Precompute a table where each index corresponds to a case value. The switch expression's value is used as an index to jump directly to the correct case.
● Pros:
○ Fast lookup for dense, numeric cases.
● Cons:
○ Wastes memory if the case values are sparse.

Example Assembly:

slli t1, x5, 2      # Multiply switch value by 4 (word size)
la   t2, jump_table # Load address of the jump table
add  t2, t2, t1     # Compute the address of the table entry
lw   t3, 0(t2)      # Load jump target from the table
jr   t3             # Jump to the target
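In higher-level terms, the same idea can be sketched in Python: dense case values 0..n-1 become indexes into an array of handlers, giving O(1) dispatch. The handler names below are made up for illustration:

```python
# Sketch of a jump table: an array of handlers indexed by the switch value.
def make_jump_table(handlers, default):
    def dispatch(x):
        if 0 <= x < len(handlers):  # bounds check, as a compiler would emit
            return handlers[x]()    # O(1): direct indexing, no comparisons
        return default()            # out-of-range values hit the default
    return dispatch

dispatch = make_jump_table(
    [lambda: "case_0", lambda: "case_1", lambda: "case_2"],
    lambda: "default",
)
print(dispatch(1), dispatch(7))
```

A compiler would emit the same structure in assembly: a bounds check, an indexed load from a table of code addresses, and an indirect jump.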



2. Binary Search

● Complexity: O(log n) for sorted case values.
● How It Works: Use a binary search over the case values to find the matching case.
● Pros:
○ Works well for sparse case values.
● Cons:
○ More complex to implement in assembly.

Example:

case_values = [1, 3, 7, 10]
targets = [case_1, case_3, case_7, case_10]
# Perform binary search on `case_values`
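The sketch above can be made runnable with Python's `bisect` module; the target labels are hypothetical:

```python
from bisect import bisect_left

case_values = [1, 3, 7, 10]                          # sorted case values
targets = ["case_1", "case_3", "case_7", "case_10"]  # hypothetical labels

def dispatch(x, default="default"):
    i = bisect_left(case_values, x)  # O(log n) search for x
    if i < len(case_values) and case_values[i] == x:
        return targets[i]            # exact match: take that case
    return default                   # no match: fall back to default

print(dispatch(7), dispatch(8))
```

A compiler lowering this would emit a small decision tree of comparisons and branches rather than a library call, but the shape of the search is the same.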

3. Hash Table

● Complexity: O(1) average, O(n) worst-case.
● How It Works: Use a hash function to map case values to jump labels.
● Pros:
○ Efficient for sparse, non-contiguous case values.
● Cons:
○ Overhead of maintaining a hash table.

Example:

case_map = {1: case_1, 3: case_3, 7: case_7, 10: case_10}
target = case_map[switch_value]

So, a "good" switch statement involves only numeric cases, right? Which early languages introduced this kind of statement?
The earliest programming languages to feature switch statements with numeric cases were
designed with optimization opportunities in mind, particularly for numeric comparisons. These
"good switches" are well-suited for performance improvements, such as jump tables, because
they map numeric case values directly to executable code locations.

Key Early Languages with switch

ALGOL W (1966)

One of the first languages to include a case statement for numeric values.

case x of
1: statement1;
2: statement2;


default: statement3;
end case;

Optimization: Numeric cases allow the use of jump tables, and its influence shaped later
languages.

FORTRAN (via Computed GOTO, mid-1950s)

While not a switch statement per se, FORTRAN's Computed GOTO provided a similar
concept. Depending on the value of I, the program jumps to label 10, 20, or 30:

GOTO (10, 20, 30), I

Optimization: This inspired efficient jump table generation.

PL/I (1964)

Introduced a formal SELECT statement for branching:

SELECT (X);
WHEN (1) CALL FOO;
WHEN (2) CALL BAR;
OTHERWISE CALL DEFAULT;
END;

Optimization: Its numeric nature made jump table use possible.

C (1972)

One of the most famous and enduring implementations of switch, explicitly designed for
numeric values. Example:

switch (x) {
case 1:
foo();
break;
case 2:
bar();
break;
default:
baz();
}

Optimization: The simplicity of C’s switch syntax and numeric cases made it ideal for jump
tables.


Pascal (1970)

Introduced case as part of its structured programming design. Example:

case x of
1: writeln('Case 1');
2: writeln('Case 2');
else writeln('Default');
end;

Optimization: Pascal's strict typing made numeric optimization straightforward.

Why Numeric Cases Are Ideal for Optimization

1.​ Predictable Execution:


○​ Numeric cases (e.g., integers) are easy to map directly to memory offsets.
2.​ Jump Table Feasibility:
○​ Dense numeric ranges enable direct indexing into a jump table, achieving
O(1) time complexity.
3.​ Compiler Simplicity:
○​ Numeric cases simplify compiler implementation for branching logic compared to
string or complex case types.

Early Limitations and Developments

●​ Sparse Numeric Cases: Languages like C allowed sparse case values (e.g., 1, 100,
1000), requiring additional handling (e.g., binary search or chains of comparisons).
●​ Non-Numeric Cases: Later languages like Ada and Java introduced switch
statements supporting non-numeric cases (strings, enums), which complicate
optimization.

Conclusion

The "good switches" — numeric case-based switches in early languages like ALGOL W,
FORTRAN, and C — laid the foundation for efficient jump table optimizations. These early
implementations remain highly relevant, especially in languages like C and its descendants.

It seems that C was very important to the advancement of compiler design, right? It had switches, while, do-while, for, and so on.
If I had to name just one programming language that was fundamental historically to the
advancement of code generation techniques, it would be C, indeed.


Why C?

1.​ Simplicity and Pragmatism:​


C was designed (in the early 1970s) to be close to the hardware while remaining
high-level enough to be portable across machines. This required compilers to generate
efficient machine code for constructs like while, do-while, if, and switch, which
were central to its syntax.
2.​ Influence on Compiler Design:​
The design of the C compiler (particularly cc and later gcc) pushed forward
techniques for:
○​ Loop optimization (e.g., reducing overhead in while and do-while loops).
○​ Efficient branch handling in conditional statements.
○​ Advanced code generation for switch statements, including jump table
optimization.
3.​ Ubiquity of C Syntax:​
The constructs of C (while, do-while, for, if-else, switch) became a model for
countless subsequent languages like C++, Java, C#, and JavaScript. This spread made
compiler optimizations for these constructs universally important.
4.​ Efficient Use of Hardware:​
C's low-level features, like pointers and manual memory management, demanded that
compilers generate code with tight control over registers, stack memory, and branching
logic.
5.​ Legacy in Systems Programming:
○​ Operating Systems: C was used to implement UNIX, which necessitated
efficient code generation for constructs common in system-level programming.
○​ Embedded Systems: C’s efficiency and direct hardware access meant that its
compilers set the bar for performance in resource-constrained environments.

Historical Impact

C's impact is unparalleled because it created a feedback loop:

●​ The language's popularity spurred advances in compiler technology.


●​ Those compiler improvements reinforced the dominance of the C programming model.
●​ Modern optimizations like loop unrolling, branch prediction, and jump tables owe
much to the need to optimize C constructs.

Honorable Mention: FORTRAN

FORTRAN predates C and was groundbreaking in the development of compilers, particularly in


loop optimizations and mathematical code generation. However, C had a broader impact on
general-purpose and systems programming, making it the more foundational choice overall.


C was invented by Dennis Ritchie. He was very important to computer


science, right?
Yes, Dennis Ritchie invented C, and he was indeed a towering figure in computer science. His
contributions go beyond just the C programming language, as they helped shape the very
foundation of modern computing.

Key Contributions of Dennis Ritchie

1. The C Programming Language

●​ Invention of C:​
In the early 1970s at Bell Labs, Ritchie developed C as a systems programming
language to write operating systems, most notably UNIX. C struck a balance between
low-level control (close to assembly) and high-level abstraction, making it both powerful
and portable.
○​ Impact: C became one of the most widely used programming languages in
history, influencing languages like C++, Java, Python, C#, and many more.

2. Development of the UNIX Operating System

●​ Collaboration with Ken Thompson:​


Ritchie co-created UNIX alongside Ken Thompson. While Thompson initially


developed the first version of UNIX in assembly for the PDP-7, Ritchie played a crucial
role in rewriting UNIX in C. This made UNIX portable across different machines, a
revolutionary idea at the time.
●​ Impact:
○​ UNIX became the foundation for modern operating systems, including Linux,
macOS, and indirectly Windows.
○​ Its design philosophy (modularity, simplicity, and the "everything is a file"
paradigm) continues to influence systems programming and software
engineering.

3. Compiler and Systems Design

●​ Ritchie helped pioneer compiler design for high-level languages, ensuring they could
generate efficient machine code. His work on C and UNIX required optimizing constructs
like loops, conditionals, and pointers.
●​ His insights laid the groundwork for many techniques used in systems-level
programming.

4. "The C Programming Language" (K&R)

●​ Co-authored with Brian Kernighan, The C Programming Language (commonly called


"K&R") became the definitive guide to C programming.
●​ Impact:
○​ It popularized C and introduced countless programmers to structured
programming.
○​ The book is still widely regarded as a masterpiece in technical writing.

5. Theoretical and Practical Advances

●​ Ritchie’s work bridged the gap between theory and practice. By creating tools like C
and UNIX, he demonstrated how abstract computer science concepts (e.g., structured
programming) could lead to practical systems of immense value.

Legacy

●​ C’s Influence: Most modern languages owe their syntax and structure to C. Even
domains like embedded systems, graphics, and AI still rely on C for performance-critical
code.
●​ UNIX’s Progeny: UNIX and its descendants (e.g., Linux) run the majority of servers and
power the internet today. Mobile operating systems like Android and iOS are also
UNIX-based.
●​ Awards and Recognition:
○​ Turing Award (1983): With Ken Thompson, for their work on UNIX.
○​ National Medal of Technology (1999): With Thompson, for co-creating UNIX
and C.


Ritchie's Philosophy

Dennis Ritchie valued simplicity, portability, and efficiency in design. He avoided unnecessary
complexity, which allowed his work to endure for decades. His tools and ideas remain the
backbone of modern computing.


Code Generation for Functions


Code: https://github.com/pronesto/DCC053_Public/tree/main/16_FunctionCodeGen

Consider the program below. Where do you think each variable will be
allocated?
This program shows local and global variables defined in C:

#include <stdio.h>

int global_var = 17;

int fun(int parameter) {
    printf("%p, %lu, parameter\n", &parameter, (size_t)&parameter % 100);
    int local_fun = parameter + 1;
    printf("%p, %lu, local_fun\n", &local_fun, (size_t)&local_fun % 100);
    return 7 * local_fun;
}

int main() {
    printf("%p, %lu, global_var\n", &global_var, (size_t)&global_var % 100);
    int local_main = global_var + 13;
    printf("%p, %lu, local_main\n", &local_main, (size_t)&local_main % 100);
    global_var = fun(5 * local_main);
    return 3 * global_var;
}

In this C program, the variables will be allocated in different sections of memory, depending on
whether they are global or local, and whether they belong to a function. Here's where each
variable will be allocated:

1. Global Variables:

●​ global_var:
○​ Location: The global variable global_var is allocated in the data segment
(specifically in the initialized data segment). This is because it is defined outside
of any function and initialized with a value (17).
○​ Lifespan: global_var persists for the lifetime of the program (from startup to
program termination).

2. Local Variables (Function Parameters and Function Local Variables):

●​ parameter (in fun function):

○​ Location: This is a parameter of the fun function, which is treated like a local
variable. It will be allocated in the stack frame of fun. Each time fun is called, a
new stack frame is created, and parameter is stored in that frame.
○​ Lifespan: It only exists while fun is executing, and its memory is freed once the
function returns.
●​ local_fun (in fun function):
○​ Location: This is a local variable in the fun function, so it will also be allocated
in the stack frame of fun. Like parameter, it will exist only during the execution
of fun and will be destroyed afterward.
○​ Lifespan: It exists while fun is executing and is destroyed when the function
returns.
●​ local_main (in main function):
○​ Location: This is a local variable in the main function, so it will be allocated in
the stack frame of main. Since main is called only once and lasts for the
program's duration, local_main will stay in memory for as long as main is
running.
○​ Lifespan: It exists for the duration of the main function, which typically means
until the program terminates.

Summary of Memory Allocation:

●​ Data Segment (Initialized):
○​ global_var
●​ Stack (Local Variables and Parameters):
○​ parameter (in fun)
○​ local_fun (in fun)
○​ local_main (in main)

Execution Flow:

●​ global_var is initialized in the data segment.


●​ local_main is allocated on the stack when main starts, and its value depends on
global_var.
●​ When fun is called from main, parameter and local_fun are allocated in the stack
frame of fun.
●​ After fun returns, the stack frame of fun is destroyed, and the result is stored back into
global_var.

Each time you print the address and remainder, you are observing the addresses where these
variables are stored in memory, and the modulo operation (% 100) provides a convenient way
to distinguish memory locations by checking the last two digits of the address.


What is a stack frame?


The stack frame (also known as activation record) is a section of memory that holds local
variables, function parameters, return addresses, and other data that a function uses during its
execution. When a function is called, a new stack frame is created, and when the function
finishes, that stack frame is destroyed. The stack is called a "stack" because it's like a stack of
plates: each function's data gets "pushed" onto the stack when the function is called, and
"popped" off the stack when the function returns.

Key Characteristics:

●​ Automatic Memory Management: The stack grows and shrinks automatically as
functions are called and return. When a function is called, its frame is pushed onto the
stack. When the function returns, the frame is popped off the stack.
●​ Fast Access: Accessing data on the stack is very fast because the stack pointer (sp)
keeps track of where the top of the stack is.
●​ Limited Size: The stack is limited in size (typically a few megabytes), and deep
recursion or allocating large amounts of data on the stack can lead to a stack overflow.
●​ Lifetime: Variables in the stack frame exist only as long as the function is running. Once
the function returns, these variables are no longer accessible, and the memory is
reclaimed.

Example:

In the fun function of your C program, both parameter and local_fun are allocated in the
stack frame for fun. When fun is called, space is allocated on the stack for these variables,
and when fun returns, this space is reclaimed.

And what is a data segment, where the global variable is located?


The data segment is a section of memory used to store global and static variables. These
variables persist for the entire lifetime of the program.

The data segment is divided into two parts:

●​ Initialized Data Segment: This stores global and static variables that are initialized with
a value at the start of the program (e.g., int global_var = 17;).
●​ Uninitialized Data Segment (BSS): This stores global and static variables that are not
explicitly initialized (e.g., int uninitialized_global_var;). They are initialized to
zero by default.

Key Characteristics:


●​ Global and Static Variables: Global variables (like global_var in your program) and
static variables (variables declared with the static keyword inside a function or
globally) reside in this segment.
●​ Lifetime: Variables in the data segment exist for the entire duration of the program. They
are initialized when the program starts and persist until it ends.
●​ Not Automatically Managed: Unlike stack memory, which is automatically managed,
data in the data segment remains allocated for the entire run of the program, whether or
not it’s actively being used.

Example:

In your C program, global_var is allocated in the initialized data segment because it’s a
global variable with an initial value (17). It exists throughout the program's execution and its
memory is not released until the program ends.

It seems that the program's memory is divided into multiple segments then?


That's correct. A typical program's memory is divided into the following segments:

1.​ Text Segment: Contains the actual compiled code (machine instructions).
2.​ Data Segment:
○​ Initialized Data: Stores initialized global and static variables.
○​ Uninitialized Data (BSS): Stores uninitialized global and static variables (default
initialized to zero).
3.​ Heap: Dynamically allocated memory (using malloc, new, etc.). The heap grows
upwards as memory is dynamically allocated.
4.​ Stack: Stores function call information, local variables, and function parameters. It grows
downwards as new functions are called and stack frames are created.

The figure below shows how space can be allocated in a typical ARM program.


Why do we need to store local variables on a stack? Can't we allocate them
statically, like we do with global variables?

In typical C programs, we cannot always allocate a local variable at the same address, as if it
were statically allocated. Local variables need to have unique addresses on each function call,
especially in recursive or reentrant functions. If local variables were statically allocated (always
at the same address), recursive calls or multiple invocations of the same function would
overwrite each other’s data. Here’s why:

●​ Recursion: If a function is called recursively, each instance of the function needs its own
set of local variables. If local variables were always allocated at the same address,
recursive calls would share the same memory for these variables, leading to incorrect
behavior because values would be overwritten.


●​ Reentrancy: Functions in C can be called concurrently (e.g., via threads). If local
variables were statically allocated at the same address, multiple concurrent calls would
overwrite each other’s data, leading to race conditions.

Could you give an example of a function that would fail if local variables
were statically allocated?
Yes. Consider, for instance, the program below:

#include <stdio.h>

int st_fact(int n) {
    static int result = 1;
    result = n;
    result *= (result <= 1) ? 1 : st_fact(result - 1);
    return result;
}

int dy_fact(int n) {
    int result = 1;
    result = n;
    result *= (result <= 1) ? 1 : dy_fact(result - 1);
    return result;
}

int main(int argc, char** argv) {
    for (int i = 2; i < 2 * argc; ++i) {
        printf("%d: st_fact = %d, dy_fact = %d\n", i, st_fact(i), dy_fact(i));
    }
    return 0;
}

The reason your program produces different results for the static (st_fact) and dynamic
(dy_fact) versions of the factorial function lies in how static variables behave compared to
dynamically allocated local variables.

Key Differences in Behavior:

1.​ Static Allocation (st_fact):
○​ The static int result = 1; inside st_fact ensures that the variable
result retains its value between function calls. Once initialized, the value of
result persists across multiple invocations of st_fact, and it is shared across
all calls to this function.
○​ Why this affects the result: When st_fact is called for n = 2, it sets result
= 2. The next time st_fact is called (for n = 3), result still holds the value


from the previous call (result = 2), and this value is used as the base for the
calculation instead of resetting to 1. As a result, once st_fact sets result to a
value less than or equal to 1, the recursive calls always short-circuit (returning 1),
which is why you see 1 as the output for st_fact after the first call.
2.​ Dynamic Allocation (dy_fact):
○​ The int result = 1; in dy_fact creates a new variable result on the
stack for each call to dy_fact. This ensures that the variable result starts
fresh (with result = 1) for every function invocation.
○​ Why this gives correct results: Since each call to dy_fact initializes result
independently, the recursion proceeds as expected. Each recursive call
computes result based on the correct value of n passed into the function.
Therefore, the function behaves correctly and returns the correct factorial values.

Example Walkthrough:

Let’s go through a specific call to see the difference more clearly:

First call: st_fact(2) and dy_fact(2)

●​ st_fact(2):
1.​ static int result = 1; → Initializes result (but only the first time the
function is called).
2.​ result = 2; → result is now 2.
3.​ Recursively calls st_fact(1) → Since n == 1, assigns 1 to result.
4.​ Returns 1.
●​ dy_fact(2):
1.​ int result = 1; → A new result variable is created with an initial value of
1.
2.​ result = 2; → Updates result to 2.
3.​ Recursively calls dy_fact(1) → result = 1; (from initialization in
dy_fact(1)).
4.​ Returns 2 * 1 = 2.

Subsequent call: st_fact(3) and dy_fact(3)

●​ st_fact(3):
1.​ result still holds the value 1 from the previous call (st_fact(2)).
2.​ result = 3; → Updates result to 3.
3.​ Recursively calls st_fact(2) → st_fact(2) returns 1 because result is
still 1.
4.​ Returns 1.


●​ dy_fact(3):
1.​ A new result variable is created with an initial value of 1.
2.​ result = 3; → Updates result to 3.
3.​ Recursively calls dy_fact(2) → dy_fact(2) returns 2.
4.​ Returns 3 * 2 = 6.

Conclusion:

●​ Static variables are shared across function invocations and retain their values between
calls, which can lead to unexpected behavior in recursive functions like st_fact. The
static variable doesn’t reset, leading to incorrect results in this case.
●​ Dynamic variables are reallocated on the stack for each function call, ensuring that
each call has its own fresh storage for local variables. This allows recursive functions to
work correctly, as seen in dy_fact.

The static version leads to incorrect results because the static variable persists across calls
and causes the recursive function to behave improperly after the first call.

So, if I am compiling a language that does not support recursion, then I
could allocate all the local variables statically?
Well, more or less. In addition to recursion, indirect function calls (such as those made
through function pointers) present a challenge for static allocation of local variables. We will talk
about that later. But, in the absence of recursion and indirect calls, you can happily allocate all
the variables statically. This is because, without recursion, there is no risk of multiple active calls
to the same function, meaning each function only ever needs one instance of its local variables.

Why Static Allocation Works Without Recursion:

In non-recursive programs, each function is called, executed, and returns before any other calls
to the same function can happen. Since there is no chance of re-entering a function while it is
still active (as recursion would allow), the program only ever needs one copy of each function's
local variables.

This means that:

●​ No stack is required for local variable storage.


●​ Each function's variables can be placed in fixed memory locations known at compile
time, typically in a global or static memory area.

Fortran and COBOL in Their Early Days:

●​ Fortran: Early versions of Fortran (before Fortran 90) did not support recursion.
Therefore, all local variables could be statically allocated in memory, just as you


described. This design allowed for simpler and more efficient memory management, as
there was no need to maintain a call stack for local variables.
●​ COBOL: Early COBOL programs also lacked recursion support. The language was
primarily used for business applications, and recursion was not a common pattern in
such applications at the time. This allowed COBOL compilers to allocate all variables
statically, as the memory footprint of the program could be determined at compile time.

Key Benefits of Static Allocation:

●​ Simplicity: Without recursion, static allocation simplifies both the compiler's task and the
runtime's memory management. There is no need to allocate and deallocate memory for
variables dynamically, and there is no stack frame to manage.
●​ Efficiency: Static allocation can lead to faster execution, as variables are accessed
directly via known memory addresses, avoiding the overhead of stack manipulation.

Limitations:

●​ Lack of recursion: Since each function’s local variables are statically allocated,
recursion is impossible without risk of overwriting previous values.
●​ Limited parallelism: Static allocation also restricts concurrent execution of the same
function (e.g., in multithreading), as all threads would share the same variable memory,
leading to race conditions unless carefully managed.

Modern Static Allocation and Recursion:

While modern languages support recursion and dynamically allocate local variables on the
stack, some static languages (e.g., embedded systems or low-level programming in C) still use
global or static variables when recursion is not involved. In such cases, you can sometimes
manually control static allocation for performance reasons, but this generally requires careful
programming.

What's an indirect call? Why does it complicate static allocation?


An indirect function call is a call through a pointer. For instance, consider the program below:

#include <stdio.h>

int foo(int x) {
    return x + 1;
}

int bar(int x) {
    return x * 2;
}

int call_fun(int (*fptr)(int), int val) {
    return fptr(val);
}

int main() {
    printf("%d\n", call_fun(foo, 3)); // calls foo: should return 4
    printf("%d\n", call_fun(bar, 3)); // calls bar: should return 6
}

When calling call_fun(foo, 3) and call_fun(bar, 3), the exact function (foo or bar)
being called is determined at runtime via the function pointer fptr. Therefore, the local
variables and return addresses for foo and bar can't be statically allocated at compile time, as
the target function is only known during execution.

Indirect Function Calls and Static Allocation:

1.​ Unpredictable Call Targets:
○​ In the case of indirect function calls (where the function to be called is determined
at runtime, such as through function pointers), the exact function being called
might not be known at compile time.
○​ If you don't know the specific function in advance, you can't pre-allocate memory
for its local variables because multiple different functions could be called at the
same point in the code, each requiring different local variable storage.
2.​ Dynamic Call Context:
○​ Indirect function calls often require managing dynamic call contexts, which
include the return address, the function's parameters, and its local variables.
Since the target function is not known until runtime, you can't assign a fixed
memory location to store its context (parameters and return address).
○​ For example, the return address must be stored dynamically based on the
specific function being called. In static allocation, there's no way to guarantee a
unique return address location for each function when the call is indirect.
3.​ Function Pointers and Recursion:
○​ If a function is called through a pointer, or if recursion is involved (where a
function calls itself), you can't simply reuse the same memory locations for the
local variables and return addresses of each function call.
○​ Example: Consider a function pointer calling fun1() in one case and fun2() in
another case. Each function has different local variables, and if these are
statically allocated, calling fun1() would overwrite the memory used by
fun2(), leading to incorrect behavior.

Why Static Allocation Fails with Indirect Calls:

With static allocation, the memory layout for each function is determined at compile time. If
function pointers or indirect calls are involved, the compiler can't determine at compile time:


●​ Which function will be called.
●​ Where to store the return address.
●​ How to pass parameters (since parameter types and locations might differ across
functions).
●​ How to handle local variables (which could be of different types and sizes for different
functions).

Because of these unknowns, dynamic allocation using a stack (or heap, in some cases) is
necessary. This allows:

●​ Dynamic storage of return addresses: Each function call, whether direct or indirect,
needs its own space to store the return address.
●​ Dynamic allocation of parameters and local variables: Each function call must have
its own memory allocated for its parameters and local variables, especially when
recursion or function pointers are involved.

Static Allocation Works Only Without Indirect Calls:

In languages that don't allow function pointers or indirect calls, the absence of dynamic control
over which function gets called makes it easier to allocate local variables statically. But as soon
as indirect calls are introduced, dynamic memory management becomes necessary to handle
the flexibility that such calls provide.

Fortran and COBOL:

●​ Early versions of Fortran and COBOL also avoided indirect calls through function
pointers, which further enabled static allocation.
●​ Without function pointers or recursion, the call graph of the program is fully known at
compile time, making static allocation straightforward.

Conclusion:

●​ In languages without recursion or indirect function calls, static allocation is feasible
because the memory layout of function calls is fully known at compile time.
●​ However, if a language supports indirect function calls (e.g., through function
pointers), recursion, or closures, dynamic memory management (typically using a
stack) is necessary to properly handle the flexible call structure and variable lifetimes.

You mentioned a "call graph". What's that?


A call graph is a directed graph that represents the calling relationships between functions in a
program. Each node in the graph represents a function, and each edge (directed arrow) from
one node to another indicates that the first function calls the second function during execution.

Key Concepts of a Call Graph:


●​ Nodes: Represent functions in a program.
●​ Edges: Directed edges indicate a call from one function to another. If function A calls
function B, there will be an edge from the node representing A to the one representing B.

Types of Call Graphs:

1.​ Static Call Graph:
○​ Constructed by analyzing the source code.
○​ Represents all possible calls between functions based on the code structure.
○​ Typically does not account for dynamic or indirect calls, as these cannot be
determined by just analyzing the code without running the program.
2.​ Dynamic Call Graph:
○​ Constructed by tracking the actual calls made during the execution of the
program.
○​ Can capture indirect calls (e.g., through function pointers or virtual methods),
recursion, and conditional calls.
○​ Depends on the specific inputs and paths taken during execution.

Why Call Graphs Are Important:

1.​ Program Analysis:
○​ Call graphs are a fundamental tool in compiler design, program optimization,
and static analysis.
○​ They help in determining the dependencies between functions and understanding
how the execution flows from one function to another.
2.​ Memory Management:
○​ As we discussed earlier, static allocation of variables is possible if the call graph
is acyclic (i.e., there are no recursive calls) and all function calls are known at
compile time (no indirect calls). If the call graph contains cycles (indicating
recursion), the program must use dynamic allocation, typically via the stack.
3.​ Performance Optimization:
○​ Call graphs help in inlining functions (replacing a function call with the
function’s body), identifying hot spots (frequently called functions), and dead
code elimination (removing unused functions).
4.​ Recursion Detection:
○​ By analyzing the call graph, compilers or developers can detect recursive calls if
a node has a path back to itself, indicating that the function can call itself directly
or indirectly.
5.​ Code Coverage:
○​ Call graphs are useful in testing to check how much of the code is exercised by a
particular set of inputs or during specific tests. Functions that are never called
can be identified for further testing or removal.


The figure below shows an example of a call graph for our earlier example involving the two
different implementations of the factorial function:

Call Graph Cycles and Recursion:

A cycle in a call graph means that a function can eventually call itself, either directly or indirectly,
leading to recursion. When recursion is detected, the compiler knows that dynamic memory
management (via stack frames) is required for local variables and return addresses.

How Call Graphs Relate to Optimization:

●​ Tail Call Optimization: When compilers recognize tail recursion (a function calling itself
as its last operation), they can optimize the recursion by reusing the same stack frame,
avoiding the overhead of a new call. Analyzing the call graph can help identify such
cases.
●​ Inlining: Inlining decisions—replacing function calls with the body of the function—can
be guided by call graph analysis to improve performance for small or frequently called
functions.

Is it possible to use clang and LLVM's opt to see a call graph?


Yes, you can use Clang and LLVM's opt tool to generate and visualize a call graph of a
program. Here's how you can do it:

Steps to Generate and View a Call Graph:

1. Compile the C Program to LLVM Intermediate Representation (IR)


First, use Clang to compile your C program into LLVM IR, which is necessary to analyze the
program using LLVM's optimization tools (opt).

clang -S -emit-llvm program.c -o program.ll

This will generate the LLVM IR for the C program in a file called program.ll.

2. Use opt to Generate the Call Graph

LLVM provides the -dot-callgraph option in the opt tool, which outputs the call graph in
DOT format. Here's how to run it:

opt -dot-callgraph program.ll

This command will generate a file named callgraph.dot (or several files if there are multiple
functions).

3. Visualize the Call Graph

To visualize the call graph, you can use a graph visualization tool like Graphviz. Graphviz's dot
command can convert the .dot file into an image.

dot -Tpng callgraph.dot -o callgraph.png

This will create an image file (callgraph.png) containing the call graph that you can view.

4. Examine the Output

Once you have generated the image, you can open it to explore the function calls in your
program. The nodes represent the functions, and the directed edges show which functions call
others.

Let's produce code for static functions. What are the assumptions that we
can make?
There are a number of assumptions that will simplify our job:
1.​ All the functions have different names. This assumption is valid, since our programming
language is totally static.
2.​ We have an infinite surplus of virtual registers. That means that we don't need to worry
about saving the state of registers.
3.​ Every function has only one formal parameter. And all the parameters of functions have
different names.
Under these assumptions, code generation becomes relatively easy. See below the code
generator for function definitions:


def visit_named_fn(self, exp, prog):
    # The address of the function is known and static.
    func_label = self._convert_name_to_addr(exp.name)
    prog.add_label(func_label)

    # Let's assume that all the functions have different names.
    # That's ok, otherwise the program would not compile due to
    # a type resolution conflict. Let's create a virtual register,
    # named after the function, to hold the return address:
    return_addr = f"{exp.name}_ret_addr"

    # Save the return address. Assume it's passed through "ra".
    # Thus: return_addr = value(ra)
    prog.add_inst(AsmModule.Add(return_addr, "ra", "x0"))

    # Remember: every function has only one parameter.
    # Get the parameter, which we assume to be in a0:
    prog.add_inst(AsmModule.Add(exp.formal, "a0", "x0"))

    # Generate the function body:
    return_var = exp.body.accept(self, prog)

    # Place the return value in a0:
    prog.add_inst(AsmModule.Add("a0", return_var, "x0"))

    # Jump back to the caller:
    prog.add_inst(AsmModule.Jalr("x0", return_addr))

●​ Function Labeling: Functions are statically labeled using
prog.add_label(func_label). To get the address of this label, we assume that we
have a function that translates names into addresses, e.g.,
self._convert_name_to_addr(exp.name).
●​ Return Address in Virtual Register: The return address is saved in a virtual register
(return_addr), which is used later for returning to the caller.
●​ Parameter Handling: Since there is a single parameter, it is passed in the a0 register
and copied into a virtual register representing the parameter name.
●​ Return Jump: The Jalr instruction jumps back to the caller via the return
address saved in the virtual register.

And below we have the code generator for function applications:

def visit_named_app(self, exp, prog):
    # Generate code for the actual parameter of the call:
    param_value = exp.actual.accept(self, prog)

    # Pass the parameter in a0:
    prog.add_inst(AsmModule.Add("a0", param_value, "x0"))

    # Static jump to the function. Notice that we need the name of the
    # function to be available at this point. This code generation approach
    # would not work with indirect calls:
    func_label = self._convert_name_to_addr(exp.function.name)
    prog.add_inst(AsmModule.Jal("ra", func_label))

    # Get the return value (which will be in a0):
    ret_var = self.next_var_name()
    prog.add_inst(AsmModule.Add(ret_var, "a0", "x0"))
    return ret_var

●​ Parameter Passing: The parameter is computed and placed in the a0 register using the
virtual register param_value.
●​ Static Function Call: The function is called using a static jump (Jal) to its label, with
the return address saved in ra.
●​ Return Value Handling: The return value is copied from a0 into a virtual register
(ret_var) for use after the function call.

Can you show an example of how that works?


Let's consider the functions inc and sqr in the figure below. They will be treated as named
functions with statically known labels, and the function application inc (sqr 4) will be treated
as two nested function calls: first calling sqr(4), then passing the result to inc.


Explanation of Generated Code:

1.​ Function sqr:
○​ The parameter y is passed in the register a0. We copy it to the virtual register y.
○​ We compute y * y using the mul instruction and store the result in result.
○​ The result is moved back to a0 as the return value, and the program jumps back
to the caller using the stored return address.
2.​ Function inc:
○​ The parameter x is passed in a0. We copy it to the virtual register x.
○​ We compute x + 1 using the addi instruction and store the result in result.
○​ The result is moved to a0, and the program jumps back to the caller using the
return address.
3.​ Main Application:
○​ First, sqr(4) is called by placing 4 in a0 and jumping to sqr.
○​ After sqr(4) returns, the result is in a0. We immediately call inc by jumping to
its label with the result of sqr(4) in a0.
○​ The final result of inc(sqr(4)) is in a0, and at this point, the program ends (or
could perform further operations).

Summary:

This generated code respects the assumptions we made:

●​ No recursion or indirect function calls.


●​ Static function names and parameter names.
●​ No stack manipulation; return addresses are saved in virtual registers.

The code for both sqr and inc follows the same pattern: saving the return address, handling
the parameter, computing the result, and returning the value via a0. The main application just
chains the function calls, passing the result of sqr(4) into inc.

Is this convention of using ra to save return addresses and a0 to pass
parameters what we call the ABI?
Yes, the behavior you describe—how the return address is saved and how parameters are
passed—relates to the Application Binary Interface (ABI).

What is an ABI?

An ABI is a set of rules and conventions that define how binary programs interact at runtime. It
ensures compatibility between compiled code, libraries, and the operating system. Specifically, it
governs:


1.​ Function Calling Conventions:


○​ How arguments are passed to functions (e.g., in registers like a0 or on the
stack).
○​ How the return value is delivered (e.g., in a register like a0).
○​ Where the return address is stored (e.g., in ra).
2.​ Register Usage:
○​ Which registers are caller-saved (the caller must save them if it needs their
values across a call, since the callee may overwrite them) and which are
callee-saved (the callee must preserve and restore them).
3.​ Memory Layout:
○​ Stack growth direction and alignment.
○​ Location of global and static variables.
4.​ Binary Format:
○​ Details like how shared libraries are loaded and symbols are resolved.

Your Code and the ABI

If your code saves the return address in the ra register and passes parameters in a0, it is
following a specific calling convention—a subset of the ABI. In the context of RISC-V, for
example:

●​ The return address is typically stored in ra (x1).


●​ The first argument is passed in a0 (x10).
●​ If a function needs to return a value, it's usually placed in a0 as well.

This means your code adheres to a calling convention compatible with the RISC-V ABI, which
ensures that your functions interact correctly with other code that follows the same ABI.

Why is the ABI Important?

When all components of a program (libraries, OS, etc.) follow the same ABI:

●​ Compiled code from different sources can interoperate seamlessly.


●​ Programs can call system functions and libraries without needing recompilation.

Oh, is that why we can call a C function from a Lua program and
vice-versa?
Exactly! The ability of a Lua program to call C functions (and vice versa) is a perfect example of
the importance of an ABI. Here’s how it works in this context:

Lua and C Interoperability


Lua uses the C API to enable interaction between Lua scripts and C functions. This API relies
on a shared ABI, ensuring the Lua interpreter and compiled C libraries can communicate
seamlessly.

1.​ Lua Calling C:​

○​ When Lua calls a C function, it passes arguments through a predefined
mechanism. For example, Lua values are pushed onto a stack (managed by the
Lua interpreter), and the C function accesses these arguments through the stack.
○​ The C function then performs its operations, places the return values back on the
stack, and Lua retrieves them.
2.​ C Calling Lua:​

○​ C can also call Lua functions using the same stack mechanism. The C code
pushes arguments onto the Lua stack, invokes a Lua function using the
interpreter, and retrieves the results from the stack.

ABI’s Role in this Interoperability

The shared ABI ensures:

●​ Function Calls: The Lua interpreter, written in C, follows the same calling convention as
the Lua C API. For instance, parameters might be passed in specific registers or on the
stack, depending on the platform’s ABI.
●​ Data Representation: Lua's stack and data structures (like Lua values) are consistent in
memory, enabling C functions to manipulate them directly.
●​ Binary Compatibility: Precompiled Lua interpreters and C libraries can interoperate as
long as they adhere to the same ABI (e.g., for the x86-64 architecture or RISC-V).

Without an agreed-upon ABI, Lua scripts wouldn't be able to call C functions reliably, and
embedding Lua into a C program would become a nightmare.

Why This Matters

The Lua-C interaction showcases how ABIs abstract away low-level details, allowing different
programming languages and systems to interact efficiently. It's not just Lua—any "foreign
function interface" (FFI), such as Python's ctypes or Java's JNI, heavily depends on the
underlying ABI. Thus, if you're working on systems or compilers, understanding the ABI helps
ensure interoperability and efficient execution of multi-language programs.
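Since the notes mention Python's ctypes, here is a minimal sketch of an FFI call that rides on the platform ABI. It assumes a Unix-like system where ctypes.util.find_library can locate the C standard library:

```python
import ctypes
import ctypes.util

# Locate and load the C standard library; the concrete file name
# (e.g. "libc.so.6" on Linux) depends on the OS's conventions.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature of abs() so that ctypes marshals the
# argument and result according to the platform's calling convention.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # → 42
```

Every such call crosses the Python/C boundary using the same calling convention that compiled C code follows, which is exactly what a shared ABI guarantees.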


Implementing activation records for a language where everything is
statically allocated is very simple, right?
Yes, that's correct! An activation record (also known as a stack frame) is a block of memory
used to manage the data a function (or procedure) needs during its execution. It is typically
created whenever a function is called, and it stores all the information required to manage the
execution of that function, such as parameters, local variables, return value, and the return
address.

In a language where everything is statically allocated, the structure of an activation record
would indeed be simpler. Here's a breakdown of the typical contents of an activation record in
such a language:

Activation Record Contents:

1.​ Parameters (Arguments):


○​ The values passed to the function (or procedure) by the caller are stored here.
○​ These parameters are often placed in a specific order, depending on the calling
convention of the language.
2.​ Local Variables:
○​ These are variables that are declared and used inside the function.
○​ Local variables only exist during the function's execution and are stored in the
activation record.
3.​ Return Value:
○​ The function's return value (if any) needs to be stored somewhere so it can be
accessed by the caller after the function completes execution.
○​ The return value may be stored in a specific register or in a designated space
within the activation record.
4.​ Return Address:
○​ This stores the address where the program should continue after the function
returns.
○​ When a function finishes executing, it uses the return address to jump back to the
point in the program where it was called.

Statically Allocated Language:

In a statically allocated language, the memory layout is simpler because the sizes of all
variables (both local and global) and parameters are known at compile-time. There is no need
for complex memory management mechanisms like heap allocation or dynamic memory
resizing, which simplifies the structure of the activation record.

●​ Fixed Size: Since the size of all variables, parameters, and return values is known in
advance, the activation record has a fixed size.


●​ Static Allocation: Memory for variables is allocated in a predefined memory region (like
the data segment), rather than dynamically allocated at runtime.

Simplification in Statically Allocated Languages:

In languages with static allocation:

●​ You don’t need a heap or dynamic memory allocation.


●​ The activation record layout is predictable, fixed, and the locations of all parameters,
local variables, and return addresses are known at compile-time.

Thus, the activation record would only need to contain:

●​ The parameters passed to the function.


●​ The local variables declared inside the function.
●​ The return value (if necessary).
●​ The return address to jump back to the caller.
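To make this concrete, here is a hypothetical sketch (the slot names and the 4-byte word size are assumptions, not part of the notes) of how a compiler could assign fixed offsets to every slot of such an activation record at compile time:

```python
WORD = 4  # assumed word size in bytes, as in RV32

def layout_activation_record(params, local_vars):
    """Assign a fixed offset to every slot of an activation record:
    return address, return value, parameters, then local variables.
    Since all sizes are known at compile time, no runtime
    bookkeeping is needed."""
    offsets, offset = {}, 0
    for slot in ["<ret_addr>", "<ret_value>"] + params + local_vars:
        offsets[slot] = offset
        offset += WORD
    return offsets, offset  # per-slot offsets and total record size

offsets, size = layout_activation_record(["x"], ["result"])
print(offsets)  # → {'<ret_addr>': 0, '<ret_value>': 4, 'x': 8, 'result': 12}
print(size)     # → 16
```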

In a language where everything is statically allocated, recursion is typically limited or
unsupported: each function's activation record sits at a fixed memory location, so a recursive
call would overwrite the data of the invocation still in progress. For the same reason, re-entrant
functions are difficult to manage.

The previous example uses "virtual registers", e.g., registers that simulate
variable names, as already discussed in a previous class. Do you
remember what a virtual register really is?
The concept of "virtual registers" is quite common in intermediate representations (IR) used by
compilers. Virtual registers are placeholders or abstract names for values that haven't been
mapped to actual machine registers yet. They are commonly used during the early stages of
compilation, particularly before Register Allocation. As an example, the instruction add
sqr_ret_add ra x0 uses a virtual register (sqr_ret_add), and two physical registers: ra
and x0.

Why Use Virtual Registers?

1.​ Abstraction: Virtual registers allow the compiler to treat values and variables as if there
are an infinite number of registers. This makes it easier to express computations without
worrying about the physical limitations of the hardware (e.g., the number of registers
available on a machine).
2.​ Simplicity: Virtual registers make it easier to represent the semantics of a program
during the Intermediate Representation (IR) phase. Compilers often use IRs that are
closer to the source code (SSA, for example), where each value has its own unique
"register" (virtual register) to simplify analysis and optimization.


3.​ Optimization: Compilers can perform a wide range of optimizations on the code without
needing to assign physical registers. Once optimizations are complete, the final step is to
map the virtual registers to the real physical registers of the target machine.

Register Allocation

In the final stages of compilation, the compiler uses a process called register allocation to map
virtual registers to physical machine registers. The graph coloring algorithm is often used
here:

●​ Virtual registers are treated as nodes in a graph.


●​ An edge is drawn between two virtual registers if they are live (i.e., their values are
needed) at the same time.
●​ The goal is to assign a physical register to each virtual register in a way that minimizes
spills (where values need to be stored in memory because there aren't enough
registers).
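As an illustration of the idea (a deliberately simple greedy sketch, not a full Chaitin-style allocator, and with made-up virtual register names):

```python
def greedy_color(interference, k):
    """Map each virtual register to one of k physical registers,
    never reusing a color taken by an interfering neighbor.
    Registers that cannot be colored get None, meaning they must
    be spilled to memory."""
    assignment = {}
    for vreg, neighbors in interference.items():
        taken = {assignment.get(n) for n in neighbors}
        free = [c for c in range(k) if c not in taken]
        assignment[vreg] = free[0] if free else None
    return assignment

# v1 and v2 are live at the same time, so they interfere;
# v3 only interferes with v2:
graph = {"v1": {"v2"}, "v2": {"v1", "v3"}, "v3": {"v2"}}
print(greedy_color(graph, k=2))  # → {'v1': 0, 'v2': 1, 'v3': 0}
```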

Examples in Common IRs:

●​ LLVM IR: LLVM uses a large number of virtual registers in its intermediate
representation. It defers register allocation until later in the compilation process, after
performing optimizations.
●​ SSA (Static Single Assignment): Each variable is assigned exactly once, and virtual
registers naturally fit into this model. Register allocation is then performed to map these
variables to physical registers.

In summary, virtual registers provide flexibility and allow compilers to focus on optimization
without worrying about hardware constraints until the final stages of code generation.

Would the code generator we wrote work for a programming language with
anonymous functions?
That code generator would not work because we are calling functions directly via jal
instructions. These instructions need a fixed address (the label that is associated with the name
of the function).

In our expression-oriented language, all the syntactic constructions
represent "values". We have three kinds of values: integers (as in 1 + 2),
booleans (as in 1 < 2) and functions, as in fn x => x + 1. How do we
denote function values in the low-level code that we generate?
In your expression-oriented language, functions are also considered values, just like integers
and booleans. When we generate code for functions, the value that we store in a register for a
function is typically a pointer to the function's code. Here's a detailed breakdown of this idea:


Storing Functions in Registers

1. Integers and Booleans:

●​ For integers, you simply store the integer value (e.g., 1, 42, -7) in a register.
●​ For booleans, you store a binary value (0 for false, 1 for true).

2. Functions:

●​ Functions are more complex than basic data types like integers or booleans, because
they represent a block of executable code along with the environment in which the
function was created (for closures).

When producing code for a function, the value of a function in this context is the address of the
function's code in memory. This is usually stored as a pointer in a register.

What exactly gets stored in a register for a function?

There are two common scenarios for storing a function's value:

1. Non-Closure Functions (Simple Functions):

●​ For non-closure functions (functions that don’t depend on external variables from their
environment), the function's value is simply a pointer to the code of the function, e.g.:

fn x => x + 1

●​ The value stored in the register is the address of the machine code that implements this
function (i.e., the instructions for x + 1). This address allows the program to jump to
the function's code when it is called.

2. Closure Functions:

●​ For closures, a function can capture variables from its surrounding environment (lexical
scope). In this case, the value that gets stored in the register is often a closure object
that contains:
○​ A pointer to the function's code.
○​ A pointer to the environment (i.e., the variables that were captured from the
enclosing scope), e.g.:

let y = 10 in fn x => x + y

●​ In this case, y is part of the environment captured by the function. The value stored in a
register would be a pointer to a closure that contains:
○​ The address of the code for fn x => x + y.


○​ A reference to the value of y, so that when the function is called, it can correctly
add y to x.

Conclusion

The value of a function stored in a register is:

●​ For simple functions (non-closures): A pointer to the function's code.


●​ For closures: A closure object, which includes a pointer to the function's code and a
pointer to the captured environment (if any).

This allows the program to correctly jump to the function's code when it is called, and if
necessary, access any captured variables that are part of its environment.

How would you generate code for anonymous functions then, assuming
that we don't have closures?
Anonymous functions, also called lambda functions or function literals, are functions that do
not have a name. They can be created dynamically, and we often pass them as arguments to
other functions. In this case, the same code generation principles apply as with named
functions, but we treat them as if they have no pre-defined name. To generate code for
anonymous functions, we need to approach it differently. The key challenge is that we can't
rely on static function names. Instead, we generate a unique label for each anonymous
function during code generation (i.e., at compile time). This allows us to refer to the function
even though it has no name in the source code.

Here’s how we can modify the code generation to support anonymous functions:

Steps to Generate Code for Anonymous Functions:

1.​ Assign a Unique Label to Each Anonymous Function: For each anonymous function,
we can generate a unique label dynamically during code generation. For example, we
could name anonymous functions anon_func_1, anon_func_2, etc.
2.​ Adjust Function Call Code: When calling an anonymous function, we use its
dynamically assigned label.

Let’s modify the visit_named_fn and visit_named_app methods to work with anonymous
functions.

The new visit methods:


def visit_fn(self, exp, prog):
    """
    Generate code for an anonymous function (Fn).
    """
    # Save the address of the function in a variable:
    addr_var = self.next_var_name()

    # Generate a unique label for the anonymous function, like
    # 'anon_func_1', 'anon_func_2', etc.:
    func_addr = self.next_fn_label()
    prog.add_inst(AsmModule.Addi(addr_var, "x0", func_addr))

    # Jump over the function body, so that it is not executed when
    # its definition is first reached:
    func_after = self.next_fn_label()
    prog.add_inst(AsmModule.Jal("x0", func_after))

    # Here's where the function starts:
    prog.add_label(func_addr)

    # Allocate space on the stack for the return address and local variables.
    prog.add_inst(AsmModule.Addi("sp", "sp", -4))  # Assuming a word size of 4.
    prog.add_inst(AsmModule.Sw("ra", 0, "sp"))     # Store the return address.

    # Handle the function's formal parameter (passed in register 'a0'):
    # store the parameter into the formal variable.
    prog.add_inst(AsmModule.Add(exp.formal, "a0", "x0"))

    # Generate the function body:
    return_var = exp.body.accept(self, prog)

    # Move the result of the function body into the return register 'a0'.
    prog.add_inst(AsmModule.Add("a0", return_var, "x0"))

    # Restore the return address and the stack pointer.
    prog.add_inst(AsmModule.Lw("ra", 0, "sp"))
    prog.add_inst(AsmModule.Addi("sp", "sp", 4))

    # Jump back to the caller.
    prog.add_inst(AsmModule.Jalr("x0", "ra"))

    # Set a label at the end of the function.
    prog.add_label(func_after)

    # Return the variable that holds the address of this anonymous function.
    return addr_var

def visit_fn_app(self, exp, prog):
    """
    Generate code for applying an anonymous function.
    """
    # The function being applied is an anonymous function, so we need its
    # label. Accepting the function will generate or retrieve that label:
    func_label = exp.function.accept(self, prog)

    # Generate code for the actual parameter (expression) of the function
    # call, and pass the parameter in register a0:
    param_value = exp.actual.accept(self, prog)
    prog.add_inst(AsmModule.Add("a0", param_value, "x0"))

    # Static jump to the anonymous function using its generated label:
    prog.add_inst(AsmModule.Jal("ra", func_label))

    # Retrieve the return value (stored in register 'a0') and store it
    # in a new variable:
    ret_var = self.next_var_name()
    prog.add_inst(AsmModule.Add(ret_var, "a0", "x0"))
    return ret_var

Key Changes and Explanations:

1.​ visit_fn for Anonymous Functions:


○​ Label Generation: We dynamically generate a unique label for each anonymous
function using self.next_fn_label(). This is crucial because anonymous
functions don’t have names, so we need to assign them labels during code
generation (like anon_func_1, anon_func_2, etc.).
○​ Function Body Generation: The function body is generated just like before, but
now we use the dynamically assigned label for the anonymous function.
○​ Stack Management: We handle the stack the same way as for named functions,
storing and restoring the return address, and making sure the return value is
placed in the a0 register.
2.​ visit_fn_app for Anonymous Function Calls:
○​ Parameter Handling: We generate code for the actual parameter of the function
call and pass it in register a0.
○​ Retrieve Function Label: Since we are dealing with an anonymous function, we
can’t jump to a static name like before. Instead, we generate or retrieve the label
of the anonymous function by calling exp.function.accept(self, prog).
This will return the label that was generated when the anonymous function was
defined.
○​ Jump to the Function: We use the Jal instruction to jump to the function’s label
(whether it's anonymous or named).
○​ Return Value Handling: After the function returns, we store the return value
(from register a0) into a temporary variable for later use.

Helper Method:

We also need a helper method to generate unique labels for anonymous functions. Here’s a
simple version:

def next_fn_label(self):
    """
    Generate a unique label for anonymous functions.
    """
    if not hasattr(self, '_fn_count'):
        self._fn_count = 0
    self._fn_count += 1
    return f"anon_func_{self._fn_count}"


This method keeps track of how many anonymous functions we’ve seen so far and generates a
unique label for each one.

Conclusion:

With these changes, the code generator can now handle anonymous functions. We generate
unique labels for them and handle function calls dynamically based on those labels. The
approach is similar to named functions, except we don't rely on exp.name and instead generate
labels on the fly.

Can you show an example of assembly that we generate?
Sure. See an example in the figure below.

That's really odd: why is there a jump at the beginning of the function?
The jump at the beginning of the function is required to allow the program to skip over the
function’s code and continue with the main program logic. Here’s why this is necessary:

Jump at the Beginning

1.​ Anonymous Function Definition:


○​ When you define an anonymous function (fn x => x + 1 in your example),
you generate machine code for that function.
○​ However, the program does not immediately execute the function. Instead,
the function needs to be available in memory for when it is called later. Therefore,
we need to ensure that defining the function does not interfere with the
program’s main execution.
2.​ Control Flow:
○​ By inserting a jump (Jal x0, func_after) at the beginning, the program can
skip over the function’s code and continue with the next instruction, ensuring that
the function is only executed when it’s actually called.
○​ Without this jump, the program would begin executing the function code
immediately upon encountering the function definition, which is not what you
want. The function should only be executed when it's applied later (f 3).
3.​ Program Flow and Main Logic:
○​ In this case, your program starts execution from address 0. When it hits the jal
instruction at line 001, it immediately jumps to the function call logic (starting at
_fn_count_1 in the assembly code), which is the actual main logic of the
program.
○​ The function body _fn_count_0 is inserted after this jump, so that it's available
in memory but not executed until explicitly called.

Example Flow:

●​ The Jal instruction at line 001 makes sure that the program skips over the function
body and jumps to the actual code that calls the function (the main execution part).
●​ Later, when f 3 is evaluated (around lines 111-115), the address of the function is
loaded from v1 and the program jumps back to the function's code (_fn_count_0) to
execute it.

In Summary:

The initial jump (Jal x0, func_after) is there to ensure that the program can define the
function and then skip over it, so the function doesn’t get executed immediately when it’s
defined. This lets the program continue to its main logic and ensures that the function is only
invoked when a function application occurs (f 3 in this case). This is especially important in
assembly-level code where execution flows linearly unless explicitly controlled with jumps.

Could you show an example with two functions, so that we can see how we
find the first executable instruction of the program?
Sure, see below:


The image illustrates the process of finding the first executable instruction of a program in the
context of RISC-V assembly code, focusing on skipping over function definitions.

Key Points:

1.​ Source Code and Assembly Structure:


○​ The source code defines two functions, f and g. Function f takes an argument x
and returns x + 1. Function g takes y and returns y * y.
○​ The code then computes f 3 + g 4.
2.​ Jumping over the Function Body:
○​ The assembly code begins by defining the function f (at address 000), followed
by a jump instruction (jal x0 11) at address 001. This jump is crucial for


skipping the function body (addresses 002-010) and continuing with the main
program logic, which begins at instruction 011.
3.​ Finding the First Executable Instruction:
○​ After defining the function, the program immediately jumps from address 001 to
address 011, which contains the first executable instruction. This is the point
where the program's main execution begins.
○​ Address 011 (f_1 = addi v1 0) initializes the variable f_1. The subsequent
instructions (012-022) set up and compute the results of applying the functions
f and g.
4.​ Visual Path:
○​ In the image, the dashed arrow between assembly addresses 001 and 011
shows the jump that skips the function body, directly leading to the first
executable instruction (011).
○​ This dashed line represents the control flow path that the program follows to
avoid executing the function code immediately upon definition.
5.​ Summary of the Path:
○​ The program starts execution at address 000, where v1 is initialized.
○​ At address 001, the program encounters the jal x0 11 instruction, which
jumps over the body of f and proceeds to the instruction at address 011,
marking the start of the program’s actual computation.
○​ From there, the program begins applying the functions (f and g) and performing
the required arithmetic operations.

Conclusion:

The jump at address 001 ensures that the function definitions are skipped, and the main
execution starts at address 011, which is the first executable instruction of the program.

But, in practice (like when we compile a typical C program) how does the
CPU find the first instruction to execute?
In practice, when you compile a C program (or any program) to machine code, the CPU doesn't
automatically know where to start executing. There are a few key mechanisms that tell the CPU
which instruction is the first one to execute. Let's break this down:

1. Executable Format:

When you compile a program, the machine code is wrapped in a specific executable format
(e.g., ELF for Linux, PE for Windows, Mach-O for macOS). This format contains not only the
machine instructions but also metadata that includes information such as:

●​ The entry point: This is a special address where the program execution begins.


●​ The program header: Information about memory segments and how they should be
loaded into memory.

In the case of ELF files (used in Linux), this information is stored in the header, where the entry
point address is explicitly defined.

2. The Role of the Operating System:

The operating system plays an important role in starting program execution. When you run a
program (e.g., by typing ./myprogram in a terminal):

●​ The operating system loader reads the executable file (e.g., the ELF file).
●​ It looks for the entry point address in the file's header.
●​ The OS sets up the program's memory (loading sections like code, data, stack, etc.).
●​ Finally, the CPU's program counter (PC) is set to the entry point address, and
execution begins.

3. Entry Point in C Programs:

In C programs, the typical entry point is the main() function. But this isn't the first function to
run! When you compile a C program, the compiler generates extra code that gets executed
before main(). This code is typically part of the runtime library (e.g., crt0.o in Linux). The
general sequence is:

●​ The startup code (from the runtime) sets up things like the stack, heap, and
environment variables.
●​ Once the startup code has initialized the environment, it calls the main() function.

The startup code is located at the entry point that the OS loader finds in the executable file.

4. First Executable Instruction:

The first executable instruction is typically part of the startup code, not the main() function.
For example, on Linux, this is located in crt0.o, which is linked into your program. The
instruction at the entry point might involve setting up the stack or initializing memory, and from
there, control is eventually passed to main().

5. CPU's Program Counter (PC):

The CPU knows which instruction to execute first because:

●​ When the OS starts the program, it loads the program into memory.
●​ The program counter (PC) is set to the entry point (as defined in the executable's
header).


●​ The CPU then fetches the instruction at the address in the PC and starts executing.

Can you show how we can find out the entry point of a program in Linux?
Let's compile a simple C program like this:

int main() {
    return 0;
}

And inspect the binary with a tool like readelf:

readelf -h a.out

We will see something like this:

Entry point address: 0x400560

This tells the OS to start execution at memory address 0x400560, which is where the startup
code (from crt0.o) is located. The CPU will start executing instructions at this address,
eventually calling your main() function after the necessary initialization.
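Programmatically, the same field that readelf prints can be read with a few lines of Python. The sketch below assumes a little-endian 64-bit ELF file, where e_entry is an 8-byte field at offset 0x18 of the header:

```python
import struct

def elf64_entry_point(header: bytes) -> int:
    """Read the entry point from a little-endian 64-bit ELF header.

    In the ELF64 layout, e_entry sits at offset 0x18, right after
    e_ident (16 bytes), e_type, e_machine and e_version.
    """
    assert header[:4] == b"\x7fELF", "not an ELF file"
    (entry,) = struct.unpack_from("<Q", header, 0x18)
    return entry

# Typical use on a compiled binary (the file name is hypothetical):
#   with open("a.out", "rb") as f:
#       print(hex(elf64_entry_point(f.read(64))))
```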

Summary:

●​ When you compile a program, the compiler generates machine code wrapped in an
executable format (e.g., ELF).
●​ The executable contains metadata, including the entry point, which tells the operating
system where the first instruction is.
●​ The operating system loader reads the executable and sets the CPU's program
counter (PC) to the entry point address.
●​ The CPU starts executing from the entry point, which is typically part of the startup
code, not directly from main().

That's how the CPU finds the first executable instruction in a compiled program!

Code like (fn x => fn y => x + y) 2 3 creates closures. We had
seen them before. Do you remember what a closure is?
Yes, you're correct! The first function application in the expression (fn x => fn y => x +
y) 2 3 creates a closure.

What is a Closure?


A closure is a function along with its "environment," which consists of the bindings of free
variables at the time the closure is created. It allows a function to carry its context with it, even
when it is executed outside that context. Here’s how it applies to your example:

Function Definition:​

fn x => fn y => x + y

This is a curried function. It takes one argument x and returns another function:​

fn y => x + y

Notice that in the inner function fn y => x + y, the variable x is free because it is not locally
defined in the inner function. Instead, x is captured from the environment of the outer function.

First Application: When you apply 2 to the outer function:

(fn x => fn y => x + y) 2

The result is a new function:

fn y => 2 + y

This is a closure. It contains:

1.​ The function fn y => x + y.


2.​ The environment where x is bound to 2.

Second Application: When you apply 3 to the resulting closure:​



(fn y => 2 + y) 3

The result is: 2 + 3 = 5

Why is the Closure Useful?

Closures are essential in functional programming because they allow functions to:

●​ Capture and retain state (e.g., x in this case).


●​ Be treated as first-class values that can be passed around and executed later with their
context intact.

In the example above, the closure fn y => 2 + y retains the value of x as 2, enabling the
inner function to use it when y is later applied.


How are closures implemented?


Well, there are many different ways to implement closures. Let me introduce you to a technique
called "closure conversion", which is super cool! Closure conversion is a technique used in the
implementation of functional programming languages to manage closures explicitly during
compilation or interpretation. It transforms a program with nested functions (and their free
variables) into one where all functions are "flat" and explicitly carry their environments as
additional parameters. This enables efficient implementation of higher-order functions and
supports features like lexical scoping.

What is Closure Conversion?

In functional languages, a closure consists of:

1.​ A code pointer to the function's definition.


2.​ An environment that maps the function's free variables to their current values.

Closure conversion restructures the program so that:

●​ Every function explicitly includes its environment as an additional argument.


●​ Environments are represented as explicit data structures (e.g., records or tuples).

Steps in Closure Conversion

1.​ Identify Free Variables: Free variables are those used inside a function but defined
outside it. For example, in fn y => x + y (the inner function of fn x => fn y => x +
y), x is a free variable.

2.​ Create an Environment: Collect all the free variables into a structured environment
(e.g., a tuple or record).

3.​ Modify the Function: Rewrite the function to explicitly take the environment as an
argument. For example:

fn y => x + y

becomes:

fn (env, y) => #x env + y

where env is the environment containing x.

4.​ Create a Closure: A closure is then represented as a pair of:

●​ A reference to the modified function.
●​ The environment containing the free variables.

5.​ Adjust Function Applications: Function applications must pass the environment
explicitly. For example:

let
val closure = (function_pointer, environment)
in
function_pointer(environment, argument)
end

Example of Closure Conversion

Given the following expression:

(fn x => fn y => x + y) 2 3

Step 1: Identify Free Variables

●​ In fn y => x + y, the free variable is x.

Step 2: Create an Environment

●​ When x is bound to 2, the environment becomes something like {x = 2}.

Step 3: Modify the Function

Rewrite the inner function to take the environment explicitly:

(fn (env : {x : int}, y : int) => (#x env) + y);

Step 4: Create a Closure

When applying 2 to the outer function:

(fn x => fn y => x + y) 2

it creates a closure:

closure = (fn (env, y) => (#x env) + y, {x = 2})

Step 5: Adjust Applications

When applying 3 to the closure:

function_pointer(environment, 3)


becomes:

(fn (env, y) => (#x env) + y)({x = 2}, 3)

resulting in: 2 + 3 = 5. The figure below illustrates the whole process in SML/NJ:

Why is Closure Conversion Important?

1.​ Supports Lexical Scoping: Free variables are captured correctly and used in the
appropriate context.​

2.​ Facilitates Compilation: By converting closures into explicit structures, compilers can
generate efficient code for languages that support higher-order functions.​

3.​ Target Language Compatibility: It simplifies translating functional languages into


intermediate representations or machine code that may not directly support nested
functions.

Is Closure Conversion the same as Lambda Lifting?


Closure conversion and lambda lifting are related techniques in the implementation of
functional programming languages, but they are not the same. They address similar challenges
in managing free variables in functions, but their goals and mechanisms differ significantly.

Closure Conversion

●​ Goal: To explicitly represent closures (functions and their environments) in a way that
can be efficiently implemented in lower-level code, like machine code or intermediate
representations.
●​ How It Works:
○​ Transform functions to take an explicit environment parameter.
○​ Capture free variables in a structured environment (e.g., a tuple or record).


○​ Represent a closure as a pair of the function's code pointer and its environment.
●​ Key Characteristics:
○​ Preserves the lexical scoping of the original program.
○​ Used in runtime implementations, especially when higher-order functions are
common.
○​ Closures remain first-class values (can be passed as arguments, returned, etc.).

Example:​

fn x => fn y => x + y

After closure conversion:​



val closure = (fn (env, y) => (#x env) + y, {x = 2})

Lambda Lifting

●​ Goal: To eliminate nested functions by "lifting" them to the top level, converting a
program into one where all functions are top-level functions.
●​ How It Works:
○​ Identify free variables in nested functions.
○​ Add these free variables as explicit additional parameters to the lifted function.
○​ Transform the function's definition and all calls to it.
●​ Key Characteristics:
○​ Makes nested functions unnecessary.
○​ Results in a "flat" program, where all functions are at the top level.
○​ Often used in static compilation to simplify program structure.

Example:​

let
val f = fn x => fn y => x + y
in
f 2 3
end

After lambda lifting:​



fun lifted_f(x, y) = x + y
lifted_f(2, 3)

Key Differences

Aspect               Closure Conversion                      Lambda Lifting
Purpose              Explicitly represent closures at        Eliminate nested functions by
                     runtime.                                lifting them to top level.
Handling Free Vars   Captures free variables in an           Pass free variables as additional
                     environment structure.                  arguments.
Resulting Structure  Functions can still be nested;          All functions become top-level.
                     closures are explicit.
Primary Use Case     Dynamic or runtime contexts (e.g.,      Static or compiled contexts where
                     interpreter backends, JITs).            functions must be standalone.

Relationship Between the Two

●​ Both deal with free variables in functions.


●​ Lambda lifting is a special case of transforming nested functions, but it doesn't use
environments or create closures. It is most useful when you want to flatten functions
statically.
●​ Closure conversion is more general and accommodates runtime environments,
supporting features like passing closures as first-class values.

You can think of lambda lifting as a transformation that removes nested functions, while
closure conversion retains them but makes their dependencies explicit.

Can you show me how closure conversion would work if it were implemented in C?
The figure below illustrates closure conversion as it would be implemented when translating
SML/NJ code into C:


Key Elements of Closure Conversion in the Figure

1.​ Free Variables Representation:


○​ The free variable x used in the nested function addX is encapsulated in a struct
(FREE_VARIABLES_addX) in C.
○​ This struct acts as the environment for the closure and is used to capture the
free variables needed for the function's execution.
2.​ Closure Representation:
○​ A closure is represented as a struct (CLOSURE_addX) containing:
■​ A pointer to the environment struct (FREE_VARIABLES_addX).
■​ A function pointer to the actual code (addX).


○​ This separation allows the environment and code to be linked dynamically.


3.​ Transformed Function (funToAddX):
○​ The function funToAddX now returns a pointer to a dynamically allocated
closure (CLOSURE_addX*).
○​ Inside funToAddX, the environment is allocated, and the captured variable (x) is
stored in the environment.
4.​ Function Execution:
○​ When test calls the function pointer (c->f), it passes the environment (c->t)
and the argument (n).
○​ This ensures the function has access to both its free variables (from the
environment) and its actual arguments.
5.​ Memory Management:
○​ The code explicitly allocates (malloc) and deallocates (free) memory for the
environment and the closure, which is necessary in C since there is no garbage
collector.

Relation to SML/NJ Code

●​ The original SML/NJ function funToAddX returns a nested function addX, capturing x
as a free variable.
●​ In C, this is modeled by creating a closure where:
○​ The free variable x is stored in the environment.
○​ The function pointer (addX) and the environment are bundled into a closure
struct.

Why This Is Closure Conversion

●​ The figure demonstrates how the free variable (x) and function (addX) are explicitly
represented as runtime constructs (structs in C).
●​ This transformation allows higher-order functions and lexical scoping to be implemented
in a lower-level language like C, which lacks native support for closures.

Who invented closure conversion?


Closure conversion, as a formal concept in the compilation of functional programming
languages, has been studied and developed over decades by multiple researchers in the fields
of programming languages and compiler theory. However, it is difficult to attribute the invention
of closure conversion to a single individual because it emerged as a solution to the challenges
of implementing lexical scoping and higher-order functions.

Key Historical Context:

1.​ John C. Reynolds:


○​ In his seminal 1972 paper "Definitional Interpreters for Higher-Order
Programming Languages", Reynolds described techniques for interpreting and
implementing higher-order functions, including the idea of managing free
variables explicitly. While not using the term "closure conversion," this work laid
the groundwork for understanding closures.
2.​ Lisp and Early Functional Languages:
○​ Early implementations of Lisp in the 1960s and 1970s used mechanisms akin to
closure conversion to support lexical scoping. These implementations needed to
represent environments for nested functions.
3.​ Formalization in Compiler Research:
○​ Closure conversion was formalized as a key transformation in the compilation of
functional languages like ML and Scheme in the 1980s and 1990s. Researchers
such as Guy Steele, Mitchell Wand, and Andrew Appel contributed to
formalizing and optimizing closure conversion in compiler settings.
4.​ Cardelli and Compilers for Functional Languages:
○​ Luca Cardelli and others wrote extensively about the representation of functions
and environments in the context of implementing functional languages,
influencing how closure conversion is taught and understood.

Conclusion:

The concept of closure conversion evolved organically as part of solving practical and
theoretical problems in the implementation of functional programming languages. If you’re
interested in a foundational text, Reynolds' 1972 paper is an excellent place to start for
understanding the theoretical origins of many related concepts.

Is this the same Reynolds from "The Craft of Programming"?


Yes, the John C. Reynolds who wrote The Craft of Programming (1981) is the same renowned
computer scientist who made foundational contributions to programming languages, type theory,
and formal methods. He was a pioneer in many areas of computer science, and his work has
had a lasting impact on the design and implementation of programming languages. Here are
some of his major contributions:


1. Definitional Interpreters for Higher-Order Programming Languages (1972)

●​ Reynolds' seminal paper introduced the idea of a definitional interpreter, a high-level
model for understanding and specifying the semantics of programming languages. This
paper also formalized concepts related to closures and lexical scoping, which are
fundamental in functional programming.

2. System F (Polymorphic Lambda Calculus)

●​ Independently of Jean-Yves Girard, Reynolds discovered the second-order lambda
calculus (also known as System F), which introduced parametric polymorphism.
●​ System F serves as a theoretical foundation for type systems in programming
languages, influencing languages like Haskell, OCaml, and Rust.

3. Separation Logic

●​ Reynolds, along with Peter O’Hearn and others, developed separation logic, a formalism
for reasoning about programs that manipulate memory.
●​ This work has been instrumental in verifying the correctness of concurrent and
imperative programs, and it is widely used in tools like Facebook’s Infer.

4. Programming Language Semantics

●​ Reynolds introduced and refined key ideas in programming language semantics,
including:
○​ Continuations: His work on continuations has influenced control flow
mechanisms, such as coroutines and exception handling.


○​ Logical Relations: He used logical relations to prove properties of polymorphic
languages, such as type safety and representation independence.

5. The Craft of Programming

●​ This book emphasizes formal reasoning about programs and algorithm design,
promoting correctness-by-construction methodologies. It bridges theoretical computer
science and practical programming.

6. Hygienic Macros

●​ Although not directly attributed to him, his exploration of scoping and bindings laid the
foundation for ideas like hygienic macros in Scheme and other languages.

7. Parametricity and "Theorems for Free!"

●​ Reynolds formalized the concept of parametricity, which states that polymorphic
functions behave uniformly across all their types. This idea led to Wadler's famous
"Theorems for Free!", allowing for reasoning about program properties directly from their
type signatures.

Legacy:

Reynolds was a towering figure in theoretical computer science, blending mathematical rigor
with practical applications. His work underpins much of modern programming language theory
and has influenced the design of languages and tools for decades. Beyond his technical
contributions, his writings, such as The Craft of Programming, have inspired generations of
computer scientists to think deeply about the principles of programming.


Memory Allocation
Code: https://github.com/pronesto/DCC053_Public/tree/main/17_MemoryAllocation

In our class about variables, we saw the semantics of mutable state. Do you remember the rules?
Sure! Find below the original rules:

These loc objects represent memory addresses, right? In practice, how are they allocated in an actual computing system?
Yes, the loc objects represent memory addresses, and where they are allocated depends on
how the variables they represent are handled in the actual system. Here are the typical memory
areas where these addresses (loc) could be allocated:

1. Static Memory:

●​ What it is: Static memory is allocated at compile-time for global or static variables.
These variables persist throughout the entire program's execution.
●​ When used: If your semantics include global variables or variables that are defined
with static storage duration (like those declared outside functions in C or using the
static keyword), their addresses (loc) would refer to locations in static memory.
●​ Advantages: Static memory is simple to manage since it exists throughout the
program's life.
●​ Disadvantages: Limited flexibility; once allocated, static variables remain in memory
until the program exits.

2. Stack Memory:


●​ What it is: Stack memory is used for local variables within functions. Each function call
creates a new stack frame, which holds its local variables.
●​ When used: The addresses (loc) for local variables in functions (e.g., those in let
expressions) would typically point to locations in stack memory. These variables are
allocated when the function is called and deallocated when the function returns.
●​ Advantages: Stack allocation is very efficient since memory is automatically reclaimed
when functions exit.
●​ Disadvantages: Limited in size; addresses (loc) are only valid as long as the function’s
stack frame exists (i.e., can't return a reference to a local variable).

3. Heap Memory:

●​ What it is: Heap memory is dynamically allocated at runtime. The programmer controls
allocation (e.g., with malloc in C, new in languages like C++ and Java).
●​ When used: If the language supports dynamic memory allocation, then the addresses
(loc) would correspond to heap memory. For example, when you explicitly allocate
memory for a variable, its address will be in the heap.
●​ Advantages: More flexible, as you can control the lifetime and size of objects.
●​ Disadvantages: Managing heap memory requires explicit allocation and deallocation,
which can lead to memory leaks or fragmentation if not handled properly.

Allocation in your semantics:

In the operational semantics we are working with, loc is an abstract representation of memory
locations, but in practice, it would map to one of these memory regions depending on the
variable's scope and lifetime:

●​ Global variables would typically have addresses in static memory.


●​ Local variables (those declared within functions or blocks) would have addresses in
stack memory.
●​ Dynamically allocated variables (if your language supports them) would be in the
heap.

If you’re extending the language semantics to support heap allocation, you would also need to
add rules for memory allocation and deallocation. For instance, for a dynamic allocation
operation (like malloc), you would create a new location (loc) on the heap and update the
store (S). But this is a story for another chapter :)

Could you give me an example of a C program that allocates something in each of these areas?
Certainly! Below is an example C program that demonstrates allocations in static memory, the
stack, and the heap. I'll explain each type of allocation.


#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
// Static allocation:
char* s0 = "DCC053 is awesome!";

const size_t SIZE = strlen(s0);

// Stack allocation:
char s1[SIZE + 1];
strcpy(s1, s0);

// Heap allocation:
char* s2 = malloc(SIZE + 1);
strcpy(s2, s0);

printf("Addr = %p, Contents = %s\n", s2, s2);
printf("Addr = %p, Contents = %s\n", s1, s1);
printf("Addr = %p, Contents = %s\n", s0, s0);

free(s2);
return 0;
}

Let’s analyze where each variable (s0, s1, and s2) is allocated based on the provided code and
memory layout options:

1. char* s0 = "DCC053 is awesome!";

●​ Where it is allocated:​
The string literal "DCC053 is awesome!" is stored in static memory, specifically in
the read-only data section (e.g., .rodata), because string literals are immutable and
shared across the program.
●​ Type of memory: Static Memory
●​ Example of address in output: 0x1047cbf78

2. char s1[SIZE + 1];

●​ Where it is allocated:​
The array s1 is declared as a local variable with automatic storage duration. It is
allocated on the stack memory. The contents of s0 are copied into s1 using strcpy.
●​ Type of memory: Stack Memory
●​ Example of address in output: 0x16b637520

3. char* s2 = malloc(SIZE + 1);


●​ Where it is allocated:​
The pointer s2 is stored in the stack memory (since it's a local variable), but the memory
it points to is dynamically allocated using malloc(), which means it resides in the heap
memory. The contents of s0 are copied into this heap-allocated memory using strcpy.
●​ Type of memory: Heap Memory (for the allocated block)
●​ Example of address in output: 0x600002d10020

Summary Table

Variable   Storage Type           Memory Segment             Address in Output
s0         Static (Read-Only)     Static Memory (.rodata)    0x1047cbf78
s1         Automatic (Local)      Stack Memory               0x16b637520
s2         Dynamic (Heap Block)   Heap Memory                0x600002d10020

Important Observations

1.​ String Literals:


○​ "DCC053 is awesome!" is stored in .rodata, which is why s0's address is
from a static, read-only memory section.
2.​ Stack Allocation:
○​ s1 resides on the stack, which is typically located at a much lower memory
address than the heap on modern systems.
3.​ Heap Allocation:
○​ The heap memory is managed dynamically and its address space is separate
from both the stack and the static memory, often at higher memory addresses.

This segmentation explains the program's behavior and memory allocation as reflected in the
output.

Can you explain how memory allocation would happen in the SML program
below?


The explanations in the figure describe how variables are allocated in memory:

●​ Static Allocation (for w):


○​ The figure explains that the address of w will never change, making it suitable for
static allocation. This is correct because w is defined in the outer scope and its
value remains constant (13). In a typical system, this kind of variable would be
stored in static memory (similar to global variables in languages like C), as its
lifetime is tied to the entire program’s execution.
●​ Stack Allocation (for x):
○​ The variable x is defined inside the function g, and its address depends on when
the function is called. This is an appropriate explanation for stack allocation.
Each time g is called, a new activation record is created on the stack, and x is
allocated within that record. Its lifetime is limited to the duration of the function
call. When the function returns, the memory for x is released, and its value is no
longer accessible.
○​ This behavior corresponds to typical function calls where local variables are
stored in the stack frame and get deallocated once the function execution ends.

Overall:

The figure shows the distinction between static allocation (for w) and stack allocation (for x). The
program shows how the memory allocation strategy depends on the variable's lifetime and
scope:


●​ w: Static allocation, because its lifetime is the entire program.


●​ x: Stack allocation, because its lifetime is tied to the function g's activation on the call
stack.

The explanation aligns with how memory is managed in most programming languages, where
variables with global scope can be statically allocated, while local variables within functions use
stack memory.

Can you explain how allocations happen in practice, using a C program and its assembly file in ARM?

In the assembly code provided below, we can identify where the variables like aux and answer
are allocated by analyzing the stack operations and memory access patterns. Let's break it
down:

Key Concepts:

●​ sp (stack pointer): This points to the top of the current stack frame.
●​ x29 (frame pointer): This points to the base of the stack frame.
●​ x30 (link register): This stores the return address of the current function.

The stack frame is set up at the beginning of the function with the instruction:

sub sp, sp, #32

This allocates 32 bytes of space on the stack.


Where are the variables stored?

answer (global variable)


The variable answer is a global variable, so it is not stored on the stack. In the assembly code,
you can see this reference:

adrp x9, _answer@PAGE
str w8, [x9, _answer@PAGEOFF]

●​ adrp x9, _answer@PAGE: Loads the base address of the page containing answer
into register x9.
●​ str w8, [x9, _answer@PAGEOFF]: Stores the value of w8 (which contains m) into
the answer variable.

This means that answer is stored in the global memory (usually part of the data section of the
binary).

m and n (function parameters)

The function parameters m and n are passed in registers w0 and w1, but they are spilled (stored)
on the stack early in the function to free up the registers for other purposes:

stur w0, [x29, #-4]  ; Store m at [x29 - 4]
str  w1, [sp, #8]    ; Store n at [sp + 8]

●​ m is stored at [x29 - 4] (using the frame pointer x29).


●​ n is stored at [sp + 8] (using the stack pointer sp).

This means both m and n are allocated on the stack. m is stored 4 bytes below the frame pointer,
and n is stored at an 8-byte offset from the stack pointer.

aux (local variable)

aux is a local variable that stores the result of m - n. You can see this value being calculated
and stored in the following lines:

subs w8, w8, w9   ; aux = m - n (subtract w9 from w8)
str  w8, [sp, #4] ; Store aux at [sp + 4]

●​ The result of m - n is stored in register w8.


●​ aux is stored at [sp + 4], 4 bytes above the base of the stack frame.

This means that aux is also allocated on the stack, specifically at an offset of 4 bytes from the
stack pointer.

Summary:

●​ answer: Global variable, stored in global memory (data section).


●​ m: Stored at [x29 - 4] (4 bytes below the frame pointer), on the stack.
●​ n: Stored at [sp + 8] (8 bytes above the base of the stack), on the stack.


●​ aux: Stored at [sp + 4] (4 bytes above the base of the stack), on the stack.

These variables are managed in different memory regions, and the stack is used to store both
local variables (aux) and spilled function parameters (m and n). The global variable answer is
stored separately in the program's data section.

Can you guess how the global variable is allocated in the assembly code
we have seen?
We can see the allocation in the following two lines:

.globl _answer ; @answer
.zerofill __DATA,__common,_answer,4,2

These lines indicate that the variable answer is a global variable and gives details about how
it's allocated in memory. Let's break it down:

1. .globl _answer

●​ .globl is an assembler directive that marks answer as a global symbol. This means
that the variable answer can be accessed from other files (not just the one where it's
declared). It's essentially making the symbol visible to the linker, allowing the variable
to be used across different translation units in a C program.
●​ The _answer is the name of the symbol (variable) being defined.

2. .zerofill __DATA,__common,_answer,4,2

This directive is more specific and gives detailed information on how the global variable answer
is laid out in memory.

●​ .zerofill: This tells the assembler to allocate a block of zero-initialized memory.


The memory allocated for answer will be initialized to zero when the program starts.
This is typical of uninitialized global variables in C (like int answer; without an explicit
initial value). The zero-initialization happens in the BSS section of a program, which
occupies no space in the executable file; the loader allocates and zeroes it at startup.
●​ __DATA: This is the name of the segment where the variable will be allocated. On
macOS and related platforms, global data (including variables) is stored in a __DATA
segment.
●​ __common: This specifies the section within the __DATA segment. The __common
section is where uninitialized global variables are stored. It's equivalent to the BSS
(Block Started by Symbol) section in other systems (like ELF on Linux). In simple
terms, it's a space reserved for uninitialized global variables.


●​ _answer: This is the name of the symbol being allocated in this section (the answer
variable).
●​ 4: This specifies the size (in bytes) of the allocation. Since answer is an int, and on
most systems an int is 4 bytes, this means answer is allocated 4 bytes of memory.
●​ 2: This is the alignment requirement, expressed as a power-of-two exponent: 2 means
2^2 = 4 bytes. So answer must be aligned on a 4-byte boundary, which is the natural
alignment for an int on most systems.

What it means:

●​ Global: answer is a global variable, accessible across different source files in the
program.
●​ Uninitialized: It is stored in the BSS (or __common) section and will be zero-initialized
when the program starts.
●​ Size: It takes up 4 bytes (consistent with an int).
●​ Alignment: It is aligned on a 4-byte boundary (the directive's alignment argument, 2, is a power-of-two exponent).

In summary, the code reserves space for a 4-byte global integer answer in the program’s data
segment, specifically in an area where zero-initialized variables are stored. When the program
starts, answer will be initialized to 0 by the system before the program runs.

What if you were producing code for RISC-V, how would the allocation of
variables be?
Data allocation in RISC-V is very similar to ARM. The figure below shows the key parts of an
(unoptimized) assembly file:


Let's go step by step and explain how variables are allocated in this version of the RISC-V
assembly code to help a computer science student understand it.

1. Stack Allocation Basics

In RISC-V, the stack is a region of memory used to store data that doesn't fit in registers or
must persist across function calls. The stack grows downwards (i.e., the stack pointer, sp, is
decremented when allocating space and incremented when releasing space).


In the modified assembly code, we use the stack to store local variables (m, n, and aux), as
well as saved values like the return address and frame pointer.

2. Allocating Stack Space

At the beginning of the gcd function, the following code reserves 28 bytes on the stack:

addi sp, sp, -28 # Allocate 28 bytes on the stack

The 28 bytes are used to store:

●​ 4 bytes for the return address (ra).
●​ 4 bytes for the frame pointer (s0).
●​ 4 bytes for m.
●​ 4 bytes for n.
●​ 4 bytes for aux (the result of m - n).

The five values above account for 20 bytes; the remaining two 4-byte slots (at offsets 0 and 4) are left unused as padding, which gives the total of 28 bytes (7 slots * 4 bytes each).

3. Saving Values on the Stack

Return Address and Frame Pointer:

Before doing anything with the function arguments, we save the return address (ra) and frame
pointer (s0) onto the stack, so we can restore them later when the function returns:

sw ra, 24(sp) # Save return address at the top of the stack (offset 24)
sw s0, 20(sp) # Save frame pointer just below it (offset 20)

●​ The return address (ra) is stored at an offset of 24 bytes from the current stack pointer
(sp).
●​ The frame pointer (s0) is stored at an offset of 20 bytes.

Function Parameters m and n:

In the original program, m and n are passed as function arguments. In RISC-V, these arguments
are passed in registers a0 and a1. But since we want to store them on the stack, we do:

sw a0, 16(sp) # Store m (passed in a0) at offset 16 on the stack
sw a1, 12(sp) # Store n (passed in a1) at offset 12 on the stack

This stores:

●​ m at offset 16 bytes from sp.


●​ n at offset 12 bytes from sp.

At this point, the stack looks like this:

+----------------------+ <- Address sp + 28 (original sp)
| Return Address (ra)  | <- Address sp + 24
+----------------------+
| Frame Pointer (s0)   | <- Address sp + 20
+----------------------+
| m (from a0)          | <- Address sp + 16
+----------------------+
| n (from a1)          | <- Address sp + 12
+----------------------+
| aux                  | <- Address sp + 8 (space for aux)
+----------------------+

4. Accessing Variables from the Stack

When we need to perform operations with m and n, we first load them back from the stack into
registers for computation:

lw t0, 16(sp) # Load m from stack into register t0
lw t1, 12(sp) # Load n from stack into register t1

Now t0 holds the value of m, and t1 holds the value of n, allowing us to perform comparisons
or calculations.

Handling the aux Variable:

In the program, aux is computed as m - n. After calculating aux, we store it in the stack:

sub t2, t0, t1 # Compute aux = m - n, store result in t2
sw t2, 8(sp) # Save aux to stack at offset 8

Here, t2 holds the value of aux, and it is stored at offset 8 from the stack pointer.

5. Recursive Calls

When making a recursive call, we need to pass updated values of m and n. Instead of storing
these values in registers, we continue using the stack to keep them:

lw t0, 12(sp) # Load n (old n) into t0
lw t1, 8(sp) # Load aux into t1 (new n for the recursive call)
sw t0, 16(sp) # Store the new m (old n) back on the stack at offset 16
sw t1, 12(sp) # Store the new n (aux) back on the stack at offset 12


This way, we keep the values for the recursive call in the stack, so the program's memory stays
well-managed.

6. Restoring Values and Returning

After the recursive call completes, we must restore the saved ra and s0 values from the stack,
so the program can return to the correct location:

lw ra, 24(sp) # Restore return address from stack
lw s0, 20(sp) # Restore frame pointer from stack

Finally, we deallocate the stack space by restoring the stack pointer to its original position:

addi sp, sp, 28 # Restore stack pointer (deallocate 28 bytes)

Summary:

●​ Global variables like answer are stored in memory, and we access them via memory
instructions like lui and sw.
●​ Function arguments (m and n) are passed in registers but saved to the stack at specific
offsets (16(sp) and 12(sp)).
●​ Local variables like aux are stored on the stack (8(sp)).
●​ The stack is used to store data that needs to persist across recursive calls or when
registers are limited.

In the RISC-V version of the code, the stack is used to manage all variables (m, n, and aux),
demonstrating how data can be stored and accessed in a structured manner when registers are
not sufficient or when we need to save values across function calls. This technique is crucial in
managing memory in low-level systems programming.

So, we have heap, stack, static memory and such. How are these things
organized?
The memory layout of a program depends on several factors—including the architecture, the
operating system (OS), and even the compiler. Here's a typical memory layout for a running
program, dividing memory into its main sections: text (where the program code resides), data
(for global/static variables), heap (for dynamic memory allocation), and stack (for local variables
and function call management).


Here is a concise explanation of the role of each memory segment:

1.​ .text: The text segment stores the program's executable instructions. It is typically
read-only and executable to ensure security and prevent accidental modification.
2.​ .rodata: The read-only data segment contains immutable data, such as string literals
and constants. This section is non-writable to protect static data from accidental
changes.
3.​ .data: The data segment holds global and static variables that are initialized explicitly.
These variables persist throughout the program's execution and are writable.
4.​ .bss: The bss segment contains uninitialized global and static variables. These
variables are zero-initialized by default when the program starts.
5.​ Heap: The heap is used for dynamically allocated memory during runtime. Memory in
the heap is managed manually by the program, typically using functions like malloc and
free. It grows upward in memory.
6.​ Stack: The stack is used for local variables, function call frames, and return addresses.
It grows downward in memory and is managed automatically as functions are called and
return.
7.​ Shared Libraries: Shared libraries, such as dynamically linked functions, are loaded
into memory by the operating system. These reside at higher addresses to keep them
separate from the program's static and dynamic memory.


8.​ Kernel: In a typical user-space program, the kernel (operating system) resides at the top
of memory. It handles system-level operations such as managing hardware resources
and memory protection.

This memory layout is common for many systems, including those running Unix-like operating
systems. The actual layout may vary depending on the architecture and the operating system,
but this gives a conceptual understanding of how memory is organized.

Why is there a distinction between initialized and non-initialized data segments?

The distinction between the .data and .bss segments in an executable file is rooted in
efficiency, both in terms of memory usage and the size of the executable itself. Here's why the
distinction is important:

1. Memory Initialization

●​ .data Segment:
○​ Stores initialized global and static variables.
○​ These variables have explicitly defined initial values in the source code (e.g., int
x = 10;).
○​ The initial values need to be stored in the executable file so that they can be
loaded into memory at runtime.
●​ .bss Segment:
○​ Stores uninitialized global and static variables (or variables initialized to zero,
such as int y; or int z = 0;).
○​ Since these variables are zero-initialized by default, there’s no need to store their
values in the executable file. Instead, the operating system simply allocates
zero-filled memory at runtime.

2. Executable File Size

●​ By separating the .bss segment, the executable file avoids storing large blocks of
zeroes that would represent uninitialized data. Instead, the file only includes metadata
(e.g., the size of the .bss segment), which tells the operating system how much
memory to reserve and initialize to zero during program loading.
●​ This approach makes the executable file smaller and more efficient to store and transfer.

3. Runtime Efficiency

●​ At runtime, initializing a large block of memory to zero is typically a fast and
hardware-optimized operation. The operating system can allocate and clear memory for
the .bss in one step.


●​ The .data segment, by contrast, involves copying the predefined initial values from the
executable file into memory, which is more costly than zeroing memory.

4. Historical and Practical Reasons

●​ Historically, this separation was crucial for systems with limited storage and memory
resources. It minimized the size of the program stored on disk or tape while still ensuring
the program could be correctly initialized at runtime.
●​ Modern systems retain this convention because it still provides clarity and efficiency,
especially in systems programming and embedded contexts where executable size and
memory initialization cost are significant concerns.

By distinguishing between the .data and .bss segments, operating systems and compilers
optimize both the storage and the runtime behavior of programs.

Why are string literals placed in a read-only segment?


The rationale for making string literals read-only stems from both practical considerations and
language design principles:

1. Efficiency and Memory Optimization


Avoids Duplication: String literals are often placed in a read-only section of memory shared
across the program. If the same literal appears multiple times, many compilers store only one
instance, saving memory.​

char *str1 = "Hello";
char *str2 = "Hello"; // Both point to the same memory

●​ Reduces Memory Usage: By making string literals read-only, the system can store
them in low-cost, read-only memory regions instead of allocating writable memory for
each instance.

2. Prevents Accidental Modification

●​ String literals represent constant values that are generally not intended to be changed.
Making them immutable enforces this and helps avoid bugs caused by unintentional
modification of shared literals.

Example: Imagine modifying "Hello" globally by mistake:

char *str = "Hello";
str[0] = 'J'; // If writable, "Hello" would become "Jello" everywhere

3. Consistency with Other Constants


●​ String literals align with the immutability of other constants in C. Just as const int x
= 42; prevents changes to x, making string literals read-only enforces their constancy.

4. Memory Protection and Security

●​ Read-only sections of memory can be marked as non-writable by the operating system,
providing an additional layer of security. This prevents malicious or accidental overwrites
of string literals, which could lead to vulnerabilities.
●​ Example: Buffer overflows targeting writable literals could allow attackers to inject
arbitrary code.

5. Compatibility with Historical Design

●​ In the early days of C, string literals were not explicitly read-only, leading to bugs when
programmers unintentionally modified them. Later implementations standardized this
behavior to prevent such issues while maintaining backward compatibility for programs
that treated literals as char * pointers (though this is discouraged now).

6. Encourages Proper Coding Practices

●​ Immutable string literals encourage developers to adopt better practices, such as using
writable char[] arrays for modifiable strings and keeping literals immutable where they
are meant to be constants.

Summary

Making string literals read-only reflects a balance between efficiency, security, and clarity. It
aligns with the principle of treating literals as constants, prevents inadvertent bugs, and takes
advantage of modern memory management strategies to optimize performance and security.

So, this memory layout depends on what? The architecture? The OS? The
compiler?
Let's break down how each of these influences the layout:

1. Architecture (CPU/Hardware)

●​ Endianness (little-endian or big-endian) affects how multi-byte data (like integers) are
stored, but not the general layout.
●​ Address space size, such as 32-bit or 64-bit, dictates how much memory is available.
For example, a 32-bit system has a maximum addressable memory of 4 GB, while a
64-bit system can theoretically access much larger memory spaces (e.g., terabytes).


●​ Memory management hardware can affect the layout. Some architectures may have
specific memory segments reserved for hardware-level operations or certain kinds of
memory protection (e.g., ARM's memory models vs. x86).

2. Operating System (OS)

●​ The OS manages memory at a high level, including how the text, data, stack, and heap
segments are placed in virtual memory.
●​ Virtual memory and address space layout randomization (ASLR): Modern operating
systems often randomize the locations of the stack, heap, and other sections in memory
to increase security. This means that even if the general layout is the same, the actual
memory addresses can vary every time a program runs.
●​ Some OSes use separate address spaces for user programs and kernel code, while
others (especially embedded systems) may not have such strict separation.​
Examples of OS-specific factors:
○​ Unix/Linux-based systems: Follow the general layout I showed earlier but
include additional features like shared libraries loaded dynamically at runtime.
○​ Windows: Follows a similar layout but has its own memory management
mechanisms, such as how it handles dynamic link libraries (DLLs) and system
calls.

3. Compiler

●​ The compiler decides how local variables, global variables, and constants are stored in
memory, and it may align data in memory for performance reasons (e.g., padding to
ensure memory alignment for faster access).
●​ Optimization levels: Higher optimization settings can lead to more aggressive memory
usage, such as reducing stack space by reusing variables or inlining functions.
●​ Some compilers insert additional data for debugging (like stack canaries to detect
buffer overflows) or profiling, which may change the layout slightly.
●​ The calling convention (decided by the compiler or the OS/architecture) affects how
function parameters and return values are passed—whether they use registers or the
stack, and how the stack frame is managed during function calls.

Examples of Differences:

32-bit vs 64-bit Architectures:

●​ In 32-bit systems, the stack might be closer to the upper memory limit (close to 4GB),
while in 64-bit systems, there’s much more room, so the stack and heap might be placed
further apart.

Operating System Example:


●​ Windows: For 32-bit Windows programs, there is a special segment called the PE
header used for managing executable formats, which might not exist on Linux-based
systems.

Compiler Example:

●​ GCC vs Clang: Both are C compilers but might make different decisions about the order
in which data is laid out, how much padding is used, or where variables are placed.

Summary

●​ Architecture dictates the fundamental memory capabilities (address space size,
memory management hardware, etc.).
●​ Operating system manages memory at a higher level, influencing the layout of
user-space programs, whether virtual memory is used, and if features like ASLR or
memory protection are active.
●​ Compiler generates the specific machine code and data layouts that are executed on
the hardware, deciding how to organize the stack, heap, and global data.

Despite these variations, the general memory layout (Text, Data, Heap, Stack) remains similar,
but the specifics—like exact memory addresses, alignment, and optimizations—can differ based
on these factors.

Can you print addresses within the kernel space?


Accessing the kernel memory is not allowed in normal user programs for security reasons. To
interact with kernel space, you would need to write a kernel module or use system calls, but that
is outside the scope of regular C programs due to privilege restrictions. It is possible to know
where the kernel memory starts in a system, although the exact address can depend on the
architecture, operating system, and whether certain security features like Kernel Address
Space Layout Randomization (KASLR) are enabled.

General Rules for Kernel Memory Addresses

In most systems, the kernel is mapped to the upper portion of the virtual memory space, leaving
the lower part for user space. The distinction is enforced by the operating system, and
user-space programs are not allowed to access kernel memory directly.

Typical Address Ranges

●​ 32-bit systems: The kernel memory typically starts at 0xC0000000. The upper 1 GB of
memory (from 0xC0000000 to 0xFFFFFFFF) is reserved for the kernel, while the lower
3 GB (from 0x00000000 to 0xBFFFFFFF) is used by user-space programs.


●​ 64-bit systems: The kernel memory typically starts higher up, often above the
canonical address boundary of 0xFFFFFFFF80000000 or similar, depending on the
architecture.

Linux Systems

●​ In a 64-bit Linux system:
○​ User space: Memory ranges below 0x00007FFFFFFFFFFF.
○​ Kernel space: Memory ranges above 0xFFFF800000000000 (on many
systems).

KASLR

●​ With Kernel Address Space Layout Randomization (KASLR) enabled, the start of the
kernel memory will be randomized at boot time to a different location each time. This is
a security feature that makes it harder for attackers to predict kernel addresses and
exploit vulnerabilities.
○​ On systems with KASLR, you may not be able to predict or see the kernel
address range as a regular user.

How is the memory layout defined then? I mean, in a Linux-like operating system?
The memory layout of a program (or even an entire system) is defined by several factors, and
the linker script (commonly named a .ld file) plays a key role in specifying how the memory is
organized for a program.

1. What Defines the Memory Layout?

A typical system's memory layout is influenced by several layers, including:

●​ Processor architecture (e.g., x86, ARM, RISC-V): This defines how virtual and physical
memory are mapped and the range of addresses.
●​ Operating system (OS): The OS manages the division of memory between user-space
and kernel-space, including virtual memory management.
●​ Linker script (.ld file): This defines how a program's sections (like .text, .data,
.bss, etc.) are placed in memory when the program is compiled and linked.

2. Role of the Linker Script (.ld File)

The linker script is a file that the linker (like GNU ld) uses to control the memory layout of a
program. It is typically used in bare-metal or embedded systems where you need explicit control
over memory layout, but it can also be used in standard applications to customize the layout.


In the .ld file, you define how the different segments of your program (like the code, data, and
stack) are placed in memory. This is particularly important for low-level programming, such as
operating systems, embedded systems, and bootloaders, where precise control of memory is
needed.

3. Example Memory Layout in a .ld File

Here's an example of what a basic linker script (memory.ld) might look like:

/* Define the memory regions of the system */
MEMORY
{
    /* Define RAM and ROM (or FLASH) memory regions */
    /* Read-only memory for code */
    ROM (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
    /* Read-write memory for stack, heap, data */
    RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

/* Define the entry point of the program (usually the reset handler) */
ENTRY(_start)

SECTIONS
{
    /* .text section: Code goes into the ROM region (executable, read-only) */
    .text : {
        *(.text)     /* Place all .text sections (code) here */
        *(.rodata)   /* Place read-only data here (like constants) */
        _etext = .;  /* Mark the end of the .text section */
    } > ROM

    /* .data section: Initialized data goes into RAM (read-write) */
    .data : {
        _sdata = .;  /* Start of initialized data */
        *(.data)     /* All .data sections */
        _edata = .;  /* End of initialized data */
    } > RAM AT > ROM /* Loaded from ROM at runtime but placed in RAM */

    /* .bss section: Uninitialized data (zeroed at startup) goes into RAM */
    .bss : {
        _sbss = .;   /* Start of uninitialized data */
        *(.bss)      /* All .bss sections */
        *(COMMON)    /* Common symbols */
        _ebss = .;   /* End of uninitialized data */
    } > RAM

    /* Stack setup: Place the stack at the end of RAM */
    _estack = ORIGIN(RAM) + LENGTH(RAM); /* Top of stack (end of RAM) */
    .stack : {
        _sstack = .;       /* Start of stack */
        . = _sstack + 4K;  /* Allocate 4KB for stack */
        _estack = .;       /* End of stack */
    } > RAM

    /* Heap setup: Place the heap below the stack */
    _sheap = _ebss;        /* Heap starts after the .bss section */
    .heap : {
        _sheap = .;        /* Start of heap */
        . = _sheap + 16K;  /* Allocate 16KB for heap */
        _eheap = .;        /* End of heap */
    } > RAM

    /* Provide symbols for C runtime initialization */
    _sidata = LOADADDR(.data); /* Initial data location in ROM */
    _sdata = ADDR(.data);      /* Address of .data in RAM */
}

Explanation of Key Elements

1.​ MEMORY Block
○​ This block defines the memory regions available to the program.
○​ ROM: This is the region where the program code (.text section) and read-only
data are placed. In embedded systems, this is often FLASH or some non-volatile
memory.
○​ RAM: This is where the stack, heap, and initialized/uninitialized variables reside.
It's defined as read-write-execute (rwx), which means it can be used for
dynamic data.
2.​ SECTIONS Block
○​ This block controls how different sections of the program are placed in memory.
○​ .text Section: This is where the program’s code (e.g., functions) and read-only
data (e.g., constants) go. The section is placed in ROM (executable and
read-only). The symbol _etext marks the end of the section.
○​ .data Section: This section holds initialized variables (global or static). These
variables are initially stored in ROM, but at runtime, they are copied to RAM. The
AT > ROM directive ensures that the initial values are loaded from ROM.
○​ .bss Section: This section contains uninitialized variables that are automatically
zeroed at program startup. It's placed in RAM.
3.​ Stack Setup
○​ The stack is placed at the top of the RAM (the highest addresses), which is
typical in many systems.
○​ In this example, the script allocates 4KB for the stack (_sstack to _estack).
○​ The stack grows downward in memory, which means the top of the stack is at
the end of RAM, and it expands into lower addresses as needed.
4.​ Heap Setup
○​ The heap is placed below the stack in RAM. It starts right after the .bss section,
which contains the uninitialized data.
○​ The example script allocates 16KB for the heap (_sheap to _eheap).
○​ The heap grows upward in memory, so it expands towards the stack as the
program dynamically allocates memory.


5.​ Symbols for Runtime Initialization
○​ _sidata: Points to the start of initialized data in ROM (used to copy data to
RAM at startup).
○​ _sdata: Points to the start of the .data section in RAM (the runtime location of
initialized data).
○​ These symbols help the C runtime (or startup code) to set up the memory for the
program, particularly initializing the .data section and zeroing the .bss section.

Additional Notes:

●​ The stack and heap are placed at opposite ends of the RAM. This setup allows them to
grow towards each other, ensuring that unused space in the middle of RAM is available
for both stack and heap expansion.
●​ The .text section is marked read-only in ROM, and the .data and .bss sections are
placed in RAM to allow modification at runtime.
●​ The AT > ROM directive ensures that the initialized data stored in ROM is copied to
RAM when the program starts.

Customizations You Can Make

●​ Stack size: You can adjust the stack size by changing the size in the .stack section
(e.g., 4K).
●​ Heap size: You can adjust the heap size similarly in the .heap section (e.g., 16K).
●​ Memory regions: You can define other memory regions as needed for different
platforms (e.g., for embedded systems with different types of RAM or ROM).

Final Thoughts

A linker script allows you to explicitly control how memory is allocated in your program. It's
necessary for systems where memory is limited or needs to be managed carefully, such as in
embedded systems or bare-metal programming. In high-level applications, the toolchain
usually manages these details automatically, but understanding the linker script gives you more
control over your system's memory layout.

How can I see the .ld file in my Linux distro, assuming I am using Ubuntu
or something like that?
Ubuntu (and most Linux distributions) use linker scripts (.ld files), but these are typically
provided by the system's linker (usually ld from the GNU Binutils) and the C library (e.g.,
glibc or musl), and they aren't usually modified directly in everyday use. Instead, they're
embedded in the toolchain or in the standard libraries.

To locate linker scripts on Ubuntu, you can follow these steps:


1. Find the Linker Script for Your Toolchain (GCC/Clang)

The default linker script is usually provided by ld. To see which script is being used by default,
you can pass the --verbose flag to the ld linker:

ld --verbose

This will output a lot of information, including the default memory layout and the location of the
system's default linker script. Look for lines similar to:

using internal linker script:
==============================
/* Script content here */

This will print the default linker script, which is embedded within the toolchain, but it’s not an .ld
file you typically edit directly. However, this shows the default configuration that the linker uses.

2. Modifying or Creating Your Own Linker Script

If you want to modify or create your own custom linker script, you can create a .ld file in your
project directory, and then tell the linker to use it by passing the -T option when compiling:

gcc -o myprogram myprogram.o -T mylinkerscript.ld

This will tell GCC (or Clang) to use your custom .ld file.

Summary

●​ You can see the default linker script by running ld --verbose.
●​ You can search for .ld files using find / -name "*.ld".
●​ Common locations include /usr/lib/gcc/, /usr/lib/ldscripts/, or project
directories.
●​ You can create your own linker script and tell the linker to use it with the -T option.

How does the compiler know which variables are global and local?
The compiler distinguishes between global and local variables based on their scope and
placement within the source code, following the rules of the programming language (like C,
C++, etc.). Here's how it works:

1. Global Variables:


●​ Declared Outside Any Function: Global variables are declared outside of all functions,
usually at the top of the file, and have a file-wide or program-wide scope.
●​ Storage Class: Global variables are stored in the data segment (for initialized global
variables) or the BSS segment (for uninitialized ones) of the memory. These segments
are managed by the operating system and persist for the lifetime of the program.
●​ Linker Role: The linker is responsible for managing global variables across different
source files if they are declared as extern or have external linkage.
●​ Visibility: Global variables are visible throughout the entire file (or the entire program, if
extern) unless explicitly marked as static, which restricts their visibility to the current
file.

Example of Global Variable:

int global_var = 10; // Global variable

void someFunction() {
global_var = 20; // Can access global variable here
}

2. Local Variables:

●​ Declared Inside a Function: Local variables are declared inside a function or a block
(like an if or for block) and have a function-level or block-level scope.
●​ Storage Class: Local variables are typically stored on the stack, and their lifetime is
restricted to the execution of the function or block in which they are declared. When the
function is called, space is allocated on the stack, and when the function returns, the
space is deallocated.
●​ Compiler Role: The compiler tracks local variables within the function's symbol table. It
assigns them stack locations (based on the architecture) and ensures they're valid only
within the function or block.

Example of Local Variable:

void someFunction() {
int local_var = 5; // Local variable
// Only accessible inside this function
}

How the Compiler Differentiates:

1.​ Parsing the Source Code:
○​ During the parsing phase, the compiler reads the source code and identifies the
scope of each variable.


○​ If a variable is declared outside any function or block, the compiler flags it as a
global variable.
○​ If a variable is declared inside a function or block, the compiler flags it as a local
variable.
2.​ Symbol Table:
○​ The compiler maintains a symbol table, which holds information about variables,
their types, scopes, memory locations, and more.
○​ Global variables are entered into a global symbol table, while local variables
are placed in a function-level symbol table.
3.​ Variable Lifetime:
○​ Global variables have static storage duration, meaning they exist for the entire
life of the program.
○​ Local variables have automatic storage duration, meaning they exist only
during the function's execution.
4.​ Variable Addressing:
○​ Global variables are assigned fixed memory addresses by the linker, often in
the data or BSS segment.
○​ Local variables are allocated space dynamically on the stack, based on the
stack pointer and frame pointer.
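
The scope machinery sketched in items 2 to 4 can be made concrete with a chained symbol table: each inner scope links to its parent, and lookups walk outward until the name is found. The sketch below is illustrative; the class and method names are our own choices, not those of a specific compiler:

```python
# Sketch: a chained symbol table distinguishing global and local scopes.
class SymbolTable:
    def __init__(self, parent=None):
        self.symbols = {}    # names declared in this scope
        self.parent = parent # enclosing scope (None for the global scope)

    def declare(self, name, info):
        self.symbols[name] = info

    def lookup(self, name):
        # Walk outward through enclosing scopes until the name is found.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise KeyError(name)

    def is_global(self, name):
        # A name is global if its defining scope is the outermost one.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.parent is None
            scope = scope.parent
        raise KeyError(name)

globals_ = SymbolTable()
globals_.declare('global_var', {'type': 'int'})
locals_ = SymbolTable(parent=globals_)
locals_.declare('local_var', {'type': 'int'})
print(locals_.is_global('global_var'), locals_.is_global('local_var'))  # True False
```

Because the chain mirrors the nesting of scopes, the same lookup also implements shadowing: a name declared in an inner scope is found before the outer declaration.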

Can you write a visitor that collects all the global variables in our SML-like
language (assuming all the variables have different names)?
The program below implements a visitor pattern to classify global variables in a functional
language with constructs like Let, Fn (functions), App (function applications), and Var
(variables).

from abc import ABC, abstractmethod

class Expression(ABC):
    @abstractmethod
    def accept(self, visitor, arg):
        raise NotImplementedError

class Var(Expression):
    def __init__(self, identifier):
        self.identifier = identifier
    def accept(self, visitor, arg):
        return visitor.visit_var(self, arg)

class Num(Expression):
    def __init__(self, num):
        self.num = num
    def accept(self, visitor, arg):
        return visitor.visit_num(self, arg)

class Let(Expression):
    def __init__(self, identifier, exp_def, exp_body):
        self.identifier = identifier
        self.exp_def = exp_def
        self.exp_body = exp_body
    def accept(self, visitor, arg):
        return visitor.visit_let(self, arg)

class Fn(Expression):
    def __init__(self, formal, body):
        self.formal = formal
        self.body = body
    def accept(self, visitor, arg):
        return visitor.visit_fn(self, arg)

class App(Expression):
    def __init__(self, function, actual):
        self.function = function
        self.actual = actual
    def accept(self, visitor, arg):
        return visitor.visit_app(self, arg)

class Visitor(ABC):
    @abstractmethod
    def visit_var(self, exp, arg):
        pass

    @abstractmethod
    def visit_num(self, exp, arg):
        pass

    @abstractmethod
    def visit_let(self, exp, arg):
        pass

    @abstractmethod
    def visit_fn(self, exp, arg):
        pass

    @abstractmethod
    def visit_app(self, exp, arg):
        pass

class VisitorLocClassifier(Visitor):
    def visit_var(self, exp, arg):
        """
        >>> e0 = Var('v0')
        >>> e0.accept(VisitorLocClassifier(), None)
        set()
        """
        return set()

    def visit_num(self, exp, arg):
        """
        >>> e0 = Num(0)
        >>> e0.accept(VisitorLocClassifier(), None)
        set()
        """
        return set()

    def visit_let(self, exp, arg):
        """
        >>> e0 = Let('v', Num(0), Var('v'))
        >>> e0.accept(VisitorLocClassifier(), None)
        {'v'}

        >>> e0 = Fn('w', Var('w'))
        >>> e1 = Let('v', e0, e0)
        >>> e1.accept(VisitorLocClassifier(), None)
        {'v'}
        """
        globs = exp.exp_def.accept(self, arg)
        return {exp.identifier} | globs | exp.exp_body.accept(self, arg)

    def visit_fn(self, exp, arg):
        """
        >>> e0 = Fn('v0', Var('v0'))
        >>> e0.accept(VisitorLocClassifier(), None)
        set()
        """
        return set()

    def visit_app(self, exp, arg):
        """
        >>> e0 = Fn('w', Var('w'))
        >>> e1 = App(e0, Var('x'))
        >>> e1.accept(VisitorLocClassifier(), None)
        set()

        >>> e0 = Let('v', Fn('w', Var('w')), Var('v'))
        >>> e1 = App(e0, Num(0))
        >>> e1.accept(VisitorLocClassifier(), None)
        {'v'}

        >>> e0 = Let('v', Fn('w', Var('w')), Var('v'))
        >>> e1 = Let('x', Num(0), Var('x'))
        >>> e2 = App(e0, e1)
        >>> sorted(e2.accept(VisitorLocClassifier(), None))
        ['v', 'x']
        """
        globs = exp.function.accept(self, arg)
        return globs | exp.actual.accept(self, arg)

In our language, variables declared outside of functions are considered global variables, while
those declared inside functions are local variables. The goal of this visitor is to collect the set
of variables that are global by traversing the structure of the program (which is represented as
an Abstract Syntax Tree, or AST).

Key Concepts

1.​ Expressions: Your program is built out of expressions like:
○​ Var(identifier): Refers to a variable, e.g., x.
○​ Num(num): Represents a number, e.g., 42.
○​ Let(identifier, exp_def, exp_body): Defines a variable identifier
with value exp_def and uses it in exp_body.
○​ Fn(formal, body): Defines an anonymous function with parameter formal
and body body.
○​ App(function, actual): Represents the application of function to the
argument actual.
2.​ Global Variables: In this visitor, any variable defined in a Let that is not inside a Fn is
treated as global. Function parameters are local to the function, and they are not
considered global.

The Visitor’s Logic

The VisitorLocClassifier class implements the logic to walk through each part of the
program and collect global variables.

1.​ visit_var(self, exp, arg): For a Var, it does nothing because the classification
only cares about definitions of variables, not where they are used. So it returns an
empty set (set()).
2.​ visit_num(self, exp, arg): For a Num, since it's just a number, there are no
variables to classify. Like visit_var, it returns an empty set.
3.​ visit_let(self, exp, arg): This is the key function for classifying global
variables.
○​ A Let introduces a new variable identifier in the global scope.
○​ The visitor traverses both exp_def (the definition) and exp_body (the body
where the variable is used).
○​ The identifier introduced by the Let is added to the set of global variables.


4.​ visit_fn(self, exp, arg): A Fn (function) defines a local variable formal (the
function parameter). No variables in the function body are considered global, so this
function returns an empty set.
5.​ visit_app(self, exp, arg): For an App, the visitor checks the function being
applied and the actual argument. It collects any global variables found in these
expressions.

How It Classifies Globals

●​ When the visitor encounters a Let, it assumes that the variable is global, unless it is
inside a function. Thus, variables defined in the top-level Let expressions are global.
●​ Function parameters and variables inside a function body are local and not included in
the set of global variables.

Example Breakdown

Let’s walk through an example:

e0 = Let('v', Num(0), Var('v'))
globs = e0.accept(VisitorLocClassifier(), None)

●​ This defines v as 0 and then uses v in the body.
●​ The Let introduces the variable v, which is global since it's not inside a function.
●​ The result will be {'v'} because v is globally defined.

Conclusion

This visitor traverses the program’s structure, collects all variables that are declared in Let
expressions (and thus global), and ignores function parameters and locally scoped variables
within functions. It ensures that global variables are recognized correctly and doesn’t classify
local variables by mistake.
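
To see the visitor working end to end, here is a compact, self-contained restatement of the classes above (same logic, condensed), run on a program where a Let occurs inside a function body. The function-local binding u is excluded, while the top-level v is reported:

```python
# Compact restatement of the AST and classifier defined above.
class Var:
    def __init__(self, identifier): self.identifier = identifier
    def accept(self, v, a): return v.visit_var(self, a)

class Num:
    def __init__(self, num): self.num = num
    def accept(self, v, a): return v.visit_num(self, a)

class Let:
    def __init__(self, identifier, exp_def, exp_body):
        self.identifier, self.exp_def, self.exp_body = identifier, exp_def, exp_body
    def accept(self, v, a): return v.visit_let(self, a)

class Fn:
    def __init__(self, formal, body): self.formal, self.body = formal, body
    def accept(self, v, a): return v.visit_fn(self, a)

class App:
    def __init__(self, function, actual): self.function, self.actual = function, actual
    def accept(self, v, a): return v.visit_app(self, a)

class VisitorLocClassifier:
    def visit_var(self, exp, arg): return set()
    def visit_num(self, exp, arg): return set()
    def visit_fn(self, exp, arg): return set()  # function bodies hold only locals
    def visit_let(self, exp, arg):
        return ({exp.identifier} | exp.exp_def.accept(self, arg)
                | exp.exp_body.accept(self, arg))
    def visit_app(self, exp, arg):
        return exp.function.accept(self, arg) | exp.actual.accept(self, arg)

# let v = 0 in (fn w => let u = 1 in w) v
prog = Let('v', Num(0), App(Fn('w', Let('u', Num(1), Var('w'))), Var('v')))
print(prog.accept(VisitorLocClassifier(), None))  # {'v'}
```

Note how visit_fn cuts the traversal short: the Let binding u never contributes to the result, exactly because it sits inside a function.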

Why is the assumption that variables have different names necessary?


The reason your code requires all variables to have different names is because of how it
accumulates global variables in a set without distinguishing between different scopes. Let me
explain this in detail.

How the Code Collects Variables

1.​ The visitor accumulates variable names in a set, which automatically handles duplicates.
For example, if the same variable name appears multiple times, it will only be stored
once in the set.


2.​ However, the problem arises when you have shadowing—that is, the same variable
name is reused in different scopes. In your language, a variable declared in an inner
scope (such as within a function or a nested Let expression) could have the same name
as a variable declared in an outer scope. When this happens:
○​ The inner variable shadows the outer one, meaning the inner variable is the only
one visible within the inner scope.
○​ The outer variable is inaccessible while the inner one is in scope.

Example of Shadowing

Here’s an example where variables have the same name but different scopes:

let
  val x = 1 (* Outer x: global *)
in
  let
    val x = 2 (* Inner x: local to this scope, shadows outer x *)
  in
    x (* Refers to the inner x *)
  end
end

●​ The outer x = 1 is global.
●​ The inner x = 2 is local to the inner scope and shadows the outer x.
●​ When the visitor encounters both definitions of x, it doesn’t distinguish between the inner
and outer x. As a result, it may incorrectly classify both x variables as global.

Why Unique Names Are Assumed

The current approach doesn't account for scope properly. It simply adds every variable name
it encounters in a Let expression to the set of global variables, without checking if the variable
is shadowing an existing one.

●​ If two variables have the same name but are in different scopes, the visitor will treat
them as the same variable because they share the same name.
●​ In reality, the inner variable should be treated as local, while the outer variable remains
global. But since the visitor only tracks names, it cannot make this distinction.

Consequences of Non-Unique Names

If you have variables with the same name in different scopes, the visitor might:

1.​ Incorrectly classify local variables as global, because it cannot tell when a name is
local (shadowing an outer variable).


2.​ Lose track of the original global variable if it is shadowed by a local one, since both
will be merged into a single entry in the set.

How to Handle This Problem: Scope Awareness

To handle shadowing and scope properly, you would need to:

●​ Track scopes explicitly in your visitor. When entering a new scope (like in a function or
nested Let), you would need to:
1.​ Keep track of the currently visible variables.
2.​ Distinguish between variables declared in the current scope and those declared
in outer scopes.

A typical way to do this is by using a symbol table or some form of environment that keeps
track of variables and their scopes as the visitor traverses the program. Another approach is
simply to rename all the variables, to ensure that all the definitions have a different name. That's
relatively easy to implement in a language like our dialect of SML.

Summary

●​ The assumption that all variable names are unique is necessary because the visitor
doesn’t distinguish between variables in different scopes.
●​ If variable names are reused (shadowing), the visitor will treat them as the same
variable, which can lead to misclassification.
●​ To handle this more robustly, you would need to add scope-awareness to your visitor,
which can be done using a mechanism like a symbol table.
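
One way to realize the renaming idea above is a recursive pass that threads an environment mapping binders to fresh names. To keep the sketch short, it encodes the AST as plain tuples rather than the classes used earlier; the tuple tags and the fresh-name scheme are our own choices, not the book's implementation:

```python
import itertools

# AST as tuples: ('var', x), ('num', n), ('let', x, d, b), ('fn', x, b), ('app', f, a)
def rename(e, env, counter):
    """Rewrite 'e' so that every binder introduces a unique name.
    'env' maps source names to their current fresh name."""
    kind = e[0]
    if kind == 'var':
        return ('var', env.get(e[1], e[1]))
    if kind == 'num':
        return e
    if kind == 'let':
        _, x, d, b = e
        fresh = f"{x}_{next(counter)}"
        # The definition is renamed in the *outer* environment: in this
        # language, 'let x = d in b' does not make x visible inside d.
        return ('let', fresh, rename(d, env, counter),
                rename(b, {**env, x: fresh}, counter))
    if kind == 'fn':
        _, x, b = e
        fresh = f"{x}_{next(counter)}"
        return ('fn', fresh, rename(b, {**env, x: fresh}, counter))
    if kind == 'app':
        return ('app', rename(e[1], env, counter), rename(e[2], env, counter))

# let x = 1 in (let x = 2 in x): the two x's receive distinct names,
# so a name-based classifier can no longer confuse them.
prog = ('let', 'x', ('num', 1), ('let', 'x', ('num', 2), ('var', 'x')))
print(rename(prog, {}, itertools.count()))
# ('let', 'x_0', ('num', 1), ('let', 'x_1', ('num', 2), ('var', 'x_1')))
```

After this pass, the simple set-based visitor becomes safe, because shadowed definitions no longer share a name.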

Ok, but how can we generate code for the variables? Something that we
will need is to map these variables to memory addresses, right?
Right! And we can do that by adding this capability to our code generator, e.g.:

class Program:
    ...
    def get_address(self, name):
        ...
    def set_address(self, name, addr):
        ...

The get_address and set_address methods will be necessary for managing the memory
locations of variables during code generation. Let's break down their roles:

1. set_address(name, addr)


This method is responsible for associating a variable's name (name) with a memory address
(addr). When generating code for expressions that introduce new variables (like in
visit_let), you need to allocate space for the variable, and this method stores the address
where that variable is stored (e.g., on the stack or in a static region).

Why is it necessary?

●​ Variable Storage: As your program generates code, variables need to be assigned
specific addresses in memory (such as stack offsets for local variables or fixed
addresses for global/static variables). set_address records where each variable is
stored so that other parts of the code can access this information.
●​ Scope Management: Each time a new variable is introduced (e.g., in visit_let), it
gets a new memory address. Without this method, you wouldn't have a way to track
which variable is stored where, making it impossible to generate correct code for variable
access.

2. get_address(name)

This method retrieves the memory address associated with a given variable name (name).
When generating code to access a variable (e.g., in visit_var), you need to know the
memory location of that variable, and get_address provides that information.

Why is it necessary?

●​ Variable Access: To load or store a value in a variable, you need to know its location in
memory. get_address looks up the address associated with the variable and ensures
the code can access the correct location when referring to the variable.
●​ Correct Code Generation: If you don't know where a variable is stored, you can't
generate the right instructions to manipulate it (e.g., reading it from or writing it to the
stack). This method ensures the code generator knows exactly where each variable
lives.

Example Workflow:

When you encounter a Let expression (e.g., let x = ...), visit_let allocates memory for
x, computes its address, and calls set_address("x", addr). Later, when x is referenced
(e.g., in a Var expression), visit_var calls get_address("x") to fetch its memory
address, which is used to generate the load/store instruction.

Without these methods, there would be no way to track where variables are stored, which would
make it impossible to generate correct assembly instructions for variable access.

Summary:


●​ set_address: Records where a variable is stored in memory.


●​ get_address: Retrieves the stored memory location for a variable when needed.

These methods are essential for managing the mapping between variable names and their
corresponding memory addresses during code generation.
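
A minimal sketch of how Program could keep this mapping, assuming static variables receive consecutive word-sized slots from an arbitrary base address. The base 0x10000000 and the helper get_next_static_addr mirror the code generator shown later in this class, but the details here are illustrative:

```python
class Program:
    """Sketch of address bookkeeping during code generation.
    Variable names are assumed to be unique, as discussed above."""
    def __init__(self, static_base=0x10000000):
        self.addresses = {}             # variable name -> memory address
        self.next_static = static_base  # next free static slot

    def set_address(self, name, addr):
        self.addresses[name] = addr

    def get_address(self, name):
        return self.addresses[name]

    def get_next_static_addr(self, name):
        # Reserve one 32-bit word for the variable and remember its address.
        addr = self.next_static
        self.next_static += 4
        self.set_address(name, addr)
        return addr

p = Program()
a0 = p.get_next_static_addr("x")   # 0x10000000
a1 = p.get_next_static_addr("y")   # 0x10000004
print(hex(p.get_address("x")), hex(p.get_address("y")))
```

The dictionary plays the role of the symbol table's address column: set_address fills it in at definition sites, and get_address reads it back at use sites.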

And how is the address of each variable determined?


Assuming that we already know whether a variable is local or global, the responsibility of
determining where each variable is located (i.e., where in memory it is stored) falls to methods
like visit_let. These methods handle the introduction of new variables and decide whether
those variables are local (stored in virtual registers) or static (with a fixed location, such as in
global memory).

Let's break down their roles in determining memory locations:

1. visit_let (for let bindings):

When a new variable is introduced via a let expression, visit_let is responsible for:

●​ Checking if the variable is local or static: This is done using the
prog.is_static_variable(exp.identifier) method. If it's static, the variable is
stored in a static memory region, but if it's local, it's stored on the stack.
●​ Allocating space: If the variable is local, visit_let adjusts the stack pointer (e.g.,
addi sp, sp, -4) to allocate space for the variable on the stack. If it's static, it
doesn't need to adjust the stack, but it will associate the variable with a static memory
address.
●​ Setting the address: After deciding where the variable is stored, visit_let calls
prog.set_address(exp.identifier, addr) to store the memory location of the
variable. The addr is the stack offset for local variables or a static address for global
variables.

The Workflow:

1.​ Let Expression:
○​ Static variable: If a variable in a let expression is static, visit_let
associates it with a fixed address in memory (e.g., a global memory address). It
then calls prog.set_address with this address.
○​ Local variable: If it's a local variable, visit_let allocates space on the stack
by adjusting the stack pointer and assigns a stack offset to the variable. It stores
this offset using prog.set_address.
2.​ Variable Access:


○​ When the variable is later referenced in the code (via Var nodes), visit_var
retrieves its memory address using prog.get_address and generates the
appropriate load or store instruction to access it.

Why is this necessary?

Variables in programs need to be mapped to specific memory locations so that their values can
be stored and retrieved. The stack is typically used for local variables (in let bindings and
function parameters), while global/static variables are assigned fixed memory locations.
visit_let is the place where this decision-making happens.

In summary:

●​ visit_let handles allocating memory for let-bound variables and assigns them a
stack or static memory address. It uses prog.set_address to store this information so
that later access (in visit_var) can retrieve the correct memory location using
prog.get_address.

Can you recap what the code generated for a function looks like?
Sure. The figure below shows an example already seen in the last class:

In this example:

●​ The function takes a single argument x, which is passed in a0.
●​ The jump instructions ensure the function is skipped until it’s called explicitly, and the
stack is correctly restored after the function finishes execution.


But, why are we only saving the return address on the stack? What about
the local variables and the parameters that we might have in the function
body?
The reason you're only saving one slot on the stack (for the return address) during function code
generation is that the local variables aren't directly allocated space on the stack at this
stage. Instead, they are stored in what you called "virtual registers." These virtual registers
represent placeholders for values that will later be mapped to either physical registers or stack
slots during a phase called register allocation.

Here's a more detailed breakdown:

Virtual Registers

In the code generation phase, compilers typically don't immediately assign physical registers or
stack locations to variables. Instead, variables are represented as virtual registers, which
provide a flexible abstraction for storing intermediate values without worrying about physical
resources like registers or memory. Virtual registers are unlimited in number, unlike physical
registers, which are limited by the hardware.

Register Allocation

Later in the compilation process, during register allocation, the compiler analyzes the virtual
registers and decides where to store each value:

1.​ Physical Registers: If there are enough physical registers available on the machine, the
values in virtual registers can be directly mapped to physical registers, which is faster for
computation.
2.​ Stack (spill slots): If there aren’t enough physical registers, some of the values will be
"spilled" to the stack. This means that the compiler reserves space on the stack to store
certain local variables or temporary values that cannot fit in the physical registers.

Why Saving the Return Address Then?

The return address must always be saved because, when the function finishes, it needs to jump
back to where it was called from. For the local variables and intermediate values, at this stage,
they are treated as virtual registers. Their exact location (whether they end up in physical
registers or the stack) will be determined during the register allocation phase.
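
To make this concrete, here is a sketch of the kind of instruction sequence a code generator could emit for a function in this scheme: only the return-address slot is reserved on the stack, while the body still refers to virtual registers. The helper name and the exact instruction spellings below are illustrative, not the book's implementation:

```python
# Sketch: wrap a function body in a prologue/epilogue that saves only 'ra'.
# Locals stay in virtual registers until the register-allocation phase.
def emit_function(body_insts):
    insts = []
    insts.append("addi sp, sp, -4")   # reserve one stack slot
    insts.append("sw   ra, 0(sp)")    # save the return address
    insts.extend(body_insts)          # body: locals live in virtual registers
    insts.append("lw   ra, 0(sp)")    # restore the return address
    insts.append("addi sp, sp, 4")    # release the slot
    insts.append("jalr x0, ra, 0")    # return to the caller
    return insts

for inst in emit_function(["addi a0, a0, 1"]):
    print(inst)
```

After register allocation, any virtual registers that could not be mapped to physical ones would add spill slots to this frame, growing the initial stack adjustment accordingly.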

And why can't we save return addresses in virtual registers?


The reason we save the return address on the stack, rather than using virtual registers, has
more to do with function calling conventions and the need to handle function calls and


returns safely, especially when considering recursive calls, function pointers, and variable
lifetimes, as we had seen in our last class. Here are the key reasons:

1. Function Calling Conventions

Most computer architectures (like RISC-V) follow a defined calling convention to ensure that
function calls work reliably, especially when multiple functions call each other. This convention
dictates:

●​ Where the function argument(s) should be placed (often in specific registers or on the
stack).
●​ Where the return address is saved (usually in a specific register, like ra in RISC-V).
●​ Where the return value should be placed (typically in a register like a0 in RISC-V).

The calling convention guarantees that a function can expect its argument in a known location,
and the caller can expect the return address and result to be in known locations. This is critical
because the function needs to know where to jump back to after it completes execution. Since
functions can call other functions (or themselves recursively), the return address needs to be
reliably saved and restored.

Since the calling convention needs to be adhered to regardless of the function’s implementation,
the return address and argument are typically placed in physical registers or the stack, not
virtual registers.

How would you generate code for the let block, assuming that some
variables are statically allocated?
Here’s the refined visit_let method:

def visit_let(self, exp, arg):
    # Generate code for the definition of the let variable (exp_def)
    d_name = exp.exp_def.accept(self, arg)

    # Determine if the variable is static or local
    if self.prog.is_static_variable(exp.identifier):
        # Static variable, assign a global address
        addr = self.prog.get_next_static_addr(exp.identifier)
        self.set_addr(exp.identifier, addr)
        # Store d_name at the static address
        self.prog.add_inst(AsmModule.Sw(d_name, 0, addr))
    else:
        # Local variable, store in virtual register
        # Move value into virtual register
        self.prog.add_inst(AsmModule.Addi(exp.identifier, d_name, 0))

    # Now generate code for the body of the let expression
    v_name = exp.exp_body.accept(self, arg)
    return v_name

Key Steps in visit_let

1.​ Code for exp_def: We generate code for the definition part of the let expression first,
and d_name holds the result (a virtual register) for that value.
2.​ Static vs. Local Variable:
○​ Static variables: We are checking if the variable should be allocated in static
memory (e.g., a global variable). This part is fine because for static variables, we
would store them in a fixed location in memory.
○​ Local variables: For local variables, we opt to use a virtual register. This is a
major simplification, as it avoids allocating space on the stack when the register
allocation phase can later decide the best placement for the variable.
3.​ Code for the body (exp_body): We generate code for the body of the let expression
after handling the binding, which is exactly the correct order.

But now we have two locations for variables: either they are in virtual
registers, or they are statically allocated. How can you produce code for
variables then?
Here is a new version of the visit_var method, which generates code for variables:

def visit_var(self, exp, arg):
    if self.prog.is_static_variable(exp.identifier):
        # If it's a static variable, load it from its
        # absolute address in memory
        addr = self.prog.get_address(exp.identifier)

        # Load the value into a fresh virtual register
        value_register = self.next_var_name()
        self.prog.add_inst(AsmModule.Lw(value_register, 0, addr))
        return value_register
    else:
        # Local variable, already in a virtual register,
        # just return its register name
        return exp.identifier

The code you’re looking at generates RISC-V assembly instructions for loading the value of a
variable. To understand it, let's break it down into two main cases: static variables and local
variables. Each is treated differently during code generation.


1. Static Variables

A static variable is a variable whose memory location is fixed at a specific address in memory,
meaning it’s not stored in a register but in global memory. To access it, you need to generate
instructions that load the value from the memory address.

2. Local Variables

A local variable, on the other hand, is stored in a virtual register, which means it’s already in a
temporary storage location within the CPU. No memory load is required. The only task is to use
the variable’s virtual register directly, without needing any additional instructions to fetch it.

For local variables, the code simply returns the name of the register where the variable is
stored, because its value is already available. No memory access is needed, which makes local
variable access faster than static variables.

How the Code Works

1.​ Check if the variable is static or local:​
The method first checks if the variable is static by calling is_static_variable. If it's
The method first checks if the variable is static by calling is_static_variable. If it's
static, the process of loading its value from memory begins. If not, it's a local variable,
and the method just returns its register.
2.​ For static variables:
○​ The address of the static variable is retrieved using get_address.
○​ The Lw instruction is generated to load the value from memory into a virtual
register.
○​ The method then returns the register containing the loaded value.
3.​ For local variables:​
If the variable is local, there’s no need to generate load instructions. The variable is
already in a register, so the method directly returns the register name.

Summary

This code handles variables in two ways: static variables require loading their value from
memory using load instructions (Lw), while local variables are accessed directly from a virtual
register. This differentiation allows the program to optimize access depending on how the
variable is stored.

How does the loader "know" the addresses of these static variables, the
address of the stack, the heap, etc?
This info is actually packaged into the executable object, in a representation called The
Executable and Linkable Format.


1. What Does an Executable Need?

To run a program, the operating system (OS) loader needs specific information about the
executable, including:

●​ Program entry point: The address where the program counter (PC) should start
execution (typically the start of the main function or initialization code).
●​ Memory layout:
○​ Where code (e.g., .text segment) should be loaded.
○​ Where initialized data (e.g., .data segment) and uninitialized data (e.g., .bss
segment) should be loaded.
○​ Where writable sections like the heap will be allocated.
●​ Dynamic linking information (if the executable depends on shared libraries, like
libc).
●​ Permissions: Whether certain memory regions are read-only, writable, or executable.

This information must be organized in a standard way so the OS can properly load the program
into memory.

2. Why Use a Specific Format Like ELF?

The ELF format is essentially a "protocol" that provides the rules and structure for packaging
this information into a binary file. When the OS loader encounters an ELF file, it knows exactly
where to look for specific details.

Key Features of ELF:

●​ ELF Header:
○​ Describes the type of file (executable, shared library, or object file).
○​ Specifies the target architecture (e.g., x86-64, ARM) and entry point address.
●​ Program Headers:
○​ Tell the loader how to map the executable's segments (e.g., .text, .data,
.bss) into memory, their sizes, and permissions.
○​ These are critical for runtime execution.
●​ Section Headers:
○​ Provide detailed information for the linker and debugger (e.g., symbolic names,
relocation entries), but are typically ignored by the loader.
●​ Dynamic Section:
○​ Contains information for the dynamic linker (e.g., library dependencies and
symbol resolution).

3. Role of the OS Loader

When you execute a program:


1.​ The OS loader reads the ELF header to verify it's an ELF file and determines what kind
of file it is.
2.​ It consults the program headers to map the segments into memory:
○​ .text into an executable region.
○​ .data into a writable region.
○​ .bss into a zero-initialized region.
3.​ It initializes the stack and heap:
○​ Stack starts at a predefined high address and grows downward.
○​ Heap starts just above the program's data and grows upward.
4.​ It sets the PC to the entry point specified in the ELF header.
5.​ If the program uses shared libraries, the loader invokes the dynamic linker (also
specified in the ELF file).

4. Why Not Just Use a Simple Binary?

Without a format like ELF, the loader wouldn't know how to:

●​ Locate and map the program’s code and data into memory.
●​ Handle initialization (e.g., zeroing the .bss segment).
●​ Link libraries or resolve symbols dynamically.
●​ Enforce memory permissions (e.g., executable code should not be writable).

ELF provides a standardized way to encode all this information, ensuring compatibility across
different systems and simplifying the design of OS loaders.
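
The fixed layout of the ELF header is what lets the loader find this information. As an illustration, the sketch below decodes the first fields of a 64-bit little-endian ELF file with Python's struct module; the field offsets follow the ELF specification, and error handling is omitted:

```python
import struct

def read_elf_header(path):
    """Decode the start of a 64-bit little-endian ELF header (sketch)."""
    with open(path, 'rb') as f:
        ident = f.read(16)                     # e_ident: magic, class, data, ...
        assert ident[:4] == b'\x7fELF', "not an ELF file"
        ei_class = {1: '32-bit', 2: '64-bit'}[ident[4]]
        # Next fields in the 64-bit layout: e_type, e_machine, e_version, e_entry
        e_type, e_machine, e_version, e_entry = struct.unpack('<HHIQ', f.read(16))
        kinds = {1: 'relocatable object', 2: 'executable', 3: 'shared object'}
        return {'class': ei_class,
                'type': kinds.get(e_type, e_type),
                'machine': e_machine,
                'entry': hex(e_entry)}

# Example (on a Linux system): read_elf_header('/bin/ls')
```

Note that on modern Linux distributions many "executables" report themselves as shared objects, because position-independent executables (PIE) use the ET_DYN file type.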

5. Analogy

Think of ELF as the blueprint or instruction manual for the OS loader:

●​ It tells the loader what to do with each part of the binary.
●​ Without this blueprint, the loader would have no way of knowing how to correctly set up
the program in memory.

This modularity and standardization make ELF (and similar formats like PE on Windows or
Mach-O on macOS) indispensable in modern computing.

Is it correct to say that the "ELF" representation is the last thing the
compiler produces?
Well, more or less. In Linux, the ELF (Executable and Linkable Format) is typically the last
representation that the compiler toolchain produces. However, let me break it down to
clarify the role of ELF and where it fits in the compilation process:

1. What Does the Compiler Produce?


●​ The compiler proper (e.g., gcc, clang) does not directly produce the final ELF
executable. Instead, it generates:
○​ Assembly Code: Textual representation of machine instructions.
○​ Object Files: Binary-encoded machine code for individual source files, typically
in ELF format.
●​ Object files are intermediate products, not yet complete executables.

2. Role of the Linker

●​ After the compiler generates object files, the linker (ld) takes over.
●​ The linker combines multiple object files, resolves external symbols (e.g., function calls
to libraries), and produces the final ELF executable file.
●​ This final ELF file includes all the necessary sections: code, data, symbol tables, and
metadata.

3. Why ELF Is the "Last Representation"

●​ Once the linker generates the ELF file, it is ready to be loaded and executed by the
operating system.
●​ At this stage:
○​ The executable includes instructions for how it should be loaded into memory
(e.g., .text, .data, and .bss sections).
○​ The OS loader reads this ELF file and maps the segments into memory for
execution.
●​ Hence, ELF is the final output of the compiler toolchain before execution.

4. Alternative Outputs

●​ While ELF is the default output format for Linux, other formats can also be used,
depending on the target system. For instance:
○​ PE (Portable Executable) is used in Windows.
○​ Mach-O is used in macOS.
○​ Flat binaries are possible (e.g., in embedded systems) but lack ELF-like
metadata.

Conclusion

Yes, in Linux, ELF is the final representation produced by the compiler toolchain, but it's
important to note that the full process involves multiple steps: the compiler generates object files
in ELF format, and the linker produces the final ELF executable. This standardized format
ensures compatibility with the Linux OS loader, making it a crucial part of the overall process.
Here's a figure illustrating the whole process:


How did this ELF representation come to be?


The ELF (Executable and Linkable Format) was invented by a group of developers at UNIX
System Laboratories (USL), specifically as part of the development of System V Release 4
(SVR4), a version of UNIX released in the late 1980s. The design was influenced by earlier
executable formats but aimed to address limitations and provide a more flexible and extensible
standard.

Key Context and Timeline

1.​ Before ELF:


○​ Early UNIX systems used simple executable formats like a.out, which became
increasingly inadequate as systems grew more complex.
○​ Other formats like COFF (Common Object File Format) were introduced in
UNIX System V Release 3, but COFF had limitations in supporting modern
features like shared libraries and dynamic linking.
2.​ The Need for ELF:
○​ System V Release 4 was a major effort to unify UNIX systems, merging features
from UNIX System V, BSD, and Xenix.
○​ A new executable format was needed to provide better support for shared
libraries, dynamic linking, and portability across architectures.
3.​ Introduction of ELF:
○​ ELF was introduced in 1989 as the new standard binary format for UNIX System
V Release 4.
○​ It was designed to be extensible, cross-platform, and suitable for both
executables and object files.

Key Contributors

●​ ELF was developed under the AT&T UNIX System Laboratories (USL), a division of
AT&T responsible for advancing UNIX.
●​ While specific individuals involved in ELF’s creation are not often named, it was part of a
collaborative effort involving leading engineers at USL and partners like Sun
Microsystems, which contributed to SVR4 development.


Adoption and Standardization

●​ ELF became widely adopted beyond UNIX System V:


○​ It was embraced by the Linux kernel in the mid-1990s, replacing a.out as the
default format.
○​ Other operating systems like BSDs, Solaris, and many embedded systems also
adopted ELF.
●​ The format was later standardized by the Tool Interface Standards (TIS) committee,
ensuring consistency and compatibility across systems.

Legacy

ELF’s robust design has made it the de facto standard for UNIX-like operating systems, with its
influence extending into modern computing. It is still in active use today, evolving to
accommodate new features like support for modern architectures and advanced linking
techniques.

Is it fair to say that the UNIX Operating System has then shaped the way
memory is organized today, even in non-UNIX systems?
Yes, it is absolutely fair to say that the development of UNIX has profoundly influenced the way
memory is organized in modern operating systems, including those that are not directly based
on UNIX, like Windows. Let's break this down:

1. Memory Organization Principles Originating from UNIX

UNIX introduced many foundational concepts in memory organization that are now ubiquitous,
including:

●​ Segmentation of Memory:​

○​ The division of executable memory into sections like .text (code), .data
(initialized variables), and .bss (uninitialized variables).
○​ This segmentation allows efficient memory management, separation of concerns,
and enforcement of permissions (e.g., read-only for .text).
●​ Process Address Space:​

○​ UNIX popularized the idea of a process having its own virtual address space,
divided into:
■​ Code Segment: For executable instructions.
■​ Data Segment: For global and static variables.
■​ Heap: For dynamically allocated memory.
■​ Stack: For function call frames and local variables.


●​ Dynamic Linking and Shared Libraries:​

○​ Concepts like ld.so and shared libraries were refined in UNIX, enabling
modular applications and efficient memory use through shared code.
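On Linux, this per-process layout can be inspected directly: the kernel exposes every mapped region of a process through the pseudo-file /proc/<pid>/maps. The sketch below is Linux-specific; the bracketed region names are kernel conventions:

```python
# Read this process's own memory map (Linux-specific pseudo-file).
with open("/proc/self/maps") as f:
    maps = f.read()

# Code and data appear as mappings of the executable and its libraries,
# while the heap and stack regions are tagged explicitly by the kernel.
for region in ("[heap]", "[stack]"):
    print(region, "mapped" if region in maps else "not mapped")
```

The full listing also shows the permissions (r, w, x) of each region, which is how the loader enforces rules like "executable code should not be writable."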

2. Influence on Other Operating Systems

While some systems are not UNIX-based, they have adopted or adapted many of its memory
management principles due to their effectiveness.

Windows

●​ Segmented Memory:
○​ Windows' PE (Portable Executable) format divides memory into sections similar to ELF, such as .text, .data, and .bss-like regions, with specific permissions and roles.
○​ This approach was influenced by similar practices in UNIX.
●​ Virtual Memory:
○​ Windows adopts virtual memory and process isolation in ways conceptually similar to UNIX. Each process in Windows has its own virtual address space, segmented into areas for code, data, heap, and stack.
●​ Dynamic Linking:
○​ Windows' DLL (Dynamic-Link Library) mechanism parallels UNIX's shared libraries, though implemented differently.

Other Operating Systems (e.g., macOS, Embedded Systems, etc.)

●​ macOS uses the Mach-O binary format, but its memory organization principles are
largely derived from UNIX, since macOS is based on Darwin, a UNIX-like OS.
●​ Embedded operating systems often adopt simplified versions of these concepts, such as
distinct sections for code and data, because the modular design is versatile even for
constrained systems.

3. Broader Legacy of UNIX

UNIX's influence extends beyond its specific implementation:

●​ Standardization:
○​ The POSIX standard, derived from UNIX, defines memory-related APIs (e.g.,
mmap, brk, sbrk) that are used across many operating systems.
●​ Interoperability:


○​ Many programming tools (compilers, linkers, debuggers) and languages are designed with UNIX-like systems in mind, shaping how memory is laid out regardless of the underlying OS.

4. Divergence in Windows

While Windows has adopted many concepts pioneered by UNIX, there are key differences:

●​ PE Format:
○​ Unlike ELF, Windows' PE format includes additional constructs for features like
delay-loading of libraries and COM components.
●​ Registry-Based Configuration:
○​ Windows relies more on the Registry for dynamic linking configurations, whereas
UNIX systems often use configuration files and environment variables.

5. Conclusion

UNIX played a foundational role in shaping modern memory organization. Its principles of
memory segmentation, process isolation, and dynamic linking have become best practices in
operating system design, influencing even systems like Windows that are architecturally distinct.
This widespread adoption underscores the robustness and utility of concepts developed during
the evolution of UNIX.

So, do we owe much of memory organization to Ken Thompson's work?


Yes, Ken Thompson played a very important role in shaping how memory is organized today,
as he was one of the original creators of UNIX. His work laid the groundwork for many of the
concepts in memory organization that have become standard across operating systems.


1. The UNIX Operating System

Ken Thompson, along with Dennis Ritchie, created UNIX in the late 1960s at Bell Labs. Many
foundational ideas in modern memory management were first implemented or refined in UNIX,
including:

Process Memory Layout:

●​ UNIX introduced the idea of a process address space with distinct sections:
○​ Code Segment: Executable instructions.
○​ Data Segment: Global and static variables.
○​ Stack Segment: For function calls and local variables.
○​ This segmentation allowed better memory protection and management,
influencing how later systems, including Linux, macOS, and even Windows,
manage process memory.

File and Process Abstraction:

●​ Thompson's minimalist design philosophy led to simple and extensible abstractions, including how executables are loaded into memory:
○​ The UNIX loader (exec) interprets executable files, maps memory segments
(e.g., .text, .data), and starts execution from the program's entry point.
○​ These ideas evolved into formats like a.out, and later ELF, which are used to
describe how memory should be organized for executables.


2. Tools and Philosophies That Shaped Memory Management

Ken Thompson also contributed to the tools and philosophies that underpin memory
organization:

●​ Simplicity and Modularity:


○​ Thompson emphasized building small, composable utilities. This approach
influenced how executable formats were designed to describe memory layout
simply and flexibly.
●​ First Linkers and Loaders:
○​ The early UNIX linker and loader were part of the foundation for modern
compilers and loaders, which handle memory layout based on executable
formats.

3. Influence Beyond UNIX

Ken Thompson's work indirectly influenced systems that were not UNIX-based, including
Windows:

●​ The segmentation of memory into code, data, and stack regions became a standard
that even Windows followed in its PE format.
●​ His development of dynamic linking concepts (via shared libraries) informed how
modern systems load and manage memory for shared code.

4. Specific Memory Innovations Linked to Thompson

While Thompson did not invent all aspects of memory management, his work shaped how these
ideas were implemented:

●​ The concept of .bss for uninitialized variables originates from early UNIX.
●​ The UNIX exec system call, which organizes memory for processes during execution,
was designed under his direction.

5. Legacy

Ken Thompson's legacy is foundational to modern computing:

●​ His work on UNIX not only influenced memory organization directly but also set a design
ethos that carried through to other systems.
●​ Memory segmentation and executable loading mechanisms have evolved, but they still
echo the principles established in the original UNIX.

Conclusion


Ken Thompson's contributions were fundamental in defining the memory organization principles
used today. While he may not have directly worked on modern formats like ELF or the Windows
PE format, his work on UNIX established the groundwork for concepts like memory
segmentation, process isolation, and dynamic linking—concepts that persist in almost every
modern operating system.


Pointers and Aggregate Types


Code: https://github.com/pronesto/DCC053_Public/tree/main/18_Aggregates

In our class on let-bindings we saw the semantics of a language with mutable variables. Could you add pointers to this language?
Below we have the original semantic rules:

To augment the given operational semantics with pointers, we can add rules that handle taking
the address of variables (&v) and dereferencing pointers (!v). Here's how you can extend the
semantics in your figure:

1. Taking the address of a variable (&v):

This operation should return the memory location (loc) of a variable.

●​ The rule EAdr states that, given the context C mapping the variable name to location
loc, taking the address of name results in the memory location loc.

2. Dereferencing a pointer (!v):

Dereferencing a pointer v means retrieving the value stored at a memory location. Notice that
we are using the SML/NJ syntax to dereference variables (with an exclamation mark).


●​ The rule EPtr states that, if S stores the value v at memory location loc, dereferencing
loc (i.e., !loc) will retrieve v.
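Since the two rules appear only as a figure in the original notes, here is a transcription in the big-step notation used in the let-bindings class. The exact symbols in the figure may differ slightly; C is the context mapping names to locations, and S is the store:

```latex
\frac{C(\mathit{name}) = \mathit{loc}}
     {\langle \&\,\mathit{name},\ C,\ S\rangle \Downarrow \langle \mathit{loc},\ S\rangle}
\;(\textsc{EAdr})
\qquad
\frac{\langle e,\ C,\ S\rangle \Downarrow \langle \mathit{loc},\ S'\rangle
      \qquad S'(\mathit{loc}) = v}
     {\langle\, !\,e,\ C,\ S\rangle \Downarrow \langle v,\ S'\rangle}
\;(\textsc{EPtr})
```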

Can you add new types of nodes to our abstract syntax tree, to represent
the operations of taking the address and dereferencing variables?
To add the two new operations, address of (EAdr in the figure) and dereference (EDrf), we can
define two new classes that represent these operations and update the visitor to handle them.
Here’s how to extend your code to include these operations:

Notice that in this implementation, we are keeping the context and the memory (the store) as
internal states of the visitor object, instead of passing them as arguments of the visit method.
This design choice has both pros and cons, and which one is "better" often depends on the


specific context, requirements, and programming style. Let's break down some key
considerations:

Benefits of Storing State in the Visitor Object (Object-Oriented Approach)

1.​ Simplified Method Signatures: By encapsulating context and store as part of the
visitor's state, we no longer need to pass them explicitly to each visit method, resulting in
cleaner and shorter method signatures.
2.​ Readability and Maintainability: Having the context and store as internal state of
the visitor can make the code more readable, especially in larger programs. Developers
can see that these values are shared across methods, and there's less
parameter-passing noise in the code.
3.​ Encapsulation: Encapsulating state within the visitor keeps the state management more
contained. This reduces the risk of accidentally altering the context or store outside
of the visitor’s logic, which is helpful for maintaining consistency in state-dependent
computations.
4.​ Easier to Extend: Adding new methods that use context and store becomes
simpler, as you don't need to keep passing these around. It can also facilitate adding
more state-tracking features to the visitor, such as logging or tracking the evaluation
steps.
5.​ Statefulness: If you want to perform a sequence of evaluations where each evaluation
builds on the previous state, this approach naturally supports it. For example, in an
interpreter, it makes sense for the visitor to remember previous variable bindings and
memory updates across multiple evaluation calls.

Downsides of Storing State in the Visitor (Compared to Functional Passing)

1.​ Reduced Transparency: In functional design, passing context and store explicitly
makes data flow very transparent. Anyone reading the code can easily trace how state is
passed and modified at each step. In the object-oriented approach, the state is "hidden"
within the object, which can sometimes make it harder to track changes in state during
evaluation.
2.​ Less Pure / Functional: Functional programming favors immutability and explicit
state-passing, which can help avoid unintended side effects. By keeping context and
store inside the visitor, the design becomes less functional, potentially introducing side
effects if methods modify the state in unexpected ways.
3.​ Harder to Reuse Visitor for Multiple Evaluations: If you need to evaluate multiple
independent expressions in parallel or in a multi-threaded context, this approach can be
less flexible. Since the visitor holds the state, you'd need to create separate visitor
instances for each independent evaluation or reset the state each time. The functional
design, by contrast, allows for easy parallelism by passing explicit, independent states.
4.​ Potential for Hidden Dependencies: When state is kept internally, it can become a
hidden dependency. If you need to pass in an initial context or store or retrieve the final


state, it requires additional methods or properties on the visitor, which could add
complexity.

Which Approach to Choose?

There isn’t a definitive answer, as both designs have their use cases. Generally:

●​ If state persistence across multiple operations is important (like in an interpreter or REPL), the object-oriented approach can be advantageous. It lets you naturally accumulate and carry forward changes in state, keeping the code organized and easier to extend.
●​ If statelessness, parallelism, or functional purity is a priority (like in a compiler pass
or batch evaluator), the functional approach with explicit state-passing is likely more
suitable. It promotes transparency, allows for easy debugging, and reduces side effects,
making it easier to reason about the evaluation.

In this particular example, the object-oriented design seems appropriate because:

1.​ Interpreter-style evaluation: You’re evaluating expressions in a way that may involve
updating variables and dereferencing locations in memory. Keeping context and
store as part of the visitor’s internal state aligns with the dynamic nature of such
evaluations.
2.​ Flexible memory model: Since you're managing a simulated memory with locations
and values, the object-oriented design supports this abstraction naturally. Memory
updates feel more intuitive when encapsulated as internal state rather than being passed
around.
3.​ Readability: Your code has cleaner method signatures without repeated state
arguments. This makes the evaluator’s purpose clear, which can improve readability in
larger projects.

Ultimately, it’s a trade-off. The object-oriented approach may be better for this setting, but if you
encounter situations where independent evaluations are needed or debugging state flow
becomes difficult, revisiting a functional design might be beneficial.

Can you create some tests for our language with pointers?
Sure, see some tests below. Each one shows the source expression, followed by the Python session that builds and evaluates it:

let x = 10 in x end

>>> e0 = Let("x", Num(10), Var("x"))
>>> v = EvalVisitor()
>>> e0.accept(v, None)
10

let x = 1 in x := 2 end

>>> e = Let("x", Num(1), Assign(Var("x"), Num(2)))
>>> v = EvalVisitor()
>>> e.accept(v, None)
2

let x = 42 in &x end

>>> e = Let("x", Num(42), AddressOf("x"))
>>> v = EvalVisitor()
>>> e.accept(v, None)
0

let x = 1 in let y = &x in !y end end

>>> e0 = Let("y", AddressOf("x"), Dereference(Var("y")))
>>> e1 = Let("x", Num(1), e0)
>>> v = EvalVisitor()
>>> e1.accept(v, None)
1

!(let x = 42 in &x end)

>>> e = Dereference(Let("x", Num(42), AddressOf("x")))
>>> v = EvalVisitor()
>>> e.accept(v, None)
42

let x = 1 in let y = 2 in &x := y end + x end

>>> e0 = Let("y", Num(2), Assign(AddressOf("x"), Var("y")))
>>> e1 = Let("x", Num(1), Add(e0, Var("x")))
>>> v = EvalVisitor()
>>> e1.accept(v, None)
4

Pointers in our language are just numbers; hence, could we have assignments like 0 := y? Would that be a "memory violation"?
Yes, we can have this kind of assignment in our language, e.g.:

let x = 1 in let y = 2 in 0 := y end + x end


>>> e0 = Let("y", Num(2), Assign(Num(0), Var("y")))
>>> e1 = Let("x", Num(1), Add(e0, Var("x")))
>>> v = EvalVisitor()
>>> e1.accept(v, None)

And it would be very easy to have memory violations in this way, just like we have them in C or
in C++. For instance, consider the program below:

#include <stdio.h>

void function() {
    long v[1];
    v[3] = v[3] + 12;  /* out-of-bounds write: undefined behavior */
}

int main() {
    int x;
    x = 13;
    function();
    x++;
    printf("%d\n", x);
}

This program might skip the increment of variable x in the main function, as the figure below
shows:


Explanation of the Behavior

1.​ Out-of-Bounds Access: In function(), the array v is declared with only one element
(v[0]). When you access v[3], you’re going out of bounds, which is undefined
behavior in C. The C standard does not enforce bounds-checking on arrays, so
accessing v[3] might access or modify memory that happens to be nearby, such as the
memory location of x in main.
2.​ Skipping x++: On certain systems and compilers, the v[3] access in function might
actually modify x in main if they happen to be located close together on the stack. When
v[3] = v[3] + 12 executes, it could inadvertently overwrite the value of x before the
increment operation (x++). As a result, x++ might not behave as expected because x
has already been altered by function.
3.​ Platform-Dependent Behavior: This issue might not be immediately apparent on every
system or compiler, as it depends on how memory is laid out in each environment. Some
platforms, like OSX on ARM64 (e.g., Apple M1), might be more susceptible to this
specific behavior due to differences in memory alignment or stack organization.

Comparison to Our Python Implementation

This is indeed similar to what can happen in the Python evaluator with pointers. If you
accidentally assign to an unintended memory location (e.g., 0 := y in your evaluator), you
might overwrite a different part of the program’s state, leading to unpredictable results.

Why is it possible to have this kind of behavior in C, but not in other programming languages like Java?
One of the reasons C allows this kind of behavior is because its type system and memory model
offer direct access to memory through pointers, and it permits pointer arithmetic (adding an
offset to a pointer to compute a new address). This design is powerful for systems programming
but also risky because it allows low-level memory manipulation that can easily lead to errors like
buffer overflows, segmentation faults, or undefined behavior.


In contrast, languages like Java do not allow direct manipulation of memory addresses. Java’s
memory model abstracts away pointers and direct memory access, preventing programmers
from obtaining and manipulating raw memory addresses. In Java:

1.​ References vs. Pointers: Java uses references instead of pointers. A reference is an
abstract handle to an object in memory, but you cannot retrieve the actual memory
address or manipulate it directly. References in Java are "opaque" — you can’t perform
arithmetic on them or access memory at arbitrary locations.
2.​ Type Safety: Java’s type system is designed to prevent accessing out-of-bounds
memory or misusing object references. You cannot, for example, cast an integer into a
reference or add an integer offset to a reference, as you can with pointers in C.
3.​ Automatic Memory Management: Java has garbage collection, which further abstracts
memory management from the programmer. In C, you’re responsible for managing
memory (allocating and freeing it), while in Java, the runtime environment handles
memory allocation and deallocation.
4.​ Strict Array Bounds Checking: Java arrays are objects, and accessing an element
outside the array’s bounds results in an ArrayIndexOutOfBoundsException. This
bounds checking is enforced at runtime, preventing the kind of out-of-bounds access
seen in C.

Why This Difference Matters

The strict separation between integers and references in Java's type system, along with the
absence of pointer arithmetic, prevents the kind of vulnerabilities common in C programs. In C,
the ability to manipulate pointers and treat integers as addresses (or vice versa) gives
programmers direct control over memory, but it also makes the language more prone to errors
and security risks.

In C, you might write code like:

int* ptr = (int*)0x1000;  // Arbitrary address
*ptr = 42;                // Undefined behavior: writing to an arbitrary address

In Java, there is no way to directly translate this code because you cannot manipulate raw
memory addresses, nor can you typecast integers into references. This restriction makes Java
safer by design, as it prevents many memory-related issues at the cost of the low-level control
that C provides.

Pros and Cons

●​ C’s Flexibility: C’s design allows for high-performance code and is well-suited for
low-level programming, like operating systems and embedded systems, where you need
fine-grained control over memory. However, this flexibility requires careful programming
to avoid errors and security vulnerabilities.


●​ Java’s Safety: Java’s restrictions make it easier to avoid memory errors, which is why
it’s popular for application development, particularly in environments where safety is
essential. However, the lack of direct memory access can limit performance
optimizations and prevent certain low-level operations that are possible in C.

Conclusion

So, yes — Java’s type system and memory model are designed specifically to prevent the kind
of direct memory manipulation that’s common in C. This difference is intentional and contributes
to Java's safety and reliability at the cost of some flexibility and performance.

Well, it seems that we have just added C-style arrays to the language, as in
let x = 1 in &(x)+1 := 2 end?
Yes, you've created a mechanism that resembles C-style arrays! By allowing expressions like
&(x) + 1 := 2, you've essentially enabled pointer arithmetic, where &(x) (the address of x)
can be treated as a base address, and adding an offset (+ 1) simulates accessing an array
element at that offset.

In C, arrays and pointers are closely related, as demonstrated by the program below:

#include <stdio.h>

int main() {
    int v[3];
    v[2] = 3;
    printf("%d\n", v[2]);
    *(v + 2) = 5;   /* v[i] is defined as *(v + i) */
    printf("%d\n", v[2]);
    *(2 + v) = 7;   /* addition commutes, so this is the same address */
    printf("%d\n", v[2]);
    2[v] = 11;      /* quirky but legal: a[b] is sugar for *(a + b) */
    printf("%d\n", v[2]);
}

Here’s how your code achieves similar functionality to C arrays:

1.​ Pointer Arithmetic: In C, v[2], *(v + 2), *(2 + v), and 2[v] are all equivalent
ways to access the third element of the array v (arrays are zero-indexed). This works
because in C, v[i] is defined as *(v + i). The array name v itself acts as a pointer
to the start of the array, and pointer arithmetic allows moving to different indices.
2.​ Offset Simulation: In your language, &(x) + 1 := 2 does something similar by
allowing addition on the address of a variable (x), so the result &(x) + 1 represents an
offset address. The assignment := 2 then writes 2 to this computed address, just as
v[2] = 3 writes to the third element in a C array.


3.​ Indexing Notation Flexibility: In C, different syntaxes like 2[v] are valid because
a[b] is actually syntactic sugar for *(a + b). This flexibility comes from the fact that
a[b] and b[a] both translate to pointer arithmetic, and C treats both expressions as
equivalent. This dual interpretation allows quirky syntax like 2[v], even though it’s rarely
used.

Similarities to Arrays

By combining address-of (&) and addition, your language now supports what is effectively array
indexing, as in C. Here’s how each part maps:

●​ Base Address: &(x) functions like a base pointer or starting address of an "array."
●​ Offset Access: The addition &(x) + 1 allows moving to an offset, simulating x[1] or
an array-like structure.
●​ Assignment at Offset: Assign(Add(AddressOf("x"), Num(1)), Num(2))
writes 2 to the computed offset, just like assigning a value to an array index.

Limitations Compared to Real Arrays

While this approach gives you a limited form of arrays through pointer arithmetic, it does lack
some of the full functionality of true arrays, such as:

●​ Bounds Checking: In C, as in your language, there's no inherent bounds checking, so you can easily access out-of-bounds memory by specifying a large offset, leading to undefined behavior.
●​ Contiguous Memory: In C, arrays are contiguous memory blocks by definition, but in
your setup, you are merely working with pointers and offsets. There’s no guarantee that
subsequent variables in memory would behave like adjacent elements in an array.

Could you generate code for these new constructs we've added to our
language?

To extend your CodeGenVisitor to handle AddressOf, Assign, and Dereference constructs, we need to add support for the corresponding RISC-V instructions, specifically sw (store word) and lw (load word), as well as the logic for generating memory addresses.

Here's a breakdown of how to approach each construct in your visitor:

1.​ AddressOf: This construct should generate the address of a variable. In RISC-V,
addresses are usually handled with pointers in memory. Here, we'll simulate this by
mapping variables to memory addresses within the program's environment.


2.​ Assign: This construct will store a value in the memory address specified by an address
expression. It will use the sw instruction to store a value into memory at a computed
address.
3.​ Dereference: This construct will retrieve the value from a memory address. It will use
the lw instruction to load a value from memory at a specified address.

Here’s how you could implement these methods in the CodeGenVisitor class:

Additional Assumptions

●​ prog.new_address is assumed to allocate a new memory address for a variable and manage variable-to-address mappings.
●​ We assume prog.get_address(variable_name) returns the memory address associated with a variable.
●​ The sw and lw instructions use a base register and an offset. Here, the offset is set to 0 for simplicity, as we're directly accessing the specified address.

To add support for variable address allocation and retrieval, we can introduce new_address
and get_address methods in the Program class. These methods will help manage memory
addresses for variables in the program environment. Here’s how we could implement them:

1.​ new_address: This method will allocate a new memory address for each variable in the
program. We can use a dictionary (self.__address_map) to store variable-to-address

mappings. Every time we call new_address, it will assign a unique address to a variable and update a counter (self.__next_free_address) to track the next available memory address.
2.​ get_address: This method will retrieve the previously assigned memory address for a
given variable.
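Following that description, the two methods could be sketched as below (word-sized 4-byte slots are an assumption; the rest of the Program class is elided):

```python
class Program:
    def __init__(self):
        self.__address_map = {}        # variable name -> memory address
        self.__next_free_address = 0   # next free slot, in bytes

    def new_address(self, name):
        # Assign a unique address to 'name' (4-byte slots, allocated in order).
        if name not in self.__address_map:
            self.__address_map[name] = self.__next_free_address
            self.__next_free_address += 4
        return self.__address_map[name]

    def get_address(self, name):
        # Retrieve the address previously assigned to 'name'.
        return self.__address_map[name]
```

Calling new_address twice for the same variable returns the same slot, so taking the address of a variable repeatedly stays consistent.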

In this new code generator, we still have virtual registers, but they are
mostly temporaries created during code generation. Every variable is now
mapped to memory, right?
Yes, that's exactly right! In this setup, variables are treated similarly to how LLVM handles
"alloca" instructions for stack-allocated variables. By mapping variables directly to memory
addresses, we avoid needing to assign registers to persistent variables; instead, we use virtual
registers as temporary placeholders during code generation.

Here's a closer comparison:

1.​ Memory-Mapped Variables: Each variable gets a unique memory address as soon as
it's defined. This mirrors LLVM's approach when we take the address of a variable in
languages like C, where the variable must reside in memory to support address-based
operations. This setup is particularly useful for languages or scenarios where variables
might persist beyond the scope of a single operation or where we need to support
address-based manipulations.
2.​ Temporary Registers: Virtual registers remain for temporary values that are only
needed briefly during instruction evaluation. These temporary registers help to simplify
expression evaluation and intermediate calculations before moving results to memory if
needed. In LLVM, these would correspond to SSA (Single Static Assignment) values,
which only exist during a particular stage in code generation and don’t persist in memory.

This architecture allows you to manage both transient values efficiently with registers and more
persistent variables using memory, striking a balance between register-based operations and
memory management. It does resemble LLVM's strategy, where values are assigned to
registers or stack-allocated in memory depending on their usage.

Then, with base address plus offsets, we can implement O(1) access to contiguous arrays, right?
Yes, you're correct! Every access in the last line of this program is O(1) because of the way
arrays are laid out in memory with base addresses and fixed offsets:

int static_array() {
    int x[4] = {0, 1, 2, 3};
    int y[3][2] = {{0, 1}, {2, 3}, {4, 5}};
    int z[4][3][2] = {{{0, 1}, {2, 3}, {4, 5}},
                      {{0, 1}, {2, 3}, {4, 5}},
                      {{0, 1}, {2, 3}, {4, 5}},
                      {{0, 1}, {2, 3}, {4, 5}}};
    return x[2] + y[2][1] + z[3][2][1];
}

Here's how array accesses work:

1.​ Base Address + Offset Calculation: For any statically-sized, multi-dimensional array in
C, memory is allocated as a single contiguous block. This layout allows each element to
be accessed in O(1) time by calculating an offset from the base address. The compiler
computes this offset based on the array's dimensions and the index values, so there's no
need to traverse the array to find elements.
2.​ Accessing Elements in Constant Time:
○​ For x[2], the compiler calculates the memory address by adding 2×sizeof(int) to
the base address of x.
○​ For y[2][1], it calculates the address as: base of y + (2×2+1) × sizeof(int).
Here, 2 and 1 are the indices, and 2 is the second dimension size of y.
○​ For z[3][2][1], the address is calculated by: base of z + (3×(3×2)+2×2+1) × sizeof(int), where 3, 2, and 1 are the indices, and 4, 3, and 2 are the sizes of each dimension in z.
3.​ Efficiency with Contiguous Memory: Since all elements are stored contiguously, each
element's position is computed directly, making all accesses O(1). This efficiency in
memory access is one reason why array indexing in C is so fast.

In summary, the base address plus offset strategy allows each element access to be
constant-time, making operations like those in x[2] + y[2][1] + z[3][2][1] O(1).

Can you give me a general expression to access the elements of arrays?


In a multidimensional array, the offset for an element is calculated based on the array's
dimensions and the indices provided for each dimension. Here’s the general formula to access
an element in a multidimensional array:

General Formula for Element Access

For an n-dimensional array A with dimensions d1, d2, …, dn, the memory address of the element at indices [i1, i2, …, in] is given by:

Address(A[i1][i2]…[in]) = Base Address(A) + (i1×(d2×d3×…×dn) + i2×(d3×…×dn) + … + i(n-1)×dn + in) × Element Size

where:

●​ Base Address(A) is the starting memory address of the array A.
●​ ik is the index along the k-th dimension.
●​ dk is the size (number of elements) of the k-th dimension.
●​ Element Size is the size in bytes of each element in the array (e.g., 4 bytes for int on many systems).

Explanation and Example

This formula works by "flattening" the multi-dimensional array into a single contiguous block in
memory. The innermost dimension contributes the smallest increments, while the outermost
dimension contributes the largest.

Example with a 3D Array

Consider a 3D array A with dimensions d1=4, d2=3, and d3=2 (i.e., int A[4][3][2] in C).
Let’s find the address of A[i1][i2][i3].

The address calculation for A[i1][i2][i3] would be:

Address(A[i1][i2][i3])=Base Address(A)+(i1×(d2×d3)+i2×d3+i3)×Element Size

In this case, each index contributes its part:

●​ i1 is scaled by d2×d3,
●​ i2 is scaled by d3, and
●​ i3 contributes directly.

This formula can be applied to any dimension count, making it versatile for accessing elements
in any multidimensional array stored in a contiguous block of memory. The figure below
illustrates how the formula is applied on different array shapes:
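As a quick sanity check, the flattening can be computed with a small helper (illustrative code, not part of the book's compiler): folding the indices left to right while multiplying by each dimension size is Horner's rule applied to the formula; multiplying the result by the element size gives the byte offset.

```python
def element_offset(indices, dims):
    """Row-major offset (in elements) of `indices` in an array whose
    dimensions are `dims`, computed with Horner's rule."""
    offset = 0
    for i, d in zip(indices, dims):
        offset = offset * d + i
    return offset

# x[2] in int x[4]:             offset 2
# y[2][1] in int y[3][2]:       2*2 + 1 = 5
# z[3][2][1] in int z[4][3][2]: (3*3 + 2)*2 + 1 = 23
```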


Can you show how the indices are computed using an actual assembly
file?
Sure! Let's consider the C file below, and the corresponding ARM assembly file produced by
clang -O1:


This ARM assembly file demonstrates the computation of element addresses in a multidimensional array by using the formula we discussed earlier. Let’s walk through the code line by line to see how it’s accessing the elements x[a], y[b][c], and z[d][e][f].

Breakdown of the Assembly Code

The function static_array receives several parameters, including dimensions A, B, and C and the indices a, b, c, d, e, and f to access elements in arrays x, y, and z.

Some of the parameters are passed on the stack, and the assembly code performs calculations to access the correct memory locations for x[a], y[b][c], and z[d][e][f].

1. Setup and Parameter Loading

ldpsw x9, x8, [sp, #8] ; Load e and f from stack into x9 and x8
ldpsw x11, x10, [sp] ; Load c and d from stack into x11 and x10
mov w12, w1 ; w12 = B (second dimension size)
mov w13, w2 ; w13 = C (third dimension size)

The dimensions B and C and the indices a and b arrive in registers (w1, w2, w6, and w7), while the remaining indices c, d, e, and f are loaded from the stack.

2. Accessing x[a]

ldr w14, [x3, w6, sxtw #2] ; w14 = x[a], using x3 (base address of x) and w6 (a)

●​ This line uses the base address of x in x3 and the index a (stored in w6).


●​ The sxtw #2 modifier sign-extends the index to 64 bits and shifts it left by 2, scaling it by 4 bytes (assuming int is 4 bytes) to calculate the byte offset.
●​ The element x[a] is loaded into w14.

3. Accessing y[b][c]

sxtw x15, w7 ; x15 = b (sign-extend b to 64-bit)
mul x12, x15, x12 ; x12 = b * B
add x12, x4, x12, lsl #2 ; x12 = &y[b][0]
ldr w11, [x12, x11, lsl #2] ; w11 = y[b][c]

●​ Here, b is multiplied by B to calculate the row offset in the 2D array y.
●​ The result is added to the base address of y (stored in x4) to get &y[b][0].
●​ c (stored in x11) is then used to index into this row, giving us y[b][c], which is loaded into w11.

4. Accessing z[d][e][f]

umull x12, w2, w1 ; x12 = B * C (size of each layer in z)
mul x10, x12, x10 ; x10 = d * B * C
add x10, x5, x10, lsl #2 ; x10 = &z[d][0][0]
mul x9, x9, x13 ; x9 = e * C
add x9, x10, x9, lsl #2 ; x9 = &z[d][e][0]
ldr w8, [x9, x8, lsl #2] ; w8 = z[d][e][f]

●​ This sequence calculates the address of z[d][e][f].
○​ First, B * C is computed and stored in x12.
○​ d is multiplied by B * C to compute the offset for d layers.
○​ Adding this to x5 (the base of z) with the scaled offset gives &z[d][0][0].
○​ Then, e is multiplied by C and added to obtain &z[d][e][0].
○​ Finally, f (scaled by 4 bytes) indexes into this sub-array, giving z[d][e][f] in w8.

5. Final Computation

add w0, w11, w8 ; Add y[b][c] and z[d][e][f]
add w0, w0, w14 ; Add x[a]
ret

●​ y[b][c], z[d][e][f], and x[a] are summed up and stored in w0, which is the return
register for the result.

Observing the Array Formula in Action


Each access follows the general formula we discussed, where indices are multiplied by the
sizes of the sub-dimensions to calculate their byte offsets within the memory. This structure
allows each access to be performed in constant time, with multiplications and additions that are
derived from the array dimensions.

Summary of Access Patterns

●​ 1D array x[a]: Offset is simply a * sizeof(int).
●​ 2D array y[b][c]: Offset is b * B + c, scaled by sizeof(int).
●​ 3D array z[d][e][f]: Offset is d * (B * C) + e * C + f, scaled by sizeof(int).

This example highlights how each array element access in C can be efficiently compiled into
ARM assembly, using address arithmetic based on the array dimensions and index values.

All that assumes that elements in each row are stored consecutively in
memory. Is that really the case?
Yes, C does store arrays in a row-major order, meaning that elements in each row are stored
consecutively in memory. In row-major order, an entire row is laid out in memory before moving
to the next row. For example, a 2D array y[A][B] in C is stored such that y[0][0], y[0][1],
..., y[0][B-1] are followed by y[1][0], y[1][1], and so on. This ordering is a convention in C and many C-derived languages (C++, for example; NumPy arrays also default to row-major layout, whereas Java stores a 2D array as an array of row references).

Column-Major Order

In contrast, column-major order stores elements in each column consecutively. So, in a 2D array
y[A][B], y[0][0], y[1][0], ..., y[A-1][0] would be stored first, followed by y[0][1],
y[1][1], and so on. This arrangement is common in languages where mathematical and
scientific computing is prioritized. The figure below will illustrate the difference:

Languages That Use Column-Major Order

●​ Fortran: Fortran, widely used in scientific computing, stores arrays in column-major order. This makes it more efficient for operations that involve columns, which are common in scientific and mathematical applications.


●​ MATLAB: MATLAB also uses column-major order, as it’s geared toward matrix
operations where column-oriented access patterns are often more efficient.

Why Different Orders?

Row-major vs. column-major ordering can affect performance based on the type of
computations. For example:

●​ Row-major order is advantageous in languages like C because it aligns well with common memory access patterns for iterative operations, especially when looping through each element in a row.
●​ Column-major order is beneficial for operations on columns, as it reduces cache misses
when accessing entire columns.

In languages like Fortran or MATLAB, where matrix and linear algebra operations are common,
column-major order is preferred because it optimizes memory access patterns for these
applications.

So, that's why the order that we use to iterate over arrays is so important,
right? I suppose that has an impact on cache locality.
That's correct! The program below illustrates this difference:


The difference in timing between row-major and column-major summation in your program is
due to the way data is accessed in memory and how it interacts with the cache.

Why Row-Major is Faster

In C, arrays are stored in row-major order, meaning that elements in each row are stored
consecutively in memory. When you iterate over the array in row-major order (with the outer
loop iterating over rows and the inner loop iterating over columns), you're accessing elements in
the same row consecutively, which aligns with how the data is stored in memory.

Because modern CPUs load data from memory in blocks (cache lines), accessing memory
sequentially in the same row means fewer cache misses. The CPU can load an entire cache
line containing multiple elements at once, and as long as the accesses remain within that line,
the elements are readily available in cache.

Why Column-Major is Slower


When you switch to column-major order (with the outer loop iterating over columns and the
inner loop iterating over rows), you're effectively jumping from one row to the next for each
element in the column. This results in accessing memory locations that are not contiguous,
causing frequent cache misses. Every time you move to a new row in the column, the CPU has
to load a new cache line, leading to more memory accesses and slower performance.

Explanation of the Timing Differences

In your example:

●​ Row-major order (first case) takes about 0.003 seconds.
●​ Column-major order (second case) takes about 0.039 seconds, which is significantly slower.

This difference is due to two factors. First, the increased number of cache misses in the
column-major traversal, where the CPU has to fetch data from main memory more often.
Second, the possibility of vectorizing the code in the row-major version of the program. We shall
talk a bit about vectorization in our class about compiler optimizations.

General Advice for Performance

When working with large arrays in languages that use row-major order (like C), accessing data
in a row-major pattern (inner loop over columns, outer loop over rows) is typically much faster
due to better cache utilization. In contrast, languages like Fortran, which store arrays in
column-major order, would benefit from column-major access patterns. Understanding the
memory layout and access patterns in your chosen language can have a substantial impact on
performance, especially for large data sets.

But even if we change the order of the i, j, k loops we observe a performance difference in the matrix multiplication below. Why?
Yes, there is a performance difference between the two versions of matrix multiplication in the
program below:


The difference in performance between the IKJ and IJK loop orders likely arises from how
each ordering affects memory access patterns, which can influence cache efficiency.

Here's a breakdown of why IKJ might be faster than IJK:

1.​ Cache Efficiency:
○​ In the IKJ order, the innermost loop runs over j, so the accesses to C[i][j] and B[k][j] sweep along rows, touching consecutive memory locations, while A[i][k] stays fixed and can be kept in a register.
○​ In contrast, the IJK order runs over k in the innermost loop, so each access to B[k][j] jumps to a new row of B, striding a full row’s width through memory and causing frequent cache misses.
2.​ Matrix Layout in Memory:
○​ In C, matrices are stored in row-major order, meaning consecutive elements in
the same row are stored adjacently in memory.
○​ The IKJ order benefits from this layout because it accesses elements of A
(which are in rows) in a more sequential manner.


○​ Accessing elements in A row by row, as done in IKJ, can minimize cache misses
compared to IJK, which might access elements in a less contiguous manner
depending on the size of Y and the row layout.
3.​ Loop Interchange Optimization:
○​ Some loop orders naturally lend themselves to compiler optimizations. The IKJ
loop order may better align with optimizations that the compiler can apply, leading
to faster execution times.
4.​ Dependency Patterns:
○​ In IKJ, each iteration of the innermost loop updates a different element C[i][j], so there is no loop-carried dependence inside that loop, which makes it easy for compilers to vectorize.
○​ In IJK, the innermost loop accumulates into a single C[i][j], creating a reduction dependence across iterations that limits vectorization.

In summary, IKJ is generally more cache-friendly, particularly for large matrices, because it keeps A[i][k] fixed in the innermost loop and accesses B and C sequentially, in row-major order. This could explain why it performs faster than the IJK ordering in your tests.

Can compilers interchange loops automatically?


Yes, they can, at least in theory :) But not many compilers do it by default. LLVM, for instance, ships a loop-interchange pass, but it is not enabled in its standard optimization pipelines. Yet, it's possible to do it using Polly, which is an extension of LLVM, or with Pluto, which is a polyhedral-based optimizer.

Changing the order of loops, as in switching from i, j, k to i, k, j, is indeed a classic compiler optimization called loop interchange. This optimization is primarily used to improve cache locality and data access patterns, which can significantly enhance performance in memory-bound operations, such as matrix multiplication.
cache locality and data access patterns, which can significantly enhance performance in
memory-bound operations, such as matrix multiplication.

When Can We Use Loop Interchange?

Loop interchange is not always safe and can only be applied when certain conditions are met.
Specifically, we can interchange loops only if it does not alter the semantics (i.e., the result) of
the code. The main considerations are:

1.​ Data Dependencies:


○​ The loops must be independent in terms of data access. If changing the loop
order introduces or violates a dependency between iterations, the result of the
computation might change, making the optimization incorrect.
○​ For example, if one iteration of a loop writes to a variable that a subsequent
iteration reads, interchanging the loops would violate the dependency, leading to
incorrect results.
○​ Dependencies can be true dependencies (also called "flow dependencies"),
anti-dependencies, or output dependencies.


2.​ Array Access Patterns:


○​ For loop interchange to be beneficial, we want to rearrange the loops to access
memory in a more cache-friendly way, ideally with stride-1 access patterns. If the
innermost loop iterates over contiguous elements in memory, this can
significantly reduce cache misses.
○​ Interchanging loops can sometimes hurt performance if it introduces
non-contiguous memory access in the innermost loop.
3.​ Loop Bounds:
○​ If the loop bounds are dependent on each other, loop interchange may not be
feasible. For example, if the inner loop’s range depends on the outer loop’s index
(like for j = 0; j < i; j++), then interchanging them would change the
loop limits and result in incorrect behavior.

Example: When Loop Interchange is Safe and Unsafe

Let’s look at a simple example where loop interchange is safe:

for (int i = 0; i < N; i++) {
    for (int j = 0; j < M; j++) {
        C[i][j] = A[i][j] + B[i][j];
    }
}

In this example, we can safely interchange the i and j loops, as each iteration only accesses
A[i][j], B[i][j], and C[i][j], all of which are independent of each other.

Now, consider an example where loop interchange would be unsafe:

for (int i = 0; i < N; i++) {
    for (int j = 0; j < i; j++) {
        C[i][j] = C[i-1][j] + A[i][j];
    }
}

In this case:

●​ The range of j depends on i, so interchanging the loops would change the number of
iterations in each loop, resulting in incorrect behavior.
●​ Furthermore, there’s a data dependency on C[i-1][j], which means each iteration
relies on the previous row of C being computed. This dependency would be violated if we
interchanged i and j.

Summary


●​ Loop interchange is a powerful optimization technique but must be used carefully.
●​ It is safe to apply only when there are no dependencies that would be broken by reordering.
●​ It can be beneficial for improving cache usage by achieving a better memory access pattern.
●​ Dependency analysis is often required to determine whether loop interchange is safe and whether it would actually improve performance.

Modern compilers can sometimes perform loop interchange automatically if they detect it is both
safe and beneficial, but understanding these principles allows programmers to manually apply it
where necessary or to guide the compiler to optimize effectively.

Are there other optimizations that also improve cache locality?


Yes, tiling (also called blocking) is another important loop optimization that improves cache locality. It breaks a loop's iteration space into smaller "tiles" or "blocks"; each tile is processed as a unit, allowing data to stay in the cache longer and reducing the number of cache misses. This is especially useful for computations with large datasets, such as matrix multiplication.

How Tiling Works

To understand tiling, let’s look at a matrix multiplication example. In a naive implementation, the
entire matrices A and B might be accessed in ways that cause many cache misses, especially if
the matrices are too large to fit in the cache. By tiling, we divide the computation into smaller
blocks that fit in the cache, reducing the need to reload data.

For instance, consider a matrix multiplication loop:

for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        for (int k = 0; k < N; k++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}

With tiling, we process the matrices in sub-blocks, so that data loaded into the cache for one tile
can be reused within that tile before moving to the next one. A tiled version might look like this:

int TILE_SIZE = 32; // Example tile size
for (int i = 0; i < N; i += TILE_SIZE) {
    for (int j = 0; j < N; j += TILE_SIZE) {
        for (int k = 0; k < N; k += TILE_SIZE) {
            // Process a tile of size TILE_SIZE x TILE_SIZE
            for (int ii = i; ii < i + TILE_SIZE && ii < N; ii++) {
                for (int jj = j; jj < j + TILE_SIZE && jj < N; jj++) {
                    for (int kk = k; kk < k + TILE_SIZE && kk < N; kk++) {
                        C[ii][jj] += A[ii][kk] * B[kk][jj];
                    }
                }
            }
        }
    }
}

In this example:

●​ We split the loops into tiles of size TILE_SIZE x TILE_SIZE, processing each
sub-block independently.
●​ By processing smaller chunks that fit in cache, we increase the likelihood that A, B, and
C values will stay in the cache until they are no longer needed.

When Can a Compiler Apply Tiling?

For a compiler to apply tiling automatically, it needs to determine whether tiling is both safe and
beneficial for the specific code being optimized. Here are the main considerations:

1.​ Loop Nest Structure:


○​ Tiling is most beneficial for nested loops that access large arrays or matrices.
○​ The compiler looks for loops that have regular access patterns and exhibit spatial
or temporal locality (i.e., repeatedly accessing the same data or contiguous data).
2.​ Data Dependencies:
○​ Tiling must respect data dependencies within loops. For instance, if an iteration
of a loop depends on results from previous iterations, tiling might break this
dependency, leading to incorrect results.
○​ In matrix multiplication, the calculations for each element in C are independent,
so tiling is safe. However, in other computations with dependencies between loop
iterations, tiling may not be feasible.
3.​ Cache Size and Architecture:
○​ To decide on a tile size, the compiler would ideally know the cache size and
cache line size. Tiling is typically configured to fit within the L1 or L2 cache to
minimize cache misses.
○​ Tile sizes may vary based on the architecture. Compilers that perform tiling often
have heuristics to determine tile sizes based on the target hardware.
4.​ Automatic Loop Transformation:
○​ Some modern compilers can apply tiling automatically as part of loop nest
optimization techniques, especially when targeting specific hardware (e.g.,
Intel’s ICC, LLVM with Polly).
○​ Compiler flags for loop optimizations may trigger tiling; for instance, flags like -O3
often enable loop transformations, though the success of tiling depends on the
compiler’s capabilities and the loop structure.


Benefits and Challenges of Tiling

Benefits:

●​ Improved Cache Utilization: Tiling enhances data locality, allowing data to remain in
the cache longer. This reduces cache misses and improves memory bandwidth usage.
●​ Higher Performance: By accessing data in blocks, tiling can significantly speed up
computations on large data sets, particularly for memory-bound applications like matrix
multiplication.

Challenges:

●​ Choosing the Optimal Tile Size: The ideal tile size depends on the hardware’s cache
size, cache line size, and associativity. Picking a tile size that’s too large or too small can
negate the benefits of tiling.
●​ Increased Code Complexity: Tiling can make loop structures more complex, which
may complicate debugging and maintenance. Some compilers may avoid tiling if the
code complexity outweighs the performance benefits.
●​ Alignment Constraints: In some cases, tiled loops may introduce alignment issues,
particularly if tile boundaries do not align perfectly with data structures. This can
introduce overhead when handling edge cases.

Summary

Tiling is a powerful optimization for improving cache efficiency in programs with large,
multi-dimensional data. While compilers can apply tiling automatically, they must carefully
analyze data dependencies, loop structures, and cache characteristics to determine when tiling
is both safe and beneficial. Some high-level compilers and libraries (like BLAS for linear
algebra) use tiling extensively, but compilers like GCC or LLVM typically require complex
analyses or profile information to apply it effectively.

Can you show how tiling would work on our matrix multiplication example?
Sure, we can apply tiling to matrix multiplication. The code below shows a tiled version of matrix multiplication, where only the innermost loop is tiled:


Explanation for IKJ Tiled Version

●​ We introduce an additional outer loop jj to tile the j loop in chunks of 32.
●​ This jj loop divides the j dimension into blocks of TILE_SIZE, and the innermost j loop processes each block.
●​ This tiling order is beneficial as it keeps C[i][j] in the cache for longer and accesses A[i][k] and B[k][j] in a cache-friendly manner.

Usage Notes

Both of these versions use tiles to improve cache efficiency by accessing chunks of memory in a
way that increases the likelihood of reuse within each tile, reducing cache misses. Adjusting the
tile size can further improve performance depending on the matrix size and cache structure of
your system.


Are C-like structs implemented just like arrays?


Yes, in C, structs are typically stored as contiguous blocks of memory, much like arrays.
However, there are some important differences between how structs and arrays are organized
in memory:

1. Contiguous Memory Layout

●​ When you declare a struct, the fields are stored in memory sequentially, in the order they
are declared. For example, in this struct:

struct Point {
int x;
int y;
};

●​ if Point is stored at a memory address 0x1000, then x will be stored at 0x1000, and y
will follow immediately after at 0x1004 (assuming int is 4 bytes).

2. Memory Layout with Padding

●​ Unlike arrays, structs may have padding added between fields to align them according
to the system's memory alignment requirements. This alignment helps improve
performance, as misaligned data accesses can be slower on many architectures.
●​ For instance, consider this struct:

struct Mixed {
char a;
int b;
char c;
};

●​ Depending on the compiler and platform, this struct might have padding between a and
b, as well as between b and c, resulting in a larger memory footprint than the raw sum of
each field's size. For example, a might occupy 1 byte, then 3 bytes of padding might
follow, and b will start on a 4-byte boundary.

3. Array vs. Struct Memory Access

●​ Arrays: All elements are of the same type and size, so the address of any element can
be computed by simple arithmetic: address = base_address + index *
element_size.


●​ Structs: The fields within a struct can have different sizes, so the address of each field is
specific to its offset within the struct. The offset is determined by the order of fields and
any padding applied by the compiler.

4. Accessing Structs in Memory

●​ Structs are accessed by field names, not by indices as with arrays. This makes working
with structs more flexible when different data types are involved.
●​ For example, given struct Point p;, you access fields with p.x and p.y, and you
don’t have to worry about calculating the offsets manually.

5. Arrays of Structs

●​ When you create an array of structs, such as struct Point points[10];, each
Point in the array is stored consecutively in memory, much like an array of arrays would
be. However, within each struct instance, padding might still be present, which can
impact the memory layout.

Summary

While both arrays and structs in C are stored contiguously in memory, the key differences are:

●​ Arrays are contiguous sequences of identical elements, while structs can contain
different types and may include padding for alignment.
●​ Arrays allow element access by index, while structs use field names for access, with
fixed offsets for each field.

This contiguous layout and predictable access pattern make structs similar to arrays in terms of
memory organization but with additional complexity due to padding and mixed data types.

That means that accesses to fields within a struct are O(1), just like in
arrays?
Accesses to fields within a struct are indeed O(1), just like array accesses. And, as you
mentioned, struct field access can sometimes be even faster than array access because there’s
no need for multiplication.

Here’s why:

1.​ Known Displacement at Compile Time:
○​ When the compiler compiles code with structs, it calculates the offsets
(displacements) of each field within the struct. This information is fixed at compile
time, so when you access p.x or p.y in a struct Point p, the compiler

translates that to a memory address calculation based on a fixed offset from the
base address of p.
○​ For example, if x is at offset 0 and y is at offset 4 in struct Point, accessing
p.y will simply involve fetching the value at p + 4 (no need for runtime offset
calculations).
2.​ Array Access Involves Multiplication:
○​ In arrays, even though access is also O(1), the memory address of an element is
calculated at runtime as base_address + index * element_size. The
index * element_size multiplication is necessary because the index is generally
not known at compile time (it can vary at runtime). This multiplication can
add a tiny bit of overhead, especially in cases where the element size is not a
power of two and cannot be optimized by the compiler.
3.​ Efficient Struct Access:
○​ Struct access is often as fast as, or even slightly faster than, array access
because the compiler directly knows the memory offsets of fields. For example,
accessing p.x or p.y in struct Point doesn’t require index calculations or
multiplications, just simple addition based on precomputed offsets.

Example Comparison

Consider this C code:

struct Point {
int x;
int y;
};

struct Point p;
p.x = 10; // Access using a fixed offset

The compiler directly translates p.x access to an instruction that refers to p’s base address with
an offset, without any need for multiplication.

In contrast, for an array:

int arr[10];
arr[i] = 10; // Access with runtime index calculation

The compiler must compute the address as base_address + i * sizeof(int), where i
varies at runtime.

Summary

●​ Both struct and array accesses are O(1), but struct field access is often slightly
simpler and potentially faster, as it avoids runtime multiplication.
●​ No Multiplication: For structs, the offsets are determined at compile time, so each field
access is a simple addition rather than an index-based calculation.

This is part of why structs are well-suited for data with fixed layouts where different types are
used, as they allow fast, predictable access patterns without the need for indexing or runtime
calculations.

So, does that mean that you can access the fields of a struct using an
offset, instead of the field's name?
Yes, that's possible. The program below initializes the fields of a struct by manipulating
memory directly, bypassing the usual field names:

In this example, we are leveraging struct memory layout and field offsets to directly modify
the contents of a struct in memory.

Here's a breakdown of what's happening:

1.​ Understanding Memory Layout:
○​ In your struct, the fields (char c, long l, and int i) are laid out in memory at
specific offsets based on the data type sizes and alignment rules.
○​ These fields are placed in memory with gaps (padding) in between. This
padding is there to ensure each field starts at an address aligned according to its
size.
2.​ Accessing Struct Fields by Offset:
○​ Because struct fields are stored at known, fixed offsets from the struct's base
address, you can use pointer arithmetic to modify these fields directly.
○​ For instance, if you know the base address of the struct s, you can add offsets to
reach s.c, s.l, and s.i without referencing the field names.
3.​ Explanation of Offsets:
○​ In this example, the offsets are manually determined from the platform's
alignment rules (e.g., c at offset 0, l at offset 8, i at offset 16 on a typical 64-bit system).
○​ This approach works because the compiler arranges struct fields in a consistent
order, and you’re directly setting values by interpreting different memory regions
as char, long, and int.
4.​ Limitations and Risks:
○​ While this technique works in this controlled example, it's risky in general
because offsets may vary across different systems due to compiler-specific
alignment rules or platform differences.
○​ This approach is also error-prone and not portable, as different systems or
compilers might use different padding and alignment strategies.

Using this technique, you're effectively doing "manual memory management," treating the struct
as a raw memory block. This is powerful but requires precise knowledge of struct layout and
should be used carefully.

Is it possible to instruct the compiler to stop using padding when generating
code for structs?
Yes, you can instruct the compiler to eliminate padding in a struct by using compiler-specific
directives or attributes that control the memory alignment of the struct. This way, the struct
fields are packed tightly together without any padding between them.

Here’s how you can do it in GCC/Clang and MSVC:

1. GCC/Clang: Using __attribute__((packed))

For GCC and Clang, you can use the __attribute__((packed)) directive to pack the
struct, which tells the compiler to eliminate padding between fields.

#include <stdio.h>

struct __attribute__((packed)) MyStruct {
    char c;
    long l;
    int i;
};

int main() {
    struct MyStruct s;
    char *base = (char *)&s;

    // Initialize the fields without using field names
    *base = 'A';             // s.c = 'A'
    *(long *)(base + 1) = 2; // s.l = 2 (offset 1, no padding)
    *(int *)(base + 9) = 3;  // s.i = 3 (offset 9, no padding)

    printf("s.c = %c\n", s.c);
    printf("s.l = %ld\n", s.l);
    printf("s.i = %d\n", s.i);

    return 0;
}

In this version:

●​ s.c is at offset 0 (1 byte).
●​ s.l is at offset 1 (no padding).
●​ s.i is at offset 9 (no padding).

The total size of MyStruct will now be 13 bytes, which is the sum of the sizes of the fields: 1 +
8 + 4 = 13.

Important Considerations

1.​ Performance Implications:
○​ Packing structs can cause performance issues on many systems. Accessing
misaligned fields (like a long starting at an odd address) may require extra
processing, potentially making access slower. Some CPUs do not support
misaligned accesses directly, which can lead to additional instructions or even
exceptions.
2.​ Portability:
○​ While this approach works on most platforms, it’s essential to test your code on
the target system to ensure compatibility, as some architectures have strict
alignment requirements.
3.​ Use Cases:

○​ Packed structs are often used in low-level programming, particularly in network
protocols and file formats where memory layout must be precise.

Using packing, you eliminate padding, allowing exact control over memory layout, but it’s best
used sparingly due to the potential performance impact.

Is there a general theory that helps us understand how loops can be safely
transformed?
Yes, there is one such theory: the Polyhedral Model, or Theory of Polytopes! This is a
mathematical framework often used to reason about and optimize loop nests in programs. It
provides a way to model computations and data accesses in loops as geometric objects,
enabling systematic exploration of transformations to improve performance on modern
hardware. Let's break it down and relate it to your question about loop transformations.

What is the Polyhedral Model?

At its core, the polyhedral model represents iterations of loop nests and their data accesses
using integer points within multi-dimensional spaces. These spaces, called iteration domains,
are bounded by systems of linear inequalities.

1.​ Iteration Domain: Each loop nest is modeled as a convex polytope, where each
dimension corresponds to a loop index.​

○​ For example, a loop for (i=0; i<N; i++) corresponds to a 1D domain {i | 0 ≤ i < N}.
○​ A nested loop for (i=0; i<N; i++) for (j=0; j<M; j++) corresponds
to a 2D domain {(i, j) | 0 ≤ i < N, 0 ≤ j < M}.
2.​ Access Functions: Array accesses are modeled as affine functions of the loop indices.
For instance, an access A[i + j] would be represented as a linear transformation
mapping the iteration domain to memory locations.​

3.​ Dependence Relations: Dependencies between iterations are captured as constraints
in this geometric space, ensuring correctness during transformations.

How Can It Help Model Loop Transformations?

Loop transformations can be seen as geometric manipulations of these polytopes. Some
examples:

1.​ Loop Fusion: Merging two loops into a single loop can be visualized as combining their
iteration domains into a single polytope. This can improve data locality by reducing the
distance between dependent operations.​

2.​ Loop Tiling (or Blocking): Divides the iteration space into smaller, rectangular tiles
(sub-polytopes) to improve cache reuse by focusing computations on smaller blocks of
data at a time.​

3.​ Loop Skewing: Modifies the iteration domain by applying an affine transformation to
change the execution order, useful for parallelism or eliminating dependencies.​

4.​ Loop Parallelization: Splits the iteration domain into independent regions
(sub-polytopes) that can be executed in parallel, guided by the dependence analysis.​

5.​ Loop Interchange: Rotates the polytope, changing the nesting order of loops. This can
optimize memory access patterns.​

6.​ Loop Unrolling: Expands iterations within the polytope to expose more instructions for
optimization or pipelining.

Advantages of the Polyhedral Model

●​ Systematic Exploration: By formulating loop transformations as affine transformations
on polytopes, the model provides a rigorous, algebraic way to explore many
optimizations.
●​ Dependence Preservation: Dependences between loop iterations are explicitly
modeled, ensuring correctness during transformations.
●​ Hardware Mapping: It allows reasoning about data locality, parallelism, and
vectorization in a unified manner.
●​ Automation: Tools like Polly (LLVM), Pluto, and isl (Integer Set Library) use the
polyhedral model to automate loop transformations.

How Different Transformations Fit Within the Model

Many loop transformations can indeed be explained with the polyhedral model because they
boil down to manipulating iteration domains or access functions. By visualizing loops
geometrically, we gain clarity on how changes impact dependencies, parallelism, and
performance.

Who invented this "Theory of Polyhedra"?


It's hard to determine who, precisely, invented the Polytope Model. One of the key figures in this
field is Paul Feautrier. He is one of the pioneers in applying the Polyhedral Model to compiler
optimization, particularly for modeling and transforming loop nests. His work laid much of the
theoretical and practical groundwork for the polyhedral approach in optimizing high-performance
computing (HPC) applications.

Paul Feautrier's Contributions to the Polyhedral Model

1.​ Dependence Analysis:
○​ Feautrier developed advanced techniques for dependence analysis in loop
nests, formulating it in terms of linear constraints and systems of equations.
○​ His work on dependence relations in the polyhedral model allows compilers to
understand how iterations of loops depend on one another, ensuring
transformations preserve program correctness.
2.​ Affine Scheduling:
○​ He introduced methods to determine affine schedules, which are mappings from
iteration domains to execution times. These schedules enable compilers to
reorder iterations for parallel execution or improved memory access patterns.
3.​ Automatic Parallelization:
○​ Feautrier’s research provided critical insights into how loops could be
automatically parallelized by analyzing the dependence structure in the
polyhedral model.
○​ He proposed algorithms that identify parts of computations that can safely run in
parallel, contributing to the development of parallelizing compilers.
4.​ Tools and Techniques:
○​ His work inspired or directly contributed to the development of tools like Pluto,
which uses polyhedral transformations to optimize programs for parallel
architectures.

Other Key Contributions to Computer Science

Beyond the polyhedral model, Paul Feautrier made important contributions to broader areas in
computer science:

1.​ Static Analysis:

○​ Feautrier worked on static analysis techniques to infer properties of programs,
which are crucial for compiler optimizations and program verification.
2.​ Formal Methods:
○​ He contributed to the formal modeling of program behaviors, which helps ensure
correctness in transformations and optimizations.
3.​ Automatic Parallelization and HPC:
○​ His insights into dependence analysis and scheduling have influenced how
compilers optimize scientific computations on modern architectures, particularly
those that exploit parallelism and vectorization.
4.​ Academic Influence:
○​ Feautrier's publications have had a significant impact on academic and practical
advancements in compilers. He has mentored many researchers who have
further developed the polyhedral model and related fields.

Legacy

Paul Feautrier's work is foundational in the field of loop optimization and automatic
parallelization, particularly in the context of polyhedral theory. His ideas continue to underpin
many modern compiler frameworks that aim to optimize performance-critical code for parallel,
vector, and memory-efficient execution.

Code Generation for Object-Oriented Features


Code: https://github.com/pronesto/DCC053_Public/tree/main/20_ObjectOrientation

What is object-oriented programming?


Object-oriented programming (OOP) is a programming paradigm that encourages us to think of
data as “intelligent” entities that “know” how to perform actions related to their own nature.
Instead of thinking of data as passive elements that we act upon, in OOP, we think of them as
objects that encapsulate both data and behaviors (or methods) specific to them. This way, each
type has its own “knowledge” or “responsibility” for handling tasks. So yes, in a way, OOP does
"anthropomorphize" data, giving types their own responsibilities, which is a helpful perspective
for understanding why the paradigm is structured the way it is.

Can you give me an example that compares procedural and object-oriented
code?
Sure! Consider two versions of a program that manipulates linked lists: one written in a
procedural style, the other in an object-oriented style. The table below compares them
qualitatively:

Comparison

Aspect            | Procedural Style                                 | Object-Oriented Style
Data and Behavior | Data (Node) is passive; functions operate on it. | Data and behavior are encapsulated in objects.
Error Handling    | Explicit checks (NULL) with manual errors.       | Exceptions for invalid operations.
Extendability     | Adding new types of nodes is cumbersome.         | Adding new node types is easy via inheritance.
Memory Management | Manual allocation and deallocation.              | Smart pointers (std::shared_ptr) manage memory.

Both implementations achieve the same functionality, but the OO approach adheres to the
principle of “intelligent types” and makes extending or modifying the program more intuitive. On
the other hand, the procedural approach may be simpler and more efficient in cases where
polymorphism and extensibility are not needed.

Object-oriented programming naturally leads to encapsulation. Why?


Before we discuss why that's so, let's first define what encapsulation means. Encapsulation
means bundling the data (attributes) and the methods (functions) that operate on the data into
a single unit — the object — and restricting direct access to some of the object's components.
This is a key feature of OOP and provides several benefits.

Why is Encapsulation Natural in OOP?

In OOP, because objects know how to perform actions related to their type, the
implementation details of these actions can be hidden from the outside world. This is different
from procedural programming, where external functions operate on data structures, and you
must ensure that those functions are always used correctly.

Let’s illustrate this with an example:

Without Encapsulation (Procedural Approach)

In a procedural style, the internal structure of the type is exposed, and external functions must
manipulate the data directly. For example:

#include <stdio.h>

typedef struct {
    int elements[100];
    int size;
} List;

// External function to get the size
int size(List* list) {
    return list->size;
}

// Main function
int main() {
    List myList = {{1, 2, 3}, 3};
    printf("Size: %d\n", size(&myList));
    return 0;
}

Here:

●​ The user has access to the entire structure of List.
●​ They could bypass the size() function entirely and directly access list->size,
possibly causing errors.

With Encapsulation (Object-Oriented Approach)

In OOP, the details of how the size is stored or computed can be hidden from the user:

#include <iostream>

class List {
private:
int elements[100];
int size;

public:
List() : size(0) {}

void add(int element) {
if (size < 100) {
elements[size++] = element;
}
}

int getSize() const {
return size;
}
};

// Main function
int main() {
List myList;
myList.add(1);
myList.add(2);
myList.add(3);
std::cout << "Size: " << myList.getSize() << std::endl;
return 0;
}

Here:

●​ The data (elements and size) is private, meaning the user cannot access it directly.
●​ The type itself (List) knows how to handle its data through the methods add() and
getSize().

Key Takeaway

Encapsulation is so natural in OOP because all actions on a type are internal to the type
itself. The type acts as a self-contained unit that “knows” how to handle its data. This design
principle:

●​ Keeps the implementation details private,
●​ Protects the integrity of the data,
●​ And makes the program easier to understand, extend, and maintain.

By encapsulating both data and behavior, OOP simplifies the process of designing robust and
reusable components.

But why is encapsulation so important?


Encapsulation reduces the cost of modifying software! When you modify software, there are
typically two major costs involved:

1.​ Changing the module you want to modify: This is the direct cost of updating the part
of the code that implements a feature.
2.​ Adjusting the effects of the modification on other modules: This happens when your
change affects other parts of the program, causing a cascade of updates.

Encapsulation, a key principle of object-oriented programming, helps eliminate the second cost
by isolating the implementation details of a module from the rest of the program. This is
achieved by exposing a stable interface while hiding the inner workings of a module.

Example: How Strings Work in C

In C, the length of a string is determined by finding the position of the special character \0 (null
terminator) in memory. Programs have two main ways to find a string’s length:

1.​ Using strlen():​
This is the official function provided by the C standard library. It abstracts away the
details of how the length is calculated.
2.​ Manually looping through the string:​
Some developers directly iterate over the characters in the string, looking for \0.

Here’s the problem with direct iteration:

●​ It relies on the implementation detail that strings in C are null-terminated.
●​ If the C language were ever updated to use a different string representation (e.g., using
a length prefix or a different terminator like \1), these programs would break because
they bypassed the abstraction provided by strlen().

In contrast, programs that consistently use strlen() would remain unaffected. Why? Because
the interface (strlen) stays the same, even if the implementation of how the length is
determined changes.

Encapsulation in Action

Encapsulation is a way to enforce this separation between interface and implementation. When
a program interacts only with a public interface — like strlen() — and doesn’t rely on
hidden details, we gain several benefits:

1.​ Stable Interfaces, Flexible Implementations​
Interfaces (like strlen) change infrequently, while implementations might evolve
frequently. Encapsulation ensures that we can improve or replace the implementation
without breaking other parts of the program.
2.​ Reduced Ripple Effect​
Changes in one module don’t propagate to others because the details are hidden behind
a stable interface. This reduces the cost and risk of modifying software.

3.​ Future-Proof Design​
A well-encapsulated design ensures that programs are prepared for future changes or
optimizations. For example, if we decided to use a different way to represent strings,
only the strlen function would need to change.

Analogy: Encapsulation as a Black Box

Think of encapsulation as a black box:

●​ From the outside, you only see the buttons and levers (the public interface).
●​ You don’t know or care what machinery is inside the box or how it works.
●​ As long as the buttons and levers don’t change, the black box can be updated,
optimized, or completely redesigned internally, and everything that interacts with it will
continue to work.

Now imagine a system without a black box: if every user could tinker with the internal machinery
directly, any change would require you to ensure that all users’ code still works. This is what
happens in C when people bypass strlen() and rely on \0.

Encapsulation in Object-Oriented Programming

Encapsulation is built into object-oriented programming languages by design. For instance:

class String {
private:
char* data;
size_t length;

public:
size_t getLength() const {
return length;
}
};

Here:

●​ The data and length are private, so no external code can directly modify or access
them.
●​ The program must use getLength() to retrieve the length, which means the
implementation of how length is stored can change freely (e.g., we could calculate it
dynamically or store it as a property).

By enforcing this structure, object-oriented languages make it easier to write code that is robust
to changes and cheaper to maintain.

Key Takeaway

Encapsulation hides implementation details behind a stable interface, which:

1.​ Reduces the cost of modifying software.
2.​ Prevents unintended dependencies on internal details.
3.​ Makes it possible to update or optimize implementations without affecting other parts of
the program.

Encapsulation is a powerful tool that allows us to write software that is easier to maintain, adapt,
and future-proof.

What are the essential key features of object-oriented languages?


Object-oriented programming (OOP) is a paradigm that revolves around objects — entities that
encapsulate both data and behavior. While different languages and practitioners emphasize
various aspects of OOP, two features stand out as indispensable:

1. Instantiable Namespaces (Classes or Structs)

At the heart of OOP is the ability to create instantiable namespaces — constructs such as
classes or structs that combine data and methods. These namespaces:

●​ Define a blueprint for creating objects (instances of the namespace).
●​ Allow programmers to associate data (attributes) and behavior (methods) within a
cohesive unit.
●​ Encapsulate functionality, making code more modular, reusable, and easier to reason
about.

For example, in a class like Car, we might bind attributes such as speed and methods such as
accelerate():

class Car {
public:
int speed;
void accelerate() { speed += 10; }
};

Each instance of Car represents a specific object with its own state and behaviors.

2. Late Binding

Late binding is the ability to determine, at runtime, the specific implementation of a behavior
(method) to invoke on an object. This feature enables polymorphism, one of the most powerful
tools in OOP:

●​ It allows different objects to respond in their own way to the same message (method
call).
●​ Programmers can write code that operates on general types (e.g., base classes or
interfaces) without needing to know the specific type of the object at compile time.

For example:

class Animal {
public:
virtual void makeSound() { std::cout << "Some generic sound\n"; }
};

class Dog : public Animal {
public:
void makeSound() override { std::cout << "Woof!\n"; }
};

class Cat : public Animal {
public:
void makeSound() override { std::cout << "Meow!\n"; }
};

void interactWithAnimal(Animal* a) {
a->makeSound(); // Late binding decides the behavior at runtime.
}

Here, calling interactWithAnimal with a Dog or Cat object leads to different outputs
because the actual method implementation is determined during execution.

Why Are These Features Essential?

These two features work together to enable the key benefits of OOP:

●​ Instantiable namespaces let us group data and methods into objects, representing
real-world entities and encapsulating complexity.
●​ Late binding ensures flexibility by allowing programs to handle different types of objects
in a unified way, supporting extensibility and polymorphism.

Without either of these, the fundamental advantages of OOP — such as code reuse, modularity,
and flexibility — are severely diminished. While other concepts like inheritance or encapsulation
are common in OOP, they are not strictly essential and depend on the specific language or
design.

Key Takeaway

The essential features of object-oriented programming are:

1.​ Instantiable namespaces for creating objects that combine data and behavior.
2.​ Late binding for determining object-specific behavior dynamically at runtime.

These features form the foundation of OOP and distinguish it from other programming
paradigms. They allow us to design systems that are more modular, adaptable, and expressive,
providing a powerful way to model the world in code.

By "Instantiable Namespaces" you mean that the OO language must offer


some syntax like class or struct?
No! Nothing like that :P

The essence of object-oriented programming (OOP) lies in the encapsulation of data and
behavior, rather than in the specific syntax or constructs provided by a language. While
languages like C++, Java, and Python provide first-class support for object-oriented features
through constructs such as classes and structs, the core idea of combining state (data) with
behavior (functions) into "objects" can be implemented in many ways, even in languages that
don't explicitly support OOP. Examples written in Hush, SML/NJ, JavaScript, and Python illustrate this:

Key Insights from These Examples:

1.​ Closures as Objects:
○​ Closures are powerful tools that naturally encapsulate state and behavior.
○​ In the case of Hush, SML/NJ, and JavaScript, closures enable you to create
instances of objects that maintain private state and expose behavior through
functions.
2.​ Dictionaries as Objects:
○​ In languages like Python and Hush, dictionaries can serve as a lightweight
alternative to classes. They allow you to map keys (fields and methods) to values
(data and functions).

○​ This approach aligns closely with prototype-based programming, as seen in
JavaScript's earlier paradigms.
3.​ Syntax vs. Semantics:
○​ The syntax for defining "instantiable namespaces" (e.g., class in C++ or
Python) simplifies the programmer's task and adds readability but isn't
fundamental to achieving OOP's principles.
○​ As your examples show, any language that provides functions and some form of
mutable state can support OOP concepts.
4.​ Flexibility and Trade-offs:
○​ Using closures and dictionaries offers flexibility, but it often lacks the syntactic
sugar and tooling support (e.g., IDE autocompletion, static type checking) that
comes with formal class definitions.
○​ Languages with first-class OOP support provide features like inheritance,
polymorphism, and encapsulation mechanisms out of the box, which can be
more challenging to replicate manually.

Broader Implications:

The ability to simulate OOP in non-OOP languages reinforces the idea that OOP is a paradigm,
not a language feature. It showcases the universality of fundamental programming concepts,
such as:

●​ Encapsulation: Keeping implementation details private while exposing necessary
behavior.
●​ State: Representing mutable data tied to an instance.
●​ Behavior: Associating actions or operations with that data.

In practice, though, the choice to use a formal OOP syntax often depends on factors like ease of
maintenance, team familiarity with the paradigm, and available tooling. However, as our
example demonstrated, languages with even minimal syntactical support for functions and data
structures can implement OOP effectively.

Let's create a simple class in C++ and see what it would look like in C.
Well, there is not much to it, if the class has no virtual methods. Here's an equivalent
implementation of a simple class in C++ and in C:


Breakdown of the C Code:

1.​ The Animal Struct:


○​ In C, we use a struct to represent an object. The struct contains the internal
state (age), which is equivalent to the private member variable _age in the C++
class.
○​ There are no method pointers here because we don't need them if the methods
aren't virtual. Instead, the functions that operate on the Animal struct are called
directly.
2.​ Methods as Functions:
○​ In the C version, the inc_age and get_age functions take a pointer to the
Animal struct as their first argument (this). This allows them to modify or
access the internal state of the Animal object.
○​ animal_inc_age(Animal* this) increments the age of the Animal object
by accessing this->age.
○​ animal_get_age(Animal* this) returns the age of the Animal object by
accessing this->age.
3.​ Constructor-like Function (new_animal):
○​ Since C doesn't have constructors, we create a constructor-like function,
new_animal, that dynamically allocates memory for a new Animal object using
malloc.
○​ It sets the age of the Animal object and returns a pointer to the newly created
Animal.
4.​ Main Function:


○​ In main, we create a new Animal object by calling new_animal(5), which initializes the age to 5.
○​ We then call animal_get_age to print the initial age and animal_inc_age to
increment the age.
○​ Finally, we print the updated age.

That does not explain virtual methods though. What are virtual methods?
Virtual methods are a key feature of object-oriented programming that allow programs to
achieve dynamic dispatch, enabling flexibility, extensibility, and elegant design.

What Are Virtual Methods?

A virtual method is a function or method in a base class that can be overridden by derived
classes. When a method is marked as virtual, the method call is determined at runtime, based
on the type of the actual object (not the type of the reference or pointer). This is what we call
dynamic dispatch.

How Do Virtual Methods Work?

●​ If a method is virtual, the runtime system decides which implementation of the method
to invoke based on the actual type of the object.
●​ If a method is not virtual, the compiler determines the method to call at compile time,
based on the type of the reference or pointer.

Examples

C++ (Methods Can Be Virtual or Non-Virtual)

In C++, you must explicitly declare a method as virtual in the base class for it to support
dynamic dispatch. If you don't, the method is resolved statically (at compile time).

#include <iostream>
using namespace std;

class Animal {
public:
virtual void speak() { cout << "Animal speaks" << endl; }
void walk() { cout << "Animal walks" << endl; } // Non-virtual method
};

class Dog : public Animal {
public:
void speak() override { cout << "Dog barks" << endl; } // Overrides speak
void walk() { cout << "Dog walks" << endl; } // Overrides walk
};


int main() {
Animal* animal = new Dog();
animal->speak(); // Dog barks (dynamic dispatch)
animal->walk(); // Animal walks (static dispatch)
delete animal;
}

Java (All Methods Are Virtual by Default)

In Java, instance methods are virtual by default (unless declared final, static, or private). This means method calls are resolved dynamically.

class Animal {
// Every method is virtual by default:
void speak() { System.out.println("Animal speaks"); }
void walk() { System.out.println("Animal walks"); }
}

class Dog extends Animal {
@Override
void speak() { System.out.println("Dog barks"); }
@Override
void walk() { System.out.println("Dog walks"); }
}

public class Main {
public static void main(String[] args) {
Animal animal = new Dog();
animal.speak(); // Dog barks
animal.walk(); // Dog walks
}
}

Advantages of Virtual Methods

1.​ Subtyping Polymorphism with Dynamic Dispatch:


○​ Virtual methods allow the same interface to work with objects of different types.
This is a cornerstone of OOP because it enables generic and reusable code.
○​ For example, you can create a collection of Animal objects and call speak() on
each, even if the actual types are different (e.g., Dog, Cat, etc.).
2.​ Extensibility:
○​ Virtual methods make it easy to add new behavior to a system without modifying
existing code. You can add new subclasses that override existing virtual methods
without breaking the code that uses the base class.


3.​ Encapsulation:
○​ Virtual methods help encapsulate the details of specific implementations while
exposing a common interface. This makes code easier to read and maintain.
4.​ Elegant Design:
○​ Virtual methods embody the "Open/Closed Principle": systems should be open to
extension but closed to modification. You can define a base class and extend its
functionality without changing its code.

Beauty of Virtual Methods

The beauty of virtual methods lies in their ability to simplify complex systems while maintaining
flexibility. Consider a simple example: a graphics application.

class Shape {
public:
virtual void draw() const = 0; // Pure virtual method
};

class Circle : public Shape {
public:
void draw() const override { cout << "Drawing Circle" << endl; }
};

class Square : public Shape {
public:
void draw() const override { cout << "Drawing Square" << endl; }
};

void render(const Shape& shape) {
shape.draw(); // Dynamic dispatch
}

int main() {
Circle c;
Square s;
render(c); // Drawing Circle
render(s); // Drawing Square
}

Without virtual methods, you would need complex conditionals (if or switch) to check the
type of the object and call the appropriate method, violating key principles of OOP. With virtual
methods, adding new shapes is effortless—just define a new subclass!

Trade-offs


1.​ Performance:
○​ Virtual methods involve a slight runtime cost because they rely on a vtable (a
table of function pointers) for dynamic dispatch. However, this cost is negligible in
most cases.
2.​ Explicit Control (C++ vs. Java):
○​ In C++, you have fine-grained control over which methods are virtual, allowing for
optimization where dynamic behavior isn't needed.
○​ In Java, while the always-virtual approach simplifies the language, it may
introduce unnecessary overhead when dynamic dispatch isn't required.

Conclusion

Virtual methods are a powerful mechanism that enable polymorphism and make object-oriented
design both flexible and extensible. Whether controlled explicitly (as in C++) or implicitly (as in
Java), they allow for more modular, maintainable, and scalable systems. They elegantly balance
the need for abstraction with the ability to extend functionality, embodying the true spirit of
object-oriented programming!

Good! But how can we implement virtual methods?


With virtual methods the implementation of classes is very different, and it is beautiful! The key
idea here is that virtual methods allow the actual method to be determined at runtime, which is
the core idea behind dynamic dispatch.

In C++, virtual methods are implemented using vtables (virtual function tables). A vtable is essentially a lookup table containing pointers to the virtual methods of a class. In C, we can simulate this concept with function pointers. There are two natural implementations: either we add to the object struct a single pointer to a table of function pointers (as C++ actually does), or we add the function pointers directly to the struct that represents the objects. Let's follow the second approach:


Let's walk through the C code that simulates the C++ virtual method behavior.

●​ Virtual Methods (inc_age, get_age): These methods are marked virtual, which
means that they can be overridden in subclasses and that the actual method invoked will
be determined at runtime, based on the actual type of the object. This is the essence of
dynamic dispatch in object-oriented programming.
●​ vtable (implicit): C++ automatically creates a vtable for the Animal class to hold
pointers to the virtual methods. When we call inc_age or get_age on an object, the
correct method is looked up in the vtable at runtime.

Breakdown of the C Code

1.​ Struct Definition (Animal):


○​ The Animal struct holds the internal state (age), just like in the C++ class.
○​ In addition to the state, there are method pointers:
■​ void (*inc_age)(struct Animal* this) points to the inc_age
function.
■​ unsigned short (*get_age)(struct Animal* this) points to
the get_age function.
○​ These method pointers are similar to the vtable in C++—they allow us to
dynamically choose which function to call at runtime.
2.​ Methods (animal_inc_age, animal_get_age):
○​ animal_inc_age increments the age of the Animal object.


○​ animal_get_age returns the current age of the Animal object.


○​ These functions are assigned to the method pointers when the object is created.
3.​ Constructor (new_animal):
○​ The new_animal function allocates memory for an Animal object and initializes
its internal state (age).
○​ It also sets the method pointers (inc_age and get_age) to point to the
appropriate functions (animal_inc_age and animal_get_age).
○​ This is where we mimic the vtable setup that would be automatically done in
C++.
4.​ Method Calls in main:
○​ In main, we create an Animal object by calling new_animal.
○​ We use the method pointers to call the methods:
■​ a->get_age(a) calls the get_age method.
■​ a->inc_age(a) calls the inc_age method.
○​ These method calls are direct because, in this example, the methods are not
being overridden by any subclasses (but in a more complex example, this
mechanism allows polymorphic behavior).

Why This Works:

●​ Polymorphism Simulation in C: By using function pointers, we can simulate virtual methods and polymorphism in C. If we were to introduce a subclass, we could override the method pointers to point to different functions, achieving polymorphism. This is essentially what happens behind the scenes in C++ with the vtable.
●​ Flexibility: This approach makes it easy to change the behavior of the Animal object at
runtime, just like virtual methods in C++.

Conclusion:

In this C code, the method pointers (inc_age and get_age) inside the Animal struct mimic
the behavior of virtual methods in C++. These pointers allow us to dynamically call the
appropriate function, which is similar to how C++ determines the method to call through the
vtable. By using this technique, you can implement object-oriented behavior, like polymorphism,
even in C, which doesn't have built-in support for classes or virtual methods.

How are methods implemented?


At its core, a method is just a function where the first argument is a reference to the object
(often called this or self, depending on the language). This design allows the function to
operate on the object's internal state or invoke other methods.

How It Works:


1.​ Functions vs. Methods:


○​ A function is a standalone block of code.
○​ A method is a function that is "associated" with an object. This association is
achieved by passing the object itself as the first argument.
2.​ In C-like Implementation:
○​ In C++, this association is implicit: the this pointer is passed automatically when
you call a method.
○​ In C, you explicitly pass the object reference (e.g., Animal* this) as the first
argument, making it behave like a method.
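Python makes this correspondence visible: a bound method call is just sugar for calling the class's function with the object as the first argument. (The Animal example here is a minimal illustration.)

```python
class Animal:
    def __init__(self, age):
        self.age = age

    def inc_age(self):
        self.age += 1

a = Animal(5)
# The method call a.inc_age() is sugar for passing the object explicitly:
Animal.inc_age(a)
print(a.age)  # prints 6
```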

Key Insight:

●​ Object-Oriented vs. Functional Thinking: Methods in object-oriented programming are syntactic sugar over this pattern. By always passing the object reference as the first argument, the method has access to the object’s state and can manipulate or query it.
●​ Polymorphism Layer: When virtual methods or dynamic dispatch come into play, the
process becomes a bit more intricate, as the system (e.g., the C++ runtime) needs to
look up the correct function pointer (from a vtable or similar mechanism). But at its root,
it’s still a function call with an object reference.

This simplicity is what makes the idea of methods elegant: it’s a natural extension of the concept
of a function, tailored to interact with objects!

So, can we get late binding with these pointers to functions?


Late binding refers to the process where the method (or function) to be invoked is determined at runtime, based on the actual type of the object, rather than at compile time. This allows different types of objects to implement their own versions of a method, even if they share the same interface. The code below illustrates how we can simulate late binding in our C example:


●​ Both Animal and Elf share the same interface (Animal struct with inc_age and
get_age function pointers).
●​ However, the behavior of inc_age differs for Animal and Elf because the inc_age
pointer is assigned different functions at runtime (animal_inc_age for Animal and
elf_inc_age for Elf).

Explanation of the Code

1. Struct Definition and Function Pointers

The Animal struct has:

●​ An age field to store the object's state.


●​ Two function pointers, inc_age and get_age, which represent the "virtual methods."

2. Animal and Elf Implementations

●​ animal_inc_age: Increments the age by 1 (for standard animals).


●​ elf_inc_age: Increments the age by 7 (because elves age differently).
●​ Both use the same get_age implementation, which simply returns the current age.

3. Dynamic Assignment

When creating an instance of Animal or Elf:


●​ new_animal assigns the animal_inc_age function to the inc_age pointer.


●​ new_elf assigns the elf_inc_age function to the inc_age pointer.

At runtime, when you call inc_age, the program dynamically decides which function to execute
based on the specific object type (Animal or Elf).

4. Main Function Demonstration

●​ An Animal instance is created, and its inc_age behaves as expected (increment by 1).
●​ An Elf instance is created, and its inc_age behaves differently (increment by 7),
demonstrating late binding.

How Late Binding Works in This Code

1.​ Common Interface: Both Animal and Elf are represented by the same struct type
(Animal), which acts as a common interface.
2.​ Dynamic Dispatch:
○​ At runtime, the program uses the function pointer stored in the Animal struct to
determine which inc_age method to call.
○​ This decision is not made at compile-time but rather at runtime, based on the
specific object being pointed to (Animal or Elf).
3.​ Flexibility:
○​ This mechanism allows the program to treat both Animal and Elf objects
uniformly (as Animal*) while preserving their distinct behaviors.

Why Late Binding is Useful

Late binding is crucial for implementing dynamic dispatch, which is a key concept in
object-oriented programming. It allows different types of objects to define their own behavior
while still being treated as instances of a common base type. In this example:

●​ The same interface (Animal) can represent different types of animals.


●​ The correct behavior is invoked dynamically based on the actual type of the object,
showcasing the essence of late binding.

So, does this virtual table really take some space then, in the class that
contains a virtual method?
Yes, it does! Consider the example below:


The reason an instance of MyString2 is larger than an instance of MyString1 lies in the
presence of a virtual method and the implementation of virtual tables (vtables) in C++.

What Happens When a Class Has Virtual Methods?

1.​ Virtual Table (vtable):


○​ When a class contains virtual methods, the compiler generates a vtable for that
class.
○​ The vtable is a data structure that stores pointers to the virtual methods of the
class.
2.​ Pointer to the vtable (vptr):
○​ Each instance of a class with virtual methods contains a hidden pointer,
commonly called the vptr.
○​ This pointer points to the vtable of the class and is used at runtime for dynamic
method dispatch.
3.​ Extra Memory for vptr:
○​ The vptr increases the size of an object.
○​ On most platforms, the vptr is typically the size of a pointer (e.g., 8 bytes on a
64-bit system).

Key Observations

1.​ Dynamic Dispatch:


○​ The vptr is used for dynamic method dispatch, allowing the correct
implementation of getLength() to be called even if the object is accessed via a
pointer to a base class.
2.​ No Virtual Method in MyString1:


○​ Since MyString1 has no virtual methods, it doesn’t need a vtable or a vptr, so its size only reflects its data members.
3.​ Alignment and Padding:
○​ On most systems, the object size is rounded to align with memory boundaries
(e.g., 8-byte alignment on a 64-bit system). However, this doesn't affect the
overall explanation here since both sizes align naturally.

Why Does This Matter?

The presence of virtual methods introduces a small space overhead, but it’s a worthwhile tradeoff for the powerful dynamic dispatch mechanism they provide. This is why objects with virtual methods are generally larger than those without.

How is inheritance implemented? I mean, efficiently implemented, as in C++, Java, and such?
There are multiple ways to implement inheritance. But there is a little secret for the efficient
implementation: a shared object layout! Let me explain with a comparison between C++ and a
similar implementation in C, showing how code would be generated for the C++ version:


Shared Layout

In C++:

●​ When a derived class inherits from a base class, the base class's members are placed at the beginning of the derived class's memory layout. This ensures that the base class's fields and methods are at consistent offsets, regardless of whether the object is accessed as a base class or a derived class.

In the C code:

●​ We manually ensure that Elf starts with the same layout as Animal. This is achieved
by including the same fields (age and method pointers) in Elf as in Animal, which
guarantees that casting between Animal* and Elf* works correctly.

Dynamic Dispatch (Virtual Methods)

In C++:

●​ Dynamic dispatch is achieved through a vtable and vptr mechanism. The base class
has a hidden pointer (vptr) to its vtable, and the derived class overrides the relevant
entries in the vtable.

In our C code:

●​ We achieve dynamic dispatch by explicitly including function pointers (inc_age and get_age) in the Animal struct. These function pointers can be assigned different implementations in the base (animal_inc_age) and derived (elf_inc_age) "classes."

Adding Derived-Class-Specific Methods

In C++:

●​ Derived classes can add new fields and methods while still preserving the base class
layout.

In the C code:

●​ Elf extends Animal by adding new fields (name) and methods (set_name and get_name). We ensure compatibility by keeping the Animal fields and method pointers at the same offsets in Elf.

Who Manages the Layout?

In C++:

●​ The compiler automatically manages the layout of base and derived classes, as well as
dynamic dispatch via vtables.

In C:

●​ We have to manually:


○​ Ensure layout compatibility between base and derived "classes."


○​ Implement dynamic dispatch through explicit function pointers.
○​ Take care of type safety when casting between base and derived types.

Conclusion

Our approach to inheritance in C mimics C++'s behavior quite closely:

●​ Shared layout ensures compatibility.


●​ Function pointers simulate virtual methods and dynamic dispatch.
●​ Additional fields and methods in Elf represent derived-class-specific features.

However, in C++, the compiler automates all of this, making the code cleaner, safer, and more
maintainable!

One of the key features of Object-Oriented Programming is the Liskov Substitution Principle. What's this?
The Liskov Substitution Principle (LSP) is one of the fundamental principles of
object-oriented programming (OOP). It states:

"Objects of a superclass should be replaceable with objects of a subclass without altering the correctness of the program."

In simpler terms, if a class S is a subclass of class T, then anywhere you can use an object of
type T, you should be able to use an object of type S without breaking the program.

This principle ensures that inheritance promotes true substitutability, enabling polymorphism and
modular code design.

Key Requirements of LSP

For LSP to hold, the following must be true:

1.​ Behavioral Consistency:


○​ A subclass should maintain the expected behavior of its superclass.
○​ A subclass can add new behaviors but must not violate the behaviors guaranteed
by the superclass.
2.​ No Strengthening Preconditions:
○​ A subclass must not impose stricter requirements on input arguments than its
superclass.
3.​ No Weakening Postconditions:
○​ A subclass must fulfill at least the guarantees of the superclass's methods.
4.​ Invariants:


○​ The subclass must preserve the invariants of the superclass (conditions that
must always be true for an object's state).

Real-World Significance

Adhering to LSP ensures:

1.​ Code Reusability: Subclasses can be used in place of their superclasses without
introducing bugs.
2.​ Polymorphism: The code remains modular and flexible, supporting late binding
effectively.
3.​ Robust Design: It avoids surprises for developers, as derived classes behave
consistently with their base classes.

Violating the Principle

Adhering to the Liskov Substitution Principle (LSP) is not an automatic property of every
class hierarchy. A class hierarchy can fail to meet LSP when a subclass violates the behavioral
expectations established by its superclass. Such violations typically arise when the subclass
alters the semantics of inherited methods or introduces constraints that are incompatible with
the parent class's guarantees, as in the example below:
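A sketch of that example in Java (a reconstruction based on the discussion below):

```java
class Bird {
    public void fly() { System.out.println("Flying"); }
}

class Sparrow extends Bird {
    // Still flies: clients of Bird get the behavior they expect (LSP holds).
    @Override
    public void fly() { System.out.println("Sparrow flying"); }
}

class Penguin extends Bird {
    // Violates LSP: callers of Bird.fly() do not expect an exception.
    @Override
    public void fly() {
        throw new UnsupportedOperationException("Penguins cannot fly");
    }
}

public class Main {
    public static void main(String[] args) {
        Bird[] birds = { new Sparrow(), new Penguin() };
        for (Bird b : birds) {
            try {
                b.fly();
            } catch (UnsupportedOperationException e) {
                System.out.println("LSP violation: " + e.getMessage());
            }
        }
    }
}
```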


In the example, the Sparrow class adheres to LSP because it maintains the expected behavior
of its superclass: calling fly() on a Sparrow instance produces the same type of behavior
one would expect from any Bird. However, the Penguin class violates LSP because it
redefines fly() in a way that contradicts the behavior promised by the Bird class. Specifically,
instead of performing an action (flying), the method throws an exception. This divergence
means that a Penguin cannot be substituted for a Bird in contexts where the fly() method
is expected to work without failure.

So, this trick of having the layout of the superclass be a prefix of the layout of the subclass enables the substitution principle, right?
Yes, the idea of having the superclass's layout as a prefix of the subclass's layout is the
cornerstone of how a lower-level implementation of an object-oriented language enables this
form of subtyping.

Subtyping in C Through Struct Layout

In object-oriented programming, subtyping means that an object of a subclass can be used wherever an object of its superclass is expected. In languages like C++, this is achieved with inheritance and polymorphism, supported directly by the language. In our C implementation, we simulate inheritance and subtyping by carefully designing the layout of structs. In this way, we can run code like below:

The key idea is structural compatibility: the fields of the superclass (Animal) are arranged as
a prefix of the subclass (Elf). This means the memory layout of an Elf object begins with the
same fields (in the same order) as an Animal. As a result:

1.​ Pointer Compatibility: When a pointer to an Elf is cast to a pointer to an Animal, the
casted pointer points to the Animal-compatible part of the Elf. The fields and method
pointers of Animal remain accessible as if the object were just an Animal.
2.​ Polymorphism Through Function Pointers: In our implementation, each struct
contains function pointers for its methods (e.g., get_age and inc_age). This design
allows overriding superclass methods in the subclass (e.g., Elf overrides inc_age).
When a function like get_age is called on an Animal pointer, it uses the method
pointer stored in the object, whether the object is an Animal or an Elf. This achieves
polymorphic behavior.

Summary

The C implementation achieves subtyping by designing structs with the superclass as a prefix of
the subclass. Function pointers enable polymorphism, allowing subclass methods to override
superclass methods. This design is a simulation of inheritance and demonstrates how principles
from object-oriented programming can be implemented in a procedural language like C.

So, in the end, we can find the target of any method call with two
dereferences. Is that really the case?
This principle is indeed true for statically typed languages that support virtual method tables
(v-tables).

Here’s a breakdown of why this works:


Virtual Table Mechanism

1.​ Class Layout and v-tables:


○​ Each class has a virtual table that contains pointers to its virtual methods. The
table is essentially a static array of function pointers, one for each virtual method.
○​ Objects of the class store a pointer to their class's v-table. This pointer is often
referred to as the v-pointer and is stored at a fixed offset in the object
(commonly the first field).
2.​ Polymorphic Calls:
○​ When you make a polymorphic method call like d.eat(), the compiler translates
it into steps:
1.​ Dereference the v-pointer in the object to get the address of the v-table.
2.​ Use a constant offset (determined at compile time) to index into the
v-table and fetch the method's address.
3.​ Call the method using the retrieved function pointer.

Efficiency

This two-step process ensures that finding the correct function to execute in a polymorphic call
involves just two pointer dereferences:

1.​ The first dereference retrieves the address of the v-table from the object.
2.​ The second dereference retrieves the function pointer from the v-table.

Why This is Constant-Time

●​ The offset for each virtual method in the v-table is fixed at compile time, so there’s no
runtime searching involved.
●​ This constant-time access is why virtual method calls are highly efficient and well-suited
for object-oriented programming.

Example:


The figure illustrates that:

●​ For d.eat(), the compiler knows the offset of eat in the v-table.
●​ Regardless of whether d is an Animal, Mammal, or Dog, the mechanism ensures the
correct eat implementation is called dynamically (based on the actual type of the
object).

Comparison to Other Mechanisms

If virtual tables weren’t used, polymorphic method calls could require more complex
mechanisms, such as:

●​ Dynamic lookups in a method resolution table.


●​ Explicit runtime type checks.

These approaches would be slower compared to the constant-time v-table method.

Caveat: Indirection Overhead


Although efficient, the two-pointer dereference adds a slight overhead compared to direct calls
(non-virtual methods), which don't require table lookups. However, the trade-off is often
acceptable for the flexibility of polymorphism.

In conclusion, polymorphic method calls in languages that use virtual tables (like C++, Java,
etc.) require just two pointer dereferences, ensuring they remain both flexible and efficient.

How were object-oriented languages invented?


The invention of object-oriented (OO) programming was a gradual process influenced by ideas
in mathematics, computer science, and software engineering. Here's a concise history of how
OO languages came to be:

1. Roots in Simula and Simulation (1960s)

●​ Simula (1962, Norway; its object-oriented features arrived with Simula 67):


○​ Created by Ole-Johan Dahl and Kristen Nygaard at the Norwegian Computing
Center.
○​ Designed for simulation programming, it introduced objects as a way to model
"real-world entities" (like customers, banks, or vehicles) in simulations.
○​ Objects combined data (state) and methods (behavior), laying the groundwork
for object-oriented programming.
●​ Core Innovations in Simula:
○​ Classes and Objects: Defined reusable templates (classes) and instances
(objects).
○​ Inheritance: Allowed classes to be extended and specialized.
○​ Encapsulation: Bundled data and behavior within objects, hiding internal details.

2. The Rise of Smalltalk (1970s)

●​ Alan Kay and Smalltalk (1972, Xerox PARC):


○​ Alan Kay, along with others like Dan Ingalls and Adele Goldberg, developed
Smalltalk, the first fully object-oriented language.
○​ Inspired by biological metaphors (cells interacting via messages), Smalltalk
emphasized message passing: objects communicate by sending and receiving
messages.
○​ Smalltalk popularized dynamic typing and the idea that "everything is an
object," including numbers, strings, and even classes.
●​ Key Ideas in Smalltalk:
○​ Polymorphism: Objects respond to the same message differently based on their
class.
○​ Graphical User Interfaces (GUIs): Smalltalk's environment influenced modern
GUIs, including those in macOS and Windows.


●​ Alan Kay coined the term "object-oriented programming" and envisioned OO as a tool for
user empowerment and problem-solving.

3. Influence of Procedural Languages (1970s-1980s)

●​ Many early OO languages blended object-oriented features with procedural programming.
●​ C++ (1983, Bjarne Stroustrup):
○​ Extended the procedural language C by adding classes, objects, inheritance, and
polymorphism.
○​ Provided static typing and performance close to C, making it suitable for systems
programming.
●​ Objective-C (1984, Brad Cox):
○​ Combined C with Smalltalk-style message passing, influencing macOS and iOS
development.

4. Mainstream Adoption (1990s)

●​ The 1990s saw OO programming become mainstream, with languages like:


○​ Java (1995): Simplified C++ with garbage collection, a virtual machine (JVM),
and a "write once, run anywhere" philosophy.
○​ Python (1991): Made OO accessible by blending it with scripting and dynamic
typing.
○​ Ruby (1995): Focused on simplicity and developer happiness, heavily inspired by
Smalltalk.
●​ OO became the dominant paradigm for large-scale software development, driven by:
○​ Code Reusability: Inheritance and polymorphism allowed better modularity and
reuse.
○​ Encapsulation: Made complex systems easier to manage by hiding
implementation details.

5. OO Meets Functional Programming (2000s and Beyond)

●​ Modern OO languages like Scala, Kotlin, and Swift integrate OO with functional
programming (FP) concepts like immutability and higher-order functions.
●​ Languages like Python and JavaScript offer hybrid paradigms, blending OO,
procedural, and functional styles.

Why Was OO Invented?

The rise of OO programming was driven by the need to:

1.​ Model complex systems in a way that mirrors the real world (e.g., simulations).
2.​ Reduce software complexity by promoting modularity and code reuse.


3.​ Enable easier collaboration and maintenance of large codebases.

Alan Kay summarized it best:

"The big idea is 'messaging.' The key in making great and growable systems is to
have the right metaphors."

Final Thought

OO programming evolved from a need to create natural, intuitive models of systems. Over time,
it grew into a flexible paradigm that balances structure and creativity, making it one of the most
influential concepts in software development.

Who invented the term "Object-Oriented" as an adjective for programming languages?
It seems that the term "object-oriented" was applied to a programming language for the first time
in the MIT CSG Memo 137 (April 1976) by Anita K. Jones and Barbara H. Liskov; a similar text
was published in the IEEE Transactions on Software Engineering later in the year (see
https://ieeexplore.ieee.org/document/1702384).

So, Barbara Liskov is very important in the history of programming languages. What did she do?
Barbara Liskov is one of the most influential figures in the history of programming languages
and computer science. Her contributions laid the groundwork for critical concepts in software
engineering and programming language design, many of which we use today. Here's an
overview of her key contributions:


1. Data Abstraction and CLU (1970s)

●​ What She Did:


○​ Barbara Liskov designed and developed CLU, a programming language created
at MIT in the 1970s.
○​ CLU introduced data abstraction, a revolutionary concept at the time. This
allowed programmers to define abstract data types (ADTs), encapsulating data
and associated operations within a single structure.
●​ Key Innovations in CLU:
○​ Abstract Data Types (ADTs): Enabled programmers to separate an object’s
implementation from its interface, promoting modularity.
○​ Iterators: Introduced the concept of "yield" for iterating over collections, which
later influenced languages like Python.
○​ Parameterized Types (Generics): Allowed functions and data structures to
operate on any type, which inspired later features in Java, C++, and other
modern languages.
○​ Exceptions: CLU included a robust mechanism for handling errors.
●​ Impact:
○​ CLU didn’t gain widespread use but heavily influenced modern object-oriented
languages such as C++, Java, and Python.


○​ The focus on modularity and abstraction became a core principle in software engineering.

2. The Liskov Substitution Principle (LSP)

●​ What It Is:
○​ LSP, introduced in a 1987 conference paper with Jeanette Wing, states:​
"If S is a subtype of T, then objects of type T may be replaced with objects of type
S without altering the correctness of the program."
●​ Why It Matters:
○​ This principle formalized a key aspect of object-oriented design: ensuring that
subclasses can be substituted for their parent classes without breaking
functionality.
○​ LSP is a cornerstone of object-oriented programming (OOP) and is part of the
SOLID principles, a set of guidelines for designing maintainable software.
●​ Example:
○​ If a class Bird has a method fly, then a subclass Penguin must either support
the concept of fly or not be considered a subclass of Bird. Violating LSP leads
to unexpected behavior.

3. Distributed Systems and Argus

●​ What She Did:


○​ In the 1980s, Liskov developed Argus, a programming language designed for
distributed systems.
○​ Argus extended CLU and introduced constructs to simplify the development of
distributed programs, focusing on fault tolerance and reliability.
●​ Key Features:
○​ Guardians: Encapsulated objects in distributed systems that persisted across
failures.
○​ Atomic Transactions: Enabled systems to maintain consistency in the face of
partial failures, laying the groundwork for modern distributed databases and
systems.
●​ Impact:
○​ Argus was ahead of its time and provided concepts used in contemporary
distributed systems and cloud computing.

4. Awards and Recognitions

●​ Turing Award (2008):


○​ She received the ACM Turing Award, often called the "Nobel Prize of
Computing," for her contributions to programming languages, software
engineering, and distributed systems.
●​ National Academy of Engineering:


○​ Liskov is a member of the NAE, recognizing her as a pioneer in computer science.
●​ IEEE John von Neumann Medal (2004):
○​ For her fundamental contributions to computer programming languages.

5. Broader Influence

●​ Education:
○​ Liskov is a professor at MIT and has mentored generations of computer
scientists.
●​ Philosophy:
○​ She advocated for modularity, abstraction, and correctness in software
design, which became universal principles in software engineering.

Why She’s Important

Barbara Liskov’s work fundamentally shaped how we design and think about software.
Concepts like modularity, abstraction, OOP principles, and fault tolerance in distributed
systems owe much to her innovations. Even though her languages (CLU, Argus) aren’t widely
used today, their ideas became the bedrock of modern programming languages and practices.


Heap Allocation
Code: https://github.com/pronesto/DCC053_Public/tree/main/21_HeapAllocation

How are variables, in a typical C program, allocated in memory?


In a typical C program, variables can be allocated in three main ways: static, stack, and heap
allocation. Each method has unique characteristics, purposes, and lifetimes:

1. Static Allocation

●​ What it is: Variables are allocated a fixed memory location during program compilation
and remain in that location for the program's entire runtime.
●​ Where it happens: In the data or BSS segments of memory.
●​ Lifetime: The entire program execution.
●​ Scope: Can be local to a file, block, or global (depending on the declaration).

Examples:

Global variables: Declared outside of all functions.​

int globalVar = 10; // Always available during runtime.

Static local variables: Declared with the static keyword inside a function; they retain their
value across function calls:​

void myFunction() {
static int counter = 0; // Retains value between calls.
counter++;
}

●​ Advantages:
○​ Predictable, fixed lifetime.
○​ No dynamic management required.
●​ Disadvantages:
○​ Memory is reserved for the entire program, even if the variable is not actively
used.

2. Stack Allocation

●​ What it is: Variables are allocated in a last-in, first-out (LIFO) manner from the program's
stack when a function is called. They are automatically deallocated when the function
returns.
●​ Where it happens: In the stack segment of memory.
●​ Lifetime: Limited to the scope of the function in which they are declared.


●​ Scope: Local to the block or function.

Examples:

Local variables: Declared inside a function or block without the static keyword.​

void myFunction() {
// Automatically deallocated when function exits.
int localVar = 5;
}

●​ Advantages:
○​ Fast allocation and deallocation (just moving the stack pointer).
○​ Memory is automatically managed.
●​ Disadvantages:
○​ Limited memory size (stack overflow can occur if too many variables are
allocated).
○​ Variables cannot outlive their function scope (no persistence).

3. Heap Allocation

●​ What it is: Memory is manually allocated and freed by the programmer at runtime using
functions like malloc(), calloc(), realloc(), and free().
●​ Where it happens: In the heap segment of memory.
●​ Lifetime: Controlled by the programmer; it persists until explicitly deallocated with
free().
●​ Scope: Determined by the pointer used to access the allocated memory.

Examples:

Dynamic memory allocation for arrays or objects:​

// Allocate space for 10 integers.
int* ptr = malloc(10 * sizeof(int));
if (ptr != NULL) {
    ptr[0] = 42; // Use the allocated memory.
    free(ptr);   // Free memory when done.
}

●​ Advantages:
○​ Flexibility to allocate memory as needed.
○​ Lifetime of variables can extend beyond the function scope that allocated them.
●​ Disadvantages:
○​ Slower than stack allocation due to manual management.


○​ Prone to errors like memory leaks (forgetting to free memory) or dangling


pointers (using freed memory).

Comparison

Aspect         Static Allocation           Stack Allocation       Heap Allocation
Managed by     Compiler                    Compiler/runtime       Programmer
Lifetime       Entire program              Function scope         Manual (until free())
Speed          Fast                        Fast                   Slower (runtime calls)
Use case       Persistent variables        Temporary/local vars   Dynamic memory needs
Common issues  Fixed size at compile-time  Stack overflow         Memory leaks

Why do programming languages need a heap?


A primary reason why we need a heap is to allocate memory for data that outlives the functions
or blocks that create it. However, this isn't the only reason. Here's a detailed breakdown of why
the heap is necessary:

1. Data with a Lifetime Beyond Function Scope

When you create local variables using stack allocation, they are automatically deallocated
once the function exits. If you need data that persists after the function ends, you can't use the
stack:

int* createArray() {
// Allocate memory on the heap.
int* arr = malloc(10 * sizeof(int));
// The data persists even after this function exits.
return arr;
}

Without the heap, you wouldn't be able to return dynamically allocated arrays or objects.

2. Dynamically Sized Data

The size of stack-allocated variables must generally be known at compile time (though some
compilers support Variable Length Arrays). For cases where the size of the data is determined
at runtime, the heap is essential.


int n;
// Read size at runtime.
scanf("%d", &n);
// Heap allocation for dynamic size.
int* arr = malloc(n * sizeof(int));

3. Sharing Data Across Different Scopes

Heap-allocated memory allows data to be shared between functions, threads, or even across
program components. As long as you have a reference (pointer) to the memory, you can access
it:

void setData(int** ptr) {
    // Allocate memory on the heap.
    *ptr = malloc(sizeof(int));
    **ptr = 42;
}

void useData() {
    int* data;
    setData(&data);
    // Data persists even after `setData` exits.
    printf("%d\n", *data);
    free(data);
}

4. Handling Large Data

The stack is limited in size and intended for small, short-lived data structures. Allocating large
arrays or objects on the stack risks stack overflow. Heap allocation enables handling larger data
structures without exhausting stack memory:

// Use heap for large arrays.
double* largeArray = malloc(1e6 * sizeof(double));

5. Flexible Memory Management

The heap offers the programmer full control over the lifetime of the data. You can allocate
memory when needed and free it when no longer required, optimizing memory usage.

for (int i = 0; i < 100; i++) {
    // Allocate memory for each iteration.
    char* buffer = malloc(256);
    // Use buffer...
    free(buffer); // Free memory when done.
}

Conclusion

The heap complements the stack by allowing for:

1.​ Persistent data beyond function scope.
2.​ Dynamically sized and large data structures.
3.​ Shared data across different program components.
4.​ Flexible, on-demand memory management.

Without the heap, many complex programs (like those managing dynamic data structures,
persistent objects, or large datasets) would be much harder to write efficiently!

If the heap can do so much, why not use it exclusively?


While the heap is flexible, it has disadvantages:

●​ Slower allocation/deallocation: Heap allocation involves system calls and is less efficient than stack allocation.
●​ Memory leaks: Forgetting to free() allocated memory results in leaks, which can
degrade performance over time.
●​ Fragmentation: Repeated allocations and deallocations can fragment memory, reducing
available contiguous memory blocks.
●​ Complexity: Managing heap memory correctly requires discipline and careful coding.

Can you write a simple C program that illustrates how the heap is
managed?
Sure. Here's a very simple implementation of the heap, which assumes that every word is
eight bits long:

#include <stdio.h>
#include <stdint.h>

#define HEAP_SIZE 128 // Total size of the simulated heap.

static uint8_t heap[HEAP_SIZE]; // The heap memory pool.

// Metadata for each memory block.
typedef struct BlockHeader {
    uint8_t size;    // Size of the block (excluding header).
    uint8_t is_free; // Whether the block is free.
    uint8_t next;    // Offset to the next block in the heap.
} BlockHeader;

// Pointer (offset) to the start of the free list.
static uint8_t free_list = 0; // Start at the beginning of the heap.

// Initialize the heap.
void init_heap() {
    BlockHeader* initial = (BlockHeader*)heap;
    initial->size = HEAP_SIZE - sizeof(BlockHeader);
    initial->is_free = 1;
    initial->next = 0; // End of the list.
}

// malloc implementation.
void* my_malloc(uint8_t size) {
    uint8_t current_offset = free_list;

    while (current_offset < HEAP_SIZE) {
        BlockHeader* current = (BlockHeader*)&heap[current_offset];

        if (current->is_free && current->size >= size) {
            // Found a suitable block.
            uint8_t remaining = current->size - size;

            if (remaining > sizeof(BlockHeader)) {
                // Split the block.
                uint8_t new_offset = current_offset + sizeof(BlockHeader) + size;
                BlockHeader* new_block = (BlockHeader*)&heap[new_offset];
                new_block->size = remaining - sizeof(BlockHeader);
                new_block->is_free = 1;
                new_block->next = current->next;

                current->next = new_offset;
                current->size = size;
            }

            current->is_free = 0; // Mark the block as allocated.
            return &heap[current_offset + sizeof(BlockHeader)];
        }

        current_offset = current->next;
        if (current_offset == 0) {
            break; // Reached the end of the list.
        }
    }

    // No suitable block found.
    return NULL;
}


// free implementation.
void my_free(void* ptr) {
    if (!ptr) {
        return;
    }

    uint8_t block_offset = (uint8_t*)ptr - heap - sizeof(BlockHeader);
    BlockHeader* block = (BlockHeader*)&heap[block_offset];
    block->is_free = 1;
}

// Debug function to print the heap state.
void print_heap() {
    uint8_t current_offset = free_list;

    printf("Heap state:\n");
    while (current_offset < HEAP_SIZE) {
        BlockHeader* current = (BlockHeader*)&heap[current_offset];
        printf("Block at offset %u: size=%u, is_free=%d, next=%u\n",
               current_offset, current->size, current->is_free, current->next);

        current_offset = current->next;
        if (current_offset == 0) {
            break; // End of the list.
        }
    }
}

// Debug function to dump the raw bytes of the heap.
void dump() {
    printf("Heap State:\n");
    for (int i = 0; i < HEAP_SIZE; i += 8) {
        for (int j = i; j < i + 8; ++j) {
            printf("%4d: %4d, ", j, heap[j]);
        }
        printf("\n");
    }
}

Seems complicated. Can you explain how this code works?


This program implements a basic memory allocator using a heap represented as a fixed-size
array. It introduces concepts such as pointers, block metadata, and how memory is managed
through a linked list of free blocks. Here, we will explain the key principles behind the allocator
and its operations: initialization, allocation (my_malloc), freeing (my_free), and how the
heap's state is organized and visualized.

The Heap and Metadata


The heap is simulated as a fixed-size array of bytes (heap). Each memory block in the heap
has a header (BlockHeader), which stores:

1.​ size: The size of the block's usable space (in bytes), excluding the header.
2.​ is_free: A flag indicating whether the block is free (1) or allocated (0).
3.​ next: The offset (relative to the start of the heap) of the next block in the list.

This metadata helps organize the heap into a linked list of blocks.

Pointers and Linked List of Blocks

In this context, a "pointer" is represented by an offset into the heap array. The allocator starts
with a free_list, which is the offset of the first block. Each block header has a next field that
points to the next block's offset, forming a linked list.

By traversing this list, the allocator finds free blocks or reaches the end (next = 0).

Initializing the Heap (init_heap)

When the program begins, the entire heap is one large free block. The init_heap function
creates this initial block by:

1.​ Setting its size to the total heap size minus the size of the header.
2.​ Marking the block as free (is_free = 1).
3.​ Setting the next field to 0, indicating the end of the list.

Allocating Memory (my_malloc)

The my_malloc function finds a suitable block to allocate memory and optionally splits the
block into smaller ones:

1.​ Search for a free block: It traverses the linked list of blocks using the next pointer,
checking each block's is_free flag and size.
2.​ Block splitting: If the block is larger than the requested size by at least the size of a
header, the block is split:
○​ The first part is allocated and updated with the requested size.
○​ The remaining portion becomes a new free block with its own header.
3.​ Marking the block: The selected block is marked as allocated (is_free = 0).
4.​ Return pointer: The function returns a pointer to the usable memory (the address after
the header).

If no suitable block is found, NULL is returned.

Freeing Memory (my_free)


The my_free function marks a block as free by:

1.​ Calculating the block's offset in the heap from the pointer given to the function.
2.​ Setting the block's is_free flag to 1.

This enables future allocations to reuse the block.

Debugging and Visualization

Two utility functions help visualize the heap state:

1.​ print_heap traverses the list and prints the properties (size, is_free, next) of each
block. This is useful to understand the current organization of the heap.
2.​ dump displays the raw byte values of the heap array in chunks, offering a low-level view
of the memory state.

Important Principles

1.​ Block Splitting: Efficient memory usage is achieved by splitting large free blocks when
the requested size is smaller.
2.​ Linked List Traversal: Allocation relies on traversing the linked list, which may lead to
performance issues if the list becomes long.

This simple allocator demonstrates the core ideas behind dynamic memory management, laying
the foundation for more sophisticated techniques like best-fit or buddy allocation algorithms.

Geez! What in the name of heavens is happening in (BlockHeader *)&heap[current_offset]?
That's C at its finest hour! Let’s break down the code and address this assignment step by step:

1. What happens in (BlockHeader *)&heap[current_offset]?

This operation involves several steps:

1.​ Array indexing: heap[current_offset] accesses the element at index current_offset in the heap array.
○​ heap is declared as a uint8_t array, so each element is a single byte.
2.​ Address-of operation: &heap[current_offset] takes the address of the element at
the specified index. This gives a pointer to the byte at position current_offset in the
array.
3.​ Pointer casting: (BlockHeader *) casts the address to a pointer of type
BlockHeader.


As a result, (BlockHeader *)&heap[current_offset] treats the memory starting at
heap[current_offset] as if it were the start of a BlockHeader structure.

2. Why is this cast allowed?

In C, casting between pointer types (e.g., from uint8_t * to BlockHeader *) is allowed.
The programmer is effectively instructing the compiler to interpret the memory at that address as
a BlockHeader.

However, it’s the programmer’s responsibility to ensure that:

●​ The memory at that location is properly aligned for the type being cast to.
●​ There is enough space at that address to hold the entire BlockHeader structure.

3. Is this safe?

This casting can be safe if the following conditions are met:

●​ Alignment: The memory address &heap[current_offset] must be aligned according to the alignment requirements of BlockHeader. Misaligned access can lead
to undefined behavior, especially on architectures that require strict alignment.
○​ On most modern architectures, alignment is typically determined by the largest
primitive in the structure. For example, if BlockHeader contains only uint8_t
fields, it is typically byte-aligned and safe in this context. However, if it contained
larger fields (e.g., uint32_t), stricter alignment would be required.
●​ Bounds: The program must ensure there are enough bytes remaining in the heap array
at current_offset to store the entire BlockHeader structure. If current_offset
+ sizeof(BlockHeader) exceeds HEAP_SIZE, this results in a buffer overflow,
leading to undefined behavior.

4. Is this verified?

Well, not by the language runtime! Neither of the following properties is verified by the code
that the C compiler generates:

●​ The alignment of heap[current_offset] matches the requirements of BlockHeader.
●​ There is sufficient space in heap to safely read/write a BlockHeader.

Verifying these properties is the responsibility of the programmer. Proper implementations would
include checks for these conditions before performing the cast or accessing memory.

5. Order of operations in (BlockHeader *)&heap[current_offset]


The operations are performed in this order:

1.​ Array indexing (heap[current_offset]): This calculates the memory address of the byte at index current_offset in the array.
2.​ Address-of operation (&heap[current_offset]): This gives the address of the byte
at the computed index.
3.​ Casting ((BlockHeader *)): This tells the compiler to interpret the address as
pointing to a BlockHeader.

6. What happens if there are not enough bytes for the struct?

If there are not enough bytes in the array starting at current_offset to cover the
BlockHeader structure, the code will cause undefined behavior when attempting to access
memory beyond the bounds of the array. This could manifest as:

●​ Reading or writing unintended memory (buffer overflow).
●​ Crashes due to accessing invalid memory.
●​ Corrupting unrelated program data.

7. Best practices to avoid problems

To make this code safer:

●​ Bounds checking: Before casting, ensure current_offset + sizeof(BlockHeader) <= HEAP_SIZE.
●​ Alignment checking: Ensure the address &heap[current_offset] is correctly
aligned for BlockHeader. If necessary, adjust current_offset to align it properly.
●​ Clear documentation: Indicate that heap is intended to store a sequence of
BlockHeader structures, making the purpose of the cast explicit.

For example:

if (current_offset + sizeof(BlockHeader) > HEAP_SIZE) {
    // Handle error: not enough space.
    return NULL;
}

// Assuming alignment of BlockHeader is guaranteed or aligned storage is used.
BlockHeader *current = (BlockHeader *)&heap[current_offset];

In summary, the cast is permissible in C but relies on assumptions about alignment and bounds
that are not inherently checked by the compiler. Proper verification is necessary to ensure safety
and correctness.


Can you go over a simple example of allocations and deallocations?


Sure! Consider the following program:

int main() {
    init_heap();
    void* a = my_malloc(20);
    void* b = my_malloc(30);
    my_free(a);
    void* c = my_malloc(10);
    my_free(b);
}

This figure illustrates the state of the heap after executing the sequence of memory allocation
and deallocation operations in the provided program. Each subfigure represents a snapshot of
the heap at different points in the program's execution.


Initial State


The first subfigure shows the state of the heap after the init_heap function is called. At this
point:

●​ The entire heap is represented as a single free block.
●​ The block has a size of 125 bytes, marked as free (is_free = 1), and its next pointer
is 0, indicating the end of the list.
●​ The free list (FL) points to the start of this block.

After Allocating 20 Bytes (my_malloc(20))

In the second subfigure:

●​ A block of 20 bytes is allocated from the beginning of the heap.
●​ The allocated block is marked as not free (is_free = 0) and has its size set to 20 bytes.
●​ The remaining portion of the heap is split into a new free block of size 102 bytes. This
block is added to the free list and updated to reflect its is_free = 1 and next = 0.

The free list (FL) now points to the free block starting at the offset after the allocated block.

After Allocating 30 Bytes (my_malloc(30))

In the third subfigure:

●​ The allocator traverses the free list to find a block that can satisfy the request.
●​ The free block of 102 bytes is split into an allocated block of 30 bytes and a new free
block of size 69 bytes.
●​ The allocated block is marked as not free (is_free = 0), while the remaining free
block is added back to the free list.

The free list (FL) points to the newly created free block starting at the offset after the allocated
block.

After Freeing the First Block (my_free(a))

In the fourth subfigure:

●​ The block that was allocated with 20 bytes is returned to the free list.
●​ Its is_free field is updated to 1, marking it as free.
●​ The block is now part of the free list, linked to the remaining free blocks in the heap.

The free list (FL) points to the newly freed block, which links to the next free block in the list.


After Allocating 10 Bytes (my_malloc(10))

In the fifth subfigure:

●​ The allocator searches the free list for a block large enough to accommodate 10 bytes.
●​ The block of 20 bytes (previously freed) is split into an allocated block of 10 bytes and a
free block of 7 bytes.
●​ The allocated block is marked as not free (is_free = 0), while the remaining portion
remains free.

The free list now begins at the free block of 7 bytes, which links to the next free block in the
heap.

After Freeing the Second Block (my_free(b))

In the final subfigure:

●​ The block of 30 bytes is returned to the free list.
●​ Its is_free field is updated to 1, marking it as free.
●​ The free block is added back to the list and linked to the other free blocks.

At this point, the free list contains three free blocks: 7 bytes, 56 bytes, and 69 bytes. The
allocator could potentially coalesce adjacent free blocks into larger ones to reduce
fragmentation, though this implementation does not include that feature.

One of the problems with this implementation of a memory manager is "fragmentation". What is this?
Fragmentation in memory management refers to a situation where the heap becomes
inefficiently utilized, making it difficult to allocate new memory blocks even though there might
be sufficient total free memory. There are two primary types of fragmentation: internal
fragmentation and external fragmentation.

1. Internal Fragmentation

●​ Definition: Internal fragmentation occurs when the allocated memory block is larger than
the memory actually requested by the program. The "extra" memory inside the block is
wasted because it cannot be used for any other purpose.
●​ Cause in this system: If the allocator rounds up requests to the nearest block size or
includes metadata inside the block (e.g., size, is_free, next), this can lead to
unused space within an allocated block. Our implementation does not suffer from internal
fragmentation, because blocks are sized to match exactly the request plus metadata. The
flip side is that our implementation does not keep memory addresses aligned, which real
allocators do at the cost of some internal fragmentation.


2. External Fragmentation

●​ Definition: External fragmentation occurs when free memory is scattered in small,


non-contiguous blocks across the heap. Even though the total amount of free memory
might be enough to fulfill a large allocation request, no single free block is large enough
to satisfy it.
●​ Cause in this system: As memory blocks are allocated and freed, the heap becomes
fragmented because free blocks are not merged (or "coalesced") when adjacent blocks
are freed. Over time, this can lead to many small free blocks scattered throughout the
heap, as shown in the final subfigure.

For example:

●​ In the final state of the heap in the figure, the free list contains blocks of 7 bytes, 56
bytes, and 69 bytes.
●​ If a request for 80 bytes is made, it will fail despite there being 132 bytes of free memory
in total. None of the individual free blocks is large enough to satisfy the request.

Why is Fragmentation a Problem?

1.​ Wasted Memory: Fragmentation reduces the effective usable memory. Even though the
heap has free space, the allocator might not be able to use it efficiently.
2.​ Allocation Failures: Programs may fail to allocate memory for large requests, even
when enough total free memory exists.
3.​ Performance Overhead: Searching through a fragmented free list can increase
allocation time.

How to Address Fragmentation

●​ Coalescing: Merge adjacent free blocks into a single larger block whenever possible.
For example, if two free blocks are adjacent in memory, they can be combined to reduce
fragmentation.
●​ Compaction: Move allocated blocks closer together to create larger contiguous free
blocks. However, this is usually impractical for heaps due to the complexity of moving
allocated memory.
●​ Better Allocation Strategies: Use algorithms like best fit (allocate the smallest block
that fits the request) or buddy allocation (divide memory into power-of-2 sizes) to reduce
fragmentation.

By addressing fragmentation, the heap system can maintain better utilization and serve larger
allocation requests more effectively.


How can we add coalescing to our simple heap?


To implement coalescing, we need to modify the my_free function so that it merges adjacent
free blocks. This involves checking the blocks before and after the block being freed and
combining them into a single larger block if they are also free.

Here's the updated code with coalescing support:

// free implementation with coalescing.
void my_free(void* ptr) {
    if (!ptr) {
        return;
    }

    uint8_t block_offset = (uint8_t*)ptr - heap - sizeof(BlockHeader);
    BlockHeader* block = (BlockHeader*)&heap[block_offset];
    block->is_free = 1;

    // Attempt to coalesce with the next block.
    if (block->next != 0) {
        BlockHeader* next_block = (BlockHeader*)&heap[block->next];
        if (next_block->is_free) {
            block->size += sizeof(BlockHeader) + next_block->size;
            block->next = next_block->next;
        }
    }
}

What Changed?

1.​ Coalescing with the Next Block:


○​ After marking a block as free, check if its next block is also free. If so, merge the
two blocks by combining their sizes and updating the next pointer.

Benefits of Coalescing

●​ Reduces external fragmentation by combining adjacent free blocks.


●​ Increases the likelihood of satisfying larger allocation requests.

This implementation ensures that the heap remains efficient even after multiple allocation and
free operations.

Can you show a simple example of coalescing in action?


Sure, the figure below illustrates the effects of the last three operations that the program below performs on the heap:


int main() {
    init_heap();
    void* a = my_malloc(20);
    void* b = my_malloc(30);
    // First figure:
    my_free(a);
    // Second figure:
    void* c = my_malloc(10);
    // Third figure:
    my_free(b);
    return 0;
}

The provided figure shows the state of the heap memory at three different stages during the
execution of the program. The changes are a result of allocating and freeing memory blocks
using the given my_malloc and my_free functions. Below is an explanation of each stage:

First Figure:

●​ Operation: my_free(a)
●​ The block allocated for a (size 20) is marked as free.
●​ However, the coalescing does not occur because the next block (allocated for b, size 30)
is not free.
●​ Observation: Coalescing only happens if the next block is free, which prevents
combining these two blocks.

Second Figure:


●​ Operation: my_malloc(10)
●​ A new block is allocated, splitting the previously freed block (size 20).
●​ The block is divided into two parts:
○​ A new allocated block (size 10).
○​ A remaining free block (the leftover bytes minus the metadata overhead of
the new header).
●​ The b block remains unchanged.

Third Figure:

●​ Operation: my_free(b)
●​ The block allocated for b (size 30) is marked as free.
●​ The allocator now attempts to coalesce this block with the next free block (size 10).
●​ Observation: Coalescing succeeds in this case because the next block is free. The two
blocks are merged into a single free block (size 30 + 10 + metadata).

Key Takeaway:

The figure highlights a limitation of the coalescing logic:

●​ In the first figure, no coalescing occurs when a is freed because the subsequent block
(b) is not free.
●​ The design simplifies coalescing by only considering adjacent free blocks (the current
block and its next neighbor). However, it can lead to fragmentation if there are
interleaving allocations and free blocks in the heap.

Could you also coalesce with the previous free block?


We can do that, but this coalescing operation would involve traversing the list of free blocks until
we find the immediate predecessor of the freed block. Remember: we are working with a singly
linked list! The code below works, but the linear search it requires makes this an expensive way
to coalesce:

// free implementation with coalescing.
void my_free(void* ptr) {
    if (!ptr) {
        return;
    }

    uint8_t block_offset = (uint8_t*)ptr - heap - sizeof(BlockHeader);
    BlockHeader* block = (BlockHeader*)&heap[block_offset];
    block->is_free = 1;

    // Attempt to coalesce with the next block.
    ...

    // Attempt to coalesce with the previous block.
    uint8_t current_offset = free_list;
    while (current_offset < HEAP_SIZE) {
        BlockHeader* current = (BlockHeader*)&heap[current_offset];

        if (current->next == block_offset) {
            // Found the previous block.
            if (current->is_free) {
                current->size += sizeof(BlockHeader) + block->size;
                current->next = block->next;
            }
            break;
        }

        current_offset = current->next;
        if (current_offset == 0) {
            break;
        }
    }
}

Would it be possible to move coalescing to the language runtime?


Absolutely! But what is a "language runtime", first of all? The term "language runtime" refers to
the software layer responsible for executing a program written in a particular programming
language. It provides essential services such as memory management, garbage collection,
thread scheduling, and other tasks that abstract low-level hardware details from the
programmer.

In this context, moving memory coalescing to the language runtime means shifting the
responsibility for optimizing memory layout (e.g., merging free blocks to reduce fragmentation)
from user-defined functions like my_free to the runtime system of the programming language.

What does "language runtime" entail?

1.​ Execution Environment:


○​ It’s the environment in which the compiled or interpreted program runs.
○​ Examples include the Java Virtual Machine (JVM) for Java, the Common
Language Runtime (CLR) for C#, and the CPython runtime for Python.
2.​ Managed Memory:
○​ Many modern runtimes include automated memory management features such
as garbage collection (GC) and heap compaction.


○​ Memory coalescing would be part of the runtime's strategy for managing memory
efficiently without programmer intervention.
3.​ Abstraction Layer:
○​ The runtime provides an abstraction over the underlying system, making
programs portable and less dependent on platform-specific details.

Memory Coalescing in the Runtime

If memory coalescing were handled by the runtime:

●​ The runtime would periodically scan and optimize the memory heap (e.g., by combining
adjacent free blocks).
●​ This could be done:
○​ During garbage collection, where unused memory is reclaimed.
○​ When memory allocation fails due to fragmentation.
○​ At specific intervals or idle periods, as decided by the runtime's internal
heuristics.

Examples of Runtimes Handling Memory Coalescing

1.​ Managed Runtimes:


○​ JVM (Java):
■​ The JVM’s garbage collector performs coalescing and compaction as part
of its memory management routine.
○​ .NET CLR (C#):
■​ Coalescing happens automatically during heap defragmentation, which is
part of the garbage collector’s duties.
2.​ Languages without Built-in GC:
○​ In languages like C and C++, there is no runtime that handles memory
automatically. Instead, developers rely on tools like malloc and free or
implement their own memory allocators (like your example).

In Our Case

If memory coalescing were moved to the runtime:

●​ Your my_free function would no longer need to worry about coalescing memory
immediately.
●​ The runtime (a simulated one for teaching purposes) would handle coalescing at its
discretion, through a function like your coalesce_memory.

This approach ties into the concept of managed runtimes like the JVM, which students will
likely encounter in high-level languages, while still giving them hands-on experience with
low-level memory concepts in C.


So, some languages have a heavy runtime, like Java, and others have a
light runtime, like C?
Exactly! Programming languages differ in the complexity and functionality of their runtime
environments, which can be classified as heavy or light based on the responsibilities they take
on. Let's break this down:

Heavy Runtimes (e.g., Java, Python, C#)

These languages come with a runtime environment that manages many aspects of program
execution for the developer. This leads to easier programming but at the cost of performance
overhead and complexity.

Features of Heavy Runtimes:

1.​ Memory Management:


○​ Automated garbage collection (e.g., JVM for Java, CLR for C#).
○​ Coalescing and defragmentation of memory are often part of garbage collection.
2.​ Dynamic Features:
○​ Support for reflection, runtime type checking, dynamic typing (Python, Ruby).
○​ Just-In-Time (JIT) compilation for optimization during program execution.
3.​ Error Handling:
○​ Exception handling integrated into the runtime.
○​ Runtime checks for null pointers, out-of-bounds access, etc.
4.​ Portability:
○​ Abstracts hardware specifics, allowing programs to run on multiple platforms
without modification (e.g., Java’s "Write Once, Run Anywhere").
5.​ Concurrency Management:
○​ High-level support for multithreading and concurrency (e.g., managed thread
pools in Java and Python).

Examples of Heavy Runtimes:

●​ Java: The JVM performs garbage collection, JIT compilation, and memory optimization.
●​ Python: The CPython runtime interprets code, handles garbage collection, and
manages dynamic typing.
●​ C#: The CLR provides garbage collection, JIT compilation, and advanced runtime
features like LINQ.

Light Runtimes (e.g., C, C++)

These languages have minimal runtime environments, providing low-level control and
requiring the programmer to manage most aspects of program execution.

Features of Light Runtimes:


1.​ Minimal Overhead:


○​ No garbage collection or automated memory management.
○​ Direct access to hardware and system resources.
2.​ Manual Memory Management:
○​ Programmers must allocate (malloc) and deallocate (free) memory explicitly.
3.​ Performance-Oriented:
○​ Fewer abstractions mean faster execution, making these languages suitable for
systems programming, embedded systems, and real-time applications.
4.​ Limited Built-In Features:
○​ Error checking, concurrency, and dynamic features must often be implemented
by the programmer or through libraries.
5.​ Closer to the Hardware:
○​ The runtime is typically just the C standard library (e.g., libc) and minimal
startup code, making it much lighter.

Examples of Light Runtimes:

●​ C: The runtime is little more than libc and basic startup code.
●​ C++: Adds slightly more runtime overhead than C (e.g., dynamic_cast, exceptions,
virtual function tables), but still lightweight compared to Java or Python.

Comparison

Aspect               Heavy Runtime (e.g., Java)                 Light Runtime (e.g., C)
Memory Management    Automated (e.g., garbage collection).      Manual (malloc/free).
Error Checking       Runtime exceptions, type safety checks.    Minimal; programmer's responsibility.
Portability          Highly portable (platform-independent      Platform-dependent; compiled binaries.
                     runtimes).
Performance          Slower due to runtime overhead.            Faster, but requires careful coding.
Complexity           Larger and more complex runtime systems.   Minimal runtime with low-level control.
Abstractions         High-level, hides system details.          Low-level, exposes hardware details.

How Does This Relate to Our Example?


●​ In C, we implemented memory management (my_free and coalesce_memory)


ourselves. This is because C's runtime is light and doesn’t handle these tasks
automatically.
●​ In a managed language like Java, memory management (including coalescing) would
be hidden from us and handled by the runtime.

Why Choose One Over the Other?

●​ Heavy runtimes are ideal for:


○​ Rapid development.
○​ Applications where developer productivity and safety matter more than raw
performance (e.g., web apps, enterprise software).
●​ Light runtimes are ideal for:
○​ Applications that require full control over memory and system resources (e.g.,
operating systems, game engines, embedded systems).

By learning about memory coalescing and runtime management in C, we are gaining insight into
how lightweight runtimes work and laying the groundwork for understanding how heavy
runtimes automate these processes.

What would an implementation of memory coalescing in the language runtime look like?

Implementing a coalesce_memory function that traverses the list of blocks and performs
coalescing when possible is a great way to simulate the behavior of a managed runtime. This
"deferred" approach mirrors how garbage collectors in managed runtimes (e.g., Java Virtual
Machine or .NET) often perform compaction or defragmentation in the background.

Below is an example implementation for your coalesce_memory function:

Updated Code
void coalesce_memory() {
    // Start from the beginning of the heap.
    BlockHeader* block = (BlockHeader*)heap;

    while (block->next != 0) { // Traverse until the last block.
        BlockHeader* next_block = (BlockHeader*)&heap[block->next];

        if (block->is_free && next_block->is_free) {
            // Coalesce current block with the next block.
            block->size += sizeof(BlockHeader) + next_block->size;
            block->next = next_block->next; // Link to the next-next block.
        } else {
            block = next_block; // Move to the next block.
        }
    }
}
}

Key Concepts for Teaching

1.​ Managed Runtimes:


○​ This function simulates what happens in a managed runtime where background
processes optimize memory (e.g., heap compaction).
○​ Explain to students that in a real-world setting, this routine would be invoked
periodically by the runtime environment, not manually by the programmer.
2.​ Deferred Coalescing:
○​ Unlike immediate coalescing in my_free, the coalescing is deferred until
coalesce_memory is explicitly called.
○​ This demonstrates the trade-off between performance and memory optimization,
as immediate coalescing can be computationally expensive.
3.​ Traversal of Free List:
○​ Highlight how the linked structure of blocks enables traversal and manipulation.
○​ This is an excellent opportunity to explain fragmentation and how coalescing
reduces it.
4.​ Garbage Collection Analogy:
○​ You can compare this function to the "mark-and-sweep" phase of garbage
collectors, where memory regions are inspected and optimized.

Example Usage
int main() {
init_heap();
void* a = my_malloc(20);
void* b = my_malloc(30);

my_free(a); // Mark block 'a' as free.


my_free(b); // Mark block 'b' as free, but no coalescing happens
yet.

// Call the coalescing function to optimize memory.


// This code would be invoked by the language
// runtime, without human intervention. Here, we
// are calling it explicitly just as an example.
coalesce_memory();

return 0;
}

Pros and Cons of Deferred Coalescing


●​ Pros:
○​ Simplifies my_free.
○​ Coalescing happens only when necessary, reducing overhead during frequent
malloc/free operations.
●​ Cons:
○​ Increased fragmentation until coalesce_memory is called.
○​ The runtime requires additional logic to decide when to invoke coalescing.

This deferred coalescing mechanism provides an excellent teaching tool to demonstrate memory management trade-offs in both low-level systems and high-level managed runtimes!

Can you show an example of this managed version of memory coalescing in action?
The figure below demonstrates how memory coalescing works to combine adjacent free
memory blocks into a single larger block, optimizing the use of the heap:


1.​ Initial State (Top Left):


○​ The memory heap is represented as a linked list of blocks.
○​ Each block has a size, an allocated flag (1 for allocated, 0 for free), and a
pointer to the next block.
2.​ After Calling my_free(b) (Top Right):
○​ The block b (of size 20) is marked as free (allocated = 0).
○​ However, no merging (coalescing) happens yet, so the heap still contains
separate free and allocated blocks.
3.​ After Allocating 10 Bytes with my_malloc(10) (Bottom Left):
○​ A new block of size 10 is allocated.
○​ Since no coalescing has occurred yet, the heap remains fragmented with free
blocks scattered between allocated blocks.
4.​ After Calling coalesce_memory() (Bottom Right):
○​ The function traverses the list of blocks and merges adjacent free blocks into a
single, larger block.
○​ In this case, the blocks with sizes 20 and 69 are merged into a single free block
of size 112.
○​ This process reduces fragmentation and makes more contiguous memory
available for future allocations.

Key Lesson:

●​ Coalescing is a vital step in memory management to prevent fragmentation, especially in
manual memory management systems like those in C or low-level environments. In
managed runtimes, similar behavior is typically handled by garbage collectors.

I would like to take an actual look at an implementation of malloc and free. Where can I find these implementations?

You can find the source code for the malloc and free functions in the GNU C Library (glibc).
This library is a critical component of most Linux systems, providing essential system functions
like memory allocation.

Here's how you can access the source code:

1.​ Download glibc Source Code:


○​ You can download the latest version of glibc from the GNU website or directly
clone a mirror of the glibc repository from GitHub: https://github.com/lattera/glibc
2.​ Locate the malloc.c File:
○​ Once you've downloaded or cloned the repository, navigate to the malloc
directory.
○​ You'll find the implementations of both malloc and free in the malloc.c
file; glibc does not keep free in a separate free.c.


Understanding the Implementation:

While the exact implementation details can be complex, the core idea behind malloc and free
is to manage a heap, a region of memory that can be dynamically allocated and deallocated.

Here's a simplified overview:

1.​ malloc:
○​ When you call malloc(size), it searches the heap for a free block of memory
that is large enough to accommodate the requested size.
○​ If a suitable block is found, it is allocated to the caller, and the remaining free
space (if any) is marked as available for future allocations.
○​ If no suitable block is found, the system may request more memory from the
operating system.
2.​ free:
○​ When you call free(ptr), it marks the memory block pointed to by ptr as free.
○​ This freed block may be merged with adjacent free blocks to form a larger free
block, improving memory utilization.

A Note on Complexity:

The actual implementation of malloc and free in glibc is significantly more intricate, involving
various memory management techniques like:

●​ Memory Pooling: Dividing the heap into smaller pools to optimize allocation and
deallocation.
●​ Coalescing: Merging adjacent free blocks to reduce fragmentation.
●​ Memory Alignment: Ensuring that allocated blocks are aligned to specific boundaries to
optimize performance.

One problem with unmanaged programming languages like C and C++ is


the possibility of memory error. Can you give me examples?
Unmanaged memory systems, like in C or C++, require the programmer to explicitly allocate
and deallocate memory using functions like malloc and free. While this gives control over
memory, it introduces significant risks:

Memory Leaks:

1.​ A memory leak happens when allocated memory is not properly freed after it is no
longer needed.

void problem() {
    int* i = (int*) malloc(sizeof(int));
    *i = 3;
    printf("%d\n", *i);
}

2.​ The malloc allocates memory, but there’s no free call to release it. Each time
problem() is called, more memory is allocated and never returned, eventually
exhausting the available memory.

Dangling Pointers:

3.​ A dangling pointer occurs when memory is freed but the pointer still refers to that
memory location, leading to unpredictable behavior if it is accessed again.

void dangling() {
    int* i = (int*) malloc(sizeof(int));
    int* j;
    *i = 3;
    free(i);
    j = (int*) malloc(sizeof(int));
    *j = 8;
    printf("%d\n", *i);
}

4.​ The free(i) releases the memory allocated for i, but the pointer i still points to that
memory. When j is allocated, it may reuse the same memory location. Writing through j
then changes the value read through i, so the final printf exhibits undefined behavior.

So, what are the advantages and disadvantages of using a managed memory system?

Managed Memory Systems: Advantages

Managed runtimes (e.g., Java Virtual Machine, Python interpreter) handle memory allocation
and deallocation automatically. This approach offers several advantages:

1.​ Eliminates Common Memory Bugs:


○​ No Memory Leaks: Managed runtimes use garbage collection to track and
reclaim unused memory automatically, ensuring memory is released when it’s no
longer referenced.
○​ No Dangling Pointers: The garbage collector only frees memory that is no
longer reachable, so live pointers can never refer to freed memory.
2.​ Simplified Programming:
○​ Developers can focus on writing logic without worrying about manually managing
memory.
○​ This reduces cognitive load and accelerates development.


3.​ Improved Stability:


○​ Managed runtimes prevent issues like double free() or use-after-free errors,
making programs more robust and less prone to crashing.

Managed Memory Systems: Disadvantages

While managed runtimes solve many problems, they introduce trade-offs:

1.​ Performance Overhead:


○​ Garbage collection adds runtime overhead as it periodically scans and manages
memory. This can lead to pauses in program execution (known as
"stop-the-world" pauses).
○​ The performance might be less predictable, especially in latency-sensitive
applications.
2.​ Reduced Control:
○​ Developers have less control over when and how memory is freed. This lack of
precision can sometimes lead to higher memory usage compared to manual
memory management.
3.​ Increased Complexity in the Runtime:
○​ The runtime system itself becomes more complex, requiring additional memory
and processing power to manage memory efficiently.
4.​ Portability and Size Constraints:
○​ Managed runtimes are often larger and may not be suitable for embedded
systems or resource-constrained environments where every byte of memory
matters.

Summary

Managed memory systems simplify programming and eliminate many common bugs, like
memory leaks and dangling pointers. However, they come at the cost of performance overhead,
reduced control, and increased runtime complexity. The choice between managed and
unmanaged systems depends on the application's requirements. For example:

●​ A real-time system with strict performance needs might favor unmanaged memory.
●​ A web application, where developer productivity and stability are crucial, might benefit
more from a managed runtime.

One of the key features of a managed-memory runtime is garbage collection. Do you know what that is?
Garbage collection (GC) is a key feature of managed-memory runtimes. It's the process of
automatically identifying and reclaiming memory that a program is no longer using, to prevent
memory leaks and ensure efficient memory use. This removes the burden of manual memory


management (like malloc and free in C) from the programmer, reducing the risk of errors
such as double frees, dangling pointers, or forgetting to free memory.

Key Components of Garbage Collection:

1.​ Allocation: Memory is allocated from the heap for objects or data structures when
needed by the program.
2.​ Tracing or Counting: A mechanism tracks whether memory is still in use. This can
involve:
○​ Reference Counting: Keeping a count of how many references point to an
object, as we discussed earlier.
○​ Tracing: Using algorithms like mark-and-sweep or generational GC to traverse
memory and identify "live" (in use) versus "dead" (unreachable) objects.
3.​ Reclamation: When memory is determined to be unused, it is reclaimed and returned to
the heap for future allocations.

Common Garbage Collection Algorithms:

●​ Reference Counting: Simple but can’t handle cyclic references (e.g., two objects
referencing each other but not used elsewhere).
●​ Mark-and-Sweep: A tracing algorithm where reachable objects are "marked," and
unmarked objects are swept up as garbage.
●​ Generational GC: Divides memory into regions (young, old) to optimize for the fact that
most objects die young.
●​ Copying GC: Divides the heap into two spaces and copies live objects to the other
space, compacting memory in the process.
●​ Stop-the-World GC: Pauses the program while garbage collection is running.
●​ Incremental or Concurrent GC: Runs alongside the program to minimize pauses.

Benefits of Garbage Collection:

●​ Automatic Memory Management: Less manual effort for developers.


●​ Safety: Reduces risks of common bugs like use-after-free or memory leaks.
●​ Efficiency: Advanced GC algorithms can compact memory to reduce fragmentation.

However, GC can also introduce runtime overhead, potentially causing pauses in program
execution. Managed runtimes like the JVM, the .NET CLR, and CPython strive to minimize this
impact through efficient GC designs.

One of the simplest types of garbage collector is the so-called reference-counting GC. Can you explain its basic ideas?
Reference-counting garbage collection is one of the simplest and oldest methods of automatic
memory management. It works by keeping a reference count for each object in memory. This


count tracks how many references (or pointers) currently point to the object. When the reference
count drops to zero, the object is no longer accessible and can be safely reclaimed.

How Reference-Counting GC Works

1.​ Increment the Count: When a new reference to an object is created (e.g., assigning a
pointer or variable), the reference count of the object is incremented.
2.​ Decrement the Count: When a reference is removed (e.g., when a pointer goes out of
scope or is reassigned), the reference count is decremented.
3.​ Reclaim Memory: If the reference count of an object reaches zero, the object is
considered unreachable and is immediately freed.

Advantages of Reference Counting

1.​ Simplicity: The implementation is straightforward, and memory can often be reclaimed
immediately when objects are no longer in use.
2.​ Deterministic: Since objects are deallocated as soon as their reference count reaches
zero, memory cleanup is predictable, which is particularly useful in systems requiring
low-latency operations.
3.​ Incremental: Unlike some other GC techniques (e.g., mark-and-sweep), reference
counting does not require pausing the program to traverse the entire memory space.

Disadvantages of Reference Counting

1.​ Cyclic References:
○​ Reference counting cannot handle cyclic references, where two or more objects
reference each other but are otherwise unreachable. Example:

struct Object {
    struct Object* ref;
};

struct Object* a = malloc(sizeof(struct Object));
struct Object* b = malloc(sizeof(struct Object));
a->ref = b;
b->ref = a;

// Once we leave the scope of local variables `a` and `b`, the
// chunks of memory they were referencing are no longer accessible.
// However, neither is freed because they reference each other!

○​ Even though a and b are no longer used, their reference counts never drop to
zero, resulting in a memory leak.
2.​ Runtime Overhead:
○​ Reference counts need to be updated every time a reference is created or
destroyed, which adds runtime overhead, especially in applications with many
reference operations.
3.​ Space Overhead:
○​ Each object requires additional memory to store its reference count.
4.​ No Compaction:
○​ Reference counting does not compact memory. Over time, memory can become
fragmented, leading to inefficient use of the heap.
5.​ Global Updates:
○​ In multi-threaded environments, managing reference counts may require locks or
atomic operations, which can hinder performance.

When to Use Reference Counting

Despite its limitations, reference counting is still used in scenarios where its simplicity and
deterministic behavior are advantageous. Examples include:

●​ Resource Management: Managing files, sockets, or other non-memory resources
where immediate cleanup is crucial.
●​ Languages and Systems: Used in Python's CPython implementation (combined with a
cycle-detecting GC) and Objective-C's ARC (Automatic Reference Counting).

Enhancements to Address Limitations

To overcome its weaknesses, reference counting can be combined with other techniques:

1.​ Cycle Detection: Periodically run an additional garbage collection algorithm (e.g.,
mark-and-sweep) to identify and clean up cyclic references.
2.​ Hybrid Approaches: Combine reference counting for deterministic cleanup with tracing
garbage collection for cycles and memory compaction.
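CPython follows exactly this hybrid recipe: reference counting for immediate reclamation, plus a periodic cycle detector exposed through the gc module. The sketch below (CPython-specific behavior) shows the detector reclaiming a cycle that reference counts alone can never free:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

gc.collect()              # start from a clean slate
a = Node()
b = Node()
a.ref = b
b.ref = a                 # cycle: the counts can never drop to zero on their own
del a, b                  # both objects are now unreachable from the program
collected = gc.collect()  # the cycle detector finds and reclaims them
assert collected >= 2     # at least the two Node objects were collected
```

Without the explicit gc.collect() call, the same cleanup would eventually happen when the detector runs on its own schedule.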

Can you modify our heap implementation by adding a reference-counting garbage collector to it?
Below is the full implementation of the heap allocator with reference counting added:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define HEAP_SIZE 128 // Total size of the simulated heap.

static uint8_t heap[HEAP_SIZE]; // The heap memory pool.

// Metadata for each memory block.
typedef struct BlockHeader {
    uint8_t size;      // Size of the block (excluding header).
    uint8_t is_free;   // Whether the block is free.
    uint8_t next;      // Offset to the next block in the heap.
    uint8_t ref_count; // Reference counter for garbage collection.
} BlockHeader;

// Pointer (offset) to the start of the free list.
static uint8_t free_list = 0; // Start at the beginning of the heap.

// Initialize the heap.
void init_heap() {
    BlockHeader* initial = (BlockHeader*)heap;
    initial->size = HEAP_SIZE - sizeof(BlockHeader);
    initial->is_free = 1;
    initial->next = 0;      // End of the list.
    initial->ref_count = 0; // Initialize reference count to 0.
}

// malloc implementation.
void* my_malloc(uint8_t size) {
    uint8_t current_offset = free_list;
    while (current_offset < HEAP_SIZE) {
        BlockHeader* current = (BlockHeader*)&heap[current_offset];
        if (current->is_free && current->size >= size) {
            // Found a suitable block.
            uint8_t remaining = current->size - size;
            if (remaining > sizeof(BlockHeader)) {
                // Split the block.
                uint8_t new_offset = current_offset + sizeof(BlockHeader) + size;
                BlockHeader* new_block = (BlockHeader*)&heap[new_offset];
                new_block->size = remaining - sizeof(BlockHeader);
                new_block->is_free = 1;
                new_block->ref_count = 0;
                new_block->next = current->next;
                current->next = new_offset;
                current->size = size;
            }
            // Mark the block as allocated.
            current->is_free = 0;
            // Initialize reference count to 1.
            current->ref_count = 1;
            return &heap[current_offset + sizeof(BlockHeader)];
        }
        current_offset = current->next;
        if (current_offset == 0) {
            break; // Reached the end of the list.
        }
    }
    // No suitable block found.
    return NULL;
}

// free implementation (which only the runtime will call!)
// We could make it private, if C had private functions...
void my_free(void* ptr) {
    if (!ptr) return;
    uint8_t block_offset = (uint8_t*)ptr - heap - sizeof(BlockHeader);
    BlockHeader* block = (BlockHeader*)&heap[block_offset];
    if (!block->is_free) {
        block->is_free = 1;
        // Attempt to coalesce with the next block.
        if (block->next != 0) {
            BlockHeader* next_block = (BlockHeader*)&heap[block->next];
            if (next_block->is_free) {
                block->size += sizeof(BlockHeader) + next_block->size;
                block->next = next_block->next;
            }
        }
    }
}

// Increment the reference count of a block.
void increment_ref(void* ptr) {
    if (!ptr)
        return;
    uint8_t block_offset = (uint8_t*)ptr - heap - sizeof(BlockHeader);
    BlockHeader* block = (BlockHeader*)&heap[block_offset];
    block->ref_count++;
}

// Decrement the reference count of a block and free it if the count
// reaches 0.
void decrement_ref(void* ptr) {
    if (!ptr)
        return;
    uint8_t block_offset = (uint8_t*)ptr - heap - sizeof(BlockHeader);
    BlockHeader* block = (BlockHeader*)&heap[block_offset];
    if (block->ref_count > 0) {
        block->ref_count--;
        if (block->ref_count == 0) {
            // Free the block when the reference count reaches 0.
            my_free(ptr);
        }
    }
}

This implementation incorporates reference counting for memory management while retaining
the basic structure of your heap allocator. It includes functions to increment and decrement
reference counters and automatically frees memory when the reference count reaches zero.
Adding reference counting to a heap involves several key modifications to track and manage the
number of references to each allocated memory block. Here's an explanation of the changes
needed:

1.​ Add a Reference Count Field to the Block Header
○​ Each memory block's metadata (BlockHeader) needs a new field, typically
called ref_count. This field is an integer or byte that keeps track of the number
of active references pointing to the block.
○​ For example, you can add a field uint8_t ref_count; to the BlockHeader
structure.
2.​ Initialize the Reference Count on Allocation
○​ When a block is allocated using my_malloc, its ref_count should be initialized
to 1, indicating that the block is being referenced by the code that requested the
memory.
3.​ Implement Functions to Increment and Decrement the Reference Count
○​ Create an increment_ref function to increase the reference count when a new
reference to the block is created.
○​ Similarly, implement a decrement_ref function to decrease the count and free
the block when the count reaches zero, as in the listing above.
4.​ Modify the my_free Function
○​ Normally, my_free immediately marks a block as free. However, with reference
counting, it should only do so when the reference count reaches zero. This
change ensures that blocks are not prematurely deallocated while still in use.
5.​ Account for Reference Counting in Other Memory Operations

○​ Ensure that all functions handling pointers (e.g., copying or assigning pointers)
appropriately call increment_ref and decrement_ref as needed to maintain
the correct reference counts.
6.​ Coalescing Free Blocks
○​ The logic for coalescing free blocks (merging adjacent free blocks) can remain
mostly unchanged but must be carefully applied to ensure it doesn't disrupt the
reference-counting system.
7.​ Handle Null or Invalid Pointers
○​ Both increment_ref and decrement_ref should include checks to handle
NULL pointers or invalid memory regions gracefully, avoiding potential errors.
8.​ Debugging and Monitoring
○​ Add debugging tools, such as a print_heap function, to display the reference
counts of blocks alongside their other metadata. This can help identify issues like
memory leaks (blocks with a reference count greater than zero but no valid
references).

These changes allow the heap to manage memory more effectively, ensuring blocks are freed
only when no references remain. However, this approach has limitations, such as failing to
handle cyclic references, which would require a more advanced garbage collection technique to
resolve.

How does a reference-counting garbage collector work in practice?


In a programming language or runtime system that uses a reference-counting garbage collector,
the management of reference counters happens automatically as part of the language's
semantics or runtime implementation. The increment_ref and decrement_ref functions (or
their equivalents) are invoked implicitly by the runtime whenever the program modifies
references to objects in memory. Here's how it works in practice:

1. Incrementing Reference Counts

The increment_ref function is called when a new reference to an object is created. This
typically happens in the following scenarios:

●​ Assignment:​
When a pointer or reference variable is assigned to point to an existing object, the
reference count of that object is incremented. For example, if b is assigned to point to a,
increment_ref is called for the object pointed to by a.​

●​ Passing as a Function Argument:​


When an object is passed as an argument to a function, the reference count is
incremented to ensure the object is not deallocated while in use by the function. For
example, if a is passed to a function some_function, the reference count of a is

incremented.​

●​ Returning from a Function:​


When a function returns an object, its reference count is incremented to reflect the new
reference held by the caller.

2. Decrementing Reference Counts

The decrement_ref function is called when a reference to an object is removed or goes out of
scope. This happens in scenarios such as:

●​ Reassigning a Reference:​
When a variable holding a reference to an object is reassigned to point to a different
object, the reference count of the old object is decremented. For example, if b is
reassigned to point to c, decrement_ref is called for the object previously pointed to
by b, and increment_ref is called for the object pointed to by c.​

●​ Going Out of Scope:​


When a local variable holding a reference goes out of scope (e.g., at the end of a block
or function), the reference count of the object is decremented. For instance, if a variable
b is declared inside a block, decrement_ref is automatically called for the object when
b goes out of scope.​

●​ Object Destruction:​
When a data structure (e.g., an array or a hash table) holding references to objects is
destroyed, decrement_ref is called for all the objects it references.

3. Automatic Invocation of Reference Count Updates

In a language with reference-counting garbage collection, the runtime system or compiler
automatically inserts calls to increment_ref and decrement_ref where needed. This
removes the burden of manual memory management from the programmer. For example:

●​ When you assign a value or pass it as an argument, the compiler inserts a call to
increment_ref.
●​ When a variable is reassigned or goes out of scope, the compiler inserts a call to
decrement_ref.

4. What Happens When Reference Count Reaches Zero?

When decrement_ref reduces the reference count of an object to zero, the garbage collector
immediately deallocates the memory for the object. This involves:

1.​ Freeing the memory occupied by the object itself.


2.​ If the object contains references to other objects, their reference counts are also
decremented recursively.

5. Example in a Hypothetical Language


def example():
    a = MyObject()  # ref_count(a) = 1
    b = a           # increment_ref(a); ref_count(a) = 2
    c = MyObject()  # ref_count(c) = 1
    b = c           # decrement_ref(a); increment_ref(c);
                    # ref_count(a) = 1, ref_count(c) = 2
    # At the end of the function, a, b, and c go out of scope:
    # decrement_ref(a); ref_count(a) = 0 -> free(a)
    # decrement_ref(c) twice, once for b and once for c;
    # ref_count(c) = 0 -> free(c)

6. Advantages of Automatic Management

●​ The programmer doesn’t need to manually manage reference counts.


●​ Memory is freed as soon as it's no longer referenced, preventing long delays in memory
reclamation (unlike in tracing garbage collectors).

7. Challenges with Automatic Reference Counting

1.​ Cyclic References: If objects form a reference cycle, their reference counts never reach
zero, leading to memory leaks. Special techniques (e.g., cycle detection or weak
references) are needed to address this.
2.​ Overhead: Incrementing and decrementing counters on every reference manipulation
adds computational overhead, which can slow down execution compared to manual
memory management.

In summary, the increment_ref and decrement_ref functions are invoked implicitly by the
runtime system during common operations like assignments, function calls, and variable scope
changes. This automation makes reference counting easy to use but comes with trade-offs in
terms of performance and handling edge cases like cycles.
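One of the tools mentioned above for dealing with cycles, weak references, can be tried directly in Python's standard library. A weak reference does not contribute to the reference count, so it cannot keep an object alive:

```python
import weakref

class Node:
    pass

n = Node()
r = weakref.ref(n)   # does not increment n's reference count
assert r() is n      # the object is still alive and reachable through r

del n                # the last strong reference disappears; count hits zero
assert r() is None   # the weak reference now reports the object as gone
```

This is why making one edge of a cycle weak (as Swift's weak references do) lets reference counting reclaim structures that would otherwise leak.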

Can you show me an example of how the reference-counting garbage collector would work on an actual programming language?
Sure. The figure below illustrates the usage of this type of garbage collector: it demonstrates
the lifecycle of objects in memory under a reference-counting garbage collector, providing a
step-by-step explanation of how reference counts change as references are assigned,
reassigned, or go out of scope.

Overview

The memory is managed using reference counts. Each object in memory has an associated
reference count (indicated in the red box). This count tracks how many active references point
to the object. When the reference count drops to zero, the garbage collector deallocates the
object since it is no longer in use.

Key Insights from the Figure

●​ Automatic Memory Management: Reference counting ensures that memory is
deallocated as soon as an object is no longer referenced, avoiding memory leaks in
simple cases.
●​ Reference Count Tracking: Each operation affecting a reference (assignment,
reassignment, or going out of scope) updates the corresponding reference count.
●​ Object Deallocation: The dashed lines and reference counts illustrate when and why
objects are deallocated.

Who inserts these calls to increment and decrement the counters?

In a language with reference-counting garbage collection, the calls to increment_ref and
decrement_ref are typically not manually written by the programmer. Instead, they are
automatically inserted by the compiler or code generator during the program's compilation
or translation phase. These calls are triggered by specific operations in the source code, e.g.:

●​ Variable assignments: Whenever a reference is created, reassigned, or passed to
another function, the compiler adds increment_ref or decrement_ref calls as
needed.
●​ Scope exits: When a variable goes out of scope, the compiler inserts a
decrement_ref call to ensure that the reference count is updated.
●​ Parameter passing: When a function is called, references passed as arguments trigger
increment_ref to ensure the function has its own "claim" on the object.

You can refer to these as "invisible instructions" because they are not explicitly visible to the
programmer in the source code. Their presence is inferred by the compiler based on how the
program manipulates references. This abstraction allows programmers to focus on writing code
without worrying about manually managing memory in many cases. However, these "invisible
instructions" are crucial for ensuring that the runtime system keeps track of reference counts
correctly and deallocates memory when it's no longer needed.

Can I say that the more "invisible instructions" a language has, the more
"dynamic" it is?
Yes, you're absolutely on the right track! The proportion of "invisible instructions" to "visible
instructions" could indeed be seen as a proxy for the dynamicity or runtime complexity of a
programming language. These "invisible instructions" represent the work that the runtime
system or compiler generates to manage the execution environment, ensuring correctness,
safety, and flexibility. Let's delve into this idea:

Why More "Invisible Instructions" Suggest Greater Dynamicity

Languages that require more runtime checks, management, and guarantees tend to rely
heavily on these invisible instructions. They handle tasks that are either not specified explicitly
by the programmer or are abstracted away to improve usability and safety. A language's

dynamicity increases with the complexity and number of such operations, as it reflects a higher
degree of runtime intervention.

Here are examples of how these "invisible instructions" manifest in different contexts:

1.​ Memory Management:


○​ Reference counting (e.g., Python, Swift): Insertions of increment_ref and
decrement_ref to manage object lifetimes dynamically.
○​ Garbage collection (e.g., Java, C#): While not directly visible as instructions,
memory is periodically reclaimed by a runtime GC thread that works behind the
scenes.
2.​ Type Systems:
○​ Strongly typed languages with runtime checks (e.g., Java): Inserting
instructions to enforce type coercions or validate casts.
○​ Dynamically typed languages (e.g., Python, JavaScript): Every operation
(e.g., +) may include type-checking instructions to confirm the operands are
compatible.
3.​ Memory Safety:
○​ Array bounds checking (e.g., Rust, Java): Every array access includes runtime
checks to prevent out-of-bounds errors.
○​ Null safety (e.g., Kotlin, Swift): Runtime checks to avoid dereferencing null
references.
4.​ Concurrency:
○​ Synchronization primitives like locks or atomics may be invisibly inserted to
ensure thread safety in concurrent environments.
5.​ Dynamic Features:
○​ Reflection and dynamic dispatch: Languages like Java and Python rely on
invisible runtime mechanisms for method resolution and introspection.
○​ Dynamic scoping or closures: Implementing closures often involves additional
runtime bookkeeping.
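In CPython, some of these invisible instructions can be made visible with the dis module: a single + in the source compiles to a generic binary-operation bytecode whose implementation performs the hidden type checks at run time:

```python
import dis

def add(a, b):
    return a + b

# The one visible "+" becomes a generic BINARY_* bytecode (BINARY_OP on
# recent CPython versions, BINARY_ADD on older ones); its implementation
# inspects the operand types on every execution.
ops = [ins.opname for ins in dis.get_instructions(add)]
assert any(op.startswith("BINARY") for op in ops)
```

The same source line in C would compile to a single typed machine instruction, with no residual type dispatch at run time.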

Proportion of Invisible to Visible Instructions

If we could measure the ratio of invisible to visible instructions in a program, it would give an
idea of how much the language/runtime abstracts from the programmer:

●​ Low Ratio (Static and Predictable): Languages like C or Assembly have minimal
runtime support. Most instructions are directly mapped to what the programmer writes.
Dynamicity is low.​

●​ Moderate Ratio (Hybrid Models): Languages like C++, Rust, or Go offer safety
features such as bounds checking or concurrency primitives but limit runtime
intervention.​

●​ High Ratio (Dynamic and Managed): Languages like Python, Java, or JavaScript rely
heavily on invisible instructions for runtime type checking, memory management, safety,
and flexibility.

Potential as a Metric

While the proportion of invisible to visible instructions is an intriguing way to quantify
"dynamicity," it is important to note:

1.​ Dynamicity Isn't Just Quantity: Some languages insert a few highly complex invisible
instructions (e.g., JVM's bytecode verification) rather than numerous simpler ones.​

2.​ Optimization Reduces Dynamism: Many "invisible instructions" can be optimized away
(e.g., just-in-time compilation may remove bounds checks for loops with known ranges).​

3.​ Context Matters: The ratio may vary significantly depending on the program's domain
(e.g., numerical computing versus web applications).

Conclusion

Yes, you can use the proportion of invisible to visible instructions as a measure of a language's
runtime dynamicity, but it should be seen as one facet of the broader runtime behavior. This
perspective helps highlight the trade-offs between performance, safety, and abstraction in
programming languages.

Could you give me an example of another kind of garbage collector?


A very important and widely used type of garbage collector is the mark-and-sweep garbage
collector.

What is a Mark-and-Sweep Garbage Collector?

Mark-and-sweep is one of the simplest and most foundational garbage collection algorithms. It
works in two phases: marking and sweeping. Here's how it operates:

1.​ Mark Phase:


○​ The garbage collector starts by identifying all the objects in memory that are still
"reachable" (i.e., those that can still be accessed by the program). These are
called live objects.
○​ It does this by starting from a set of root references (such as global variables,
local variables in functions, and active threads) and traversing the objects they
reference, marking each one as reachable.
○​ Any object that cannot be reached from these roots is considered "garbage."
2.​ Sweep Phase:

○​ After identifying all live objects, the garbage collector scans through the heap
memory and deallocates any object that was not marked during the marking
phase.
○​ This frees up memory space that can later be reused by the program.

How Does Mark-and-Sweep Work in Practice?

Imagine you have a program managing a heap with objects A, B, C, and D. Here's an example
of how the algorithm would work:

●​ Step 1 (Mark Phase):


○​ The program's variables (roots) reference objects A and B.
○​ Object A references object C.
○​ Object D has no references pointing to it.
○​ The garbage collector starts from the roots, marking A and B as reachable. It
then follows A’s reference to C and marks C as reachable.
○​ Object D is not reachable, so it is left unmarked.
●​ Step 2 (Sweep Phase):
○​ The garbage collector scans the heap.
○​ It keeps objects A, B, and C (because they are marked as reachable) and
deallocates D (because it is unmarked).
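The A, B, C, D scenario above can be replayed with a toy mark-and-sweep routine over a dictionary-based heap (a sketch for illustration, not a real collector):

```python
def mark_and_sweep(heap, roots):
    """heap maps each object name to the list of names it references."""
    marked = set()
    worklist = list(roots)
    while worklist:                   # mark phase: traverse from the roots
        name = worklist.pop()
        if name not in marked:
            marked.add(name)
            worklist.extend(heap[name])
    # sweep phase: keep only the marked (live) objects
    return {name: refs for name, refs in heap.items() if name in marked}

# Roots reference A and B; A references C; D is unreachable.
heap = {"A": ["C"], "B": [], "C": [], "D": []}
live = mark_and_sweep(heap, roots=["A", "B"])
assert set(live) == {"A", "B", "C"}   # D was unmarked and is reclaimed
```

Note that this scheme would also reclaim a cycle such as D referencing itself, since reachability from the roots, not reference counts, decides what survives.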

Advantages of Mark-and-Sweep

1.​ Handles Cycles: Unlike reference counting, mark-and-sweep can handle cyclic
references (e.g., two objects referencing each other but unreachable from the roots).
2.​ Simple to Implement: Its conceptual model is straightforward and easy to implement.
3.​ No Reference Counting Overhead: Unlike reference counting, there’s no need to
increment or decrement counters during normal execution.

Disadvantages of Mark-and-Sweep

1.​ Stop-the-World Pauses:


○​ The program execution is paused while the garbage collector runs, which can
cause noticeable delays, especially for large heaps.
2.​ Fragmentation:
○​ After sweeping, the memory can become fragmented (small chunks of free
memory scattered across the heap), which may make it difficult to allocate large
contiguous blocks.
3.​ Full Heap Scan:
○​ It requires scanning the entire heap, even if only a small portion of memory is
being used.

Why Is Mark-and-Sweep Important?

Mark-and-sweep is important because it is the foundation of many modern garbage collectors.
Although it has some limitations, advanced garbage collection algorithms (such as generational
garbage collectors or concurrent garbage collectors) build on its basic principles to improve
performance and efficiency. Understanding mark-and-sweep gives students a solid base for
learning about these more sophisticated methods.

Can you give me examples of well-known garbage collectors used in popular programming languages?
Here are some famous language implementations and the types of garbage collectors they use,
along with a brief explanation of why each implementation chooses its particular garbage
collection strategy.

1. Java (Java Virtual Machine - JVM)

●​ Garbage Collector: Generational Garbage Collector (e.g., G1 GC, Parallel GC, ZGC,
Shenandoah).
●​ How It Works:
○​ Memory is divided into generations: Young Generation (for short-lived objects)
and Old Generation (for long-lived objects).
○​ Short-lived objects are collected frequently in the Young Generation, which
avoids scanning the entire heap.
○​ Advanced garbage collectors like G1 GC and ZGC focus on minimizing pause
times for large-scale applications.
●​ Why?
○​ Java programs often create many short-lived objects (e.g., temporary objects in
loops), so generational collection is efficient.
○​ JVM targets enterprise applications, where responsiveness and scalability are
crucial.

2. Python (CPython)

●​ Garbage Collector: Reference Counting + Cycle Detection.


●​ How It Works:
○​ Python primarily uses reference counting to track the number of references to an
object. When the count drops to zero, the object is deallocated.
○​ To handle cycles (e.g., two objects referencing each other), Python has a
separate cycle detector that periodically checks for unreachable cyclic garbage.
●​ Why?
○​ Reference counting provides immediate deallocation, which works well for many
Python workloads.
○​ However, cycles are a limitation of reference counting, so the cycle detector adds
robustness.

3. C# (.NET Framework and .NET Core)

●​ Garbage Collector: Generational Garbage Collector with Compaction.


●​ How It Works:
○​ Similar to Java, .NET uses generational garbage collection with distinct heaps for
short-lived and long-lived objects.
○​ After collecting garbage, the collector compacts memory to reduce fragmentation,
making allocation faster.
●​ Why?
○​ .NET targets high-performance applications, so compacting the heap ensures
efficient use of memory.
○​ Generational collection fits well with typical C# workloads, which also involve
many temporary objects.

4. Go (Golang)

●​ Garbage Collector: Concurrent Mark-and-Sweep.


●​ How It Works:
○​ Go uses a non-generational, concurrent garbage collector.
○​ The collector runs alongside the program, identifying and sweeping unreachable
objects while minimizing pause times.
●​ Why?
○​ Go is designed for systems programming and microservices, where low-latency
and scalability are critical.
○​ The non-generational design keeps the implementation simpler and predictable
for developers.

5. JavaScript (V8 Engine in Chrome, Node.js)

●​ Garbage Collector: Generational Garbage Collector with Incremental Mark-and-Sweep.


●​ How It Works:
○​ Objects are divided into young and old generations.
○​ The young generation is collected frequently with minimal overhead.
○​ The old generation uses incremental marking, where the garbage collection work
is split into small chunks to avoid long pauses.
●​ Why?
○​ JavaScript is used in interactive web applications, where responsiveness is
paramount.
○​ Incremental collection reduces the noticeable delays caused by garbage
collection.

6. Rust

●​ Garbage Collector: None (Manual Memory Management).

●​ How It Works:
○​ Rust uses a concept called ownership with borrow checking at compile time to
manage memory.
○​ The compiler enforces strict rules about when memory can be accessed or freed,
ensuring safety without a runtime garbage collector.
●​ Why?
○​ Rust is designed for systems programming, where deterministic performance and
minimal runtime overhead are essential.

7. Ruby (MRI - Matz’s Ruby Interpreter)

●​ Garbage Collector: Generational Mark-and-Sweep.


●​ How It Works:
○​ Similar to Java, Ruby divides objects into generations.
○​ The garbage collector uses a marking phase to identify live objects and a
sweeping phase to reclaim memory.
○​ In modern implementations, Ruby's GC is concurrent to reduce pause times.
●​ Why?
○​ Ruby applications, especially web frameworks like Rails, benefit from managing
many objects efficiently while maintaining responsiveness.

8. Haskell (GHC - Glasgow Haskell Compiler)

●​ Garbage Collector: Generational Garbage Collector.


●​ How It Works:
○​ GHC employs a generational garbage collector tuned for functional programming
workloads.
○​ Since Haskell encourages immutable data structures, many objects are
short-lived and can be collected efficiently in the young generation.
●​ Why?
○​ Functional programming generates a lot of temporary objects due to immutability,
so generational GC is a natural fit.

9. Swift

●​ Garbage Collector: Reference Counting (with Automatic Reference Counting - ARC).


●​ How It Works:
○​ Swift uses reference counting to track object lifetimes. References are
incremented and decremented automatically, and objects are deallocated when
their count drops to zero.
○​ Swift's compiler ensures that reference counting is optimized, and it provides
mechanisms to avoid cycles (e.g., weak references).
●​ Why?

○​ Swift prioritizes predictable performance for applications like iOS development,
where ARC offers low-latency memory management without a full GC pause.

10. Erlang

●​ Garbage Collector: Per-Process Garbage Collector (based on Generational GC).


●​ How It Works:
○​ Each Erlang process has its own isolated heap, and garbage collection is
performed individually for each process.
○​ This approach reduces pauses and improves concurrency in applications with
many processes.
●​ Why?
○​ Erlang is designed for fault-tolerant, concurrent systems. Its GC ensures minimal
interference between processes, maintaining high performance in distributed
applications.

Summary

Garbage collection strategies vary widely depending on the language's goals:

●​ Generational GC (Java, C#) is great for managing short-lived and long-lived objects
efficiently.
●​ Reference Counting (Python, Swift) provides immediate deallocation but struggles with
cycles.
●​ Concurrent GC (Go) minimizes pause times for real-time systems.
●​ No GC (Rust) gives ultimate control and predictability at the cost of manual memory
management.

Each strategy reflects trade-offs between performance, responsiveness, simplicity, and safety!

Who invented the Mark-and-Sweep Garbage Collector?


That was John McCarthy. We talked about him in our class on functions!

How did he invent that?

●​ Pioneer of Garbage Collection: McCarthy is credited with inventing garbage collection in 1959 as part of his work on the programming language LISP. Garbage collection was introduced to automate memory management and eliminate the need for programmers to explicitly allocate and free memory.
●​ Motivation: LISP's functional programming paradigm relied heavily on creating and
discarding linked data structures like lists. Manually managing memory for these
structures was error-prone, so McCarthy developed garbage collection to ensure
memory safety and reduce programmer burden.


●​ Mark-and-Sweep Algorithm: McCarthy's original algorithm was the mark-and-sweep garbage collector. It was a simple and elegant solution:
1.​ Mark all reachable objects starting from "roots" (e.g., variables in scope).
2.​ Sweep through memory to reclaim unreachable objects.
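The two steps above can be sketched in a few lines of Python over a toy object graph (the Obj class and its field names are invented for this illustration):

```python
# A minimal mark-and-sweep sketch over a toy heap of objects.
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other objects
        self.marked = False

def mark(roots):
    """Mark phase: flag every object reachable from the roots."""
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if not obj.marked:
            obj.marked = True
            worklist.extend(obj.refs)

def sweep(heap):
    """Sweep phase: keep marked objects and reclaim the rest."""
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False     # reset the mark bits for the next collection
    return live

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)             # a -> b; c is unreachable garbage
heap = [a, b, c]
mark([a])                    # the root set contains only a
heap = sweep(heap)
print([o.name for o in heap])   # → ['a', 'b']
```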

Broader Contributions

McCarthy's work on garbage collection was transformative because it laid the foundation for
decades of advancements in memory management. Nearly all modern garbage collection
techniques (e.g., generational, concurrent, incremental) build on ideas rooted in McCarthy's
original invention.

Why is This Important?

Without McCarthy's contribution, programming languages might have evolved differently. His
invention of garbage collection popularized higher-level programming languages by abstracting
away low-level details like memory allocation and deallocation. For these reasons, John
McCarthy is a central figure in the history of garbage collectors and programming languages!

And who invented Generational Garbage Collection?


That was David Ungar! Ungar popularized generational garbage collection with his 1984 work on Generation Scavenging, building on earlier lifetime-based collection ideas published by Henry Lieberman and Carl Hewitt. This approach is based on the observation that most objects in programs tend to have very short lifespans (e.g., temporary variables or intermediate results). Generational garbage collection splits memory into "generations", focusing garbage collection efforts on younger objects. It significantly improved the performance of garbage collectors by reducing the frequency of full-heap scans, and it has since become a cornerstone of modern garbage collection, used in implementations like the Java Virtual Machine (JVM) and .NET CLR.


David Ungar is a highly influential figure in computer science, particularly in the fields of
programming languages, memory management, and software performance. Here are some of
his major contributions:

1. The Self Programming Language

●​ Development of Self: Along with Randall Smith, Ungar co-created Self, a prototype-based, object-oriented programming language in the late 1980s.
○​ Prototype-Based Object Orientation: Self introduced a simpler alternative to
class-based object orientation, where objects are cloned from prototypes rather
than instantiated from classes.
○​ Dynamic and Interactive Programming: Self emphasized simplicity and
flexibility, paving the way for languages like JavaScript, which adopted
prototype-based inheritance.
●​ Performance Optimizations: Self's development led to groundbreaking research in
just-in-time (JIT) compilation and adaptive optimization techniques, making dynamic
languages more efficient.

2. Pioneering Work on Generational Garbage Collection

●​ Generational Garbage Collection: Ungar's work on generational garbage collection popularized the idea of splitting memory into "young" and "old" generations.
○​ This innovation exploited the "weak generational hypothesis," which observes
that most objects in programs are short-lived. The approach allowed garbage
collectors to focus more efficiently on memory regions with frequent allocation
and deallocation.
○​ Generational garbage collection is a core technique in modern runtime
environments, such as the Java Virtual Machine (JVM) and the .NET Common
Language Runtime (CLR).

3. Advances in Just-In-Time (JIT) Compilation

●​ Adaptive Optimization: Ungar's research into optimizing dynamic languages like Self
led to advancements in adaptive compilation, a technique that analyzes program
behavior at runtime and dynamically optimizes performance-critical paths.
○​ This research inspired many modern JIT systems, such as the ones used in
Java, Python, and JavaScript engines like V8 and SpiderMonkey.

4. Visual Programming Environments

●​ Innovative User Interfaces: Ungar's work on Self extended beyond the language itself
to include the design of highly interactive and intuitive programming environments.


○​ Self's environment featured direct manipulation of objects and graphical representations of the runtime state, setting early standards for developer experience in modern IDEs.

5. Contributions to Multicore Programming

●​ In more recent years, Ungar has focused on research related to leveraging multicore
processors for high-performance software.
●​ His work explores how language runtime systems and programming paradigms can be
adapted to take better advantage of parallel computing architectures.

6. Influence on Modern Languages

●​ JavaScript: The prototype-based object system in Self directly influenced JavaScript's design, as Brendan Eich, JavaScript's creator, drew inspiration from Self.
●​ Dynamic Languages: Many of the optimizations and concepts developed in Self laid the
groundwork for modern dynamic languages like Ruby, Python, and Lua.

Awards and Recognition

●​ Ungar has been recognized for his contributions with various honors, including the ACM
Programming Languages Achievement Award in 2009.
●​ His work remains a cornerstone of modern programming language research and
development.

Legacy

David Ungar’s influence spans both theoretical and practical domains of computer science. His
work on memory management, programming language design, and performance optimization
has made dynamic languages more powerful and accessible, impacting millions of developers
worldwide.


Introduction to Code Optimizations


Code: https://github.com/pronesto/DCC053_Public/tree/main/22_Optimizations

What are compiler optimizations? Can you give me examples of some of them?
Compiler optimizations are techniques used by a compiler to improve the performance, size, or
efficiency of the code it generates without changing the program's observable behavior. These
optimizations aim to produce faster execution, reduce memory usage, or minimize power
consumption, depending on the target environment.

Examples of Compiler Optimizations

1. Loop Optimizations

●​ Loop Unrolling: Duplicates the loop body multiple times to reduce the overhead of loop
control and increase instruction-level parallelism.
●​ Loop Fusion: Combines adjacent loops that iterate over the same range to reduce loop
overhead and improve cache performance.
●​ Loop Invariant Code Motion: Moves computations that are constant across iterations
outside the loop.
●​ Loop Tiling: Breaks large loops into smaller blocks to improve data locality and cache
usage.
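To make loop-invariant code motion concrete, here is a before/after sketch in Python (the function names are invented for the example); a compiler would perform the hoisting automatically:

```python
# Before: `len(data) * scale` does not change across iterations,
# yet it is recomputed on every pass through the loop.
def before(data, scale):
    out = []
    for x in data:
        bound = len(data) * scale    # loop-invariant computation
        out.append(x + bound)
    return out

# After LICM: the invariant computation is hoisted out of the loop.
def after(data, scale):
    out = []
    bound = len(data) * scale        # computed once, before the loop
    for x in data:
        out.append(x + bound)
    return out

print(before([1, 2], 10) == after([1, 2], 10))   # → True
```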

2. Inline Expansion

●​ Replaces a function call with the function's body. This transformation has two benefits.
First, it eliminates the overhead of the call, something important particularly for small
functions. But, second, and most importantly, inlining makes other compiler optimizations
context-sensitive, meaning that they can benefit from information that is available at the
point the function is called.

3. Constant Folding

●​ Evaluates constant expressions at compile time instead of runtime. For example, x = 2 + 3 would be replaced with x = 5.
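A constant folder is easy to sketch over a toy expression tree. The representation below (tuples for operations, ints for literals, strings for variables) is invented for the example:

```python
# Fold constant sub-expressions of a tiny AST: ints are literals,
# strings are variables, and ('+', l, r) / ('*', l, r) are operations.
def fold(node):
    if not isinstance(node, tuple):
        return node                      # literal or variable: nothing to do
    op, l, r = node
    l, r = fold(l), fold(r)              # fold the children first
    if isinstance(l, int) and isinstance(r, int):
        return l + r if op == '+' else l * r
    return (op, l, r)                    # cannot fold: rebuild the node

print(fold(('+', 2, 3)))                 # → 5
print(fold(('+', 'x', ('*', 2, 4))))     # → ('+', 'x', 8)
```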

4. Dead Code Elimination

●​ Removes code that does not affect the program's output, such as variables or
statements that are never used.

5. Common Subexpression Elimination


Identifies and eliminates repeated computations of the same expression. For example:​

a = b * c + d * e + b * c;

Can be optimized to:​

temp = b * c;
a = temp + d * e + temp;
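The rewrite above is easy to check in Python (the function names are invented for the example):

```python
def naive(b, c, d, e):
    return b * c + d * e + b * c     # b * c is computed twice

def optimized(b, c, d, e):
    temp = b * c                     # the shared subexpression, computed once
    return temp + d * e + temp

print(naive(2, 3, 4, 5) == optimized(2, 3, 4, 5))   # → True
```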

6. Strength Reduction
Replaces expensive operations with cheaper equivalents. For example, replacing multiplication
with addition in a loop:​

for (int i = 0; i < n; i++) {
    y = i * 5;
}

Could be transformed to:​

int y = 0;
for (int i = 0; i < n; i++) {
    // here y == i * 5; the multiplication became an addition
    y += 5;
}
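The equivalence of the two loops can be checked with a quick sketch in Python (function names invented for the example; y is sampled at the point in the body where it equals i * 5):

```python
def original(n):
    ys = []
    for i in range(n):
        ys.append(i * 5)     # one multiplication per iteration
    return ys

def reduced(n):
    ys, y = [], 0
    for _ in range(n):
        ys.append(y)         # here y == i * 5
        y += 5               # the multiplication became an addition
    return ys

print(original(4) == reduced(4))   # → True
```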

7. Register Allocation

●​ Maps frequently used variables to processor registers instead of memory to speed up access.

8. Peephole Optimization
Focuses on small sequences of instructions, replacing inefficient patterns with more efficient ones. For example, in the sequence:​

mov a, b
mov b, a

the second move is redundant: after mov a, b, the registers a and b already hold the same value, so mov b, a has no effect and can simply be deleted, leaving:​

mov a, b
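A peephole optimizer can be sketched as a scan over adjacent instructions. The toy pass below (the instruction encoding is invented for the example) deletes a redundant second move: after mov a, b the two registers hold the same value, so a following mov b, a has no effect:

```python
# A toy peephole pass over a list of ('mov', dest, src) instructions.
def peephole(instrs):
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        # "mov a, b" followed by "mov b, a": the second move is a no-op.
        if (prev and ins[0] == 'mov' and prev[0] == 'mov'
                and ins[1] == prev[2] and ins[2] == prev[1]):
            continue                     # drop the redundant instruction
        out.append(ins)
    return out

print(peephole([('mov', 'a', 'b'), ('mov', 'b', 'a')]))
# → [('mov', 'a', 'b')]
```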

9. Tail Call Optimization

●​ Converts recursive tail calls into loops. This optimization has two main benefits. First, it
saves stack space: whereas we would need multiple activation records for the function,
now we need only one! Second, it enables loop optimizations to kick in! Most compilers


of imperative programming languages provide loop optimizations, but not really recursive-function optimizations.
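The transformation can be illustrated in Python (which, incidentally, does not perform tail-call optimization itself, which is why the recursive version would overflow the stack for very large n):

```python
# A tail-recursive sum: the recursive call is the very last action.
def sum_rec(n, acc=0):
    if n == 0:
        return acc
    return sum_rec(n - 1, acc + n)   # tail call: nothing happens after it

# After tail-call optimization: the call becomes a jump back to the top,
# so a single activation record suffices.
def sum_loop(n, acc=0):
    while n != 0:
        n, acc = n - 1, acc + n
    return acc

print(sum_rec(10), sum_loop(10))     # → 55 55
```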

10. Interprocedural Optimizations

●​ Optimizations that analyze and modify code across multiple functions or modules.
Examples include:
○​ Inline Function Expansion
○​ Constant Propagation Across Function Boundaries

11. Profile-Guided Optimizations (PGO)

●​ Uses runtime profiling information to guide optimizations, such as improving branch prediction or identifying hot paths.

12. Vectorization

●​ Transforms scalar operations into vector operations that can take advantage of SIMD
(Single Instruction, Multiple Data) hardware capabilities.

13. Instruction Scheduling

●​ Reorders instructions to avoid pipeline stalls in the CPU while preserving the program's
semantics.

14. Memory Optimizations

●​ Cache Blocking: Improves cache performance by optimizing data access patterns.
●​ Heap-to-Stack Conversion: Moves memory allocation from the heap to the stack where feasible.

Levels of Optimization

Compilers often allow users to specify optimization levels, such as:

●​ -O0: No optimization, for fast compilation.
●​ -O1, -O2, -O3: Increasing levels of optimization for speed and performance.
●​ -Os: Optimize for reduced code size.

These optimizations are applied automatically by modern compilers, but developers can also
fine-tune or disable specific optimizations as needed for debugging or performance-critical
applications.


Why and When Should Programs Be Optimized?


Optimizing programs is a decision rooted in a balance between preparation time (compilation,
optimization effort) and execution time (runtime performance). This trade-off is governed by the
nature and purpose of the program being developed.

1. The Tension Between Preparation and Execution

Optimizing a program is an investment: the more time we spend on preparation, the better the
program can perform during execution. However, this investment only pays off if:

●​ The program runs frequently (e.g., system utilities like ls, cp).
●​ The program has a long execution lifespan (e.g., server applications, simulations).

Languages or tools often fall along a spectrum between these extremes:

●​ Low preparation time: Shell scripts (e.g., Bash) are designed for rapid iteration and
minimal preparation. They delegate the heavy lifting to other programs (e.g., GNU Core
Utils), making extensive optimization unnecessary.
●​ High preparation time: Compiled languages like C or Rust prioritize execution
efficiency, making them ideal for programs executed millions of times or in
performance-critical environments.

2. The "Carbon Expenditure" Analogy

Optimizing programs can be likened to managing carbon expenditure. A program's preparation phase (compilation, optimization) can be seen as "carbon investment," while execution efficiency reflects "carbon savings." The goal is to minimize the overall cost:

●​ For short-lived or rarely executed programs, the preparation cost outweighs potential
execution savings, so minimal optimization is warranted.
●​ For programs executed at scale or over long durations, the execution savings far exceed
the preparation cost, justifying significant optimization efforts.

This tension between preparation time and execution time is well reflected in the design and purpose of programming languages:

●​ Bash and shell scripting languages occupy the low-preparation, high-execution quadrant. These languages optimize for ease of use and rapid deployment rather than runtime performance.
●​ C, C++, and Rust fall in the high-preparation, low-execution quadrant. These are highly
optimized to ensure runtime efficiency, making them suitable for performance-critical
tasks.
●​ Languages like Python or JavaScript strike a middle ground. Their interpreters or JIT
compilers improve runtime performance but don't match the execution efficiency of
highly optimized binaries.


3. A Strategic Approach to Optimization

Thus, ultimately, the decision to optimize should consider:

1.​ Frequency of Execution: How often will the program run?
2.​ Execution Duration: How much work does the program do per execution?
3.​ Optimization Cost: How long will it take to optimize, and is the effort justified?
4.​ Impact of Efficiency: In resource-constrained environments (e.g., embedded systems
or cloud computing), even small optimizations can have significant cost savings.

Can you illustrate this tension with examples?


Sure! See the figure below:

It visually contrasts the trade-off between program preparation time (time to compile or
otherwise prepare the code for execution) and program execution time (time taken to actually
run the program). The x-axis represents the preparation time, while the y-axis represents the
execution time. It compares approaches for running C programs (compiled using clang or gcc)
and JavaScript programs (executed using various compilation modes of the V8 engine). The
figure below shows some actual data, obtained from this website. The figure compares the
compilation time and the running time of three different programs (P3_SL, P1_A and P4_SL are
randomly generated programs) compiled with gcc at its many optimization levels on a typical
Intel I7:


Key Observations:

1.​ C Compilation (gcc/clang):
○​ The levels -O0, -O1, -O2, and -O3 represent increasing levels of optimization in
the compilation process.
○​ Moving to higher optimization levels (e.g., -O3) requires more preparation time
(compilation takes longer) but results in shorter execution times (the program
runs faster).
○​ This makes sense for programs that will be executed many times or handle
significant workloads—spending more time optimizing upfront pays off in
execution.
2.​ JavaScript Execution (V8 engine):
○​ V8 uses different compilation and execution stages like Ignition (interpreter),
Sparkplug (baseline JIT), TurboFan (optimized JIT), and Maglev (optimized
mid-tier JIT).
○​ Ignition has a minimal preparation time but relatively high execution times,
suitable for scripts run briefly or infrequently.
○​ Sparkplug, Maglev, and TurboFan progressively spend more time preparing
code for execution (JIT optimization), but this reduces execution time
significantly.
○​ The variety in V8 stages allows it to adapt dynamically, starting with quick
execution via Ignition and switching to more optimized code paths (e.g.,
TurboFan) as the program runs repeatedly and more dynamic information (e.g.,
types) is discovered.

Interpretation in the Context of Preparation vs. Execution:


●​ C Compilation illustrates a static optimization strategy: you decide upfront how much
preparation (compilation) time you're willing to invest, based on expectations of how the
program will run.
●​ JavaScript (V8) demonstrates a dynamic optimization approach: it begins with minimal
preparation and adjusts on the fly as the program runs longer or more often, balancing
the tension between preparation and execution over time.

Practical Implications:

●​ For short-lived programs, such as scripts written in bash or JavaScript executed infrequently, it’s not worth investing much preparation time—tools like Ignition or -O0 serve well.
serve well.
●​ For performance-critical applications like database servers or numerical simulations in C,
extensive preparation (e.g., -O3) is justified, as the resulting performance improvements
during execution outweigh the initial compilation cost.
●​ The V8 engine highlights how modern runtime environments adaptively handle this
trade-off, making JavaScript competitive for longer-running tasks while retaining rapid
responsiveness for short ones.

It seems that JavaScript engines and C compilers offer different optimization levels for distinct reasons. What reasons?
The motivations behind the optimization levels in JavaScript and C are not necessarily the same. Let's see why that's so:

Optimization in JavaScript (Dynamic Adaptation)

JavaScript's approach to optimization revolves around dynamic adaptation because the runtime (like V8) doesn't know in advance whether a piece of code will be "hot" (executed frequently). The JIT compiler starts with minimal preparation time to ensure responsiveness
(e.g., Ignition interpreter). If the program's execution patterns indicate that some code is running
frequently, it progressively optimizes that code using baseline or advanced JIT techniques (e.g.,
Sparkplug, TurboFan). This dynamic adaptability is particularly well-suited for web browsers
where responsiveness and unpredictable workloads are key. This gradual approach to
optimizations allows V8 to gather runtime information at lower optimization levels, and apply this
information at higher optimization levels. An example of dynamic information that becomes
available as the program runs is type information! The JIT compiler can use type information to
implement, at runtime, some form of type inference, as Gal et al. nicely explain in their paper.

Optimization in C (Static Optimization)

In contrast, C operates under a static optimization strategy, where you choose the level of
optimization before execution. The decision is influenced by the program's lifecycle:


●​ Development: Frequent recompilation makes fast compile times and debuggable builds
(-O0) more valuable than runtime performance.
●​ Production: Programs that will run many times or handle heavy workloads benefit from
higher optimization (-O2 or -O3), as the compile-time cost is outweighed by the runtime
gains.

Why Multiple Levels of Optimization in C (-O0, -O1, -O2, -O3)

The multiple optimization levels in gcc and clang exist because real-world scenarios often
require nuanced trade-offs between compilation time, binary size, and execution speed. Here's
why we have four levels:

1.​ -O0 (No Optimization):
○​ Prioritizes compilation speed and preservation of debugging information.
○​ Simplifies debugging because the compiled code mirrors the source code more
closely (e.g., variable names, function calls remain intact).
○​ Ideal for early development and testing.
2.​ -O1 (Basic Optimization):
○​ Applies minimal, low-cost optimizations that improve performance without
significantly increasing compilation time or binary size (e.g., removing dead code,
simple inlining).
○​ Useful for faster builds during development when some performance
improvements are desired.
3.​ -O2 (Balanced Optimization):
○​ A balance between compile time, binary size, and runtime performance.
○​ Enables optimizations like loop unrolling, vectorization, and instruction
reordering, which significantly improve runtime performance without bloating the
binary.
○​ Commonly used in production environments where runtime performance is
crucial but compilation time and binary size also matter.
4.​ -O3 (Aggressive Optimization):
○​ Maximizes runtime performance, often at the cost of longer compilation times and
larger binary size.
○​ Includes aggressive optimizations like function inlining, loop transformations, and
speculative optimizations that may not always pay off.
○​ Ideal for highly compute-intensive workloads where every bit of performance
counts (e.g., scientific simulations).

Why Not Just -O0 and -O3?

While it may seem simpler to have just -O0 (no optimization) and -O3 (maximum optimization),
the real world often demands intermediate trade-offs for the following reasons:


1.​ Development Scenarios:
○​ -O0 is great for debugging, but a minimally optimized build (-O1) can help detect
performance bottlenecks without the full overhead of -O3.
2.​ Binary Size Constraints:
○​ Higher optimization levels like -O3 can sometimes increase the binary size due
to aggressive inlining and loop unrolling. For embedded systems or
memory-constrained environments, -O2 provides a better balance.
3.​ Compilation Time:
○​ Full optimizations (-O3) can be computationally expensive. For moderate
workloads, -O2 is often the sweet spot.
4.​ Avoiding Over-Optimization:
○​ Aggressive optimizations at -O3 can sometimes cause subtle bugs, particularly
with less portable or older code. -O2 offers a safer choice that still provides
significant performance improvements.

Summary

The intermediate levels (-O1 and -O2) allow developers to choose optimizations tailored to
specific scenarios. This granularity reflects the diversity of software development needs, ranging
from quick debugging (-O0), iterative testing with moderate optimizations (-O1),
performance-critical production builds (-O2), and highly optimized compute-intensive
applications (-O3). This flexibility ensures that developers can strike the right balance between
preparation and execution times, even in statically optimized languages like C.

Notice, however, that the difference between the optimization levels depends heavily on the
compiler. In GCC, it's easier to notice a performance difference between the codes produced at
the -O2 and -O3 optimization levels, as the figure below, taken from this paper suggests:


However, similar experiments, performed on clang/LLVM, seem to yield very similar performance results between these two optimization levels, as seen in this article by Lucas Victor, when compiling large programs generated randomly:

So, an optimization in, say, clang, is carried out by an LLVM pass. What is
this? Which kinds of passes exist?
You're exactly right! In LLVM (Low-Level Virtual Machine), an optimization pass is a modular
unit of work that analyzes or transforms the Intermediate Representation (IR) of a program.
Passes in LLVM are the building blocks for both program analysis and transformation. Let’s
dive into the details.

What is an LLVM Pass?

An LLVM pass operates on the program's Intermediate Representation (IR), which is a platform-independent, low-level representation of the code. Each pass performs a specific task, such as analyzing properties of the code or applying a particular transformation to improve its efficiency.

Passes can:

1.​ Analyze the program to gather information (e.g., dependency graphs, alias information).
2.​ Transform the program (e.g., eliminate unused code, inline functions).
3.​ Be combined to form pass pipelines, where the results of one pass feed into others,
forming a sequence of optimizations and analyses.


Types of LLVM Passes

LLVM passes are typically categorized based on their purpose and how they operate:

1. Analysis Passes

●​ These gather information about the program but do not modify it.
●​ Other optimization passes rely on the results of these analysis passes to make informed
decisions.

Examples of Analysis Passes:

●​ Alias Analysis (AA): Determines if two pointers might refer to the same memory.
●​ Loop Analysis: Analyzes properties of loops, such as induction variables and bounds.
●​ Call Graph Analysis: Builds a graph of function calls in the program.
●​ Dominance Analysis: Computes dominance relationships between basic blocks in a
control flow graph.

These analyses provide key insights about the structure, dependencies, and behavior of the
code.

2. Transform Passes (Optimizations)

●​ These modify the IR to improve performance, reduce code size, or enhance other
program properties.
●​ Transform passes can use information provided by analysis passes to guide their
changes.

Examples of Transform Passes:

●​ Dead Code Elimination (DCE): Removes instructions or blocks of code that have no
effect on the program’s output.
●​ Loop Unrolling: Expands loops to reduce loop overhead and enable further
optimizations.
●​ Function Inlining: Replaces a function call with the actual body of the function to reduce
call overhead.
●​ Instruction Combining: Merges instructions (e.g., combining x = a + b; y = x *
2 into y = (a + b) * 2) for efficiency.
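To give a feel for what a transform pass looks like, here is a toy dead-code-elimination pass over an invented straight-line IR. This is a sketch in Python, not LLVM's actual API; real LLVM passes are written in C++ against the pass manager interfaces:

```python
# Toy IR: each instruction is (dest, op, args). A 'print' op is an
# observable effect and must be kept; everything else is kept only if
# its destination is used later. We scan backwards, tracking liveness.
def dce(instrs):
    live, kept = set(), []
    for dest, op, args in reversed(instrs):
        if op == 'print' or dest in live:
            kept.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
        # otherwise the instruction is dead: drop it
    return list(reversed(kept))

prog = [
    ('a', 'add', (1, 2)),
    ('b', 'mul', ('a', 3)),    # dead: 'b' is never used afterwards
    (None, 'print', ('a',)),
]
print(dce(prog))   # → [('a', 'add', (1, 2)), (None, 'print', ('a',))]
```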

3. Utility Passes

●​ These handle miscellaneous tasks, such as printing IR or checking consistency of transformations.

Examples:


●​ PrintModulePass: Prints the IR of the program.
●​ VerifierPass: Ensures the IR remains valid after transformations.

How Passes Work Together

LLVM has a Pass Manager that orchestrates the execution of passes. Here’s how passes
interact:

1.​ Analysis Before Transformation:
○​ Many transformation passes depend on analyses. For instance, Loop-Invariant
Code Motion (LICM) relies on loop analysis to determine which instructions can
be safely moved outside a loop.
2.​ Preservation of Analyses:
○​ After a transformation, certain analyses may still hold. For example, if a pass
modifies a loop but doesn't alter memory, alias analysis might still be valid.
3.​ Invalidation:
○​ When a transformation invalidates an analysis (e.g., modifying loops can
invalidate loop analysis), the pass manager reruns the affected analyses as
needed.

Examples of Combined Optimization

To illustrate how analyses and transformations work together:

●​ Constant Propagation:
○​ Analysis: Find all variables that are constants throughout their lifetime.
○​ Transformation: Replace those variables with their constant values.
●​ Dead Store Elimination:
○​ Analysis: Use alias analysis to find memory writes that are never read.
○​ Transformation: Remove those writes.
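The constant-propagation pairing above can be sketched over a toy straight-line IR (the encoding below is invented for the example):

```python
# Each instruction is (dest, rhs), where rhs is an int constant, a
# variable name, or a ('+', x, y) sum. `env` maps variables known to be
# constant to their values (the analysis); substituting and folding the
# right-hand sides is the transformation.
def const_prop(instrs):
    env, out = {}, []
    for dest, rhs in instrs:
        if isinstance(rhs, tuple):
            _, x, y = rhs
            x, y = env.get(x, x), env.get(y, y)
            rhs = x + y if isinstance(x, int) and isinstance(y, int) else ('+', x, y)
        else:
            rhs = env.get(rhs, rhs)
        if isinstance(rhs, int):
            env[dest] = rhs              # dest is now a known constant
        out.append((dest, rhs))
    return out

prog = [('a', 2), ('b', ('+', 'a', 3)), ('c', ('+', 'b', 'x'))]
print(const_prop(prog))
# → [('a', 2), ('b', 5), ('c', ('+', 5, 'x'))]
```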

Why Modular Passes?

This design provides:

1.​ Reusability: Passes can be reused across different compiler pipelines.
2.​ Composability: Developers can chain passes to achieve complex optimization goals.
3.​ Separation of Concerns: Analysis and transformation are cleanly separated.

Summary

In LLVM, there are two main kinds of passes:

1.​ Analysis passes gather information about the program but don’t change it.
2.​ Transform passes use that information to optimize the program.


This separation allows LLVM to build efficient and flexible optimization pipelines tailored to
different goals, whether it’s reducing binary size, improving runtime performance, or preparing
code for specific hardware platforms.

At which level of the compiler do these optimizations happen? Would that be mostly in the middle end?
That's right! Compiler optimizations are typically associated with the middle-end, but
optimizations can also occur in the front-end and the back-end, depending on the compiler's
design and the specific goals of each stage. Here's a breakdown of where optimizations happen
across the three major stages of a compiler:

1. Front-End Optimizations

The front-end is responsible for parsing the source code, performing lexical and syntactic
analysis, and converting it into an Intermediate Representation (IR). Optimizations here are
usually language-specific and aim to clean up or simplify the input program.

Examples of Front-End Optimizations:

●​ Constant Folding: Simplifies constant expressions at compile time (e.g., replacing 3 + 5 with 8).
●​ Inline Expansion: Inlining small functions directly at their call sites.
●​ Simplifications Based on Language Semantics:
○​ In languages like Python or JavaScript, variable scoping and type information
might allow for minor optimizations early.
○​ For C++ templates, certain computations can be resolved at compile time.

Advantages:

●​ Exploits high-level language semantics.


●​ Reduces complexity for later stages by simplifying code upfront.

However, because the front-end works at a higher level of abstraction (close to the original
source code), it’s limited in its ability to perform deep, target-independent optimizations.

2. Middle-End Optimizations

The middle-end is where the bulk of optimizations happen. Here, the program is represented in
a mostly target-independent intermediate representation (IR), such as LLVM IR. Since this
stage is meant to be independent of both the input language and the target hardware,
optimizations are focused on improving program performance and efficiency in a generalized
way.

Examples of Middle-End Optimizations:


●​ Loop Optimizations:
○​ Loop Unrolling: Expands the loop body to reduce loop overhead.
○​ Loop-Invariant Code Motion (LICM): Moves calculations out of loops if their
results don’t change.
●​ Data Flow Optimizations:
○​ Dead Code Elimination (DCE): Removes unused variables or instructions.
○​ Constant Propagation: Replaces variables with known constant values.
●​ Control Flow Optimizations:
○​ Branch Prediction: Reorganizes branches to improve runtime performance.
○​ Simplification of Control Flow Graphs (CFGs).
●​ Memory Optimizations:
○​ Alias Analysis: Determines whether pointers can refer to the same memory.
○​ Scalar Replacement: Replaces memory accesses with registers when possible.
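As a small illustration of loop-invariant code motion, consider this hedged C sketch (function name invented for the example): the product `n * n` never changes inside the loop, so the middle-end hoists it out.

```c
#include <assert.h>

/* LICM example: `n * n` is loop-invariant, so the optimizer computes it
 * once before the loop instead of on every iteration. */
static long weighted_sum(const int *a, int len, int n) {
    long total = 0;
    for (int i = 0; i < len; i++) {
        total += (long)a[i] * (n * n); /* hoisted to a temporary by LICM */
    }
    return total;
}
```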

Advantages:

●​ The middle-end operates on a lower-level abstraction, so it can make more general and aggressive optimizations without being tied to the input language or target platform.
●​ These optimizations are the same for all languages that the compiler supports.

3. Back-End Optimizations

The back-end generates target-specific code (e.g., x86, ARM) from the IR. Optimizations here
take the target architecture’s characteristics into account, focusing on register allocation,
instruction scheduling, and code generation.

Examples of Back-End Optimizations:

●​ Instruction Scheduling:
○​ Rearranges instructions to minimize pipeline stalls in the CPU.
●​ Register Allocation:
○​ Maps variables to a limited number of CPU registers.
●​ Peephole Optimization:
○​ Looks for small sequences of instructions that can be replaced with more efficient
ones (e.g., combining two instructions into one).
●​ Vectorization:
○​ Converts scalar operations to vector operations, taking advantage of SIMD
(Single Instruction, Multiple Data) instructions.
●​ Target-Specific Optimizations:
○​ Exploiting hardware-specific features, like specialized instructions or caches.
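A classic peephole rewrite is strength reduction: replacing a multiplication by a power of two with a shift. A hedged C sketch (the function name is invented):

```c
#include <assert.h>

/* Peephole/strength reduction: most back-ends lower a multiply by a
 * power of two to a single shift (e.g., shl $3, %eax on x86). */
static unsigned times_eight(unsigned x) {
    return x * 8; /* typically emitted as x << 3 */
}
```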

Advantages:

●​ Tailors the code to the specific hardware, squeezing out the last bit of performance.


●​ Ensures the generated binary adheres to the target platform's constraints (e.g., calling
conventions, instruction set).

Comparison of Optimizations Across Compiler Stages


Stage       Optimizations                                  Scope                          Focus

Front-End   Constant folding, simplifications              Source-level                   High-level, language-specific

Middle-End  Loop optimizations, DCE, inlining              Target-independent IR          General performance improvements

Back-End    Register allocation, instruction scheduling    Target-specific machine code   Architecture and hardware-specific

Why Perform Optimizations in Different Stages?

Each stage of the compiler has access to different types of information:

●​ The front-end knows the most about high-level language semantics, so it can leverage
this information for optimizations like constant folding or inline expansion.
●​ The middle-end focuses on platform-independent improvements, which allows the
compiler to reuse these optimizations across many target architectures.
●​ The back-end is where the hardware constraints are considered, so it’s the only stage
that can fully exploit platform-specific features.

Summary

While most optimizations occur in the middle-end, each stage of the compiler contributes to the
overall process:

●​ The front-end performs language-specific optimizations to simplify code.


●​ The middle-end applies general-purpose, target-independent optimizations.
●​ The back-end fine-tunes the program for the target hardware.

Together, these stages ensure the program is both efficient and correct for the target platform.

What is the name of LLVM's middle-end optimizer?


In LLVM, the middle-end optimizer is handled by a tool called opt. It is a command-line tool
used to run optimization passes on LLVM Intermediate Representation (IR) files. The opt tool
allows developers to experiment with and apply various optimization passes to LLVM IR code,
providing a way to test and debug the middle-end optimizations.


How opt Works

1.​ Input and Output:


○​ Takes an LLVM IR file as input (.ll or .bc).
○​ Produces an optimized IR file as output (or simply prints it to stdout).
2.​ Optimization Passes:
○​ You can specify individual passes using the -passes flag. For example:​
opt -passes=mem2reg -S input.ll -o output.ll
○​ Here, the mem2reg pass promotes memory-based variables to registers.
3.​ Pass Pipelines:
○​ You can run multiple passes in sequence (often referred to as a pipeline).
○​ For instance, the -O2 or -O3 flags apply predefined optimization pipelines
equivalent to those used in clang.
4.​ Analysis Only:
○​ If you only want to analyze the IR without modifying it, you can run analysis
passes, such as:​
opt -passes=instcount input.ll
○​ This counts the instructions in the IR.

opt vs clang

●​ clang:
○​ A compiler driver that handles the entire compilation process, from front-end
parsing of source code to back-end code generation and optimization.
○​ clang invokes the LLVM middle-end optimizations as part of its pipeline.
●​ opt:


○​ Specifically used for the middle-end stage. It is most useful for developers
working with LLVM IR directly, allowing them to apply and debug optimization
passes.

Commonly Used opt Passes

●​ Transform Passes:
○​ mem2reg: Promote memory variables to SSA registers.
○​ loop-unroll: Unroll loops.
○​ gvn: Perform Global Value Numbering.
○​ inline: Inline function calls.
○​ simplifycfg: Simplify control flow graphs.
●​ Analysis Passes:
○​ instcount: Count instructions in the IR.
○​ loop-info: Print loop information.
○​ callgraph: Generate the program's call graph.

Example Workflow
The figure below illustrates a typical usage of opt, showing the transformation pipeline:


Conclusion

The opt tool is an integral part of LLVM's middle-end, enabling you to experiment with and
debug optimization passes directly on IR. It complements tools like clang, which abstract away
the middle-end optimizations for the typical user.

So, how can I see all the passes that opt runs on a program, at a given
optimization level?
Starting with LLVM 13, you can use the -print-pipeline-passes option to directly list all
passes in the optimization pipeline.

opt -O1 -print-pipeline-passes -disable-output constprop.ll

●​ Explanation:


○​ -O1: Specifies the optimization level.
○​ -print-pipeline-passes: Prints the list of passes without running them.
○​ -disable-output: Suppresses the output of the (unmodified) IR.

Notes:

●​ The passes may vary slightly depending on the LLVM version.


●​ If you’re experimenting with custom IR or trying to debug optimizations, opt is the most
detailed and useful tool.

It seems that each level higher than O0 uses lots of different passes. Who
determines these sequences of passes?
Determining the best sequence of optimization passes in a compiler is an incredibly challenging
problem due to several factors:

1.​ Number of Passes: Modern compilers like Clang offer a large number of optimization
passes, each targeting different aspects of performance or code size.
2.​ Order Sensitivity: The effectiveness of these passes often depends on their ordering,
as the output of one pass serves as the input for subsequent ones.
3.​ Iterative Applications: Some passes can be applied multiple times, interleaved with
other passes, to maximize their effectiveness.
4.​ Program-Specific Variability: The optimal sequence for one program may differ
significantly from another, depending on their structure and behavior.

In Clang, the optimization levels (-O1, -O2, -O3, etc.) are carefully crafted to balance these
challenges and provide reasonable defaults for general-purpose programming. These
sequences are designed through extensive experimentation and are continuously refined by the
LLVM community.

Some Principles of Pass Ordering in LLVM

Certain LLVM passes naturally complement each other and are often grouped together. For
example:

●​ Preparation and Cleanup: Passes like instcombine and simplifycfg are commonly used to simplify and canonicalize the intermediate representation (IR), making it more amenable to further optimization.
●​ Loop-Oriented Optimizations: Passes like SROA (Scalar Replacement of Aggregates)
are often rerun after loop unrolling to maximize effectiveness. Similarly, LICM
(Loop-Invariant Code Motion) and GVN (Global Value Numbering) are applied after loop
vectorization for cleanup.
●​ Ordering Dependencies: Loop rotation and induction variable simplification typically
precede vectorization and unrolling to create a suitable IR structure.


Community-Driven Refinements

Finding effective pass sequences requires a blend of expertise and empirical studies.
Researchers and practitioners collaborate to refine these sequences, as exemplified in works
like the paper New Optimization Sequences for Code-Size Reduction for the LLVM Compilation
Infrastructure.

Insights from Other Compilers

LLVM-based compilers like Swift and Rust configure their optimization pipelines to align with
their language-specific requirements:

●​ Swift: The Swift compiler uses methods like getPerformancePassPipeline and swift::performLLVMOptimizations to customize its optimization pipeline. Dead Code Elimination (DCE) is added before SROA to prevent unnecessary expansions of dead loads.
●​ Rust: Rust leverages LLVM's default pipelines but adapts them for its needs via the
LLVMRustOptimize function.

Custom Pipelines

Not all languages adopt LLVM's default pipelines. For instance, the Burst compiler for C#
abandoned the default LLVM pipeline in favor of a custom one tailored to its requirements (See
the bits around "we long since abandoned the default LLVM pass pipeline for a custom one").
This demonstrates the importance of understanding the trade-offs and benefits of different
pipeline configurations for each use case.

Resources for Further Exploration

To understand the LLVM pass pipelines better, you can take a look at LLVM's default pass sequences and similar resources:

●​ Clang and Flang: RunOptimizationPipeline


●​ PassBuilder: Default Pipelines

Conclusion

The design of optimization pipelines is a blend of art and science. While Clang’s default
pipelines are rooted in years of experimentation, ongoing research and language-specific
adaptations continue to drive innovation in this field.


What is the difference between these optimization levels?


The differences between LLVM's optimization levels (-O0, -O1, -O2, -O3) lie in the trade-offs
they make between compilation time, binary size, and execution speed. Based on the output
you provided, here's an explanation of the optimization levels and what you can expect at each:

1. -O0 (No Optimization)


Passes:​
verify,always-inline,CoroConditionalWrapper,function(annotation-remarks),verify

●​ Description:
○​ -O0 focuses on fast compilation with no optimizations applied.
○​ Only essential passes for correctness are included:
■​ verify: Ensures the IR is valid.
■​ always-inline: Forces inlining of functions marked as
always_inline.
■​ Minimal other passes (e.g., CoroConditionalWrapper for coroutine-related
transformations).
●​ Use Case:
○​ Ideal for debugging and development, as the output closely mirrors the source
code.
○​ Simplifies debugging by preserving variable names and function calls.

2. -O1 (Minimal Optimization)

●​ Passes:
○​ Includes a wide range of optimizations for:
■​ Control flow simplification: simplifycfg
■​ Memory optimization: mem2reg, sroa (promotes memory to registers).
■​ Dead code elimination: deadargelim.
■​ Function inlining: Limited to small or mandatory cases.
■​ Loop transformations: Basic loop rotation, unswitching.
■​ Call graph-based optimizations: cgscc, inline for necessary cases.
●​ Description:
○​ Performs lightweight optimizations to reduce execution time without significantly
increasing compilation time or binary size.
○​ Intermediate transformations include dead code elimination, early constant
propagation, and some inlining.
●​ Use Case:
○​ Suitable for early testing when faster execution is desired without incurring
significant compile-time overhead.


3. -O2 (Standard Optimization)

●​ Passes:
○​ Builds on -O1 and includes:
■​ Global Value Numbering (gvn): Removes redundant computations.
■​ Aggressive inlining and loop transformations: Additional loop unrolling
(loop-unroll), vectorization (loop-vectorize), and speculative
execution.
■​ More advanced control flow optimizations: Jump threading, correlated
propagation.
■​ Scalar Replacement of Aggregates (sroa): Breaks down memory
objects into scalars.
■​ Dead store elimination (dse): Removes unnecessary memory writes.
●​ Description:
○​ Focuses on a balance between compile time, runtime performance, and
binary size.
○​ Includes almost all major optimizations except the most aggressive ones from
-O3.
●​ Use Case:
○​ Default choice for production builds, providing a good balance of performance
and compilation time.

4. -O3 (Aggressive Optimization)

●​ In contrast to gcc, there is not much difference between clang -O2 and clang -O3.
In fact, Emery Berger et al.'s work has reported that there's basically no statistical
difference between these two optimization levels. See, for instance, this video (at
23:58).

Key Differences
Feature              -O0         -O1                 -O2                        -O3

Optimization Goal    None        Minimal             Balanced                   Maximum Performance

Compilation Time     Fast        Slightly longer     Longer                     Longest

Execution Speed      Slow        Improved            Significantly Improved     Best Possible

Binary Size          Large       Moderate            Moderate                   Largest

Inlining             Minimal     Mandatory only      Moderate                   Aggressive

Loop Optimizations   None        Basic               Advanced                   Aggressive

Vectorization        None        None                Loop Vectorization         Loop and SLP Vectorization

Target Applications  Debugging   Development         General Production Builds  Performance-Critical Builds

When to Use Each Level

●​ -O0: For debugging, where fast compilation and a faithful source-to-binary mapping are
more important than runtime performance.
●​ -O1: For development and testing phases when you want some runtime improvement
without much additional compile-time cost.
●​ -O2: The go-to optimization level for production, striking a balance between compilation
time and runtime performance.
●​ -O3: For highly compute-intensive applications where maximum runtime performance is
required, even at the cost of compilation time and binary size.

Would you like me to elaborate on specific passes or discuss more about trade-offs in real-world
scenarios?

Can you show me an example that illustrates the difference between -O2
and -O3?

Absolutely! Let's use gcc instead of clang in this case, as the difference is much more
noticeable:


When you compile your dot function with -O3 instead of -O2, the generated code grows
significantly due to vectorization, loop unrolling, and the generation of multiple execution
paths to maximize runtime performance. Let’s analyze what happens and why the code grows
so much at -O3.

What Happens When Switching from -O2 to -O3

1.​ Vectorization Kicks In:


○​ At -O3, the compiler aggressively applies vectorization to loops, utilizing SIMD
(Single Instruction, Multiple Data) instructions to process multiple array elements
in parallel.
○​ SIMD instructions like movdqu, paddd, pcmpeqd, and pandn are introduced.
These enable parallel data processing, but they also require:
■​ Setup Instructions: For example, loading data into SIMD registers
(movdqu) and manipulating vector masks.
■​ Remainder Loops: Separate scalar loops are generated to handle
iterations that don't fit into the SIMD processing block.
2.​ Loop Unrolling and Multiple Execution Paths:
○​ Main Vectorized Loop: Processes large chunks of data using SIMD.
○​ Scalar Fallback Loop: Handles the "remainder" iterations left after the SIMD
loop.
○​ Special Cases: Additional branches are added to handle edge cases, such as:
■​ When the array size is too small for vectorization.


■​ Misaligned memory accesses or arrays that don’t fit SIMD-friendly boundaries.
3.​ Aggressive Inlining and Speculative Transformations:
○​ The compiler inlines helper operations, even speculative branches, to avoid
function call overhead and increase opportunities for optimization.

For example:

movdqu (%rdi,%rax), %xmm1 ; Load multiple elements into a SIMD register.
pcmpeqd %xmm2, %xmm0 ; Compare SIMD vectors for equality.
pandn %xmm1, %xmm0 ; Perform vectorized operations.

Why Does the Code Grow?

1. Vectorized Code Adds Overhead

●​ SIMD instructions process multiple elements at once, but preparing data for SIMD and
gathering results afterward require additional instructions.
●​ For example:
○​ Loading data into SIMD registers (movdqu).
○​ Masking operations (pcmpeqd, pandn).
○​ Shuffling or reordering operations to align data (pshufd).
●​ The vectorized loop (.L4) is more complex than the scalar loop in -O2 because it uses
instructions optimized for parallel processing.

2. Handling Non-SIMD-Compatible Cases

●​ To ensure correctness, the compiler generates additional scalar loops for:


○​ Remainder iterations: When the array size isn’t divisible by the SIMD width.
○​ Edge cases, like misaligned memory or very small arrays.
●​ These scalar fallback loops duplicate much of the functionality already present in the
vectorized loop.

3. Alignment and Bounds Checking

●​ Vectorized instructions often require aligned memory accesses. The compiler generates
code to:
○​ Check if the arrays are aligned.
○​ Handle unaligned memory accesses gracefully.

For example:

cmpq $30, %r8 ; Check if alignment and size allow vectorization.
je .L3 ; If not, fallback to scalar code.

4. Branch Handling


●​ The compiler generates branches to choose between:


○​ The vectorized loop.
○​ The scalar remainder loop.
○​ Special-case handling for small arrays.
●​ These branches increase the code size but ensure the program runs efficiently on a
variety of inputs.

Breakdown of the Assembly

At -O2: The loop is a simple scalar loop:​



movl (%rdi,%rax,4), %r8d ; Load element a[i].
movl (%rsi,%rax,4), %ecx ; Load element b[i].
testl %ecx, %ecx ; Check if b[i] == 0.
cmovne %r8d, %ecx ; Conditionally assign r[i].
movl %ecx, (%rdx,%rax,4) ; Store r[i].

The scalar loop directly operates on one element at a time, making the code compact.

At -O3: The loop is transformed into a SIMD loop (processes multiple elements):

movdqu (%rdi,%rax), %xmm1 ; Load 4 elements of a[].
movdqu (%rsi,%rax), %xmm0 ; Load 4 elements of b[].
pcmpeqd %xmm2, %xmm0 ; Compare b[] to zero.
pandn %xmm1, %xmm0 ; Compute result for r[].
movups %xmm0, (%rdx,%rax) ; Store 4 elements to r[].

A scalar remainder loop (handles the leftover elements):​

movl (%rdi,%r8,4), %r8d ; Load a[size-1].
movl (%rsi,%r8,4), %ecx ; Load b[size-1].
testl %ecx, %ecx ; Check b[size-1].
cmovne %r8d, %ecx ; Conditionally assign r[size-1].
movl %ecx, (%rdx,%r8,4) ; Store r[size-1].

Additional setup and alignment checks:​



cmpq $30, %r8 ; Check if vectorization is feasible.
je .L3 ; If not, fallback to scalar code.

Why Does the Compiler Do This?

●​ Goal: Maximize runtime performance.


●​ Trade-off:
○​ Increased code size: Necessary for handling edge cases and supporting SIMD.


○​ Improved performance: SIMD significantly speeds up array processing for large


input sizes.

When Should You Use -O3?

●​ Use -O3 if:


○​ The loop processes large datasets where SIMD benefits outweigh the overhead.
○​ Runtime performance is critical, and increased binary size is acceptable.
●​ Use -O2 if:
○​ Code size or compile time is a concern.
○​ The input size is small, where SIMD's overhead negates its benefits.

Summary

Switching to -O3 grows the code because the compiler:

1.​ Adds SIMD loops for parallel processing.


2.​ Generates scalar fallback loops for remainder iterations.
3.​ Handles alignment and edge cases robustly.

This trade-off is worth it for performance-critical workloads on large datasets, but it might be
excessive for smaller inputs or memory-constrained environments. Would you like further details
on specific instructions or optimizations?

Are there ways for the program to "talk" to the compiler about how to
optimize the code? At least in C or similar system languages?
Certainly! Improving the output of a C compiler like GCC through better source code practices is
a fascinating area of discussion. Here are additional strategies to help the compiler generate
better code:

1. Use Inline Functions for Small and Repeated Operations

Instead of macros, prefer inline functions for small, frequently used operations. Inline
functions:

●​ Preserve type safety.


●​ Enable the compiler to optimize better since they are part of the optimization pipeline.

Example:

inline int square(int x) {
    return x * x;
}


2. Leverage __attribute__((...)) Extensions

GCC/clang provide several attributes that can fine-tune optimization.

hot/cold functions: Mark functions likely to be called frequently (hot) or rarely (cold).​

void critical_function() __attribute__((hot));
void error_handler() __attribute__((cold));

aligned: Improve memory alignment for better cache performance.​



int array[100] __attribute__((aligned(64)));

noinline: Prevent inlining for debugging or when inlining could negatively affect cache locality.

flatten: Inline all functions called within a specific function.​



void compute() __attribute__((flatten));

3. Organize Code for Cache Locality

Compilers optimize for locality, but writing code that naturally adheres to memory access
patterns helps. Strategies include:

●​ Structure of Arrays (SoA) over Array of Structures (AoS) for SIMD-friendly code.
●​ Group frequently accessed variables together to minimize cache line misses.
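As a hedged sketch of this idea (type names and sizes are invented for illustration), the SoA layout keeps all x values contiguous, so the loop below walks memory with unit stride, which is exactly what the vectorizer wants:

```c
#include <assert.h>
#include <stddef.h>

/* Array of Structures (AoS): x and y interleave in memory. */
struct PointAoS { float x, y; };

/* Structure of Arrays (SoA): each field is stored contiguously. */
struct PointsSoA { float x[8]; float y[8]; };

static float sum_x(const struct PointsSoA *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p->x[i]; /* contiguous, unit-stride loads: easy to vectorize */
    return s;
}
```

With the AoS layout, the same loop would load every other float, forcing strided or gather-style accesses.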

4. Utilize restrict for Non-Aliased Pointers

You're already familiar with restrict, but ensure that its use is backed by rigorous analysis.
Compilers are cautious with optimizations around pointers due to aliasing concerns.

Example:

void add_arrays(
    int *restrict a,
    int *restrict b,
    int *restrict c, size_t n
) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}


5. Enable Loop Unrolling and Vectorization

Write loops in a way that enables the compiler to unroll them and apply vectorization:

●​ Use clear and simple loop conditions.


●​ Prefer array indexing over pointer arithmetic for clarity.

Example:

void sum_arrays(float *a, float *b, float *c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

Compilers like GCC (-O2 or -O3) will attempt vectorization automatically if the loop structure is
simple enough.

6. Use Built-In Functions

Use GCC’s built-in intrinsics for common operations:

●​ __builtin_expect to hint branch prediction to the compiler (similar to likely/unlikely).
●​ __builtin_popcount for counting bits in an integer.
●​ __builtin_prefetch to explicitly prefetch memory into the cache.
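Here is a hedged sketch of __builtin_expect in action; the likely/unlikely macros are the conventional GCC/Clang idiom (the same one used in the conditional example further below):

```c
#include <assert.h>

/* Conventional wrappers around __builtin_expect: they tell the compiler
 * which way a branch usually goes, so it can lay out the hot path as
 * the fall-through case. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int safe_div(int a, int b) {
    if (unlikely(b == 0))
        return 0;  /* cold error path */
    return a / b;  /* hot path, laid out as fall-through */
}
```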

7. Reduce Indirection

Avoid excessive levels of pointer dereferencing or linked data structures if arrays or flat data
structures can suffice. Flat structures are often easier for compilers to optimize.

Example:

struct FlatArray {
int data[100];
};

8. Minimize External Function Calls

Inline calculations or use compiler-inlinable helper functions for small computations. Minimize
calls to external functions, as they break optimization opportunities like inlining and instruction
scheduling.

9. Use volatile Sparingly


Avoid marking variables volatile unless necessary (e.g., memory-mapped I/O, signal
handlers). volatile prevents optimization, including register allocation and instruction
reordering.

10. Avoid Undefined Behavior

Code that invokes undefined behavior (UB) often prevents the compiler from applying
aggressive optimizations. Examples of UB include:

●​ Dereferencing null pointers.


●​ Signed integer overflow.
●​ Out-of-bounds array access.

Example of UB-safe code:

int divide(int a, int b) {
    if (b == 0) return 0; // Avoid division by zero.
    return a / b;
}
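The flip side is that the optimizer actively exploits the absence of UB. A hedged sketch (the function name is invented): because signed integer overflow is undefined, the compiler may assume i + 1 is always greater than i and fold the comparison to a constant.

```c
#include <assert.h>

/* Because signed overflow is undefined behavior, the optimizer is
 * allowed to assume i + 1 never wraps, so at -O2 this comparison is
 * typically folded to "return 1". */
static int always_true(int i) {
    return i + 1 > i;
}
```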

11. Profile-Guided Optimization (PGO)

Run the program with typical workloads and use GCC's profiling tools:

1.​ Compile with -fprofile-generate.


2.​ Run the program to generate profiling data.
3.​ Recompile with -fprofile-use.

This allows the compiler to optimize based on actual runtime behavior.

12. Exploit Explicit Loop Constructs

Use explicit constructs like #pragma for optimization hints:

#pragma GCC unroll 4
for (int i = 0; i < 16; i++) {
    array[i] *= 2;
}

13. Optimize Conditional Statements

Write conditional statements to favor predictable execution paths, especially in critical sections.

Example:


if (likely(x > 0)) {
    // Fast path.
} else {
    // Rare case.
}

14. Avoid Function Pointer Overuse

Function pointers can degrade performance as the compiler often cannot predict which function
will be called, impacting branch prediction and instruction prefetching.

15. Annotate Functions as pure or const

Use __attribute__((pure)) or __attribute__((const)) for functions with no side effects. This helps the compiler optimize calls better.

Example:

int compute_sum(int a, int b) __attribute__((pure));
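A runnable hedged variant (the function name is invented): the pure attribute promises the result depends only on the arguments and reads of global state, which lets the compiler merge or hoist repeated calls with the same arguments.

```c
#include <assert.h>

/* `pure` tells GCC/Clang this function has no side effects, so two
 * calls to triple(x) with the same x can be merged into one. */
__attribute__((pure)) static int triple(int x) {
    return 3 * x;
}
```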

These techniques require understanding the workload and profiling output to maximize benefits.
Combining them with careful benchmarking and testing can significantly improve the efficiency
of compiled code.

Can you show how we can use the restrict keyword in C?


Sure! Experiment compiling the function below without and with the restrict keyword:

void dot(int *restrict a, int *restrict b, int *restrict r, int size)
{
    int i;
    for (i = 0; i < size; i++) {
        r[i] = a[i];
        if (!b[i])
            r[i] = b[i];
    }
}

Let's call the function without the restrict keyword dot0 and let's call the function with that
keyword dot1. See below the two CFGs that we might get with the following compilation lines:

$> clang -O2 -Xclang -disable-O0-optnone -emit-llvm -S \
   -fno-discard-value-names restrict.c -o restrict.ll

$> opt -dot-cfg restrict.ll -disable-output


$> dot -Tpdf .dot0.dot -o dot0.pdf

$> dot -Tpdf .dot1.dot -o dot1.pdf

You will notice that we have two kinds of loops in each CFG. Each loop has a different purpose:

●​ Loops 1 and 3 are vectorized loops for dot0 and dot1, respectively, triggered when
iterations exceed vector lane requirements and memory safety checks (if any) pass.
●​ Loops 2 and 4 are fallback loops handling small iteration counts, aliasing (for dot0),
and leftover iterations.
●​ The removal of runtime checks in dot1 relies on the programmer’s guarantee, which
must be upheld to ensure correctness.


Code Compiled Without restrict (Function dot0)

When the restrict keyword is not present, the compiler assumes that the arrays may alias
(i.e., overlap in memory). This uncertainty requires the inclusion of runtime memory checks to
ensure correctness under all scenarios.

Generated Control Flow


1.​ [Loop 1: Vectorized Loop]


○​ This loop executes using vectorized instructions if runtime checks determine that
the arrays do not overlap and the number of iterations is sufficient for
vectorization (greater than the vector lane count).
2.​ [Loop 2: Fallback Loop]
○​ This loop executes in the following cases:
■​ If the runtime checks detect aliasing between the arrays.
■​ If the number of iterations is insufficient for vectorization.
■​ To handle any leftover iterations after the vectorized loop completes.

Runtime Memory Checks

Before the vectorized loop (Loop 1) runs, a vector.memcheck block verifies the following:

●​ That the arrays a, b, and r do not alias.


●​ That the number of iterations is sufficient for vectorization.

If these checks fail, execution falls back to the scalar loop (Loop 2), ensuring correctness under
all circumstances.

Code Compiled With restrict (Function dot1)

The restrict keyword guarantees to the compiler that the arrays do not alias. This assurance
allows the compiler to skip runtime memory checks and optimize the code more aggressively.

Generated Control Flow

3.​ [Loop 3: Vectorized Loop]


○​ This loop executes using vectorized instructions if the number of iterations is
sufficient for vectorization (greater than the vector lane count).
4.​ [Loop 4: Fallback Loop]
○​ This loop executes in the following cases:
■​ If the number of iterations is insufficient for vectorization.
■​ To handle any leftover iterations after the vectorized loop completes.

Key Differences

●​ No Runtime Memory Checks: Unlike dot0, dot1 does not require the
vector.memcheck block because the restrict keyword guarantees non-overlapping
memory.
●​ Simplified Control Flow: The removal of memory checks reduces the overhead,
resulting in a leaner and more efficient CFG.

Comparison of dot0 and dot1


Aspect                       dot0 (Without restrict)                          dot1 (With restrict)

Runtime Memory Checks        Present                                          Absent

Vectorized Loop Conditions   Alias-free and sufficient iterations             Sufficient iterations only

Fallback Loop Conditions     Alias detected, small iterations, or leftovers   Small iterations or leftovers

Optimization Potential       Limited by aliasing assumptions                  Optimized assuming no aliasing

Correctness Guarantee        Ensured by runtime checks                        Dependent on restrict contract

Correctness and Programmer Responsibility

While dot0 guarantees correctness even with overlapping arrays (via runtime checks), dot1
relies on the programmer to uphold the restrict contract. If the programmer violates this
contract (e.g., by passing overlapping arrays to dot1), the behavior of the program is undefined
and may produce incorrect results.

Conclusion

The use of the restrict keyword in dot1 enables the compiler to generate more efficient
code by eliminating runtime memory checks and simplifying the control flow. However, this
optimization comes with the responsibility of ensuring that the contract of non-aliasing is upheld.
In contrast, dot0 provides robustness at the cost of additional runtime checks and a more
complex control flow.

By carefully analyzing the control flow graphs, we can see how the presence or absence of
restrict shapes the compiler’s optimization decisions and impacts performance and
correctness guarantees.

But, even without restrict, clang can still vectorize the function! How's
that possible?
Clang is able to do it with a runtime check! The vector.memcheck block in LLVM IR checks
for potential memory aliasing conflicts between the input arrays (a, b, and r) to determine
whether the vectorized loop can safely run. If a conflict is detected, it redirects the control flow to


the scalar fallback loop (for.body.preheader33); otherwise, it proceeds to the vectorized
loop (vector.ph). Here’s how the vector.memcheck logic might look in C:

void dot0(int a[], int b[], int r[], int size) {
  // Memory aliasing check (vector.memcheck logic)
  if ((a + size > r && r + size > a) ||  // Check overlap between `a` and `r`
      (b + size > r && r + size > b)) {  // Check overlap between `b` and `r`
    // Fallback scalar loop
    run_non_vectorized_loop(...);
  } else {
    // Vectorized loop (executed only if no aliasing conflict is detected)
    // Compiler-generated: typically vectorized code here
    run_super_vectorized_and_amazing_and_cool_loop(...);
  }
}

Explanation:

1.​ Pointer arithmetic:


○​ a + size computes the address just beyond the end of array a.
○​ r + size computes the address just beyond the end of array r.
2.​ Overlap checks:
○​ (a + size > r && r + size > a) evaluates to true exactly when the
ranges [a, a + size) and [r, r + size) overlap.
○​ Similarly, (b + size > r && r + size > b) detects overlap between b
and r.
3.​ Conflict resolution:
○​ If either condition evaluates to true, the function falls back to the scalar
loop.
4.​ Control flow:
○​ The if/else statement mimics the conditional branch (br) that LLVM IR
uses to choose between the two loop versions.

This logic ensures that vectorization occurs only when the input arrays are guaranteed not to
overlap, preserving correctness.
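The check above can be modeled in plain Python by treating pointers as integer base addresses. This is only an illustrative sketch of the predicate; the function names are made up and do not correspond to anything Clang emits.

```python
# Model of the vector.memcheck predicate: pointers become integer base
# addresses, and each array occupies the half-open range [base, base + size).
def ranges_overlap(p, q, size):
    """True exactly when [p, p + size) and [q, q + size) overlap."""
    return p + size > q and q + size > p

def must_use_scalar_loop(a, b, r, size):
    """Fall back to the scalar loop if r overlaps either input array."""
    return ranges_overlap(a, r, size) or ranges_overlap(b, r, size)
```

For instance, arrays at addresses 0, 100, and 200 with size 10 are pairwise disjoint, so the vectorized path may run; if r instead starts at address 5, it overlaps a, and the scalar fallback is taken.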

How does clang discover the sizes of the arrays?


Well, there is an explanation in Section 2 of the "Runtime Pointer Disambiguation" paper (ACM
OOPSLA 2015). In our example, the sizes of the arrays are discovered through runtime pointer
disambiguation checks, which are implemented as part of the vector.memcheck logic.
Here's how it works conceptually:

How Clang Discovers Array Sizes


1.​ Pointer Arithmetic and size Parameter:


○​ The size parameter in your dot0 function indicates the number of elements in
each array.
○​ The compiler uses pointer arithmetic (base_pointer + size) to calculate the
upper bounds of the memory regions accessed by a, b, and r.
2.​ Runtime Bounds Comparison:
○​ Using the computed upper bounds, the compiler generates runtime checks like
the following:
■​ if (a + size > r && r + size > a) checks overlap between a
and r.
■​ if (b + size > r && r + size > b) checks overlap between b
and r.
○​ These checks compare the start and end memory ranges of the arrays to ensure
they do not overlap.
3.​ Fallback to Scalar Code:
○​ If the runtime checks detect a potential overlap, the code falls back to a safe
scalar version of the loop.
4.​ Safe Vectorization:
○​ If no overlap is detected, the optimized vectorized loop can safely execute,
leveraging SIMD instructions for better performance.

Ah, it uses a combination of static and dynamic analysis?


Exactly! The combination of static analysis and dynamic runtime checks is what makes
Clang (and LLVM) both safe and efficient when dealing with potentially overlapping memory
regions in array operations. Here's a quick breakdown of the two components and their
interplay:

1. Static Analysis

At compile time, Clang uses:

●​ Pointer Range Analysis:


○​ It examines the base addresses (a, b, and r) and computes potential ranges
using the size parameter.
○​ For example, the compiler knows a occupies [a, a + size) in memory.
●​ Alias Analysis:
○​ Clang checks whether the memory ranges for different pointers are guaranteed
to be disjoint (non-overlapping) based on the available information. If it cannot
determine this conclusively, it generates runtime checks.
●​ Loop Vectorization Analysis:
○​ If aliasing cannot be ruled out, the compiler prepares two versions of the loop:
■​ A scalar loop (safe fallback).


■​ A vectorized loop (optimized for performance).

2. Dynamic Analysis

When static analysis isn't conclusive, runtime checks are introduced:

●​ Memory Overlap Checks:


○​ The vector.memcheck block in LLVM IR dynamically verifies whether the
memory regions overlap by comparing pointer ranges.
○​ If an overlap is detected, execution branches to the scalar version.
●​ Execution Path Decision:
○​ If no overlap is detected, the vectorized code path executes safely.
○​ Otherwise, the scalar loop ensures correctness without risking undefined
behavior.

Why Both Are Needed

●​ Static Analysis Strengths: It's fast and avoids runtime overhead but limited by
conservative assumptions (e.g., it may not know pointer relationships or sizes at compile
time).
●​ Dynamic Analysis Strengths: It's precise, as it evaluates actual memory regions at
runtime, but introduces some overhead.

By combining the two, Clang ensures:

1.​ Safety: The program behaves correctly even when memory aliasing occurs.
2.​ Performance: It uses vectorized code paths whenever possible, maximizing speed.

This dual approach is central to modern compiler optimizations and is well-captured in the
techniques described in the Runtime Pointer Disambiguation paper. It's one of the reasons
LLVM-based compilers like Clang are so effective in balancing safety and performance!

So, code optimization is the art of moving into compilation time as much of
the computations that we could do at execution time?
Well, in a way! Compilers are very good at solving, at compilation time, computations that
would otherwise be performed at run time. Consider the following example, taken from
Stefanos Baziotis' webpage:


The puzzling behavior that the figure shows is due to an optimization that the LLVM community
likes to call Scalar Evolution. But even without it, a compiler can still achieve a similar result via
a combination of constant propagation, loop unrolling, and dead-code elimination. Let me
explain what's happening in detail:

1. Compiler Optimizations at Work

●​ Constant Propagation and Folding: When the collatz function is invoked with a
constant argument (like 6 or 250), the compiler tries to evaluate the result at compile
time. This is part of constant propagation and folding, where the compiler replaces
computations with their results to generate more efficient code.​

●​ Loop Unrolling and Iteration Limits: For certain inputs, the compiler attempts to
evaluate loops at compile time. However, this has practical limits:​

○​ If the loop is simple and the number of iterations is small, the compiler fully
evaluates it.


○​ If the loop becomes too complex (e.g., too many iterations), the compiler gives up
and emits the original loop code.

2. The Behavior You Observed

Case 1: collatz(6) with Clang 18.1.0

For collatz(6), scalar evolution simulates the function at compile time, fully
computing the number of iterations (8), because the Collatz sequence for 6 is relatively short:​
6 → 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1. Thus, clang is able to replace the call to
collatz(6) with a constant, e.g.:

mov eax, 8
ret

Case 2: collatz(250) with Clang 18.1.0

For collatz(250), the sequence is longer: 250 → 125 → 376 → 188 → 94 → 47 → 142 → ...
→ 1. Here, the compiler likely hits the threshold on how many iterations scalar
evolution is willing to simulate. Instead of fully evaluating the function, it emits optimized loop code
but does not compute the result at compile time.

Case 3: collatz(249) with Clang 18.1.0

For collatz(249), the sequence is much shorter (47 iterations), so the compiler successfully
evaluates it at compile time and replaces the loop with a constant:

mov eax, 47
ret
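The behavior above is easy to reproduce by hand. Here is a small Python version of the iteration counter that scalar evolution is, in effect, simulating at compile time; the step counts match the sequences quoted above (8 steps for 6, and 47 for 249):

```python
def collatz(n):
    """Count how many Collatz steps it takes to go from n down to 1."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# The short sequence for 6 folds to the constant 8; 249 folds to 47.
print(collatz(6), collatz(249))
```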

3. Why This Happens

Clang (and other modern compilers) employs heuristics during optimization:

1.​ Loop Iteration Limits:


○​ The compiler limits the number of iterations it evaluates at compile time to avoid
spending excessive time on optimization.
○​ These limits are often configurable but depend on the specific optimization level
(-O1, -O2, etc.).
2.​ Complexity of the Loop Body:
○​ If the loop body is too complex (e.g., involves branches, memory accesses, or
large computations), the compiler may choose not to unroll or evaluate it.
3.​ Pragmatic Trade-Offs:


○​ Fully evaluating collatz(250) at compile time would increase compilation time


significantly with diminishing runtime benefits. Instead, the compiler emits
optimized runtime code.

4. Why Different Clang Versions Behave Differently

●​ Clang 17.0.1: This version doesn't aggressively optimize the collatz loop at -O1. It
emits runtime loop code for all cases.​

●​ Clang 18.1.0: This version likely has improvements in constant propagation and partial
evaluation, enabling it to evaluate short Collatz sequences at compile time.

5. Limits of This Behavior

The compiler's willingness to compute loops at compile time is subject to:

●​ Optimization Level: Higher levels (-O2, -O3) often lead to more aggressive
optimizations.
●​ Hardcoded Limits: There are usually fixed thresholds for the number of iterations or
computation steps the compiler will attempt to evaluate.

For example:

●​ LLVM has a threshold for loop unrolling and constant folding controlled by internal
parameters like UnrollThreshold or ConstExprLimit.
●​ If these thresholds are exceeded, the compiler stops partial evaluation and emits the
loop code.

Key Takeaway

What you're observing is a perfect example of the compiler solving problems at compile time
when it’s computationally feasible and beneficial for performance. However, practical limits exist
to prevent excessive compile times, which is why the behavior changes for larger or more
complex inputs!

But a compiler can't solve everything at compilation time, right? What are
the limits of optimizing compilers?
Great question! Optimizing compilers are powerful tools, but they have inherent limits stemming
from both theoretical and practical considerations. Let’s break this down:

What an Optimizing Compiler Can Do

1.​ Eliminate Redundancies:


○​ Constant Folding: Replace expressions like 2 + 2 with 4.


○​ Loop-Invariant Code Motion: Move calculations outside of loops if their results
don’t change within the loop.
2.​ Simplify and Transform Code:
○​ Dead Code Elimination: Remove code that doesn’t affect the program’s
observable behavior.
○​ Strength Reduction: Replace expensive operations (e.g., multiplication) with
cheaper ones (e.g., addition in loops).
3.​ Make Execution Faster:
○​ Inlining Functions: Replace a function call with the function’s body to save the
overhead of the call.
○​ Vectorization: Transform scalar operations into vector operations that run in
parallel.
4.​ Optimize Memory Usage:
○​ Register Allocation: Use CPU registers efficiently to minimize slower memory
accesses.
○​ Cache Optimization: Rearrange memory accesses to improve spatial and
temporal locality.

Theoretical and Practical Limits

1.​ Halting Problem:


○​ An optimizing compiler cannot solve undecidable problems. For example,
deciding whether arbitrary code will terminate (loop optimization) or produce a
certain value is equivalent to the halting problem, which is undecidable.
2.​ Ensured Termination of Optimizations:
○​ Compilers are designed to ensure their optimization passes terminate, typically
through predefined limits (e.g., a maximum number of optimization iterations) or
by applying only provably safe transformations. Otherwise, the compiler itself
could enter an infinite loop while trying to optimize.
3.​ Undecidable Properties:
○​ Certain program properties, such as whether two pieces of code are functionally
equivalent, are undecidable in general. This means the compiler can't always
make the "best" optimization without risking correctness.
4.​ Performance Trade-Offs:
○​ Optimizations often involve trade-offs. For example, some optimizations might
increase code size (e.g., inlining) to improve speed. Compilers balance such
trade-offs using heuristics, but they don’t always find the globally optimal solution.
5.​ Semantics Preservation:
○​ Optimizations must preserve the program's original semantics. This constraint
limits what compilers can do since any aggressive optimization must be provably
safe under all possible inputs.
6.​ Resource Constraints:


○​ Compilers operate under limited time and resources. Deep analyses or


optimizations may be skipped if they are too computationally expensive,
especially for large programs.

Can Compilers Remove All Loops?

For loops with bounds and computations that are statically known (e.g., summing integers from
1 to 10), a compiler can replace the loop with a constant. However:

●​ If the loop depends on runtime values or inputs, the compiler generally can’t remove it
outright, because doing so risks changing the program’s behavior.
●​ If the loop’s termination condition can’t be proven at compile-time, the compiler may
conservatively leave it unchanged.
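As a concrete illustration of the statically-known case, the loop below sums the integers from 1 to n. When n is a compile-time constant, a compiler can fold the whole loop into a single number; the closed form is what scalar-evolution-style reasoning derives. This is a Python sketch, not actual compiler output:

```python
def sum_loop(n):
    """The loop as written: O(n) additions."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_folded(n):
    """The closed form a compiler can substitute: n * (n + 1) / 2."""
    return n * (n + 1) // 2
```

With n = 10, both return 55, so replacing the loop with the constant 55 preserves semantics.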

Bottom Line

Optimizing compilers can perform impressive transformations but are fundamentally limited by:

●​ Undecidability (halting problem, equivalence checking),


●​ Resource and time constraints,
●​ Requirement to preserve semantics.

These limits ensure that optimizations are effective yet safe, and compilers terminate within
reasonable bounds.

LLVM is pretty amazing! Who created it?


The LLVM project started as part of Chris Lattner's graduate research at the University of Illinois
at Urbana-Champaign, where he was advised by Professor Vikram Adve. Chris Lattner is a major
figure in the development of modern optimizing compilers. Although he is relatively young, his
contributions have had a significant impact on the way compilers handle optimization and code
generation today.


Here are some of his key contributions:

1. Creation of LLVM

●​ Conceptualized and developed LLVM (Low-Level Virtual Machine), a modular,


reusable compiler infrastructure.
●​ Introduced the LLVM Intermediate Representation (IR), enabling powerful optimization
passes at compile-time, link-time, and runtime.
●​ Made LLVM the backbone of many modern compilers, including Clang, Rust, Julia, and
Swift.
●​ Designed LLVM to support a wide range of target architectures, promoting
hardware-agnostic optimizations.

2. Development of Clang

●​ Created Clang, the C, C++, and Objective-C frontend for LLVM.


●​ Improved error diagnostics, making Clang a favorite among developers for its clear,
actionable error messages.
●​ Focused on performance and modularity, ensuring faster compilation times and better
integration with LLVM's backend optimizations.
●​ Facilitated static analysis and tooling, enabling advanced code quality checks.

3. Innovations in Compiler Optimizations

●​ Contributed to key optimization techniques in LLVM, such as:


○​ Loop unrolling, vectorization, and fusion.


○​ Link-Time Optimization (LTO) for cross-module performance improvements.


○​ Just-In-Time (JIT) compilation, allowing runtime-specific optimizations.
●​ Advocated for modern, flexible compiler architectures, influencing both academic and
industrial practices.

4. Creator of the Swift Programming Language

●​ Designed Swift, a programming language focused on performance, safety, and


developer productivity.
●​ Leveraged LLVM to generate highly optimized machine code for Swift programs.
●​ Built Swift with modern features like optionals, type inference, and memory safety,
making it a preferred language for iOS and macOS development.
●​ Made Swift open source, broadening its reach and encouraging contributions from the
global developer community.

5. Contributions to the Open Source Community

●​ Promoted open collaboration by making LLVM, Clang, and Swift widely accessible as
open-source projects.
●​ Inspired the development of other tools and languages that build on or extend LLVM,
such as Julia, Kotlin Native, and the Rust compiler.

6. Innovations in Machine Learning and AI

●​ Played a leadership role at companies like Google (TPU development) and Tesla
(Autopilot software) to integrate machine learning into high-performance systems.
●​ Focused on AI infrastructure and compiler technologies that optimize machine learning
workflows.

7. Academic and Industrial Impact

●​ Influenced modern computer science education through LLVM and its use in teaching
compiler design and optimization.
●​ Contributed to industry adoption of LLVM in production environments, from browsers
(e.g., Chrome) to high-performance computing systems.

8. Advocacy for Modular and Reusable Systems

●​ Advocated for modular design principles in compiler infrastructure, which have been
widely adopted in subsequent tools and frameworks.
●​ Demonstrated how such modularity accelerates innovation in language design and
software tooling.


Chris Lattner’s work on LLVM and its ecosystem revolutionized compiler design, while his
creation of Swift brought modern, performant programming to the masses. His contributions
continue to influence both academia and industry.


Data-Flow Analyses
Code: https://github.com/pronesto/DCC053_Public/tree/main/23_DataFlow

How do compilers obtain the information they need to safely optimize programs?
Compilers rely on a variety of techniques to safely optimize programs while preserving their
semantics. Here are the key methods used to obtain the necessary information:

1. Static Analysis

●​ Control Flow Analysis (CFA): Builds a control flow graph (CFG) to determine the flow
of execution between basic blocks. This is crucial for optimizations like dead code
elimination and loop unrolling.
●​ Data Flow Analysis: Tracks how data values propagate through the program. Examples
include reaching definitions, live variable analysis, and available expressions.
●​ Alias Analysis: Determines whether two pointers or references can point to the same
memory location. Precise alias analysis is critical for optimizations like loop invariant
code motion and memory access reordering.
●​ Type Analysis: Identifies the types of variables and expressions, enabling type-specific
optimizations like vectorization.
●​ Loop Dependency Analysis: Determines dependencies between iterations of loops,
enabling parallelization or vectorization while avoiding data races.
●​ Memory Dependency Analysis: Ensures that optimizations like instruction reordering
or prefetching do not violate memory consistency.

2. Intermediate Representations (IRs)

●​ Compilers translate source code into an IR (e.g., LLVM IR, HIR/MIR in Rust) that is
simpler and more uniform. This facilitates analysis and enables transformations without
worrying about high-level syntactic details or low-level machine specifics.

3. Symbol Table and Scope Information

●​ The compiler uses symbol tables to track variables, functions, and their properties (e.g.,
type, scope, storage duration). This information ensures correct variable access and
assists in optimizations like constant propagation.

4. Profiling and Feedback (Dynamic Analysis)

●​ Profiling data can guide optimizations by identifying frequently executed code paths ("hot
spots"). Techniques like feedback-directed optimization (FDO) and profile-guided


optimization (PGO) use runtime data to improve decisions such as inlining and loop
unrolling.

5. Annotations and Language Features

●​ Language features like const, restrict, or final provide explicit guarantees about
immutability or aliasing. These hints allow compilers to apply more aggressive
optimizations safely.
●​ Pragmas or compiler-specific annotations (e.g., #pragma in C/C++) let developers
provide additional guidance.

6. Speculative Analysis

●​ Some optimizations are speculative and depend on assumptions that can be checked at
runtime. For example, speculative execution or value speculation optimizations are
guarded by runtime checks to ensure safety.

7. Compiler Heuristics

●​ Compilers often use heuristics based on common patterns and trade-offs, such as when
deciding whether to inline a function or allocate variables in registers.

8. Safety Guarantees

●​ Optimizations are always validated against the program's semantic rules to ensure
correctness. For example, constant folding or strength reduction is only applied when the
transformations are mathematically sound under the target language's rules.
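For example, a strength reduction such as rewriting i * 4 as i << 2 is only legal because the two forms are provably equal for every integer input. The check below is a toy Python illustration of that equivalence, not how a compiler actually verifies soundness:

```python
def index_times_4(i):
    """The original, 'expensive' form: a multiplication."""
    return i * 4

def index_shifted(i):
    """The strength-reduced form: shifting left by 2 multiplies by 4."""
    return i << 2
```

Exhaustively checking a sample range (including negative values) shows that the two forms agree, which is the kind of guarantee the transformation rests on.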

For instance, consider the control-flow graph below. How does the compiler
know that two variables can occupy the same register?


The compiler determines whether two variables can occupy the same register using liveness
analysis, a key part of register allocation. Here's how it works in the context of the control-flow
graph (CFG) you provided:

1. Liveness Analysis

●​ Definition: A variable is live at a point in the program if its value is used later without
being overwritten. The compiler calculates this information for each variable at every
program point.
●​ Steps:
1.​ Compute live-out information for each basic block in the CFG. This includes
variables that are live immediately after the block.
2.​ Propagate the live-out information backward through the block to determine
which variables are live at each point.

2. Interference Graph

●​ Once liveness information is available, the compiler builds an interference graph,


where:
○​ Each variable is represented as a node.
○​ An edge exists between two nodes if their variables are live at the same time
(i.e., they interfere and cannot share the same register).
●​ Variables without an edge between them in this graph can safely share a register.

3. Register Allocation


●​ Graph Coloring Algorithm: The compiler assigns registers to variables using a graph
coloring algorithm on the interference graph. Two interfering variables get different
"colors" (registers), while non-interfering variables can share the same "color."

Example in Your CFG

●​ Variable x:
○​ It is live throughout the loop, as it is read in multiple conditions and updated
repeatedly. It likely interferes with many other variables.
●​ Variable y:
○​ It is used within the loop but redefined on each iteration. If it is not live outside its
use within the loop, it may not interfere with other variables, like z.
●​ Variable z:
○​ Similar to y, z has a limited scope of use within the loop. It may be live at
different times than y.

In this way, the compiler can determine that y and z may share the same register if their
lifetimes do not overlap.
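The sharing decision can be sketched with sets of program points. In this toy Python model (the specific live ranges are invented for illustration), two variables interfere when their live ranges intersect, and only non-interfering variables may share a register:

```python
def interferes(live_a, live_b):
    """Two variables interfere if they are live at a common program point."""
    return bool(live_a & live_b)

# Invented live ranges echoing the discussion: x is live across the loop,
# while y and z are live at disjoint sets of points.
live_x = {1, 2, 3, 4, 5}
live_y = {2, 3}
live_z = {4, 5}
```

Here x interferes with both y and z, but y and z do not interfere with each other, so they may be assigned the same register.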

Optimization Opportunities

●​ By analyzing the CFG, the compiler might also perform optimizations like:
○​ Reusing registers for temporary values like y and z.
○​ Reducing memory access by keeping frequently used values, like x, in a register
across iterations.

What do you mean by a "Variable is Live" at a certain program point? What's liveness?
Liveness is a key concept in compiler optimization, especially in register allocation. Here's a
precise definition and explanation:

Definition of Liveness

A variable is live at a particular program point if:

1.​ It holds a value that might be used later, i.e., the value is "alive" and necessary for the
correctness of the program.
2.​ Its value has not been overwritten by another assignment before that use.

In other words, a variable is live if there is a path in the control-flow graph (CFG) from the
current point to a point where the variable is used, and the variable is not redefined along that
path.


Formalization

For a given variable v:

●​ v is live-in at a program point if it is live at the beginning of that point.


●​ v is live-out at a program point if it is live at the end of that point.

Example

Consider the snippet below:

a = 5;
b = 10;
c = a + b; // Use of a, b
d = c * 2; // Use of c

●​ At the point before c = a + b:


○​ a and b are live because their values are used in the computation of c.
○​ c is not live because it hasn't been computed yet.
●​ At the point after d = c * 2:
○​ c is live because it is used to compute d.
○​ a and b are not live unless used elsewhere in the program.

Algorithm for Liveness Analysis

Liveness analysis is a data-flow analysis performed on the control-flow graph (CFG) of the
program. Here's the high-level process:

1.​ For each basic block in the CFG:​

○​ Define:
■​ use: The set of variables used before any redefinition in the block.
■​ def: The set of variables defined (assigned a value) in the block.
○​ Compute:
■​ live-in[b] = use[b] ∪ (live-out[b] - def[b])
■​ live-out[b] = ∪ (live-in[s]) for each successor s of the block
b.
2.​ Iterate these equations until the sets stabilize (a fixed point is reached).
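For straight-line code the backward pass needs no iteration: a single sweep from the last instruction to the first reaches the fixed point. The sketch below, a simplified Python model that ignores branches, applies the equation in = (out \ def) ∪ use to the four-statement example above:

```python
def liveness(instructions):
    """Return the live-in set of each (defs, uses) instruction, in order."""
    live = set()  # Nothing is assumed live after the last instruction.
    live_in = []
    for defs, uses in reversed(instructions):
        live = (live - defs) | uses  # in = (out \ def) ∪ use
        live_in.append(live)
    live_in.reverse()
    return live_in

prog = [
    ({"a"}, set()),       # a = 5
    ({"b"}, set()),       # b = 10
    ({"c"}, {"a", "b"}),  # c = a + b
    ({"d"}, {"c"}),       # d = c * 2
]
```

Running liveness(prog) yields [∅, {a}, {a, b}, {c}], matching the earlier discussion: a and b are live just before c = a + b, and c is live just before d = c * 2.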

Visualizing Liveness in Your CFG

In the control-flow graph (CFG) example you provided:


●​ A variable like x is live at every program point where its value might influence the output
x. This includes:
○​ The test conditions x > 1, y > 3, and z > 0.
○​ Any assignment or computation where x is used, like y = x / 2.

Practical Use of Liveness in Compilation

●​ Register Allocation: Variables live at the same time (overlapping lifetimes) cannot
share a register.
●​ Dead Code Elimination: Variables that are not live at any program point can be safely
removed, as their computation has no effect on the program's outcome.

How can the compiler find out the program points where a variable is alive?
Liveness analysis works in three steps; the figure below illustrates the first and the last of them:

1. Mapping Program Points to Equations

●​ Goal: Derive equations for each program point based on the data-flow relationships.
●​ For each program point, you compute the IN and OUT sets:
○​ IN[p]: The set of variables that are live before the program point p.
○​ OUT[p]: The set of variables that are live after the program point p.
●​ These sets are computed using two relationships:


○​ OUT[p] = ∪ (IN[s]), where s is any successor of p (this depends on
control flow).
○​ IN[p] = (OUT[p] \ DEF[p]) ∪ USE[p], where:
■​ DEF[p]: The variables defined (assigned) at program point p.
■​ USE[p]: The variables used at program point p before any definition in
that point.

2. Solving the Equations

●​ The equations are solved iteratively (iterative data-flow analysis) until a fixed point is
reached (i.e., the IN and OUT sets no longer change).
●​ This process propagates the liveness information throughout the control-flow graph:
○​ Backward in the case of liveness, as liveness is a backward analysis (it
depends on later program points).

3. Mapping Solutions to Program Points

●​ Once the equations stabilize, the computed IN and OUT sets are associated with each
program point, as shown in the figure's second part.
●​ This information is now ready for practical use, such as register allocation or dead code
elimination.

Practical Use of the Resulting IN and OUT Sets

●​ Variables in the IN set at any point are alive before that program point.
●​ Variables in the OUT set at any point are alive after that program point.

This approach makes the concept of liveness analysis systematic and easy to implement for
compilers. Would you like a worked-out example with equations?

How do we solve these equations?


There are many ways to solve the equations. Essentially, they define a "system of constraints",
and any programming language that is expressive enough to solve these constraints can find a
solution to the equations. For instance, the figure below shows a Prolog program that
implements our system of constraints:


The figure presents an elegant way to solve liveness analysis using Prolog, where the
equations for liveness are translated directly into logical predicates. This approach leverages
Prolog's natural ability to express relationships and recursive computation, closely mirroring the
structure of data-flow equations.

Translation to Prolog:

1.​ Each equation is modeled as a logical predicate, e.g.:

2.​ Prolog recursively evaluates these predicates, computing IN and OUT sets for all
program points.

Using Prolog to Solve:

1.​ A query like the following computes the IN and OUT sets for all program points:​
?- member(P, [1, 2, 3, 4]), in(P, IN), out(P, OUT).
2.​ The solution provides the IN and OUT sets for each program point:​
P = 1, IN = [], OUT = [y];


P = 2, IN = [y], OUT = [x, y];


P = 3, IN = [x, y], OUT = [z];
P = 4, IN = [z], OUT = [].

The Prolog program captures the semantics of liveness analysis by modeling:

1.​ Variable Use/Definition: The diff operation removes killed variables, while union
adds used variables.
2.​ Control Flow: The recursive relationships between OUT and IN mimic the propagation
of live variables through the control-flow graph.
3.​ Data Flow: The computed IN and OUT sets align with the possible executions of the
program, reflecting which variables are live at each point.

Prolog is cool, but can you show me how to solve these equations in a
more "imperative" setting?
There are several ways to solve these equations in a more imperative setting. The simplest
approach is to use something called "chaotic iterations". Essentially, go over all the equations in
your list of equations and evaluate all of them. If there is any change on any IN or OUT set,
repeat this process; otherwise, stop it. It's like the code below:

def abstract_interp(equations):
    env = {eq.name(): set() for eq in equations}
    changed = True
    while changed:
        changed = False
        for eq in equations:
            if eq.eval(env):
                changed = True
    return env

Breakdown of The Solver

The method abstract_interp(equations):

1.​ Initialization:
○​ Each equation (representing a program point) starts with its associated set (IN or
OUT) initialized to an empty set or some default value, stored in the env
dictionary.
2.​ Iterative Updates:
○​ The equations are evaluated in some order (usually corresponding to the order of
program points).


○​ Each equation modifies the environment env (i.e., updates the IN or OUT sets)
based on the constraints of the data-flow analysis (e.g., IN[p] = (OUT[p] \
DEF[p]) ∪ USE[p] for liveness).
○​ If any set changes (e.g., a new variable is added to an IN or OUT set), the
changed flag is set to True, ensuring another iteration.
3.​ Termination:
○​ The iterations continue until no IN or OUT set changes in a complete pass over
the equations, at which point the solution has reached a fixed point.

Why Is This "Chaotic"?

●​ The term "chaotic iterations" refers to the fact that the equations can be processed in
any order, and the order of evaluation doesn't affect correctness (though it may affect
performance).
●​ In practice, heuristics like evaluating equations in reverse post-order (for backward
analyses like liveness) can speed up convergence.

Example Walkthrough for Liveness Analysis

For a simple control flow like:

1) y = input (Def: y, Use: {})


2) x = input (Def: x, Use: {})
3) z = x + y (Def: z, Use: {x, y})
4) output z (Def: {}, Use: {z})

We derive the equations:

●​ OUT[4] = ∅ (no successors)
●​ IN[4] = (OUT[4] \ ∅) ∪ {z} = {z}
●​ OUT[3] = IN[4] = {z}
●​ IN[3] = (OUT[3] \ {z}) ∪ {x, y} = {x, y}
●​ OUT[2] = IN[3] = {x, y}
●​ IN[2] = (OUT[2] \ {x}) ∪ {} = {y}
●​ OUT[1] = IN[2] = {y}
●​ IN[1] = (OUT[1] \ {y}) ∪ {} = {}

Chaotic Iteration Steps

1.​ Start with all IN and OUT sets empty.


2.​ Evaluate the equations for each program point:
○​ Update IN and OUT sets based on the current environment.
3.​ Repeat until no changes occur.
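To make the walkthrough concrete, the sketch below encodes the four-instruction program (1: y = input; 2: x = input; 3: z = x + y; 4: output z) and runs chaotic iterations over it. This is our own illustrative code, not an excerpt from these notes:

```python
# DEF, USE and successors for each program point.
DEF  = {1: {'y'}, 2: {'x'}, 3: {'z'}, 4: set()}
USE  = {1: set(), 2: set(), 3: {'x', 'y'}, 4: {'z'}}
SUCC = {1: [2], 2: [3], 3: [4], 4: []}

def solve_liveness(DEF, USE, SUCC):
    IN  = {p: set() for p in DEF}
    OUT = {p: set() for p in DEF}
    changed = True
    while changed:
        changed = False
        for p in DEF:
            # OUT[p] = union of IN[s] for all successors s of p.
            new_out = set().union(*(IN[s] for s in SUCC[p]))
            # IN[p] = (OUT[p] \ DEF[p]) ∪ USE[p].
            new_in = (new_out - DEF[p]) | USE[p]
            if (new_in, new_out) != (IN[p], OUT[p]):
                IN[p], OUT[p] = new_in, new_out
                changed = True
    return IN, OUT

IN, OUT = solve_liveness(DEF, USE, SUCC)
# Fixed point: IN[1]={}, OUT[1]={y}; IN[2]={y}, OUT[2]={x,y};
#              IN[3]={x,y}, OUT[3]={z}; IN[4]={z}, OUT[4]={}
```

For this straight-line program, evaluating the equations in the order 4, 3, 2, 1 would reach the fixed point in a single pass, which is why reverse orders help backward analyses converge faster.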


The Evaluator in Context

The function you've written implements this process:

●​ Each equation in the list corresponds to the constraints for a program point.
●​ env tracks the state of the IN and OUT sets.
●​ eq.eval(env) updates the sets for a program point based on the current state of the
environment, and returns True if it caused a change.

But how do we know that this process terminates?


Chaotic iterations are guaranteed to terminate and reach a fixed point for data-flow analysis if
certain properties hold. Let's explore why this is the case.

1. Key Properties That Ensure Termination

a) Finite Lattice of Values

●​ The sets we are working with (IN and OUT) are drawn from a finite domain of values. For
example:
○​ Variables in the program are finite.
○​ IN[p] and OUT[p] are subsets of all variables.
●​ This means there are only a finite number of possible states for IN and OUT.

b) Monotonic Transfer Functions

●​ Each equation defines how IN and OUT are updated, often using a transfer function.
These transfer functions must be monotonic, meaning that if the input to the function
grows (in terms of set inclusion), the output will also grow or remain the same.

Example for liveness analysis:



IN[p] = (OUT[p] \ DEF[p]) ∪ USE[p]

Thus, if OUT[p] grows, so does IN[p].
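Monotonicity can be checked concretely: enlarging OUT[p] can never shrink IN[p]. A small sanity check (our own illustration, not code from these notes):

```python
def liveness_in(out_p, def_p, use_p):
    # IN[p] = (OUT[p] \ DEF[p]) ∪ USE[p]
    return (out_p - def_p) | use_p

small_out = {'z'}
large_out = {'z', 'w'}          # a superset of small_out
def_p, use_p = {'z'}, {'x', 'y'}

# The transfer function preserves the subset order:
assert liveness_in(small_out, def_p, use_p) <= liveness_in(large_out, def_p, use_p)
```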

c) Fixed Point Exists

●​ Since the transfer functions are monotonic and the lattice of values is finite, the process
will converge to a fixed point (where no further changes occur).

2. How Does It Terminate?

The process terminates because:


1.​ At every iteration, the IN and OUT sets change monotonically: they only grow (for "may" analyses like liveness or reaching definitions) or only shrink (for "must" analyses like available expressions).
2.​ Since the sets are subsets of a finite set (the program's variables), there is a maximum
size they can reach.
3.​ Once all IN and OUT sets stop changing (no new variables are added or removed), the
process halts.

For example:

●​ If a program has 3 variables (x, y, z), the possible values of IN[p] and OUT[p] for any program point are subsets of {x, y, z}, i.e., there are 2^3 = 8 possible states for each set.
●​ At most, after a finite number of steps, the process will exhaustively explore these states,
reaching a fixed point.
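The counting argument above is easy to check with a throwaway script (our own illustration): with variables {x, y, z}, each IN or OUT set is one of 2^3 = 8 subsets, and any strictly growing chain of subsets takes at most 3 steps.

```python
from itertools import combinations

variables = ['x', 'y', 'z']

# Enumerate the power set of {x, y, z}.
subsets = [frozenset(c) for r in range(len(variables) + 1)
           for c in combinations(variables, r)]
assert len(subsets) == 2 ** len(variables)   # 8 possible states per set

# A strictly growing chain {} ⊂ {x} ⊂ {x,y} ⊂ {x,y,z} has n + 1 elements,
# so a set can change at most n = 3 times before reaching the top.
chain = [frozenset(), frozenset('x'), frozenset('xy'), frozenset('xyz')]
assert all(a < b for a, b in zip(chain, chain[1:]))
```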

3. Formal Proof of Termination

The termination relies on two mathematical properties:

1.​ Finite Height of the Lattice:


○​ The lattice of subsets (e.g., the power set of the variables) has a maximum height. For n variables, the height is n (the number of variables that can be added to a set step-by-step).
2.​ Monotonicity:​

○​ Transfer functions guarantee that the values in the lattice change in one direction only: they grow (for "may" analyses) or shrink (for "must" analyses).
○​ Since the lattice has finite height, the process eventually reaches a point where
further updates do not change the IN or OUT sets.

4. Why Are Fixed Points Guaranteed?

●​ The iterative process computes a fixed point of the transfer functions.


●​ A fixed point occurs when applying the transfer function no longer changes the IN and
OUT sets.
●​ At this point, all equations are satisfied, and the solution stabilizes.

5. Practical Example

Take a simple program:

1) x = input (DEF: x, USE: {})


2) z = x + y (DEF: z, USE: {x, y})
3) output z (DEF: {}, USE: {z})


●​ Initially, all IN and OUT sets are empty.


●​ The equations ensure that IN and OUT grow monotonically until all live variables are
accounted for. For example:
○​ Initially, IN[3] = OUT[3] ∪ {z} = {z}.
○​ This propagates back: OUT[2] = IN[3] = {z}, and then IN[2] = (OUT[2] \ {z}) ∪ {x, y} = {x, y}.
○​ After a few iterations, no further changes occur.

6. Summary

The process of solving data-flow equations (like in liveness analysis) terminates because:

1.​ The domain of possible values is finite.


2.​ The transfer functions are monotonic.
3.​ A fixed point must exist.

If you apply these principles correctly, you can be sure the iterative process will halt and give the
correct result. Let me know if you'd like further clarifications or examples!

You mentioned "Lattice". What's that?


A lattice is a mathematical structure that is often used to reason about the values computed in
data-flow analyses like liveness. In this context, lattices help model the space of possible
solutions and guarantee properties like termination and correctness. Let's break it down.

1. What Is a Lattice?

A lattice is a partially ordered set (poset) that satisfies the following conditions:

●​ Partial Order: Elements in the set are ordered by some relation ≤ (e.g., "subset of" for sets).
●​ Join (Least Upper Bound): For any two elements a and b, there is a unique smallest element c such that a ≤ c and b ≤ c. This is called the join (a ∨ b).
●​ Meet (Greatest Lower Bound): For any two elements a and b, there is a unique largest element d such that d ≤ a and d ≤ b. This is called the meet (a ∧ b).

Example: The set of subsets of a finite set (e.g., {x,y,z}) forms a lattice:

●​ Join (∨): Set union (A∪B).


●​ Meet (∧): Set intersection (A∩B).
●​ Order Relation (≤): Subset relation (A⊆B).

2. How Does a Lattice Relate to Data-Flow Analysis?


In data-flow analysis, the lattice represents the domain of values that we are computing. Each
program point maps to an element in this lattice (e.g., the set of live variables at that point for
liveness analysis). The structure of the lattice ensures:

1.​ Monotonicity: The equations used in the analysis respect the lattice order. For example,
if OUT grows, IN will also grow or stay the same.
2.​ Fixed Points: The iterative process used in chaotic iterations converges to a fixed point
because the lattice is finite and the operations respect the partial order.

Example: Lattice in Liveness Analysis

●​ Domain: Subsets of variables P(V), where V={x,y,z,… }.


●​ Join: Union of sets (A∪B).
●​ Meet: Intersection of sets (A∩B).
●​ Order Relation: Subset relation (A⊆B).

The iterative process works within this lattice:

●​ It starts with the smallest element (∅, no live variables).


●​ Monotonically updates sets by adding variables using the equations for IN and OUT.
●​ Stops when no more variables can be added (reaching a fixed point, which is the top of
the lattice).

3. How Does the Lattice Help in Data-Flow Analyses?

a) General Framework for Analyses

Different data-flow problems (e.g., liveness analysis, reaching definitions, constant propagation)
can be expressed in terms of lattices:

●​ The lattice elements represent the possible states of program points.


●​ The transfer functions define how these states evolve.

For example:

●​ Reaching Definitions: The lattice is the power set of all definitions.


●​ Constant Propagation: The lattice represents possible values of variables (e.g.,
{constant, top, bottom}).

b) Termination

A lattice with a finite height guarantees that the iterative process converges because:

1.​ The process can only move upwards (or downwards, for some analyses) in the lattice.
2.​ There is a maximum height, so the process terminates.


c) Correctness

The fixed point computed in the lattice corresponds to the solution to the data-flow equations,
ensuring that the analysis gives the correct result.

4. Summary of Lattices in Data-Flow Analysis

1.​ Domain of Values: The lattice represents the possible states for each program point
(e.g., sets of variables).
2.​ Transfer Functions: Define how states evolve, respecting monotonicity.
3.​ Convergence Guarantee: A finite-height lattice ensures that chaotic iterations terminate
at a fixed point.
4.​ General Applicability: Different analyses (liveness, reaching definitions, etc.) use
different lattices but follow the same principles.

Can you show how a lattice models the values computed by our liveness
analysis?
Let's work through this example below, step by step:

1. Overview of the Example

●​ The program consists of five instructions (d0 to d4):


○​ d0: a = read()
○​ d1: b = read()
○​ d2: a > b? (conditional branch)
○​ d3: print(a)
○​ d4: print(b)
●​ The lattice is the power set of {a,b} (the variables in the program):
○​ Elements: {},{a},{b},{a,b}
○​ Order: Subset relation (⊆).


○​ Join (∨): Set union (∪).


○​ Meet (∧): Set intersection (∩).

Data-Flow Equations

The monotone data-flow equations describe how the IN and OUT sets evolve:

●​ IN[n] and OUT[n] represent the live variables before and after a program point n, respectively.
●​ IN[n] = (OUT[n] ∖ DEF[n]) ∪ USE[n]
●​ OUT[n] = ⋃IN[s], where s ∈ succ[n]

For this program:

●​ USE: Variables used in an instruction.


●​ DEF: Variables defined in an instruction.

2. Step-by-Step Analysis Using the Lattice

We solve the equations iteratively, starting with OUT[n]={} for all instructions and updating until a
fixed point is reached.

Initialization

Start with:

●​ OUT[d3]={},OUT[d4]={} (no successors).


●​ Other OUT[n]={} (initial guess).

Iteration 1

1.​ d3:print(a)
○​ USE={a},DEF={}
○​ IN[d3]=(OUT[d3]∖DEF)∪USE={}∪{a}={a}
2.​ d4:print(b)
○​ USE={b},DEF={}
○​ IN[d4]=(OUT[d4]∖DEF)∪USE={}∪{b}={b}
3.​ d2:a>b?
○​ Successors: d3 and d4.
○​ OUT[d2]=IN[d3]∪IN[d4]={a}∪{b}={a,b}
○​ IN[d2] = (OUT[d2] ∖ DEF) ∪ USE = {a, b} (d2 defines no variable; it uses a and b, which are already in OUT[d2]).
4.​ d1:b=read()
○​ Successor: d2.
○​ OUT[d1]=IN[d2]={a,b}
○​ IN[d1]=(OUT[d1]∖DEF)∪USE=({a,b}∖{b})∪{}={a}.
5.​ d0: a = read()


○​ Successor: d1.
○​ OUT[d0]=IN[d1]={a}.
○​ IN[d0]=(OUT[d0]∖DEF)∪USE=({a}∖{a})∪{}={}.

Iteration 2

We repeat the process, but now there are no changes to any of the IN or OUT sets. The fixed
point is reached, so we stop.

3. Role of the Lattice in This Process

1.​ Representation of States


○​ The lattice provides a framework to model the evolution of IN and OUT sets.
○​ The lattice structure ensures that all possible subsets of variables are
considered.
2.​ Monotonicity
○​ The transfer functions are monotonic: they only move upwards (add variables) in
the lattice.
○​ For example, OUT[d2]=IN[d3]∪IN[d4] ensures that the new OUT[d2] is at least
as large as the previous value.
3.​ Convergence
○​ The lattice is finite: it has 2^|V| elements, where |V| is the number of variables, so any strictly growing chain of sets has at most |V| steps.
○​ Iterative updates eventually reach the top or stabilize at a fixed point.
4.​ Correctness
○​ The fixed point represents the correct solution to the liveness equations.

4. Final Results

At the fixed point:

●​ IN[d0] = {}
●​ OUT[d0] = {a}
●​ IN[d1] = {a}
●​ OUT[d1] = {a,b}
●​ IN[d2] = {a,b}
●​ OUT[d2] = {a,b}
●​ IN[d3] = {a}
●​ OUT[d3] = {}
●​ IN[d4] = {b}
●​ OUT[d4] = {}
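The fixed point above can be reproduced with a short chaotic-iteration loop in Python, now with a branch at d2 (successors d3 and d4). This is our own illustrative code, written only to verify the table:

```python
# The program: d0: a=read(); d1: b=read(); d2: a>b?; d3: print(a); d4: print(b)
DEF  = {'d0': {'a'}, 'd1': {'b'}, 'd2': set(), 'd3': set(), 'd4': set()}
USE  = {'d0': set(), 'd1': set(), 'd2': {'a', 'b'}, 'd3': {'a'}, 'd4': {'b'}}
SUCC = {'d0': ['d1'], 'd1': ['d2'], 'd2': ['d3', 'd4'], 'd3': [], 'd4': []}

def solve_liveness(DEF, USE, SUCC):
    IN  = {p: set() for p in DEF}
    OUT = {p: set() for p in DEF}
    changed = True
    while changed:
        changed = False
        for p in DEF:
            # OUT[p] is the join (union) of the IN sets of all successors.
            new_out = set().union(*(IN[s] for s in SUCC[p]))
            new_in = (new_out - DEF[p]) | USE[p]
            if (new_in, new_out) != (IN[p], OUT[p]):
                IN[p], OUT[p] = new_in, new_out
                changed = True
    return IN, OUT

IN, OUT = solve_liveness(DEF, USE, SUCC)
assert OUT['d2'] == {'a', 'b'}       # the join of IN[d3] and IN[d4]
assert IN['d0'] == set()             # nothing is live before the program
```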


We have been talking about liveness analysis, but there are many other
data-flow analyses, right?
Yes, there are many more! Liveness analysis is just one example of many data-flow analyses.
These analyses are designed to gather information about program behavior by solving systems
of equations over program control flow graphs. Here's an overview of a few common data-flow
analyses, along with their key equations:

1. Reaching Definitions Analysis

Purpose: Determines which definitions of variables may reach each program point.

●​ Direction: Forward (from entry to exit).


●​ Key Sets:
○​ GEN[n]: Definitions generated at node n.
○​ KILL[n]: Definitions of the same variables that are overwritten (killed) at node n.
○​ IN[n]: Definitions reaching the start of n.
○​ OUT[n]: Definitions reaching the end of n.

Equations:

IN[n] = ⋃ OUT[p] # Union of OUT sets of all predecessors.

OUT[n] = (IN[n] \ KILL[n]) ∪ GEN[n]

2. Available Expressions Analysis

Purpose: Determines which expressions are guaranteed to have already been computed and
not invalidated by redefinition.

●​ Direction: Forward.
●​ Key Sets:
○​ GEN[n]: Expressions computed at n.
○​ KILL[n]: Expressions invalidated at n.
○​ IN[n]: Available expressions at the start of n.
○​ OUT[n]: Available expressions at the end of n.

Equations:

IN[n] = ⋂ OUT[p] # Intersection of OUT sets of all predecessors.


OUT[n] = (IN[n] \ KILL[n]) ∪ GEN[n]

3. Very Busy Expressions Analysis


Purpose: Determines expressions that will definitely be computed in the future, along all paths
from a program point.

●​ Direction: Backward (from exit to entry).


●​ Key Sets:
○​ GEN[n]: Expressions that are used before any redefinition in n.
○​ KILL[n]: Expressions killed by redefinitions in n.
○​ IN[n]: Very busy expressions at the start of n.
○​ OUT[n]: Very busy expressions at the end of n.

Equations:

OUT[n] = ⋂ IN[s] # Intersection of IN sets of all successors.


IN[n] = (OUT[n] \ KILL[n]) ∪ GEN[n]

4. Live Variables Analysis

Purpose: Determines which variables are "live" (potentially used in the future) at each program
point.

●​ Direction: Backward.
●​ Key Sets:
○​ USE[n]: Variables used at node n.
○​ DEF[n]: Variables defined at node n.
○​ IN[n]: Live variables at the start of n.
○​ OUT[n]: Live variables at the end of n.

Equations:

OUT[n] = ⋃ IN[s] # Union of IN sets of all successors.


IN[n] = (OUT[n] \ DEF[n]) ∪ USE[n]

Observations Across All Analyses

1.​ Directionality (Forward or Backward):


○​ Forward: Information flows from the entry of the program to the exit.
○​ Backward: Information flows from the exit to the entry.
2.​ Fixed Point Computation:
○​ Each analysis solves equations iteratively over a lattice.
○​ The fixed point represents the stable solution to the analysis.
3.​ Domain of Analysis:
○​ The lattice depends on the type of analysis.


○​ Example: Power set of variables (for liveness), constant values (for constant
propagation), etc.
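The analyses above differ only in direction (forward or backward), confluence operator (∪ for "may" analyses, ∩ for "must" analyses), and the GEN/KILL (or USE/DEF) sets. As an illustration of this shared pattern — our own abstraction, not code from these notes — here is a round-robin solver for the forward, union-based case:

```python
def solve_forward(nodes, preds, gen, kill):
    """Round-robin solver for a forward, union-based ("may") analysis:
    IN[n] = union of OUT[p] over preds[n]; OUT[n] = GEN[n] ∪ (IN[n] \\ KILL[n])."""
    IN  = {n: set() for n in nodes}
    OUT = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = set().union(*(OUT[p] for p in preds[n]))
            new_out = gen[n] | (new_in - kill[n])
            if (new_in, new_out) != (IN[n], OUT[n]):
                IN[n], OUT[n] = new_in, new_out
                changed = True
    return IN, OUT

# Reaching-definitions flavor on a two-node graph:
# d1 defines x; d2 redefines x, and so kills d1.
nodes = ['d1', 'd2']
preds = {'d1': [], 'd2': ['d1']}
gen   = {'d1': {'d1'}, 'd2': {'d2'}}
kill  = {'d1': {'d2'}, 'd2': {'d1'}}
IN, OUT = solve_forward(nodes, preds, gen, kill)
assert OUT['d2'] == {'d2'}   # only the second definition of x survives
```

For a backward analysis, the loop would iterate over successors instead of predecessors; for a "must" analysis, the sets would start at the full universe and the join would be intersection.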

Let's see one more data-flow analysis. First, can you explain which kind of
information you need to optimize the program below?

Variable z in the program above is "dead": it's not used anywhere, except to redefine itself. To
demonstrate that variable z is dead, we can use reaching-definition analysis.

Step 1: Reaching Definitions for z

Reaching-definition analysis tells us where a variable is defined and whether that definition
"reaches" a certain point in the program. For variable z, definitions occur at the following points:

1.​ d5: z = x - 4
2.​ d8: z = z - 1

Step 2: Liveness of z

A variable is live at a program point if its current value is used later in the program along any
path. If the value of z is not used anywhere after its definition, the variable is considered dead.

From the control-flow graph:

1.​ The last use of z in the program should be identified to determine if any of its definitions
reach that use.
2.​ Looking at the program and CFG:


○​ After d5: z = x - 4, z is redefined at d8: z = z - 1 without being used in


between.
○​ After d8: z = z - 1, the value of z is never used, as there are no instructions
that reference z after this point.

Step 3: Key Conclusion

●​ Reaching-definition analysis shows that:


○​ d5 and d8 reach their respective points in the program but are never followed by
a use of z.
○​ Thus, variable z is dead after its last definition (d8).

This conclusion can be leveraged to optimize the program by:

1.​ Removing dead code associated with z (e.g., d5 and d8).


2.​ Saving resources like memory or registers that would have been unnecessarily allocated
for z.

Can you explain what's reaching-definition analysis?


Yes! Reaching-definition analysis is another fundamental data-flow analysis. Let me explain it
step by step, similar to how we discussed liveness analysis.

Definition: What is Reaching-Definition Analysis?

Reaching-definition analysis determines which definitions of a variable can reach a given


program point. A definition of a variable is any point in the program where its value is assigned
(e.g., x = ...). The goal is to figure out whether a given definition of a variable might be used
without being overwritten by another definition.

Use Case:

●​ It is useful, for instance, for detecting dead code and for enabling constant propagation.
●​ Example: If a definition of a variable does not reach any usage, it may indicate that this
definition is redundant.

Reaching-Definition Analysis as a Data-Flow Analysis

1. Data-Flow Equations

Reaching-definition analysis is a forward data-flow analysis because we compute the


definitions that flow forward along the control-flow graph (CFG).

Each program point has:


●​ IN[di]: The set of definitions reaching the start of instruction di.


●​ OUT[di]: The set of definitions reaching the end of instruction di.

The equations for reaching-definition analysis are:

OUT[di] = GEN[di] ∪ (IN[di] \ KILL[di])


IN[di] = ∪ (OUT[p]) # For all predecessors p of di

2. Components of the Equations

●​ GEN[di]: The set of definitions generated by di (e.g., x = ...).


●​ KILL[di]: The set of definitions killed by di, i.e., other definitions of the same variable
that are overwritten.

For example:

d1: x = 5 --> GEN[d1] = {d1}, KILL[d1] = {other definitions of x}


d2: y = x --> GEN[d2] = {}, KILL[d2] = {}

Algorithm for Solving the Equations

The algorithm is similar to liveness analysis:

1.​ Initialize IN and OUT for all program points to empty sets.
2.​ Iterate over the CFG until all IN and OUT sets stabilize (i.e., no further changes occur).

Example

Let’s walk through a small example:

Program:
1: x = 5
2: y = x
3: x = 10
4: print(y)

CFG:

The control-flow graph is linear since there are no branches or loops:

1→2→3→4

Analysis:

1.​ Definitions:


○​ d1: x = 5
○​ d2: y = x
○​ d3: x = 10
2.​ GEN and KILL:
○​ GEN[d1] = {d1}; KILL[d1] = {d3} (other x definitions)
○​ GEN[d2] = {}; KILL[d2] = {} (no definitions killed)
○​ GEN[d3] = {d3}; KILL[d3] = {d1} (kills previous x definitions)
3.​ Equations:
○​ OUT[1] = GEN[1] ∪ (IN[1] \ KILL[1]) = {d1}
○​ IN[2] = OUT[1] = {d1}
○​ OUT[2] = GEN[2] ∪ (IN[2] \ KILL[2]) = {d1}
○​ IN[3] = OUT[2] = {d1}
○​ OUT[3] = GEN[3] ∪ (IN[3] \ KILL[3]) = {d3}
○​ IN[4] = OUT[3] = {d3}
4.​ Final Results:
○​ IN[1] = {}; OUT[1] = {d1}
○​ IN[2] = {d1}; OUT[2] = {d1}
○​ IN[3] = {d1}; OUT[3] = {d3}
○​ IN[4] = {d3}; OUT[4] = {d3}
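The numbers above can be checked mechanically. The sketch below replays the iteration for the four-instruction program (1: x = 5; 2: y = x; 3: x = 10; 4: print(y)); it is our own code, written only to verify the tables:

```python
nodes = [1, 2, 3, 4]
preds = {1: [], 2: [1], 3: [2], 4: [3]}
# GEN/KILL per instruction: d1 and d3 define x, so each kills the other.
GEN  = {1: {'d1'}, 2: set(), 3: {'d3'}, 4: set()}
KILL = {1: {'d3'}, 2: set(), 3: {'d1'}, 4: set()}

IN  = {n: set() for n in nodes}
OUT = {n: set() for n in nodes}
changed = True
while changed:
    changed = False
    for n in nodes:
        # IN[n] is the union of OUT[p] for all predecessors p.
        new_in = set().union(*(OUT[p] for p in preds[n]))
        new_out = GEN[n] | (new_in - KILL[n])
        if (new_in, new_out) != (IN[n], OUT[n]):
            IN[n], OUT[n] = new_in, new_out
            changed = True

assert OUT[1] == {'d1'} and IN[2] == {'d1'}
assert OUT[3] == {'d3'} and IN[4] == {'d3'}
```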

Key Observations

●​ At d4 (print(y)), the definition of y from d2 reaches, but x = 5 (from d1) is killed


by x = 10 (from d3).
●​ Reaching-definition analysis shows which definitions of x and y are live or dead at each
point.

How can you use the results of a reaching-definition analysis to optimize the variable z away in our original example?
The figure below shows the result of a reaching-definition analysis:


The definition of z at d8 is dead, because it does not reach any instruction that uses variable z.
The only use of z is d8 itself. However, the definition of z at d8 is killed by d5. Because d8 is
dead, so is d5. Once we remove d8 from the CFG, there will be no other use of variable z, and
we can safely eliminate d5, where variable z is defined.

Let's implement a reaching-definition analysis. We will need a core language to analyze. Can you provide one?
Let's try to use a very simple language, with only five types of instructions: add, mul, lth, geq
and bt. An interpreter for this language is given below:


The semantics of the five instructions in this intermediate representation (IR) are as follows:

1.​ add(dst, src0, src1, next):


○​ Semantics: Compute the sum of the values in src0 and src1, store the result in
dst, and continue with the instruction labeled next.

S0[src0] = v0
S0[src1] = v1
v = v0 + v1
S1 = S0 ∪ {dst: v}
S1 ⊢ interp(next) ⇒ S2

○​ This means the interpretation of add updates the state by adding a new binding
for dst and proceeds to interpret the next instruction.
2.​ mul(dst, src0, src1, next):
○​ Semantics: Compute the product of the values in src0 and src1, store the
result in dst, and continue with the instruction labeled next.


S0[src0] = v0
S0[src1] = v1
v = v0 × v1
S1 = S0 ∪ {dst: v}
S1 ⊢ interp(next) ⇒ S2

3.​ lth(dst, src0, src1, next):


○​ Semantics: Compare the values in src0 and src1. If src0 is less than src1,
store true in dst. Otherwise, store false. Continue with the instruction labeled
next. Formal description (when src0 < src1):

S0[src0] = v0
S0[src1] = v1
v0 < v1
S1 = S0 ∪ {dst: true}
S1 ⊢ interp(next) ⇒ S2

○​ If the condition v0 < v1 is not satisfied, dst would instead be updated to


false.
4.​ geq(dst, src0, src1, next):
○​ Semantics: Compare the values in src0 and src1. If src0 is greater than or
equal to src1, store true in dst. Otherwise, store false. Continue with the
instruction labeled next. Formal description (when src0 ≥ src1):

S0[src0] = v0
S0[src1] = v1
v0 ≥ v1
S1 = S0 ∪ {dst: true}
S1 ⊢ interp(next) ⇒ S2

○​ If the condition v0 ≥ v1 is not satisfied, dst would instead be updated to


false.
5.​ bt(cond, next_true, next_false):
○​ Semantics: If the value in cond is true, interpret the instruction labeled
next_true. Otherwise, interpret the instruction labeled next_false. Formal
description, when cond is true:

S0[cond] = true
S0 ⊢ interp(next_true) ⇒ S1

○​ When cond is false:

S0[cond] = false


S0 ⊢ interp(next_false) ⇒ S1

In this representation:

●​ S0 is the initial state (a mapping of variables to their values).


●​ S1 is the state after interpreting the current instruction.
●​ S2 is the state after interpreting subsequent instructions.

Can you implement this semantics in Python?


Sure. Find below an implementation of this interpreter using Python.

Here's some details about our implementation:


1.​ Class Design:


○​ We've organized the instructions hierarchically, with Inst as the base class and
specific instruction types (BinOp, Add, Mul, etc.) inheriting from it.
○​ The abstract methods definition and uses encapsulate information about
variable definitions and dependencies, which are useful for program analysis
(e.g., data flow analysis).
2.​ State Management:
○​ The Env class, implemented with a deque, is a way to manage variable bindings.
The last binding will be the current valid association.
3.​ Instruction Evaluation:
○​ Each instruction implements its own eval method, which modifies the
environment according to its semantics. That's an object-oriented style of coding.
4.​ Branching:
○​ The Bt instruction handles conditional branching by using nexts to store the two
possible paths. The get_next method ensures the program execution continues
correctly.
5.​ Recursive Interpretation:
○​ The interp function recursively evaluates the program until there are no more
instructions. This mirrors the abstract machine model.

Example Walkthrough

Let's break down the example step-by-step to confirm its behavior:

Setup

1.​ Initial Environment: {"m": 3, "n": 2, "zero": 0}.


2.​ Instructions:
○​ m_min = Add("answer", "m", "zero") adds m and zero.
○​ n_min = Add("answer", "n", "zero") adds n and zero.
○​ p = Lth("p", "n", "m") compares if n < m.
○​ b = Bt("p", n_min, m_min) branches to n_min if p is true; otherwise, it
branches to m_min.

Execution

●​ Start at p.
○​ eval: p = (2 < 3) = true.
●​ Branch to n_min since p is true.
○​ eval: answer = n + zero = 2.
●​ Since n_min has no successor, the program terminates.


Result

The final environment is {"m": 3, "n": 2, "zero": 0, "p": true, "answer": 2}.
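The walkthrough above can be reproduced with a stripped-down version of the interpreter. The class and variable names (Add, Lth, Bt, answer, p) follow the example, but the code below is a simplified sketch, not the book's full implementation (which also has Mul and Geq, an Env class based on a deque, and a Bt that stores its two targets in a nexts list):

```python
class Inst:
    def __init__(self):
        self.next = None          # the unconditional successor, if any

class Add(Inst):
    def __init__(self, dst, src0, src1):
        super().__init__()
        self.dst, self.src0, self.src1 = dst, src0, src1
    def eval(self, env):
        env[self.dst] = env[self.src0] + env[self.src1]
        return self.next

class Lth(Inst):
    def __init__(self, dst, src0, src1):
        super().__init__()
        self.dst, self.src0, self.src1 = dst, src0, src1
    def eval(self, env):
        env[self.dst] = env[self.src0] < env[self.src1]
        return self.next

class Bt(Inst):
    def __init__(self, cond, next_true, next_false):
        super().__init__()
        self.cond, self.next_true, self.next_false = cond, next_true, next_false
    def eval(self, env):
        return self.next_true if env[self.cond] else self.next_false

def interp(inst, env):
    """Run instructions until there is no successor."""
    while inst is not None:
        inst = inst.eval(env)
    return env

# The example: answer = min(m, n), computed with a branch.
env = {"m": 3, "n": 2, "zero": 0}
m_min = Add("answer", "m", "zero")
n_min = Add("answer", "n", "zero")
p = Lth("p", "n", "m")
b = Bt("p", n_min, m_min)
p.next = b
interp(p, env)
assert env["answer"] == 2 and env["p"] is True
```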

Computing Fibonacci Numbers using our Toy Intermediate Representation:


The figure below shows a program, in our simple intermediate representation, that computes
Fibonacci numbers:

The fib function receives an input n, and creates the necessary instructions to compute the
n-th Fibonacci number!

Reaching definitions are represented by a set of equations. How can we implement these equations?
Perhaps, the first question is: how can we represent "reaching definitions". Essentially, reaching
definitions are pairs formed by variables and the program points where these variables are
defined. Each instruction exists at a program point L, and if that instruction defines a variable V,
then we say that the definition (V, L) reaches the OUT set of that point. Given these
considerations, we have to implement the following system of equations:


We can start implementing the IN and OUT sets. These sets contain pairs like (V, L0), meaning that variable V is defined at L0. We can implement a general "environment" that associates each program point (the ID of an instruction) with an IN and an OUT set, e.g.:

{'IN_0': {('x', 1), ('y', 2)}, 'OUT_0': {('x', 1)}}

In this case, we use strings to represent the keys of IN and OUT sets. Thus, the IN set at program point 0 will be accessed by the key "IN_0". Given these considerations, an equation is an object that we can "evaluate" to produce an updated IN (or OUT) set, e.g.:

from lang import *
from abc import ABC, abstractclassmethod

class DataFlowEq(ABC):
    """
    A class that implements a data-flow equation. The key trait of a data-flow
    equation is an `eval` method, which evaluates that equation. The evaluation
    of an equation might change the environment that associates data-flow facts
    with identifiers.
    """
    def __init__(self, instruction):
        """
        Every data-flow equation is produced out of a program instruction. The
        initialization of the data-flow equation verifies if, indeed, the input
        object is an instruction.
        """
        assert(isinstance(instruction, Inst))
        self.inst = instruction

790
This material is freely available and must not be sold or used for commercial purposes.
For typos or suggestions, please contact Fernando Pereira at [email protected].

@abstractclassmethod
def name(self) -> str:
"""
The name of a data-flow equation is used to retrieve the data-flow
facts associated with that equation in the environment. For instance,
imagine that we have an equation like this one below:

"OUT[p] = (v, p) + (IN[p] - (v, _))"

This equation affects OUT[p]. We store OUT[p] in a dictionary. The name


of the equation is used as the key in this dictionary. For instance,
the name of the equation could be 'OUT_p'.
"""
raise NotImplementedError

@abstractclassmethod
def eval_aux(self, data_flow_env) -> set:
"""
This method determines how each concrete equation evaluates itself.
In a way, this design implements the 'template method' pattern. In other
words, the DataFlowEq class implements a concrete method eval, which
calls the abstract method eval_aux. It is the concrete implementation of
eval_aux that determines how the environment is affected by the
evaluation of a given equation.
"""
raise NotImplementedError

def eval(self, data_flow_env) -> bool:


"""
This method implements the abstract evaluation of a data-flow equation.
Notice that the actual semantics of this evaluation will be implemented
by the `èval_aux` method, which is abstract.
"""
old_env = data_flow_env[self.name()]
data_flow_env[self.name()] = self.eval_aux(data_flow_env)
return True if data_flow_env[self.name()] != old_env else False

Equations have two concrete representations: we have equations that compute IN sets, and
equations that compute OUT sets. The former follows below:

def name_in(ID):
    """
    The name of an IN set is always 'IN_' + ID. Eg.:
    >>> Inst.next_index = 0
    >>> add = Add('x', 'a', 'b')
    >>> name_in(add.ID)
    'IN_0'
    """
    return f"IN_{ID}"

class IN_Eq(DataFlowEq):
    """
    This abstract class represents all the equations that affect the IN set
    related to some program point.
    """
    def name(self):
        return name_in(self.inst.ID)

class ReachingDefs_IN_Eq(IN_Eq):
    """
    This concrete class implements the meet operation for reaching-definition
    analysis. The meet operation produces the IN set of a program point. This
    IN set is the union of the OUT sets of the predecessors of this point.
    """
    def eval_aux(self, data_flow_env):
        """
        The evaluation of the meet operation over reaching definitions is the
        union of the OUT sets of the predecessors of the instruction.

        Example:
        >>> Inst.next_index = 0
        >>> i0 = Add('x', 'a', 'b')
        >>> i1 = Add('x', 'c', 'd')
        >>> i2 = Add('y', 'x', 'x')
        >>> i0.add_next(i2)
        >>> i1.add_next(i2)
        >>> df = ReachingDefs_IN_Eq(i2)
        >>> sorted(df.eval_aux({'OUT_0': {('x', 0)}, 'OUT_1': {('x', 1)}}))
        [('x', 0), ('x', 1)]
        """
        solution = set()
        for inst in self.inst.preds:
            solution = solution.union(data_flow_env[name_out(inst.ID)])
        return solution

    def __str__(self):
        """
        A string representation of the meet operation over reaching
        definitions.

        Example:
        >>> Inst.next_index = 0
        >>> i0 = Add('x', 'a', 'b')
        >>> i1 = Add('x', 'c', 'd')
        >>> i2 = Add('y', 'x', 'x')
        >>> i0.add_next(i2)
        >>> i1.add_next(i2)
        >>> df = ReachingDefs_IN_Eq(i2)
        >>> str(df)
        'IN_2: Union( OUT_0, OUT_1 )'
        """
        preds = ', '.join([name_out(pred.ID) for pred in self.inst.preds])
        return f"{self.name()}: Union( {preds} )"

And the equations that compute OUT sets are implemented as follows:

def name_out(ID):
    """
    The name of an OUT set is always 'OUT_' + ID. Eg.:
    >>> Inst.next_index = 0
    >>> add = Add('x', 'a', 'b')
    >>> name_out(add.ID)
    'OUT_0'
    """
    return f"OUT_{ID}"

class OUT_Eq(DataFlowEq):
    """
    This abstract class represents all the equations that affect the OUT set
    related to some program point.
    """
    def name(self):
        return name_out(self.inst.ID)

class ReachingDefs_Bin_OUT_Eq(OUT_Eq):
    """
    This concrete class implements the equations that affect OUT facts of the
    reaching-definitions analysis for binary instructions. These instructions
    have three fields: dst, src0 and src1; however, only the first is of
    interest for these equations.
    """
    def eval_aux(self, data_flow_env):
        """
        Evaluates this equation, where:
        OUT[p] = (v, p) + (IN[p] - (v, _))

        Example:
        >>> Inst.next_index = 0
        >>> i0 = Add('x', 'a', 'b')
        >>> df = ReachingDefs_Bin_OUT_Eq(i0)
        >>> sorted(df.eval_aux({'IN_0': {('x', 1), ('y', 2)}}))
        [('x', 0), ('y', 2)]
        """
        in_set = data_flow_env[name_in(self.inst.ID)]
        new_set = {(v, p) for (v, p) in in_set if v != self.inst.dst}
        return new_set.union([(self.inst.dst, self.inst.ID)])

    def __str__(self):
        """
        A string representation of a reaching-defs equation representing
        a binary instruction. Eg.:
        >>> Inst.next_index = 0
        >>> add = Add('x', 'a', 'b')
        >>> df = ReachingDefs_Bin_OUT_Eq(add)
        >>> str(df)
        'OUT_0: (x, 0) + (IN_0 - (x, _))'
        """
        kill_set = f" + ({name_in(self.inst.ID)} - ({self.inst.dst}, _))"
        gen_set = f"({self.inst.dst}, {self.inst.ID})"
        return f"{self.name()}: {gen_set}{kill_set}"

class ReachingDefs_Bt_OUT_Eq(OUT_Eq):
    """
    This concrete class implements the equations that affect OUT facts of the
    reaching-definitions analysis for branch instructions. These instructions
    do not affect reaching definitions at all. Therefore, their equations are
    mostly treated as identity functions.
    """
    def eval_aux(self, data_flow_env):
        """
        Evaluates this equation. Notice that the reaching-definition equation
        for a branch instruction is simply the identity function:
        OUT[p] = IN[p]

        Example:
        >>> Inst.next_index = 0
        >>> i0 = Bt('x')
        >>> df = ReachingDefs_Bt_OUT_Eq(i0)
        >>> sorted(df.eval_aux({'IN_0': {('x', 1), ('y', 2)}}))
        [('x', 1), ('y', 2)]
        """
        return data_flow_env[name_in(self.inst.ID)]

    def __str__(self):
        """
        A string representation of a reaching-defs equation representing a
        branch. Eg.:
        >>> Inst.next_index = 0
        >>> i = Bt('x')
        >>> df = ReachingDefs_Bt_OUT_Eq(i)
        >>> str(df)
        'OUT_0: IN_0'
        """
        return f"{self.name()}: {name_in(self.inst.ID)}"

And how can we extract equations from instructions?


Extracting equations is what, in data-flow parlance, we call "constraint generation": we are
producing a series of equations (constraints) for every instruction that makes up the program. In
our case, we can extract constraints as follows:

def reaching_defs_constraint_gen(insts):
    """
    Builds a list of equations to solve Reaching-Definition Analysis for the
    given set of instructions.

    Example:
    >>> Inst.next_index = 0
    >>> i0 = Add('c', 'a', 'b')
    >>> i1 = Mul('d', 'c', 'a')
    >>> i2 = Lth('e', 'c', 'd')
    >>> i0.add_next(i2)
    >>> i1.add_next(i2)
    >>> insts = [i0, i1, i2]
    >>> sol = [str(eq) for eq in reaching_defs_constraint_gen(insts)]
    >>> sol[0] + " " + sol[-1]
    'OUT_0: (c, 0) + (IN_0 - (c, _)) IN_2: Union( OUT_0, OUT_1 )'
    """
    out0 = [ReachingDefs_Bin_OUT_Eq(i) for i in insts if isinstance(i, BinOp)]
    out1 = [ReachingDefs_Bt_OUT_Eq(i) for i in insts if isinstance(i, Bt)]
    in_eqs = [ReachingDefs_IN_Eq(i) for i in insts]
    return out0 + out1 + in_eqs

Key Ideas of the Implementation of Reaching-Definition Analysis

1. Abstract and Modular Design:
   ○ The program uses an abstract base class (DataFlowEq) to define a generic
     interface for data-flow equations. This design ensures modularity, allowing
     different types of data-flow equations to be implemented independently while
     sharing a common interface (the eval method).
2. Separation of Concerns:
   ○ The code differentiates between input (IN) and output (OUT) equations by
     defining the abstract classes IN_Eq and OUT_Eq. These serve as templates for
     implementing specific equations that affect the IN and OUT sets of instructions.
3. Reaching-Definition Semantics:
   ○ The analysis identifies which variable definitions "reach" a particular program
     point. Each OUT set is calculated by removing old definitions (kill set) and adding
     new definitions (gen set). The IN set at a program point is derived from the union
     of the OUT sets of its predecessors.
4. Binary Instructions:
   ○ For instructions like add, mul, geq, and lth, the ReachingDefs_Bin_OUT_Eq
     class evaluates the OUT set. This involves:
     ■ Removing any previous definition of the destination variable (kill set).
     ■ Adding the current instruction's definition of the destination variable (gen set).
5. Branch Instructions:
   ○ The ReachingDefs_Bt_OUT_Eq class handles branch instructions (bt). These
     instructions do not modify reaching definitions, so the OUT set is simply identical
     to the IN set (identity function).
6. Meet Operation:
   ○ The ReachingDefs_IN_Eq class evaluates the IN set at a program point by
     applying a meet operation (union of the OUT sets of predecessor nodes). This
     captures the flow of definitions from multiple control-flow paths.
7. Template Method Pattern:
   ○ The eval method in DataFlowEq serves as a template, calling the eval_aux
     method implemented in subclasses. This allows the high-level logic for evaluating
     equations to remain consistent while enabling specific behaviors for different
     instruction types.
8. Dynamic Environment Updates:
   ○ The environment (data_flow_env) is a dictionary that maps IN and OUT sets
     to program points. Each equation updates its corresponding entry in this
     environment, ensuring a dynamic and iterative approach to data-flow analysis.
9. Equations as Strings:
   ○ Each equation class provides a __str__ method to generate a human-readable
     representation of the equation. This aids debugging and visualization of the
     analysis.
10. Scalability:
    ○ By leveraging abstract and concrete classes, the implementation can be
      extended to support additional instruction types or analyses with minimal
      changes to the existing code structure.

Example
The figure below shows the equations that are produced for a simple program. The program
computes Fibonacci numbers, but its exact semantics will bear no consequence on this
presentation. All that we need are the equations.


The OUT sets are shown in red, and the IN sets are shown in blue. Once we have all these
equations, we can solve them using chaotic iterations, or any faster algorithm. The figure below
shows a solution to reaching definitions:
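Before looking at the solution, it is worth seeing how small a chaotic-iterations solver can be. The sketch below assumes only that each equation object offers an eval(env) method that returns True whenever it changes the environment, like the DataFlowEq objects defined earlier. The _ToyEq class is a hypothetical stand-in, used here just to keep the example self-contained:

```python
import random

class _ToyEq:
    """Hypothetical stand-in for DataFlowEq: name = gen U deps."""
    def __init__(self, name, deps, gen):
        self.eq_name, self.deps, self.gen = name, deps, gen

    def eval(self, env):
        # Recompute this set from the sets it depends on:
        new = set(self.gen)
        for dep in self.deps:
            new |= env.get(dep, set())
        changed = env.get(self.eq_name, set()) != new
        env[self.eq_name] = new
        return changed

def chaotic_iterations(equations, env):
    """Evaluate the equations, in any order, until no evaluation
    changes the environment anymore. Termination follows from the
    equations being monotone functions over finite sets."""
    changed = True
    while changed:
        random.shuffle(equations)  # "chaotic": any order converges
        # A list comprehension (rather than a generator) guarantees
        # that every equation is evaluated in each round:
        changed = any([eq.eval(env) for eq in equations])
    return env

# OUT_0 = {(x, 0)}; IN_1 = OUT_0; OUT_1 = {(y, 1)} U IN_1:
eqs = [_ToyEq('OUT_0', [], {('x', 0)}),
       _ToyEq('IN_1', ['OUT_0'], set()),
       _ToyEq('OUT_1', ['IN_1'], {('y', 1)})]
env = chaotic_iterations(eqs, {})
print(sorted(env['OUT_1']))  # [('x', 0), ('y', 1)]
```

Because every equation is monotone and the sets of definitions are finite, the loop always reaches a fixed point, no matter how the equations are shuffled.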

How do we interpret a solution to the reaching definition analysis?


A solution to the reaching-definition equations maps every program point to:

1.​ IN sets: Represent definitions that reach the entry of a program point. These are
definitions that may influence the computation at that point.
2.​ OUT sets: Represent definitions that reach the exit of a program point. These are
definitions that may propagate to subsequent program points.

Each definition is represented as a pair (variable, point) where:

●​ variable is the defined variable.
●​ point is the program point where the definition occurs.

Interpretation of the Solution

The solution provides information about how variable definitions flow through the program.
Specifically:

1.​ Definition Reachability: The IN set shows all the definitions that might affect the
current instruction. If a variable is used at a program point, its definition must appear in
the IN set.


2.​ Propagation: The OUT set determines which definitions leave the program point and
may affect downstream instructions.
3.​ Control Flow Influence: The IN and OUT sets reflect the possible paths of program
execution in the control-flow graph (CFG).

Relation to Program Execution

Reaching-definition analysis is inherently tied to the possible executions of the program:

●​ It considers all possible control-flow paths (including loops and branches).
●​ The analysis is conservative, meaning it assumes all feasible paths in the CFG could be
executed. This ensures correctness but might overestimate which definitions reach a
point.

For example:

●​ At program point 3, the IN set {(count, 0), (fib, 2), (pred, 1)} means that
the definition of count at 0, the definition of fib at 2, and the definition of
pred at 1 might influence the computation of aux = zero + fib.

What Can You Do With This Solution?

1. Dead Code Elimination:
   ○ If a definition never reaches a use of its variable (i.e., it does not appear in the
     IN set of any instruction that reads that variable), it can be safely removed.
2. Data-Flow Optimizations:
   ○ The analysis can help identify redundant computations or definitions that can be
     replaced with earlier definitions to save computation.
3. Debugging and Verification:
   ○ Understanding which definitions reach a point helps ensure variables are
     properly initialized and not unexpectedly overwritten.
4. Code Refactoring:
   ○ Identifying unused or unnecessary definitions allows for code simplification.

Example from the Figure

1. At program point 4:
   ○ IN[4] = {(count, 0), (fib, 2), (aux, 3), (pred, 1)}
     ■ Definitions of count, fib, aux, and pred are available.
   ○ OUT[4] = {(count, 0), (fib, 4), (aux, 3), (pred, 1)}
     ■ The new definition of fib at 4 replaces the old one from 2.
   ○ Interpretation:
     ■ The fib value from 4 propagates, potentially affecting subsequent
       instructions.
2. At program point 7 (loop condition):
   ○ IN[7] = {(count, 6), (fib, 4), (aux, 3), (pred, 5)}
     ■ Definitions from the loop body are carried back to the loop header.
   ○ OUT[7] = {(repeat, 7), (count, 6), (fib, 4), (aux, 3), (pred, 5)}
     ■ The loop condition assigns a new value to repeat.

Summary

The solution of reaching-definition analysis is a static abstraction of the dynamic flow of
definitions in the program:

1.​ Meaning: It shows which variable definitions are available at each point in the program,
based on the CFG.
2.​ Utility: It supports optimizations, debugging, and program analysis by exposing how
definitions influence computations.
3.​ Execution Relation: It accounts for all possible paths in the CFG, ensuring correctness
even for complex control flows.

What do you mean when you say that the Reaching-Definition Analysis is
conservative?
When I say that reaching-definition analysis is conservative, I mean that it is designed to
over-approximate the possible definitions that might reach each program point. This ensures
that the analysis is safe and correct for all possible program executions. In essence,
conservative analyses guarantee that no potential behaviors of the program are missed, even at
the cost of including some that are not possible in practice.

Conservative Analyses and "May" Properties

"May" Analysis

●​ Reaching-definition analysis is a "may" analysis because it computes all the
definitions that may reach a program point.
●​ It ensures soundness by including all definitions that could possibly influence a
computation, given any feasible path through the control-flow graph (CFG).
●​ The results may include spurious definitions that cannot actually reach a point in any real
execution due to infeasible paths or program semantics.

Example

In reaching-definition analysis:


●​ If a variable x has definitions at points P1 and P2, and the analysis concludes that both
definitions reach a program point P3, this means:
○​ At least one of these definitions may influence P3 depending on the path
taken.
○​ This does not imply that both definitions simultaneously influence P3 in a single
execution.

Comparison: "May" vs. "Must" Analyses

"Must" Analysis

●​ A "must" analysis computes facts that hold on all possible executions of the program.
●​ In the context of definitions, a "must-reach" analysis would determine which definitions
of a variable are guaranteed to reach a program point on all possible paths.
●​ This is stricter and less inclusive than a "may" analysis.

Key Differences

Aspect            | "May" Analysis (e.g., Reaching Definitions)  | "Must" Analysis
------------------|----------------------------------------------|---------------------------------------
Purpose           | Includes everything that might happen.       | Includes only what always happens.
Results           | Over-approximation (possibly includes        | Under-approximation (safe to exclude
                  | spurious results).                           | facts).
Example Property  | All definitions that may reach a point.      | Definitions that must reach a point.
Applications      | Optimizations where safety is critical,      | Used in optimizations like constant
                  | like dead code elimination.                  | folding, where certainty is needed.

Example

Consider this CFG:

1: x = 5
2: if (cond) { x = 10 }
3: y = x

●​ "May" Analysis (Reaching Definitions):
○​ At point 3, both x = 5 (from 1) and x = 10 (from 2) may reach 3.
●​ "Must" Analysis:
○​ At point 3, no definition of x must reach, because the execution path depends on
the value of cond.


Other Examples of "May" and "Must" Analyses

"May" Analyses

1. Reaching Definitions:
   ○ Determines all definitions that may reach each program point.
2. Live Variables Analysis:
   ○ Identifies variables that may be used in the future.

"Must" Analyses

1. Available Expressions:
   ○ Determines expressions that are guaranteed to have been computed and not
     invalidated along all paths to a program point.
2. Definite Assignment Analysis:
   ○ Checks if a variable is guaranteed to be initialized before its use.

Why Use "May" or "Must" Analyses?

"May" Analyses Are Useful When:

●​ Safety is paramount. Over-approximating ensures that all possible program behaviors
are accounted for, preventing unsafe optimizations.
●​ Examples:
○​ Reaching definitions: Ensures no definition is overlooked when determining
potential variable dependencies.
○​ Pointer aliasing: Determines all possible aliases, ensuring memory operations
are safe.

"Must" Analyses Are Useful When:

●​ Certainty is required to enable aggressive optimizations.
●​ Examples:
○​ Available expressions: Enables replacing computations with previously
computed results.
○​ Definite assignment: Guarantees a variable is initialized before being used,
preventing runtime errors.

Relation Between "May" and "Must" Analyses

●​ Dual Nature: Often, "may" and "must" analyses are duals of each other:
○​ A "must" analysis can sometimes be derived from the complement of a "may"
analysis and vice versa.
●​ Example:
○​ "May reach" definitions describe what might reach a program point.


○​ "Must reach" definitions describe what is guaranteed to reach a program point.

Summary

●​ Reaching-definition analysis is conservative because it over-approximates potential
behaviors, ensuring correctness for any program execution.
●​ It is a "may" analysis, including all definitions that might reach a program point.
●​ "Must" analyses, by contrast, focus on properties that hold on all executions, enabling
aggressive optimizations.

Can you state some properties that are true about the results of a
Reaching-Definition Analysis?
Yes, there are many such properties. Here are a few:

Theorem 1 (Affectation):

Given an instruction a = b + c at program point P, only the definitions of b
and c that reach P can affect the value of a.

Proof Sketch:

1.​ By definition, the IN set at P includes all definitions that reach the entry of P.
2.​ Reaching definition analysis is conservative, meaning if a definition does not reach P, it
is guaranteed not to affect the program at P.
3.​ Therefore, only the definitions of b and c in the IN set at P can contribute to the
computation of a.

Utility:

This theorem helps in understanding data provenance and supports optimizations like constant
propagation. If b or c have a single reaching definition, their values can be substituted directly.

Theorem 2 (Dead Definitions):

A definition of a variable x at program point Q is dead if (x, Q) does not
appear in the IN set of any instruction that uses x.

Proof Sketch:

1.​ An instruction can only read the value produced at Q if the definition (x, Q) reaches its
entry, i.e., if (x, Q) belongs to its IN set.
2.​ If no instruction that uses x has (x, Q) in its IN set, the value computed at Q is never
read downstream.
3.​ Such definitions can be eliminated without changing the program's behavior.


Utility:

This theorem underpins dead code elimination, a key optimization in compilers.
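As a sketch of how this elimination could be driven by the analysis results (the helper and its data layout below are hypothetical, not part of the lang module): a definition (v, q) is dead when it does not reach the IN set of any instruction that reads v.

```python
def dead_definitions(defs, uses, IN):
    """Return the definitions (v, q) that reach no use of v.
    defs: set of (var, point) pairs, one pair per definition
    uses: dict mapping each variable to the points where it is read
    IN:   dict mapping each point to its solved IN set
    """
    return {(v, q) for (v, q) in defs
            if all((v, q) not in IN[p] for p in uses.get(v, ()))}

defs = {('x', 0), ('x', 1), ('y', 2)}
uses = {'x': {2}}            # x is read at point 2 only; y is never read
IN = {2: {('x', 1)}}         # only the definition of x at 1 reaches point 2
print(sorted(dead_definitions(defs, uses, IN)))  # [('x', 0), ('y', 2)]
```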

Theorem 3 (Definition Substitution):

If a variable x has exactly one definition (x, Q) that reaches program point P,
then x's value at P is determined solely by the computation at Q.

Proof Sketch:

1.​ Reaching definition analysis tracks all definitions that can reach P.
2.​ If x has exactly one reaching definition (x,Q), no other definitions can interfere or
override x before P.
3.​ Therefore, x's value at P must be the result of the computation at Q.

Utility:

This property supports constant propagation and partial redundancy elimination. For
example, the computation of x at Q can be propagated to P to reduce redundant evaluations.
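As a small illustration of the precondition of this theorem, a compiler could test whether a variable has a unique reaching definition before substituting its value. The helper below is hypothetical; it only assumes that the solved IN sets are available as dictionaries of (variable, point) pairs:

```python
def single_reaching_def(var, point, IN):
    """Return the unique definition of `var` that reaches `point`,
    or None when zero or several definitions reach it."""
    candidates = [d for d in IN[point] if d[0] == var]
    return candidates[0] if len(candidates) == 1 else None

IN = {3: {('x', 1), ('y', 2)}, 4: {('x', 1), ('x', 3)}}
print(single_reaching_def('x', 3, IN))  # ('x', 1): safe to substitute
print(single_reaching_def('x', 4, IN))  # None: two definitions reach 4
```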

Theorem 4 (No Spurious Definitions):

If a definition (x, Q) does not appear in the IN set at P, it cannot affect P.

Proof Sketch:

1.​ Definitions not in the IN set at P do not propagate to P due to intervening redefinitions or
control-flow barriers.
2.​ As a result, (x,Q) has no influence on computations at P.

Utility:

This theorem is essential for understanding the scope of definitions and avoiding incorrect
assumptions about variable dependencies.

Theorem 5 (Control Flow and Definitions):

If program point P has multiple predecessors, its IN set for variable x is the
union of the OUT sets of x's definitions from all predecessors.

Proof Sketch:

1.​ The IN set at P is defined as the union of the OUT sets of all predecessor points.
2.​ This ensures that all definitions flowing into P from distinct paths are captured.


Utility:

This theorem explains the control flow-sensitive nature of reaching definition analysis and
helps in reasoning about programs with branches or loops.

Theorem 6 (Loop Influence):

For a variable x defined inside a loop, the definition may reach the loop
header if it is not overwritten by subsequent iterations.

Proof Sketch:

1.​ In a loop, the OUT set of the loop body propagates to the loop header.
2.​ If x's definition is not killed within the loop, it will appear in the IN set of the loop header,
potentially affecting subsequent iterations.

Utility:

This theorem supports loop optimizations like loop invariant code motion and partial
redundancy elimination.

Theorem 7 (Variable Reuse):

At program point P, two definitions (x, Q1) and (x, Q2) can coexist in the IN
set if and only if Q1 and Q2 belong to different control-flow paths leading to P.

Proof Sketch:

1.​ Reaching-definitions analysis is not path-sensitive: it merges the facts of all
control-flow paths conservatively.
2.​ If (x, Q1) and (x, Q2) coexist, P must be reachable via both paths.

Utility:

This property helps in understanding program nondeterminism and ensuring correctness in
optimizations that involve variable reuse.

These theorems show how reaching definition analysis provides powerful insights into program
behavior and supports a variety of optimizations and correctness guarantees. Would you like me
to expand on any specific theorem or its practical applications?


There are so many analyses, but they are also very similar! Is there a way
to implement them all by just varying small parts of them?
Yes! All these analyses are instances of the Data-Flow Framework! Analyses like reaching
definitions, liveness analysis, and available expressions are examples of analyses that fit
within this data-flow framework. Let me explain what a data-flow framework is, its structure,
and why it’s useful in compiler design.

What is the Data-Flow Framework?

The data-flow framework is a general abstraction used to perform static analyses of programs.
It defines a systematic way to collect information about program behaviors by solving data-flow
equations over a control-flow graph (CFG).

Key Components

1.​ Control-Flow Graph (CFG):
○​ Represents the program as a graph, where:
■​ Nodes are program points (e.g., instructions or blocks of code).
■​ Edges represent control flow (e.g., jumps, branches).
2.​ Data-Flow Values:
○​ These represent the facts being computed during the analysis (e.g., which
definitions reach a point, which variables are live, etc.).
○​ The values belong to a lattice, ensuring there is a clear way to combine values
(e.g., union, intersection).
3.​ Transfer Functions:
○​ These describe how information flows through each node of the CFG. For
example:
■​ In reaching definitions, a transfer function updates the OUT set by
adding new definitions and removing killed ones.
4.​ Direction of Analysis:
○​ Forward Analysis: Information flows in the same direction as program execution
(e.g., reaching definitions, available expressions).
○​ Backward Analysis: Information flows opposite to program execution (e.g.,
liveness analysis).
5.​ Meet Operator:
○​ Combines data-flow facts from multiple control-flow paths. For example:
■​ Union: Used in reaching definitions to combine definitions from different
paths.
■​ Intersection: Used in available expressions to find expressions valid on
all paths.
6.​ Fixed-Point Iteration:
○​ Solves the data-flow equations iteratively until the information stabilizes (reaches
a fixed point).


Structure of the Data-Flow Framework

1.​ Input:
○​ CFG: The control-flow graph of the program.
○​ Analysis Specification: Defines:
■​ Direction (forward or backward).
■​ Transfer functions for nodes.
■​ Meet operator for combining paths.
2.​ Equations:
○​ For each node in the CFG, compute:
■​ IN[n] and OUT[n] for forward analysis.
■​ OUT[n] and IN[n] for backward analysis.

General form for forward analysis:

OUT[n] = TRANSFER(n, IN[n])
IN[n] = MEET(OUT[p] for p in predecessors[n])

3.​ Fixed-Point Iteration:
○​ Initialize all IN and OUT sets (typically to empty or maximal sets).
○​ Repeatedly apply the equations until no further changes occur (fixed point).
4.​ Output:
○​ The IN and OUT sets for all nodes, representing the analysis results.

Key Properties of the Framework

1.​ Generality:
○​ The framework supports a wide range of analyses, as long as the problem can
be expressed in terms of data flow through the CFG.
2.​ Safety:
○​ The framework guarantees a safe approximation of program behaviors,
typically through over- or under-approximation.
3.​ Convergence:
○​ By using a finite lattice for data-flow values and monotonic transfer functions, the
framework ensures convergence to a fixed point.

Examples of Analyses in the Framework

1. Reaching Definitions (Forward Analysis)

●​ Direction: Forward.
●​ Transfer Function: OUT[n] = GEN[n] ∪ (IN[n] \ KILL[n])
●​ Meet Operator: Union ("may" analysis).


2. Liveness Analysis (Backward Analysis)

●​ Direction: Backward.
●​ Transfer Function: IN[n] = USE[n] ∪ (OUT[n] \ DEF[n])
●​ Meet Operator: Union ("may" analysis).

3. Available Expressions (Forward Analysis)

●​ Direction: Forward.
●​ Transfer Function: OUT[n] = GEN[n] ∪ (IN[n] ∩ ~KILL[n])
●​ Meet Operator: Intersection ("must" analysis).
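To make the "varying small parts" concrete, here is a minimal sketch of a generic forward gen/kill solver. Only the meet operator changes between a "may" analysis (union) and a "must" analysis (intersection); the names solve_forward, preds, gen and kill are illustrative, not part of the lang module used earlier:

```python
def solve_forward(preds, gen, kill, meet):
    """Generic forward gen/kill data-flow solver.
    preds: dict mapping each node to the list of its predecessors
    gen, kill: dicts mapping each node to its GEN and KILL sets
    meet: combines predecessor OUT sets (set.union or set.intersection)
    Returns the (IN, OUT) dictionaries at the fixed point.
    """
    IN = {n: set() for n in preds}
    OUT = {n: set() for n in preds}
    changed = True
    while changed:
        changed = False
        for n in preds:
            ps = [OUT[p] for p in preds[n]]
            IN[n] = meet(*ps) if ps else set()
            new_out = gen[n] | (IN[n] - kill[n])
            if new_out != OUT[n]:
                OUT[n], changed = new_out, True
    return IN, OUT

# Reaching definitions ("may": union meet) on the earlier example:
# 1: x = 5;  2: if (cond) { x = 10 };  3: y = x
preds = {1: [], 2: [1], 3: [1, 2]}
gen = {1: {('x', 1)}, 2: {('x', 2)}, 3: {('y', 3)}}
kill = {1: {('x', 2)}, 2: {('x', 1)}, 3: set()}
IN, OUT = solve_forward(preds, gen, kill, set.union)
print(sorted(IN[3]))  # [('x', 1), ('x', 2)]: both definitions may reach 3
```

For a "must" analysis such as available expressions, one would pass set.intersection instead, and initialize the OUT sets of all non-entry nodes to the universal set of facts rather than to the empty set.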

Applications of the Data-Flow Framework

1.​ Optimizations:
○​ Dead code elimination (e.g., from liveness analysis).
○​ Common subexpression elimination (e.g., from available expressions).
○​ Register allocation (e.g., from liveness analysis).
2.​ Program Verification:
○​ Ensuring variables are initialized before use (e.g., definite assignment analysis).
3.​ Code Refactoring:
○​ Identifying redundant computations or unused variables.

Relation to "May" and "Must" Analyses

●​ The data-flow framework can express both may and must analyses:
○​ May Analysis: Typically uses the union operator in the meet (e.g., reaching
definitions, liveness).
○​ Must Analysis: Typically uses the intersection operator in the meet (e.g.,
available expressions).

Summary

The data-flow framework provides a unifying structure for many program analyses. By defining
the control flow, data-flow values, transfer functions, and meet operators, it systematically
computes information about program behavior. This framework is crucial for enabling safe and
effective compiler optimizations.

Who invented this "Data-Flow Framework"?


The data-flow framework as a systematic method for program analysis was developed in the
late 1960s and early 1970s, primarily in the context of optimizing compilers. While it evolved
through contributions from many researchers, the foundational ideas are often credited to a few
key figures:


Key Contributions to the Data-Flow Framework

1. John Cocke

●​ Known as one of the pioneers of modern compiler optimization.
●​ In the 1960s, Cocke worked on global program optimization at IBM, contributing to
ideas like control-flow analysis and basic data-flow concepts.

2. Gary Kildall

●​ In 1973, Kildall introduced the seminal formalization of data-flow analysis in his paper
A Unified Approach to Global Program Optimization.
●​ Kildall's algorithm provided a systematic way to compute data-flow information using the
concepts of lattices, transfer functions, and fixed-point iteration. This became the
foundation for modern data-flow frameworks.

3. Frances E. Allen

●​ Often considered the "mother of compiler optimization", Frances Allen made
significant contributions to control-flow analysis and loop optimization at IBM.
●​ Her work laid the groundwork for applying data-flow analysis to loop optimizations, such
as loop invariant code motion and strength reduction.

4. Kenneth Zadeck

●​ In the 1970s and 1980s, Zadeck and others refined the practical application of data-flow
frameworks, focusing on efficiency and scalability in compilers.

5. Others

●​ Contributions from researchers like Patrick Cousot and Radhia Cousot (abstract
interpretation) and Aho, Sethi, and Ullman (Dragon Book) helped formalize and
popularize data-flow frameworks in the broader programming languages and compiler
community.

Significance of Kildall's Work

Kildall's 1973 paper is often considered the formal birth of the data-flow framework because:

1.​ It provided the theoretical foundation for solving data-flow equations iteratively using a
fixed-point computation.
2.​ It introduced the use of lattices to model data-flow facts and guarantee convergence.
3.​ It unified multiple data-flow analyses (e.g., reaching definitions, liveness, available
expressions) under a common theoretical framework.

Why Is It Important?


The data-flow framework revolutionized compiler design by enabling:

1.​ Global Optimizations: Efficiently propagating information across the entire program, not
just within basic blocks.
2.​ Generalization: A single mathematical structure for many analyses, making compilers
modular and extensible.
3.​ Theoretical Guarantees: Convergence and correctness via well-defined mathematical
properties like monotonicity and finite lattices.

Legacy

The framework is now a core part of compiler theory and has influenced other areas of program
analysis, including abstract interpretation and model checking. Researchers like Allen and
Kildall have left a lasting mark on how compilers analyze and optimize code.

Is this the same Kildall from the TV show? What else did he do for
computer science?
Yes, Gary Kildall, the same visionary computer scientist who contributed to the development of
the data-flow framework, is also famously known for his pioneering work in personal
computing and for founding the company Digital Research Inc. (DRI). He was indeed one of
the hosts of the PBS series Computer Chronicles.

Beyond his foundational work in compiler theory, Kildall made significant contributions across
various domains in computer science and was an advocate for science outreach. Here's a
summary of his contributions:

1. Compiler Theory and Data-Flow Analysis


●​ Kildall's 1973 paper, A Unified Approach to Global Program Optimization, laid the
groundwork for data-flow frameworks, as discussed earlier.
●​ His work provided theoretical tools still used in modern compiler design.

2. Operating Systems: CP/M

●​ Kildall is best known for creating CP/M (Control Program for Microcomputers) in
1974.
○​ CP/M was one of the first operating systems for personal computers and became
a de facto standard in the early days of microcomputing.
○​ It introduced the concept of a BIOS (Basic Input/Output System), which
allowed CP/M to run on various hardware platforms, paving the way for
cross-platform software development.
●​ CP/M heavily influenced MS-DOS, which Microsoft developed after a controversial
decision by IBM not to adopt Kildall's system.

3. Programming Languages: PL/M

●​ Kildall designed PL/M (Programming Language for Microcomputers) in 1973.


○​ PL/M was a high-level language specifically optimized for early microcomputers,
combining the power of assembly with the readability of high-level languages like
PL/I.
○​ It became widely used for embedded systems development.

4. Early Graphical User Interfaces (GUIs)

●​ Kildall foresaw the potential of graphical interfaces for personal computing and
developed GEM (Graphical Environment Manager).
○​ GEM was an early GUI for PCs and inspired subsequent developments in
graphical operating systems.

5. Networking and CD-ROM Technology

●​ Kildall developed one of the first implementations of network file sharing with MP/M (an
extension of CP/M for multitasking).
●​ He contributed to the development of CD-ROM technology, demonstrating its potential
for data distribution and multimedia applications.

6. Outreach and Public Engagement

●​ In the 1980s, Kildall hosted the PBS series The Computer Chronicles:
○​ This popular TV show explained computer technology to a broad audience,
making concepts accessible to both experts and laypeople.


○​ Kildall's approachable style and passion for teaching helped demystify personal
computing during its infancy.
●​ He was a strong advocate for democratizing access to computing and believed in the
power of computers to transform society.

7. Patents and Entrepreneurial Ventures

●​ Kildall held multiple patents related to software and computer systems.


●​ He founded Digital Research Inc., which became a major player in the early personal
computing industry.

8. Legacy and Influence

Despite his achievements, Kildall's work is often overshadowed by more publicized figures like
Bill Gates and Steve Jobs. Nevertheless, his contributions to computer science remain
profound:

●​ Standardization: His work on CP/M and BIOS established conventions that shaped the
PC industry.
●​ Vision: Kildall's foresight in areas like cross-platform software, GUIs, and networking
influenced countless innovations.
●​ Science Outreach: His efforts on The Computer Chronicles showcased his dedication
to educating the public about technology.

Tragic Underrating of His Role

Unfortunately, Kildall's contributions were somewhat eclipsed by IBM's decision to license
Microsoft's MS-DOS instead of CP/M for their PCs. This pivotal moment reshaped the industry,
but it does not diminish Kildall's legacy as a brilliant innovator and thinker.

Conclusion

Gary Kildall was a polymath in computer science, contributing to fields as diverse as compiler
theory, operating systems, networking, and science communication. His passion for both
innovation and outreach left an indelible mark on the history of computing. Would you like to
explore any specific aspect of his work further?


Static Single-Assignment Form


Do you know how constant propagation works?
Constant propagation is a common optimization in compilers. Here's an overview of what it is
and how it works:

What is Constant Propagation?

Constant propagation is an optimization technique that identifies and replaces expressions
involving constant values with their computed results. The goal is to reduce the runtime
computation by using precomputed constant values.

How Does Constant Propagation Work?

The process involves the following steps:

1.​ Analysis Phase:


○​ The compiler analyzes the code to determine which variables have constant
values at specific points in the program.
○​ This is done through a data-flow analysis called constant propagation analysis.
It determines for each variable and each program point whether the variable
holds a constant value.
2.​ Replacement Phase:
○​ If a variable is found to have a constant value at a particular program point, the
compiler replaces the variable with that constant.
3.​ Code Simplification:
○​ After replacing constants, the compiler may simplify expressions. For example:​
int x = 5; int y = x + 2; // y = 7 after constant propagation.
○​ In this example, x + 2 is replaced with 7.
4.​ Iterative Process:
○​ The propagation and analysis process is often iterative. Changes in one part of
the program can affect the constant values in another part, so the analysis is
repeated until no further changes occur (a fixed point is reached).

Example

Consider the following code:

int x = 10;
int y = x + 5;
int z = y * 2;


After constant propagation:

int x = 10;
int y = 15; // Replaced x + 5 with 15
int z = 30; // Replaced y * 2 with 30
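
The replacement phase above can be mimicked with a tiny expression folder. This is only an illustrative sketch: the tuple-based expression format is invented here and is not the intermediate representation used elsewhere in these notes.

```python
# Expressions: ("const", n), ("var", name), or (op, left, right).
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def fold(expr, env):
    """Replace variables known to be constant and fold constant operations."""
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "var":
        # env maps variable names to known constant values, if any.
        return ("const", env[expr[1]]) if expr[1] in env else expr
    left, right = fold(expr[1], env), fold(expr[2], env)
    if left[0] == "const" and right[0] == "const":
        return ("const", OPS[kind](left[1], right[1]))
    return (kind, left, right)

# The example from the text: x = 10; y = x + 5; z = y * 2.
env = {"x": 10}
y = fold(("add", ("var", "x"), ("const", 5)), env)
env["y"] = y[1]
z = fold(("mul", ("var", "y"), ("const", 2)), env)
print(y, z)  # ('const', 15) ('const', 30)
```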

Benefits of Constant Propagation

●​ Improved Performance: It reduces runtime computations.


●​ Smaller Code Size: By simplifying expressions, the generated code may have fewer
instructions.
●​ Enable Further Optimizations: Simplified code can expose opportunities for other
optimizations, such as dead code elimination.

Challenges in Constant Propagation

●​ Control Flow: In programs with loops or conditionals, constant propagation can become
more complex.
●​ Aliasing and Pointers: When pointers or aliasing are involved, it can be hard to
determine if a variable truly holds a constant value.

How does the compiler know that a variable is a constant? Is it via some
data-flow analysis?
Yes, the compiler uses data-flow analysis to determine whether a variable is constant at
specific points in the program. The specific type of data-flow analysis used for this purpose is
called constant propagation analysis. Here's how it works in detail:

Key Idea of Data-Flow Analysis

The compiler tracks the possible values of variables as they "flow" through the control flow
graph (CFG) of the program. For constant propagation, the goal is to identify variables that are
provably constant at each point in the program.

Steps of the Analysis

1.​ Control Flow Graph (CFG) Construction:


○​ The program is represented as a directed graph where nodes are basic blocks of
instructions, and edges represent control flow between them.
2.​ Lattice of Values:
○​ Each variable is associated with a lattice that represents its state. The lattice for
constant propagation typically has three values:
■​ ⊤ (Top): The variable may hold any value (unknown).
■​ Constant (c): The variable holds a specific, known constant value.
■​ ⊥ (Bottom): The variable is uninitialized or undefined.


3.​ Transfer Functions:


○​ Each instruction in the program has a transfer function that describes how it
modifies the lattice value of a variable. For example:
■​ x = 5: Sets x to the constant 5.
■​ x = y + 3: If y is constant, x becomes the result of y + 3. If y is not
constant, x becomes ⊤ (unknown).
4.​ Join Operation:
○​ At control flow merges (e.g., the end of an if statement), the compiler uses a
join operation to combine information from multiple paths.
○​ The join operation typically works as follows:
■​ If a variable has the same constant value on all paths, it remains
constant.
■​ If the variable has different values (or is unknown on some paths), it
becomes ⊤ (unknown).
5.​ Worklist Algorithm:
○​ The analysis is typically performed using a worklist algorithm, which iteratively
updates the lattice values for variables until a fixed point is reached (i.e., no more
changes occur).

Could you give me an example showing how this data-flow analysis works?
Yes, the figure below represents the process of constant propagation analysis:


Key Elements in the Figure

1.​ Control-Flow Graph (CFG):


○​ The nodes in the graph correspond to program statements, and edges represent
the possible flow of control between them.
2.​ Abstract States:
○​ Each edge is labeled with a mapping of variables to their abstract state at that
program point.
○​ The states include:
■​ NAC (Not a Constant): The variable may hold different values or an
unknown value.
■​ UNDEF (Undefined): The variable has not been assigned any value yet.
■​ Concrete Value (e.g., 1, 11): The variable holds a known constant.
3.​ Updates:
○​ As the analysis progresses through the CFG, the abstract state for each variable
is updated based on the semantics of the program statements.

How the data-flow analysis works:

The analysis works as follows:

1.​ Initialization:
○​ Variables are initially assigned UNDEF until they are assigned specific values in
the program.
2.​ Data-Flow Analysis:
○​ The figure tracks how constants propagate through the program. For example:
■​ a = 1 sets a to the constant 1.
■​ c = a + 10 propagates the constant value 11 to c.
■​ When control flow merges (e.g., at the loop), the state reflects the most
conservative approximation using a join operation:
■​ If the variable is constant on one path but not on the other, the
result is NAC.
3.​ Iterative Refinement:
○​ The process continues iteratively, updating states until reaching a fixed point
where no further changes occur.
○​ In loops, variables like a and b may become NAC because their values cannot be
determined to be constant across iterations.

Interpreting Specific Details in Your Figure

●​ At the start, variables are undefined (UNDEF).


●​ As the program executes:
○​ a becomes 1 (a constant).


○​ c becomes 11 (from a + 10).


●​ In the loop:
○​ a becomes NAC due to the update a = a + 1.
○​ b also becomes NAC because it depends on a and x, both of which cannot be
proven constant.

How Does the Analysis Relate to Code Execution?

Constant propagation analysis does not execute the code but symbolically evaluates it using the
CFG and abstract states. It uses transfer functions and the lattice of values (as described
earlier) to approximate the values of variables.

This implementation of constant propagation seems a bit wasteful. Do you
see the problem?
Yes, this implementation is not ideal. It is too "dense"! The dense implementation of constant
propagation tracks the abstract state of every variable at every program point in the control
flow graph (CFG). This means that for a program with:

●​ |V| variables (the total number of variables in the program), and


●​ |P| program points (edges in the CFG),

the dense approach requires maintaining O(|V| * |P|) space to store all this information.

Why is this expensive?

1.​ Growth with Program Size:


○​ As the number of variables (|V|) increases, the storage requirements grow
linearly with it.
○​ Similarly, as the size of the program increases (more edges in the CFG, |P|), the
memory usage grows linearly as well.
2.​ Wasted Computation:
○​ Not all variables are updated or used at every program point. However, in the
dense approach, the compiler still computes and stores information for all
variables at every single point, even if most of them are irrelevant.
○​ For example:
■​ If a variable a is assigned a constant value early in the program and
never changes, we still track its state across every program point, even
though it's unnecessary.
3.​ Large Programs Amplify the Issue:
○​ For real-world programs with many variables and a large CFG, the storage and
computation overhead become significant.


○​ If you have 1,000 variables and 10,000 program points, the dense
implementation would require space proportional to 10,000,000 entries, even
though only a fraction of these might be relevant!
4.​ Memory Usage:
○​ Modern programs are large and may have thousands of variables, making this
approach infeasible due to high memory consumption.
○​ The space used to store abstract states could otherwise be used for more critical
tasks, like storing actual program data or performing other optimizations.
5.​ Inefficiency for Sparse Data:
○​ In many cases, the number of variables that actually change at any given
program point is small. This makes the dense representation overkill because
we're maintaining and updating information that isn't necessary.

Consider an Analogy:

Imagine you’re tracking the position of all the students in a university across a campus map at
every second of the day. For most students, their position won’t change very often, but you’d still
waste time and storage keeping track of everyone constantly, instead of just focusing on the few
who are actually moving.

In summary, the dense approach is expensive because it doesn't take advantage of the
sparsity of actual changes in variable states, leading to significant overhead in both space and
computation.

Would it be possible to simplify this analysis, reducing this asymptotic
complexity?
Yeap, there are some tricks we can play :)

Reducing the Complexity of Constant Propagation

The inefficiency of the dense approach comes from unnecessarily tracking the state of every
variable at every program point. But here's an important observation: whether a variable is
constant or not depends only on the point where it is defined—not on every subsequent
program point.

The Key Idea:

1.​ A Variable's Definition Determines Its Abstract State:


○​ The property of a variable being a constant or not constant is fixed at the point
where it is assigned a value.
○​ For instance:


■​ If a = 1, then a is constant everywhere in the program, as long as it is


not reassigned later.
■​ If a = input (user-provided value), then a is not constant anywhere,
because we can't know its value.
2.​ Variables with Unique Names Have Fixed Properties:
○​ Imagine every variable in the program has a unique name that reflects its specific
assignment. In this case:
■​ The constant/non-constant property is fixed for the entire live range of
the variable (the range of the program where the variable is used after its
definition).
■​ This means there is no need to keep re-evaluating the variable's state at
every program point—it’s determined at its definition.
3.​ Implication:
○​ Instead of associating the abstract state (e.g., CONSTANT or NOT CONSTANT)
with both the variable and the program point, we can associate it with the
variable alone.
○​ This significantly reduces the amount of information we need to track, because
the same abstract state is invariant throughout the variable's lifetime.

A Simplified Approach to Constant Propagation

By focusing only on where variables are defined:

●​ We eliminate redundancy: We no longer need to store and recompute information
about a variable at every program point.
●​ We streamline the analysis: If the variable’s name uniquely identifies its definition, we
know its constant/non-constant status at all program points where it is used.

Analogy for Intuition

Imagine you’re labeling jars in a kitchen:

●​ If every jar has a unique label based on its contents (e.g., “sugar,” “flour”), you always
know what’s inside by looking at the label.
●​ There’s no need to check every cabinet or recipe step to figure out what’s in each
jar—it’s consistent because of the unique label.

Similarly, if every variable in a program is uniquely tied to its definition, its property (constant or
not) is immediately clear and doesn’t need to be re-evaluated everywhere.

But, how can we ensure that every variable has a single name? Renaming
variables in straight-line code is easy. But what about code with branches?
Yes, renaming variables within straight-line code (code without branches) is fairly easy. See how
that could be done in the figure below:


Let's break down the figure and explain why renaming variables in straight-line code is simple
but becomes challenging when control flow introduces branches.

Straight-line Code

In the first example:

L0: a = read()
L1: b = read()
L2: a = a + b
L3: ret a

We see a simple sequence of instructions without any branching. Renaming variables here is
straightforward: just rename each definition (a = ..., b = ...) with a unique counter, that
you increment every time it is added to a definition. The sequence then becomes:

L0: a0 = read()
L1: b0 = read()
L2: a1 = a0 + b0
L3: ret a1

Each variable has a unique name (a0, a1, b0), making it clear where values are defined and
used. There's no ambiguity or overlap.

Code with Branches

In the second example:

L0: a = read()
L1: b = read()
L2: brq a b L4
L3: b = b + 1
L4: ret b


The control flow diverges at L2 with a conditional branch (brq a b L4). Depending on the
condition:

●​ If the branch goes to L3, b is updated (b = b + 1).


●​ If the branch goes to L4 directly, b remains unchanged.

When we try to rename variables, the complication arises at L4:

●​ After L3, the updated value of b is stored in b1.


●​ If L3 is skipped, the value of b0 should be used.

At L4, it's unclear whether to use b0 (from before the branch) or b1 (from L3). We cannot
assign a unique name to b here because it depends on the program's execution path. This
ambiguity demonstrates why branches complicate variable renaming.

So, we could rename variables within straight-line code with counters. How
would you implement that?
Here's the Python implementation of functions that rename variables by adding counters to their
names:

def initialize(variables):
    """
    Initializes the Count and Stack data structures for renaming variables.

    Args:
        variables: A list of all variables in the program.

    Returns:
        Count: A dictionary mapping each variable to its count (initially 0).
        Stack: A dictionary mapping each variable to a stack containing [0].
    """
    Count = {var: 0 for var in variables}
    Stack = {var: [0] for var in variables}
    return Count, Stack

def rename_basic_block(block, Count, Stack):
    """
    Renames variables in a basic block to ensure single assignment.

    Args:
        block: A list of instructions in the basic block.
        Count: A dictionary mapping each variable to its current count.
        Stack: A dictionary mapping each variable to a stack of
            assigned indices.

    Returns:
        A renamed version of the block.
    """
    for instruction in block:
        # Replace all uses of variables in the instruction
        for x in get_used_variables(instruction):
            i = Stack[x][-1]  # index on top of x's stack
            replace_variable_use(instruction, x, f"{x}{i}")

        # Replace all definitions of variables in the instruction
        for a in get_defined_variables(instruction):
            Count[a] += 1
            i = Count[a]
            Stack[a].append(i)
            replace_variable_definition(instruction, a, f"{a}{i}")

Explanation

●​ initialize:
○​ Sets up the Count and Stack data structures. Count tracks how many times a
variable has been defined so far, and Stack keeps the history of indices for each
variable.
●​ rename_basic_block:
○​ Processes a list of instructions in a basic block. For each instruction:
■​ It retrieves the variables that are used and defined
(get_used_variables and get_defined_variables).
■​ It replaces each use and definition with a unique name based on the
current index from Stack and updates Count and Stack accordingly.

Placeholders for External Functions

●​ get_used_variables(instruction): Returns the list of variables used in the given
instruction.
●​ get_defined_variables(instruction): Returns the list of variables defined in
the given instruction.
●​ replace_variable_use(instruction, x, xi): Replaces the use of variable x
with xi in the given instruction.
●​ replace_variable_definition(instruction, a, ai): Replaces the definition
of variable a with ai in the given instruction.
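
To make this concrete, here is one possible instantiation of those placeholders, with each instruction represented as a dictionary holding "def" and "use" lists and the renaming logic inlined. This representation is invented for the sketch; also note that, under this counting convention, indices start at 1 (the figure earlier started at 0).

```python
def initialize(variables):
    Count = {var: 0 for var in variables}
    Stack = {var: [0] for var in variables}
    return Count, Stack

def rename_basic_block(block, Count, Stack):
    for inst in block:
        # Uses take the index currently on top of the variable's stack...
        inst["use"] = [f"{x}{Stack[x][-1]}" for x in inst["use"]]
        # ...and each definition gets a fresh index.
        new_defs = []
        for a in inst["def"]:
            Count[a] += 1
            Stack[a].append(Count[a])
            new_defs.append(f"{a}{Count[a]}")
        inst["def"] = new_defs

# The straight-line example from before: a = read(); b = read(); a = a + b; ret a
block = [
    {"op": "read", "def": ["a"], "use": []},
    {"op": "read", "def": ["b"], "use": []},
    {"op": "add",  "def": ["a"], "use": ["a", "b"]},
    {"op": "ret",  "def": [],    "use": ["a"]},
]
Count, Stack = initialize(["a", "b"])
rename_basic_block(block, Count, Stack)
print(block[2])  # {'op': 'add', 'def': ['a2'], 'use': ['a1', 'b1']}
print(block[3])  # {'op': 'ret', 'def': [], 'use': ['a2']}
```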


But the problem with control flow still remains. How can we deal with
programs that have branches?
When programs have branches, we encounter a problem: a variable may have multiple
definitions, depending on the execution path. As an example, consider the figure below:

For example, in the program provided:

1.​ At L1, b is defined (b0 = read()).


2.​ At L3, b is redefined (b1 = b0 + 1) if the branch condition is true.
3.​ At L4, when control flow merges, we need to determine which value of b to use:
○​ If the program followed the L3 path, we need b1.
○​ If it skipped L3, we need b0.

The Need for a Multiplexer

To handle this ambiguity, we introduce the concept of a multiplexer:

●​ A multiplexer selects one of several inputs based on a condition.


●​ In this case, we need a mechanism that:
○​ Chooses b1 if control flows through L3.
○​ Chooses b0 otherwise.

This "multiplexer" doesn’t exist as a physical construct in the program; instead, it exists in the
abstract representation of the program.

Introducing the Phi-Function

In the abstract representation (used in compilers), this multiplexer is represented as a
phi-function. A phi-function:


●​ Combines multiple possible values of a variable into a single new variable, depending on
the control flow.
●​ Is placed at points where control flow merges (e.g., at L4).

Example with Phi-Function: The phi-function for b at L4 looks like this:

b2 = phi(b1, b0)

●​ b2 is the new variable representing the value of b at L4.


●​ If control flowed through L3, b2 = b1.
●​ If control skipped L3, b2 = b0.

Semantics of Phi-Functions

The phi-function’s semantics are abstract:

●​ It represents a value chosen based on the runtime control flow path.


●​ Once the program is lowered to assembly, phi-functions are implemented using
conditional moves, jumps, or similar constructs.

Why Phi-Functions Work

1.​ They Preserve Single Assignment:


○​ Every variable still has a unique name, even when control flow merges.
2.​ They Handle Ambiguity:
○​ Phi-functions clearly express how a variable’s value depends on the program's
control flow.
3.​ They Stay Abstract:
○​ Phi-functions exist only in the intermediate representation (IR). They’re
eliminated during code generation.


What happens if we have multiple phi-functions at the beginning of some
basic block? What's the semantics of multiple phi-functions?
Phi-functions are weird animals, indeed! Perhaps the figure below will clarify what happens
when we have multiple phi-functions at the beginning of a basic block:

This semantics of phi-functions can be summarized in two steps:

1.​ Reading Variables: When control flow reaches a join point with phi-functions, we
determine the incoming path that led to this point. Each phi-function then reads the
variable from the corresponding input column based on this path. For example, if the
flow comes from the second branch in the figure, we select values like a1, b1, and
c1.

2.​ Assigning Variables: After reading the required values, the phi-functions assign these
values in parallel to their respective target variables. This ensures the semantics remains
consistent even when variables are permuted, solving the so-called "swap problem."

Implementation in Python

The provided Python code below captures this behavior using two classes: Phi and PhiBlock:


Phi Class: This models an individual phi-function, mapping a destination variable to a selection
of source variables depending on the program path. Example:​

a = Phi("a", ["a0", "a1", "a2"])

●​ Here, the variable a will be assigned the value of a0, a1, or a2, depending on the
incoming path.​


PhiBlock Class: This groups multiple phi-functions into a single entity, ensuring all
phi-functions at a join point are evaluated in parallel. For example, in the figure, a, b, and c
are updated together based on the active incoming path. Example:

aa = PhiBlock([a0, a1], [10, 31])

●​ In this case, the phi-block ensures that values from a selected column are copied
simultaneously to their target variables.

Key Points from the Figure

●​ Parallel Assignment: Phi-functions in a block are updated simultaneously, which avoids
the potential for inconsistency or variable swaps. For instance, (a, b, c) := (a1, b1, c1)
if the second branch is taken.

●​ Dynamic Path Resolution: The program's path dictates which set of source variables is
selected.

Practical Example

The example from the code illustrates how PhiBlock resolves phi-functions dynamically:

a0 = Phi("a0", ["a0", "a1"])
a1 = Phi("a1", ["a1", "a0"])
aa = PhiBlock([a0, a1], [10, 31])

Given an environment:

e.set("a0", 1)
e.set("a1", 3)

If the path ID is 10, the assignments are resolved as:

a0 = a0
a1 = a1

If the path ID is 31, the assignments swap:

a0 = a1
a1 = a0

This structured implementation ensures correctness and flexibility in handling control flow joins
in a representation that we call Static Single-Assignment form!


What is Static Single-Assignment Form?


Static Single-Assignment (SSA) form is an intermediate representation used in compilers
where every variable in the program is assigned a value exactly once, and every variable is
uniquely named. This property of single assignment makes data-flow analysis and program
optimization more straightforward and efficient.

Key Properties of SSA Form

1.​ Single Definition per Variable:


○​ In SSA form, each variable is defined (assigned a value) only once in the
program.
○​ If a variable is assigned a new value, it is given a new, unique name. This
eliminates ambiguity about which definition a use of the variable refers to.
2.​ Dominance of Definitions over Uses:
○​ In SSA form, every definition of a variable dominates all its uses.
○​ Dominance is a concept from control-flow graphs (CFGs):
■​ A node A dominates node B if every path from the start of the program to
B must pass through A.
■​ In simpler terms, a variable's definition occurs before its use, regardless
of the control flow.

Dominance in the Context of CFGs

To understand dominance, consider the control-flow graph (CFG) of a program:

●​ Nodes in the CFG represent basic blocks (sequences of instructions with no branching
except at the end).
●​ Edges represent possible paths of control flow between blocks.

A node D dominates a node U if:

●​ D appears on every path from the start of the program to U.

In SSA form:

●​ If a variable x is defined in a block D, then all uses of x occur in blocks dominated by D.


This ensures that the variable’s value is well-defined at all its use sites.

Conclusion

SSA form ensures that:

●​ Every variable has exactly one definition site, making data flow clear and unambiguous.


●​ Every definition dominates its uses, ensuring that variables are always defined before
they are used.

This representation is not only theoretically elegant but also highly practical, as it simplifies
many compiler optimizations and analyses.

When we were talking about memory allocation, we wrote a visitor to
collect global variables. The assumption that variables had different names
was necessary. Do you remember why?
Having variables with the same name can be problematic when collecting global variables
because it introduces ambiguity about which instance of the variable is being referenced,
especially when the same name is used in different scopes. For example, a variable declared
inside a function (local) might have the same name as a variable declared outside the function
(global), leading to confusion about whether the visitor should classify it as global or local. This
shadowing can result in incorrect classification, as the visitor might mistakenly treat a local
variable as global, or vice versa, without distinguishing the scope in which the variable is
declared.

Would the problem disappear if we rename the variables, so that every
definition has a unique name?
This kind of variable renaming would effectively put the program in the so-called Static
Single-Assignment Form (SSA form). Converting the program to SSA (Static Single
Assignment) form effectively resolves the problem of variable shadowing by ensuring that
each variable is assigned exactly once, which means no two variables will share the same
name within a scope.

Here's why SSA form solves the problem:

Key Features of SSA

1.​ Unique Naming: In SSA form, every variable is given a unique name at each point of
assignment, even if the same logical variable (like x) is reassigned multiple times in
different scopes or branches. Instead of overwriting a variable, SSA introduces a new
version of it.
2.​ No Shadowing: Because each assignment produces a new, uniquely named variable,
there's no risk of shadowing. In your language, even if the same variable is used in
nested scopes, each instance will have a different name in SSA form.
3.​ Global vs. Local Identification: In SSA form, since all variables have distinct names, it
becomes straightforward to differentiate between global and local variables. Global
variables will remain consistent across function definitions and scopes, while local
variables will have unique, scope-specific names.


How SSA Form Would Help


Consider this SML-like program on the left side of the figure below. In SSA form, it would be
transformed into something like the program on the right side:

Now, the outer x1 and the inner x2 have unique names, making it easy for your visitor to
classify them correctly. The outer x1 can be classified as a global variable, and x2 as local.

How SSA Form Works in Your Case

In your visitor code, after converting to SSA form:

●​ Global variables would be the ones that are introduced outside of functions (in your
case, outside of anonymous Fn expressions or Let bindings inside functions).
●​ Local variables would be the ones declared inside a function or inner scope and would
have distinct names in SSA form.

You wouldn't need to worry about shadowing or scope tracking anymore, as each variable will
already have a unique name.

Example: Applying SSA to Your Visitor

Once you've transformed your program into SSA form, you could modify your visitor to easily
classify global variables. Since SSA guarantees unique names, it can directly look at where
variables are defined without worrying about shadowing.

Conclusion

Yes, converting the program to SSA form is a very elegant solution for dealing with the
problem of variable shadowing. It simplifies the task of distinguishing global and local variables
because each variable has a unique name, eliminating the ambiguity caused by shadowing.


This makes your visitor's job much easier, allowing it to correctly classify variables without
needing to track scopes explicitly.

Is the SSA Form used in practice?


Yes, SSA-form (Static Single Assignment) is widely used in modern compilers! It’s a
foundational intermediate representation (IR) in many compiler frameworks because of its
simplicity and power for enabling optimizations. Here's why and how it is used:

Examples of Compilers That Use SSA-Form

1.​ LLVM (Low-Level Virtual Machine):


○​ One of the most popular compiler infrastructures today, LLVM uses SSA-form as
its core IR. The SSA-form in LLVM simplifies the implementation of many
advanced optimizations.
2.​ GCC (GNU Compiler Collection):
○​ GCC switched to an SSA-based representation (GIMPLE) for its middle-end
optimizations, leading to better performance and code maintainability.
3.​ JavaScript JIT Compilers:
○​ Just-In-Time (JIT) compilers for JavaScript, such as those in V8 (Chrome) and
SpiderMonkey (Firefox), often use SSA-form for optimizing dynamic language
code.
4.​ JVM (Java Virtual Machine) and Dalvik/ART (Android):
○​ The Java and Android runtimes use SSA-form during bytecode-to-machine-code
translation to optimize execution.
5.​ Other Notable Compilers:
○​ Microsoft’s .NET CLR JIT compiler.
○​ Rust’s MIR (Mid-Level IR) in the Rust compiler (rustc).
○​ Julia’s LLVM-based JIT.

So, can I see phi-functions in the LLVM Intermediate Representation?


Variables in the LLVM IR (Intermediate Representation) are always in Static Single
Assignment (SSA) form. This means every variable in LLVM IR has a single definition, and
phi-functions are used to resolve control-flow merges. However, variables mapped to memory
are not in SSA form, meaning that we can have stores updating the same variable in different
program sites! The figure below shows how SSA characterizes the LLVM IR:


1. LLVM Without Optimizations (No Phi-Functions Yet)

When the program is compiled without optimizations, variables are often mapped to memory
locations (e.g., stack slots), and phi-functions are not really necessary (memory is not in SSA
form; only virtual registers are!). So, in our figure, when the program is compiled without the
-mem2reg pass:

●​ Variables like sum, i, and j are stored in memory.


●​ Operations on these variables involve explicit loads and stores.
●​ LLVM doesn't need phi-functions yet because variables aren't in SSA—they're
manipulated in memory.

2. Applying mem2reg: Introducing SSA Form

When the mem2reg pass is applied, LLVM starts mapping variables onto virtual registers:

●​ Variables previously stored in memory are promoted to register-like values.


●​ Every definition of a variable gets a unique name, and phi-functions are introduced at
control-flow merge points to resolve ambiguity.


In the loop of the C program:

●​ i and j are updated in each iteration.


●​ At the start of each iteration, the new values of i and j depend on whether the control
flow comes from:
○​ The start of the loop (initial values of i and j).
○​ Or the back edge (updated values from the previous iteration).

LLVM inserts phi-functions at the start of the loop to select the correct value for these variables.
The SSA representation after mem2reg might look like:

The Role of mem2reg

The mem2reg pass is a key step in LLVM:

●​ Before mem2reg: Variables are in memory. LLVM does not represent memory locations
in SSA form: only virtual registers.
●​ After mem2reg: Variables are promoted to SSA form with phi-functions resolving
control-flow merges.

The transformation enables LLVM to leverage the full power of SSA for optimization while
maintaining the ability to represent code without SSA when necessary.

Conclusion

Variables in the LLVM IR are inherently SSA-based. After applying transformations like
mem2reg we have more variables represented as virtual registers. The phi-functions introduced
during this process handle control-flow merges, ensuring the correctness of data flow in SSA
form. This design is essential for LLVM's optimizations and efficient code generation.

Can you go back to constant propagation and show how SSA-Form
simplifies it?
So, let's see: constant propagation is an optimization that identifies and replaces variables with
their constant values wherever possible. The efficiency of this analysis depends on how
data-flow information is tracked throughout the program. Static Single-Assignment (SSA) form
significantly simplifies this process, transforming constant propagation from a dense analysis
into a sparse one.

Key Idea: Definitions Determine Constants

In SSA form:


●​ Every variable has a single definition site.


●​ Whether a variable is constant is determined solely by its definition, independent of its
uses or the program points where it is used.

This property allows us to map the constant status of a variable directly to its name in SSA form,
rather than to a pair of the variable and a specific program point.

Dense vs. Sparse Analysis

Dense Analysis (Non-SSA Representation):

●​ In a traditional program representation, a variable might have multiple definitions, and
each use of the variable depends on the control flow.
●​ To track whether a variable is constant, we need to store the constant status for every
variable at every program point.
●​ This results in a storage complexity of O(|V|⋅|P|), where:
○​ |V| is the number of variables.
○​ |P| is the number of program points in the control-flow graph.

Sparse Analysis (SSA Representation):

●​ In SSA form, each variable is uniquely defined, and its constant status is tied to that
definition.
●​ Instead of tracking the constant status for every variable at every program point, we only
need to track it once per variable, at its definition.
●​ This reduces the storage complexity to O(|V|), as the number of definitions is equal to
the number of unique variables.

The figure below illustrates the benefit of the SSA representation, showing the differences
between the dense and the sparse implementations of constant propagation:


How SSA Achieves Sparsity

1.​ Single Assignment:


○​ Each variable name corresponds to exactly one definition. This removes the
need to disambiguate between multiple definitions of the same variable at
different points in the program.
2.​ Phi-Functions:
○​ At control-flow merge points, phi-functions ensure that variables retain unique
definitions. The constant status of variables passed into a phi-function is
determined by their originating definitions, maintaining sparsity.
3.​ Efficient Mapping:
○​ Constant propagation in SSA form requires only a single mapping of variable
names to their constant status, without considering the program point or
control-flow context.
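The sparse scheme can be sketched as follows. The toy SSA instruction format is invented, and the lattice is simplified: real implementations (e.g., Wegman and Zadeck's SCCP) start optimistically from an "undefined" value, whereas here any operand that is still unknown is pessimistically treated as non-constant:

```python
NAC = "NAC"  # lattice value: "not a constant"

def sparse_const_prop(defs):
    """Sparse constant propagation over a toy SSA program: one fact per
    SSA name, i.e., O(|V|) storage, keyed by the variable's single
    definition rather than by (variable, program point) pairs."""
    value = {}                            # SSA name -> constant or NAC

    def eval_def(d):
        op = d[0]
        if op == "const":
            return d[1]
        args = [value.get(a, NAC) for a in d[1:]]
        if NAC in args:
            return NAC
        if op == "add":
            return args[0] + args[1]
        if op == "phi":                   # constant only if all inputs agree
            return args[0] if all(a == args[0] for a in args) else NAC

    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for name, d in defs.items():
            v = eval_def(d)
            if value.get(name) != v:
                value[name] = v
                changed = True
    return value

# a2 = phi(a0, a1) is constant because both inputs are the constant 1:
defs = {"a0": ("const", 1), "a1": ("const", 1),
        "a2": ("phi", "a0", "a1"), "b0": ("add", "a2", "a2")}
print(sparse_const_prop(defs))   # {'a0': 1, 'a1': 1, 'a2': 1, 'b0': 2}
```

Note that the `value` table is indexed by SSA names alone: the sparsity discussed above is exactly the absence of a program-point index in that table.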

Advantages of SSA for Constant Propagation

●​ Reduced Storage Requirements:
○​ In SSA, we no longer need to store O(|V|⋅|P|) data-flow facts, significantly
reducing memory usage.
○​ The attached figure illustrates this reduction by showing how constant status is
mapped directly to variable names in SSA, making the analysis sparse.
●​ Simpler Data-Flow Analysis:
○​ By removing the ambiguity of multiple definitions and program point
dependencies, SSA allows constant propagation to focus solely on the
relationships between definitions and their uses.


●​ Improved Performance:
○​ Sparse analysis reduces the computational overhead of traversing the
control-flow graph to update constant statuses for each program point.

Conclusion

SSA form transforms constant propagation from a dense, program-point-dependent analysis


into a sparse, definition-driven analysis. By tying the constant status of a variable directly to its
unique definition, SSA eliminates the need to track variables across program points, enabling
more efficient and scalable optimizations.

But we are still left with the problem of how to convert a program to SSA
Form if the program has branches. What would be the steps to do that?
Converting a program into Static Single Assignment (SSA) form involves a series of
transformations to ensure that every variable in the program has a single definition and that
every definition of a variable dominates all its uses. This transformation requires addressing
the complexities introduced by branching control flow, where a variable may have multiple
potential definitions depending on the execution path.

Steps to Convert a Program to SSA Form

1.​ Insert Phi-Functions:


○​ Purpose: Phi-functions are introduced at points in the control flow where multiple
paths merge, and a variable might have different values depending on the path
taken.
○​ Placement: Phi-functions are placed in blocks where control flow merges,
ensuring that each variable has a single definition. The phi-function acts as a
"selector" that chooses the correct value of a variable based on the incoming
control-flow edge.
○​ Key SSA Property: The domination property applies to phi-functions, but with
an important nuance:
■​ A variable used in a phi-function is not considered used in the block
containing the phi-function itself.
■​ Instead, the variable is "used" on the edge leading to the block. This
distinction is crucial because it ensures that the definition of the variable
still dominates its use in the CFG.
2.​ Rename Variables:
○​ Purpose: Each definition of a variable is given a unique name to ensure that the
SSA property is preserved throughout the program.
○​ Approach: This renaming is performed systematically, often using a stack of
counters to track the current name for each variable as the control flow is
traversed.


○​ Ensuring the Core Property: During renaming, every use of a variable is
explicitly linked to its dominating definition. For variables used in phi-functions,
the renaming ensures that the correct version of the variable is selected based
on the incoming control-flow edge.
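The renaming step can be sketched with the classic stack-per-variable traversal of the dominator tree. The toy block layout below is invented for illustration; phi arguments are filled in when we visit each predecessor, i.e., per incoming edge:

```python
from collections import defaultdict

def rename(blocks, succ, dtree_children, entry):
    """Stack-based SSA renaming over a toy IR (a sketch).
    blocks[b] = {"phis": [[var, {pred: None}]], "insts": [(lhs, rhs_vars)]}.
    The freshly created phi target name is appended to the phi record."""
    counter = defaultdict(int)
    stack = defaultdict(list)

    def fresh(v):
        counter[v] += 1
        name = f"{v}{counter[v]}"
        stack[v].append(name)
        return name

    def walk(b):
        pushed = []
        for phi in blocks[b]["phis"]:           # phi targets are definitions
            pushed.append(phi[0])
            phi.append(fresh(phi[0]))
        new_insts = []
        for lhs, rhs in blocks[b]["insts"]:
            rhs = [stack[v][-1] for v in rhs]   # a use sees the closest
            pushed.append(lhs)                  # dominating definition
            new_insts.append((fresh(lhs), rhs))
        blocks[b]["insts"] = new_insts
        for s in succ.get(b, []):               # fill phi args on edge b -> s
            for phi in blocks[s]["phis"]:
                phi[1][b] = stack[phi[0]][-1]
        for c in dtree_children.get(b, []):     # recurse over dominator tree
            walk(c)
        for v in pushed:                        # undo on the way out
            stack[v].pop()

    walk(entry)

blocks = {
    "entry": {"phis": [], "insts": [("x", [])]},
    "then":  {"phis": [], "insts": [("x", [])]},
    "else":  {"phis": [], "insts": [("x", [])]},
    "merge": {"phis": [["x", {"then": None, "else": None}]],
              "insts": [("y", ["x"])]},
}
succ = {"entry": ["then", "else"], "then": ["merge"], "else": ["merge"]}
children = {"entry": ["then", "else", "merge"]}
rename(blocks, succ, children, "entry")
# merge now reads: x4 = phi(then: x2, else: x3); y1 = x4
print(blocks["merge"])
```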

Essential Properties of the Process

1.​ Dominance of Definitions:​

○​ Every definition of a variable dominates all its uses, including those in
phi-functions, where the dominance relationship is based on control-flow edges.
2.​ Unique Variable Names:​

○​ After renaming, each variable in the program has a single definition site and a
unique name.
3.​ Preservation of Semantics:​

○​ The transformation preserves the program's original semantics. Phi-functions are
abstract constructs that ensure correctness during analysis but are replaced by
real instructions during code generation.

Conclusion

The process of converting a program to SSA form is grounded in ensuring the dominance of
variable definitions over their uses. This requires careful handling of phi-functions, which use
variables based on control-flow edges rather than within blocks. These transformations enable
powerful compiler optimizations by creating a representation where data flow is explicit and
unambiguous.

This core property: definition sites dominate its uses, seems to break with
the edges of phi-functions, doesn't it?
Yes, that's a bit weird. The parameters of a phi-function are treated differently from standard
uses of variables:

●​ A variable used in a phi-function is associated with a specific incoming edge to the block,
not the block as a whole.
●​ This edge-based usage ensures that the definition of the variable dominates its "use" in
the control-flow graph, preserving the SSA property.

The figure below illustrates the special case of uses of variables within phi-functions:


Ok, but where should phi-functions exist in the SSA-Form program?


Phi-functions are placed in the control-flow graph (CFG) at nodes that might be reached by
multiple definitions of a variable. Specifically, a phi-function for a variable b should exist at a
node z if all the following conditions are met:

1.​ Multiple Definitions:


○​ There is a block x that contains a definition of b.
○​ There is a different block y (y ≠ x) that also contains a definition of b.
2.​ Reaching Paths:
○​ There is at least one nonempty path Pxz of edges from block x to block z.
○​ There is at least one nonempty path Pyz of edges from block y to block z.
3.​ Path Independence:
○​ The paths Pxz and Pyz do not share any nodes other than z itself.
4.​ Avoid Premature Convergence:
○​ The node z does not appear earlier within both Pxz and Pyz, though it may
appear within one of them individually.


These conditions ensure that a phi-function is placed at z only when z is the first point in the
CFG where control flow from x and y converges and both definitions of b are still live. This
placement guarantees the SSA property by resolving ambiguity about which definition of b to
use at z.

Hum... if we were to use these criteria to insert phi-functions, I am afraid the
complexity would be kinda high. Do you agree?
Let’s analyze the complexity of determining where phi-functions should be inserted based on the
criteria we discussed earlier. While the process isn’t as complex as examining all possible paths
explicitly, it still involves a significant amount of computation, particularly for large programs.

Key Insight: Using Depth-First Search (DFS) for Reachability

1.​ For Each Variable:
○​ There are |V| variables in the program.
○​ For each variable, we need to determine all the blocks it might reach in the
control-flow graph (CFG).
2.​ For Each Block Pair:
○​ A definition of a variable in block x might reach block z, forming a path Pxz.
○​ We can use a depth-first search (DFS) to determine all the blocks reachable
from x, ensuring we find Pxz.
3.​ Complexity per DFS:
○​ In the worst case, for a dense graph with |V| blocks and |V|⋅|V| edges, a single
DFS takes O(|V|²) time.
4.​ Total Complexity:
○​ Since we perform this DFS for each of the |V| variables, the total complexity is:
O(|V| ⋅ |V|²) = O(|V|³)

Why This Complexity is High

●​ The cubic complexity O(|V|³) becomes impractical for large programs with many blocks
and variables.
●​ While a single DFS is manageable, performing |V| DFS traversals quickly adds up in
terms of both time and memory usage.


Implications

Even though this method avoids explicitly enumerating all paths Pxz and Pyz, the need to
perform multiple graph traversals for each variable still makes it computationally expensive. This
inefficiency highlights the need for a more targeted approach to determine where phi-functions
should be placed.

Is there a better approach to place phi-functions then?


Yes, there is! Actually, there are many ways to do it! The original algorithm for placing
phi-functions efficiently is based on the notion of dominance frontiers. This algorithm was
introduced by Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F.
Kenneth Zadeck in their seminal 1991 paper, Efficiently Computing Static Single Assignment
Form and the Control Dependence Graph. This algorithm revolutionized how SSA form is
constructed and is still a cornerstone of modern compiler design.

Key Intuition Behind the Algorithm

The core insight of the algorithm is to leverage dominance relationships in the control-flow
graph (CFG) to efficiently determine where phi-functions are required, without exploring all
possible paths explicitly. The intuition is based on the concept of the dominance frontier.

What is a Dominance Frontier?

The dominance frontier (DF) of a node x in a CFG is the set of nodes z such that:

1.​ x dominates a predecessor of z (i.e., there’s at least one control-flow edge entering z
that comes from a block dominated by x).
2.​ x does not strictly dominate z itself.

In other words:

●​ The dominance frontier identifies where control flow merges, and the dominance of a
definition no longer extends unambiguously to all subsequent blocks.
●​ These are precisely the locations where multiple definitions of a variable could reach and
where phi-functions are needed to resolve ambiguity.
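This definition can be checked directly, if inefficiently. The sketch below assumes the dominator sets have already been computed; the function name and data layout are invented for this example:

```python
def dominance_frontier(x, nodes, preds, dom):
    """Naive dominance frontier, straight from the definition.
    dom[n] is the set of nodes that dominate n (precomputed)."""
    df = set()
    for z in nodes:
        dominates_a_pred = any(x in dom[p] for p in preds.get(z, []))
        strictly_dominates_z = x in dom[z] and x != z
        if dominates_a_pred and not strictly_dominates_z:
            df.add(z)
    return df

# Diamond CFG: a -> b, a -> c, b -> d, c -> d
nodes = ["a", "b", "c", "d"]
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
dom = {"a": {"a"}, "b": {"a", "b"}, "c": {"a", "c"}, "d": {"a", "d"}}
print(dominance_frontier("b", nodes, preds, dom))   # {'d'}
```

Here d is in the dominance frontier of b: b dominates d's predecessor b itself, but does not strictly dominate d, since control can also reach d through c.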

Key Intuition: Phi-Functions and the Dominance Frontier

●​ If a variable b is defined in block x, and control flow can reach block z through multiple
paths (e.g., one path where b was defined in x, and another where it was defined
elsewhere), then z lies in the dominance frontier of x.
●​ By computing the dominance frontiers of all blocks, the algorithm efficiently identifies all
points where phi-functions are necessary for every variable.


Iterated Dominance Frontiers

The algorithm uses an extension of dominance frontiers, known as iterated dominance
frontiers, to handle cases where multiple definitions propagate through the CFG:

1.​ Start with blocks where a variable is directly defined.


2.​ Repeatedly compute the dominance frontier of these blocks to capture all the locations
where phi-functions are required, iterating until no new phi-function placements are
discovered.

This iterative process ensures that all necessary phi-functions are placed, even in complex
CFGs with nested and interconnected control flow.
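As a sketch, the iterated computation can be phrased as a worklist algorithm. The `df` map (one dominance frontier per block) is assumed to be precomputed; the data layout is invented for this example:

```python
def insert_phis(defsites_per_var, df):
    """Worklist formulation of phi-placement via iterated dominance
    frontiers (in the style of Cytron et al.)."""
    phis = {v: set() for v in defsites_per_var}
    for v, defsites in defsites_per_var.items():
        work = list(defsites)
        while work:
            n = work.pop()
            for z in df.get(n, set()):
                if z not in phis[v]:
                    phis[v].add(z)          # block z needs a phi for v
                    if z not in defsites:   # the phi is itself a new
                        work.append(z)      # definition: iterate!
    return phis

# Diamond CFG where v is defined in both branches, b and c:
df = {"a": set(), "b": {"d"}, "c": {"d"}, "d": set()}
print(insert_phis({"v": {"b", "c"}}, df))   # {'v': {'d'}}
```

The re-insertion of z into the worklist is what makes the frontier "iterated": a freshly placed phi is a new definition, whose own dominance frontier may demand further phi-functions.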

Why is This Approach Efficient?

1.​ Avoids Explicit Path Exploration:


○​ Instead of exploring every path in the CFG, the algorithm relies on dominance
frontiers, which are computed locally for each block.
2.​ Scales with CFG Complexity:
○​ The approach works well even for large and dense CFGs, significantly reducing
the computational cost compared to naive methods.

Conclusion

The algorithm introduced by Cytron et al. is grounded in the observation that the dominance
frontier of a block concisely captures where ambiguity arises in variable definitions. By iterating
over these frontiers, the algorithm efficiently determines the placement of phi-functions, paving
the way for SSA form construction. This elegant use of dominance relationships has made the
algorithm a foundational tool in modern compiler design.

What's dominance? How do we compute it?


In a rooted, directed graph, dominance is a fundamental concept used in compiler optimization
to analyze control flow. A node d in the graph dominates another node n if every path from the
root node to n passes through d.

●​ The root node always dominates itself, as there is no path to any other node that
bypasses it.
●​ Intuitively, if a node d dominates a node n, then d must be encountered in the control
flow before n on any possible execution path.

Example


Computing Dominators Using Equations

To determine the set of dominators for each node in the CFG, we can use the following iterative
algorithm based on equations:

Equations to Compute Dominators

1.​ Base Case (Root Node):


○​ The root node (s0) dominates itself: D[s0] = {s0}
2.​ General Case (Other Nodes):
○​ For any node n ≠ s0, the dominators of n are:
■​ The node n itself, plus:
■​ The intersection of the dominator sets of all its predecessors:
D[n] = {n} ∪ (∩ D[p], for p ∈ pred[n])

Iterative Computation

●​ Initialization:
○​ For the root node s0: D[s0] = {s0}.
○​ For all other nodes n: D[n] is initialized to all nodes in the CFG.​
D[n] = {all nodes in the CFG} for n ≠ s0
●​ Iteration:
○​ For each node n, repeatedly update D[n] using the equation: D[n] = {n} ∪
(intersection of D[p] for each p in pred[n])
○​ The intersection operator ensures that the dominator set for n becomes
progressively smaller with each iteration.
●​ Termination:
○​ The algorithm stops when no dominator set changes during an iteration.
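These equations translate almost directly into Python. The sketch below is a naive fixed-point solver; production compilers use faster algorithms (e.g., Lengauer-Tarjan, or Cooper-Harvey-Kennedy's iterative scheme over immediate dominators):

```python
def compute_dominators(nodes, preds, root):
    """Naive fixed-point solver for the dominator equations:
    D[root] = {root};  D[n] = {n} ∪ (∩ D[p] for p in pred[n])."""
    dom = {n: set(nodes) for n in nodes}   # start with the largest sets
    dom[root] = {root}
    changed = True
    while changed:                         # iterate until stable
        changed = False
        for n in nodes:
            if n == root:
                continue
            ps = preds.get(n, [])
            new = set.intersection(*(dom[p] for p in ps)) if ps else set()
            new |= {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Diamond CFG: a -> b, a -> c, b -> d, c -> d
doms = compute_dominators(["a", "b", "c", "d"],
                          {"b": ["a"], "c": ["a"], "d": ["b", "c"]}, "a")
print(doms["d"])   # d is dominated by a and by itself
```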

Why This Algorithm Works

1.​ Convergence:
○​ Each update to D[n] only makes the set smaller, eventually stabilizing when no
further updates occur.
2.​ Correctness:


○​ By starting with the largest possible sets and refining them iteratively, the
algorithm ensures that all nodes satisfying the dominance property are included
in D[n] while eliminating any nodes that do not dominate n.

Example

The figure below shows, for each node of a rooted CFG, its set of dominators. Thus, node a,
being the root, is dominated only by itself. Node b, in turn, is dominated by nodes a and b itself,
and so on and so forth:

Key Insights

●​ Dominance as a Global Property:


○​ Dominance relationships depend on the entire structure of the CFG, not just local
connections.
●​ Iterative Refinement:
○​ The use of the intersection operator ensures that the dominator set for a node n
reflects the paths to n through its predecessors.

Practical Importance


Dominators are critical in many compiler optimizations:

●​ They are used to build dominator trees, which represent the hierarchical dominance
relationships in a CFG.
●​ They help identify natural loops, critical for loop optimizations.
●​ They are the basis for computing dominance frontiers, which are key to placing
phi-functions in SSA form.

Now that we have the dominance relations, how do we compute the
dominance frontier of a CFG node?
The dominance frontier of a node x is computed using a combination of local dominance
information and information propagated from its children in the dominator tree.

Definition of Dominance Frontier

1.​ Strict Dominance:


○​ A node x strictly dominates a node w if x dominates w and x != w.
2.​ Dominance Frontier:
○​ The dominance frontier of a node x is the set of all nodes w such that:
■​ x dominates a predecessor of w, and
■​ x does not strictly dominate w.

Recursive Equation for Dominance Frontier:

DF[x] = DF_local[x] ∪ (⋃ DF_up[c], for c in children[x])

Components of the Dominance Frontier

1.​ Local Dominance Frontier (DF_local[x]):


○​ DF_local[x] = {w for w in successors[x] if not
strictly_dominates(x, w)}
○​ This set includes all successors of x that are not strictly dominated by x.
2.​ Upward Propagation (DF_up[c]):
○​ DF_up[c] = {w for w in DF[c] if not strictly_dominates(x, w)}
○​ This set includes all nodes in the dominance frontier of a child c that are not
strictly dominated by x.
3.​ Children in the Dominator Tree (children[x]):
○​ These are the nodes directly dominated by x in the dominator tree.

Recursive Computation

The dominance frontier is computed recursively:


DF[x] = DF_local[x].union(
    w for c in children[x] for w in DF_up[c]
)

Example Walkthrough

For a node e with children f and g in the dominator tree, see its dominance frontier in the figure
below:

If a variable is defined at a node e, then we need to insert phi-functions in
the dominance frontier of e. Why?

Why Phi-Functions Are Inserted on the Dominance Frontier

The dominance frontier of a node in the control-flow graph (CFG) identifies the points where
control flow merges and ambiguity about variable definitions can arise. Let's use the figure below
to understand why the dominance frontier is the place to insert phi-functions, assuming a
well-formed program where every variable is defined before it is used:


Example: Variable v Defined at Node e

1.​ Dominance of Node e:


○​ The figure shows that node e contains a definition of the variable v.
○​ Node e dominates nodes f, g, and h. This means that all paths to these nodes
pass through e, and the definition of v at e is the only one that can reach any use
of v in these blocks.
2.​ No Phi-Functions in Dominated Nodes:
○​ Since e dominates f, g, and h, there’s no ambiguity about which definition of v is
used in these nodes.
○​ Therefore, it is unnecessary to insert phi-functions in f, g, or h.

Ambiguity at Node k

1.​ Non-Dominated Block:


○​ Node k is not dominated by e. This means there exists a path to k that does not
pass through e.
2.​ Multiple Paths to k:
○​ The block k can be reached via two distinct paths:
■​ One path goes through e, carrying the definition of v from e.
■​ Another path bypasses e, potentially carrying a different value of v (e.g.,
defined in nodes i or j).
3.​ Need for a Phi-Function:
○​ Since the use of v in k could come from multiple definitions, we need to resolve
this ambiguity.
○​ A phi-function is inserted at k to "join" the potential definitions of v from:
■​ v:e (defined at e),


■​ v:i (defined at i), and


■​ v:j (defined at j).

The phi-function looks like this: v = phi(v:e, v:i, v:j)

Intuition for the Dominance Frontier

●​ Node k lies in the dominance frontier of node e because:


1.​ e dominates a predecessor of k.
2.​ e does not strictly dominate k itself.
●​ The dominance frontier captures precisely where definitions of variables, such as v from
e, must be joined with other potential definitions.

Summary

Phi-functions are inserted on the dominance frontier of a node because these are the first points
where control flow from different paths merges, and ambiguity about variable definitions can
arise. In the case of node e:

●​ Nodes f, g, and h require no phi-functions because e dominates them.


●​ Node k requires a phi-function to resolve which definition of v applies, as k lies in the
dominance frontier of e.

This placement ensures that every use of v in the program can be unambiguously resolved to
the correct definition.

Could you provide some Python code to compute the dominance frontier of
a node?
Here's the Python code, assuming a class-based structure for handling the dominance tree and
related computations:

class DominanceTree:
    def __init__(self, cfg, idom_map, dom_tree):
        """
        Initialize the dominance tree representation.

        Args:
            cfg: The control-flow graph.
            idom_map: A dictionary mapping each node to its immediate
                dominator.
            dom_tree: A dictionary mapping each node to its children in the
                dominator tree.
        """
        self.cfg = cfg
        self.idom_map = idom_map
        self.dom_tree = dom_tree
        self.dominance_frontier = {node: set() for node in cfg.nodes}

    def idom(self, node):
        """
        Returns the immediate dominator of the given node, or None if the
        node has none (i.e., it is the entry node).
        """
        return self.idom_map.get(node, None)

    def dominates(self, dom_node, target_node):
        """
        Checks if `dom_node` dominates `target_node`.

        Returns:
            True if `dom_node` dominates `target_node`, False otherwise.
        """
        # Walk up the dominator tree from target_node: dom_node dominates
        # target_node if and only if it appears on this chain. Note that
        # every node dominates itself.
        current = target_node
        while current is not None:
            if current == dom_node:
                return True
            parent = self.idom_map.get(current)
            if parent == current:  # reached the entry node
                return False
            current = parent
        return False

    def compute_dominance_frontier(self, node):
        """
        Computes the dominance frontier for the given node (and,
        recursively, for all of its children in the dominator tree).
        """
        S = set()

        # Local contribution: CFG successors that this node does not
        # immediately dominate.
        for succ in self.cfg.successors(node):
            if self.idom(succ) != node:
                S.add(succ)

        # Propagate from children in the dominator tree: a node in a
        # child's frontier that this node does not strictly dominate is
        # also in this node's frontier.
        for child in self.dom_tree.get(node, []):
            self.compute_dominance_frontier(child)
            for w in self.dominance_frontier[child]:
                if not self.dominates(node, w) or node == w:
                    S.add(w)

        self.dominance_frontier[node] = S

Explanation

1.​ Method Signatures:


○​ idom(node): Returns the immediate dominator of a node.
○​ dominates(dom_node, target_node): Checks if dom_node dominates
target_node.
○​ compute_dominance_frontier(node): Recursively computes the
dominance frontier for node using the provided pseudo-code.
2.​ Data Structures:
○​ self.cfg: Represents the control-flow graph, assumed to have methods like
successors(node).
○​ self.idom_map: Maps each node to its immediate dominator.
○​ self.dom_tree: Represents the dominator tree, mapping each node to its
children.
○​ self.dominance_frontier: A dictionary mapping nodes to their dominance
frontiers.
3.​ Recursive Traversal:
○​ The function computes DF[node] for a node by:
■​ Adding successors that are not strictly dominated by node.
■​ Recursively processing children in the dominator tree and propagating
their dominance frontier contributions.
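As a cross-check of the class above, the same information can be computed by a self-contained function that walks up the dominator tree from the predecessors of each join node — the well-known algorithm by Cooper, Harvey, and Kennedy. The CFG below (a simple diamond) and its node names are purely illustrative:

```python
def compute_df(succs, idom):
    """Compute the dominance frontier of every node.

    succs: dict mapping each node to the list of its CFG successors.
    idom:  dict mapping each node to its immediate dominator
           (the entry node maps to None).
    """
    df = {n: set() for n in succs}
    for n in succs:
        preds = [p for p in succs if n in succs[p]]
        if len(preds) >= 2:  # only join nodes belong to a frontier
            for p in preds:
                # Every node from p up to (but excluding) idom(n)
                # has n in its dominance frontier.
                runner = p
                while runner is not None and runner != idom[n]:
                    df[runner].add(n)
                    runner = idom[runner]
    return df

# A diamond CFG: a branches to b and c, which both join at d.
succs = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
idom = {'a': None, 'b': 'a', 'c': 'a', 'd': 'a'}
print(compute_df(succs, idom))
```

On this CFG, the join node d ends up in the frontier of b and of c, but not of a, since a dominates d.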

But if we insert a phi-function, we are actually creating a new definition of a variable. Wouldn't that lead to a "chain-reaction"?
You are right! When constructing SSA form, the placement of phi-functions is guided by the
dominance frontier. However, placing phi-functions can create new definitions of variables.
These new definitions can themselves influence where additional phi-functions are needed,
leading to a "chain reaction" of updates. To ensure all necessary phi-functions are placed, we
must compute the iterated dominance frontier.

The Problem: New Definitions Trigger New Requirements

1.​ Inserting a Phi-Function Creates a New Definition:


○​ When a phi-function for a variable v is inserted at a node, it introduces a new
definition of v.


○​ This new definition might need to be "joined" with other definitions downstream in
the control-flow graph (CFG).
2.​ Chain Reaction of Phi-Function Placement:
○​ The new definition from the phi-function could make other nodes meet the criteria
for needing a phi-function.
○​ This process continues until no further phi-functions are required, forming a
"chain reaction."
3.​ Example:​
Suppose a variable v is defined at node e, and its dominance frontier includes node k. A
phi-function is placed at k: v = phi(v:e, v:j, v:i)
○​ This phi-function introduces a new definition v:k.
○​ If another node (e.g., l) lies in the dominance frontier of k, and there is a use of
v in l, then we must insert another phi-function at l to resolve the ambiguity.

Iterated Dominance Frontier

To handle this chain reaction systematically, we compute the iterated dominance frontier. This
concept ensures that all necessary phi-functions are placed by iterating the dominance frontier
computation until no new definitions require further phi-functions.

Definition: The iterated dominance frontier (IDF) for a variable v is the transitive
closure of the dominance frontier:

1.​ Start with the blocks where v is originally defined.


2.​ Compute the dominance frontier for these blocks.
3.​ Add any new blocks to the set of definitions and recompute their dominance frontiers.
4.​ Repeat until no new blocks are added.
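These four steps describe a transitive closure, which can be sketched as a small worklist loop. Here df is assumed to be a precomputed dominance-frontier map, and the node names are illustrative:

```python
def iterated_df(df, def_blocks):
    """Iterated dominance frontier: transitive closure of DF starting
    from the blocks where the variable is originally defined."""
    idf = set()
    worklist = list(def_blocks)
    while worklist:
        n = worklist.pop()
        for y in df[n]:
            if y not in idf:
                idf.add(y)           # y needs a phi-function...
                worklist.append(y)   # ...and that phi is a new definition
    return idf

# The phi placed at 'd' is itself a new definition, so 'e' (which is in
# the dominance frontier of 'd') also joins the iterated frontier.
df = {'a': set(), 'b': {'d'}, 'c': set(), 'd': {'e'}, 'e': set()}
print(iterated_df(df, {'b'}))  # the set {'d', 'e'}
```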

Why the IDF is Necessary

1.​ Ensures Correctness:


○​ The IDF ensures that all control-flow joins where ambiguity arises are handled,
even when new definitions propagate the need for phi-functions further
downstream.
2.​ Efficient Placement:
○​ Instead of recalculating phi-function placement globally, the IDF focuses the
computation on affected nodes, avoiding redundant work.
3.​ Handles Complex CFGs:
○​ In CFGs with nested or interconnected control flow, the IDF guarantees that the
SSA property (every definition dominates all uses) is maintained across the entire
program.

Example that shows the iterated dominance frontier


The figure below shows the iterated dominance frontier of four different nodes:

Conclusion

The iterated dominance frontier is a critical concept for constructing SSA form. By iteratively
placing phi-functions and recomputing dominance frontiers, it handles the cascading effect of
new definitions, ensuring that every use of a variable is unambiguously tied to a unique
dominating definition. This systematic approach is what makes SSA construction efficient and
robust for complex programs.

So, if a variable x is defined at a node f, we will need to insert phi-functions in the iterated dominance frontier of f?
Yes, exactly! The figure below illustrates the concept of the iterated dominance frontier. The
figure emphasizes the cascading effect of new definitions introduced by phi-functions,
demonstrating why the iterated dominance frontier is a systematic solution to maintaining SSA
form in all control-flow scenarios:


When a variable x is defined at a specific node (in this case, node f), phi-functions must be
inserted at nodes within the iterated dominance frontier (IDF) of f. This is necessary to ensure
that every use of x is properly dominated by its unique definition, as required by the SSA form.

How the Figure Represents the IDF

1.​ Node f:
○​ The initial definition of x is placed here.
2.​ First Pass (Dominance Frontier):
○​ The dominance frontier of f is calculated. Nodes d and h are identified, as they
are control-flow merge points where definitions from f reach multiple paths.
3.​ Second Pass (Iterated Dominance Frontier):
○​ Adding the phi-functions at d and h creates new definitions of x, which propagate
further.
○​ The dominance frontier of these newly added definitions expands the scope to
nodes like e, l, and k, requiring additional phi-functions.
4.​ Iterative Process:
○​ This process continues until no new nodes meet the criteria for needing
phi-functions. In the figure, nodes d, h, e, l, and k collectively form the iterated
dominance frontier of f.


Why This Process Is Necessary

●​ Guarantees SSA Correctness: Without the IDF, some uses of x would not be
dominated by their definitions, breaking the core SSA Property.
●​ Handles Complex Control Flow: CFGs with loops and branches (like in this example)
require iterative placement of phi-functions to ensure all potential paths are covered.
●​ Minimizes Redundancy: The IDF places phi-functions only where absolutely necessary,
avoiding unnecessary overhead.

Could you write some Python code to insert these definitions?


Below is high-level Python pseudo-code that does just that. It assumes the existence of helper functions and data structures like dominance_frontier, predecessors, and methods to manage definitions and insert phi-functions, and it favors clarity and modularity.

class CFG:
    def __init__(self):
        # A dictionary mapping each node to its set of original variable
        # definitions
        self.Aorig = {}
        # A dictionary mapping each node to its dominance frontier
        self.DF = {}
        # A dictionary to store inserted phi-functions for each node
        self.Aphi = {}
        # A dictionary mapping variables to their definition sites
        self.defsites = {}
        # A graph representation where each node has a list of predecessors
        self.predecessors = {}

    def place_phi_functions(self):
        # Step 1: Initialize defsites
        for node, variables in self.Aorig.items():
            for var in variables:
                if var not in self.defsites:
                    self.defsites[var] = set()
                self.defsites[var].add(node)

        # Step 2: Process each variable to insert phi-functions
        for var in self.defsites:
            W = list(self.defsites[var])  # Worklist of definition sites
            while W:
                n = W.pop()  # Remove a node from the worklist
                # Iterate over the dominance frontier of n
                for y in self.DF[n]:
                    if var not in self.Aphi.get(y, set()):
                        self.insert_phi(y, var)
                        self.Aphi.setdefault(y, set()).add(var)
                        # Add y to the worklist if var is not originally
                        # defined there
                        if var not in self.Aorig.get(y, set()):
                            W.append(y)

    def insert_phi(self, block, var):
        # Mock implementation for inserting a phi-function
        # In a real implementation, this would modify the IR or CFG
        num_predecessors = len(self.predecessors.get(block, []))
        phi_function = f"{var} = φ({', '.join([var] * num_predecessors)})"
        print(f"Inserting phi-function at block {block}: {phi_function}")

Explanation:

1.​ Initialization:
○​ Aorig: Stores the original variables defined at each node.
○​ defsites: Maps each variable to the nodes where it is defined.
○​ DF: Precomputed dominance frontier for each node.
○​ Aphi: Keeps track of nodes where phi-functions are inserted.
2.​ Worklist Algorithm:
○​ For each variable, a worklist W is initialized with its definition sites.
○​ The dominance frontier of each node in the worklist is examined, and
phi-functions are inserted as needed.
3.​ Insert Phi-Functions:
○​ The insert_phi method simulates the insertion of phi-functions by generating
a string with the appropriate number of arguments based on the node's
predecessors.
4.​ Mock CFG:
○​ The CFG class stores the structure needed by the algorithm: nodes, dominance frontiers, and predecessor relationships, all held in plain dictionaries.
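For quick experimentation, the same worklist logic can be condensed into a standalone function that simply returns, for every block, the set of variables that need a phi-function there. The block and variable names below are illustrative:

```python
def place_phis(aorig, df):
    """Worklist phi-placement.

    aorig: dict mapping each block to the set of variables it defines.
    df:    dict mapping each block to its dominance frontier.
    Returns a dict mapping each block to the variables needing phis there.
    """
    aphi = {n: set() for n in df}
    for var in set().union(*aorig.values()):
        worklist = [n for n, vs in aorig.items() if var in vs]
        while worklist:
            n = worklist.pop()
            for y in df[n]:
                if var not in aphi[y]:
                    aphi[y].add(var)
                    # The phi is a new definition of var, so keep
                    # iterating unless y already defined var originally.
                    if var not in aorig.get(y, set()):
                        worklist.append(y)
    return aphi

# v is defined in both arms of a diamond, so a phi is needed at the join d.
aorig = {'b': {'v'}, 'c': {'v'}}
df = {'a': set(), 'b': {'d'}, 'c': {'d'}, 'd': set()}
print(place_phis(aorig, df))
```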

How do you know that this algorithm terminates?


Great question! The termination of this algorithm can be analyzed by considering the following
key points:

1. Bounded Number of Nodes in the CFG

The algorithm iterates over nodes in the control flow graph (CFG). Since the CFG has a finite number of nodes, the possible iterations are inherently bounded. Each node can only be added to the worklist W a limited number of times.

2. Phi-Functions Are Inserted Only Once per Variable per Node


For each variable a, the algorithm ensures that a phi-function for a is inserted at a node y only
once. This is guaranteed by the condition: if var not in self.Aphi.get(y, set()):


●​ Once a phi-function is inserted at y, the variable a is added to Aphi[y], ensuring that


the same (y,a) pair is not revisited in future iterations.

3. Worklist Is Finite

●​ The worklist W is initialized with the definition sites of a variable a, which is finite because
there are a finite number of nodes in the CFG and each node can define only a finite
number of variables.
●​ Nodes are only added back to the worklist if:
○​ A phi-function is inserted at a node y, and
○​ The variable a is not already defined in Aorig[y].
●​ Thus, the additions to W are limited by the size of the CFG and the number of variables.

4. Each Node’s Dominance Frontier Is Finite

●​ For a given node n, the dominance frontier DF[n] contains a finite number of nodes because it is derived from the CFG, which has a finite size. Therefore, the inner loop for y in DF[n] iterates over a finite set of nodes.

5. Progress Is Made in Each Iteration

●​ In each iteration, at least one node n is removed from the worklist W. This ensures that the algorithm progresses and does not get stuck in an infinite loop.

High-Level Reasoning

●​ The algorithm operates on a finite CFG.


●​ It ensures that phi-functions are inserted at most once per variable per node.
●​ The worklist size is bounded, and each iteration removes a node from the worklist,
making progress toward termination.

Example to Reinforce Understanding

Imagine a CFG with 10 nodes, where each node can define up to 3 variables:

●​ There are at most 10×3=30 definition sites.


●​ For each definition site, the dominance frontier contains at most 10 nodes (a worst-case
upper bound).
●​ Each variable can trigger at most 30×10=300 phi-function checks across all nodes.
●​ Since phi-functions are inserted only once per variable per node, the algorithm
terminates after a finite number of steps.

By this reasoning, the algorithm is guaranteed to terminate.


Recall that we have a function to rename variables within basic blocks: rename_basic_block. How can we rename variables in the entire program, perhaps reusing that function?
Here's a Python implementation of global variable renaming, using the previously defined
rename_basic_block function and maintaining consistency with its structure:

def rename(node, CFG, Count, Stack):
    """
    Renames variables in the program starting from a given node to ensure
    single assignment.

    Args:
        node: The current node in the control flow graph (CFG) being
            processed.
        CFG: The control flow graph represented as a dictionary where keys
            are nodes and values are lists of successors.
        Count: A dictionary mapping each variable to its current count.
        Stack: A dictionary mapping each variable to a stack of assigned
            indices.
    """
    # Rename variables in the current basic block
    rename_basic_block(node['instructions'], Count, Stack)

    # Rewrite the phi-function operands in each CFG successor: the operand
    # that corresponds to this predecessor gets the current (topmost) name.
    for successor in node['successors']:
        for phi in successor['phi_functions']:
            variable = phi['variable']  # the variable the phi defines
            predecessor_index = successor['predecessors'].index(node)
            i = top(Stack[variable])
            phi['operands'][predecessor_index] = f"{variable}{i}"

    # Recur for children in the dominator tree
    for child in node['children']:
        rename(child, CFG, Count, Stack)

    # Pop variable stacks for all variables defined in the current block
    for instruction in node['instructions']:
        for variable in get_defined_variables(instruction):
            Stack[variable].pop()

# Helper function for stack operations
def top(stack):
    """
    Returns the top element of a stack.

    Args:
        stack: A list representing the stack.

    Returns:
        The top element of the stack.
    """
    return stack[-1]

Key Points:

1.​ CFG Representation: The control flow graph (CFG) is represented as a dictionary where
each node contains its instructions, successors, predecessors, and phi-functions.
2.​ Phi-Function Update: For each successor, the phi_functions field is updated with
the correct renamed variables.
3.​ Recursive Traversal: The function traverses the dominator tree recursively, processing each node and its children, as in Appel's renaming algorithm.
4.​ Stack Cleanup: After processing a node, the stacks of variables defined in that node are
popped to maintain the correct state.

Walking over an example:


The figure below, taken from Appel's book, Modern Compiler Implementation in Java, illustrates
how variables are renamed after the insertion of phi-functions:


I would like to take a look into some real-world implementation of SSA construction. Could you suggest something to me?
Yes, let's take a look into LLVM's PromoteMemoryToRegister.cpp. This code is
responsible for promoting memory references to register references, which includes inserting
phi-functions to convert code into Static Single Assignment (SSA) form. This process involves
identifying variables stored in memory (e.g., those allocated with alloca) and replacing their
memory operations with register operations.

An important part of this transformation is inserting phi-functions at the appropriate points in the
control flow graph (CFG). The algorithm uses the concept of the dominance frontier to
determine where these phi-functions should be placed. By iterating over the dominance frontiers
with a worklist, the algorithm ensures that phi-functions are inserted in all necessary locations to
accurately represent the variable's flow in SSA form. You can find the interface of many of the
key data structures in the LLVM code base, e.g:

●​ Iterated Dominance Frontier


●​ Dominators

This approach aligns with the standard method of placing phi-functions in SSA construction,
ensuring that variables have a single assignment point while correctly modeling the program's
control flow. However, the code has changed over time. It's much easier to recognize the
algorithm we just described in the old implementation of PromoteMemoryToRegister.cpp.

The original implementation of SSA Construction in LLVM talks about "pruned SSA Form". What's that?
The concepts of Minimal SSA Form and Pruned SSA Form were first introduced by Cytron et
al. in their seminal 1991 paper "Efficiently Computing Static Single Assignment Form and the
Control Dependence Graph."

Minimal SSA Form

Minimal SSA form refers to the placement of phi-functions such that the program satisfies the
SSA property—each variable has a single, unique definition—even if some of the phi-functions
inserted are unnecessary. The algorithm ensures that phi-functions are inserted at every point
required by the dominance frontier of a definition. This placement is considered "minimal"
according to the definition of SSA because it guarantees the fewest possible phi-functions that
uphold the correctness of SSA form. However, this approach does not consider whether the
variable is actually live at the program point where the phi-function is inserted.

As a result, minimal SSA form can include phi-functions for variables that are not live (i.e.,
no longer used after their definition). These dead variables may create unnecessary
overhead during optimization and code generation. For example, in the control flow graph below, a phi-function for variable i would exist at L1 simply because it is in the dominance frontier of L2, and there exists a definition of i there, even if i is not used later.

Pruned SSA Form

Pruned SSA form improves upon minimal SSA form by adding a liveness check during the
phi-function placement phase. In this version, phi-functions are only inserted at points where the
variable is both required by the dominance frontier and alive. A variable is alive at a program
point if there exists a path from that point to a later use of the variable without an intervening
redefinition.

By eliminating unnecessary phi-functions for dead variables, pruned SSA form produces a more
compact and efficient SSA representation. This optimization is particularly important in
large-scale programs, as it can reduce the number of phi-functions and simplify subsequent
compiler optimizations.
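The difference can be sketched as a one-line guard added to the worklist placement algorithm shown earlier: a phi for a variable is inserted only where that variable is live-in. The live_in map is assumed to come from a separate liveness analysis, and all names below are illustrative:

```python
def place_phis_pruned(aorig, df, live_in):
    """Pruned phi-placement: the usual worklist algorithm, except that a
    phi for `var` is inserted at block y only when `var` is live-in at y.

    aorig:   dict mapping each block to the variables it defines.
    df:      dict mapping each block to its dominance frontier.
    live_in: dict mapping each block to the variables live on entry.
    """
    aphi = {n: set() for n in df}
    for var in set().union(*aorig.values()):
        worklist = [n for n, vs in aorig.items() if var in vs]
        while worklist:
            n = worklist.pop()
            for y in df[n]:
                # The liveness check is the only difference from
                # minimal SSA form: dead variables get no phi.
                if var in live_in.get(y, set()) and var not in aphi[y]:
                    aphi[y].add(var)
                    if var not in aorig.get(y, set()):
                        worklist.append(y)
    return aphi

# v is defined in both arms of a diamond but never used after the join d,
# so pruned SSA inserts no phi-function at all.
aorig = {'b': {'v'}, 'c': {'v'}}
df = {'a': set(), 'b': {'d'}, 'c': {'d'}, 'd': set()}
print(place_phis_pruned(aorig, df, {'d': set()}))
```

If v were live-in at d (say, printed after the join), the guard would pass and the phi would be inserted exactly as in minimal SSA form.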

Practical Implications

While minimal SSA form is simpler to compute, pruned SSA form is often preferred in practical
compilers because it avoids redundant operations and improves efficiency. The distinction
between these forms is important for compiler developers to understand when implementing
SSA construction algorithms.

Summary


●​ Minimal SSA Form ensures correctness with the smallest number of phi-functions
required by the dominance frontier, regardless of liveness.
●​ Pruned SSA Form refines this by checking variable liveness, reducing the number of
phi-functions and avoiding those for dead variables.

The distinction between these two forms underscores the importance of combining correctness
with efficiency in compiler design.

But one thing is missing: how can we implement phi-functions in assembly code?
To eliminate phi-functions in Static Single Assignment (SSA) form, we replace them with explicit
move instructions. This transformation is commonly required during the translation from SSA to
a non-SSA intermediate representation. The process can be divided into two key steps:

1.​ Replacing Phi-Functions with Copies​


The standard strategy for eliminating phi-functions is to insert copy instructions at the
origin of each control-flow edge that provides a value to the phi-function. For example, if
we have a phi-function such as b2 = phi(b1, ...) in block L4, we replace it with a
copy b2 = b1 along the edge L3→L4. Note that inserting an instruction "on an edge"
conceptually means placing it at the end of the source block of that edge, such as L3 in
this case. This ensures that the value flows correctly to the target block without
disrupting program semantics.

2.​ Eliminating Critical Edges​


While straightforward, the approach described above can lead to complications when
there is no suitable insertion point for the copies, as occurs with critical edges. A critical
edge exists between a block with multiple successors and a block with multiple
predecessors. For example, in the left CFG of the second figure, inserting a copy at L2
before the edge to L4 may result in redundant computations if the execution flows to L3
instead. To address this, we eliminate the critical edge by splitting it, as shown in the
right CFG of the second figure. The edge L2→L4 is replaced by a new intermediate
block, which serves as a proper insertion point for the copy instruction, thus avoiding
redundancy and maintaining the efficiency of the generated code.
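The splitting step can be sketched over a successor-map representation of the CFG. The fresh block names (such as 'L2_L4') are just an illustrative convention:

```python
def split_critical_edges(succs):
    """Split every critical edge of a CFG given as {node: [successors]}.

    An edge u -> v is critical when u has several successors and v has
    several predecessors. Each such edge is rerouted through a fresh,
    initially empty block, which is a safe insertion point for copies.
    """
    preds = {n: [] for n in succs}
    for u, outs in succs.items():
        for v in outs:
            preds[v].append(u)
    new_succs = {n: list(outs) for n, outs in succs.items()}
    for u, outs in succs.items():
        for v in outs:
            if len(outs) > 1 and len(preds[v]) > 1:  # critical edge
                mid = f"{u}_{v}"  # fresh intermediate block
                new_succs[mid] = [v]
                new_succs[u] = [mid if s == v else s for s in new_succs[u]]
    return new_succs

# L2 branches to L3 and L4, and L4 has two predecessors: the edge
# L2 -> L4 is critical and gets split by the new block L2_L4.
succs = {'L2': ['L3', 'L4'], 'L3': ['L4'], 'L4': []}
print(split_critical_edges(succs))
```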


Have you ever heard of a problem called the "Lost Copy"?


The lost-copy problem is a well-known issue in SSA-elimination that arises when phi-related
variables have overlapping live ranges. Two variables, v0 and v1, are considered phi-related if any of the following conditions holds:

1.​ There exists a phi-function v0=phi(...,v1,...).


2.​ There exists a phi-function ...=phi(...,v0,...,v1,...).
3.​ There exists a variable u that is phi-related to both v0 and v1.

An SSA program is said to be in Conventional SSA Form (CSSA) if all phi-related variables
have non-overlapping live ranges. When this condition is met, SSA-elimination typically
proceeds without issues. However, problems can arise when the CSSA form is broken. Let’s
explore the issue using the figure:

Example Walkthrough


1.​ (a): Original Program​


The program reads a value, increments it in a loop, and prints the second-to-last value.​

2.​ (b): Program in SSA Form​


The SSA transformation introduces unique variable versions for each assignment. Here,
x1, x2, and x3 are created to ensure the single-assignment property. The phi-function at
block L0 combines x1 and x2.​

3.​ (c): SSA-Elimination in CSSA Form​


When the program is in CSSA, eliminating phi-functions by introducing copies works
correctly. For instance, the phi-function at block L0 is replaced by copying x1 to x3 in L1,
ensuring correctness.​

4.​ (d): Breaking CSSA Form through Copy Propagation​


Copy propagation introduces overlapping live ranges for x2 and x3, both of which are
phi-related. Now, x2 and x3 are simultaneously live, violating CSSA.​

5.​ (e): SSA-Elimination in Non-CSSA Form​


In this case, eliminating the phi-function results in overwriting x3 with the value of x2, which leads to incorrect behavior. The original value of x3, needed for later computations, is lost.​

6.​ (f): Resolving the Issue by Breaking the Critical Edge​


To address the problem, we break the critical edge L2→L3 by introducing a new block.
This block serves as a proper insertion point for the copy x3=x2, ensuring that the
overlapping live ranges do not cause a loss of value.

By ensuring that CSSA form is preserved or by eliminating critical edges before


SSA-elimination, we can avoid the lost-copy problem and maintain the correctness of the
generated code.

What about the "Swap-Problem", have you heard of it?


The swap problem occurs when a phi-function argument is defined by another phi-function
within the same block. This creates a situation where phi-related variables have overlapping live
ranges, breaking the assumptions of Conventional SSA Form (CSSA). As with the lost-copy
problem, this overlap complicates the process of eliminating phi-functions. Let’s explore this
issue using the example:

Example Walkthrough


1.​ (a): Original Program​


The original program swaps the contents of two variables, x and y. This operation is a
fundamental routine in sorting algorithms and other applications.​

2.​ (b): Program in SSA Form​


When converted to SSA form, unique versions of variables are introduced. The program
adheres to CSSA, ensuring no overlapping live ranges for phi-related variables.
Eliminating phi-functions by replacing them with move instructions would work
seamlessly here.​

3.​ (c): Breaking CSSA with Copy Propagation​


After applying the compiler optimization called copy propagation, the program loses its
CSSA property. Variables x3 and y3 become phi-related with overlapping live ranges.
This change introduces a challenge for phi-function elimination.​

4.​ (d): Incorrect SSA-Elimination in Non-CSSA Form​


If we attempt to eliminate phi-functions by replacing them with copies, the semantics of the program are broken. Specifically, x3 is overwritten before its value is copied to y3,
leading to incorrect results.​

5.​ (e): Breaking Critical Edges Does Not Solve the Problem​
Unlike the lost-copy problem, breaking critical edges does not resolve the swap problem.
Even after splitting edges, the issue of overwriting x3 before copying it to y3 persists.​

6.​ (f): Correct Solution Using a Temporary Variable​


The solution is to introduce a temporary variable t1. This temporary allows us to
implement the operation as a parallel copy:​
(x3,y3):=(y3,x3)​
Using t1, the process becomes:​

○​ t1 = x3
○​ x3 = y3
○​ y3 = t1

Alternative Solution: Reverting to CSSA Form

Another approach is to restore the program to CSSA form. This involves inserting additional
temporary locations before and after each phi-function, effectively isolating the live ranges of
phi-related variables. However, this method can be computationally expensive and may
introduce significant overhead.

Implementing Parallel Copies

Eliminating phi-functions inherently requires implementing parallel copies. Parallel copies can
be realized using a combination of swap and copy instructions. Architectures may provide
specialized instructions to swap the contents of two registers, or we can use techniques like the
3-xor trick for efficient swapping.
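As a sketch, a parallel copy given as a {destination: source} map can be sequentialized with a single temporary that breaks swap cycles; the name 'tmp' stands for a hypothetical fresh register:

```python
def sequentialize(moves):
    """Sequentialize a parallel copy.

    moves: dict mapping each destination to its source; all moves happen
    "at once". Returns an equivalent list of (dest, src) sequential
    moves, breaking cycles (swaps) with a temporary named 'tmp'.
    """
    seq = []
    pending = {d: s for d, s in moves.items() if d != s}
    while pending:
        sources = set(pending.values())
        # A move is safe once its destination is no longer needed as a
        # source by any other pending move.
        ready = [d for d in pending if d not in sources]
        if ready:
            d = ready[0]
            seq.append((d, pending.pop(d)))
        else:
            # Every destination is still a source: we are in a cycle.
            # Save one value in the temporary and redirect its readers.
            d = next(iter(pending))
            seq.append(('tmp', d))
            pending = {dd: ('tmp' if s == d else s)
                       for dd, s in pending.items()}
    return seq

# The swap (x3, y3) := (y3, x3) becomes: tmp = x3; x3 = y3; y3 = tmp.
print(sequentialize({'x3': 'y3', 'y3': 'x3'}))
```

This reproduces exactly the three moves with the temporary t1 shown above; acyclic parallel copies come out as plain moves emitted in a safe order.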

Conclusion

The swap problem demonstrates the importance of preserving CSSA properties during
SSA-elimination. While breaking critical edges can address simpler issues like the lost-copy
problem, the swap problem necessitates careful handling through temporary variables or
parallel copy implementations. By addressing these challenges, compilers ensure the
correctness and efficiency of the transformed code.

So, was the SSA form invented in Cytron et al.'s paper?


Well, perhaps a bit before that. Static Single Assignment (SSA) form as we know it today was
introduced by Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck in their 1988 paper
titled "Global Value Numbers and Redundant Computations". Nevertheless, even before that
people were already aware of the benefits of keeping a program representation where every variable has a single name, as we can see in the paper "Code Motion of Control Structures in High-Level Languages".

This work was presented at the 15th ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages (POPL). The authors developed SSA form as a way to simplify
program analysis and optimization by transforming the program so that each variable is
assigned exactly once. This transformation made data flow analysis more efficient and enabled
powerful optimizations like constant propagation, dead code elimination, and common
subexpression elimination.

Their algorithm was further refined by Cytron et al. in 1991 in their paper "Efficiently Computing
Static Single Assignment Form and the Control Dependence Graph". Cytron’s work introduced
the notion of dominance frontiers as a mechanism to compute the placement of phi-functions
efficiently, which is a critical step in converting programs to SSA form. This refinement
significantly reduced the complexity of SSA construction, making it practical for real-world
compilers.

In short:

●​ 1988: SSA form was invented by Rosen, Wegman, and Zadeck.


●​ 1991: Cytron et al. introduced dominance frontiers and efficient SSA construction
algorithms, popularizing its use in modern compilers.

Zadeck seems to have done quite a lot in this domain! Are you familiar with
his work?
Yes, F. Kenneth Zadeck is a highly influential figure in computer science, particularly in the
areas of program analysis, compiler optimization, and formal methods. His work has had a
lasting impact on both theoretical and practical aspects of compiler design, software
engineering, and optimization.


Below are some of his most notable contributions:

1. Static Single Assignment (SSA) Form (with Barry K. Rosen and Mark N. Wegman)

●​ SSA form is one of Zadeck's most famous contributions. As mentioned earlier, SSA
simplifies the process of program optimization by ensuring that each variable is assigned
only once. This form enables efficient data flow analysis and various optimizations, such
as constant propagation, dead code elimination, and more. Zadeck, along with his
collaborators, introduced the original idea and demonstrated its power in simplifying
compiler optimization tasks.

2. Global Value Numbering (GVN)

● Zadeck also contributed to the development of Global Value Numbering (GVN), an
optimization technique used in compilers to detect common subexpressions. The idea
behind GVN is to assign a "value number" to each expression in the program, with
equivalent expressions receiving the same number. By identifying common
subexpressions and eliminating redundant computations, GVN helps improve the
efficiency of generated code. This technique has been widely used in modern compilers.

3. Control Dependence Graph (CDG) and Dominator Tree

● In the context of compiler optimizations, Zadeck made contributions to the control
dependence graph (CDG), which helps optimize control flow in programs. The CDG is
used in conjunction with the dominance tree to perform optimizations like loop invariant
code motion and predicate splitting. His work in this area enhanced how compilers
analyze control flow and improve the execution of programs.

4. Data Flow Analysis

●​ Zadeck’s research also focused on data flow analysis, which involves tracking the flow
of data through a program to optimize it. This includes techniques like reaching
definitions, live variable analysis, and available expressions. His contributions in this
area have made it easier to reason about how data moves through programs and to
apply optimizations based on that analysis.

5. Contributions to Formal Methods

●​ Zadeck's contributions extend beyond compiler optimization into the field of formal
methods, where he has worked on rigorous techniques for specifying, analyzing, and
verifying software systems. His work on program correctness and formal verification has
influenced how we think about program reliability and security.

6. Work on Abstract Interpretation

● Zadeck made contributions to abstract interpretation, a method for analyzing programs
by approximating the set of possible values that variables can take during execution.
This approach is fundamental for performing static analysis of programs and finding
errors or inefficiencies in the code without running it. Abstract interpretation has been
widely used in various verification and optimization tools.

7. Influence in Compiler Construction

●​ Zadeck’s work has had a significant influence on compiler construction and optimization
tools used in the industry. Many modern compilers, such as GCC, LLVM, and others,
incorporate ideas from his research. His contributions have helped shape the
development of efficient and reliable compilers used in a wide range of applications, from
general-purpose programming to embedded systems and high-performance computing.

Key Papers:

● "Global Value Numbers and Redundant Computations" (1988): This paper,
co-authored with Rosen and Wegman, introduces SSA form and value numbering, which
are central to modern compiler optimization techniques.
● "Efficiently Computing Static Single Assignment Form and the Control
Dependence Graph" (1991): Co-authored with Cytron, Ferrante, Rosen, and Wegman,
this paper presents an efficient algorithm for constructing SSA form and introduces the
concept of dominance frontiers.


Recognition:

●​ Zadeck’s work has been recognized by the computer science community through
numerous awards, citations, and honors. His research has had a profound impact on
how compilers optimize programs and how software reliability is ensured.

In summary, F. Kenneth Zadeck is a foundational figure in the development of compiler
optimization techniques, particularly in the areas of SSA form, global value numbering,
data flow analysis, and control dependence graphs. His contributions have made modern
compilers more efficient and reliable, and his work continues to influence the design and
analysis of software systems today.


SSA-Based Register Allocation


What is Register Allocation?
Register allocation is the process of mapping the variables that a program manipulates to
physical locations, either registers or memory. Memory is abundant but significantly slower than
registers. Registers, in contrast, are few in number—modern architectures typically have
between 16 and 32 general-purpose registers (e.g., 16 in x86-64 and 32 in RISC-V)—but they
provide much faster access. To optimize program performance, it is essential to map frequently
used variables to registers whenever possible. The figure below illustrates the impact of register
allocation on the binary that the compiler produces for a program:

This task is more than a simple mapping; it combines several complex decision problems with
interesting theoretical foundations:

1.​ Register Assignment Problem​


This problem asks whether it is possible to compile a program using K registers such
that every variable remains in one or more registers throughout its lifetime. This involves
ensuring that the allocation does not exceed the available number of registers at any
point in the program.​

2.​ Spilling Problem​


When the number of live variables exceeds K, some variables must be "spilled" to
memory. The spilling problem asks whether a program can be compiled using K registers
and no more than L spilled variables, with memory accessed using load and store
instructions to retrieve and save their values.​

3.​ Memory Access Problem​


Beyond determining which variables to spill, this problem focuses on minimizing the
performance cost associated with spilling. Specifically, it asks whether a program can be
compiled using K registers while keeping the number of memory accesses below given
thresholds: S store instructions and L load instructions.​

4.​ Register Coalescing Problem​


During program execution, copy instructions like v0 := v1 are common, where a
variable is assigned the value of another. The goal of register coalescing is to assign
both variables to the same register whenever possible, eliminating the need for such
copy instructions. The problem asks whether it is possible to compile a program using K
registers while ensuring that no more than M move instructions remain in the final code
after allocation.

These problems are interrelated and, in many cases, computationally challenging. Solving them
efficiently is critical to generating high-performance code. Modern compilers use a mix of
heuristics and algorithmic approaches—often based on graph coloring, linear programming, and
other optimization techniques—to tackle these challenges and deliver code that makes the most
efficient use of the limited registers available.

Is register allocation an important optimization problem?


Register allocation is one of the most critical optimizations in modern compilers, as it has a
profound impact on the performance of the generated code. John Hennessy and David
Patterson emphasize this in their seminal work, Computer Architecture: A Quantitative
Approach, describing register allocation as "one of the most important—if not the most
important—of the optimizations." This statement underscores the central role of register
allocation in achieving efficient, high-performing programs.

Why is Register Allocation So Important?

1.​ Performance Impact of Registers vs. Memory​


Registers are orders of magnitude faster than memory. Accessing a register typically
takes a single clock cycle, while accessing memory can take tens to hundreds of cycles,
depending on cache performance. By ensuring that frequently accessed variables are
kept in registers, register allocation significantly reduces the time spent on memory
operations, boosting execution speed.​

2.​ Enabling Other Optimizations​


Many compiler optimizations, such as instruction scheduling, loop unrolling, and inlining,
are most effective when the variables involved are stored in registers. If variables are
spilled to memory, the benefits of these optimizations are often diminished or negated
due to the increased cost of accessing spilled variables. Register allocation, therefore,
acts as a foundation for amplifying the effectiveness of other optimizations.​

3.​ Limited Register Resources​


While registers are fast, they are also a scarce resource, especially in architectures with
a small number of general-purpose registers, such as x86-64. This scarcity makes
efficient register allocation a challenging combinatorial problem, where the compiler must
carefully balance the demands of multiple variables to maximize performance while
minimizing spills.​

4.​ Direct Influence on Code Quality​


Poor register allocation leads to excessive spilling, resulting in additional memory
operations that slow down program execution. Conversely, effective register allocation
minimizes memory access, reduces instruction count, and optimizes CPU pipeline
utilization, directly contributing to faster and more compact code.​

5.​ Impact on Power Efficiency​


Beyond performance, register allocation also affects energy consumption. Memory
operations are not only slower but also more power-hungry compared to register
accesses. Efficient use of registers thus plays a role in reducing the energy footprint of
programs, a critical consideration for mobile and embedded systems.

In summary, register allocation is a keystone of compiler design. Its importance stems not only
from the performance gains it directly enables but also from its role in unlocking the potential of
other optimizations. As Hennessy and Patterson argue, register allocation is foundational to the
pursuit of efficient, optimized code—a goal at the heart of every modern compiler.

What are the typical approaches to solve register allocation?


Register allocation is probably the optimization problem in compiler construction that has
inspired the most diverse set of solutions among all compilation challenges. These
solutions are grounded in different mathematical models and strategies, each with unique
trade-offs between efficiency and optimality. Below are some of the methods used to tackle
register allocation:


1. Linear Scan Register Allocation

This method is particularly well-suited for just-in-time (JIT) compilers and other scenarios where
compilation speed is critical. Linear scan treats register allocation as a coloring problem on
interval graphs, where the intervals represent the live ranges of variables. The algorithm assigns
registers by scanning live ranges linearly, reusing registers whenever possible. While not as
optimal as more complex methods, linear scan is fast and straightforward, making it a popular
choice in practical compilers like LLVM and Java's HotSpot VM.
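To make the idea concrete, here is a minimal Python sketch of linear scan, in the style of Poletto and Sarkar's algorithm, with live ranges reduced to simple (start, end) intervals. The interval representation and the furthest-end spill heuristic are simplifications of what production allocators actually do:

```python
def linear_scan(intervals, num_regs):
    """Assign registers to live intervals in a single pass.

    intervals: list of (name, start, end) tuples, each the live range
    of one variable. Returns (assignment, spilled): a register index
    per variable kept in registers, plus the set of spilled variables.
    """
    intervals = sorted(intervals, key=lambda iv: iv[1])  # by start point
    free = list(range(num_regs))   # registers currently unused
    active = []                    # (end, name) of intervals in registers
    assignment, spilled = {}, set()
    for name, start, end in intervals:
        # Expire intervals that ended strictly before this one starts,
        # returning their registers to the free pool.
        for e, n in list(active):
            if e < start:
                active.remove((e, n))
                free.append(assignment[n])
        if free:
            assignment[name] = free.pop()
        else:
            # Heuristic: spill the interval with the furthest end point.
            active.sort()
            last_end, last_name = active[-1]
            if last_end > end:
                assignment[name] = assignment.pop(last_name)
                spilled.add(last_name)
                active.remove((last_end, last_name))
            else:
                spilled.add(name)
                continue
        active.append((end, name))
    return assignment, spilled
```

With two registers and the intervals ('a', 1, 10), ('b', 2, 4), ('c', 5, 9), the allocator reuses for c the register freed when b expires; with a single register, the longer-lived interval is spilled in favor of the shorter one.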

2. Graph-Coloring-Based Allocation

Graph coloring is one of the most well-known approaches to register allocation. Variables' live
ranges are represented as vertices in an interference graph, with edges connecting variables
whose live ranges overlap. Allocating registers then becomes a graph-coloring problem, where
the goal is to assign colors (registers) to vertices without adjacent vertices sharing the same
color. This approach, pioneered by Chaitin, remains a cornerstone of register allocation,
although it can be computationally expensive due to the NP-hard nature of graph coloring.
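As an illustration, the simplify/select discipline behind Chaitin-style allocators can be sketched in a few lines of Python. This is a teaching model rather than a production allocator: coalescing and spill-cost estimation are omitted, and the max-degree spill choice is only a placeholder heuristic:

```python
def chaitin_color(graph, k):
    """Kempe/Chaitin simplify-select coloring.

    graph: {vertex: set of interfering vertices}. Returns
    (coloring, None) on success, or (None, spill_candidate) when
    the simplification phase gets stuck.
    """
    g = {v: set(ns) for v, ns in graph.items()}
    stack = []
    while g:
        # Simplify: a node with fewer than k neighbors can always be
        # colored later, so remove it and remember it on a stack.
        trivial = [v for v in g if len(g[v]) < k]
        if not trivial:
            # Blocked: report a spill candidate (max-degree heuristic;
            # real allocators weigh spill costs instead).
            return None, max(g, key=lambda v: len(g[v]))
        v = trivial[0]
        stack.append((v, g.pop(v)))
        for ns in g.values():
            ns.discard(v)
    # Select: pop nodes in reverse removal order and give each the
    # lowest color not used by its already-colored neighbors.
    coloring = {}
    while stack:
        v, neighbors = stack.pop()
        used = {coloring[u] for u in neighbors if u in coloring}
        coloring[v] = min(c for c in range(k) if c not in used)
    return coloring, None
```

A triangle of mutually interfering variables colors fine with three registers, but with two the simplification phase blocks and a spill candidate is reported.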

3. Integer Linear Programming (ILP) Solutions

ILP-based register allocation formulates the problem as an integer optimization task, where
constraints model the relationships between variables, registers, and memory. The objective is
to minimize cost functions, such as the number of spills or memory accesses. Although ILP
provides optimal solutions, its high computational complexity makes it suitable primarily for
scenarios where compilation time is less critical, such as ahead-of-time (AOT) compilation for
embedded systems.

4. Register Allocation by Puzzle Solving

This approach conceptualizes register allocation as a puzzle. The live ranges of variables are
treated as pieces that must be fit onto a board representing the bank of registers. This method
visualizes and resolves allocation challenges as a series of placement decisions, making it
particularly appealing for solving complex allocation problems that involve registers with multiple
sizes, as in the x86 architecture, where registers such as AH and AL can be combined to form
AX, e.g.:


5. Partitioned Boolean Quadratic Programming (PBQP)

PBQP is a combinatorial optimization technique tailored to register allocation. It models the
problem as a quadratic programming task where the goal is to minimize a cost function
associated with variable assignment. PBQP excels in handling practical constraints like spill
costs, coalescing opportunities, and architecture-specific requirements, and it has been
implemented successfully in compilers like LLVM.

6. Constraint Solving

Constraint-based solutions, such as the one detailed by Roberto Castaneda Lozano in
"Constraint-based Register Allocation and Instruction Scheduling," approach register allocation
by encoding the problem as a set of constraints. These constraints capture relationships
between live ranges, registers, and memory. The solver then finds an assignment that satisfies
all constraints while optimizing performance metrics. This method is particularly versatile and
can integrate seamlessly with instruction scheduling for more holistic optimization.

7. Other Notable Approaches


There are, indeed, many other techniques to solve register allocation. More about them can be
found in publicly available surveys, some of which we briefly mention below:

● Heuristic-Based Allocation: Some compilers employ heuristics to approximate optimal
solutions. These methods are often used in combination with other strategies to achieve
a balance between compilation speed and code quality.
●​ Machine Learning Techniques: Emerging approaches use machine learning to predict
optimal register assignments based on features extracted from the program and
architecture.
●​ Simulated Annealing and Genetic Algorithms: These probabilistic methods explore
the search space of register assignments to find near-optimal solutions. While
computationally intensive, they are effective for solving highly constrained allocation
problems.

A Rich Landscape of Solutions

The diversity of approaches to register allocation reflects its complexity and central importance
in compiler design. From theoretical models like graph coloring and ILP to practical methods like
linear scan and heuristic-based algorithms, the choice of approach depends on the specific
requirements of the compilation process, such as speed, code quality, and target architecture
constraints. As research continues, new methods and hybrid solutions will likely emerge, further
enriching the toolbox available to compiler developers.


Is register allocation done before or after SSA elimination?


Most compilers adopt Post-SSA Register Allocation (also called "Traditional Register
Allocation"), performing allocation after SSA elimination, but some explore Pre-SSA Register
Allocation (also called "SSA-Based Register Allocation"), where allocation happens before
eliminating SSA. Each approach has its own advantages, making the choice dependent on the
compiler's design goals and the target architecture's characteristics.

Advantages of Post-SSA Register Allocation

1.​ Extensive Algorithmic Knowledge:​


Post-SSA register allocation benefits from a wealth of established research and
practical experience.
○​ Numerous algorithms for register allocation are described at this level in
textbooks.
○​ Many high-quality implementations are available in open-source projects and
industrial compilers.
○​ Developers can leverage this "folk knowledge" for robust and well-understood
solutions.
2.​ Simpler SSA Elimination:
○ Performing SSA elimination before register allocation simplifies the elimination
process.
○​ The algorithm does not need to handle the challenges of mapping registers to the
arguments and results of φ functions.
○​ Problems like lost-copy or swap issues (arising when two variables exchange
values during SSA elimination) are easier to manage in this approach.

Advantages of Pre-SSA Register Allocation

1.​ Polynomial-Time Register Assignment:​


In SSA form, the interference graph is chordal, meaning it has no induced cycles of
length greater than three.
○​ This property enables efficient, polynomial-time solutions to the register
assignment problem when all registers are of similar size.
2.​ Shorter Live Ranges:


○​ SSA form inherently shortens variable live ranges, reducing the overlap between
live ranges in the interference graph.
○​ Shorter live ranges increase the allocator’s flexibility, allowing it to map different
parts of the same variable more effectively.
3.​ Decoupled Allocation and Spilling:
○​ Pre-SSA allocation enables a two-phase strategy:
■​ First, spill variables until the program reaches a state where register
assignment is feasible.
■​ Then, perform register assignment separately, focusing on mapping
variables to physical registers without additional spill concerns.

Choosing Between Pre-SSA and Post-SSA Allocation

The decision between pre-SSA and post-SSA allocation depends on the compiler's design
priorities:

●​ Post-SSA is more common due to its simplicity and the extensive body of knowledge
available.
●​ Pre-SSA is advantageous in scenarios where the interference graph's chordal nature
can be exploited for efficiency or where shorter live ranges are critical.

Both approaches demonstrate the flexibility and adaptability of modern compiler design in
addressing the challenges of register allocation.

You mentioned that register assignment has a polynomial-time solution for SSA-form programs. How's that possible?
To explain why register assignment for SSA-form programs has a polynomial-time solution, let’s
break it down step by step:

1.​ Defining the Interference Graph:​


The interference graph of a program represents the relationships between variables
based on their live ranges:
○​ Each vertex in the graph corresponds to a variable.
○​ An edge connects two vertices if the corresponding variables have overlapping
live ranges (i.e., they are simultaneously "live").


2.​ Chordal Property of SSA Interference Graphs:​


The interference graph of an SSA-form program is chordal.
○​ A chordal graph is one in which there are no induced cycles of length greater
than three.

○​ An induced cycle is a cycle in the graph such that no edges exist between
non-consecutive vertices in the cycle.


3.​ Efficient Coloring of Chordal Graphs:​


Chordal graphs have a key property that simplifies the register assignment problem:
○ These graphs can be colored optimally by a greedy algorithm whose running time
is linear in the size of the graph, i.e., O(|V| + |E|), where |V| and |E| are the
numbers of vertices and edges.
○​ Graph coloring involves assigning "colors" (representing registers) to vertices
such that no two connected vertices share the same color.
4.​ Connection to Register Assignment:​
The problem of determining whether an SSA-form program can be compiled using K
registers (i.e., without spilling) corresponds directly to determining whether the
interference graph can be colored with K colors:


5.​ Polynomial-Time Solution:​


Since graph coloring for chordal graphs has a linear-time solution relative to the number
of edges, determining whether an SSA-form program can fit within K registers is
efficiently solvable in polynomial time.

This efficient solution for SSA-form programs makes it a valuable property in register allocation,
especially when leveraging the simplicity and structure of SSA. Yet, notice that we are talking
only about register assignment! Problems like the minimization of spilling or the maximization of
coalescing remain NP-complete.

But what do you do with the phi-functions? The phi-related variables are
mapped to registers, but we still need to implement these instructions with
actual assembly instructions.

Handling φ-functions during SSA-based register allocation is indeed one of the trickiest parts of
the whole thing. After register allocation, φ-functions need to be translated into actual assembly
instructions, which typically involves implementing parallel copies. Here’s how this is managed:

1.​ Translating φ-functions to Parallel Copies:​


A φ(a, b) function, for example, means that the value of a or b should be assigned to
a target variable, depending on the control flow path. This requires us to generate copy
instructions that move values to their assigned registers. If the parameters of the
phi-functions are associated with registers, then we must take these locations into
consideration when choosing the right instructions to replace phi-functions:

2.​ Algorithms for Efficient Parallel Copies:​


The process of implementing these parallel copies has been extensively studied. For
example, the paper "SSA Elimination After Register Allocation" discusses algorithms to
convert φ-functions into a sequence of move instructions while preserving correctness
and minimizing additional register usage.​

3.​ Swapping Register Contents Without Temporaries:​


A key optimization is to avoid introducing temporary locations (e.g., in memory) when
swapping the contents of two registers. Many architectures provide hardware support for
this.
○ Temporary location: parallel copies can be implemented by routing one of the
values through a temporary location.
○​ The xor trick: Three xor instructions can swap two register values efficiently
without a temporary:
○​ Special Instructions: Some architectures directly support a swap instruction to
exchange the values of two registers. For instance, the x86 architecture
provides a xchg instruction for this purpose.


4.​ Preserving Register Allocation Results:​


The goal is to implement these copies without requiring additional registers beyond
those assigned during allocation. This ensures the efficiency of the register allocation
process is maintained. If the hardware or algorithm cannot guarantee this, a fallback
might involve introducing temporary locations. In fact, if swaps are not allowed, then the
problem of deciding if a program can be compiled with K registers (without spilling) is
NP-complete, as seen in the paper "Register Allocation after Classical SSA Elimination is
NP-complete".

In practice, handling φ-functions efficiently after register allocation combines clever algorithmic
design with low-level architectural features to balance correctness and performance:
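The ideas above can be sketched as a small sequentialization routine. This is an illustrative Python model rather than the algorithm of any particular paper: it emits plain moves while some destination register is free to overwrite, and breaks the remaining cycles with swaps, which a backend could lower to an xchg instruction or to three xors:

```python
def sequentialize(parallel_copy):
    """Lower a parallel copy {dest: src} into ('move', d, s) and
    ('swap', d, s) instructions, using no extra registers."""
    moves = {d: s for d, s in parallel_copy.items() if d != s}
    seq = []
    while moves:
        # Drop self-copies that redirection may have created.
        moves = {d: s for d, s in moves.items() if d != s}
        if not moves:
            break
        # A dest whose old value nobody still reads can be overwritten.
        ready = [d for d in moves if d not in moves.values()]
        if ready:
            d = ready[0]
            seq.append(("move", d, moves.pop(d)))
        else:
            # Only cycles remain: break one with a swap.
            d, s = next(iter(moves.items()))
            seq.append(("swap", d, s))
            del moves[d]                # d <- s is now satisfied...
            for d2, s2 in list(moves.items()):
                if s2 == d:             # ...and d's old value lives in s.
                    moves[d2] = s
    return seq

def apply_copies(seq, regs):
    """Interpret the generated sequence on a register file (a dict)."""
    regs = dict(regs)
    for op, d, s in seq:
        if op == "move":
            regs[d] = regs[s]
        else:
            regs[d], regs[s] = regs[s], regs[d]
    return regs
```

For the swap problem {r1: r2, r2: r1}, the routine emits a single swap; for the three-cycle {a: b, b: c, c: a}, two swaps suffice, with no temporary register involved.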


Can you provide some intuition on why the interference graphs of SSA-form programs are chordal?
To understand why the interference graphs of SSA-form programs are chordal, we need to
explore the relationship between live ranges and dominance trees in SSA form:

1.​ Intersection Graphs:​


An intersection graph is constructed as follows:
○​ For a set S of sets, each set s∈S corresponds to a vertex in the graph.
○​ An edge exists between two vertices if the corresponding sets intersect, e.g.: s0
∩ s1 ≠ ∅.
○​ The figure below shows an example of an intersection graph:

2.​ Chordal Graphs:​


Another way to define chordal graphs is through the intersection graph of subtrees of
a tree, as demonstrated in the paper "The Intersection Graphs of Subtrees in Trees are
Exactly the Chordal Graphs":
○​ If S is a set of subtrees of a tree, the intersection graph of these subtrees is
chordal.
○​ Conversely, if a graph is chordal, then it can be represented as the intersection
graph of subtrees in some tree.


3.​ SSA Form and Dominance Trees:​


In SSA form, each variable has a well-defined live range, which can be visualized as a
connected region of program points where the variable is live. Importantly:
○​ The live ranges of variables correspond to subtrees in the dominance tree of the
program.
○​ The dominance tree encodes the control flow hierarchy of the program, ensuring
that live ranges respect dominance relationships.
4.​ Interference Graph of SSA-form Programs:​
The interference graph of an SSA-form program represents overlaps between the live
ranges of variables:
○​ A vertex corresponds to a variable.
○​ An edge exists if two variables have overlapping live ranges.


5.​ Since the interference graph is the intersection graph of live range subtrees on the
dominance tree, and the intersection graph of subtrees of a tree is chordal, the
interference graph of an SSA-form program is chordal.

But how can we use all that to build a register allocator? You mention that
there are several problems to be solved...
Let's use the register allocation algorithm described in the paper "Register Allocation via the
Coloring of Chordal Graphs". The figure below highlights the key steps of the algorithm.

High-Level View:

This algorithm has been designed to work on chordal interference graphs. It can be applied to
a non-SSA-form program, as long as its interference graph is chordal. The algorithm consists of
two main phases:

1.​ Mapping to memory: the algorithm decides which variables, if any, will be mapped to
memory. This is done by gauging the size of the maximum clique in the interference
graph. Whenever a variable is mapped to memory, it's removed from the program, and
accesses to it are replaced with loads and stores.
2. Mapping to registers: at this point, the program is guaranteed to be K-colorable, and
variables are mapped to "colors" (registers) via a greedy coloring algorithm. Because the
maximum clique in the graph has no more than K vertices in this phase, spilling is no
longer necessary.

Here's a breakdown of how the steps contribute to the overall process:

1.​ Build: The Build stage is where the interference graph is created. Imagine each
variable as a node in a graph. If two variables' live ranges overlap (meaning they are
"alive" at the same time and thus cannot occupy the same register), an edge is drawn
between their corresponding nodes. This graph is the core structure for determining
which variables can potentially share registers and which cannot.
2.​ MaxClique: Next, the MaxClique stage does a clever analysis to understand the
maximum register pressure. It does this by:
○​ Using an algorithm called Maximum Cardinality Search to create a special
ordering of the nodes.
○​ Then, using that order to identify the size of the largest clique (a fully connected
sub-graph where every node is connected to every other node) in the graph. This
size represents the maximum number of variables simultaneously "live" and thus,
potentially competing for registers.
3. Spill: If the size of the maximum clique found is greater than the number of available
registers (K), then the graph cannot be colored with K colors, and the Spill step kicks in.
This is where the algorithm removes from the graph the nodes that prevent it from being
K-colorable. To "remove" a variable, we map it to memory, replacing its uses and
definitions with loads and stores. This introduces the concept of spilling. The algorithm
identifies the best nodes to spill and inserts code to save these variables to memory,
freeing up a register. The algorithm keeps spilling nodes and rebuilding the interference
graph in a cycle, until it finds a graph whose maximum clique is no greater than K.
4.​ Color: Once the maximum clique size is at or below K, the Color stage begins. Here, the
algorithm greedily assigns registers to variables in the order determined in the
MaxClique step. Since the graph is now chordal and the maximal clique is of size K or
less, the greedy coloring algorithm guarantees a coloring using a maximum of K
registers.
5.​ Coalesce: The Coalesce stage is an optimization step. It tries to merge the nodes of
variables related by copy instructions (like v := u). If v and u are never live at the
same time, they can use the same register. This reduces the number of required
registers.
6.​ SSA Elimination: The SSA Elimination step happens towards the end and transforms
the phi-functions into parallel copy instructions, which can be directly executed by the
machine.
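The parallel copies produced in step 6 must eventually be lowered to ordinary sequential moves. The sketch below is one way to do it: moves whose destination is not read by any other pending move are emitted first, and cycles (such as swaps) are broken by routing one value through a scratch location. The name tmp is a placeholder invented for this example:

```python
def sequentialize(parallel_copy):
    """Lower a parallel copy {dst: src, ...} into sequential moves.

    Cycles among the moves are broken by routing one value through a
    scratch location named 'tmp' (a placeholder name)."""
    # Drop self-copies such as a := a
    pending = {d: s for d, s in parallel_copy.items() if d != s}
    seq = []
    while pending:
        # A move is safe when its destination is not read by any
        # other pending move.
        safe = [d for d in pending if d not in pending.values()]
        if safe:
            d = safe[0]
            seq.append((d, pending.pop(d)))
        else:
            # Only cycles remain: save one destination in the scratch
            # slot and redirect its readers there.
            d = next(iter(pending))
            seq.append(("tmp", d))
            for k in pending:
                if pending[k] == d:
                    pending[k] = "tmp"
    return seq

# A swap: a := b in parallel with b := a.
moves = sequentialize({"a": "b", "b": "a"})
state = {"a": 1, "b": 2}
for dst, src in moves:
    state[dst] = state[src]
assert state["a"] == 2 and state["b"] == 1
```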

Key Points & Insights:

884
This material is freely available and must not be sold or used for commercial purposes.
For typos or suggestions, please contact Fernando Pereira at [email protected].

●​ Chordal Graph Advantage: The algorithm relies heavily on the structure of chordal
graphs because they have the nice property that optimal coloring can be achieved in
polynomial time.
●​ Iterative Approach: The algorithm is iterative due to the spill process. If the initial
interference graph is too dense (more live ranges than registers), the algorithm needs to
remove variables (through spilling), rebuild, and try again.

How is it possible to run spilling separately from register assignment in chordal graphs?
This is a nice consequence of the fact that SSA-form programs have chordal interference
graphs. Chordal graphs possess several advantageous characteristics for register allocation:

●​ Efficient Chromatic Number Determination: The chromatic number of a chordal graph
(the minimum number of colors needed to color the vertices such that no adjacent
vertices share the same color) can be determined in polynomial time. As we have
already seen, the chromatic number directly corresponds to the minimum number of
registers required to allocate all variables without spilling.

●​ Decoupled Approach: This efficient determination of the chromatic number enables a
decoupled approach to register allocation:
1.​ Spilling: Initially, we can aggressively spill variables to reduce the number of live
variables at any given point (MaxLive) until it aligns with the desired register
budget (K). This effectively reduces the size and complexity of the interference
graph.
2.​ Register Assignment: Subsequently, we perform register assignment on the
reduced interference graph. Since the graph remains chordal, we can efficiently
determine the register allocation using techniques like graph coloring algorithms,
ensuring optimal register usage within the reduced register budget.

This decoupled approach simplifies the register allocation process by separating the potentially
complex and iterative spilling decisions from the register assignment itself. The main advantage
of this approach is simplicity: we can do register assignment without worrying about mapping
the excess of variables to memory as Florent Bouchez explains in his paper, "A Study of Spilling
and Coalescing in Register Allocation as Two Separate Phases".
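As a toy model of this spill-first phase, one can lower MaxLive to the register budget K directly on liveness sets, repeatedly evicting the cheapest candidate. This is only a sketch: the cost function and the data below are made up, and a real allocator would also rewrite the program with loads and stores:

```python
def spill_first(live_sets, K, spill_cost):
    """Choose variables to spill until no liveness set holds more than
    K non-spilled variables (i.e., until MaxLive <= K)."""
    spilled = set()
    for live in live_sets:
        while len(live - spilled) > K:
            # Evict the cheapest remaining candidate at this point.
            spilled.add(min(live - spilled, key=spill_cost))
    return spilled

# Four variables alive at once, but only three registers:
live_sets = [{"a", "b"}, {"a", "b", "c", "d"}, {"c", "d"}]
spilled = spill_first(live_sets, K=3, spill_cost=lambda v: ord(v))
assert spilled == {"a"}   # one spill brings MaxLive down to 3
```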

How does the build phase work?


The build phase in register allocation for SSA-form programs involves constructing an
interference graph. This graph visually represents the constraints on register assignments.
Here's a breakdown of the process:


1.​ Liveness Analysis: The initial step is to perform liveness analysis on the SSA-form
program. Liveness analysis determines which variables are "live" at each program point.
A variable is considered live at a point if its value might be used in the future, as we had
seen in our class about data-flow analyses.​

2.​ Interference Graph Construction:


○​ Traverse the program, examining each program point.
○​ At each point, identify all live variables.
○​ For every pair of live variables at that point, add an undirected edge between
their corresponding nodes in the interference graph. This edge signifies that
these variables cannot be assigned to the same register, as they are both
potentially in use at that point.

In essence, the build phase translates the liveness information into a graphical representation of
register allocation conflicts:

Notice that we can still obtain chordal interference graphs, even if the original program is not in
SSA form. The figure above shows an example of such a situation. Indeed, many real-world
programs have chordal interference graphs, even after SSA elimination.
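The two construction steps above can be sketched as a small routine. The liveness sets are assumed to be the output of a prior liveness analysis; the graph comes back as a pair of vertex and edge sets:

```python
def build_interference_graph(live_sets):
    """Build an interference graph from per-point liveness sets.

    live_sets: iterable of sets, one per program point, holding the
               variables simultaneously live at that point.
    Returns (V, E): the set of variables and the set of undirected
    edges, each edge stored once as a sorted pair."""
    V, E = set(), set()
    for live in live_sets:
        V |= live
        for a in live:
            for b in live:
                if a != b:
                    E.add((min(a, b), max(a, b)))
    return V, E

# 'a' and 'b' overlap; 'b' and 'c' overlap; 'a' and 'c' never do.
V, E = build_interference_graph([{"a", "b"}, {"b", "c"}])
assert E == {("a", "b"), ("b", "c")}
```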

If the SSA-form simplifies static analyses, does it simplify liveness analysis too?
It does! Checking whether two variables interfere in an SSA-form program can be done very
quickly. Additionally, discovering the program points where variables are alive also has a
non-iterative solution for SSA-form programs. Liveness analysis in SSA-form programs can be
efficiently computed due to the unique properties of the Static Single Assignment (SSA) form.

Key SSA Property:


●​ Every use of a variable is dominated by its unique definition. This means that if a
variable is used at a certain point, its definition must have occurred earlier in the
program's execution path.

Algorithm:

We can leverage this property to perform a backward traversal of the Control Flow Graph (CFG)
of the SSA-form program.

def live_analysis(program):
    """
    Computes liveness information for each statement in the
    SSA-form program.

    Args:
        program: The SSA-form program represented as a data structure
                 (e.g., a list of statements, where each statement has
                 'uses', 'defs' and 'predecessors' attributes).

    Returns:
        A dictionary where keys are statements and values are sets of
        live variables at the entry of each statement.
    """
    live_vars = {statement: set() for statement in program}

    # For each use of a variable, walk backward through the CFG until
    # its unique definition is reached, marking the variable as live
    # at every statement visited along the way.
    for statement in program:
        for use_var in statement.uses:
            worklist = [statement]
            while worklist:
                stmt = worklist.pop()
                if use_var in live_vars[stmt]:
                    continue  # Already marked live here
                live_vars[stmt].add(use_var)
                for pred_stmt in stmt.predecessors:
                    # Stop the walk at the unique definition
                    if use_var not in pred_stmt.defs:
                        worklist.append(pred_stmt)

    return live_vars

Explanation:

1.​ Initialization:
○​ Create a dictionary live_vars that maps each statement to the (initially empty)
set of variables live at its entry.
2.​ Per-Use Backward Walk:
○​ For each variable used by a statement, walk backward through the predecessors
of that statement.
3.​ Mark Uses as Live:
○​ Every statement visited during the walk adds the variable to its live_vars set.
4.​ Stop at the Definition:
○​ The walk stops upon reaching the unique definition of the variable (or a
statement where the variable is already marked live). Because every use in an
SSA-form program is dominated by its definition, this single backward pass
suffices: no fixed-point iteration is needed.
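A minimal, self-contained demonstration of this backward walk is shown below. The Stmt class is a stand-in invented for the example; a real compiler would use its own IR nodes:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity hashing, so Stmt can be a dict key
class Stmt:
    uses: set = field(default_factory=set)
    defs: set = field(default_factory=set)
    predecessors: list = field(default_factory=list)

def mark_live(stmt, var, live):
    """Walk backward from a use of 'var' until its unique definition,
    adding 'var' to the live-in set of every statement visited."""
    worklist = [stmt]
    while worklist:
        s = worklist.pop()
        if var in live[s]:
            continue  # already known to be live here
        live[s].add(var)
        for pred in s.predecessors:
            if var not in pred.defs:  # stop at the definition
                worklist.append(pred)

# x = ...; y = x + 1; z = x + y  -- 'x' stays live across s2.
s1 = Stmt(defs={"x"})
s2 = Stmt(uses={"x"}, defs={"y"}, predecessors=[s1])
s3 = Stmt(uses={"x", "y"}, defs={"z"}, predecessors=[s2])
live = {s: set() for s in (s1, s2, s3)}
for s in (s1, s2, s3):
    for v in s.uses:
        mark_live(s, v, live)
assert live[s2] == {"x"} and live[s3] == {"x", "y"}
```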

And how do we get the size of the maximum clique in the chordal graph?
In chordal graphs, the size of the maximum clique can be efficiently determined by leveraging
the concept of simplicial vertices and simplicial elimination orderings.

Simplicial Vertex:

●​ A vertex in a graph is simplicial if its neighbors form a complete subgraph (i.e., a
clique). As an example, each yellow node in the figure below is simplicial:


Simplicial Elimination Ordering:

●​ A Simplicial Elimination Ordering of a graph G is a bijection σ: V(G) → {1, …, |V|}, such
that every vertex vi is a simplicial vertex in the subgraph induced by {v1, …, vi}.
●​ An undirected graph without self-loops is chordal if, and only if, it has a simplicial
elimination ordering, as seen in the paper "On Rigid Circuit Graphs".
●​ Continuing with our example, the sequence T7, R1, R2, T1, R5, R4, T8, R6 and T9
forms a simplicial elimination ordering. Consider R2, for instance. The two nodes that
come before, T7 and R1, are both neighbors of R2, and both are connected. Thus, {T7,
R1, R2} forms a clique.


Finding the Maximum Clique Size:

1.​ Determine a Simplicial Elimination Ordering:


○​ One efficient algorithm to find a simplicial elimination ordering is Maximum
Cardinality Search (MCS).

def maximum_cardinality_search(graph):
    """
    Computes a simplicial elimination ordering using
    Maximum Cardinality Search.

    Args:
        graph: The input graph represented as an adjacency
               list (a map from vertices to neighbor lists).

    Returns:
        A list representing the simplicial elimination ordering.
    """
    n = len(graph)
    labels = [0] * n
    visited = [False] * n
    ordering = []

    for _ in range(n):
        # Pick the unvisited vertex with the most visited neighbors
        v = max((u for u in range(n) if not visited[u]),
                key=lambda u: labels[u])
        visited[v] = True
        ordering.append(v)
        for neighbor in graph[v]:
            if not visited[neighbor]:
                labels[neighbor] += 1

    return ordering

2.​ Calculate Maximum Clique Size:


○​ During the MCS, keep track of the maximum label assigned to any vertex.


○​ The maximum clique size in the chordal graph is equal to this maximum
label plus 1!

Rationale:

●​ The MCS algorithm assigns labels to vertices based on the number of already-ordered
neighbors.
●​ In a chordal graph, the maximum label assigned to a vertex during MCS represents the
size of the maximum clique that includes that vertex.
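The MCS pass and the clique-size computation can be fused into a single routine. The graph encoding below (a dict from vertex indices to neighbor lists) is just an assumption of this sketch:

```python
def max_clique_size_mcs(graph):
    """Size of the maximum clique of a chordal graph, read off the
    labels assigned by Maximum Cardinality Search."""
    n = len(graph)
    labels = [0] * n
    visited = [False] * n
    best = 0
    for _ in range(n):
        # Visit the unvisited vertex with the most visited neighbors
        v = max((u for u in range(n) if not visited[u]),
                key=lambda u: labels[u])
        visited[v] = True
        best = max(best, labels[v] + 1)  # label + 1 = clique containing v
        for w in graph[v]:
            if not visited[w]:
                labels[w] += 1
    return best

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
assert max_clique_size_mcs(triangle) == 3
path = {0: [1], 1: [0, 2], 2: [1]}  # chordal; the largest clique is an edge
assert max_clique_size_mcs(path) == 2
```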

The sequence of figures below shows how maximum cardinality search will find a simplicial
elimination ordering for the interference graph of our running example:


In this example, the simplicial elimination ordering will be T7, R1, R2, T1, R5, R4, T8, R6 and
T9, as we had seen before. If we remove the nodes in the reverse simplicial ordering, then
whenever we remove a node, all its neighbors (in the remaining graph) form a clique.


Ok, I can find out the size of the largest clique of a chordal interference
graph in polynomial time. But how does that help me with spilling?
Sebastian Hack has a cute theorem in his PhD Dissertation that helps with that!

1.​ The Clique Theorem in SSA Interference Graphs:​


As demonstrated in Sebastian Hack's PhD thesis (Page 42), for an SSA-form program P
with interference graph G=(V,E):
○​ For any clique C={c1,c2,…,cn} in G, there exists a program point L where all the
variables represented by the nodes ci are simultaneously live.
○​ The size of the maximum clique in G corresponds to the minimum number of
registers required to compile P without spilling.
2.​ Using the Clique Size to Manage Register Pressure:​
If the largest clique in the graph has more than K nodes (where K is the number of
available registers), spilling is necessary to reduce register pressure. Remember: in a
chordal graph, identifying cliques with more than K nodes is easy: it suffices to inspect
the largest labels computed by the maximum cardinality search. In our example,
we would have a clique of size four:

3.​ Challenges in Spilling:​


While finding the size of the largest clique is polynomial-time for chordal graphs,
determining the minimal set of variables to spill to make the graph K-colorable is an
NP-complete problem, even for chordal graphs. This is known as the "Maximum
K-Colorable Subgraph Problem for Chordal Graphs."
○​ In practical terms, this means:
■​ We can efficiently determine if spilling is needed (by comparing MaxLive
with K).
■​ However, choosing which variables to spill to reduce the graph’s
chromatic number is computationally hard. We need to resort to
heuristics.


4.​ Iterative Spilling and Graph Reconstruction:​


Because of this complexity, register allocation typically involves an iterative process:
○​ Build the interference graph.
○​ Analyze live ranges and cliques to assess register pressure.
○​ Perform spilling to reduce pressure.
○​ Rebuild the interference graph with reduced live ranges and repeat until
MaxLive ≤ K:

By using the size of the largest clique as a guide, we can strategically reduce register pressure,
enabling efficient register allocation while balancing computational constraints.
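The pressure check itself is tiny once liveness is known: MaxLive is just the size of the largest liveness set. The data below is hypothetical:

```python
def max_live(live_sets):
    """MaxLive: the largest number of simultaneously live variables."""
    return max(len(live) for live in live_sets)

live_sets = [{"a"}, {"a", "b", "c", "d"}, {"c", "d"}]
K = 3
assert max_live(live_sets) == 4   # MaxLive > K, so spilling is needed
```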

But if we end up having to spill, which nodes do we choose?


When deciding which nodes to spill in a chordal interference graph, the goal is to minimize the
"damage" caused by spilling a variable (i.e., mapping it to memory). To achieve this, we
compute the spill cost of each variable using heuristics. The spill cost estimates the expense
of spilling a variable based on its usage in the program.

Spill Cost Formula

The following formula, adapted from Andrew Appel's book Modern Compiler Implementation in
Java, provides a structured way to compute spill costs:

def Spill_Cost(v, interference_graph):
    cost = 0
    # Degree of 'v' in the interference graph (constant per variable)
    D = interference_graph.degree(v)
    # Iterate over all blocks where 'v' is defined/used
    for block in v.blocks:
        # Number of defs and uses of 'v' in the block
        SB = block.num_defs_and_uses
        # Loop nesting factor of the block
        N = block.loop_nesting_factor
        cost += (SB * 10**N) / D
    return cost


●​ SB: Number of uses and definitions of v in a given block B.
●​ N: Loop nesting factor of B; deeper loops imply higher execution frequency.
●​ D: Degree of the variable v in the interference graph; higher degrees mean more
overlapping live ranges.
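Plugging hypothetical numbers into the formula shows why uses inside deep loops dominate the cost:

```python
def spill_cost(blocks, degree):
    """Appel-style spill cost: the sum of SB * 10^N over the blocks
    where the variable appears, divided by its degree D.
    blocks: list of (SB, N) pairs, one per block (hypothetical data)."""
    return sum(sb * 10**n for sb, n in blocks) / degree

hot = spill_cost([(2, 2)], degree=4)    # used twice in a doubly nested loop
cold = spill_cost([(3, 0)], degree=6)   # used three times outside any loop
assert hot == 50.0 and cold == 0.5      # 'cold' is the cheaper spill candidate
```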

Variables with lower spill costs are prioritized for spilling, as they are expected to cause less
runtime overhead. The figure below shows how this formula could be applied to our running
example:

Spilling and Code Transformation

Once a variable is chosen for spilling:

1.​ Insert Loads and Stores:


○​ Replace the variable’s usage with memory loads and definitions with memory
stores.
○​ Ensure the semantics of the program remain consistent.
2.​ Rebuild the Interference Graph:
○​ With live ranges altered by the newly inserted loads and stores, the interference
graph changes.
3.​ Restart Register Allocation:
○​ Check the size of the largest clique in the updated graph.
○​ Spill more variables if necessary, and repeat the process until the graph is
K-colorable. The figure below shows this process of spilling and rebuilding the
interference graph of our running example:


Iterative Process

Spilling is part of an iterative process:

●​ Compute spill costs for all variables.


●​ Select the variable(s) with the lowest spill cost for spilling.
●​ Transform the program with appropriate loads and stores.
●​ Rebuild the interference graph and check if further spilling is needed.

By systematically choosing spill candidates with minimal impact, we ensure efficient register
allocation while minimizing the runtime cost of spilling.


Once we are done with spilling, how do we assign registers to the variables?
The beauty of the decoupled approach is that once we lower the size of the largest clique to the
number of available registers, we know we don't need to worry about spilling anymore!

To assign registers to variables, we can use greedy coloring! Here is a mock Python
implementation of the greedy coloring algorithm:

def Greedy_Coloring(G, S):
    """
    Assign colors to vertices in a graph G using a sequence S.
    Parameters:
        G: tuple (V, E), where V is the set of vertices and
           E is the set of edges
        S: list, sequence of vertices in V
    Returns:
        m: dictionary mapping vertices to colors
    """
    V, E = G
    # Initialize all vertices with no color (None)
    m = {v: None for v in V}

    for v in S:
        # Find colors used by the neighbors of v
        neighbor_colors = {m[neighbor] for neighbor in V
                           if (v, neighbor) in E or (neighbor, v) in E}
        # Assign the lowest available color
        m[v] = next(c for c in range(len(V))
                    if c not in neighbor_colors)

    return m

Why Greedy Coloring Works Optimally for Chordal Graphs


For chordal graphs, if we follow a Simplicial Elimination Ordering (SEO), greedy coloring
yields an optimal coloring.

●​ Simplicial Node: A node whose neighbors form a clique.


●​ SEO: An ordering of the graph's vertices such that each vertex, when removed along
with its incident edges, leaves a subgraph where its remaining neighbors form a clique.

Key Insights for Chordal Graphs

1.​ Clique Properties:


○​ At any point in the SEO, the already-colored neighbors of a node v form a clique.
○​ All nodes in a clique must receive different colors.
○​ Therefore, if v has M already-colored neighbors, it will be assigned color M+1.
2.​ Chromatic Number:
○​ For chordal graphs, the chromatic number (minimum number of colors needed)
equals the size of the largest clique.
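This claim can be checked on a small chordal graph: two triangles sharing the edge (b, c). The ordering used below is a valid simplicial elimination ordering for this particular graph, and greedy coloring along it uses exactly three colors, the size of the largest clique:

```python
def greedy_color(V, E, order):
    """Greedy coloring along a given vertex order (assumed to be an SEO)."""
    adj = {v: set() for v in V}
    for a, b in E:
        adj[a].add(b)
        adj[b].add(a)
    color = {}
    for v in order:
        used = {color[n] for n in adj[v] if n in color}
        color[v] = next(c for c in range(len(V)) if c not in used)
    return color

V = {"a", "b", "c", "d"}
E = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")}
coloring = greedy_color(V, E, ["a", "b", "c", "d"])  # an SEO for this graph
assert max(coloring.values()) + 1 == 3   # three colors: the clique number
```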

Applying Greedy Coloring to Register Allocation

After spilling, the interference graph is guaranteed to be K-colorable, where K is the number of
available registers. Using greedy coloring:

●​ Each color corresponds to a register.


●​ Variables are assigned registers such that no two variables interfering with each other
share the same register.

This process ensures a valid and efficient register assignment for the program without additional
spilling. Thus, notice that it's vital to reduce the size of the largest clique to at most K, the
number of registers in the target architecture. For instance, if we get back to the interference
graph of our original example, you will see that we will need at least four different colors:


However, if we try the same greedy coloring algorithm on the graph that is produced after the
spilling phase (assuming three registers), then we are guaranteed to find a coloring with at most
three colors, e.g.:


How can we implement coalescing in this register allocation algorithm?


Coalescing in this register allocation algorithm aims to eliminate unnecessary copy instructions
by assigning the same register to the source and destination of a copy. This optimization is
achieved while ensuring that the interference graph remains K-colorable, where K is the number
of available registers.

Post-Assignment Coalescing

We adopt a best-effort coalescing strategy, as described in "Register Allocation via the
Coloring of Chordal Graphs". This method leverages the flexibility of unused colors in local
neighborhoods of the interference graph to coalesce copy instructions.

The algorithm modifies the interference graph G and attempts to merge nodes corresponding
to variables involved in copy instructions while maintaining colorability.

Algorithm: Best Effort Coalescing

Below is the algorithm in mock Python:

def Best_Effort_Coalescing(L, G, K):
    """
    Perform best-effort coalescing on the interference graph.
    Parameters:
        L: list of copy instructions (e.g., ["x = y", ...])
        G: tuple (V, E), where V is the set of variables and E is the
           set of edges
        K: number of available colors (registers)
    Returns:
        G_prime: the coalesced interference graph
    """
    V, E = G
    # Start with a mutable copy of G (a list, so that the edge set
    # can be replaced after a merge)
    G_prime = [V.copy(), E.copy()]

    for copy in L:
        # Extract variables from "x = y"
        x, y = parse_copy_instruction(copy)

        # Sets of colors used in the neighborhoods of x and y
        Sx = {color_of(v, G_prime) for v in neighbors(x, G_prime)}
        Sy = {color_of(v, G_prime) for v in neighbors(y, G_prime)}

        # Find a color unused in both neighborhoods
        for c in range(K):
            if c not in Sx.union(Sy):
                # Merge nodes x and y into a new node xy with color c
                xy = f"{x}_{y}"  # Name the new node

                # Add xy to the graph, adjacent to every former
                # neighbor of x and of y
                G_prime[0].add(xy)
                G_prime[1].update(
                    {(xy, v) for v in
                     neighbors(x, G_prime).union(neighbors(y, G_prime))})

                # Replace x and y in the copy instructions with xy
                L = [instr.replace(x, xy).replace(y, xy)
                     for instr in L]

                # Remove x and y from the graph
                G_prime[0].discard(x)
                G_prime[0].discard(y)
                G_prime[1] = {(u, v) for (u, v) in G_prime[1]
                              if x not in (u, v) and y not in (u, v)}

                break  # Move to the next copy instruction

    return G_prime

Explanation of the Algorithm

1.​ Input: The list of copy instructions L, the interference graph G, and the number of
registers K.


2.​ Finding Unused Colors: For each copy instruction x=y, compute the colors used by the
neighbors of x and y in the graph. If there exists a color c that is unused in these
neighborhoods, proceed with coalescing.
3.​ Coalescing:
○​ Create a new node xy to represent the merged variables.
○​ Assign xy the color c and update the graph to reflect this merge.
○​ Replace occurrences of x and y in the copy instructions with xy.
○​ Remove x and y from the graph.
4.​ Repeat: Process all copy instructions in L.
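The color-availability test at the heart of step 2 can be sketched in isolation. The adjacency sets and the partial coloring below are made-up inputs for the example:

```python
def can_coalesce(x, y, adj, color, K):
    """True if some color in 0..K-1 is unused by every colored
    neighbor of x and of y, so a merged node xy could take it."""
    used = {color[n] for n in adj[x] | adj[y] if n in color}
    return any(c not in used for c in range(K))

adj = {"x": {"a"}, "y": {"b"}, "a": {"x"}, "b": {"y"}}
color = {"a": 0, "b": 1}
assert can_coalesce("x", "y", adj, color, K=3)      # color 2 is free
assert not can_coalesce("x", "y", adj, color, K=2)  # all K colors taken
```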

The figure below shows the effect of eliminating copies in the program after register assignment:


Benefits and Trade-offs

●​ Benefits: This approach is simple and efficient. It improves runtime performance by
reducing the number of copy instructions executed.
●​ Limitations: Unlike more complex methods (e.g., Hack and Goos's graph recoloring
algorithm), best-effort coalescing does not propagate changes throughout the graph,
potentially missing further coalescing opportunities.

We are implementing coalescing in a way that will never cause spilling. Does that mean that
avoiding spilling is more important than removing copy instructions?
Yes, prioritizing the avoidance of spilling over removing copy instructions reflects a deliberate
trade-off in the coalescing strategy. Here's why this decision is significant:

1.​ Performance Impact of Spilling:


○​ Spilling introduces memory accesses (loads and stores) to manage variables that
cannot fit into registers. These accesses are much more expensive than
register-to-register operations.
○​ Excessive spilling can degrade runtime performance significantly, particularly in
loops or frequently executed code.
2.​ Guaranteeing K-Colorability:
○​ The coalescing algorithm must ensure the interference graph remains
K-colorable, where K is the number of available registers.
○​ If coalescing causes the graph to exceed K colors, it may force additional spills,
undermining the purpose of register allocation.
3.​ Minimal Impact of Copy Instructions:
○​ Copy instructions (e.g., x=y) are cheaper than memory operations since they
involve only registers.
○​ While reducing copy instructions improves performance, the relative cost savings
are much smaller compared to avoiding spills.

Balancing the Trade-Off:

The strategy of best-effort coalescing reflects this balance:

●​ It opportunistically removes copy instructions only when it is safe to do so (i.e., without
risking spills).
●​ It avoids propagating changes that might destabilize the interference graph or require
further modifications.

Conclusion:


Yes, avoiding spilling is considered more important than removing copy instructions in many
different academic works, such as Bouchez's "On the Complexity of Register Coalescing". This
approach ensures stable and predictable performance by minimizing expensive memory
accesses while still trying to reduce copy instructions when feasible. It reflects a practical
compromise between aggressive optimization (copy removal) and maintaining efficient register
allocation.

But, can't spilling happen during the SSA elimination phase? What if the
arguments of phi-functions end up mapped to different stack slots?
It depends on the program's representation! This issue is discussed in detail in the paper "SSA
Elimination after Register Allocation". Here's the breakdown:

1.​ Non-CSSA Programs:


○​ If the program is not in CSSA form, spilling may be necessary during SSA
elimination.
○​ Consider the example in the figure below, where φ-related variables like a and a2
have overlapping live ranges. These variables must be assigned different stack
slots during spilling, which complicates SSA elimination.


○​ If the target architecture does not support direct memory-to-memory transfers,
temporary registers are required to implement these copies. If all registers are
already occupied, additional spilling becomes necessary to free a temporary
register. To avoid this additional spilling, one can use a stack slot as the
scratch location, a bit like Cranelift does.
2.​ Conventional Static Single Assignment (CSSA) Form:
○​ If the program is in CSSA form, spilling during SSA elimination is unlikely.
○​ In CSSA, the arguments of a φ-function can either be mapped to different
registers or to the same memory slot. For example, as shown in the figure below,


splitting the live range of a variable (e.g., inserting a copy like a3 = a2) ensures
that φ-related variables are handled cleanly.

○​ The result is that all variables related to a φ-function can be naturally assigned
distinct slots, and no extra memory transfer instructions are needed during SSA
elimination.

Why Spilling May Occur in Non-CSSA Programs


1.​ Overlapping Live Ranges:​


Variables with overlapping live ranges may require different stack slots, necessitating
additional moves.
2.​ Temporary Registers for Memory Transfers:​
Architectures that lack direct memory transfer instructions may require temporary
registers to implement copies. If no registers are free, spilling is unavoidable.

Benefits of CSSA for SSA Elimination

CSSA simplifies SSA elimination by ensuring that:

●​ Arguments of φ-functions are naturally assigned the same stack slot if they are ever
spilled.
●​ Memory transfers can thus be entirely avoided, reducing the need for further spilling.

In the end, by converting a program to CSSA form before spilling, the complexity and overhead
associated with SSA elimination can be greatly reduced.
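The live-range splitting that moves a program toward CSSA form can be sketched as below. The fresh-name scheme and the phi representation are inventions of this example; the essential point is that each φ-operand gets a fresh copy placed at the end of the corresponding predecessor block, so its live range becomes tiny and non-overlapping:

```python
def split_phi_live_ranges(phis):
    """Split the live range of every phi operand with a fresh copy
    placed in its predecessor block (CSSA-style splitting, sketch).
    phis: list of (dest, [arg_0, arg_1, ...]) pairs."""
    copies = []    # (predecessor index, fresh name, original name)
    new_phis = []
    for dest, args in phis:
        new_args = []
        for i, a in enumerate(args):
            fresh = f"{a}_c{i}"           # made-up fresh-name scheme
            copies.append((i, fresh, a))  # insert 'fresh = a' in pred i
            new_args.append(fresh)
        new_phis.append((dest, new_args))
    return new_phis, copies

new_phis, copies = split_phi_live_ranges([("a3", ["a1", "a2"])])
assert new_phis == [("a3", ["a1_c0", "a2_c1"])]
assert copies == [(0, "a1_c0", "a1"), (1, "a2_c1", "a2")]
```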

What are these stack slots you mentioned? I mean: where are spilled
variables stored?
Spilled variables are mapped to stack memory. Spilling occurs when there are not enough
registers to hold all the live variables at a given point in the program. The compiler generates
additional load and store instructions to move the spilled variable’s value between the stack and
registers as needed.

1. Can a local variable (within a function's scope) be mapped to the stack or stay in a register?

Both mappings are possible! This is determined by the optimization level and by the number of
available registers:

●​ At -O0 (no optimization):


○​ Compilers like GCC and Clang prioritize simplicity and debugging over
performance.
○​ All local variables are typically mapped to the stack, even if they are never
spilled. This ensures predictable memory locations, making debugging easier.
●​ At -O1 and higher:
○​ The compiler performs optimizations, including register allocation.
○​ Frequently used variables or those with short live ranges are kept in registers,
while less critical variables may remain in stack memory or be spilled during
register pressure. The figure below shows a snapshot of a typical stack on
x86-32:


2. What about global variables?

Global variables are not spilled to the stack. Instead, they are stored in static memory, which is
allocated in the program’s data segment. Here’s why:

●​ Global variables have a lifetime that spans the entire execution of the program.
●​ They are accessible from anywhere in the program, so they are placed in a fixed
memory location (e.g., .data or .bss segments).

Finding a spill location for a global variable is unnecessary because the memory location of
global variables is already predetermined. We can still operate on them in registers, e.g., bring
them into registers and keep them there for as long as possible. But ultimately, they need to
be stored back into their original memory location. In other words, any assignment to the global
variable must change the state of the static memory that holds it.

3. What about heap-allocated variables?

Heap-allocated variables are managed differently:

●​ These variables are explicitly allocated and freed by the programmer (e.g., using
malloc/free in C or new/delete in C++).
●​ Their values reside in the heap, and only pointers to them are usually manipulated in
registers.

It doesn’t make sense to talk about spilling a heap-allocated variable itself because spilling
refers to moving a register-held value to memory. However, the pointer to a heap-allocated
variable could be spilled if it is stored in a register and register pressure is too high.

Summary


If we assume that we are using a compiler like gcc or clang, then this is what happens:

1.​ Local Variables:


○​ At -O0: Mapped to the stack for simplicity.
○​ At -O1 and higher: May stay in registers unless spilled to the stack.
2.​ Global Variables: Stored in static memory, never spilled.
3.​ Heap-Allocated Variables: Reside in the heap; only pointers to them may be spilled to
the stack.

Spilling is a concept closely tied to register allocation and applies to values temporarily held in
registers, not to variables that are permanently in memory (like globals or heap-allocated
variables).

Are There Real-World Register Allocators That Run on SSA-form Programs?

While most real-world register allocators operate on post-SSA programs, where φ-functions
are replaced by actual assembly instructions (e.g., copies and swaps), there are notable
examples of allocators that work directly on SSA-form programs. One such example is the Go
compiler's register allocator. There is a nice description of the Go Register Allocator in this
online article by Vladimir Makarov, who is one of the maintainers of the GCC Register Allocator.

The Go Compiler’s Register Allocator

The Go compiler performs register allocation directly on SSA-form programs. The
implementation can be observed in the regalloc.go file. However, the algorithm used differs from
the chordal graph-based methods described in these lecture notes. Instead, the Go compiler
employs an approach that resembles linear-scan register allocation with second-chance
bin-packing, as described by Traub et al. in the paper "Quality and Speed in Linear-Scan
Register Allocation". The key distinction is that Traub’s algorithm works on non-SSA programs,
while Go’s allocator operates directly in SSA form.

Handling φ-functions in the Go Compiler

The Go compiler handles φ-functions similarly to the approach discussed in these lecture notes:

●​ It ensures that all operands of a φ-function can be mapped to the same physical location,
whether in a register or memory.
●​ If the program is not in Conventional Static Single Assignment (CSSA) form, the Go
compiler splits the live ranges of φ-related variables to resolve potential conflicts.
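The live-range splitting above can be sketched as follows. This is an illustrative Python toy, not Go's actual representation (the data layout and the fresh-name scheme are invented): for each φ-function, a fresh copy is inserted at the end of every predecessor block, and the φ is rewritten to use the fresh names, so that all φ-related names can safely share one physical location.

```python
# Convert a phi-function to CSSA by splitting live ranges:
# for "x = phi(a from B1, b from B2)", insert copies a_c0 = a in B1 and
# b_c1 = b in B2, and rewrite the phi to use the fresh names.
def split_phi(phi_dest, operands):
    """operands: list of (pred_block, var). Returns (copies, new_phi)."""
    copies = []          # (block, new_var, old_var) copy instructions
    new_operands = []
    for i, (block, var) in enumerate(operands):
        fresh = f"{var}_c{i}"          # fresh name, one per operand
        copies.append((block, fresh, var))
        new_operands.append((block, fresh))
    return copies, (phi_dest, new_operands)

copies, phi = split_phi("x", [("B1", "a"), ("B2", "b")])
```

After the transformation, the fresh names have very short live ranges (from the copy to the end of the predecessor block), which eliminates the interferences that made the original program non-CSSA.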

Spill Management and Stack Allocation


For spilled variables, the Go compiler uses the stackalloc.go file to assign stack slots. A few
notable aspects include:

●​ Shared Stack Slots: The Go compiler optimizes stack usage by allowing multiple SSA
values to share the same stack slot, provided their lifetimes do not overlap.
●​ Interference Graph for Optimization: To minimize stack allocation costs, the Go
compiler leverages the interference graph of SSA variables, ensuring that stack slots are
reused efficiently.
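The slot-sharing idea can be sketched with a small Python toy (this is not the stackalloc.go logic; the interval representation is an assumption made for the example): spilled values are visited in order of start point, and a value reuses an existing slot whenever the last value placed there has already died.

```python
# Assign stack slots to spilled values so that values with disjoint
# lifetimes share a slot. Intervals are (name, start, end), half-open.
def alloc_stack_slots(intervals):
    slots_end = []       # slots_end[s] = end point of the last value in slot s
    slot_of = {}
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        for s, last_end in enumerate(slots_end):
            if last_end <= start:        # lifetimes do not overlap: reuse slot
                slot_of[name] = s
                slots_end[s] = end
                break
        else:                            # every slot is live here: open a new one
            slot_of[name] = len(slots_end)
            slots_end.append(end)
    return slot_of
```

On intervals viewed this way the conflict structure is an interval graph, so the greedy scheme uses the minimum number of slots; the real compiler reasons about an interference graph of SSA values rather than plain intervals.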

Further Reading

For a detailed explanation of the Go compiler's register allocation process, Vladimir Makarov’s
article offers an excellent overview. Makarov highlights the unique aspects of Go’s SSA-based
allocation strategy and its practical implications.

How Did People Start Doing Register Allocation on SSA-form Programs?


The history of register allocation on SSA-form programs is a fascinating interplay between
theory and practice, with critical contributions over several decades.

1. Chaitin’s Foundational Work (1981)

In 1981, Gregory Chaitin introduced the idea of modeling register allocation as a graph
coloring problem in his seminal paper, Register Allocation via Coloring.

●​ He demonstrated that the general register allocation problem is NP-complete, setting
the stage for decades of research into approximations and optimizations.
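The heart of Chaitin-style allocation is Kempe's simplify-and-select heuristic, which can be sketched in a few lines of Python. This toy omits the spilling and coalescing phases that a real Chaitin allocator must also handle: it repeatedly removes any node with fewer than K neighbors, then pops nodes back and assigns each the lowest color unused by its neighbors.

```python
# Kempe/Chaitin simplify-and-select graph coloring with K colors.
# graph: dict mapping each node to the set of its neighbors.
def chaitin_color(graph, K):
    g = {n: set(nb) for n, nb in graph.items()}
    stack = []
    while g:
        # Simplify: remove some node with degree < K.
        node = next((n for n in g if len(g[n]) < K), None)
        if node is None:
            raise ValueError("no trivially colorable node: would spill")
        for nb in g.pop(node):
            g[nb].discard(node)
        stack.append(node)
    # Select: pop nodes and give each the lowest color its neighbors lack.
    color = {}
    while stack:
        node = stack.pop()
        used = {color[nb] for nb in graph[node] if nb in color}
        color[node] = next(c for c in range(K) if c not in used)
    return color
```

A triangle of mutually interfering variables, for example, is colorable with K=3 but forces a spill decision with K=2, which is exactly where a real allocator would pick a victim to store in memory.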

2. Observations on Real-world Interference Graphs (2003–2005)

●​ Christian Andersson (2003): Andersson observed that many interference graphs from
real-world programs are Perfect Graphs. Perfect graphs are significant because, in every
induced subgraph, the chromatic number equals the size of the largest clique, which
simplifies graph coloring. He published his results in the paper "Register Allocation by
Optimal Graph Coloring".

●​ Fernando Pereira and Jens Palsberg (2005): In their paper, Register Allocation via the
Coloring of Chordal Graphs, Pereira and Palsberg noted that most interference graphs
of real-life programs tend to be Chordal Graphs.
○​ For instance, 95% of methods in the Java 1.5 library had chordal interference
graphs when compiled with the JoeQ compiler.
○​ However, at the time, they did not yet show that SSA-form programs inherently
produce chordal interference graphs.

3. The Breakthrough: SSA Interference Graphs Are Chordal (2005–2006)

●​ Philip Brisk et al. (2005): In their paper, Polynomial-time Graph Coloring Register
Allocation, Brisk and his collaborators proved that strict SSA-form programs have
Perfect Interference Graphs.
●​ Sebastian Hack’s Contributions:
○​ Hack made the critical breakthrough in understanding the structure of SSA
interference graphs. In his technical report Interference Graphs of Programs in
SSA-form (2005), he proved that strict SSA-form programs have Chordal
Interference Graphs, a stronger result than Brisk’s.
○​ A strict SSA-form program ensures that every path from the program’s start to
any use of a variable passes through a definition of that variable.
○​ Hack’s results were published in 2006 in the paper Register Allocation for
Programs in SSA-form, and later formalized in Optimal Register Allocation for
SSA-form Programs in Polynomial Time.
●​ Florent Bouchez’s Independent Discovery:​
Around the same time, Florent Bouchez and his collaborators independently proved that
SSA interference graphs are chordal in his technical report, Register Allocation and Spill
Complexity under SSA.
○​ Bouchez’s team also explored related complexities, such as coalescing and
spilling, publishing insights in papers like Register Allocation: What Does the
NP-Completeness Proof of Chaitin et al. Really Prove?. The fact is that G. Chaitin's
proof of NP-completeness for register allocation only works for general programs:
it would not work for SSA-form programs.

Significance of These Discoveries

1.​ Chordal Graphs and Efficient Coloring: Chordal graphs can be optimally colored in
polynomial time using a simplicial elimination ordering. This discovery meant that
register allocation for SSA-form programs, previously considered NP-complete in
general, could be performed efficiently.
2.​ Impact on Modern Compilers: These insights laid the foundation for many modern
register allocation techniques, significantly improving compilation times and reducing
spill costs.
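The polynomial-time procedure can be sketched in Python: maximum cardinality search visits the vertices of a chordal graph in an order whose reverse is a simplicial elimination ordering, and greedy coloring along that visit order uses exactly as many colors as the largest clique. This toy assumes the input graph is indeed chordal (it does not verify chordality).

```python
# Optimal coloring of a chordal graph via Maximum Cardinality Search (MCS).
# graph: dict node -> set of neighbors. Returns a dict node -> color.
def color_chordal(graph):
    order, chosen = [], set()
    weight = {n: 0 for n in graph}
    while len(order) < len(graph):
        # Pick the unchosen node with the most already-chosen neighbors.
        node = max((n for n in graph if n not in chosen), key=lambda n: weight[n])
        chosen.add(node)
        order.append(node)
        for nb in graph[node]:
            if nb not in chosen:
                weight[nb] += 1
    # Greedy coloring along the MCS order: when a node is colored, its
    # already-colored neighbors form a clique, so greedy is optimal here.
    color = {}
    for node in order:
        used = {color[nb] for nb in graph[node] if nb in color}
        color[node] = next(c for c in range(len(graph)) if c not in used)
    return color
```

On a chordal graph whose largest clique has three vertices, for instance, this procedure always produces a valid 3-coloring, with no search or backtracking involved.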

By connecting theoretical breakthroughs with practical observations, the history of SSA-based
register allocation highlights the deep synergy between compiler design and graph theory.

So, it all started with the paper of Gregory Chaitin, in 1981?


Well, in a way. That was one of the first works to connect register allocation with graph coloring.
And after that paper, there was a flurry of research and algorithms on register allocation. In this
sense, Chaitin could be considered the father of graph coloring-based register allocation. Here’s
a short bio and summary of Gregory Chaitin’s contributions to computer science (and science in
general):

Gregory Chaitin is a pioneering figure in both theoretical computer science and mathematics.
Born in 1947 in the United States, Chaitin demonstrated exceptional talent in mathematics and
computer science from a young age. He is widely recognized for his groundbreaking work in
algorithmic information theory (AIT), a field that blends computation with information theory
to explore the limits of what can be computed and described. Over the years, Chaitin has made
contributions to both theoretical and practical aspects of computer science.


Chaitin spent much of his career at IBM Research, where his ideas influenced numerous areas
of computer science. In addition to his technical achievements, he is a prolific author and
engaging speaker, bringing abstract concepts to broader audiences through books and lectures.

Contributions to Science

1.​ Algorithmic Information Theory (AIT):
○​ Chaitin's work in AIT builds on ideas from Alan Turing and Claude Shannon. He
introduced the concept of Chaitin's constant (Ω), a real number representing
the probability that a randomly generated program will halt.
2.​ Register Allocation via Graph Coloring:
○​ In his seminal 1981 paper, Register Allocation via Coloring, Chaitin introduced
the idea of modeling the register allocation problem in compilers as a graph
coloring problem.
○​ His approach demonstrated how interference between variables in a program
could be represented as a graph, with variables as nodes and edges
representing conflicts. Many real-world compilers currently follow his approach.
For an industrial-quality example, take a look at the implementation of the
interference graph used in the HotSpot compiler. A relatively simpler version of
Chaitin's allocator can also be found in the graph-coloring-based register
allocator used in the MIR compiler.
○​ Chaitin showed that the general register allocation problem is NP-complete, but
his algorithm provided practical solutions that have been foundational to modern
compiler design.
3.​ Philosophy and Metamathematics:
○​ Chaitin has extended his work beyond computation into the philosophy of
mathematics, addressing fundamental questions about the nature of
mathematical truth and the incompleteness of formal systems. His books, such
as Meta Math! The Quest for Omega, make these ideas accessible to broader
audiences.

Legacy

Gregory Chaitin’s work has left an indelible mark on both theoretical computer science and
practical software engineering. His contributions to register allocation have shaped compiler
technology, making programs run faster and more efficiently. Simultaneously, his works in AIT
have deepened our understanding of computation, randomness, and mathematical truth,
inspiring many researchers.
